Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Dear experts,
question: can anyone explain why
"mapping load resident where... " on big tables is much slower splitting the funcionality in two Statements:
"load * resident where... " and "mapping load resident <on extract>".
Here the question more detailed:
having some problems with a slow datamodel I experienced the following:
Table TMP_MESS: 32 Mio Rows (~1GB qvd-File)
The following statement takes about 7.5 minutes:
mapping load MEASUREMENT_ID, MEASUREMENT_VALUE resident TMP_MESS where DISTINCTIVENESS = 'GESAMT'
But making two Statements out of it takes not even 1 Minute:
TMPMESSTEST:
load MEASUREMENT_ID, MEASUREMENT_VALUE resident TMP_MESS where DISTINCTIVENESS = 'GESAMT'
mapping load MEASUREMENT_ID, MEASUREMENT_VALUE resident TMPMESSTEST
here are the according lines from the log:
30.04.2015 07:29:31: 17 fields found: %DATE, %MATCHCODE, %LOT, MEASUREMENT_TIME, MP_NAME, MEASUREMENT_ID, LOT, WAFER, STAGE, PARAMETER, DISTINCTIVENESS, Prober, Tester, Nadelkarte, Pinelek, MEASUREMENT_VALUE, IS_PASSED, 31,990,752 lines fetched
30.04.2015 07:30:40: 0037 TMPMESSTEST:
30.04.2015 07:30:40: 0038 load MEASUREMENT_ID, MEASUREMENT_VALUE resident TMP_MESS where DISTINCTIVENESS = 'GESAMT'
30.04.2015 07:30:40: 2 fields found: MEASUREMENT_ID, MEASUREMENT_VALUE, 539,562 lines fetched
30.04.2015 07:31:17: 0040 mapMessIDGES:
30.04.2015 07:31:17: 0041 mapping load MEASUREMENT_ID, MEASUREMENT_VALUE resident TMPMESSTEST
30.04.2015 07:31:17: 2 fields found: MEASUREMENT_ID, MEASUREMENT_VALUE, 539,562 lines fetched
30.04.2015 07:31:20: 0043 mapMessTest2:
30.04.2015 07:31:20: 0044 mapping load MEASUREMENT_ID, MEASUREMENT_VALUE resident TMP_MESS where DISTINCTIVENESS = 'GESAMT'
30.04.2015 07:31:20: 2 fields found: MEASUREMENT_ID, MEASUREMENT_VALUE, 539,562 lines fetched
30.04.2015 07:38:55: 0046 drop table TMPMESSTEST
30.04.2015 07:38:56: 0048 QUALIFY *
Has anyone experienced the same?
Are there any rules or advices when to use one statement and when to spilt into two Statements?
Kind regards
Philip
Hi,
I didn't see a rule and a advice on this and it's very curious to get to know this.
It's very similar (in sense of a speed) to a case of optimized/unoptimized load from qvd but here you deal with in-memory table.
Best regards,
Maxim
Seems as if Qlik suffers when doing an input-comparison (where...) and an output comparison (mapping...) in the same Load process. Would be indeed interesting if this is a general rule/learning.
I think the point from danieloberbillig goes in the right direction and in the two-step scenario the first filter-load runs in a kind of optimized load and the second mapping-load had only a few records to fetch. And within the first scenario both actions (filter + mapping) will be probably performed on record-level.
But it would be very interessting to have a deeper insight how it technically worked, maybe hic could give some details.
- Marcus
Hi,
Sounds good and if mapping works before "where" filtering that would be an answer.
Did you compare time of single mapping+load without "where" and separate load without "where" + mapping?
If the assumption is correct, their time would be the same or almost the same.
Best regards,
Maxim
I haven't it directly compared by a mapping-table. But I have had recently a similar case with a group by and where-clause in a loop and the differences in load-times was enormous. I had the best results by separating the statements.
- Marcus
An explanation could be (just a guess) that the MAPPING LOAD RESIDENT takes an entire table (first two columns of course but that doesn't make a difference since for example an optimized load of an entire QVD stays optimized if you load only specific columns) and converts it into sort of symbol table (that's what is is eventually and what makes an applymap() awfully quick) almost immediately, while a MAPPING LOAD RESIDENT WHERE does the two main operations at the same time instead of like sort of a PRECEDING LOAD. Inserting entries row by row into a symbol table is apparently much slower than converting an entire table straight into a symbol table.
Very curious if Qlik technicians can shed some light on this.
Peter
Dear all,
whether Qlik technicians will give a comment on this or not - I learned much from your comments.
Thanks a lot!
As a summary for me:
Both approaches (lingle statment or two statements) are valid. But when having problems with performance this is a good point to have a keen look on.
Kind regards
Philip