Performance with mapping load?

oddgeir · ‎2023-06-18

Hi

Is there a reason why mapping load is considerable slower than regular load?

This started with me checking a mapping load between ID and string values of some type

Mapping load with filter : 14 sec

Lines fetched: 6 179
This took 13,999996357597 seconds

Mapping load all (no where clause) : 21 sec

Lines fetched: 9 015 353
This took 21,000038855709 seconds

Regular load : 4 sec

Lines fetched: 9 015 353
This took 4,0000612847507 seconds

These numbers are consistent in my little test. So even if I only needed one type of values from this table it still is quicker for me to
- load all with regular load,
- do mapping load from resident table
- drop temporary table

Mapping from regular load: 7sec

Lines fetched: 6 179
This took 7,000041869469 seconds

I really would appreciate if anyone can clarify for me why this happens. The way I read this it is actually more efficient to do load-map-drop than direct maping load. Why can't QlikSense just use this approach directly on the mapping load?

jonashertz · ‎2023-06-19

Hi, Remember that Mapping loads from QVD files are NOT Optimized Loads. That could explain your difference? So for large mapping tables it can be faster sometimes to do an Optimized load first then do the mapping followed by a drop.

Also good to remember: "where exists(field1)" do not break Optimized loads but "where exists(field1, field2) or Where field1 = field2" equivalent does.

rwunderlich · ‎2023-06-19

Are you loading from a QVD? If you post your Where clause we may be able to offer some suggestions.

-Rob

oddgeir · ‎2023-06-19

Hi. Thanks for your responses.
I understand that mapping load is not optimized, but I guess I had expected QlikSense native handling of mapping load at least had even performance to manually creating the temporary load table.

As this was a pure difference between two load methods I didn't think my code was of any interest but here is the structure

TMPAllStringValues:
Mapping Load
		issue														
	,	stringvalue													
FROM [lib://customfieldvalue.qvd]
(qvd)
;

// => 19 sec

While simply removing "mapping" from query and do mapping afterwards almost tripples the speed

TMPAllStringValues:
Load
		issue														
	,	stringvalue													
FROM [lib://customfieldvalue.qvd]
(qvd)
;

Issue2ExpType:
Mapping Load
		issue
	,	stringvalue
Resident TMPAllStringValues;

Drop table TMPAllStringValues;

// =>7 sec

So I'm not trying to argue that mapping load should be as fast as optimized load. I'm rather wondering why the mapping load is so slow that it should be made faster by each developer doing an extra step potentially for each mapping in a load script. If it is faster to first load and then map, why doesn't QlikSense just do it that way in the first place?

rwunderlich · ‎2023-06-19

"I'm rather wondering why the mapping load is so slow"

My guess if that it's not the "mapping" prefix that makes it slow, it's the unoptimized load. You could confirm this by removing "mapping" and adding "where 1" to the end. I would expect the load time to be closer to 19 seconds.

"If it is faster to first load and then map, why doesn't QlikSense just do it that way in the first place? "

That's an interesting idea. I'm not aware of any optimizations that Qlik does like this -- but I'm not inside Qlik. Such an optimization would silently double the memory requirement which might be confusing and have a negative impact.

Beyond a curiosity (a good find at that), I would guess the potential savings and risk would not justify implementing an optimization like this in the product.

If you find that you are repeating this pattern a lot you could create a subroutine.

-Rob
http://www.easyqlik.com
http://masterssummit.com
http://qlikviewcookbook.com

General Question