Functions that do not affect optimized QVD load - Qlik Community

Report Inappropriate Content · ‎2009-09-11

Anyone know where I can find a list of these functions?

hector · ‎2009-09-11

as i know, only the alias ("as"), doesn't affect the optimized, but is good to know.

regards

johnw · ‎2009-09-11

You can have a single WHERE EXISTS(...) as well. But no, I don't think you can do ANY other functions and still keep an optimized load. According to the help text:

"Super-fast mode can be utilised only when all fields or a subset thereof are read without any transformations (formulas acting upon the fields), though the renaming of fields is allowed."

Report Inappropriate Content · ‎2009-09-11

There should be more information about this topic because that description in the help file is not entirely true. I am noticing right now that a simple concatenate load of a QVD loses the optimization, which by my understanding of QVD's doesn't make sense.

johnw · ‎2009-09-11

Yeah, the help text I found isn't particularly good. Maybe there's a more thorough explanation somewhere. For instance, I didn't see anything that indicated you could use a single where exists. They refer to it as super-fast mode in the help text, but when you actually load it, it instead says qvd optimized (if I remember correctly), so you have to figure out that those two things mean the same thing.

Anyway, losing the optimization on a concatenate makes sense to me based on my guesses as to what's going on with the QVDs and with an optimized load. I haven't studied it carefully, but I'm guessing that the QVDs are basically stored in QlikView's internal, compressed memory format. I'm guessing that the optimized load is mostly just moving blocks of data into memory. I'm guessing that it can apply a simple filter (the where exists statement) as it works, and that the field renames are handled pretty much as if you loaded in with the original names, and then renamed them. If you try to do a concatenate, you're not just loading in the table as it exists. You're adding rows to another table. This seems like something that at least could be different enough that it can no longer load big blocks of data, but must instead evaluate each row one at a time. It wouldn't have surprised me if an optimized load still worked, but it doesn't surprise me that an optimized load doesn't work.

My guesses, of course, may be incorrect. I do a LOT of guessing about QlikView's internals when I'm developing, and in particular when I'm trying out performance optimizations. I'd probably be better off doing more research and less guessing.

prieper · ‎2009-09-12

Hi John,

it might be worth to verify, but I have the feeling that an optimized loading with subsequent filtering of the data (in memory) is not really faster then loading from qvd with a WHERE-clause (may be subject to data, but am handling normally around 1-2Mio records).

Peter

johnw · ‎2009-09-12

I think it depends on your filtering. Let's say you need something simple like this. I find that this sort of requirement is very common for my data:

// unoptimized load
[Table]:
LOAD
ID
,Type
,Status
,Weight
,Color
FROM MyQVD.qvd (QVD)
WHERE MATCH(Type,'A','B')
AND MATCH(Status,'Active','Held')
;

This could be rewritten as follows, and will likely execute MUCH more quickly:

[Table]:
// Most restrictive field
LOAD * INLINE [
Type
A
B
];
// Optimized load
INNER JOIN ([Table])
LOAD
ID
,Type
,Status
,Weight
,Color
FROM MyQVD.qvd (QVD)
WHERE EXISTS(Type)
;
// Inner join to second most restrictive field, this is VERY fast
INNER JOIN ([Table])
LOAD * INLINE [
Status
Active
Held
];

There are a couple downsides. First, because you're doing some of the filtering after the fact, it will take more memory. If you're bumping up against available RAM, and this causes you to start swapping to disk, you're toast. Second, the code is more complicated and more difficult to maintain. So you have to weigh these against the performance improvement.

For a different example, let's say you wanted this:

[Table]:
LOAD
ID
,date(date#(Date,'YYYYMMDD'),'MM/DD/YY') as Date
,Weight/2000 as Tons
FROM MyQVD.qvd (QVD)
WHERE Date > 20070301
AND ( Type = 'A'
OR Status = 'Active')
;

To get an optimized load out of it, you'd pretty much have to delay all the work to a complete second pass. While you could rewrite it like that, there's no reason to expect it to be faster. I expect it would be slower. If performance were important, I'd test it both ways using my real data, but I wouldn't be very hopeful. Note that you could do better than below by loading up a calendar with all dates > 20070301 and doing a where exists. It depends on how much older data you have in your QVD. Still, it seems very unlikely to be faster than just the simple, unoptimized load above.

[Table]:
LOAD
ID
,Date as TempDate
,Weight
FROM MyQVD.qvd (QVD)
;
INNER JOIN ([Table])
LOAD
ID
,date(date#(TempDate,'YYYYMMDD'),'MM/DD/YY') as Date
,Weight/2000 as Tons
RESIDENT

WHERE Date > 20070301
AND ( Type = 'A'
OR Status = 'Active')
;
DROP FIELDS
TempDate
,Weight
;

prieper · ‎2009-09-20

Hi John,

overlooked your long reply - it is depending on the case, which method has the best performance.

Peter