So basically, here is the question.
I have a fact table called "Transaction", which is quite large (~100m records). In it there's a field called "ID.#KEY", which has fewer than a hundred unique values and is mixed in type (numeric & string). When I try to recreate ID.#KEY with a join (it doesn't matter whether I use FIELDVALUE in combination with FIELDVALUECOUNT, or LOAD DISTINCT):
temp_ID:
LOAD DISTINCT
ID.#KEY
RESIDENT Transaction;
LEFT JOIN (Transaction)
LOAD *,
ID.#KEY AS ID_2.#KEY
RESIDENT temp_ID;
DROP TABLE temp_ID;
The result is what I'd expect it to be - there are no duplication issues whatsoever - but the file size increases 3 times. If I instead duplicate the field with one more plain resident load, there are no issues. Any ideas?
The main cause is probably the behaviour of fieldvalue(), which returns only a single part of a field value rather than both parts - this then leads to side effects regarding which field-value information is stored where and in which way.
It's standard in Qlik that values are stored (where possible) as dual() values with both a string and a numeric interpretation. Accessing a field via fieldvalue() returns only the second (usually numeric) part without the string interpretation, which often results in a place-holder being shown instead of the real value.
Using such values as join keys may impact how the metadata of the field is stored. For example, a consistently formatted date field is stored with 8 bytes per (numeric) value plus a few extra bytes of formatting information per field. If the date field instead has n different formats, the format information is no longer stored at column level; instead the format pattern is stored per value, alongside the values.
Nevertheless, I wouldn't expect a roughly 3 times bigger file size from just 100 unique values - somehow your approach seems to trigger an unoptimized processing.
You may play a bit with wrapping the fieldvalue() in num() - rather unsuitable for mixed types - and/or text(), and/or with changing the field-creation order - i.e. creating the fieldvalue() result as ID2 first and only then duplicating it as ID for the join ... although I would tend to change the logic for duplicating a field altogether and switch to a mapping approach.
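As a small illustration of the dual() behaviour described above - field and variable names here are invented for illustration, and this is only a sketch:

```qlik
// Build a field whose values are duals with an explicit
// text part and numeric part.
DualDemo:
LOAD dual(TextPart, NumPart) AS MixedKey
INLINE [
TextPart, NumPart
Jan, 1
Feb, 2
];

// fieldvalue() addresses the distinct symbol table of the field;
// index 1 returns the first distinct value. How that value is then
// rendered depends on which interpretation (text/numeric) survives
// in the given context.
LET vFirst = fieldvalue('MixedKey', 1);
TRACE First value of MixedKey: $(vFirst);
```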
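The mapping approach mentioned above could look roughly like this - an untested sketch that duplicates the key per row without any join (table names as in the original question):

```qlik
// Mapping table: each distinct key value mapped to itself.
ID_Map:
MAPPING LOAD DISTINCT
    [ID.#KEY],
    [ID.#KEY]
RESIDENT Transaction;

// Duplicate the key in a resident load instead of a join.
Transaction2:
NOCONCATENATE LOAD *,
    ApplyMap('ID_Map', [ID.#KEY]) AS [ID_2.#KEY]
RESIDENT Transaction;

DROP TABLE Transaction;
RENAME TABLE Transaction2 TO Transaction;
```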
Hi @Bchur ,
It is quite hard for me to understand what you are trying to do, and the comparison you are making, without code examples (scenario A vs. scenario B, etc.).
From what you said, I can imagine you are comparing these two scenarios:
Scenario A
temp_ID:
LOAD DISTINCT
ID.#KEY
RESIDENT Transaction;
LEFT JOIN (Transaction)
LOAD *,
ID.#KEY AS ID_2.#KEY
RESIDENT temp_ID;
DROP TABLE temp_ID;
Scenario B
LEFT JOIN (Transaction)
LOAD DISTINCT
ID.#KEY,
ID.#KEY AS ID_2.#KEY
RESIDENT Transaction;
Is that correct?
A few other questions:
One suggestion:
I highly recommend adjusting the field types, especially for key fields. Do not link fields with mixed types.
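One way to enforce a single type on a key field before linking - a hypothetical sketch only, with invented source path and field name:

```qlik
// Force the key to a pure text representation so every value
// has the same type before it is used as a link/join key.
Transaction:
LOAD *,
    Text([ID.#KEY]) AS [ID.#KEY_text]
FROM [lib://Data/Transaction.qvd] (qvd);
```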
Regards,
Mark Costa
Read more at Data Voyagers - datavoyagers.net
Follow me on my LinkedIn | Know IPC Global at ipc-global.com
Hey @marksouzacosta, I'll clarify - this is the thing causing issues:
The subroutine results reflect what I see in the QMC; as you can see, the QVF grew approx. 3 times, while the row count for the Transaction table stayed the same. What doesn't cause issues:
So the issue must be in that specific field - its mixed type and the join. But what I don't understand is how the QVF size can grow while the row count stays the same :).