Bchur
Contributor

Joining vs resident loading

So basically, here is the question.

I have a fact table called "Transaction", which is quite large (~100m records). It contains a field called "ID.#KEY" that has fewer than a hundred unique values and is mixed in type (both numeric and string values). When I try to recreate ID.#KEY via a join (it doesn't matter whether I use FIELDVALUE in combination with FIELDVALUECOUNT, or LOAD DISTINCT):

temp_ID:
LOAD DISTINCT
ID.#KEY
RESIDENT Transaction;

LEFT JOIN (Transaction)
LOAD *,
ID.#KEY AS ID_2.#KEY
RESIDENT temp_ID;
DROP TABLE temp_ID;

The result is what I'd expect it to be, with no duplication issues whatsoever, but the file size increases 3 times. If I instead do the same thing with one more resident load, there are no issues. Any ideas?


Accepted Solutions
marcus_sommer

The main cause is probably the behaviour of fieldvalue(), which returns only a single part of the field value rather than both parts. This leads to side effects regarding which field-value information is stored where, and in which form.

By default, Qlik stores values (where possible) as dual() values with both a string and a numeric interpretation. Accessing a field via fieldvalue() returns only one part (usually the numeric one) without the string interpretation, which often results in a placeholder being shown instead of the real value.
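To illustrate the dual storage described above, here is a hypothetical sketch (not taken from the original thread) showing how Dual() carries both representations and how Text() and Num() extract them:

```qlik
// A dual value carries both a string and a numeric representation.
// Dual() makes this explicit; Text() and Num() extract the two parts.
Demo:
LOAD
    Dual('A-100', 100)       AS MixedKey,  // string part 'A-100', numeric part 100
    Text(Dual('A-100', 100)) AS KeyText,   // string interpretation: 'A-100'
    Num(Dual('A-100', 100))  AS KeyNum     // numeric interpretation: 100
AUTOGENERATE 1;
```

A chart showing MixedKey displays the string part, while the field still sorts and calculates on the numeric part.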

Using such values as join keys may affect how the field's metadata are stored. For example, a consistently formatted date field is stored with 8 bytes per (numeric) value plus a few extra bytes for the format pattern, stored once per field. If the date field instead has n different formats, the format information is no longer stored at the column level; instead, a format pattern is stored against each individual value.

Nevertheless, I wouldn't expect a roughly 3 times larger file size from just 100 unique values; in some way your approach seems to trigger unoptimized processing.

You might experiment with wrapping fieldvalue() in num() (probably not suitable for mixed types) and/or text(), or with changing the field-creation order, i.e. creating the fieldvalue() result as ID2 first and duplicating it as ID for the join afterwards. That said, I would tend to change the logic for duplicating a field altogether and switch to a mapping approach.
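A minimal sketch of the mapping approach mentioned above, assuming the goal is simply to duplicate ID.#KEY without a join (the table and map names are illustrative):

```qlik
// Build a mapping table: input value -> duplicated value.
Map_ID:
MAPPING LOAD DISTINCT
    ID.#KEY,
    ID.#KEY
RESIDENT Transaction;

// Re-create Transaction with the duplicated field via ApplyMap(),
// avoiding the LEFT JOIN entirely.
Transaction_New:
NOCONCATENATE LOAD
    *,
    ApplyMap('Map_ID', ID.#KEY) AS ID_2.#KEY
RESIDENT Transaction;

DROP TABLE Transaction;
RENAME TABLE Transaction_New TO Transaction;
```

ApplyMap() performs a pure lookup, so there is no risk of a join fanning out rows or changing how the key's symbol table is stored.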

 


3 Replies
marksouzacosta

Hi @Bchur ,

It is quite hard for me to understand what you are trying to do, and the comparison you are making, without code examples (scenario A vs. scenario B, etc.).

From what you said, I can imagine you are comparing these two scenarios:

Scenario A

temp_ID:
LOAD DISTINCT
ID.#KEY
RESIDENT Transaction;

LEFT JOIN (Transaction)
LOAD *,
ID.#KEY AS ID_2.#KEY
RESIDENT temp_ID;
DROP TABLE temp_ID;

 

Scenario B

LEFT JOIN (Transaction)
LOAD DISTINCT
ID.#KEY,
ID.#KEY AS ID_2.#KEY
RESIDENT Transaction;

 

Is that correct?

A few other questions:

  1. What do you mean by "filesize increases 3 times"? Are you saving Transaction as a file?
  2. How many records does the Transaction table have in both cases?
  3. Why are you doing a Join just to rename a field?


One suggestion:
I highly recommend adjusting the field types, especially for key fields. Do not link fields with mixed types. 
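One way to enforce a consistent type on a key field is to normalize it at load time. A sketch, assuming the key should be treated as text (the source path is illustrative):

```qlik
// Force every key value to a single (string) representation at load time,
// so the symbol table holds one consistent type instead of a mix.
Transaction:
LOAD
    Text(ID.#KEY) AS ID.#KEY
    // , ...other Transaction fields here...
FROM [lib://Data/Transaction.qvd] (qvd);
```

With every value stored as text, both sides of any later link or join compare identically, avoiding mixed-type matching surprises.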

 

Regards,

Mark Costa

Read more at Data Voyagers - datavoyagers.net
Follow me on my LinkedIn | Know IPC Global at ipc-global.com

Bchur
Contributor
Author

Hey @marksouzacosta, I'll clarify. This is the thing causing issues:

Bchur_0-1775712351234.png

The subroutine results reflect what I see in the QMC; as you can see, the QVF grew approx. 3 times. The row count for the Transaction table stayed the same. What doesn't cause issues:

Bchur_1-1775712505908.png

So the issue must be in that specific field: its mixed type, combined with the join. But what I don't understand is how the QVF size can grow while the row count stays the same :).
