Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi all,
I am trying to understand how Distinct works in Qlik or how Distinct works in general. I am writing a query where, I am creating a dummy user with a field (Reduce) from a Table1 and then concatenating values from another Table2. The below is the query:
NoConcatenate
Bridge:
LOAD
'abcdefg' as USERID,
REDUCE
Resident Table1;
Concatenate(Bridge)
LOAD DISTINCT
USERID,
REDUCE
Resident Table2;
Problem: The first part of the code where we are loading for the the dummy 'abcdefg' user there are a total of 3.7 million records in Table1 and there are around 3 million records coming in from the Table2 table, so Bridge table should load around 6.7 million records. but the issue here is when I use Distinct while concatenating the Table2 to Table1, the dummy user 'abcdefg' is only fetching 11,175 records which is reducing the overall data i.e., is only giving me around 3.1 million rows of data. But when I remove the DISTINCT from the below concatenate table. I am able to fetch the required 6.7 rows of data. Can you please help me with this? I am not sure why applying DISTINCT to the below table is reducing/affecting the number of rows in the first table/query. Need your help urgently as this is an issue we are facing in production.
Thank you for taking a look at this issue.
Hey,
What you're running into is due to how DISTINCT works in Qlik — it applies to all fields in the load, not just one.
This will only keep unique combinations of USERID and REDUCE. Since you're assigning 'abcdefg' as the USERID for all rows in Table1, and assuming similar values exist in Table2, the DISTINCT is collapsing what might be millions of rows down to just the unique REDUCE values for that USERID.
That’s why you're seeing only ~11k rows instead of the full expected volume. When you remove DISTINCT, Qlik keeps all the rows, including duplicates — which is why it works as expected in that case.
If your goal is to keep all rows (even if REDUCE values repeat), then definitely remove the DISTINCT.
Hope this clears it up! Let me know if you’re trying to dedupe only by REDUCE or have another goal in mind — happy to help further.
Hey,
What you're running into is due to how DISTINCT works in Qlik — it applies to all fields in the load, not just one.
This will only keep unique combinations of USERID and REDUCE. Since you're assigning 'abcdefg' as the USERID for all rows in Table1, and assuming similar values exist in Table2, the DISTINCT is collapsing what might be millions of rows down to just the unique REDUCE values for that USERID.
That’s why you're seeing only ~11k rows instead of the full expected volume. When you remove DISTINCT, Qlik keeps all the rows, including duplicates — which is why it works as expected in that case.
If your goal is to keep all rows (even if REDUCE values repeat), then definitely remove the DISTINCT.
Hope this clears it up! Let me know if you’re trying to dedupe only by REDUCE or have another goal in mind — happy to help further.
@nevopotokcloudinary Excellent response!
"What you're running into is due to how DISTINCT works in Qlik — it applies to all fields in the load, not just one."
Did you mean to say "..all tables in the load..." rather than "fields"?
-Rob
Hi @rwunderlich
Great catch — I actually did mean fields in this case. In Qlik, DISTINCT applies to the full combination of fields in the LOAD statement, not just a single field like REDUCE. So if multiple fields are present, DISTINCT filters for unique combinations across all those fields.
@nevopotokcloudinary , that makes sense. Thank you for your help on this.