Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
blaise
Partner - Specialist
Partner - Specialist

Left Join LOAD distinct remove rows!

Hello

After 1h of investation why some rows dissapear when I add a dimension table to my fact table i came across a rather nasty "knowledge".

If i do a normal load (no distinct) for my fact table and later add (with a left join) a dimension table with a distinct load the whole table gets distinct!. F.ex;


FactTable:
LOAD *,id as key_ID from facttable.qvd (qvd);
left join (FactTable)
LOAD DISTINCT *,No as key_ID from dimtable.qvd (qvd);


Has anyone of you experienced this issue, is it just a wad (work as design), any other input ??

1 Solution

Accepted Solutions
Not applicable

Hi,

Yes, I've seen it and I've had it confirmed as a "work as design" in the past. A few things to note wrt DISTINCT keyword:

1. When you first create a table, if you use the DISTINCT keyword after your load statement (e.g. LOAD DISTINCT .... FROM ...) then the table will always be distinct, even if you try to explicitly add duplicates to this table post this point.

2. As you've noticed, when performing LEFT JOIN LOAD DISTINCT operations, again the table targeted by the JOIN will remain distinct from this point onwards. The same applies for CONCATENATE LOAD DISTINCT.

If you need to retain duplicates, avoid LOAD DISTINCT when first defining a table and also avoid LEFT JOIN LOAD DISTINCT and CONCATENATE LOAD DISTINCT statements. You can work around this by performing your loads in stages. E.g.

...

tmp:
LOAD DISTINCT * FROM mySource.qvd (qvd);

LEFT JOIN (targetTable)
LOAD * RESIDENT tmp;

DROP TABLE tmp;

Alternatively, if you do not want to incur the extra processing time you could re-design your loads with GROUP BY statements rather than DISTINCT keywords to de-dupe your data.

View solution in original post

5 Replies
Not applicable

Hi,

Yes, I've seen it and I've had it confirmed as a "work as design" in the past. A few things to note wrt DISTINCT keyword:

1. When you first create a table, if you use the DISTINCT keyword after your load statement (e.g. LOAD DISTINCT .... FROM ...) then the table will always be distinct, even if you try to explicitly add duplicates to this table post this point.

2. As you've noticed, when performing LEFT JOIN LOAD DISTINCT operations, again the table targeted by the JOIN will remain distinct from this point onwards. The same applies for CONCATENATE LOAD DISTINCT.

If you need to retain duplicates, avoid LOAD DISTINCT when first defining a table and also avoid LEFT JOIN LOAD DISTINCT and CONCATENATE LOAD DISTINCT statements. You can work around this by performing your loads in stages. E.g.

...

tmp:
LOAD DISTINCT * FROM mySource.qvd (qvd);

LEFT JOIN (targetTable)
LOAD * RESIDENT tmp;

DROP TABLE tmp;

Alternatively, if you do not want to incur the extra processing time you could re-design your loads with GROUP BY statements rather than DISTINCT keywords to de-dupe your data.

blaise
Partner - Specialist
Partner - Specialist
Author

I've already worked-around it with a tmp table creation. I think this will be the fastest way (faster than group by's, autonumberhash128 and rowno() functions etc.)

Not applicable

Got the same isue this day and it took me hours of discover the reason and eaven beleave this is possible. I was nearly shocked on this behaviour of QlikView. Finaly good to know and perhaps in some cases good to use DISTINCT in QV works this way.

Happened to me on a very, very big and important application on a little change this afternoon. You can imagine this one took years of my life after safing the file and realize my change was not so little I expected :-).

Greetings.

Not applicable

Even more 'strange' and point to be cautious for:

Adding a rownum in the join makes every row in the final table unique. This way there can be no duplicates in the final table, and the distinct can be maintained.

However, the distinct statement is still 'active' later on in the script. So once you decide to drop the rownum field later on in the script, a loss of rows can occur!

stevedark
Partner Ambassador/MVP
Partner Ambassador/MVP

Just to add to the conversation, there is an excellent article on this 'feature' by BarryHarmsen

http://www.qlikfix.com/2013/07/30/distinct-can-be-deceiving/

Steve