Solved: Re: Remove duplicates - Qlik Community

victor_greffet · ‎2017-11-15

Hi community,

I think I have a consequent challenge...

I work on big database extracted on Excel file.

I discovered the LOAD DISTINCT --> remove my duplications. But, this time it's different.

After removing my duplications with LOAD distinct, I thought I wouldn't find duplication in my field Ref (The field i'd like to be my unique key). But surprise after looking at one reference, I still have duplications because the 'line' looks like this :

Ref	Field 1	Field 2	Field 3	Field 4	Field 5	Field n
Reference 1	data			data
Reference 1	data				data
Reference 1	data	data	data		data	data
Reference 1	data
Reference 1	data				data

And I'd like to keep the less empty line (here the yellow one).

Is there an awesome manipulation for that ?

Thank you

victor_greffet · ‎2017-11-20

Hi !

for people who follow the stream,

i found a solution for the case above (It works for date type only):

What I did :

DATA:

LOAD Distinct

Reference,

Concat(date1) as dateone,

Concat(date2 as datetwo

FROM [lib://test/Microsoft Excel Worksheet.xlsx]

(ooxml, embedded labels, table is Feuil1)

Group by [Reference];

left join DATA:

Load

[Reference],

if(len(trim(dateone))>10,date(right(dateone,10)),date(dateone)) as date1,

if(len(trim(datetwo))>10,date(right(datetwo,10)),date(datetwo)) as date2

Resident DATA;

Drop field dateone, datetwo from DATA;

store DATA into [lib://test/test.csv] (txt, delimiter is ';');

View solution in original post

Gysbert_Wassenaar · ‎2017-11-15

Nope. Only unawesome ones. Like for each record checking each field if it contains a value and keep count of the number of fields in the record that have a value. Then keep only the records with the largest counts per Ref value. And then keep only the first records per Ref with the largest counts since there could still be multiple records with the same count.

talk is cheap, supply exceeds demand

victor_greffet · ‎2017-11-16

Thank you for your reply,

So the unawesome ones are the only solution...

after reviewing my database, i can simplify my problem :

Ref	date 1	date 2
Reference 1	date
Reference 1	date
Reference 1	date
Reference 1	date	date
Reference 1	date
Reference 1	date

I'd like to keep the line in Yellow and remove other ones, but if there isn't data in the date 2 field keep the first record. All that for all my References... How would you do ?

Thanks.

victor_greffet · ‎2017-11-20