Qlik Community

Qlik Sense App Development

Discussion board where members can learn more about Qlik Sense App Development and Usage.

victor_greffet
New Contributor III

Remove duplicates

Hi community,

I think I have a consequent challenge...

I work on big database extracted on Excel file.

I discovered the LOAD DISTINCT --> remove my duplications. But, this time it's different.

After removing my duplications with LOAD distinct, I thought I wouldn't find duplication in my field Ref (The field i'd like to be my unique key). But surprise after looking at one reference, I still have duplications because the 'line' looks like this :

RefField 1Field 2Field 3Field 4Field 5......Field n
Reference 1datadata
Reference 1datadata
Reference 1

data

datadatadatadata
Reference 1data
Reference 1datadata

And I'd like to keep the less empty line (here the yellow one).

Is there an awesome manipulation for that ?

Thank you

Tags (1)
1 Solution

Accepted Solutions
victor_greffet
New Contributor III

Re: Remove duplicates

Hi !

for people who follow the stream,

i found a solution for the case above (It works for date type only):

What I did :

DATA:

LOAD Distinct

    Reference,

    Concat(date1) as dateone,

    Concat(date2 as datetwo

FROM [lib://test/Microsoft Excel Worksheet.xlsx]

(ooxml, embedded labels, table is Feuil1)

Group by [Reference];

left join DATA:

Load

[Reference],

if(len(trim(dateone))>10,date(right(dateone,10)),date(dateone)) as date1,

if(len(trim(datetwo))>10,date(right(datetwo,10)),date(datetwo)) as date2

Resident DATA;

Drop field dateone, datetwo from DATA;

store DATA into [lib://test/test.csv] (txt, delimiter is ';');

3 Replies

Re: Remove duplicates

Nope. Only unawesome ones. Like for each record checking each field if it contains a value and keep count of the number of fields in the record that have a value. Then keep only the records with the largest counts per Ref value. And then keep only the first records per Ref with the largest counts since there could still be multiple records with the same count.

victor_greffet
New Contributor III

Re: Remove duplicates

Thank you for your reply,

So the unawesome ones are the only solution...

after reviewing my database, i can simplify my problem :

Refdate 1date 2
Reference 1date
Reference 1date
Reference 1date
Reference 1datedate
Reference 1date
Reference 1date

I'd like to keep the line in Yellow and remove other ones, but if there isn't data in the date 2 field keep the first record. All that for all my References... How would you do ?

Thanks.

victor_greffet
New Contributor III

Re: Remove duplicates

Hi !

for people who follow the stream,

i found a solution for the case above (It works for date type only):

What I did :

DATA:

LOAD Distinct

    Reference,

    Concat(date1) as dateone,

    Concat(date2 as datetwo

FROM [lib://test/Microsoft Excel Worksheet.xlsx]

(ooxml, embedded labels, table is Feuil1)

Group by [Reference];

left join DATA:

Load

[Reference],

if(len(trim(dateone))>10,date(right(dateone,10)),date(dateone)) as date1,

if(len(trim(datetwo))>10,date(right(datetwo,10)),date(datetwo)) as date2

Resident DATA;

Drop field dateone, datetwo from DATA;

store DATA into [lib://test/test.csv] (txt, delimiter is ';');