nicolai_moller
Contributor

Delete duplicate row

Hi

I'm trying to delete duplicate rows (activities) where the fields Start, End and Status are all empty. If an activity exists more than once (like Activity A), its empty rows should be deleted (like Activity_ID 001), but only if all three fields are empty.

Activity | Start      | End        | Status    | Activity_ID
A        |            |            |           | 001
A        | 01-01-2014 | 05-05-2014 | Completed | 002
B        |            |            |           | 003
C        |            |            |           | 004
D        | 05-05-2015 |            | Waiting   | 005

Thanks

1 Solution

Accepted Solutions
jonathandienst
Partner - Champion III

Hi

This works, but it assumes that each activity has at most one populated row; there may be one or more unpopulated rows:

Data:
LOAD Activity,
  Start,
  End,
  Status,
  Activity_ID,
  // Null when all three fields are empty, otherwise the row's own Activity_ID
  If(Len(Start) = 0 And Len(End) = 0 And Len(Status) = 0, Null(), Activity_ID) As TActivity_ID
Inline
[
  Activity,Start,End,Status,Activity_ID
  A,,,,001
  A,01-01-2014,05-05-2014,Completed,002
  B,,,,003
  C,,,,004
  D,05-05-2015,,Waiting,005
];

Results:
NoConcatenate
LOAD Activity,
  Max(Start) As Start,
  Max(End) As End,
  MaxString(Status) As Status,
  // Prefer the ID of the populated row; fall back to the ID of an empty row
  Num(Alt(Max(TActivity_ID), Max(Activity_ID)), '000') As Activity_ID
Resident Data
Group By Activity;

DROP Table Data;

The TActivity_ID field stores the Activity_ID of the populated row. Alt() selects this value if it exists and otherwise falls back to Activity_ID.

See attached

Jonathan

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein
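
A possible alternative, not part of the accepted answer: if an activity can have more than one populated row, grouping would merge those rows into one. The sketch below instead keeps every populated row and adds an empty row only for activities that have no populated row at all. The field name KeptActivity is a hypothetical helper introduced here for the Exists() check; the sample data is the same as in the question.

```qlik
Data:
LOAD * Inline [
Activity,Start,End,Status,Activity_ID
A,,,,001
A,01-01-2014,05-05-2014,Completed,002
B,,,,003
C,,,,004
D,05-05-2015,,Waiting,005
];

// Pass 1: keep every row with at least one of the three fields populated.
// KeptActivity (hypothetical helper field) records which activities are already kept.
Results:
NoConcatenate
LOAD Activity, Start, End, Status, Activity_ID,
     Activity As KeptActivity
Resident Data
Where Len(Trim(Start)) > 0 Or Len(Trim(End)) > 0 Or Len(Trim(Status)) > 0;

// Pass 2: add a row only for activities not kept so far.
// Exists() also sees values loaded earlier in this same LOAD,
// so at most one empty row per activity is added.
Concatenate (Results)
LOAD Activity, Start, End, Status, Activity_ID,
     Activity As KeptActivity
Resident Data
Where Not Exists(KeptActivity, Activity);

DROP Table Data;
DROP Field KeptActivity;
```

With the sample data this should leave Activity_ID 002, 003, 004 and 005, and drop only the empty duplicate 001.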


7 Replies
anbu1984
Master III

Load * From Table
Where Len(Trim(Start)) = 0 And Len(Trim(End)) = 0 And Len(Trim(Status)) = 0;

Not applicable

Hi

Have a look at this one.

Thanks

nicolai_moller
Contributor
Author

Your suggestion deletes all empty rows. Only rows that are duplicates AND empty should be deleted.

Activity A exists twice and one row (Activity_ID 001) is completely empty, so that row should be deleted.

anbu1984
Master III

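Temp:
Load * From Table;

// Attach the number of rows per activity to every row
Join(Temp)
Load Activity, Count(Activity) As Cnt Resident Temp Group By Activity;

// This condition selects the empty rows of activities that occur more than once
Final:
NoConcatenate
Load * Resident Temp
Where Len(Trim(Start)) = 0 And Len(Trim(End)) = 0 And Len(Trim(Status)) = 0 And Cnt > 1;

Drop Table Temp;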



Anonymous
Not applicable

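Modify the Where statement so that it keeps activities that occur only once, plus any row with at least one populated field:

Where Cnt = 1 Or Len(Trim(Start)) > 0 Or Len(Trim(End)) > 0 Or Len(Trim(Status)) > 0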
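
Putting the count-based replies together, a minimal end-to-end sketch might look like the following. It assumes the sample data from the question, loaded inline instead of from a source table, and writes the Where condition so that partially populated rows (like Activity D) are kept. The table names Temp and Final and the field Cnt follow the reply above.

```qlik
Temp:
LOAD * Inline [
Activity,Start,End,Status,Activity_ID
A,,,,001
A,01-01-2014,05-05-2014,Completed,002
B,,,,003
C,,,,004
D,05-05-2015,,Waiting,005
];

// Count the rows per activity and attach the count to every row of Temp
Left Join (Temp)
LOAD Activity,
     Count(Activity) As Cnt
Resident Temp
Group By Activity;

// Keep a row if its activity is unique, or if any of the three fields is populated
Final:
NoConcatenate
LOAD Activity, Start, End, Status, Activity_ID
Resident Temp
Where Cnt = 1
   Or Len(Trim(Start)) > 0
   Or Len(Trim(End)) > 0
   Or Len(Trim(Status)) > 0;

DROP Table Temp;
```

One caveat with this approach: if an activity occurs several times and every one of its rows is empty, all of them are removed, so the activity disappears from the result.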

Not applicable

Hi,

Please find attached.

By using a straight table, you can achieve your requirement.

Warm Regards,

Joshmi