Skip to main content
Announcements
NEW: Seamless Public Data Sharing with Qlik's New Anonymous Access Capability: TELL ME MORE!
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Most recent Data - Duplicates

Hi,

I have loaded some data (around 800.000 x 4) in my application Qlik but I have some duplicates for each person (example : Address_1 with validity from 01/01/14 - 31/12/14, Address_2 with validity from 01/01/15 - 31/12/99). In other to have unique data, I would like to select only the most recent data for each ID person. How I can do it ?

Best.

Brahim.

1 Solution

Accepted Solutions
fvelascog72
Partner - Specialist
Partner - Specialist

Hi,

You don´t have to group by all the fields.

Try this:

Adresses:

LOAD

[Business Partner Person],
[Adresses.Address Type],
Adresses.Source,
Adresses.VF,

Adresses.VT,
[Adresses.Conflict Code],
[Adresses.Location Code],
[Adresses.Number From],
[Adresses.Number To],
[Adresses.PO Box],
Adresses.Validity_01_01_2015
FROM Adresses.qvd (qvd);


Right Join(Adresses)

LOAD

[Business Partner Person],

Max(Adresses.VT) as MaxDate_VT_Adresses

Resident Adresses

Group by [Business Partner Person];

View solution in original post

6 Replies
fvelascog72
Partner - Specialist
Partner - Specialist

Hi,

Try this:

Data:

LOAD

*

Inline [

Person_id, Adress, Validity_Start, Validity_End

1, Adress_1, 01/01/2014, 31/12/2014

1, Adress_2, 01/01/2015, 31/12/9999

];

Right Join(Data)

LOAD

  Person_id,

  LastValue(Validity_Start) as "Validity_Start"

Resident Data

Group by Person_id

Order by Validity_Start asc;

Not applicable
Author

Hi Brahim,

You can always use aggregation, in this case MAX(StartDate) for the address.

I will give you an example using Excel in a moment.

Hope helps,

David Sugito

Mobile: + 62 878 0888 9871
Phone: + 62 21 569 823 85 / 86
Email: me@davidshuang.com
Site: davidshuang.com

Not applicable
Author

Whoops Federico,

Great, now I know that QlikView has it!

Never used LastValue() before!

Thanks for sharing!

And btw, I've attached your code too.


David Sugito

Mobile: + 62 878 0888 9871
Phone: + 62 21 569 823 85 / 86
Email: me@davidshuang.com
Site: davidshuang.com

Not applicable
Author

Hello Frederico,

Thanks a lot for your answer...but what if I want to keep all data fields (as in the code) and keep only the max of the field date for each person id ?

I have tried a group by but this only work when I have this case:

Work Fine

Adresses:

LOAD [Business Partner Person],

Max(Adresses.VT) as MaxDate_VT_Adresses,

FROM
Adresses.qvd
(
qvd)

Group By [Business Partner Person]

Doesn't work

Adresses:

LOAD[Business Partner Person],
[Adresses.Address Type],
Adresses.Source,
Adresses.VF,

Max(Adresses.VT) as MaxDate_VT_Adresses,
[Adresses.Conflict Code],
[Adresses.Location Code],
[Adresses.Number From],
[Adresses.Number To],
[Adresses.PO Box],
Adresses.Validity_01_01_2015
FROM
Adresses.qvd
(
qvd)

GroupBy[Business Partner Person],[Adresses.Address Type],
Adresses.Source,
Adresses.VF,
[Adresses.Conflict Code],
[Adresses.Location Code],
[Adresses.Number From],
[Adresses.Number To],
[Adresses.PO Box],
Adresses.Validity_01_01_2015
;

Best Regards

Brahim.

fvelascog72
Partner - Specialist
Partner - Specialist

Hi,

You don´t have to group by all the fields.

Try this:

Adresses:

LOAD

[Business Partner Person],
[Adresses.Address Type],
Adresses.Source,
Adresses.VF,

Adresses.VT,
[Adresses.Conflict Code],
[Adresses.Location Code],
[Adresses.Number From],
[Adresses.Number To],
[Adresses.PO Box],
Adresses.Validity_01_01_2015
FROM Adresses.qvd (qvd);


Right Join(Adresses)

LOAD

[Business Partner Person],

Max(Adresses.VT) as MaxDate_VT_Adresses

Resident Adresses

Group by [Business Partner Person];

Not applicable
Author

Frederico,

Excellent use of the Join....it works...Thanks a lot and have a nice day.

Brahim.