Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Qlik Open Lakehouse is Now Generally Available! Discover the key highlights and partner resources here.
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Deduping data where date periods overlap

Hi there

I have searched the community for a solution to my problem, but have not been able to find a solution, so hopefully somebody out there can help.

I have a table tracking what promotions are sent to which members. One member can be sent multiple promtions and one member can therefore be part of more than one promotion at the same time. I am trying to figure out a way in which I can associate the member with one promotion only at any given point in time.

the fields and test data I have at my disposal are:

Member     PromotionID     StartDate               EndDate

1                    1                    2011/01/01               2011/01/15

1                    2                    2011/01/10               2011/01/30

1                    3                    2011/01/20               2011/01/25

1                    4                    2011/01/29               2011/02/25

1                    5                    2011/02/10               2011/02/15

1                    6                    2011/02/10               2011/02/10

2                    2                    2011/01/10               2011/01/31

2                    4                    2011/01/29               2011/02/25

    

A couple of rules that need to be kept in mind:

If 2 promotions have the same start date, we use the one with the highest PromotionID

The ordering of these promotions need to be done per member, and the combination of promotions are unique per member

After the data manipulation, the results need to read as follows:

Member     PromotionID     StartDate               EndDate

1                    1                   2011/01/01               2011/01/09

1                    2                   2011/01/10               2011/01/19

1                    3                   2011/01/20               2011/01/25

1                    2                   2011/01/26               2011/01/28

1                    4                   2011/01/29               2011/02/09

1                    6                   2011/02/10               2011/02/10

1                    5                   2011/02/11               2011/02/15

2                    2                   2011/01/10               2011/01/28

2                    4                   2011/01/29               2011/02/25

The result of the data manipulation is therefor that a member is associated with 1 promotion only at any given point in time, and that it is possibile for the member to be associatiod with the same promotion for multiple time periods if there are time overlaps between promotions

              

I hope I have explained myself well enough

thanking you in advance

Paul

1 Solution

Accepted Solutions
rbecher
MVP
MVP

Maybe to much joins.

I've made a new example (see attached) file. Now with the correct result, I think... 😉

This is the way I would implement it. Only problem is that all data are 3 times in memory in some place. But this could be probably fixed..

- Ralf

Astrato.io Head of R&D

View solution in original post

21 Replies
IAMDV
Master II
Master II

Hi Paul,

I am trying to understand the requirement. Please bear with me if I am not getting it right. Only difference I can see between the two datasets is the following...

Dataset1

1                    5                    2011/02/10               2011/02/15

Dataset2

1                    5                   2011/02/11               2011/02/15

The difference is the date jumped to the next date. When two dates are same in the first dataset... Is this right? Is this what you need or do I need to understand anything else? I'll wait to hear from you.

Thanks - DV

rbecher
MVP
MVP

Hi Paul,

hope this is what you want:

raw_data:

LOAD * INLINE [

    Member,     PromotionID,     StartDate,               EndDate,

    1,                    1,                    2011/01/01,               2011/01/15

    1,                    2,                    2011/01/10,               2011/01/30

    1,                    3,                    2011/01/20,               2011/01/25

    1,                    4,                    2011/01/29,               2011/02/25

    1,                    5,                    2011/02/10,               2011/02/15

    1,                    6,                    2011/02/10,               2011/02/10

    2,                    2,                    2011/01/10,               2011/01/31

    2,                    4,                    2011/01/29,               2011/02/25

];

promotions:

NoConcatenate LOAD Member

, PromotionID

, StartDate

, EndDate

Resident raw_data

Where previous(StartDate)<>StartDate

Order By Member, PromotionID desc, StartDate;

Drop Table raw_data;

- Ralf

Astrato.io Head of R&D
IAMDV
Master II
Master II

Perfect Ralf! I like this

Cheers - DV

rbecher
MVP
MVP

..it's just data gym 😉

Cheers!

- Ralf

Astrato.io Head of R&D
Not applicable
Author

Hi guys

Maybe I did not explain myself well enough in my original request so I thought I would explain it visually using a timeline

INPUT:
MemberPromotion2011/01/012011/01/102011/01/152011/01/202011/01/252011/01/292011/01/302011/01/312011/02/102011/02/152011/02/25
11Prom1Prom1Prom1
12 Prom2Prom2Prom2Prom2Prom2Prom2
13 Prom3Prom3
14 Prom4Prom4Prom4Prom4Prom4Prom4
15 Prom5Prom5
16 Prom6
22 Prom2Prom2Prom2Prom2Prom2Prom2
24 Prom4Prom4Prom4Prom4Prom4Prom4
REQUIRED OUTPUT:
Member2011/01/012011/01/102011/01/152011/01/202011/01/252011/01/292011/01/302011/01/312011/02/102011/02/152011/02/25
1Prom1Prom2Prom2Prom3Prom3Prom4Prom4Prom4Prom6Prom5Prom4
2 Prom2Prom2Prom2Prom2Prom4Prom4Prom4Prom4Prom4Prom4

So as you can see, we need to consider all promotions per member at any given point in time and associate the correct promotion with each member.It is also possible that a very old promotion can once again be referred to if no other more recent promotion is valid

Please shout if there are any other uncertainties

Many thanks!

Paul

Not applicable
Author

Hi guys

Maybe I did not explain myself well enough in my original request so I thought I would explain it visually using a timeline

QV promotions explained visually.jpg

So as you can see, we need to consider all promotions per member at any given point in time and associate the correct promotion with each member.It is also possible that a very old promotion can once again be referred to if no other more recent promotion is valid

Please shout if there are any other uncertainties

Many thanks!

Paul

Not applicable
Author

Hey there Ralph

Not sure if you saw my reply today, but your proposed solution did not really do the trick. I reckon I have a solution in SQL, but that would require me to create a night run, to prep the data so all that QV needs to do is read the manipulated reslult set. I would have preferred to do this using Qlikview as I have never been in a situation where I could not do someting in QV. Do you have any final thoughts before I go with my night run?

thanks

Paul

prieper
Master II
Master II

What is the rule to show "Prom6" for Member #1 on the 2011/02/10? Is it the freshest one? Or the highest Number in promotion?

Peter

Not applicable
Author

Hi there

It will be the highest numbered promotion

thanks

Paul