Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Qlik Open Lakehouse is Now Generally Available! Discover the key highlights and partner resources here.
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Deduping data where date periods overlap

Hi there

I have searched the community for a solution to my problem, but have not been able to find a solution, so hopefully somebody out there can help.

I have a table tracking what promotions are sent to which members. One member can be sent multiple promtions and one member can therefore be part of more than one promotion at the same time. I am trying to figure out a way in which I can associate the member with one promotion only at any given point in time.

the fields and test data I have at my disposal are:

Member     PromotionID     StartDate               EndDate

1                    1                    2011/01/01               2011/01/15

1                    2                    2011/01/10               2011/01/30

1                    3                    2011/01/20               2011/01/25

1                    4                    2011/01/29               2011/02/25

1                    5                    2011/02/10               2011/02/15

1                    6                    2011/02/10               2011/02/10

2                    2                    2011/01/10               2011/01/31

2                    4                    2011/01/29               2011/02/25

    

A couple of rules that need to be kept in mind:

If 2 promotions have the same start date, we use the one with the highest PromotionID

The ordering of these promotions need to be done per member, and the combination of promotions are unique per member

After the data manipulation, the results need to read as follows:

Member     PromotionID     StartDate               EndDate

1                    1                   2011/01/01               2011/01/09

1                    2                   2011/01/10               2011/01/19

1                    3                   2011/01/20               2011/01/25

1                    2                   2011/01/26               2011/01/28

1                    4                   2011/01/29               2011/02/09

1                    6                   2011/02/10               2011/02/10

1                    5                   2011/02/11               2011/02/15

2                    2                   2011/01/10               2011/01/28

2                    4                   2011/01/29               2011/02/25

The result of the data manipulation is therefor that a member is associated with 1 promotion only at any given point in time, and that it is possibile for the member to be associatiod with the same promotion for multiple time periods if there are time overlaps between promotions

              

I hope I have explained myself well enough

thanking you in advance

Paul

21 Replies
rbecher
MVP
MVP

Hi Paul,

this was my fault of understanding on a late post 😉

Hope this helps:

promotions:

NoConcatenate LOAD Member

, PromotionID

, StartDate

, date(nummin(EndDate, previous(StartDate) -1)) as EndDate

Resident raw_data

Where previous(StartDate)<>StartDate

Order By Member, PromotionID desc, StartDate desc;

- Ralf

Astrato.io Head of R&D
Not applicable
Author

Hi there Ralph

I am going to have to try this at work tomorrow, but if I am understanding the code correctly it will not provide for screnario where Prom 4 is associated with member 1 the 2nd time around on the 2011/02/25, or am I missing something here?

I'll get back once I have tried this all the same

thanks

Paul

rbecher
MVP
MVP

You're right! We have to consider the Member change also:

promotions:

NoConcatenate LOAD Member

, PromotionID

, StartDate

, date(nummin(EndDate, if(previous(Member) = Member, previous(StartDate) -1))) as EndDate

Resident raw_data

Where previous(StartDate)<>StartDate

Order By Member, PromotionID desc, StartDate desc;

But, look how easy this is with QV script!

- Ralf

Astrato.io Head of R&D
Not applicable
Author

Hey there Ralph

It seems like we are making progress, but that we are still not entirely there:

I have attached an image below to explain what data we still require in order for this to work correctly. After promotion 6 is no longer valid, we need to see if there are any other earlier promotions, that are still valid. In Member 1's case, there are 2 promotions that are still active: Promotion 4 and 5. We therefore now need to associated the next most recently added promotion to the member, which is Promotion 5. Once Promotion 5 expires, we once again need to see if there are any other promotions that are still active, and in the case of member 1 we can re-assign promotion 4 to the member (yes for a 2nd time). The result at the end of the day will look as follows: QV promotions explained visually - img 2.jpg

So as you can see, we are required to account for promotions that were added earlier in the member's lifetime (possibly for a second time), if they are still have a active promotion date after the most current promotion has expired.

Thanks for your continued interest in getting this problem cracked

Paul

rbecher
MVP
MVP

Hi Paul,

you have changed the initial data set. At the begining there was Promotion 5 starting at 2011/02/10 (as same as Prom. 6). Therefor Prom. 6 was overloaded.

Now you want some thing else. Why Prom. 5 starts again on 2011/02/15? In the first data set it ends on 2011/02/15...

- Ralf

Astrato.io Head of R&D
rbecher
MVP
MVP

After a while of thinking about the problem I recognize that this is very complex if you have more than two overlapping Promotions. This could lead into multiple segments of lower Promotions in different combinations. I wonder how you have made this with SQL. Maybe it is not complete either..

Astrato.io Head of R&D
Not applicable
Author

Hi there Ralph

See below for SQL code:

One thing to note - In the script below, I break the date range associated with a member and promotion into individual days. After the entire job has run, associating the days with members and promotions, I plan on amalgamating sucessive days for the same member and promotion, to get the result I am looking for. I have tried doing this in QV using an interval match but the volumes of data doesn't allow for more than 9 months worth of transactions, which is not good enough for the business

Heres the code anyway:

thanks

Paul

create table #t1

(

ID        int identity,

PromID    int,

StartDate datetime,

EndDate   datetime,

IdNumber  varchar(10)

)

insert #t1

     (

      PromID,

      StartDate,

      EndDate,

      IdNumber

      )

select a.PromID,

       a.StartDate,

       a.EndDate,

       b.IdNumber

  from PROMOTIONS2 a,

       PROM2_MEMBER b

where a.PromID        = b.PromID

   and a.StartDate    >= '1 Jan 2010 00:00'

   and ContactTypeCD  != 'OUTBPHONE'

   and b.IdNumber     in ('0866-04004','1332-02168')

order by b.IdNumber, a.StartDate, a.PromID

select distinct IdNumber, 'N' as DoneYN

  into #DistinctIdNumber

  from PROMOTIONS2 a,

       PROM2_MEMBER b

where a.PromID        = b.PromID

   and ContactTypeCD  != 'OUTBPHONE'

   and a.StartDate    >= '1 Jan 2010 00:00'

   and b.IdNumber  in ('0866-04004','1332-02168')

create clustered index id on #DistinctIdNumber(IdNumber)

create table #output

(

PromID   int,

Day      date,

IdNumber varchar(10)

)

declare @start    date,

        @end      date,

        @promID   int,

        @idNumber varchar(10)

select @idNumber = IdNumber

  from #DistinctIdNumber

where DoneYN = 'N'

select @idNumber

while @@ROWCOUNT > 0 AND @idNumber != '' begin

        select @start    = '1 Jan 2010',

               @end      = '31 dec 2011'

        while @start <= @end begin

           select @promID = null

           select @promID   = max(PromID)

             from #t1 a

            where @start between a.StartDate and a.EndDate

              and IdNumber     = @idNumber

            --order by a.StartDate asc

           if @promID != null begin

              insert #output

              select @promID,

                     @start,

                     @idNumber

           end

           select @start = dateadd( dd, 1, @start)

        end -- while

    update a

       set DoneYN = 'Y'

      from #DistinctIdNumber a

     where IdNumber = @idNumber

    select @idNumber = IdNumber

      from #DistinctIdNumber

     where DoneYN = 'N'

end--WHILE

Not applicable
Author

Hi there Ralph

Not sure if you still need to answer your question: "Why Prom. 5 starts again on 2011/02/15? In the first data set it ends on 2011/02/15...", but my answer would be that after Prom 6 expires on the 14th, Prom 5 is the most recently added active promotion and for that reason I need to use that one.

Not sure if you saw the sql I posted, but I am assuming that we have hit a dead end and that I should proceed with the night run approach?

Many thanks for your interest and help

Kind regards

Paul

rbecher
MVP
MVP

Hi Paul,

I do understand your problem but have no quick QV solution yet. Maybe: join out all days, then loop over grouping by date, getting max Promotion per day. Then aggregate StartDate and EndDate by Promotion to build the splitted segments. But, this would be some more lines of code.. 😉 Also, could be an performance issue on a high row count.

Btw., I'm not sure if your SQL code covers all possible overlapping situations, especially if high overlapping and segmentation occurs...

- Ralf

Astrato.io Head of R&D
Not applicable
Author

Hi Ralph

We have written some QV code on this side, similar to your description above, but the performance is just not acceptable, resulting in the model only handling information for 9 months or less. Not good enough....

//Determine all promotions

"TPromotions":

select a.IdNumber,

convert(Date,StartDate) as PromoFromDate,

convert(Date,EndDate) as PromoToDate,

a.PromID

from RCI_MAIN.dbo.PROM2_MEMBER        a,

RCI_MAIN.dbo.PROMOTIONS2         b,

RCI_MAIN.dbo.PROM2_CONTACT_TYPES c  

where b.StartDate             is not null

and b.EndDate               is not null

and b.ContactTypeCD             != 'OUTBPHONE'

and a.PromID                     = b.PromID

and b.ContactTypeCD              = c.ContactTypeCD

and ISNULL(a.IdNumber,SPACE(0)) != SPACE(0)

and ControlGroupYN               = 'N'

and StartDate                   >= dateadd(MM,-11,getdate())

order by 1, 2 desc, 4 desc;

"DistinctMemberBase":

load distinct IdNumber as IdNumber

resident TPromotions;

// The below would of course have to be a date could create a year of dates..

TTimePeriods:

select CONVERT(date,CalendarDate) as TimePeriod

from RCI_DW.dbo.DimCalendar

where CalendarID between convert(varchar(10),dateadd(MM,-11,getdate()),112) and convert(varchar(10),getdate(),112);

join

load IdNumber

resident DistinctMemberBase;

join (TTimePeriods)

intervalmatch (TimePeriod, IdNumber)

load PromoFromDate,

PromoToDate,

IdNumber

resident TPromotions;

drop table DistinctMemberBase;

// Now we create 1 table

join (TPromotions)

load *

resident TTimePeriods;

// Now take the Max Promo

"PreRS":

load max(PromID) as MaxPromID,

IdNumber as FinalIdNumber,

TimePeriod as TP

resident TPromotions

Group by IdNumber,TimePeriod;

drop table TTimePeriods;

drop table TPromotions;