Skip to main content
Announcements
Introducing a new Enhanced File Management feature in Qlik Cloud! GET THE DETAILS!
cancel
Showing results for 
Search instead for 
Did you mean: 
marko_rakar
Creator
Creator

Finding intervals in data stream

I am trying to create gantt chart based on great article by Henric Cronström (see: http://community.qlik.com/docs/DOC-3113).

Now, my data stream is different from the data Henric used and I just have data stream of events (basically I am trying to analyse marketing events).

For example I have data stream like this:

  1. 31-03-2013, TV ad
  2. 01-04-2013, TV ad
  3. 02-04-2013, TV ad
  4. 03-04-2013, TV ad
  5. 01-07-2013, TV ad
  6. 02-07-2013, TV ad
  7. 03-07-2013, TV ad
  8. 04-07-2013, TV ad

Now, if you look at this, if I just search for min and max dates I will get (incorrect) conclusion that my campaign started on 31-3-2013 and ended on 04-07-2013.

But, in truth, I had two campaigns one which lasted from 31-03 till 03-04 and then gap all the way to 01-07 when my second campaign started and ended on 04-07-2013.

Is there a (smart) way to try to identify periods, and period can be defined as advertising which is happening daily and gap between one campaign and the next is more then x days?

Any ideas?

1 Solution

Accepted Solutions
hic
Former Employee
Former Employee

So, a campaign has daily events? Then you could use

Load

     Date,

     If(Date=Peek(Date)+1, Peek(CampaignID), RangeSum(Peek(CampaignID),1) ) as CampaignID,

     ...

This will bump the campaignID by one of there is a gap. But you need to have your data sorted after Date for this to work.

HIC

View solution in original post

3 Replies
hic
Former Employee
Former Employee

So, a campaign has daily events? Then you could use

Load

     Date,

     If(Date=Peek(Date)+1, Peek(CampaignID), RangeSum(Peek(CampaignID),1) ) as CampaignID,

     ...

This will bump the campaignID by one of there is a gap. But you need to have your data sorted after Date for this to work.

HIC

marko_rakar
Creator
Creator
Author

huh, good question

it is little more complex then that; I have more then 400.000 events in the database and I have to sort it out not only by date, but also by customer, distribution channel and product as well

also, it might happen that you have campaign just weekdays and not over the weekend, so dates might not be consequtive and my idea was to try to find gap between campaigns by playing with number of days between two ads

marko_rakar
Creator
Creator
Author

Great thinking, this was not entirely what I wanted but I ended up with following solution which proved to be good enough:

New:

LOAD *,

If(Dates=Peek(Dates)+1 OR Dates=Peek(Dates) OR Dates=Peek(Dates)+2 OR Dates=Peek(Dates) + 3, Peek(CampaignID), ProductName & ' ' & Dates ) as CampaignID

Resident Old Order by AdvertiserName, ProductName, Dates asc;

Drop table Old;

Now, the thing is that we need to somehow recognize continuous campaign and I do this by my IF statement where I look for all ads which were aired either today, yesterday or up to 3 days before (there are some troubles with this approach but they can also be resolved).

Then, if codintions are met, then I take vale from Campaign ID from the previous row and copy it to current row.

If we do not meet criteria, then it means that this is first ad in our new campaign and we create unique ID by using product name which is advertised and first date of ad airing (also not fool proof but good enough).

(this is fun)

p.s. note that special problem is sorting of my data, and for this I load my data into memory and reload the table as resident in proper order so I have aligned my rows in sequence which best follows my intentions (loading from file does not allow sorting so I had to reload from already resident table)