3 Replies Latest reply: May 6, 2014 1:03 PM by Marko Rakar RSS

    Finding intervals in data stream

    Marko Rakar

      I am trying to create gantt chart based on great article by hic (see: http://community.qlik.com/docs/DOC-3113).

       

      Now, my data stream is different from the data Henric used and I just have data stream of events (basically I am trying to analyse marketing events).

       

      For example I have data stream like this:

      1. 31-03-2013, TV ad
      2. 01-04-2013, TV ad
      3. 02-04-2013, TV ad
      4. 03-04-2013, TV ad
      5. 01-07-2013, TV ad
      6. 02-07-2013, TV ad
      7. 03-07-2013, TV ad
      8. 04-07-2013, TV ad

       

      Now, if you look at this, if I just search for min and max dates I will get (incorrect) conclusion that my campaign started on 31-3-2013 and ended on 04-07-2013.

      But, in truth, I had two campaigns one which lasted from 31-03 till 03-04 and then gap all the way to 01-07 when my second campaign started and ended on 04-07-2013.

       

      Is there a (smart) way to try to identify periods, and period can be defined as advertising which is happening daily and gap between one campaign and the next is more then x days?

       

      Any ideas?

        • Re: Finding intervals in data stream
          Henric Cronström

          So, a campaign has daily events? Then you could use

           

          Load

               Date,

               If(Date=Peek(Date)+1, Peek(CampaignID), RangeSum(Peek(CampaignID),1) ) as CampaignID,

               ...

           

          This will bump the campaignID by one of there is a gap. But you need to have your data sorted after Date for this to work.

           

          HIC

            • Re: Finding intervals in data stream
              Marko Rakar

              huh, good question

               

              it is little more complex then that; I have more then 400.000 events in the database and I have to sort it out not only by date, but also by customer, distribution channel and product as well

               

              also, it might happen that you have campaign just weekdays and not over the weekend, so dates might not be consequtive and my idea was to try to find gap between campaigns by playing with number of days between two ads

              • Re: Re: Finding intervals in data stream
                Marko Rakar

                Great thinking, this was not entirely what I wanted but I ended up with following solution which proved to be good enough:

                 

                New:

                LOAD *,

                 

                If(Dates=Peek(Dates)+1 OR Dates=Peek(Dates) OR Dates=Peek(Dates)+2 OR Dates=Peek(Dates) + 3, Peek(CampaignID), ProductName & ' ' & Dates ) as CampaignID

                 

                Resident Old Order by AdvertiserName, ProductName, Dates asc;

                 

                Drop table Old;

                 

                Now, the thing is that we need to somehow recognize continuous campaign and I do this by my IF statement where I look for all ads which were aired either today, yesterday or up to 3 days before (there are some troubles with this approach but they can also be resolved).

                 

                Then, if codintions are met, then I take vale from Campaign ID from the previous row and copy it to current row.

                 

                If we do not meet criteria, then it means that this is first ad in our new campaign and we create unique ID by using product name which is advertised and first date of ad airing (also not fool proof but good enough).

                 

                (this is fun)

                 

                p.s. note that special problem is sorting of my data, and for this I load my data into memory and reload the table as resident in proper order so I have aligned my rows in sequence which best follows my intentions (loading from file does not allow sorting so I had to reload from already resident table)