17 Replies Latest reply: Apr 17, 2018 9:56 PM by Tamil arasu RSS

    Concatenate missing weeks with resident and where not exists

    Ben Goldberg

      Hey everyone, I'm trying to add missing weeks for each query & page combo to the main table so that I can do some calculations on it. I've created a master calendar with all weeks, joined with each query & page combo to get each query & page with every week, but when I try to concatenate that onto the main table where it doesn't already exist, I get duplicates for the weeks that do exist.

       

      Source data example:

      weekstartquerypageavg_pos
      1/3/2016AAApage_16
      2/14/2016AAApage_1123
      3/27/2016AAApage_196
      4/17/2016AAApage_181
      4/24/2016AAApage_173

       

      Desired output:

      weekstartquerypageavg_pos
      1/3/2016AAApage_16
      1/10/2016AAApage_1-
      1/17/2016AAApage_1-
      1/24/2016AAApage_1-
      1/31/2016AAApage_1-
      2/7/2016AAApage_1-
      2/14/2016AAApage_1123
      2/21/2016AAApage_1-
      2/28/2016AAApage_1-
      3/6/2016AAApage_1-
      3/13/2016AAApage_1-
      3/20/2016AAApage_1-
      3/27/2016AAApage_196
      4/17/2016AAApage_181
      4/24/2016AAApage_173

       

      Current (wrong) output:

      weekstartquerypageavg_pos
      1/3/2016AAApage_16
      1/3/2016AAApage_1-
      1/10/2016AAApage_1-
      1/17/2016AAApage_1-
      1/24/2016AAApage_1-
      1/31/2016AAApage_1-
      2/7/2016AAApage_1-
      2/14/2016AAApage_1123
      2/14/2016AAApage_1-
      2/21/2016AAApage_1-
      2/28/2016AAApage_1-
      3/6/2016AAApage_1-
      3/13/2016AAApage_1-
      3/20/2016AAApage_1-
      3/27/2016AAApage_196
      3/27/2016AAApage_1-
      4/17/2016AAApage_181
      4/17/2016AAApage_1-
      4/24/2016AAApage_173
      4/24/2016AAApage_1-

       

      Anyone know why duplicate rows are being created even when I use the where not exists() clause? Here's my script:

       

      gwt:
      LOAD weekstart, 
           query, 
           page, 
           avg_pos,
           page&'|'&query&'|'&weekstart as pqw_key
      FROM
      test.xlsx
      (ooxml, embedded labels, table is Sheet1);
      
      // master calendar to generate all weeks
      Temp:  
      Load  
                     min(weekstart) as minDate,  
                     max(weekstart) as maxDate  
      Resident gwt;  
      Let varMinDate = Num(Peek('minDate', 0, 'Temp'));  
      Let varMaxDate = Num(Peek('maxDate', 0, 'Temp'));  
      DROP Table Temp;  
      TempCalendar:  
      LOAD  
                     Date($(varMinDate) + IterNo() - 1) as TempDate  
                     AutoGenerate 1 While $(varMinDate) + IterNo() -1 <= $(varMaxDate);  
      
      cal_temp:
      LOAD distinct
      page,
      query
      RESIDENT gwt;
        
      join(cal_temp)
      Load DISTINCT 
                     weekstart(TempDate) as weekstart  
      Resident TempCalendar  
      ;  
      
      concatenate(gwt)
      LOAD *,
      page&'|'&query&'|'&weekstart as pqw_key
      RESIDENT cal_temp
      WHERE NOT exists(pqw_key, page&'|'&query&'|'&weekstart)
      ;
      
      
      Drop Tables TempCalendar, cal_temp;  
      drop fields pqw_key;
      

       

      Thank you!