Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hey everyone, I'm trying to add missing weeks for each query & page combo to the main table so that I can do some calculations on it. I've created a master calendar with all weeks, joined with each query & page combo to get each query & page with every week, but when I try to concatenate that onto the main table where it doesn't already exist, I get duplicates for the weeks that do exist.
Source data example:
weekstart | query | page | avg_pos |
1/3/2016 | AAA | page_1 | 6 |
2/14/2016 | AAA | page_1 | 123 |
3/27/2016 | AAA | page_1 | 96 |
4/17/2016 | AAA | page_1 | 81 |
4/24/2016 | AAA | page_1 | 73 |
Desired output:
weekstart | query | page | avg_pos |
1/3/2016 | AAA | page_1 | 6 |
1/10/2016 | AAA | page_1 | - |
1/17/2016 | AAA | page_1 | - |
1/24/2016 | AAA | page_1 | - |
1/31/2016 | AAA | page_1 | - |
2/7/2016 | AAA | page_1 | - |
2/14/2016 | AAA | page_1 | 123 |
2/21/2016 | AAA | page_1 | - |
2/28/2016 | AAA | page_1 | - |
3/6/2016 | AAA | page_1 | - |
3/13/2016 | AAA | page_1 | - |
3/20/2016 | AAA | page_1 | - |
3/27/2016 | AAA | page_1 | 96 |
4/17/2016 | AAA | page_1 | 81 |
4/24/2016 | AAA | page_1 | 73 |
Current (wrong) output:
weekstart | query | page | avg_pos |
1/3/2016 | AAA | page_1 | 6 |
1/3/2016 | AAA | page_1 | - |
1/10/2016 | AAA | page_1 | - |
1/17/2016 | AAA | page_1 | - |
1/24/2016 | AAA | page_1 | - |
1/31/2016 | AAA | page_1 | - |
2/7/2016 | AAA | page_1 | - |
2/14/2016 | AAA | page_1 | 123 |
2/14/2016 | AAA | page_1 | - |
2/21/2016 | AAA | page_1 | - |
2/28/2016 | AAA | page_1 | - |
3/6/2016 | AAA | page_1 | - |
3/13/2016 | AAA | page_1 | - |
3/20/2016 | AAA | page_1 | - |
3/27/2016 | AAA | page_1 | 96 |
3/27/2016 | AAA | page_1 | - |
4/17/2016 | AAA | page_1 | 81 |
4/17/2016 | AAA | page_1 | - |
4/24/2016 | AAA | page_1 | 73 |
4/24/2016 | AAA | page_1 | - |
Anyone know why duplicate rows are being created even when I use the where not exists() clause? Here's my script:
gwt:
LOAD weekstart,
query,
page,
avg_pos,
page&'|'&query&'|'&weekstart as pqw_key
FROM
test.xlsx
(ooxml, embedded labels, table is Sheet1);
// master calendar to generate all weeks
Temp:
Load
min(weekstart) as minDate,
max(weekstart) as maxDate
Resident gwt;
Let varMinDate = Num(Peek('minDate', 0, 'Temp'));
Let varMaxDate = Num(Peek('maxDate', 0, 'Temp'));
DROP Table Temp;
TempCalendar:
LOAD
Date($(varMinDate) + IterNo() - 1) as TempDate
AutoGenerate 1 While $(varMinDate) + IterNo() -1 <= $(varMaxDate);
cal_temp:
LOAD distinct
page,
query
RESIDENT gwt;
join(cal_temp)
Load DISTINCT
weekstart(TempDate) as weekstart
Resident TempCalendar
;
concatenate(gwt)
LOAD *,
page&'|'&query&'|'&weekstart as pqw_key
RESIDENT cal_temp
WHERE NOT exists(pqw_key, page&'|'&query&'|'&weekstart)
;
Drop Tables TempCalendar, cal_temp;
drop fields pqw_key;
Thank you!
Try
cal_temp:
Noconcatenate LOAD distinct
page,
query
RESIDENT gwt;
join(cal_temp)
Load DISTINCT
weekstart(TempDate) as weekstart
Resident TempCalendar
;
tamilarasu thank you for your reply. I tried your script with a larger set of my data and it resulted in nulls for weekstart. Perhaps the various query and page combos are causing it to not work? I've attached the script and a spreadsheet with a larger sample of what the real data looks like.
youssefbelloum I also used your suggestion of weekstart(TempDate,1) which works for the first instance of duplicated week, but the other weeks after that are still duplicated.
sasiparupudi1 I tried the NoConcatenate addition, but that didn't work either.
Thank you all for your suggestions, I really appreciate it! Just not sure why it keeps duplicating...
@bgoldberg I think the issue is in your first load
gwt:
LOAD Date(Date#(weekstart,'MM/DD/YYYY'),'MM/DD/YYYY') as weekstart,
query,
page,
avg_pos,
page&'|'&query&'|'&Date(Date#(weekstart,'MM/DD/YYYY'),'MM/DD/YYYY') as pqw_key
FROM
[https://community.qlik.com/thread/298448]
(html, codepage is 1252, embedded labels, table is @1);
// master calendar to generate all weeks
Temp:
Load
min(weekstart) as minDate,
max(weekstart) as maxDate
Resident gwt;
Let varMinDate = Num(Peek('minDate', 0, 'Temp'));
Let varMaxDate = Num(Peek('maxDate', 0, 'Temp'));
DROP Table Temp;
TempCalendar:
LOAD
Date($(varMinDate) + IterNo() - 1) as TempDate
AutoGenerate 1 While $(varMinDate) + IterNo() -1 <= $(varMaxDate);
cal_temp:
LOAD distinct
page,
query
RESIDENT gwt;
join(cal_temp)
Load DISTINCT
weekstart(TempDate) as weekstart
Resident TempCalendar
;
concatenate(gwt)
LOAD *,
page&'|'&query&'|'&weekstart as pqw_key
RESIDENT cal_temp
WHERE NOT exists(pqw_key, page&'|'&query&'|'&weekstart)
;
Drop Tables TempCalendar, cal_temp;
drop fields pqw_key;
Ben - Here you go.
Youssef - Sorry for the late reply. I could not get time to check the community after my last reply. The answer is no. Actually the problem has occurred while creating the "pqw_key" field.
While creating pqw_key in gwt table, weekstart is automatically converted to number format.
To understand this better, I have loaded only the above part and the result looks like below.
We got a proper date format while creating a calendar.
I have loaded only the above part (dropped gwt table) and the result looks like below.
So the where exists did not find any match in the gwt table (as you can see the pqw_key values are different) and it concatenate everything which results duplicate rows.
To avoid this, I have used date function in all the places (including where clause. It is not necessary to add in calendar script since it is showing correct date format but to avoid confusion and to make it clear, I just added this in all the three places). Have a look at the second attachment.
page&'|'&query&'|'& Date(weekstart) as pqw_key
Hope I have explained it well .
Nailed it! Thanks so much for your help, it works as expected now. I hadn't even thought of the date formats causing the issue! Thank you everyone else for your help as well.
Thank you very much for your time tamilarasu it was very well explained
Like you can see above, first thing I've done it was formatting the date field BUT I haven't think about format it again after creating weekstart (it is a Guru Reflex ).
Have a nice day
Have a nice day too!