Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hey everyone, I'm trying to add missing weeks for each query & page combo to the main table so that I can do some calculations on it. I've created a master calendar with all weeks, joined with each query & page combo to get each query & page with every week, but when I try to concatenate that onto the main table where it doesn't already exist, I get duplicates for the weeks that do exist.
Source data example:
weekstart | query | page | avg_pos |
1/3/2016 | AAA | page_1 | 6 |
2/14/2016 | AAA | page_1 | 123 |
3/27/2016 | AAA | page_1 | 96 |
4/17/2016 | AAA | page_1 | 81 |
4/24/2016 | AAA | page_1 | 73 |
Desired output:
weekstart | query | page | avg_pos |
1/3/2016 | AAA | page_1 | 6 |
1/10/2016 | AAA | page_1 | - |
1/17/2016 | AAA | page_1 | - |
1/24/2016 | AAA | page_1 | - |
1/31/2016 | AAA | page_1 | - |
2/7/2016 | AAA | page_1 | - |
2/14/2016 | AAA | page_1 | 123 |
2/21/2016 | AAA | page_1 | - |
2/28/2016 | AAA | page_1 | - |
3/6/2016 | AAA | page_1 | - |
3/13/2016 | AAA | page_1 | - |
3/20/2016 | AAA | page_1 | - |
3/27/2016 | AAA | page_1 | 96 |
4/17/2016 | AAA | page_1 | 81 |
4/24/2016 | AAA | page_1 | 73 |
Current (wrong) output:
weekstart | query | page | avg_pos |
1/3/2016 | AAA | page_1 | 6 |
1/3/2016 | AAA | page_1 | - |
1/10/2016 | AAA | page_1 | - |
1/17/2016 | AAA | page_1 | - |
1/24/2016 | AAA | page_1 | - |
1/31/2016 | AAA | page_1 | - |
2/7/2016 | AAA | page_1 | - |
2/14/2016 | AAA | page_1 | 123 |
2/14/2016 | AAA | page_1 | - |
2/21/2016 | AAA | page_1 | - |
2/28/2016 | AAA | page_1 | - |
3/6/2016 | AAA | page_1 | - |
3/13/2016 | AAA | page_1 | - |
3/20/2016 | AAA | page_1 | - |
3/27/2016 | AAA | page_1 | 96 |
3/27/2016 | AAA | page_1 | - |
4/17/2016 | AAA | page_1 | 81 |
4/17/2016 | AAA | page_1 | - |
4/24/2016 | AAA | page_1 | 73 |
4/24/2016 | AAA | page_1 | - |
Anyone know why duplicate rows are being created even when I use the where not exists() clause? Here's my script:
gwt:
LOAD weekstart,
query,
page,
avg_pos,
page&'|'&query&'|'&weekstart as pqw_key
FROM
test.xlsx
(ooxml, embedded labels, table is Sheet1);
// master calendar to generate all weeks
Temp:
Load
min(weekstart) as minDate,
max(weekstart) as maxDate
Resident gwt;
Let varMinDate = Num(Peek('minDate', 0, 'Temp'));
Let varMaxDate = Num(Peek('maxDate', 0, 'Temp'));
DROP Table Temp;
TempCalendar:
LOAD
Date($(varMinDate) + IterNo() - 1) as TempDate
AutoGenerate 1 While $(varMinDate) + IterNo() -1 <= $(varMaxDate);
cal_temp:
LOAD distinct
page,
query
RESIDENT gwt;
join(cal_temp)
Load DISTINCT
weekstart(TempDate) as weekstart
Resident TempCalendar
;
concatenate(gwt)
LOAD *,
page&'|'&query&'|'&weekstart as pqw_key
RESIDENT cal_temp
WHERE NOT exists(pqw_key, page&'|'&query&'|'&weekstart)
;
Drop Tables TempCalendar, cal_temp;
drop fields pqw_key;
Thank you!
Youssef - Sorry for the late reply. I could not get time to check the community after my last reply. The answer is no. Actually the problem has occurred while creating the "pqw_key" field.
While creating pqw_key in gwt table, weekstart is automatically converted to number format.
To understand this better, I have loaded only the above part and the result looks like below.
We got a proper date format while creating a calendar.
I have loaded only the above part (dropped gwt table) and the result looks like below.
So the where exists did not find any match in the gwt table (as you can see the pqw_key values are different) and it concatenate everything which results duplicate rows.
To avoid this, I have used date function in all the places (including where clause. It is not necessary to add in calendar script since it is showing correct date format but to avoid confusion and to make it clear, I just added this in all the three places). Have a look at the second attachment.
page&'|'&query&'|'& Date(weekstart) as pqw_key
Hope I have explained it well .
Hi,
everything seems to be ok..
no duplicate lines for me.
Maybe you modify a little bit the script and you didn't verify ?
Hey Youssef, thank you for the quick response. I looked at your script and noticed that our default SET statements are different, specifically the day that weekstart() references so that's why it appeared correct, but really it was still duplicating. In yours, SET FirstWeekDay=0; but in mine SET FirstWeekDay=1; so when I set your script variables to match mine, I got the same wrong result. I attached your script to show.
I See..
don't know what is the issue here.
one workaround would be this (let the FistWeekDay variables to 0)
gwt:
LOAD Date(Date#(weekstart,'MM/DD/YYYY'),'MM/DD/YYYY') as weekstart,
query,
page,
avg_pos,
page&'|'&query&'|'&weekstart as pqw_key
FROM
[https://community.qlik.com/thread/298448]
(html, codepage is 1252, embedded labels, table is @1);
// master calendar to generate all weeks
Temp:
Load
min(weekstart) as minDate,
max(weekstart) as maxDate
Resident gwt;
Let varMinDate = Num(Peek('minDate', 0, 'Temp'));
Let varMaxDate = Num(Peek('maxDate', 0, 'Temp'));
DROP Table Temp;
TempCalendar:
LOAD
Date($(varMinDate) + IterNo() - 1) as TempDate
AutoGenerate 1 While $(varMinDate) + IterNo() -1 <= $(varMaxDate);
cal_temp:
LOAD distinct
page,
query
RESIDENT gwt;
join(cal_temp)
Load DISTINCT
weekstart(TempDate,1) as weekstart
Resident TempCalendar ;
concatenate(gwt)
LOAD *,
page&'|'&query&'|'&weekstart as pqw_key
RESIDENT cal_temp
WHERE NOT exists(pqw_key, page&'|'&query&'|'&weekstart) ;
Drop Tables TempCalendar, cal_temp;
drop fields pqw_key;
Maybe stalwar1 can take a look here ?
Sunny, the problem here is when OP set FirstWeekDay variable to 6, he have duplicate lines.. otherwise it works.
Hi,
here you added the missing Weekstarts on the table, but without values on the rest of the sub key fields, which are QUERY and PAGE.
Ok got it. I misunderstood the question.
I was exactly like you, why the concatenate, but then I was stuck changing the way to do this..
it is working with the work around that I add above, but it is a bit weird that is duplicating lines only because of the variable value..
Hi Ben,
PFA.
It was my mistake. I haven't read the question properly. Thanks for pointing out.
tamilarasu so the problem was the EXTRA step of creating a temp table before joining ??