Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I have a sales dataset with columns CustomerID, ProductCode, and SalesDate. Each line has a different combination of these variables.
Each ProductCode can be categorized into one of two groups. Let's call them Group A and Group B.
I want to organize a new table with four columns: CustomerID, isGroupA, isGroupB, and isBoth.
Each line should be associated with a unique CustomerID, with counts of total purchases of Group A products in isGroupA, a count of total purchases of Group B products in isGroupB, and, if the values of isGroupA and isGroupB are both greater than zero, a boolean value of 1 in isBoth.
So far, we have tried using the Distinct keyword in association with the CustomerID column, using the variables as dimensions (rather than measures), as well as a number of more complicated recommendations for similar puzzles found on the Qlik community boards.
Current Table:
CustomerID | ProductCode | SalesDate |
1 | x | 2020 |
1 | x | 2021 |
1 | y | 2022 |
1 | x | 2022 |
2 | x | 2019 |
2 | p | 2021 |
2 | y | 2022 |
3 | q | 2005 |
3 | q | 2021 |
3 | p | 2022 |
4 | y | 2021 |
4 | p | 2021 |
Group A: [x,y]
Group B: [p,q]
Desired Table:
CustomerID | isGroupA | isGroupB | isBoth |
1 | 4 | 0 | 0 |
2 | 2 | 1 | 1 |
3 | 0 | 3 | 0 |
4 | 1 | 1 | 1 |
How do I count and consolidate the multiple sales records for each customer into one row?
Hi, one way would be like this:
data:
LOAD *,
CustomerID & '|' & SalesDate as cust_date.#key,
ProductCode & '|' & SalesDate as counter
INLINE [
CustomerID|ProductCode|SalesDate
1|x|2020
1|x|2021
1|y|2022
1|x|2022
2|x|2019
2|p|2021
2|y|2022
3|q|2005
3|q|2021
3|p|2022
4|y|2021
4|p|2021
] (DELIMITER IS '|');
temp_calc:
LOAD
CustomerID,
[cust_date.#key],
ProductCode,
counter
RESIDENT data;
LEFT JOIN (temp_calc)
LOAD * INLINE [
ProductCode|GroupCode
x|A
y|A
p|B
q|B
] (DELIMITER IS '|');
temp_calc2:
LOAD
CustomerID,
GroupCode,
COUNT(DISTINCT counter) AS count_of_counter
RESIDENT temp_calc
GROUP BY CustomerID,GroupCode;
DROP TABLE temp_calc;
temp_conc:
LOAD DISTINCT
CustomerID
RESIDENT temp_calc2;
JOIN (temp_conc)
LOAD
SUBFIELD('A|B','|') AS GroupCode
AUTOGENERATE 1;
CONCATENATE (temp_calc2)
LOAD *,
0 AS count_of_counter
RESIDENT temp_conc;
DROP TABLE temp_conc;
temp_final:
LOAD DISTINCT
CustomerID
RESIDENT temp_calc2;
LEFT JOIN (temp_final)
LOAD
CustomerID,
RANGESUM(SUM(count_of_counter),0) AS isGroupA
RESIDENT temp_calc2
WHERE MATCH(GroupCode,'A')
GROUP BY CustomerID;
LEFT JOIN (temp_final)
LOAD
CustomerID,
RANGESUM(SUM(count_of_counter),0) AS isGroupB
RESIDENT temp_calc2
WHERE MATCH(GroupCode,'B')
GROUP BY CustomerID;
DROP TABLE temp_calc2;
final:
LOAD *,
PICK(MATCH(RANGESUM(isGroupA*isGroupB,0),0)+1,1,0) AS isBoth
RESIDENT temp_final;
DROP TABLE temp_final;
DROP FIELDS counter,[cust_date.#key];
As below
temp:
load *,recno() as rowID inline [
CustomerID,ProductCode,SalesDate
1,x,2020
1,x,2021
1,y,2022
1,x,2022
2,x,2019
2,p,2021
2,y,2022
3,q,2005
3,q,2021
3,p,2022
4,y,2021
4,p,2021
];
Summary:
Load
*,
if(GroupA>0 and GroupB>0,1,0) as isBoth
;
Load CustomerID
,count(if(WildMatch(ProductCode,'x','y'),1)) as GroupA
,count(if(WildMatch(ProductCode,'p','q'),1)) as GroupB
Resident temp
Group by CustomerID;
exit Script;