Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi experts,
I have a table with data like below,
ProdId | Date | Category | Amount | |
101 | 3/31/2020 | A | 100 | |
102 | 3/31/2020 | B | 150 | |
103 | 3/31/2020 | A | 200 | |
105 | 3/31/2020 | A | 120 | |
101 | 4/30/2020 | A | 150 | |
102 | 4/30/2020 | A | 200 | |
103 | 4/30/2020 | B | 110 | |
104 | 4/30/2020 | A | 180 |
Basically, a prodId can be in category A or B on a date, but the category can change from date to date. For example, in above table, ProdId 102 was categorized to B on 3/31 but moved to category A on 4/30. Similarly, 103 was in category A but moved to category B. Also, 104 is a new product which only exists on 4/30.
Now I need to show a table on the dashboard,
Group | Category Change | Count (No. of products) | Total Amount (on 4/30) |
1 | From Category A to Other Categories | 1 (because ProdId 103 was moved from category A to other category) | 110 |
2 | From other categories to category A | 1 (because ProdId 102 was moved from other category to category A) | 200 |
3 | New to category A | 1 (because ProdId 104 is added to Category on 4/30 only and did not exists on 3/31) | 180 |
4 | Removed from category A | 1 (because ProdId 105 was in category A on 3/31 but it does not exist on 4/30) | 0 |
Basically, I need to group the data into 4 groups,
Group 1 is for products that were in category A on 3/31 but moved to other categories on 4/30;
Group 2 is for products that were in category A on 4/30 but in other categories on 3/31;
Group 3 is for products that are newly added on 4/30 to category A, and they did not exist on 3/31;
Group 4 is for products that were in category A on 3/31 but do not exist anymore on 4/30.
How can I do this using expressions?
Thanks,
Michael
Attached my version :
I assume from the statement that there are only two categories A and B
Loading script :
Data:
LOAD rowno() as ID,* INLINE [
ProdId, Date, Category, Amount
101, 3/31/2020, A, 100
102, 3/31/2020, B, 150
103, 3/31/2020, A, 200
105, 3/31/2020, A, 120
101, 4/30/2020, A, 150
102, 4/30/2020, A, 200
103, 4/30/2020, B, 110
104, 4/30/2020, A, 180
];
output:
load purgechar(Concat_Cat,'0123456789') as Concat_Categorie,*;
load ProdId, concat(ID&Category,',') as Concat_Cat, FirstSortedValue(Amount,(Date#(Date,'M/DD/YYYY'))) as FirstAmount,FirstSortedValue(Amount,-(Date#(Date,'M/DD/YYYY'))) as LastAmount,Date(Max(Date#(Date,'M/DD/YYYY'))) as MaxDate resident Data Group by ProdId order by ID ;
drop table Data;
drop field Concat_Cat;
2- Create Dimension:
Group:
=ValueList('1','2','3','4')
3- create Mesure :
Category Change :
=PICK(MATCH(VALUELIST('1','2','3','4'),'1','2','3','4'),
'From Category A to Other Categories',
'From other categories to category A',
'New to category A',
'Removed from category A'
)
Count (No. of products) :
PICK(MATCH(VALUELIST('1','2','3','4'),'1','2','3','4'),
count({<Concat_Categorie={'A,B'}>} ProdId),
count({<Concat_Categorie={'B,A'}>} ProdId),
count({<Concat_Categorie={'A'},MaxDate={"$(=Date#(Max(MaxDate),'M/DD/YYYY'))"}>} ProdId),
count({<Concat_Categorie={'A'},MaxDate={"$(=Date#(Min(MaxDate),'M/DD/YYYY'))"}>} ProdId)
)
='Total Amount (on '& Date(Max(MaxDate),'M/DD')&')' :
PICK(MATCH(VALUELIST('1','2','3','4'),'1','2','3','4'),
Sum({<Concat_Categorie={'A,B'}>} LastAmount),
Sum({<Concat_Categorie={'B,A'}>} LastAmount),
Sum({<Concat_Categorie={'A'},MaxDate={"$(=Date#(Max(MaxDate),'M/DD/YYYY'))"}>} LastAmount),
0
)
Results :
One solution is.
tab1:
LOAD RowNo() As RowID,* INLINE [
ProdId, Date, Category, Amount
101, 3/31/2020, A, 100
102, 3/31/2020, B, 150
103, 3/31/2020, A, 200
105, 3/31/2020, A, 120
101, 4/30/2020, A, 150
102, 4/30/2020, A, 200
103, 4/30/2020, B, 110
104, 4/30/2020, A, 180
];
Left Join(tab1)
LOAD ProdId, Concat(Category,'', Date) As Str, FirstSortedValue(Category, Date) As Val1, FirstSortedValue(Category, -Date) As Val2
Resident tab1
Group By ProdId
;
Left Join(tab1)
LOAD Date(Min(Date)) As MinDt, Date(Max(Date)) As MaxDt
Resident tab1;
tab2:
LOAD *, Pick(Match(-1, Str Like 'A?*' And Val2<>'A', Str Like '?*A' And Val1<>'A', Str='A' And Date=MaxDt, Str='A' And Date=MinDt), 'Group1', 'Group2', 'Group3', 'Group4') As Group
, If(Date=MaxDt, Amount, 0) As FinalAmount
Resident tab1;
tab3:
LOAD Group, Concat(DISTINCT ProdId) As Group_Str, Count(DISTINCT ProdId) As Group_Cnt
, Sum(FinalAmount) As [Total Amount (on 4/30)]
Resident tab2
Group By Group
;
Drop Table tab1, tab2;
Attached my version :
I assume from the statement that there are only two categories A and B
Loading script :
Data:
LOAD rowno() as ID,* INLINE [
ProdId, Date, Category, Amount
101, 3/31/2020, A, 100
102, 3/31/2020, B, 150
103, 3/31/2020, A, 200
105, 3/31/2020, A, 120
101, 4/30/2020, A, 150
102, 4/30/2020, A, 200
103, 4/30/2020, B, 110
104, 4/30/2020, A, 180
];
output:
load purgechar(Concat_Cat,'0123456789') as Concat_Categorie,*;
load ProdId, concat(ID&Category,',') as Concat_Cat, FirstSortedValue(Amount,(Date#(Date,'M/DD/YYYY'))) as FirstAmount,FirstSortedValue(Amount,-(Date#(Date,'M/DD/YYYY'))) as LastAmount,Date(Max(Date#(Date,'M/DD/YYYY'))) as MaxDate resident Data Group by ProdId order by ID ;
drop table Data;
drop field Concat_Cat;
2- Create Dimension:
Group:
=ValueList('1','2','3','4')
3- create Mesure :
Category Change :
=PICK(MATCH(VALUELIST('1','2','3','4'),'1','2','3','4'),
'From Category A to Other Categories',
'From other categories to category A',
'New to category A',
'Removed from category A'
)
Count (No. of products) :
PICK(MATCH(VALUELIST('1','2','3','4'),'1','2','3','4'),
count({<Concat_Categorie={'A,B'}>} ProdId),
count({<Concat_Categorie={'B,A'}>} ProdId),
count({<Concat_Categorie={'A'},MaxDate={"$(=Date#(Max(MaxDate),'M/DD/YYYY'))"}>} ProdId),
count({<Concat_Categorie={'A'},MaxDate={"$(=Date#(Min(MaxDate),'M/DD/YYYY'))"}>} ProdId)
)
='Total Amount (on '& Date(Max(MaxDate),'M/DD')&')' :
PICK(MATCH(VALUELIST('1','2','3','4'),'1','2','3','4'),
Sum({<Concat_Categorie={'A,B'}>} LastAmount),
Sum({<Concat_Categorie={'B,A'}>} LastAmount),
Sum({<Concat_Categorie={'A'},MaxDate={"$(=Date#(Max(MaxDate),'M/DD/YYYY'))"}>} LastAmount),
0
)
Results :
One solution is.
tab1:
LOAD RowNo() As RowID,* INLINE [
ProdId, Date, Category, Amount
101, 3/31/2020, A, 100
102, 3/31/2020, B, 150
103, 3/31/2020, A, 200
105, 3/31/2020, A, 120
101, 4/30/2020, A, 150
102, 4/30/2020, A, 200
103, 4/30/2020, B, 110
104, 4/30/2020, A, 180
];
Left Join(tab1)
LOAD ProdId, Concat(Category,'', Date) As Str, FirstSortedValue(Category, Date) As Val1, FirstSortedValue(Category, -Date) As Val2
Resident tab1
Group By ProdId
;
Left Join(tab1)
LOAD Date(Min(Date)) As MinDt, Date(Max(Date)) As MaxDt
Resident tab1;
tab2:
LOAD *, Pick(Match(-1, Str Like 'A?*' And Val2<>'A', Str Like '?*A' And Val1<>'A', Str='A' And Date=MaxDt, Str='A' And Date=MinDt), 'Group1', 'Group2', 'Group3', 'Group4') As Group
, If(Date=MaxDt, Amount, 0) As FinalAmount
Resident tab1;
tab3:
LOAD Group, Concat(DISTINCT ProdId) As Group_Str, Count(DISTINCT ProdId) As Group_Cnt
, Sum(FinalAmount) As [Total Amount (on 4/30)]
Resident tab2
Group By Group
;
Drop Table tab1, tab2;
Thank you both very much!
One more version.
tab1:
LOAD * INLINE [
ProdId, Date, Category, Amount
101, 3/31/2020, A, 100
102, 3/31/2020, B, 150
103, 3/31/2020, A, 200
105, 3/31/2020, A, 120
101, 4/30/2020, A, 150
102, 4/30/2020, A, 200
103, 4/30/2020, B, 110
104, 4/30/2020, A, 180
];
tab2:
LOAD ProdId Resident tab1;
Right Join (tab2)
LOAD Date Resident tab1;
Join(tab1)
LOAD * Resident tab2;
Left Join(tab1)
LOAD ProdId, Concat(DISTINCT If(IsNull(Category),'X',Category),'', Date) As Str
Resident tab1
Group By ProdId
;
Left Join(tab1)
LOAD Date(Max(Date)) As MaxDt
Resident tab1;
tab3:
LOAD *, 'Group'&Match(Str, 'AB', 'BA', 'XA', 'AX') As Group
, If(Date=MaxDt, Amount, 0) As FinalAmount
Resident tab1;
tab4:
LOAD Group, Concat(DISTINCT ProdId) As Group_Str, Count(DISTINCT ProdId) As Group_Cnt
, Sum(DISTINCT FinalAmount) As [Total Amount (on 4/30)]
Resident tab3
Where Group <> 'Group0'
Group By Group
;
Drop Table tab1, tab2, tab3;