Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
In the script, I have to calculate a cumulative sum with a condition. The conditions is something like
“cumulate column VALUE within each YEAR but for each variant of CONDITION put only the one into the cumulation, that was loaded last (or with the highest ORDER number equivalently)”.
My input and desired output looks like this, INPUT is in Table1. (sorry for the ugly format. I add a PNG with a formatted table picture.
INPUT | INPUT | INPUT | INPUT | INPUT | OUTPUT | |
YEAR | ORDER | ITEM | CONDITION | VALUE | CUMULATED | |
2020 | 1 | A | XX | 3 | 3 | |
2020 | 2 | B | YY | 5 | 8 | |
2020 | 3 | C | ZZ | 4 | 12 | |
2021 | 1 | A | XX | 3 | 3 | |
2021 | 2 | B | YY | 5 | 8 | |
2021 | 3 | C | ZZ | 4 | 12 | |
2021 | 4 | D | UU | 7 | 19 | |
2021 | 5 | E | TT | 5 | 24 | |
2022 | 1 | C | ZZ | 4 | 4 | |
2022 | 2 | D | UU | 7 | 11 | |
2022 | 3 | E | TT | 5 | 16 | |
2022 | 4 | F | UU | 8 | 17 | 16+8-7 |
2022 | 5 | G | ZZ | 5 | 18 | 17+5-4 |
2022 | 6 | H | TT | 9 | 22 | 18+9-5 |
2022 | 7 | J | UU | 9 | 23 | 22+9-8 |
After sorting the input Table1 by YEAR and ORDER, I used a cumulation with peek. That worked but did not consider the condition.
Table2:
Load
YEAR,
ORDER,
ITEM,
CONDITION,
VALUE,
if(YEAR =Previous(YEAR) ,
//then
VALUE+peek(VALUE_cumulative),
//else
VALUE
)
as VALUE_cumulative
Resident Table1;
How can I apply the condition as reflected in the last 4 rows of the table above?
Hi,
You can optimize the code. I left it long enough to explain the approach.
But it's a nice challenge, thank you.
Data:
LOAD (YEAR&'_'&CONDITION&'_'&ORDER) as autoID,* inline [
YEAR, ORDER, ITEM, CONDITION, VALUE
2020, 1, A, XX, 3
2020, 2, B, YY, 5
2020, 3, C, ZZ, 4
2021, 1, A, XX, 3
2021, 2, B, YY, 5
2021, 3, C, ZZ, 4
2021, 4, D, UU, 7
2021, 5, E, TT, 5
2022, 1, C, ZZ, 4
2022, 2, D, UU, 7
2022, 3, E, TT, 5
2022, 4, F, UU, 8
2022, 5, G, ZZ, 5
2022, 6, H, TT, 9
2022, 7, J, UU, 9
];
t:
noconcatenate
load * resident Data order by autoID;
Table2:
noconcatenate
Load *,
if(CONDITION =Previous(CONDITION) ,VALUE-peek(VALUE),VALUE) as VALUE_
Resident t;
drop table t;
left join load * resident Data;
drop table Data;
tempoutput:
noconcatenate
load rowno() as ID,* resident Table2 order by YEAR,ORDER;
drop table Table2;
output:
noconcatenate
Load *,
if(YEAR =Previous(YEAR) ,VALUE_+peek(VALUE_cumulative),VALUE_) as VALUE_cumulative
Resident tempoutput;
drop table tempoutput;
output:
Hi,
You can optimize the code. I left it long enough to explain the approach.
But it's a nice challenge, thank you.
Data:
LOAD (YEAR&'_'&CONDITION&'_'&ORDER) as autoID,* inline [
YEAR, ORDER, ITEM, CONDITION, VALUE
2020, 1, A, XX, 3
2020, 2, B, YY, 5
2020, 3, C, ZZ, 4
2021, 1, A, XX, 3
2021, 2, B, YY, 5
2021, 3, C, ZZ, 4
2021, 4, D, UU, 7
2021, 5, E, TT, 5
2022, 1, C, ZZ, 4
2022, 2, D, UU, 7
2022, 3, E, TT, 5
2022, 4, F, UU, 8
2022, 5, G, ZZ, 5
2022, 6, H, TT, 9
2022, 7, J, UU, 9
];
t:
noconcatenate
load * resident Data order by autoID;
Table2:
noconcatenate
Load *,
if(CONDITION =Previous(CONDITION) ,VALUE-peek(VALUE),VALUE) as VALUE_
Resident t;
drop table t;
left join load * resident Data;
drop table Data;
tempoutput:
noconcatenate
load rowno() as ID,* resident Table2 order by YEAR,ORDER;
drop table Table2;
output:
noconcatenate
Load *,
if(YEAR =Previous(YEAR) ,VALUE_+peek(VALUE_cumulative),VALUE_) as VALUE_cumulative
Resident tempoutput;
drop table tempoutput;
output:
Thank you so much !!!
The solution is really nice and with the script copied easy to understand. You made my day 🙂
PS:
One open question... What do we need the rowNo() as ID for in the final table? Is it just to control the sequence of rows visually?
Yes
In my opinion your solution doesn't care the case where the condition remain the same exactly when the year change !
Hy Michael_Klix,
In my opinion the problem is this instruction:
if(CONDITION = Previous(CONDITION) ,VALUE-peek(VALUE),VALUE) as VALUE_
Actually the value of field VALUE_, becomes VALUE-peek(VALUE) if CONDITION = Previous(CONDITION), even if the year changes too.
if the condition remain the same when the year changes, the value of field VALUE_ should be VALUE and not VALUE-peek(VALUE) because as you said, the years are independent !