Skip to main content
Announcements
Global Transformation Awards! Applications are now open. Submit Entry
cancel
Showing results for 
Search instead for 
Did you mean: 
TimurKhayrullin
Contributor II
Contributor II

row-by-row filtered Difference calculation with advanced Aggregation in load script

Hello, recently I've run into issue where I have a dataset with 2 info fields and 1 measure field, and for each row I need to calculate the difference between the current row's measure and the row with the same info fields but the NEXT BIGGEST measure, if no next biggest measure exists, the cell is left empty. This is an example the data I'm starting with:

TimurKhayrullin_0-1660947655436.png

and this is what I'm aiming for:

TimurKhayrullin_1-1660947753972.png

For example, the row "A, D, 3" has Diff value 1 because the next biggest row value filtered for fields 1 and 2 is "A, D, 4", and 4-3 is 1. 

My Approach has been to try and generate an intermediary column "nextMeasure" that finds the next biggest measure in accordance to the filters, so then the desired column is a simple subtraction:

TimurKhayrullin_2-1660948079940.png

EDIT: my apologies, I made a small mistake that invalidates this example. The following is an ammended version with proper difference values (shown in red):

TimurKhayrullin_1-1661193057237.png

After digging extensively online I have come up a "pseudo-expression" that (almost) works to generate this "nextMeasure" column:  FirstSortedValue(Measure, Aggr(if(Measure > [current row's measure], 1, 0), Field 2, Field 1)) as nextMeasure

In my view there are 4 problems with my expression

1) Aggr can't be used in a load script

2) I don't know how to properly reference the current row's measure in an if statement

3) I don't know if the IF statement is the appropriate to handle the aggregation

4) I don't know how to leave the cell blank if no next biggest measure is found

Could anyone help me generate the desired column? Any advice helps, it doesn't have to use my methods.

I've attached a QVD file with my input data as well.

 

Labels (3)
1 Solution

Accepted Solutions
TimurKhayrullin
Contributor II
Contributor II
Author

After editing the proposed solution I came up with a load script that fixes the error. It ended up being much simpler. Nonetheless I must thank @Digvijay_Singh for the approach! Without it I'd be lost. Here's the load script in full:

source:
Load * inline [
Field1 Field2 Measure
A D 4
B E 10
A D 3
A F 1
C D 8
B E 5
B F 4
C D 3
C D 9
C F 15
A F 13
](delimiter is spaces)
;

NoConcatenate

Final1:
Load *
resident source
Order By Field1, Field2, Measure DESC
;

NoConcatenate
Final2:
Load *,
if(Field1=Previous(Field1) and Field2=Previous(Field2) and Previous(Measure)>Measure, Previous(Measure), 'NA') as nextMeasure,
if(Field1=Previous(Field1) and Field2=Previous(Field2) and Previous(Measure)>Measure, Previous(Measure)-Measure, 'NA') as Diff
resident Final1;

drop table source,Final1;

And here is the result:

TimurKhayrullin_0-1661194638197.png

I've also attached the .qvf with the results.

View solution in original post

4 Replies
Digvijay_Singh

If you need to do this in script then may be something like this, sorting records in a specific order helps in this approach.

source:
Load * inline [
Field1 Field2 Measure
A D 4
B E 10
A D 3
A F 1
C D 8
B E 5
B F 4
C D 3
C D 9
C F 15
A F 13
](delimiter is spaces)
;

NoConcatenate

Final1:
Load *
resident source
Order By Field1, Field2, Measure DESC
;

NoConcatenate
Final2:
Load *,
if(Field1=Previous(Field1) and Field2=Previous(Field2) and peek(MaxMeasure)>Measure, peek(MaxMeasure)-Measure, 'NA') as Diff
;
Load *,
if(Field1=Previous(Field1) and Field2=Previous(Field2) and len(peek(MaxMeasure))<>0, Peek(MaxMeasure),Measure) as MaxMeasure
resident Final1;
;


drop table source,Final1;

 

Digvijay_Singh_0-1661036781435.png

 

TimurKhayrullin
Contributor II
Contributor II
Author

Hi Digvijay, after trying your solution I realized I does not do quite what I need. However, this is completely my fault as I made a mistake in my example. Here is the ammended final table:

TimurKhayrullin_0-1661193169782.png

 

your solution is close, but it relates the current to the MAX of the required filters. I just need the NEXT BIGGEST. In my example, "C,D,3" should have diff value 5, not 6. Yours gives the value of 6. How could the solution be changed to give the desired outcomes? 

TimurKhayrullin
Contributor II
Contributor II
Author

After editing the proposed solution I came up with a load script that fixes the error. It ended up being much simpler. Nonetheless I must thank @Digvijay_Singh for the approach! Without it I'd be lost. Here's the load script in full:

source:
Load * inline [
Field1 Field2 Measure
A D 4
B E 10
A D 3
A F 1
C D 8
B E 5
B F 4
C D 3
C D 9
C F 15
A F 13
](delimiter is spaces)
;

NoConcatenate

Final1:
Load *
resident source
Order By Field1, Field2, Measure DESC
;

NoConcatenate
Final2:
Load *,
if(Field1=Previous(Field1) and Field2=Previous(Field2) and Previous(Measure)>Measure, Previous(Measure), 'NA') as nextMeasure,
if(Field1=Previous(Field1) and Field2=Previous(Field2) and Previous(Measure)>Measure, Previous(Measure)-Measure, 'NA') as Diff
resident Final1;

drop table source,Final1;

And here is the result:

TimurKhayrullin_0-1661194638197.png

I've also attached the .qvf with the results.

Digvijay_Singh

Great! In fact my first test had this output from my script but than I updated it further to match with your output 🙂