sjhussain
Partner - Creator II

Group by and difference from the previous record

Hello,

We are working on a project where we need to compute, for each row, the difference from the previous value of a column, grouped by TUN_ID and sorted by date.

1. We have about 400 million rows, and the volume is increasing.

2. We would like to load the table from SQL as-is and then apply the ETL in QlikView (group and difference).

3. First we need to group by log_tun_id and then sort by log_datetimeindata.

4. We would like to add another field, diff_distance, containing the difference in mileage from the previous row.

For example, the table for one group is as below:

log_tun_id  log_datetimeindata  log_speed  log_mileage
16027       20150108172512      70         16311.3
16027       20150108172612      74         16312.51
16027       20150108172712      78         16313.79
16027       20150108172812      74         16315.07
16027       20150108172912      75         16316.35
16027       20150108173012      74         16317.6
16027       20150108173112      68         16318.77
16027       20150108173212      56         16319.89

The resulting table, with the difference in mileage from the previous row, is as follows. The first row of each TUN_ID is zero.

log_tun_id  log_datetimeindata  log_speed  log_mileage  diff_distance
16027       20150108172512      70         16311.3      0
16027       20150108172612      74         16312.51     1.21
16027       20150108172712      78         16313.79     1.28
16027       20150108172812      74         16315.07     1.28
16027       20150108172912      75         16316.35     1.28
16027       20150108173012      74         16317.6      1.25
16027       20150108173112      68         16318.77     1.17
16027       20150108173212      56         16319.89     1.12

We would appreciate it if someone could assist us with this.

Thanks.


9 Replies
fvelascog72
Partner - Specialist

Hi,

Try this:

Data:
LOAD
    *,
    if(RowNo() = 1, 0, log_mileage - Peek(log_mileage)) as "diff_distance"
Inline [
log_tun_id, log_datetimeindata, log_speed, log_mileage
16027, 20150108172512, 70, 16311.3
16027, 20150108172612, 74, 16312.51
16027, 20150108172712, 78, 16313.79
16027, 20150108172812, 74, 16315.07
16027, 20150108172912, 75, 16316.35
16027, 20150108173012, 74, 16317.6
16027, 20150108173112, 68, 16318.77
16027, 20150108173212, 56, 16319.89
];

MayilVahanan

Hi

Try like this

LOAD
    *,
    Num(if(log_tun_id = Previous(log_tun_id), log_mileage - Previous(log_mileage), 0), '#.##') as diff_distance
From datasource;

I assume you only need the "diff_distance" field added.

Thanks & Regards, Mayil Vahanan R
Please close the thread by marking correct answer & give likes if you like the post.
sjhussain
Partner - Creator II
Author

Federico and Mayil,

Thank you for your prompt responses. Below is the original table. First I need to group by log_tun_id, and then apply the previous-distance formula.

The example I gave above was for 16027; I need to apply it for every log_tun_id. I hope it is clearer now.

log_tun_id  log_datetimeindata  log_speed  log_mileage
16027       20150108172512      70         16311.3
16340       20150108172512      0          6251.9
10817       20150108170218      33         142542.37
5211        20150108172510      0          209899.84
22845       20150108165355      0          8629.42
14381       20150108171837      0          3428.19
5430        20150108172510      29         312919.84
1176        20150108172508      0          222902.56
maxgro
MVP

1. We have about 400 million rows, and the volume is increasing.

2. We would like to load the table from SQL as-is and then apply the ETL in QlikView (group and difference).

3. First we need to group by log_tun_id and then sort by log_datetimeindata.

4. We would like to add another field, diff_distance, containing the difference in mileage from the previous row.


Regarding 3: if you group by log_tun_id, the result is one row per log_tun_id, which seems different from your result table.


To order by log_datetimeindata you need a resident load (see below).


The script is simple, but execution time could be your problem: you already have 400M rows, and you need to read them (SQL from a DB? a lot of time) and then do a resident load for the order by (more time).


source:
LOAD * Inline [
log_tun_id, log_datetimeindata, log_speed, log_mileage
16027, 20150108172512, 70, 16311.3
16027, 20150108172612, 74, 16312.51
16027, 20150108172712, 78, 16313.79
16027, 20150108172812, 74, 16315.07
16027, 20150108172912, 75, 16316.35
16027, 20150108173012, 74, 16317.6
16027, 20150108173112, 68, 16318.77
16027, 20150108173212, 56, 16319.89
1, 20150108172512, 70, 16311.3
1, 20150108172612, 74, 16312.51
1, 20150108172712, 78, 16313.79
1, 20150108172812, 74, 16315.07
1, 20150108172912, 75, 16316.35
1, 20150108173012, 74, 16317.6
1, 20150108173112, 68, 16318.77
1, 20150108173212, 56, 16319.89
];

final:
LOAD
    *,
    if(log_tun_id = Peek(log_tun_id), log_mileage - Peek(log_mileage), 0) as diff_distance
Resident source
Order By log_tun_id, log_datetimeindata;

DROP Table source;

MayilVahanan

Hi

You can use:

Result:
LOAD
    log_tun_id, log_datetimeindata, log_speed, log_mileage,
    Num(if(log_tun_id = Previous(log_tun_id), log_mileage - Previous(log_mileage), 0), '#.##') as diff_distance
Resident DataSource
Order by log_tun_id, log_datetimeindata;

Drop table DataSource;

Thanks & Regards, Mayil Vahanan R
Please close the thread by marking correct answer & give likes if you like the post.
sjhussain
Partner - Creator II
Author (Accepted Solution)

Massimo,

Thank you very much for your detailed response.

"The script is simple but the execution time could be your problem as you already have 400M rows and you need to read it (SQL from a db?, a lot of time) and then you need a resident load for order by (other time)"


It took me about 1 hour and 40 minutes to load the 400M rows as-is, and the generated QVD is about 14 GB. I did not apply any ETL while extracting the data. I need to check with the customer on the reload frequency; it may be every 30 minutes, in which case I would need incremental loading, along the lines of the sketch below.
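
Something like this minimal sketch is what I have in mind for the incremental cycle (the SQL table name, QVD path, and the vLastReloadTime threshold variable are placeholders, not from the actual system):

// Pull only rows newer than the last reload from the database.
Incremental:
SQL SELECT log_tun_id, log_datetimeindata, log_speed, log_mileage
FROM source_table
WHERE log_datetimeindata > '$(vLastReloadTime)';

// Append the history already stored on disk.
Concatenate (Incremental)
LOAD * FROM history.qvd (qvd);

// Store the combined table back for the next cycle.
STORE Incremental INTO history.qvd (qvd);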


If we do a direct load, would that help? Can we perform ETL on direct-loaded data, or do we need to use set analysis?


I will try your solution and see how it works.


final:
LOAD
    *,
    if(log_tun_id = Peek(log_tun_id), log_mileage - Peek(log_mileage), 0) as diff_distance
Resident source
Order By log_tun_id, log_datetimeindata;

Thanks again.

sjhussain
Partner - Creator II
Author

Mayil,

Thanks for the response. I will try the solution you mentioned and let you know.

BTW, is there a difference between Peek() and Previous()? Massimo suggested using Peek().

Thanks again.
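
For reference, the two are close but not the same: Previous() reads the previous record of the input, before any Where clause is applied, while Peek() reads the last record already written to the output table (and can also address other tables by name). A minimal sketch with made-up fields:

// The row (2, 20) is dropped by the Where clause, so on the last row
// Previous(val) returns 20 (input side) while Peek('val') returns 10
// (output side).
T:
LOAD
    *,
    Previous(val) as prev_input,
    Peek('val')   as prev_output
Inline [
id, val
1, 10
2, 20
3, 30
]
Where val <> 20;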

maxgro
MVP
MVP

I'm not familiar with direct load.

Do you mean incremental load? Usually yes, if you can do it.

Or an optimized load (QVD)? Yes; a rough sketch is below.
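
An optimized QVD load stays optimized only as long as nothing is transformed during the load; roughly like this (the file name is a placeholder):

// Optimized QVD read: a plain field list (or *) with no expressions.
// Any calculated field, or any clause other than a single
// Where Exists(), drops the load back to standard (slower) mode.
Logs:
LOAD log_tun_id, log_datetimeindata, log_speed, log_mileage
FROM logs.qvd (qvd);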