Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello,
We are working on a project where we are required to find the difference of previous values of a column group by TUN_ID and sort by Date
1. We have about 400 Million rows and increasing
2. We would like to just load the Table from SQL as it is - and then apply the ETL in Qlikview (Group and difference)
2. First we need to group by log_tun_id and then sort by log_datetimeindata
3. we would like to add another field by the name of diff_distance which would contain the difference between the mileage from previous row.
For example the table with the group is as below
log_tun_id | log_datetimeindata | log_speed | log_mileage |
16027 | 20150108172512 | 70 | 16311.3 |
16027 | 20150108172612 | 74 | 16312.51 |
16027 | 20150108172712 | 78 | 16313.79 |
16027 | 20150108172812 | 74 | 16315.07 |
16027 | 20150108172912 | 75 | 16316.35 |
16027 | 20150108173012 | 74 | 16317.6 |
16027 | 20150108173112 | 68 | 16318.77 |
16027 | 20150108173212 | 56 | 16319.89 |
The resultant table is as follows with the difference of mileage from the previous row. First row of each TUN_ID will be zero.
log_tun_id | log_datetimeindata | log_speed | log_mileage | diff_distance |
16027 | 20150108172512 | 70 | 16311.3 | 0 |
16027 | 20150108172612 | 74 | 16312.51 | 1.21 |
16027 | 20150108172712 | 78 | 16313.79 | 1.28 |
16027 | 20150108172812 | 74 | 16315.07 | 1.28 |
16027 | 20150108172912 | 75 | 16316.35 | 1.28 |
16027 | 20150108173012 | 74 | 16317.6 | 1.25 |
16027 | 20150108173112 | 68 | 16318.77 | 1.17 |
16027 | 20150108173212 | 56 | 16319.89 | 1.12 |
Will appreciate if someone can assist us with this.
Thanks.
Massimo,
Thank you very much for your detailed response.
"The script is simple but the execution time could be your problem as you already have 400M rows and you need to read it (SQL from a db?, a lot of time) and then you need a resident load for order by (other time)"
It took me about 1hours and 40 min to load the 400M rows as it is and the generated QVD is about 14GB. I did not apply any ETL while getting the data. I need to check with the customer the frequency of reload. Maybe every 30 mins for which I would need to do incremental loading.
If we do direct load then would it help? can we perform etl on direct load data? or we need to use set analysis?
I will try your solution and see how it works.
if(log_tun_id=peek(log_tun_id), log_mileage-Peek(log_mileage), 0) as diff_distance
Resident source
order by log_tun_id, log_datetimeindata;
Thanks again.
Hi,
Try this:
Data:
LOAD
*,
if(RowNo()=1,0,log_mileage-Peek(log_mileage)) as "diff_distance"
Inline [
log_tun_id, log_datetimeindata, log_speed, log_mileage
16027, 20150108172512, 70, 16311.3
16027, 20150108172612, 74, 16312.51
16027, 20150108172712, 78, 16313.79
16027, 20150108172812, 74, 16315.07
16027, 20150108172912, 75, 16316.35
16027, 20150108173012, 74, 16317.6
16027, 20150108173112, 68, 16318.77
16027, 20150108173212, 56, 16319.89
];
Hi
Try like this
Load *, Num(if(log_tun_id = Previous(log_tun_id), log_mileage- Previous(log_mileage), 0),'#.##') as diff_distance
from datasource;
Hope you need to get the "diff_distance" field alone.
Federico and Mayil,
Thank you for your prompt response. Following is the original Table. First i need to group the log_tun_id and then i need to apply the previous distance formula.
The example I gave above was for 16027. I need to apply it for all the log_tun_id. Hope it is much clearer now?
log_tun_id | log_datetimeindata | log_speed | log_mileage |
16027 | 20150108172512 | 70 | 16311.3 |
16340 | 20150108172512 | 0 | 6251.9 |
10817 | 20150108170218 | 33 | 142542.37 |
5211 | 20150108172510 | 0 | 209899.84 |
22845 | 20150108165355 | 0 | 8629.42 |
14381 | 20150108171837 | 0 | 3428.19 |
5430 | 20150108172510 | 29 | 312919.84 |
1176 | 20150108172508 | 0 | 222902.56 |
1. We have about 400 Million rows and increasing
2. We would like to just load the Table from SQL as it is - and then apply the ETL in Qlikview (Group and difference)
2. First we need to group by log_tun_id and then sort by log_datetimeindata
3. we would like to add another field by the name of diff_distance which would contain the difference between the mileage from previous row.
Regarding 2 if you group by log_tun_id the result is 1 row for each log_tun_id, it seems different from your result table
To order by log_datetimeindata you need a resident loada (see below)
The script is simple but the execution time could be your problem as you already have 400M rows and you need to read it (SQL from a db?, a lot of time) and then you need a resident load for order by (other time)
source:
LOAD
*
Inline [
log_tun_id, log_datetimeindata, log_speed, log_mileage
16027, 20150108172512, 70, 16311.3
16027, 20150108172612, 74, 16312.51
16027, 20150108172712, 78, 16313.79
16027, 20150108172812, 74, 16315.07
16027, 20150108172912, 75, 16316.35
16027, 20150108173012, 74, 16317.6
16027, 20150108173112, 68, 16318.77
16027, 20150108173212, 56, 16319.89
1, 20150108172512, 70, 16311.3
1, 20150108172612, 74, 16312.51
1, 20150108172712, 78, 16313.79
1, 20150108172812, 74, 16315.07
1, 20150108172912, 75, 16316.35
1, 20150108173012, 74, 16317.6
1, 20150108173112, 68, 16318.77
1, 20150108173212, 56, 16319.89
];
final:
load
*,
if(log_tun_id=peek(log_tun_id), log_mileage-Peek(log_mileage), 0) as diff_distance
Resident source
order by log_tun_id, log_datetimeindata;
DROP Table source;
Hi
You can use
Result:
LOAD log_tun_id,log_datetimeindata,log_speed,log_mileage, Num(if(log_tun_id = Previous(log_tun_id), log_mileage- Previous(log_mileage), 0),'#.##') as diff_distance
Resident DataSource
Order by log_tun_id,log_datetimeindata;
Drop table DataSource;
Massimo,
Thank you very much for your detailed response.
"The script is simple but the execution time could be your problem as you already have 400M rows and you need to read it (SQL from a db?, a lot of time) and then you need a resident load for order by (other time)"
It took me about 1hours and 40 min to load the 400M rows as it is and the generated QVD is about 14GB. I did not apply any ETL while getting the data. I need to check with the customer the frequency of reload. Maybe every 30 mins for which I would need to do incremental loading.
If we do direct load then would it help? can we perform etl on direct load data? or we need to use set analysis?
I will try your solution and see how it works.
if(log_tun_id=peek(log_tun_id), log_mileage-Peek(log_mileage), 0) as diff_distance
Resident source
order by log_tun_id, log_datetimeindata;
Thanks again.
Mayil,
Thanks for the response. I will try the solution which you mentioned and will let you know.
BTW is there a difference between Peek() and Previous() ? - as Massimo suggested to use Peek()
Thanks again.
I don't know direct load.
Do you mean incremental load? Usually yes, if you can do it.
Or optimized load (qvd)? Yes.