Hi,
I am trying to figure out a solution to one of the data transformation problems we are facing.
We have 30-minute scan periods throughout the day at which the data is collected. However, the time values in our source tables have duplicates that we need to fix in Qlik.
We initially assumed there were exactly 2 records per time entry, in which case we could increment the second scan time by 30 minutes to resolve the duplicates; I could use the PEEK() function to increment the TS value.
However, we have noticed that some sample times do not increment by an hour and instead repeat the same value for 4 scans (or more), e.g. at time 1203 in the sample below. This is random and can occur at any time for a groupcode.
I thought I could use the autonumber() function to create a counter and calculate the ts_expected value from it, but with close to 2 million records it takes about an hour to transform a single day's data, which we cannot afford.
Can you think of a better way to do this?
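For context, the autonumber() counter I mentioned looks roughly like this (just a sketch, not my exact script; DupSeq is a placeholder field name and the actual source load is omitted):

Tmp:
Load
    groupcode,
    est_date,
    time,
    ts_raw,
    Traffic,
    // AutoNumber() keeps a separate number series per groupcode/date/time combination,
    // so DupSeq numbers the duplicate rows 1, 2, 3, ... within each scan time.
    // With close to 2 million rows this transform takes about an hour.
    AutoNumber(RecNo(), groupcode & '|' & est_date & '|' & time) as DupSeq
From <>;   // actual source omitted

Sample data for one groupcode and one day: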
groupcode | est_date (YYYYMMDD) | time (HHMM) | ts_raw (30-min slot) | ts_expected (30-min slot) | Traffic |
T-CVIL-GGPE-SA | 20181022 | 0002 | 1 | 1 | 2.5 |
T-CVIL-GGPE-SA | 20181022 | 0002 | 1 | 2 | 2.37 |
T-CVIL-GGPE-SA | 20181022 | 0102 | 3 | 3 | 0.78 |
T-CVIL-GGPE-SA | 20181022 | 0102 | 3 | 4 | 0.48 |
T-CVIL-GGPE-SA | 20181022 | 0202 | 5 | 5 | 0.53 |
T-CVIL-GGPE-SA | 20181022 | 0202 | 5 | 6 | 0.51 |
T-CVIL-GGPE-SA | 20181022 | 0300 | 7 | 7 | 0.43 |
T-CVIL-GGPE-SA | 20181022 | 0300 | 7 | 8 | 0.47 |
T-CVIL-GGPE-SA | 20181022 | 0401 | 9 | 9 | 0.47 |
T-CVIL-GGPE-SA | 20181022 | 0401 | 9 | 10 | 0.33 |
T-CVIL-GGPE-SA | 20181022 | 0500 | 11 | 11 | 0.57 |
T-CVIL-GGPE-SA | 20181022 | 0500 | 11 | 12 | 0.46 |
T-CVIL-GGPE-SA | 20181022 | 0600 | 13 | 13 | 0.44 |
T-CVIL-GGPE-SA | 20181022 | 0600 | 13 | 14 | 0.64 |
T-CVIL-GGPE-SA | 20181022 | 0702 | 15 | 15 | 0.51 |
T-CVIL-GGPE-SA | 20181022 | 0702 | 15 | 16 | 0.91 |
T-CVIL-GGPE-SA | 20181022 | 0803 | 17 | 17 | 0.72 |
T-CVIL-GGPE-SA | 20181022 | 0803 | 17 | 18 | 0.47 |
T-CVIL-GGPE-SA | 20181022 | 0901 | 19 | 19 | 0.51 |
T-CVIL-GGPE-SA | 20181022 | 0901 | 19 | 20 | 0.42 |
T-CVIL-GGPE-SA | 20181022 | 1001 | 21 | 21 | 0.5 |
T-CVIL-GGPE-SA | 20181022 | 1001 | 21 | 22 | 0.87 |
T-CVIL-GGPE-SA | 20181022 | 1203 | 25 | 23 | 0.65 |
T-CVIL-GGPE-SA | 20181022 | 1203 | 25 | 24 | 1.66 |
T-CVIL-GGPE-SA | 20181022 | 1203 | 25 | 25 | 1.77 |
T-CVIL-GGPE-SA | 20181022 | 1203 | 25 | 26 | 0.6 |
T-CVIL-GGPE-SA | 20181022 | 1403 | 29 | 27 | 0.81 |
T-CVIL-GGPE-SA | 20181022 | 1403 | 29 | 28 | 0.98 |
T-CVIL-GGPE-SA | 20181022 | 1403 | 29 | 29 | 1.22 |
T-CVIL-GGPE-SA | 20181022 | 1403 | 29 | 30 | 1.95 |
T-CVIL-GGPE-SA | 20181022 | 1500 | 31 | 31 | 2.17 |
T-CVIL-GGPE-SA | 20181022 | 1500 | 31 | 32 | 1.48 |
T-CVIL-GGPE-SA | 20181022 | 1603 | 33 | 33 | 1.54 |
T-CVIL-GGPE-SA | 20181022 | 1603 | 33 | 34 | 1.38 |
T-CVIL-GGPE-SA | 20181022 | 1703 | 35 | 35 | 2.21 |
T-CVIL-GGPE-SA | 20181022 | 1703 | 35 | 36 | 2.28 |
T-CVIL-GGPE-SA | 20181022 | 1803 | 37 | 37 | 1.06 |
T-CVIL-GGPE-SA | 20181022 | 1803 | 37 | 38 | 1.14 |
T-CVIL-GGPE-SA | 20181022 | 1900 | 39 | 39 | 0.98 |
T-CVIL-GGPE-SA | 20181022 | 1900 | 39 | 40 | 1.53 |
T-CVIL-GGPE-SA | 20181022 | 2100 | 43 | 41 | 1.25 |
T-CVIL-GGPE-SA | 20181022 | 2100 | 43 | 42 | 1.26 |
T-CVIL-GGPE-SA | 20181022 | 2100 | 43 | 43 | 1.87 |
T-CVIL-GGPE-SA | 20181022 | 2100 | 43 | 44 | 1.46 |
T-CVIL-GGPE-SA | 20181022 | 2200 | 45 | 45 | 2.34 |
T-CVIL-GGPE-SA | 20181022 | 2200 | 45 | 46 | 2.05 |
Thanks
Hi Shashank,
Maybe something like this:
A:
Load
    groupcode & est_date & time as Key,
    groupcode,
    est_date,
    time,
    ts_raw,
    ts_expected,
    Traffic
From <>;

// Attach the number of rows that share the same groupcode/date/time
Left Join (A)
Load
    groupcode & est_date & time as Key,
    Count(Traffic) as Counter
Resident A
Group By groupcode & est_date & time;

C:
Load
    *,
    if(counter=1, time, time + (counter\60)) as NewTime
Resident A;

Drop Table A;
Hi KC,
Thanks for your valuable time. This seems like a good approach, but when I tried it, the formula
if(counter=1,time,time+ (counter\60)) as NewTime
calculates NewTime as exactly the same value as time.
Also, I forgot to clarify earlier that 'ts_raw' is a 30-minute timeslot number, so there are 48 values in a day. I am using these slot numbers because they make the downstream calculations easier. 'ts_expected' holds my expected timeslot values.
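To make the slot numbering concrete: in this data slot 1 corresponds to the first half-hour of the day (the 0002 scan), slot 3 to the 01:00 scan, and so on up to slot 48. Turning a corrected slot number back into a clean timestamp is then simple arithmetic, something like this sketch (ts_clean and the Final table name are placeholders, not part of the current script):

Final:
Load
    *,
    // Slot n starts (n - 1) * 30 minutes after midnight of est_date.
    Timestamp(Date#(est_date, 'YYYYMMDD') + (ts_expected - 1) * MakeTime(0, 30),
              'YYYY-MM-DD hh:mm') as ts_clean
Resident C;   // or whichever table ends up holding the corrected slot numbers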