Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
s2016
Contributor III
Contributor III

Data transformation issue

Hi, 

I am trying to figure out a solution to one of the data transformation problems we are facing. 

We have 30-min scan periods in a day at which the data is collected. However, the time values in the our source tables has time duplicates which we need to fix in Qlik.

We earlier suspected that there are 2 records per time entry, in which case we can increment the second scan time by 30-min to overcome the duplicate issue. I could use the PEEK() function to increment the TS value.

 However, we noticed that some sample times are not incrementing by an hour and repeat the same value for 4 scans (or more) instead of 2, e.g) at time 12:03.  This is random and can occur at any time for a groupcode.

I thought I could use the autonumber() function to create a counter to calculate the ts_expected value, but since there are close to 2 million records, it’s taking about an hour to transform the data for one day which we cannot afford.

Can you think of a better way to do this?

groupcode

est_date

time

ts_raw

ts_expected

Traffic

T-CVIL-GGPE-SA

20181022

0002

1

1

2.5

T-CVIL-GGPE-SA

20181022

0002

1

2

2.37

T-CVIL-GGPE-SA

20181022

0102

3

3

0.78

T-CVIL-GGPE-SA

20181022

0102

3

4

0.48

T-CVIL-GGPE-SA

20181022

0202

5

5

0.53

T-CVIL-GGPE-SA

20181022

0202

5

6

0.51

T-CVIL-GGPE-SA

20181022

0300

7

7

0.43

T-CVIL-GGPE-SA

20181022

0300

7

8

0.47

T-CVIL-GGPE-SA

20181022

0401

9

9

0.47

T-CVIL-GGPE-SA

20181022

0401

9

10

0.33

T-CVIL-GGPE-SA

20181022

0500

11

11

0.57

T-CVIL-GGPE-SA

20181022

0500

11

12

0.46

T-CVIL-GGPE-SA

20181022

0600

13

13

0.44

T-CVIL-GGPE-SA

20181022

0600

13

14

0.64

T-CVIL-GGPE-SA

20181022

0702

15

15

0.51

T-CVIL-GGPE-SA

20181022

0702

15

16

0.91

T-CVIL-GGPE-SA

20181022

0803

17

17

0.72

T-CVIL-GGPE-SA

20181022

0803

17

18

0.47

T-CVIL-GGPE-SA

20181022

0901

19

19

0.51

T-CVIL-GGPE-SA

20181022

0901

19

20

0.42

T-CVIL-GGPE-SA

20181022

1001

21

21

0.5

T-CVIL-GGPE-SA

20181022

1001

21

22

0.87

T-CVIL-GGPE-SA

20181022

1203

25

23

0.65

T-CVIL-GGPE-SA

20181022

1203

25

24

1.66

T-CVIL-GGPE-SA

20181022

1203

25

25

1.77

T-CVIL-GGPE-SA

20181022

1203

25

26

0.6

T-CVIL-GGPE-SA

20181022

1403

29

27

0.81

T-CVIL-GGPE-SA

20181022

1403

29

28

0.98

T-CVIL-GGPE-SA

20181022

1403

29

29

1.22

T-CVIL-GGPE-SA

20181022

1403

29

30

1.95

T-CVIL-GGPE-SA

20181022

1500

31

31

2.17

T-CVIL-GGPE-SA

20181022

1500

31

32

1.48

T-CVIL-GGPE-SA

20181022

1603

33

33

1.54

T-CVIL-GGPE-SA

20181022

1603

33

34

1.38

T-CVIL-GGPE-SA

20181022

1703

35

35

2.21

T-CVIL-GGPE-SA

20181022

1703

35

36

2.28

T-CVIL-GGPE-SA

20181022

1803

37

37

1.06

T-CVIL-GGPE-SA

20181022

1803

37

38

1.14

T-CVIL-GGPE-SA

20181022

1900

39

39

0.98

T-CVIL-GGPE-SA

20181022

1900

39

40

1.53

T-CVIL-GGPE-SA

20181022

2100

43

41

1.25

T-CVIL-GGPE-SA

20181022

2100

43

42

1.26

T-CVIL-GGPE-SA

20181022

2100

43

43

1.87

T-CVIL-GGPE-SA

20181022

2100

43

44

1.46

T-CVIL-GGPE-SA

20181022

2200

45

45

2.34

T-CVIL-GGPE-SA

20181022

2200

45

46

2.05

 

 Thanks

2 Replies
jyothish8807
Master II
Master II

Hi Shashank,

May be something like this:

A:

Load

groupcode & est_date & time as Key,

groupcode,

est_date,

time,

ts_raw,

ts_expected,

Traffic

from <>;

left join

Load

groupcode & est_date &  time as Key,

count(Traffic) as Counter

resident A

group by 

groupcode & est_date & time;

C:

Load

*,

if(counter=1,time,time+ (counter\60)) as NewTime

resident A;

Drop table A;

 

Best Regards,
KC
s2016
Contributor III
Contributor III
Author

Hi KC,

Thanks for your valuable time. This seems like a good approach, but I tried this and the formula

if(counter=1,time,time+ (counter\60)) as NewTime

and it is calculating the NewTime value as Time exactly . 

 

Also, I forgot to clarify earlier that 'ts_raw' is 30-min timeslot and there would be total 48 values. I am using these because it makes the further calculations easier. ts_Expected are my expected timeslot values.