Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello Experts,
I will need some help for the ETL process of some dataset. I can't share the actual data , but I have the sample dataset like blow:
Input data : abc.csv
ID | process | timestamp | level | businessdate | |
1001 | ID | process | timestamp | level | businessdate |
1001 | 1001 | AAA | 31-10-2023 20:10 | 3 | 31-10-2023 |
1001 | 1001 | BBB | 31-10-2023 20:25 | 4 | 31-10-2023 |
1001 | 1001 | CCC | 31-10-2023 20:30 | 6 | 31-10-2023 |
1001 | 1001 | DDD | 31-10-2023 20:14 | 5 | 31-10-2023 |
1001 | 1001 | EEE | 31-10-2023 20:22 | 5 | 31-10-2023 |
1001 | 1005 | AAA | 30-10-2023 19:35 | 5 | 30-10-2023 |
1001 | 1005 | BBB | 01-11-2023 20:15 | 4 | 01-11-2023 |
1001 | 1005 | CCC | 01-11-2023 21:20 | 6 | 01-11-2023 |
1001 | DDD | 30-10-2023 20:30 | 4 | 30-10-2023 | |
1001 | EEE | 31-10-2023 20:22 | 5 | 31-10-2023 | |
1001 | EEE | 29-10-2023 21:16 | 3 | 29-10-2023 | |
1001 | EEE | 30-10-2023 20:30 | 4 | 30-10-2023 | |
1001 | EEE | 28-10-2023 22:10 | 2 | 28-10-2023 | |
1005 | AAA | 30-10-2023 19:35 | 5 | 30-10-2023 | |
1005 | BBB | 31-10-2023 20:15 | 3 | 31-10-2023 | |
1005 | BBB | 01-11-2023 20:15 | 4 | 01-11-2023 | |
1005 | BBB | 30-10-2023 20:10 | 2 | 30-10-2023 | |
1005 | CCC | 01-11-2023 21:20 | 6 | 01-11-2023 | |
1005 | CCC | 31-10-2023 20:13 | 5 | 31-10-2023 |
first step is to sort the data based on 4 columns,
ID: Asc
Process: Asc
Timestamp : newest to oldest
level : Asc
ID | process | timestamp | level | businessdate |
1001 | AAA | 31-10-2023 20:10 | 3 | 31-10-2023 |
1001 | AAA | 30-10-2023 20:30 | 4 | 30-10-2023 |
1001 | AAA | 29-10-2023 23:40 | 3 | 29-10-2023 |
1001 | BBB | 31-10-2023 20:25 | 4 | 31-10-2023 |
1001 | BBB | 30-10-2023 20:15 | 5 | 30-10-2023 |
1001 | CCC | 31-10-2023 20:30 | 6 | 31-10-2023 |
1001 | CCC | 30-10-2023 20:22 | 5 | 30-10-2023 |
1001 | CCC | 29-10-2023 23:42 | 4 | 29-10-2023 |
1001 | DDD | 31-10-2023 20:14 | 5 | 31-10-2023 |
1001 | DDD | 30-10-2023 20:30 | 4 | 30-10-2023 |
1001 | EEE | 31-10-2023 20:22 | 5 | 31-10-2023 |
1001 | EEE | 30-10-2023 20:30 | 4 | 30-10-2023 |
1001 | EEE | 29-10-2023 21:16 | 3 | 29-10-2023 |
1001 | EEE | 28-10-2023 22:10 | 2 | 28-10-2023 |
1005 | AAA | 30-10-2023 19:35 | 5 | 30-10-2023 |
1005 | BBB | 01-11-2023 20:15 | 4 | 01-11-2023 |
1005 | BBB | 31-10-2023 20:15 | 3 | 31-10-2023 |
1005 | BBB | 30-10-2023 20:10 | 2 | 30-10-2023 |
1005 | CCC | 01-11-2023 21:20 | 6 | 01-11-2023 |
1005 | CCC | 31-10-2023 20:13 | 5 | 31-10-2023 |
The next step is to get the final output, by getting only the latest timestamp for the records where ID and process are the same.
ID | process | timestamp | level | businessdate |
1001 | AAA | 31-10-2023 20:10 | 3 | 31-10-2023 |
1001 | BBB | 31-10-2023 20:25 | 4 | 31-10-2023 |
1001 | CCC | 31-10-2023 20:30 | 6 | 31-10-2023 |
1001 | DDD | 31-10-2023 20:14 | 5 | 31-10-2023 |
1001 | EEE | 31-10-2023 20:22 | 5 | 31-10-2023 |
1005 | AAA | 30-10-2023 19:35 | 5 | 30-10-2023 |
1005 | BBB | 01-11-2023 20:15 | 4 | 01-11-2023 |
1005 | CCC | 01-11-2023 21:20 | 6 | 01-11-2023 |
Output:
LOAD
ID,
process,
timestamp,
level,
business_date
FROM [abc.csv]
(txt, codepage is 28591, embedded labels, delimiter is ',', msq);
NoConcatenate
Final_Data:
Load * ,
rowno() as linenum
Resident Output
Order By ID asc, process asc,timestamp desc, level asc
;
Drop Table Output;
NoConcatenate
Final_table:
load Min(linenum),
ID,
process,
level,
timestamp,
resident Final_Data
group by ID,process;
drop table Final_Data;
The final step is not working and showing invalid expression error during load. the first of sorting is working. it might be a very novice mistake, but thanks in advance in case you can provide me a solution.
sorry, I didn't mention I have more columns in the actual table and I can't use all of those in the group by clause
Assuming you r raw data has been loaded into a table named "Data", I think all you need is:
Inner Join (Data)
LOAD ID, process,
max(timestamp) as timestamp
Resident Data
Group by ID, process;
-Rob
http://www.easyqlik.com
http://masterssummit.com
http://qlikviewcookbook.com
sorry, I didn't mention I have more columns in the actual table and I can't use all of those in the group by clause
"can't use all of those in the group by clause"
Why not?
-Rob