Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hey all,
i am facing the following challange. Given is the following sample data:
contract_id | contract_year | payment_year | payment |
ABC | 2019 | 2019 | $ 1000 |
ABC | 2019 | 2020 | $ 250 |
ABC | 2019 | 2021 | $ 2000 |
ABC | 2019 | 2023 | $ 250 |
ABC | 2019 | 2024 | $ 2500 |
XYZ | 2021 | 2021 | $ 100 |
XYZ | 2021 | 2022 | $ 250 |
XYZ | 2021 | 2024 | $ 1200 |
These are contracts and in almost every year there is a payment regarding a given contract (>> payment year).
I want to create a pivot table that looks like this (the columns after Year ist the field payment_year):
contract_id | contract_year | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 | 2025 |
ABC | 2019 | $ 1000 | $ 250 | $ 2000 | $ 2000 | $ 250 | $ 2500 | $ 2500 |
XYZ | 2021 | $ 0 | $ 0 | $ 100 | $ 250 | $ 250 | $ 1200 | $ 1200 |
The challange is as follows:
Is there any way to do this with minimal scripting effort ? i have built a solution by creating fake records for every missing payment year per contract but this a very imperformant (> 90M contracts) and resulted in a lot of script which is hard to maintain. Is there a lean way of achieving what i am after ?
Thanks in advance!
Pretty sure what I'd do is:
1) Get the max year for each contract (append it to the original table as max_year)
left join(Original)
Load contract_id, max(payment_year) as max_year
Resident Original
Group by contract_Id;
2) Use a While loop to make one record per year:
NewTable:
Load contract_Id, contract_year, contract_year + iterno() -1 as payment_year, 0 as payment
Resident Original
While contract_year + iterno() - 1 <= max_year; // Double check this condition, I can't test it at the moment
3) Group by the entire dimension list and sum the payments
Final:
Noconcatenate Load contract_id, contract_year, payment_year, sum(payment) as payment
Resident Original
Group by contract_id, contract_year, payment_year;
4) Drop the tables we don't need anymore:
Drop table Original;
Drop Table NewTable;
It shouldn't be too complicated to fill in the gaps, and it shouldn't do too much to performance given how small this table is even if the number of rows is massive. What approach have you used to do this?
Thanks @Or for your reply. The actual data set has appr. 25 columns and > 90M contracts and a multiple of this in records. I tried, without knowing best practice for this use case, to
a) identify those contracts that have gaps and store them in a temp table
b) in a loop generate dummy records for every missing contract and payment_year based on the temp table created in a) and bringing it back to the original table
I am simply wondering if there is an easier / cleander way to do this...or how other users would aproach this exercise.
I'm not great a Qlik scripting. I would do this sort of thing in a database sql call. Are you using a database to store this data?
I am including below how I would script this in a SQL statement. The particular syntax is specific to the database that I am using now, but many database engines support this kind of query today. Perhaps you could translate this in a manner that your database engine could use? If you have multi-year gaps in your data, you could expand each case statement to cover multiple years of missing data.
Maintenance is easy. Just add a new case statement each year to cover the new year.
The example attached creates a temp table and loads it with the sample data that you show above. The big honking select statement pulls that temp data and formats as per your requested output.
Syntax in the attached file.
Don't let those SQL skills atrophy!
Pretty sure what I'd do is:
1) Get the max year for each contract (append it to the original table as max_year)
left join(Original)
Load contract_id, max(payment_year) as max_year
Resident Original
Group by contract_Id;
2) Use a While loop to make one record per year:
NewTable:
Load contract_Id, contract_year, contract_year + iterno() -1 as payment_year, 0 as payment
Resident Original
While contract_year + iterno() - 1 <= max_year; // Double check this condition, I can't test it at the moment
3) Group by the entire dimension list and sum the payments
Final:
Noconcatenate Load contract_id, contract_year, payment_year, sum(payment) as payment
Resident Original
Group by contract_id, contract_year, payment_year;
4) Drop the tables we don't need anymore:
Drop table Original;
Drop Table NewTable;