Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello all. The application I am working on is going to have around 125 million rows and a bunch (like 50) fields. I'm curious what the best way to combine incremental loading (Monthly transaction logs) and calculation of additional fields.
To explain it, imagine a customer transaction database with TransactionID, CustomerID, and a few date fields such as TransactionType, TransactionDate, SubscriptionStart, SubscriptionEnd. Essentially I want to group by CustomerID and add on additional fields to the dataset such as CurrentSubEnd (What is the customers CURRENT subscription end date), OriginalSubEnd (What was the customers ORIGINAL subscription end date), CancelDate (What was date when a Cancellation Transaction Type was processed), OriginalSubDate (What was the date when a New or Renew Subscription Transaction Type was processed). I also want to add a flag (CancelFlag) that flags CustomerIDs that have cancelled their subscription.
So we have a slowly changing dimension situation where I need to group by CustomerID and then use either the old value of the extra field or the new value depending on conditions, and I would probably have to use some combination of FirstSortedValue and FirstValue while also having to deal with Null values. I could create a temporary table that maps something like CancelDateOld, CancelDateNew and then write something like If( XXXX, CancelDateOld, CancelDateNew) for each field.
I've done something very similar for a smaller data set, but for hundreds of millions of rows I'm just not sure the most efficient way to handle this. I'm assuming temporary tables with ~125 million transactions, even grouped to 20-30 million CustomerIDs, isn't going to be super efficient when using functions like Match or SubstringCount or IsNull.
Any thoughts or resources to look into would be helpful.
Apologies for not being direct help, but hopefully the following links may get you on the right track:
Help:
Design Blog on slowly changing dimension:
https://community.qlik.com/t5/Qlik-Design-Blog/Slowly-Changing-Dimensions/ba-p/1464187
Hopefully it helps, if not, shout back with further details, and hopefully someone else may be able to then add something more for you.
Regards,
Brett