Jagsfan82
Partner - Contributor III

Best performance for Joining Fields and adding Flags

Hello all.  The application I am working on will have around 125 million rows and around 50 fields.  I'm curious about the best way to combine incremental loading (monthly transaction logs) with the calculation of additional fields.

To explain it, imagine a customer transaction database with TransactionID, CustomerID, and fields such as TransactionType, TransactionDate, SubscriptionStart, and SubscriptionEnd. Essentially I want to group by CustomerID and add additional fields to the dataset such as CurrentSubEnd (the customer's CURRENT subscription end date), OriginalSubEnd (the customer's ORIGINAL subscription end date), CancelDate (the date when a Cancellation transaction type was processed), and OriginalSubDate (the date when a New or Renew Subscription transaction type was processed).  I also want to add a flag (CancelFlag) that marks CustomerIDs that have cancelled their subscription.
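Something like the following grouped resident load could produce those fields in one pass. This is only a minimal sketch: the table name, field names, and the TransactionType values ('Cancel', 'New', 'Renew') are assumptions based on the description above, not the actual model.

```
// Sketch: per-customer summary in a single GROUP BY pass.
// Assumes a resident table Transactions with the fields described above.
CustomerSummary:
LOAD
    CustomerID,
    // SubscriptionEnd on the latest transaction (sort weight descending)
    FirstSortedValue(SubscriptionEnd, -TransactionDate)                        as CurrentSubEnd,
    // SubscriptionEnd on the earliest transaction (sort weight ascending)
    FirstSortedValue(SubscriptionEnd, TransactionDate)                         as OriginalSubEnd,
    // Latest date on which a cancellation was processed (If() without an
    // else-branch yields null, which Max() ignores)
    Date(Max(If(TransactionType = 'Cancel', TransactionDate)))                 as CancelDate,
    // Earliest New/Renew transaction date
    Date(Min(If(Match(TransactionType, 'New', 'Renew'), TransactionDate)))     as OriginalSubDate,
    // 1 if the customer has at least one cancellation transaction
    If(Count(If(TransactionType = 'Cancel', TransactionID)) > 0, 1, 0)         as CancelFlag
RESIDENT Transactions
GROUP BY CustomerID;
```

A grouped load like this avoids building a separate temporary table per derived field, though at 125 million rows the GROUP BY itself is still the expensive step.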

So we have a slowly changing dimension situation: I need to group by CustomerID and then use either the old value or the new value of each extra field depending on conditions, and I would probably have to use some combination of FirstSortedValue and FirstValue while also dealing with null values.  I could create a temporary table that maps something like CancelDateOld and CancelDateNew and then write something like If( XXXX, CancelDateOld, CancelDateNew ) for each field.
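One alternative to a temporary table plus a per-field If() is ApplyMap()'s third parameter, which supplies a per-row fallback when no mapped value exists. A sketch, assuming the previous run's summary is stored in a QVD named CustomerSummary.qvd and the new aggregates sit in a resident table NewAggregates (both names are placeholders):

```
// Hypothetical: previously stored CancelDate per customer
OldCancelMap:
MAPPING LOAD
    CustomerID,
    CancelDate
FROM CustomerSummary.qvd (qvd);

// Keep the old CancelDate where one exists; otherwise fall back to the
// freshly calculated value. Which direction "wins" (old vs. new) depends
// on the field's business rule, so this may need to be reversed per field.
MergedSummary:
LOAD
    CustomerID,
    ApplyMap('OldCancelMap', CustomerID, NewCancelDate) as CancelDate
RESIDENT NewAggregates;
```

Mapping tables are held in memory as simple lookups, so this tends to scale better than Join plus If() comparisons across tens of millions of rows.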

I've done something very similar for a smaller data set, but for hundreds of millions of rows I'm just not sure of the most efficient approach.  I'm assuming temporary tables with ~125 million transactions, even grouped down to 20-30 million CustomerIDs, aren't going to be very efficient when using functions like Match, SubstringCount, or IsNull.
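On the performance side, a common pattern at this scale is to keep the full transaction history in a QVD and re-aggregate only the customers that appear in the new monthly file, using a single-field WHERE Exists() so the QVD read stays an optimized load. A sketch, where the QVD file names are assumptions:

```
// Load the new month first so its CustomerID values are in memory
NewTx:
LOAD
    TransactionID, CustomerID, TransactionType, TransactionDate,
    SubscriptionStart, SubscriptionEnd
FROM MonthlyTransactions.qvd (qvd);

// Optimized QVD load (single-argument Exists): pull history only for
// customers touched this month
Concatenate (NewTx)
LOAD
    TransactionID, CustomerID, TransactionType, TransactionDate,
    SubscriptionStart, SubscriptionEnd
FROM AllTransactions.qvd (qvd)
WHERE Exists(CustomerID);
```

The expensive GROUP BY then runs only over the affected customers' history; summary rows for untouched customers can be carried forward from the previous run's QVD.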

Any thoughts or resources to look into would be helpful.

1 Reply
Brett_Bleess
Former Employee

Apologies for not being of direct help, but hopefully the following links will get you on the right track:

Help:

http://help.qlik.com/en-US/qlikview/April2019/Subsystems/Client/Content/QV_QlikView/LoadData/best-pr...

Design Blog on slowly changing dimension:

https://community.qlik.com/t5/Qlik-Design-Blog/Slowly-Changing-Dimensions/ba-p/1464187

Hopefully that helps; if not, shout back with further details, and hopefully someone else will be able to add something more for you.

Regards,
Brett

To help users find verified answers, please do not forget to use the "Accept as Solution" button on any post(s) that helped you resolve your problem or question.
I now work a compressed schedule, Tuesday, Wednesday and Thursday, so those will be the days I will reply to any follow-up posts.