Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I have three tables that capture information on annuity contracts my company sells:
Customer (include Customer ID field, and various other customer-related fields)
Sales (has Customer ID, Subaccount ID, and various deposit transaction amounts, dates, etc.)
Account Values (has Customer ID, Subaccount ID, and various daily/monthly account value amounts)
The Sales table and the Account Values tables each have millions of records in them, and the Customer table has around 1 million records.
In my app, I'd like users to be able to select a specific subaccount (such as Vanguard S&P 500 Fund), and then be able to see total deposits into that subaccount, account values at various points in time, and finally, which customers have any money/activity in that subaccount.
If I don't do any aliasing, loading these tables as is creates a synthetic key between the Sales and Account Values tables, on Customer ID and Subaccount ID. That's what I've chosen to allow, based on research I've done where people have mentioned that synthetic keys aren't necessarily "bad", as long as you expect them to happen. However, I want to confirm that, and make sure I'm not creating an inefficient data model.
I know I can create my own key - such as: autonumberhash128("Customer ID"&'-'&"Subaccount ID") as Customer_Subaccount_Key.
However, I still need the Sales and Account Values tables to join to the Customer table, so it seems that I would still have synthetic keys somewhere.
Am I OK using a synthetic key here, or is there a better solution?
Synthetic keys are sometimes ok, sometimes bad & sometimes disastrous.
I once heard an interesting discussion about synthetic keys by 2 Qlik Masters. One arguing they are a sign of sloppy data modelling and the other arguing they are sign of sloppy scripting. They both agreed they are the sign of something sloppy though.
I would suggest getting rid of synthetic keys where possible.
[The only truly ok synthetic key in my opinion is the one created by the IntervalMatch() function.]
The "Sales" and "Account Values" table have 2 variables in common and Qlik automatically associates those 2 by forming a synthetic key. However, you can force those 2 tables to be associated ONLY on Customer ID.
Let's assume you have already loaded the "Customer" and "Sales" table. Now when you load the "Account Values" table,
use the Qualify and Unqualify statements as shown below:
-------------------------------------------------------------------------------------------
QUALIFY subaccount_id;
account_values:
LOAD
customer_id,
subaccount_id
FROM ...account_values... ;
UNQUALIFY *;
-------------------------------------------------------------------------------------------
What this does is, it forces the tables to be joined on customer_id by adding the table name as prefix to the subaccount_id. So the subaccount_id from "Account Values" table will be renamed as account_values.subaccount_id.
Hope this helps.
Ashwin - this is a good point. In fact, in our first app, I actually did this very thing: I aliased the Sales table Subaccount field as "Subaccount (Sales)", and the Account Value Subaccount field as "Subaccount (Acct Val)".
However, this forces the user to have to choose from among two different Subaccount fields, depending on whether they want to view Sales totals or Account Value totals. This was confusing to users - especially when they would filter on the sales-based Subaccount field and try to look at a sheet with account values on it.
In subsequent apps, I decided to just allow the synthetic key to happen, so users would only need to worry about one Subaccount field.
By the way, this generally seems to have worked fine: no issues with app loads or app performance. So based on my personal experience, I see nothing wrong with the synthetic key approach. I'm just wanting more feedback from other developers to see if there's any "gotchas" that may arise, or better solutions.
I don't think there is anything wrong with this synthetic key, as it appears to be legitimate from a data model perspective. As long as you are getting decent performance, I would leave it as is.
If you wanted to get rid of the synkey, a classic approach would be to concatenate Sales and Account into a single table and then only link to the Customer table. However, then you may have to deal with propagating dimension values, extra work. So if the Syn Key is working for you, I would move on to the next project.
-Rob