Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi all,
I wanted to get opinions and best practices around combining tables. Specifically, I have 8 tables currently in my data model that all share the same key field. Each of these tables represents facts about the key field so I was thinking about combining them all into a single table. Doing this would mean instead of say 20M rows in each of the 8 tables, I would have 20M rows in a single table, but that table would have around 150 fields. The key field is only listed once per table in each of the 8 tables so I am not worried about inflating rows.
Is there a point at which it makes more sense to keep these as separate tables?
Let me know what you think.
Juraj,
Thanks for the reply. I dont have any dates in these tables. Each table represents an algorithm that creates risk factors for that are then used to come up with a risk score for each key.
While I am creating a score, each risk factor is still needed in the final application and could be layered together to look at different groups of people.
I am thinking that I want to combine all these tables because my data model is already getting large with many tables and I am trying to increase the performance of the application as a whole.
It seems this might have been the answer I was looking for:
https://community.qlik.com/t5/Qlik-Design-Blog/To-Join-or-not-to-Join/ba-p/1463102
I guess I will need to evaluate if I need to combine them.
Thanks for all that responded.
So the memory engine stores a table by creating a unique list of values for a field and an index with pointers to those values. So the more unique values you have the more memory footprint your taking up, the more tables with more unique values in your model the more memory you're taking up.
I would consider a couple of things. One are all the fields actually necessary? If not, drop those that aren't from your applications. Is the unique key actually necessary, because it's unique it will inflate the size of the index and pointers so if you don't actually need it then combine the tables and drop that unique key after you combine them. While it may be a pretty wide table you may see a reduction in memory footprint because of the deduping. Basically anywhere you can alleviate unique values do so and you'll generally see your overall memory footprint reduce. If you have timestamps and don't need timestamps definitely drop those. If your date fields have timestamps in them, change it to only the date. If you still need timestamps break them out into an hour, minute, second field so that they have very low cardinality. Addresses, chop them up into their elements so that they have lower cardinality. Do the same with phone numbers.
The caveat here is that you might need seperate tables for certain measure calculations to work properly without tons of set analysis. You have to weight the tradeoff in set analysis performance vs. memory consumption.
Thanks for the reply.
Most of what you spelled out I am already aware of and while I get where you are coming from to increase performance, those scenarios don't really apply here.
In its simplest form, I have 8 tables that each have a different number of fields. Most of the values in those fields are either Y/N or 1/0. Others are amounts or numbers. The key is required because it joins out to all my other tables that are needed in the application as it doesn't just join these 8 tables together.
My original thinking here was that if I combined these 8 tables it would make the data model simpler, and increase performance of the application as it would potentially reduce the number of hops between tables for some calculations.
My concern is that it won't be easy to put together all 8 tables especially those that have 10+ fields when I have rows into the many millions for data. It is possible that I might be able to get rid of some of these fields but the majority of them are used in the final application so they would be required in the final table.
Considering what you just mentioned about your data, I don't think combining them would add any significant performance impact vs. not having them combined. So if it simplifies things from a maintenance perspective you are likely better off just combining them. I would try it and verify there is no significant sign of performance degradation and move on.