cbushey1
Creator III

1 table or many tables

Hi all,

I wanted to get opinions and best practices around combining tables. Specifically, I have 8 tables currently in my data model that all share the same key field. Each of these tables represents facts about the key field, so I was thinking about combining them all into a single table. Doing this would mean that instead of roughly 20M rows in each of the 8 tables, I would have 20M rows in a single table, but that table would have around 150 fields. The key field appears only once in each of the 8 tables, so I am not worried about inflating the row count.
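For illustration, here is a minimal sketch of what combining on the shared key could look like in the load script. The table, field, and QVD names are made up, and the same LEFT JOIN block would repeat for the remaining tables:

RiskFacts:
LOAD KeyField, FactorA1, FactorA2          // first algorithm's fields
FROM [lib://Data/AlgorithmA.qvd] (qvd);

LEFT JOIN (RiskFacts)
LOAD KeyField, FactorB1, FactorB2          // joins 1:1 because KeyField is unique in each table
FROM [lib://Data/AlgorithmB.qvd] (qvd);

// ...and so on for the other six tables.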

Is there a point at which it makes more sense to keep these as separate tables?

Let me know what you think. 

 

14 Replies
juraj_misina
Luminary Alumni

Hi,
I remember seeing a recommendation to keep your tables rather narrow, so at first thought I'd go with separate tables in your case. But the answer is, of course, a bit more complex and depends on what else you want to do with that data. Are there any other common dimensions? Maybe a date field in a few of those tables for which you might like to create a common filter (so that end users are not overwhelmed with Order Year, Order Month, Order Date, Invoice Year, Invoice Month...). If yes, you should consider combining the tables, possibly (but not necessarily) into one.
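To illustrate the common-filter idea, a rough sketch only; the table, field, and file names here are made up. When tables are combined, the different date fields can be aliased to one shared field so users get a single filter:

CommonFacts:
LOAD KeyField, OrderDate as [Common Date], OrderAmount
FROM [lib://Data/Orders.qvd] (qvd);

CONCATENATE (CommonFacts)
LOAD KeyField, InvoiceDate as [Common Date], InvoiceAmount
FROM [lib://Data/Invoices.qvd] (qvd);

Concatenating like this keeps one shared date field; joining instead would keep the dates as separate columns, so the right choice depends on how the dates are used.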
Hope this helps.
Juraj
dplr-rn
Partner - Master III

Theoretically, a single table would be best for performance, but it would take more disk space. As Juraj said, it depends on the scenario.
But if there are no other dimension tables that increase the hops between tables when filtering, I don't believe it will make much of a difference.
cbushey1
Creator III
Author

Juraj,

Thanks for the reply. I don't have any dates in these tables. Each table represents an algorithm that creates risk factors, which are then used to come up with a risk score for each key.

While I am creating a score, each risk factor is still needed in the final application and could be layered together to look at different groups of people.

I am thinking that I want to combine all these tables because my data model is already getting large with many tables and I am trying to increase the performance of the application as a whole. 

juraj_misina
Luminary Alumni

If the complexity of the data model is becoming an issue, then do yourself a favour and combine the tables 😉
cbushey1
Creator III
Author

It seems this might have been the answer I was looking for:

https://community.qlik.com/t5/Qlik-Design-Blog/To-Join-or-not-to-Join/ba-p/1463102

I guess I will need to evaluate if I need to combine them. 

Thanks to all who responded.

swallace104
Contributor III

So the memory engine stores a table by creating a unique list of values for each field and an index of pointers to those values. So the more unique values you have, the more memory footprint you're taking up; and the more tables with high-cardinality fields in your model, the more memory you're taking up.

I would consider a couple of things. One: are all the fields actually necessary? If not, drop those that aren't from your applications. Is the unique key actually necessary? Because it's unique, it will inflate the size of the index and pointers, so if you don't actually need it then combine the tables and drop that unique key after you combine them. While it may be a pretty wide table, you may see a reduction in memory footprint because of the deduping. Basically, anywhere you can alleviate unique values, do so and you'll generally see your overall memory footprint reduce. If you have timestamps and don't need timestamps, definitely drop those. If your date fields have timestamps in them, change them to only the date. If you still need timestamps, break them out into hour, minute, and second fields so that they have very low cardinality. Addresses: chop them up into their elements so that they have lower cardinality. Do the same with phone numbers.
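A rough load-script sketch of a few of those reductions; the field and file names are placeholders, not from the actual app:

Facts:
LOAD
    AutoNumber(CustomerID)          as %CustomerKey,   // integer surrogate if the key must stay for linking; leave it out entirely if not needed
    Date(Floor(EventTimestamp))     as EventDate,      // date only, no time portion
    Hour(EventTimestamp)            as EventHour,      // low-cardinality time parts, only if still needed
    Minute(EventTimestamp)          as EventMinute,
    Amount
FROM [lib://Data/Events.qvd] (qvd);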

The caveat here is that you might need separate tables for certain measure calculations to work properly without tons of set analysis. You have to weigh the trade-off between set analysis performance and memory consumption.

cbushey1
Creator III
Author

Thanks for the reply. 

Most of what you spelled out I am already aware of, and while I get where you are coming from on increasing performance, those scenarios don't really apply here.

In its simplest form, I have 8 tables that each have a different number of fields. Most of the values in those fields are either Y/N or 1/0; others are amounts or numbers. The key is required because it joins out to all my other tables that are needed in the application; it doesn't just join these 8 tables together.

My original thinking here was that if I combined these 8 tables it would make the data model simpler and increase the performance of the application, as it would potentially reduce the number of hops between tables for some calculations.

My concern is that it won't be easy to put together all 8 tables, especially those that have 10+ fields, when I have rows in the many millions. It is possible that I might be able to get rid of some of these fields, but the majority of them are used in the final application, so they would be required in the final table.
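For what it's worth, one way to keep the combine script manageable might be to loop over the source QVDs instead of writing eight separate join blocks. This is only a sketch, and the file names are made up; the joins rely on the shared key field having the same name in every QVD:

SET vFirst = 1;
FOR EACH vAlgo IN 'AlgorithmA', 'AlgorithmB', 'AlgorithmC'   // list all eight sources here
    IF $(vFirst) = 1 THEN
        RiskFacts:
        LOAD * FROM [lib://Data/$(vAlgo).qvd] (qvd);
        SET vFirst = 0;
    ELSE
        LEFT JOIN (RiskFacts)
        LOAD * FROM [lib://Data/$(vAlgo).qvd] (qvd);
    END IF
NEXT vAlgo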

dplr-rn
Partner - Master III

So your data model contains only 8 tables?
Is it possible to share a picture?
swallace104
Contributor III

Considering what you just mentioned about your data, I don't think combining them would have any significant performance impact vs. not having them combined. So if it simplifies things from a maintenance perspective, you are likely better off just combining them. I would try it, verify there is no significant sign of performance degradation, and move on.