Hey There,
I have a bit of a conceptual question that I was hoping someone might have some insight on. Suppose I have a table such as the one below:
Data:
Load [KeyField1],
[KeyField2],
[KeyField3]
Resident Whatever;
And to this table I want to join a very wide table, let's say sixty columns wide, but with only 50 rows. Would it be more efficient in general to have a single join such as:
Left Join(Data)
Load *
Resident WideTable;
Or, would it be better to do multiple joins, like so:
Left Join(Data)
Load
[KeyField1],
[KeyField2],
[KeyField3],
NormalColumn1,
NormalColumn2,
...
NormalColumn20
Resident WideTable;
Left Join(Data)
Load
[KeyField1],
[KeyField2],
[KeyField3],
NormalColumn21,
NormalColumn22,
...
NormalColumn40
Resident WideTable;
etc.
In other words, how does joining one very wide table impact performance, as opposed to performing multiple joins of narrower tables? (I'm working in QlikView 11.20 SR12.)
Data:
Load DISTINCT
[KeyField1],
[KeyField2],
[KeyField3],
Autonumberhash128([KeyField1]&'-'&[KeyField2]&'-'&[KeyField3]) as Key
Resident Whatever;
Left Join(Data)
Load
*,
Autonumberhash128([KeyField1]&'-'&[KeyField2]&'-'&[KeyField3]) as Key
Resident WideTable;
The only way to really know is to benchmark it by trying.
My gut feel is that if you broke it down into parts then each time you did the join it would get slower and slower, as there would be more data in memory and more columns for the engine to consider.
In the main, really wide tables are best avoided. If you can CROSSTABLE your data you may find massive performance and usability benefits anyway.
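As a rough sketch of the CROSSTABLE idea (all table and field names here are illustrative, not taken from your model): unpivoting the sixty flag columns into one flag-name/flag-value pair per row gives a long, narrow table that QlikView's engine generally compresses and aggregates well.

```qlik
// Illustrative sketch only: unpivot the wide flag table into key/value rows.
// The first three fields are kept as qualifier fields (hence the 3); every
// remaining column becomes a (FlagName, FlagValue) pair on its own row.
FlagsLong:
CrossTable(FlagName, FlagValue, 3)
LOAD
    [KeyField1],
    [KeyField2],
    [KeyField3],
    NormalColumn1,
    NormalColumn2
    // ... the remaining flag columns
RESIDENT WideTable;
```

You would then filter or aggregate on FlagName/FlagValue in the front end instead of on sixty separate fields.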
Also, I see you have RESIDENT statements. Try not to load data into memory and then shunt it around there. If you can load from source, or (better still) from QVD, directly to where you need the data to be, that is better than re-parsing it.
Good luck!
Steve
These questions are typical cases of "it behaves differently than expected" and "I have experienced something else", so I would suggest that you run some tests with your own data.
There might be big differences between measurements, due to e.g.
- relationship between primary keys in both tables 1:1 or different
- cardinality of your 60 fields (for example, 60 binary flag fields would probably not be considered a wide table)
- your environment (CPU, RAM, cores)
How many rows in Data table?
Do you need to join the tables?
Just create a compound key using
autonumber([KeyField1] & '|' & [KeyField2] & '|' & [KeyField3], 'KeyField') as KeyField
in each table and let the association engine in Qlik match the relevant data. No need for any joins.
In most cases there is no need to use joins in QlikView.
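Put together, the no-join association might look like the sketch below (table and field names are assumptions; note that the raw key fields are kept in only one table, so no synthetic key is formed):

```qlik
// Sketch: the same autonumber compound key in both tables lets the
// associative engine link them, so no explicit join is needed.
Data:
LOAD
    [KeyField1],
    [KeyField2],
    [KeyField3],
    AutoNumber([KeyField1] & '|' & [KeyField2] & '|' & [KeyField3], 'KeyField') as KeyField
RESIDENT Whatever;

Flags:
LOAD
    AutoNumber([KeyField1] & '|' & [KeyField2] & '|' & [KeyField3], 'KeyField') as KeyField,
    NormalColumn1,
    NormalColumn2
    // ... the remaining columns
RESIDENT WideTable;
```

The '|' separator is just a guard against collisions such as 'AB' & 'C' producing the same string as 'A' & 'BC'.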
And if you use the Data table only to filter the second table, you can also create a combined Key like
[KeyField1] & [KeyField2] & [KeyField3] as CombinedKey
in the Data table, then use a WHERE clause in your wide table load:
LOAD ...
FROM ...
WHERE EXISTS(CombinedKey, [KeyField1] & [KeyField2] & [KeyField3]);
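A fuller sketch of that filter pattern (source names are assumptions; the key must be built identically on both sides for Exists() to match):

```qlik
// Sketch: build CombinedKey in the Data table first, then load only the
// matching rows of the wide table. Exists() checks its second argument
// against all values of CombinedKey loaded so far.
Data:
LOAD
    [KeyField1],
    [KeyField2],
    [KeyField3],
    [KeyField1] & '|' & [KeyField2] & '|' & [KeyField3] as CombinedKey
RESIDENT Whatever;

WideFiltered:
LOAD *
FROM WideTable.qvd (qvd)
WHERE Exists(CombinedKey, [KeyField1] & '|' & [KeyField2] & '|' & [KeyField3]);
```

One caveat: a WHERE Exists() with an expression argument like this prevents an optimized QVD load, so it is worth benchmarking against loading everything and joining.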
Why do you want to join the wider table anyway? Why not keep it as its own table with a unique key to the first table?
How many rows exist in your first table?
If you need to join both tables then, as Steve said, try it both ways.
Hi Steve,
Thanks a lot for the response. The "Resident" statements were just my attempt at some pseudo-code to illustrate the problem. I'm working on an older model I didn't create. In the original model, the developer used a series of IF statements (90 of them!) to create new flag columns, which were then aggregated and joined to another table. To save on reload time, I've created a new table to replace the hard-coded IF statements (with all the same columns that they created in their IF statements). This saved a lot of time on the reload, but got me wondering in general about QlikView's handling of wide tables and whether there was a good "rule of thumb" that could be followed.
My initial instinct was to create a Cross Table (and may still do so if I can get buy-in), but I'd then likely need to make a number of changes to the front-end to accommodate the new way the flags are being handled.
Hi Colin,
Thanks a lot for the response. There are +- 150 million rows in the first table. In the current design it's necessary to join the tables as the resulting table is joined to yet another table. As mentioned above to Steve, this is an attempt to replace a lot of hard-coded ifs in the app without making major changes to the front-end.
Hi Swuehl,
Thanks a lot for the response. As mentioned to Steve, I'm looking to replace a whole bunch of hard-coded IFs with a single table. The first table is very large (roughly 150 million rows). All of the sixty fields are either 1 or 0.
Obviously the best way to check the impact would be to benchmark various scenarios, but I was wondering in general if QV's joins become less efficient when the table being joined from is quite wide (although you mention above that sixty binary fields might not count as that wide).