Anonymous
Not applicable

Performance Tuning issues

Hey folks,

I'm looking for some ideas to improve performance in a particular QlikView app. I've tried all the usual suspects, but getting this thing to perform well has been an elusive goal.

I'll start with what we started with: A fairly complex problem, and a data model with maybe a dozen separate tables all connected by an entity ID with about 16 million unique values. Most tables did not have records for all entities, but at least one did. Performance was deemed unacceptably slow. I'll call this one the "wagon wheel" document.

Next, we created a flat document: We merged all of those dozen tables into one giant flattened table, like QlikView is supposed to like! We got rid of the high-cardinality ID field as we no longer needed it to link anything - Now we have one row per entity.
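In script terms, that kind of flattening is just a chain of joins on the entity ID followed by dropping the key - roughly like the sketch below (the table and field names are placeholders, not our actual model):

Flat:
LOAD EntityID, Attr1, Attr2
FROM Entities.qvd (qvd);

LEFT JOIN (Flat)
LOAD EntityID, Attr3, Attr4
FROM Details.qvd (qvd);

// ...repeat for the remaining tables...

// Once every attribute is on the single wide table, the
// high-cardinality link field is no longer needed.
DROP FIELD EntityID;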

No improvement in performance. ??

We also moved the document to a better server. In production it's on Server A (for "atrophied"?) which has 6 cores and 64GB of RAM. We moved the document to Server B (for "Better"?) which is our own. It also has 64GB of RAM, but has 24 CPU cores. They're also newer, so despite a somewhat lower clock speed, their CPUMark score is about 60% higher than Server A's.

No improvement - In fact, most things are WORSE on the "Better" server.

For a short time, we had a version of the flat document that appeared to be performing much better - Then we realized it did not have a complete set of data. When all the data was put back in, it went right back to performing poorly.

I'm really not sure what strategy to employ next for performance tuning. More tables would allow for less data, but would require traversing multiple tables, which is supposed to slow QlikView down. Flattening even further would result in slightly more data, but could remove any need for table linking.

What really causes QlikView to slow down more - traversing tables or having more data?

5 Replies
rwunderlich
Partner Ambassador/MVP

In general, traversing tables is slower, but it really depends on how the data is aggregated and how the expressions are coded.

From your description, it sounds like your expressions are inefficient. Make multiple copies of the slow chart(s) and eliminate expressions to identify which of them (or all) are the source of the slowness. Post the slow expression(s) here for help.

-Rob

Anonymous
Not applicable
Author

There wasn't much to the expressions; most of the data was calculated in the load script.

We finally got it working better, with several strategies:

1) There were actually two high-cardinality ID fields: one unique per entity (about 17 million values), and one that was only unique within a smaller grouping, with up to 1 million values. We had originally split both into two fields, but that made a calculated dimension necessary for a lot of things - it saved us on RAM but killed us on CPU, so there was no net performance gain. We determined that the business can mostly work without displaying the larger field, and by restoring the smaller field and combining it with the grouping fields to form a set of dimensions that is unique, we were able to eliminate the CPU load of the calculated dimensions.

2) In many cases, the business requirements were refined to allow us to place calculation conditions on objects without restricting the normal use cases. That makes things perform better for the user by letting them clear selections and make new ones without the "heavy" objects trying to calculate across all of the data. In many cases it also allowed us to switch to the smaller ID field as a dimension, by restricting calculation to require that only one group be selected at a time, which makes the smaller ID field unique for the selected group of entities (see the expression sketch after this list).

3) Some of the slowest objects were scatter plots. The above changes helped, but not enough for these. The scatter plots had expression groups so the users could compare multiple attributes of the entities, but even with the above changes we were plotting a dot per entity, and in many cases those dots were being plotted essentially on top of each other. So I created "round" tables that contain only the unique combinations of those expressions, using rounding factors small enough to move the dots only imperceptibly and an autonumberhash as the key field. By using that key field as the dimension, I plot one dot instead of many dots stacked on top of each other, and the size of this data was reduced by a factor of about 17 (see the script sketch after this list).
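For point 2, the calculation condition itself is just a one-line expression in the object's properties; the field name below is a placeholder for whatever the grouping field actually is:

// Only let the object calculate when exactly one group is selected,
// so the smaller ID field is unique within the selection.
GetSelectedCount(GroupField) = 1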
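And the "round" table idea from point 3 boils down to something like the following sketch - field names and the rounding step are placeholders, not the real script:

// 1) Tag each fact row with a hash of its rounded measure values.
Facts:
LOAD *,
     AutoNumberHash128(Round(MeasureX, 0.01), Round(MeasureY, 0.01)) as PlotKey
RESIDENT RawFacts;
DROP TABLE RawFacts;

// 2) Keep one row per unique rounded combination. PlotKey links this
//    table back to Facts and becomes the scatter plot dimension, so
//    each cluster of near-identical points plots as a single dot.
RoundTable:
LOAD DISTINCT
     PlotKey,
     Round(MeasureX, 0.01) as RoundedX,
     Round(MeasureY, 0.01) as RoundedY
RESIDENT Facts;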

Now, the user experience is screaming fast compared to before. Unfortunately, the reload process is fairly slow - a full reload from the database takes ~3 hours, and a load that uses QVDs for everything except today's new data takes a bit over an hour. It appears that one of the main bottlenecks during the reload is disk I/O performance. We may have to go back to using Resident loads instead of dropping QVD files and reading them back in, and see if that helps.
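The QVD-based part of the reload is essentially the standard incremental pattern - roughly like this sketch, with placeholder table, field, and variable names rather than our actual script:

NewAndHistory:
// Pull only today's new/changed rows from the database (assumes a
// CONNECT earlier in the script; the timestamp variable and date
// format are placeholders).
SQL SELECT PrimaryKey, Attr1, Attr2
FROM SOURCE_TABLE
WHERE ModificationTime >= '$(vLastReloadTime)';

// Append the history from the QVD, skipping keys already loaded above.
// Depending on the where-clause, this read may not stay fully optimized.
Concatenate (NewAndHistory)
LOAD PrimaryKey, Attr1, Attr2
FROM History.qvd (qvd)
WHERE NOT EXISTS(PrimaryKey);

// Write the merged table back out for the next run.
STORE NewAndHistory INTO History.qvd (qvd);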

Anonymous
Not applicable
Author

Are all your loads from QVDs optimised loads?

I am not sure exactly why you say "We may have to go back to using Resident loads instead of dropping QVD files and reading them back in", but at a guess maybe you could use preceding loads to avoid re-reading previously read data.
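A preceding load stacks LOAD statements so that derived fields are calculated in the same pass as the read from the QVD rather than in a second Resident pass. A rough sketch, with placeholder names:

Facts:
// The upper LOAD reads from the output of the lower LOAD, so the
// derived field is computed in a single pass over the QVD data.
LOAD *,
     Year(TransactionDate) as TransactionYear;
LOAD EntityID,
     TransactionDate,
     Amount
FROM Facts.qvd (qvd);

Any transformation does mean the QVD read itself is no longer optimised, but it avoids loading the data and then re-reading it from a resident table.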

Could you attach the log file from the qvw reload?

sebastiandperei
Specialist

Why is the reload time a problem? Are you doing staging in the front-end application? Do you need to reload more than once a day?

Could you send a picture of your data model?

rwunderlich
Partner Ambassador/MVP

Thanks for the update, Kent. Those are some creative solutions.

-Rob