warfollowmy_ver
Creator III

The best storage solution

Hi, all.

I have long found that the best way to store loaded data is to use QVD files. But QVD has a number of significant shortcomings.

1) If the disk fails while a QVD is being written, the data is lost. Yes, you can set up monitoring to understand where the failure occurred, but that does not solve the problem. You may not be at work when it happens, and then it becomes a disaster.

2) A cumbersome ETL process. To keep loads efficient, you either have to build composite keys into the QVDs or run several passes with "where exists".
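To show what I mean, here is a minimal sketch of that "where exists" pattern (the table and field names are just invented for illustration):

// Load the keys relevant for this reload (invented example)
RelevantKeys:
LOAD DISTINCT OrderID
FROM OrdersDelta.qvd (qvd);

// Filter the big QVD with a single Exists() so the QVD load stays optimized
Orders:
LOAD *
FROM OrdersHistory.qvd (qvd)
WHERE Exists(OrderID);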

3) Updating the data. If updates are frequent, you have to read the entire QVD file, append even a single new row, and then write the whole file back to disk. That is a heavy load on the disk and it takes a long time.
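For illustration, the update cycle I mean looks roughly like this (only a sketch; the connection, table and field names are placeholders):

// Placeholder connection - in reality whatever CONNECT you use
ODBC CONNECT TO [MyDSN];

// 1. Load only the new/changed rows from the database
Orders:
SQL SELECT * FROM dbo.Orders WHERE ModifiedDate >= '2018-01-01';

// 2. Append the unchanged history from the existing QVD
Concatenate (Orders)
LOAD *
FROM Orders.qvd (qvd)
WHERE NOT Exists(OrderID);

// 3. Rewrite the whole QVD - this is the step that hammers the disk
STORE Orders INTO Orders.qvd (qvd);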

Loading data from SQL into Qlik is also very slow. For example, copying 10,000,000 rows inside SQL from a physical table into a temporary table takes 4 seconds, usually on a single CPU thread. Loading the same 10,000,000 rows into Qlik takes about 20 seconds, with 4-5 times higher CPU load. Changing the packet size makes no difference.

In other words, every approach has its drawbacks.


Question.

What mechanisms do you use? How do you solve these and other problems?

3 Replies
marcus_sommer

In my opinion, QVDs are a very smart solution and I don't see that there are really any drawbacks. I don't know many other tools (at least not as well as Qlik), and therefore I don't know whether any of them provide better or even similar approaches for storing data temporarily to speed up the overall loading times. Personally, I doubt it.

Your point 1) isn't really specific to Qlik but applies to every kind of data/tool, and it means that you need (sufficient and current) backups of your raw data and the derived QVDs. If you really want to ensure that there are nearly no failures, you will need to mirror your environment, or at least the important parts of it to reduce the cost and effort. Regarding storage, RAID systems are useful, and of course a switch from HDD to PCIe SSD will speed up performance significantly.

On 2), I don't think it's a terrible ETL process, because in every system you will need to use multi-layer approaches and break many of the tasks into smaller pieces to get an optimum of stability and performance. I know there are tools with wizards and some automation features, but they are not strictly necessary to create a complex ETL. Only in quite large environments will the time and cost of adapting them to a Qlik environment probably be less than their benefits, and probably only then.

On 3), I suggest slicing the QVDs down, maybe to a monthly level. Of course it takes a bit more overhead to handle multiple files, but it reduces the overall amount of data which needs to be read and stored again.
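A rough sketch of what I mean with monthly slices (field and file names are only placeholders):

// Store each month into its own QVD so only the affected slice is rewritten
FOR Each vMonth in '2018-01', '2018-02', '2018-03'

  Slice:
  LOAD *
  FROM Orders.qvd (qvd)
  WHERE OrderMonth = '$(vMonth)';

  STORE Slice INTO Orders_$(vMonth).qvd (qvd);
  DROP TABLE Slice;

NEXT vMonth

// Later the slices can be read back together with a wildcard load:
// LOAD * FROM Orders_*.qvd (qvd);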

- Marcus

warfollowmy_ver
Creator III
Author

I already do everything exactly as you wrote. 😃

I was just interested in what others do. Maybe there are better options.

I can only disagree on point 2 about ETL. After working with SQL, I cannot say that the Qlik tooling feels normal. 😃 https://ivan-shamaev.ru/wp-content/uploads/2015/09/ETL_in_qlikview.jpg

SQL has indexing and so on. Aggregation is many times faster in SQL than in a Qlik script.

If Qlik could fetch data from SQL as quickly as from a QVD, I think few would use QVDs at all. In my opinion, the only advantage of QVD is speed.

There is also another question, in case someone knows: is the speed of loading data into Qlik from SQL roughly the same regardless of the connection type and driver version (ODBC vs. OLEDB)?

The general question to everyone else still stands: what is the best storage solution in your opinion?

marcus_sommer

Of course Qlik is quite different from SQL, and IMO this has more benefits than disadvantages. Maybe the biggest disadvantage is the performance of aggregation loads, especially if they are combined with where-clauses and order-by statements, mainly because the aggregation itself is performed single-threaded. But if you split these tasks you can speed up the loading significantly. See as an example: Optimize Group By Performance.
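As a simplified illustration of such a split (not the exact approach from the linked post; the names are invented):

// Step 1: apply the where-clause in a plain, multi-threaded load
Filtered:
LOAD OrderID, Customer, Amount
FROM Orders.qvd (qvd)
WHERE Exists(OrderID);

// Step 2: aggregate from the already filtered resident table,
// so the single-threaded GROUP BY has fewer rows to chew on
Aggregated:
LOAD Customer,
     Sum(Amount) AS TotalAmount
RESIDENT Filtered
GROUP BY Customer;

DROP TABLE Filtered;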

It is also important to use an incremental load approach, not only when loading the raw data from the database but also at each following transformation level, and this includes any needed aggregations.
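For example, an aggregation layer could itself be loaded incrementally, roughly like this (again just a sketch with invented names):

// Aggregate only the newly arrived month
NewAgg:
LOAD Customer, OrderMonth,
     Sum(Amount) AS TotalAmount
FROM Orders_201803.qvd (qvd)
GROUP BY Customer, OrderMonth;

// Add the previously aggregated history (all months not loaded above)
Concatenate (NewAgg)
LOAD *
FROM OrdersAgg.qvd (qvd)
WHERE NOT Exists(OrderMonth);

// Store the refreshed aggregation layer again
STORE NewAgg INTO OrdersAgg.qvd (qvd);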

Besides this, I suggest considering whether you really need all these aggregations, especially if the ratio of non-aggregated to aggregated records is rather small. Personally, I try to avoid them nowadays, because I often have a ratio of just 1:2 to 1:4, and the slightly higher RAM consumption and UI calculation times are not noticeably worse, so the extra effort in the script isn't worthwhile. The reason is that Qlik doesn't store data in a rather sequential way like SQL does, but in symbol tables (only distinct field values) and data tables with bit-stuffed pointers. This means that a dataset which is originally 10 times larger may need only roughly twice the storage respectively RAM (it depends essentially on the number of distinct field values).

AFAIK the driver used to load the data from the database is quite important, not only with regard to the features it provides but also for performance. This means it's worth checking whether a more suitable driver is available (usually OLEDB is faster than ODBC), and for really large data it might be worth customizing one or building your own. Alternatively, instead of accessing the database directly from Qlik, you could export the data from the database into text files on your server or a preferred share. This gives you the opportunity to run the export in a different time frame (when there is a low workload on the database) than when Qlik reads that data within a task chain. Further, I assume (I have never tested it) that loading from the text files will be faster than querying the database directly (even assuming the same network performance).
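Such a detour over text files could look roughly like this on the Qlik side (paths and format options are only placeholders; the export itself would run on the database side, e.g. via a scheduled job):

// Read the file that the database exported during a quiet time window
Orders:
LOAD OrderID,
     Customer,
     Amount
FROM [\\server\share\orders_export.csv]
(txt, utf8, embedded labels, delimiter is ';');

// Store it as a QVD for the following transformation layers
STORE Orders INTO Orders.qvd (qvd);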

These are just my experiences, and so far I'm quite happy with them. I must admit that I'm not very experienced with databases, especially not with modern releases from the last 10 years, but when I look at the performance of our ERP system and SAP BO/BW, provided by a lot of capable IT guys in our company, I'm all the more happy that I have QlikView and can build my own data models/views instead of waiting for results.

- Marcus