trust_okoroego1
Contributor II

How to avoid loading a whole QVD into memory

Hi,

I am running a test and need to read just the first 100 rows from a 40-million-row QVD file. Using the FIRST keyword, all the data will be loaded into memory first, before QlikView selects the first 100 rows and presumably drops the rest.

Is there a way, like the LIMIT clause in an SQL SELECT statement, to load only 100 records into memory?
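
For reference, this is the kind of load I mean (the file name is just a placeholder):

Test:
First 100
LOAD *
FROM BigFile.qvd (qvd);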

Thanks for your support.

13 Replies
marcus_sommer

In my short test (QV 11.2 SR 12, various QVDs of ca. 25 / 168 / 460 MB, run-time checked with now() stored in a variable directly before/after the load) I also tried recno(), and there was no real difference compared with rowno() or the other methods. Are you sure that it is really significantly faster than an optimized load of the whole data?
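
Roughly like this sketch (file and variable names are only examples):

Let vStart = Now();
TRACE Start: $(vStart);

Test:
LOAD *
FROM Large.qvd (qvd)
Where RecNo() <= 100;   // compared against RowNo() and against the First prefix

Let vEnd = Now();
TRACE End: $(vEnd);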

- Marcus

Miguel_Angel_Baeyens

It was, at least; we recently migrated to 12.10 and some things have changed drastically, some for the better and some for the worse.

The reason we were doing this is to make sure that there is data in those QVDs, that it has certain columns and rows, dates stored as dates, etc. It's an extraction from SAP, rather sensitive, and we use this as a sanity check of the load. Since there are developments and upgrades on the SAP side too, it's not surprising when commas appear as decimal separators instead of thousands separators, or the dates come as DD.MM.YYYY instead of MMMMYYDD, and so on.

Also, the length of the files varies, from no lines at all to dozens of millions of lines. If everything looks good after running a check on these "reduced" QVDs, other tasks are triggered and eventually the full QVD set is loaded.

We could load the QVDs in their entirety, but when they are big and the server is running another 12 tasks, resources can be an issue; hence the "make-it-smaller" approach.

But again, that's our case. Loading a QVD and dropping it will do no harm in other cases, and in terms of speed, if it's just the QVD without any JOINs or other transformations, the difference between loading 100 rows or 1,000,000 will be negligible.
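
A very rough sketch of the idea, with made-up field names, file name and date format:

Sample:
First 1000
LOAD
    DocumentDate,
    // flag rows whose date text does not match the expected format
    If(IsNull(Date#(DocumentDate, 'DD.MM.YYYY')), 1, 0) as BadDateFlag
FROM SAP_Extract.qvd (qvd);

BadDates:
LOAD Sum(BadDateFlag) as BadDateCount
Resident Sample;

Let vBadDates = Peek('BadDateCount', 0, 'BadDates');

If vBadDates > 0 Then
    TRACE $(vBadDates) of the sampled rows failed the date-format check;
    // here we would stop instead of triggering the follow-up tasks
End If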

marcus_sommer

Ok. I understand your use case of pre-loading a reduced dataset and checking it before triggering the big tasks with potentially invalid data. But I think this means that the data checking is quite expensive, and (with a capable storage system) it won't make a big difference whether a large QVD is first completely loaded and then reduced to n records before the checks are applied.

If there are no further quality checks beyond those mentioned above, it might be possible to do these checks with file functions like qvdfieldname() and with the XML header of the QVD, which also includes which formatting a field has, although I'm not sure it would be much faster and/or easier to implement or use. Maybe it's an idea ...
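
For example, roughly like this (the qvd path is only a placeholder):

Let vQvd = 'C:\Data\Extract.qvd';

Let vRecords = QvdNoOfRecords('$(vQvd)');   // row count without loading any data
Let vFields  = QvdNoOfFields('$(vQvd)');    // number of fields

// list the field names from the QVD header
FieldList:
LOAD
    RecNo() as FieldNo,
    QvdFieldName('$(vQvd)', RecNo()) as FieldName
AutoGenerate $(vFields);

// the XML header at the start of the QVD file could also be read with an
// XmlSimple load to get the stored number formats per field, but the exact
// header layout may differ between QVD versions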

- Marcus

Miguel_Angel_Baeyens

Agreed. File functions and metadata should be enough.

Storage is also a factor in our case, with different disks in AWS, some of them fast and some painfully slow for big files. So it's better to load a fraction of the file and, if it's OK, then the rest.