njmaehler
Partner - Creator

QVX_UNEXPECTED_END_OF_DATA

Hi,

The current environment is QlikView 11.20 SR 18 and SQL Server v18.9.2
I have a client who has in excess of 17.9 million rows of data in a single table.
We created an extractor using an OLE DB connection string and use it to pull the data directly from the SQL table and store it into a QVD, from which we build the visualisations.
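
For context, the extractor script is essentially just a straight pull and store, along these lines (the connection string, table and paths are placeholders, not the real names):

// Minimal sketch of the current extractor
OLEDB CONNECT TO [Provider=SQLOLEDB;Data Source=MyServer;Initial Catalog=MyDb];

SourceTable:
SQL SELECT *
FROM dbo.BigTable;            // ~17.9 million rows, wide, mostly nvarchar fields

STORE SourceTable INTO [..\QVD\SourceTable.qvd] (qvd);
DROP TABLE SourceTable;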
The extractor is intermittently falling over with the following message:
QVX_UNEXPECTED_END_OF_DATA

This can sometimes happen after 12 million rows, or 300k rows, or even 15 000 rows, so there is no pattern to it at all.
If I run a SELECT * FROM the table in a SQL Server Management Studio query, it takes around 30 minutes to return the data.
I noticed that even though there are almost 18 million rows, this is not really huge by Qlik standards (I have other clients with way more data), but the table is unstructured (no keys or indexes) and is wide - many fields and mostly nvarchar() fields. There are also two nvarchar(1024) fields. I did read one post that mentioned these long nvarchar() fields as being problematic.

I then created views of the single table with a where clause per year, in an attempt to reduce the number of rows that would be returned, and I excluded the two nvarchar(1024) fields as well. This didn't seem to make a difference.

Other posts have said that it could potentially be a network or memory issue. I don't think it is a memory issue because the server looks like it is coping fine. I do have their infrastructure team coming to look at their network later today.

Any other suggestions that I could look at?  

My current workaround/fix is that I have manually extracted the data by month per year (even some of these smaller subsets had to be run a few times before the count matched the SQL DB). I store everything prior to the current month as "historical data" and then only reload the current month daily. The problem with this interim solution is that they can backdate their data up to 6 months, and in some exceptional cases even up to 4 years, depending on the specific case.
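
The daily part of that workaround is roughly the following sketch (the date column and paths are placeholders for the real ones):

// Daily job: re-extract only the current month; everything older sits in the historical QVD
LET vMonthStart = Date(MonthStart(Today()), 'YYYY-MM-DD');

CurrentMonth:
SQL SELECT *
FROM dbo.BigTable
WHERE TransactionDate >= '$(vMonthStart)';     // placeholder date column

STORE CurrentMonth INTO [..\QVD\BigTable_CurrentMonth.qvd] (qvd);
DROP TABLE CurrentMonth;

The dashboard app then loads the historical QVD plus BigTable_CurrentMonth.qvd and concatenates them, which is why backdated changes older than the current month get missed.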

2 Replies
Dan-Kenobi
Partner - Contributor III

As I read your post, my mind naturally gravitated to networking issues.

However, as I went deeper into that, I would argue the row count is a red herring in this situation. What your problem really speaks to is the amount of data being downloaded from the server.

This table of yours seems to be carrying more bytes than your average SQL query would return. You said it yourself: it takes 30 minutes in Management Studio.

If you can't get around the infrastructure issues, I would recommend you create one extractor per slice of data (could be by MonthName, Year, ProductLine, etc.), have all these QVDs dumped into a single folder (say \Data\Extract\), and make another job that simply merges all of these QVDs together and stores the result.
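
As a rough sketch, assuming a date column you can slice on (the table, field names and the 2018 start year are just placeholders), the two jobs could look like this:

// Job 1: one QVD per year slice, all dumped into \Data\Extract\
FOR vYear = 2018 TO Year(Today())

    Slice:
    SQL SELECT *
    FROM dbo.BigTable
    WHERE YEAR(TransactionDate) = $(vYear);    // slice column is an assumption

    STORE Slice INTO [\Data\Extract\BigTable_$(vYear).qvd] (qvd);
    DROP TABLE Slice;

NEXT vYear

// Job 2: merge all slice QVDs into one consolidated QVD (the wildcard load auto-concatenates)
Merged:
LOAD * FROM [\Data\Extract\BigTable_*.qvd] (qvd);
STORE Merged INTO [\Data\BigTable_Full.qvd] (qvd);
DROP TABLE Merged;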

This will allow you to leverage parallelism while (likely) circumventing the issues you're experiencing.

If you use an indexed column to do the data slicing (even better if it's a clustered index), your query should be faster than you're expecting.

If you can go one extra step and make these parallel jobs into incremental loads instead (in which case you shouldn't discard the partial QVDs once you're finished with them), then in the long run you're likely to be limited only by disk space. Memory/CPU/load times shouldn't be an issue anymore.
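
A minimal sketch of what I mean by that, assuming the table has a unique key and a last-modified timestamp (RecordId and LastModified here are placeholders):

// 1. Find the high-water mark in the QVD built so far
MaxDate:
LOAD Max(LastModified) AS MaxLastModified
FROM [\Data\BigTable_Full.qvd] (qvd);
LET vMaxDate = Timestamp(Peek('MaxLastModified', 0, 'MaxDate'), 'YYYY-MM-DD hh:mm:ss');
DROP TABLE MaxDate;

// 2. Pull only the rows inserted or changed since then
Delta:
SQL SELECT *
FROM dbo.BigTable
WHERE LastModified > '$(vMaxDate)';

// 3. Add back the untouched rows from the existing QVD and overwrite it
Concatenate (Delta)
LOAD * FROM [\Data\BigTable_Full.qvd] (qvd)
WHERE NOT Exists(RecordId);

STORE Delta INTO [\Data\BigTable_Full.qvd] (qvd);
DROP TABLE Delta;

Because the delta is driven off the modified timestamp rather than the posting date, this would also pick up the backdated rows you mentioned, as long as the source table records when a row was changed.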

Please let me know if this helped. I can lend a hand with the scripts if you want.

Dan-Kenobi
Partner - Contributor III

Truth be told, even with the suggested strategy, you might still run into some exceptions here and there.

If that's the case, hit me up and we can try to pin down the precise point in your data pipeline where the exception is being thrown.