Jan-Arie
Contributor II

Occasionally missing records from import

Hello, I'm new to QlikView and took over this project from a former colleague.

The problem is that Qlik very occasionally misses records when importing from a database.

The project: every 5 minutes, external devices at our customers' sites insert records into a MySQL database.
Qlik runs every 30 minutes and imports and analyzes the most recently added records from that MySQL database. Every record in the MySQL table has a unique auto-increment id, and the QVW uses it as the pointer from which to fetch all the new records.
In Qlik, several queries fetch these data and are combined into one table with OUTER JOIN/LOAD DISTINCT. At the end of the script, this is combined with the already existing QVD table to form one new, complete QVD table.
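For context, the incremental-load pattern described above looks roughly like this in QlikView script (all table and field names here are illustrative assumptions, not the actual project's names):

```
// Read the previous full history and derive the pointer from it.
History:
LOAD * FROM TagsHistory.qvd (qvd);

TempMax:
LOAD Max(id) AS max_id RESIDENT History;
LET vTagsHistoryID = Peek('max_id', 0, 'TempMax');
DROP TABLE TempMax;

// Fetch only records newer than the pointer.
NewRecords:
LOAD DISTINCT *;
SQL SELECT * FROM tags_history WHERE id > '$(vTagsHistoryID)';

// Append the new records to the existing history and store it back.
CONCATENATE (History) LOAD * RESIDENT NewRecords;
DROP TABLE NewRecords;
STORE History INTO TagsHistory.qvd (qvd);
```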

This works well, but very occasionally Qlik misses records during a certain period.
Sometimes a complete week is imported correctly, and then Qlik misses records from a time frame of 5 to 30 minutes.
Completely at random: sometimes multiple times a day, sometimes only a few records of a 5-minute time frame.

When I check the log files and execute the SQL query myself against the MySQL database, the missing records are included in the output.

(To give an idea of the scale: over the last 20 days, with ~70,000 records, 350 records were missing.)

Why is Qlik missing records?


12 Replies
marcus_sommer

Your SQL isn't extremely complex, but it's also not a simple pull from a single table. It contains, among other things, several sub-queries across various tables. That makes it quite difficult to detect any locking, or to ensure that all relevant tables really contain complete and valid data: all keys/data may already be present in the main table while not all join conditions are fulfilled yet, so some records get excluded.

In that case there is no error and no truly missing records (everything happens as it should); rather, the point in time of the data extract differs, and you were comparing apples with oranges. To rule out any error in loading the data into Qlik, you could use your query to create temporary table(s) within your database and/or export the result to an external CSV file, perhaps adding a timestamp to the table/file name, and then compare the number of records in this table against the QVD and against a current query in the database with the previous condition parameters.

Besides this, creating (several) temporary tables to avoid/minimize the need for various (sub-)queries could improve query performance considerably, and it also reduces the risk of conflicts with locking, not to mention time-outs, which Qlik won't notice unless the DB + driver return an error. A check of the execution times and/or the performance logs from Qlik/DB/OS may therefore also give some hints.
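One way to sketch that snapshot idea from the QlikView script (a hedged sketch; the snapshot table name, timestamp variable, and WHERE clause are assumptions, and the real joins/sub-queries would replace the placeholder comment):

```
// Timestamp for this run, used to name the snapshot.
LET vSnapTS = Timestamp(Now(), 'YYYYMMDD_hhmm');

// Materialize the extract result inside MySQL so the Qlik load and any
// later manual check refer to the exact same snapshot of the data.
SQL CREATE TABLE qlik_snapshot_$(vSnapTS) AS
    SELECT b.* FROM tags_history AS b
    /* same joins, sub-queries and conditions as the real extract */
    WHERE b.id > '$(vTagsHistoryID)';

// Load from the frozen snapshot instead of the live tables.
Snapshot:
SQL SELECT * FROM qlik_snapshot_$(vSnapTS);

// Optionally export it as well, to compare record counts externally.
STORE Snapshot INTO snapshot_$(vSnapTS).csv (txt);
```

Comparing the row count of the snapshot table against the QVD and against a fresh run of the original query then shows whether the records were "missing" or simply not yet visible at extract time.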

- Marcus 

Jan-Arie
Contributor II
Author

That sounds plausible.
The plan is to create temporary tables and store them in CSV (or some other format), and then import those temporary tables into Qlik. I can go a long way with your ideas and suggestions.
Marcus, thank you very much!

Jan-Arie

 

Jan-Arie
Contributor II
Author

I think I have found the cause of the missing records.

The queries read from the source MySQL table starting at the point reached in the latest run.
This is the 'vTagsHistoryID' pointer, which is fetched at the last point in the script and derived from the records in the final QVD file (via the MAX function).
But while the QVW script is running, new records are being inserted by the customer systems. Completing the script takes roughly 5 to 10 minutes, so it is very likely that new records are inserted during this time.

The first query nicely fetches everything from the starting pointer to the end of the source table, but by the time the following queries run, that last record is no longer the latest one. The result set therefore grows a little with each successive query.
When the later queries fetch results and put them into the final table, the maximum ID is higher than it was during the first queries.
The result is that in the next run the starting pointer is higher than the maximum id value seen by the first query, and therefore records are skipped.

The customer systems do not insert at exactly the same time interval; it can vary by a few minutes. The difference in where reading started therefore ranged from very small (no missing records) to larger (a greater chance of missing records).

This is what I have modified to solve it:
First, determine the maximum ID value in the source MySQL table before running any other query.
Then modify the queries so that they fetch only data in that range (e.g. " AND b.id > '$(vTagsHistoryID)' AND b.id <= '$(vMAXTagsHistoryID)' ").
Finally, use/store that maximum id as the starting point for the next run.
But don't use the maximum id from the derived QVD table!
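A minimal sketch of this upper-bound approach in QlikView script (table and field names are assumptions; the essential points are that the bound is captured once, before any extract query, and that it becomes the next run's starting pointer):

```
// Capture the upper bound ONCE, before any other query runs.
MaxId:
SQL SELECT MAX(id) AS max_id FROM tags_history;

LET vMAXTagsHistoryID = Peek('max_id', 0, 'MaxId');
DROP TABLE MaxId;

// Every extract query is bounded on both sides, so all queries
// in this run see exactly the same window of records.
NewRecords:
SQL SELECT *
FROM tags_history AS b
WHERE b.id > '$(vTagsHistoryID)'
  AND b.id <= '$(vMAXTagsHistoryID)';

// Store the upper bound as the starting pointer for the next run
// (not MAX(id) from the resulting QVD!).
LET vTagsHistoryID = '$(vMAXTagsHistoryID)';
```

Records inserted while the script runs then fall cleanly into the next run's window instead of being skipped.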

 

- Jan-Arie