Skip to main content
Announcements
Introducing Qlik Answers: A plug-and-play, Generative AI powered RAG solution. READ ALL ABOUT IT!
cancel
Showing results for 
Search instead for 
Did you mean: 
Jan-Arie
Contributor II
Contributor II

Occasionally missing records from import

Hello,  I'm new to Qlikview and took over the project from an ex colleague.

The problem is that Qlik is very occasionally missing records after importing from a database.

The project: Every 5 minutes external devices from our customers are putting records into a MySQL database.
Qlik runs every 30 minutes and imports and analyzing the most recently added records from that MySQL database.  In the MySQL table every record has an unique auto-increment id and is used as the pointer in Qlik-qvw from where to fetch al the new records.
In Qlik there are several queries to fetch these data and with OUTER JOIN/LOAD DISTINCT combined to 1 table. And at the end of the script combined with the already existing qlik QVD table to form 1 new complete QVD table.

This works good, but very occasionally Qlik is missing records during a certain period.
Sometimes a complete week is correctly imported and then Qlik misses records from a time frame of 5 till 30 minutes.
Complete randomly, sometimes multiple times a day, sometimes only a few records of a 5 minute time frame.

When checking the log files and execute the SQL query by myself on the MySQL database the missing records are just included in the output.

(To get an idea how many missing records: last 20 days with ~70.000 records:  350 records missing)

Why is Qlik missing records?

 

 

 

 

12 Replies
marcus_sommer

Maybe those records are not unique and removed through the distinct measure and/or the records aren't complete/invalid in some ways and deleted from any where-clause or join.

More likely as the above is that those records are locked from the db to the query-time and therefore not transferred. This might be bypassed through a read-only access whereby db and driver would need to support this feature.

- Marcus

Jan-Arie
Contributor II
Contributor II
Author

Hello Marcus,

Thanks for your response.
The records in the source database are unique and complete.  They are "equal" to the other records, I mean no missing,empty or malformed fields and with correct data into the fields.
The MySQL database/table is only used by the import application with INSERT queries. And Qlik only does a SELECT on this table.
But I will check if there is some issues with locking.
The table contains already >60 million records, but I presume that that cannot be an issue.

And when I execute the queries, which Qlik uses, I will get the correct records.  So joining in Qlik script is maybe an issue, but then I can't imagine why it will work for weeks and then sometimes not.

 

 

marcus_sommer

It might be helpful to compare the records of the db and those which are loaded in Qlik. Means you need an additionally query on the db maybe with something like this: select count(*) from <tablename> or by querying the db-metadata or per max(auto-increment id) or similar measures. In Qlik you could use: noofrows('tablename').

If there are any differences you could be alert and/or trying to identify when which records are missing - to isolate the nearest timestamp and by which records it happens may give some hints to the cause.

- Marcus

Jan-Arie
Contributor II
Contributor II
Author

Hello Marcus,

Yes, I have done that.  First I have done that with a SELECT query in MySQL and store that result. Then I converted the QVD data file and compare it with the MySQL resultset. 

Now I have made a Qlik script which does it directly (for two Tags in this case and from a certain date):

TRACE >>>>> Start data comparison;
TableESyncTagsHistory:
TRACE >>>>> Fetch ESync data

LOAD DISTINCT
Id as Id,
TagId as TagId,
EsyncTimeStamp as EsyncTimeStamp;
SQL SELECT
a.id as Id, a.TagId,
a._date as EsyncTimeStamp
from
esync.`esync_tagshistory` a
where
a._date > '2020-12-17'
and ( a.TagId = '5532' or a.TagId = '5533' )
ORDER BY
a.TagId, a._date;


TRACE >>>>> Fetch Qlikview data;
TableProcessData:
LOAD DISTINCT
EsyncTagsHistoryID as Id,
TagId as TagId,
TimeStamp as QlikTimeStamp
FROM [D:\temp\qvd\qProcessdata.qvd](qvd)
where
TimeStamp > '2020-12-17'
AND ( TagId = 5532 OR TagId = 5533  )
;

TRACE >>>>> Store tables for debugging;
STORE TableESyncTagsHistory INTO D:\temp\qvd\qTableESyncTagsHistory.qvd;
STORE TableProcessData INTO D:\temp\qvd\qTableProcessData.qvd;

DROP TABLE TableESyncTagsHistory;
DROP TABLE TableProcessData;

 

// Merge the two tables, to be sure that it is not in memory, read it from the files
TRACE >>>>> Merge data;

TableMergeData:
TRACE >>>>> Load Esync table;
LOAD * FROM [D:\temp\qvd\qTableESyncTagsHistory.qvd](qvd);

// TRACE >>>>> Load Qlik table;
JOIN LOAD * FROM [D:\temp\qvd\qTableProcessData.qvd](qvd);

TRACE >>>>> Store merge table for debugging;
STORE * FROM TableMergeData INTO D:\temp\qvd\qTableMergeData.qvd;

TRACE >>>>> Drop temporary tables;TRACE >>>>> FINISHED;
// end script

When analyzing now the qTableMergeData.QVD file, the missing records in Qlik doesn't have the QlikTimeStamp filled in.

Missing recordsMissing records

 

 

 

But this more of a consequence conclusion that records are missing. And not the solution to prevent it.

marcus_sommer

For me it looked that you compares a current sql load with historical qvd data because the qTableProcessData.qvd exists already and must be loaded/created earlier. I thought more in direction like:

#records: sql select #records from db.systemtable where table = 'YourTable';

or maybe:

#records: sql select count(*) as #records from db.YourTable;

t: load fields …; sql select fields … from db.YourTable;
store t into t.qvd (qvd);

let check1 = noofrows('t') = peek('#records', 0, '#records');
let check2 = qvdnoofrecords('t') = peek('#records', 0, '#records');

In a second step you may include also timestamp-information especially if the queries take some time.

- Marcus

Jan-Arie
Contributor II
Contributor II
Author

Hello Marcus,
I am sorry that I have confused you.  
The script is indeed only to verify that there are missing records.  I just wanted to be sure that my manual comparison between MySQL database and Qlik QVD gives the same result of missing records as with this automatically check.

But this is not the problem. Main problem is missing records when importing from MySQL into QVD.

See next post with a part of the script explanation.

Regards, Jan-Arie

 

 

 

 

 

Jan-Arie
Contributor II
Contributor II
Author

This is (a part of) the script:
On other tabs are other queries to fetch different sets of data

 

// Select which stations must be read  (different customers, but only need 2 of them)
SET IdCustomer="(17,21)";

TRACE >>>>> Load all seperate process data into TempProcessData;

//Force dataformat for eSync database
let vLastTimeStamp = date(vLastTimeStamp,'YYYY-MM-DD hh:mm:ss');

TempProcessData:
TRACE >>>>> Load TempProcessData with Capacity;
TRACE >>>>> Calculate the production data for the counters;

LOAD DISTINCT
TagId,
id as EsyncTagsHistoryID,
ApplyMap('MapMachinePart',TagId) as MachinePart,
ApplyMap('MapMachineNumber',TagId) as MachineNumber,
ApplyMap('MapLineNumber',TagId) as LineNumber,
ApplyMap('MapTrackNumber',TagId) as TrackNumber,
Val as Counter_Val,
Production as Production_Val,
_date as TimeStamp,
StationId;
SQL SELECT b.TagId,
b.id, b._date,  a.StationId, b.Val,
if(@lastTag = b.TagId,
      if(b.Val - @lastValue>=0,
           b.Val - @lastValue,
           b.Val),
      if(b.Val - (SELECT @prevVal := c.Val from esync_tagshistory c
           where c.id < b.id
           and c.TagId = b.TagId
           and c.`_date` > (now() - interval 48 hour )
           order by c.id desc limit 1) >= 0,
           b.Val - @prevVal,
           0
      )
) as Production,
@lastTag := b.TagId,
@lastValue := b.Val,
@lastID := b.id
FROM (SELECT * FROM esync.`esync_tags` WHERE Name LIKE "%_Counter" AND StationId IN $(IdCustomer)) a,
esync.`esync_tagshistory` b ,
(SELECT @lastTag := 0, @lastValue := 0, @lastID := 0, @prevVal := 0) SQLVars
WHERE a.Id = b.TagId AND b.id > '$(vTagsHistoryID)'
ORDER BY b.TagId, b._date;

//  ========> The following section with "%_Capacity" will be repeated several times, but then the
// name will be "%_MaxCapacity" or "%_CrateType" or ...


OUTER JOIN (TempProcessData)
LOAD DISTINCT
TagId,
id as EsyncTagsHistoryID,
ApplyMap('MapMachinePart',TagId) as MachinePart,
ApplyMap('MapMachineNumber',TagId) as MachineNumber,
ApplyMap('MapLineNumber',TagId) as LineNumber,
ApplyMap('MapTrackNumber',TagId) as TrackNumber,
Val as Capacity_Val,
_date as TimeStamp,
StationId;
SQL SELECT TagId, b.id, _date, b.Val, a.StationId FROM (SELECT * FROM esync.`esync_tags` WHERE Name LIKE "%_Capacity" AND StationId IN $(IdCustomer)) a, esync.`esync_tagshistory` b WHERE a.Id = b.TagId AND b.id > $(vTagsHistoryID);

//  ========> End of repeated section

// Only for test:  (files are stored apart so that I can analyze these files)
TRACE >>>>> Store TempProcessdata into qTempProcessData.qvd;
STORE TempProcessData INTO $(vQVDpath)qTempProcessdata.qvd;

TRACE >>>>> Load existing Processdata and concatenate new data from TempProcessData;

OrderProcessData:
LOAD * FROM $(vQVDpath)qProcessdata.qvd (qvd);
CONCATENATE LOAD
StationId, TagId, EsyncTagsHistoryID, MachinePart, MachineNumber, LineNumber, TrackNumber, year(TimeStamp) as SampleP_Year, month(TimeStamp) as SampleP_Month,  week(TimeStamp) as SampleP_Week, weekday(TimeStamp) as SampleP_Weekday, day(TimeStamp) as SampleP_Day,
hour(TimeStamp) as SampleP_Hour, TimeStamp,
Autostart_Val, ProductionDuration_Val, Counter_Val, Production_Val, Capacity_Val, MaxCapacity_Val,
CrateType_Val, DowntimeHour_Val,
TempS1_Val, TempS2_Val, TempS3_Val, TempS4_Val, TempS5_Val, LevelS1_Val, LevelZ1_Val, LevelS2_Val,
LevelZ2_Val, LevelS3_Val, LevelZ3_Val, LevelS4_Val, LevelZ4_Val, LevelS5_Val, LevelZ5_Val
RESIDENT TempProcessData;

ProcessData:
NOCONCATENATE LOAD * RESIDENT OrderProcessData
ORDER BY TagId, TimeStamp;
DROP TABLE OrderProcessData;

TRACE >>>>> Store Processdata;
STORE * FROM ProcessData INTO $(vQVDpath)qProcessdata.qvd;
DROP TABLE ProcessData;

// Determine now the latest/highest record ID fetched in this run. And use that as start pointer the next time.
TRACE >>>>> used Tags historyID: $(vTagsHistoryID);
LastEsyncTagsHistoryID:
LOAD
MAX(EsyncTagsHistoryID) as EsyncTagsHistoryID
FROM $(vQVDpath)qProcessdata.qvd (qvd);
LET vTagsHistoryID = peek('EsyncTagsHistoryID',0,'vTagsHistoryID');
STORE LastEsyncTagsHistoryID INTO $(vQVDpath)qTagsHistoryID.qvd;

TRACE >>>>> Next Tags historyID: $(vTagsHistoryID);
DROP TABLE LastEsyncTagsHistoryID;

// End of the script

 

marcus_sommer

I don't see that there is an obvious problem with the script. Probably not related to your issue but sometimes the cause of problems is the use of variables because they could contain unexpected values. This means in cases in which it comes to strange results the variables should be monitored.

Beside this I think you need really to check the number of records in each step. Often it's also helpful to include recno() and rowno() within the Qlik loadings.

Like mentioned the most likely cause is a locking within the database. Maybe you could enforce such locking by an intentionally access from you and some colleagues on some of the records - if those then are missing you found the cause without the more or less expensive checks for number of records in all those loadings.

- Marcus

Jan-Arie
Contributor II
Contributor II
Author

Hi Marcus,

I am going to add logging of all variables and counters. Then I can see at which point there are differences.
Thanks for your help.

Jan-Arie