AnjaliRai
Contributor III

QVD's getting locked during Job Reloads

Hello All,

I hope I can get some help here!

We are facing an issue on the Qlik Sense platform where a few of the jobs are failing because the QVD files are getting locked.
A few observations:
1. The jobs fail with the error "Failed to open file in write mode". When checked on the server, the QVD files are open in READ mode only, yet the job still fails.
2. The jobs sometimes fail with the error "Sharing Violation". When checked on the server, the files are open in WRITE mode, but no other app or tool is using them. None of our apps write back to the QVDs; they only read.
3. No other jobs are using those QVD files, so they cannot be in use or locked by other applications or tools.
4. When the 'Qlik Engine service' is restarted and the jobs are re-triggered, the issue is fixed.
5. Antivirus exclusion is in place, as suggested by Qlik.

This issue is quite frequent on our platform nowadays.

Workaround: We kill the locked file session on the server and re-trigger the job. Otherwise, we need to restart the Qlik Engine service to release the lock.

Qlik Sense Version: April 2020-13.72.3

Please suggest !

Anjali Rai
28 Replies
AnjaliRai
Contributor III
Author

Hi Arnado,

Many thanks for the detailed explanation. I have added my comments; hopefully we can narrow down the issue and solve it! 🙂

This issue is quite frequent with our platform nowadays. It seems the frequency of the issue is increasing, and there was a time when the issue never happened. So, is it possible for you to identify a precise starting date for the issue? If so, is it possible to know what enhancements were added to the Qlik environment (or even the server environment)?
--Yes, I remember: when we were on QS version 'Feb 2020', we never faced this issue (but to add, we had very few apps on the platform at that time, as we were in the process of migrating).
We upgraded QS to version 'April 2020' in May, and that is when it started. Earlier it was happening only for the apps of one stream (which load data country-wise),
but now it has started happening with many more apps (they also load data country-wise).

We kill the locked file session from the server and re-trigger the job. Does it work? It won't make much sense if it does, but it will help us to know what actually happens when killing and restarting the offending job.
---Yes, whenever locking happens and a job fails, we log in to the server to see who is accessing the file at that moment, in real time. It is always the service account (all jobs run under the service account on the server). All we do is close the session holding the locked file on the server and rerun the job that failed due to the lock.
It resolves the issue, but it is not a very feasible option to do every time.

We need to restart the Qlik Engine service to release the lock. This is the most logical approach; by restarting the Qlik Engine service you are clearing any lock, 100% guaranteed.
-- Yes, it helps, but it is very difficult to do during business hours. Many apps run on one node, and restarting the service impacts all jobs at once, which impacts the business. So we avoid this most of the time.

Facts:

Those QVDs are updated at least once a day, so some jobs write to them;
Regarding your crashing applications (the ones crashing due to the lock): Do you know the offending locked QVD file triggering the crash (by name, for example Customer.qvd)? Do you know all the QVDs they are reading? If you know all the read-only QVDs involved, you should also identify all the applications writing to them (most likely a single application is doing so).
I suspect one of the applications writing to these QVDs is crashing, or entering a loop that holds the QVD open; you should identify these applications and review their logic. (If you know the date when this issue started, and you keep a change log of the Qlik applications, you may be able to closely identify a culprit.)
--I will confirm this with the developers and get back with more details.
All I know is that, yes, one of them writes to the QVDs and all the other apps just read them.

How often are the affected applications crashing? Once a day out of 10 daily runs, or all the time?
-- The lock happens all of a sudden. Looking at past records, it happens 3 or 4 times a week.

Once you restart the Qlik Engine service, when does the error happen again: the same day or the next day?
--When we restart the engine, all locks are released and the re-triggered job succeeds. No, we have not noticed the error happening at any particular time after the service restart. It happens all of a sudden: it may happen now, then maybe tomorrow or at the end of the week. I have not noticed any pattern so far.

I had a similar issue about two years ago: our implementation had jobs running hourly, and some of them ran other jobs in a chain; the issue was related to two jobs competing to write to the same QVD. We ended up implementing a sort of semaphore, but we need to understand your scenario better before finding the best course of action; right now we are trying to diagnose the issue and understand your diagnosis as well.
--Can you guide me a bit on the 'semaphore implementation' and how it helped with the issue?
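
For readers wondering what "a sort of semaphore" could look like, here is a minimal sketch of one common technique: an exclusive lock file that a job must create before it writes to a QVD. It is not the implementation described in the thread (which was never posted), and it is written in Python rather than Qlik load script purely for illustration. The helper name `qvd_lock`, the lock file name, and the timeout values are all hypothetical.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def qvd_lock(lock_path, timeout=60.0, poll=1.0):
    """Hold an exclusive lock file while the body runs (hypothetical helper)."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only one
            # job at a time can create the lock file.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)  # another job holds the lock; wait and retry
    try:
        os.close(fd)
        yield
    finally:
        os.remove(lock_path)  # release the lock even if the body raised


if __name__ == "__main__":
    # Hypothetical usage: guard the section that stores Customer.qvd.
    with qvd_lock("Customer.qvd.lock", timeout=5, poll=0.5):
        print("lock held, safe to write the QVD here")
```

One design caveat: if a job is killed while holding the lock, the lock file stays behind and has to be removed by hand (or expired by age), which is the same kind of cleanup the thread already does for stuck file sessions.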

Again, thank you for helping me out. Please suggest!

Regards,

Anjali Rai
AnjaliRai
Contributor III
Author

Yes Marcus, in QlikView we can capture, via the process ID, which job is running and which files it is accessing on the server, but in Sense, yes, it is all different.

All we can capture in Task Manager is the Qlik services and their resource consumption. No process ID and no qvb.exe can be spotted for Sense.

Anjali Rai
AnjaliRai
Contributor III
Author

Hi Brett,

Not at all; I am new to Qlik Sense myself (I started working on it a few months ago). I work on both QlikView and Qlik Sense, so yes, I can say the concepts differ. 🙂

I have raised a case with Qlik but have had no response yet, so I thought of asking for suggestions and discussion on the community portal to see if I can get help.

Thank you!

Anjali Rai
ArnadoSandoval
Specialist II

Hi @AnjaliRai 

Thank you for your replies (to @marcus_sommer and @Brett_Bleess as well); they help a lot in understanding your issue, "QVD's getting locked during Job Reloads". I am familiar with locked QVDs from my previous job with a QlikSense server: it is a very elusive issue that is hard to diagnose, especially because QlikSense seems uncooperative about telling us the offending job. BUT when a reload job finds a locked QVD, the process aborts with an error, and you can find the offending job in the QMC interface (I found this behaviour in early 2018). Your replies do not mention this, which leads me to rephrase your issue as "Job Reloads hang with a never ending loop". Never-ending loops are a.k.a. "infinite loops"; their main feature is that they do not trigger any error, and as they never end, the only way to stop such runaway jobs (another name for infinite loops) is by restarting the 'Qlik Engine service'.

Based on the redefinition of your issue, "Job Reloads hang with never ending loops", let me suggest some actions:

The first thing to be aware of is that every QlikSense load and reload job writes to a log file. These log files are located on the QMC server, and they have silly-looking names. You need to find them somewhere on the QMC server: the article Storage suggests the %ProgramData%\Qlik\Sense\Log folder, but I am aware that is not always the case; the topic Physical Location of QMC Log files also covers the QMC log files. The log files look like this:

QMC-LogFile-01.png

These silly-looking log file names have three parts: (1) the application ID, under the red line; the application ID is shown in the QMC Apps view as one of its columns (attributes); (2) the execution timestamp in UTC, not your local time, so keep that in mind when working with log files (I am in Australia, about 10 hours ahead of UTC, so for me that is a silly-looking time); (3) the extension: all log files end with .log. NOTE: QlikSense Desktop's log files have friendly names.

The log files are important because they will help us pinpoint the culprit job falling into a never-ending loop (which is what we suspect at the moment).
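
Because the timestamps are in UTC, it can help to convert them before comparing with local task schedules. The sketch below is only an illustration: the timestamp layout (`YYYY-MM-DD HH:MM:SS`) is an assumption, since the thread shows the real file names only in a screenshot, and the `+10` offset simply mirrors the "about 10 hours" mentioned above. The function name `utc_to_local` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def utc_to_local(stamp, offset_hours):
    """Convert a UTC timestamp string to a fixed-offset local time.

    Assumption: stamp looks like '2020-08-03 02:30:00'; adjust the format
    string to match the timestamps in your own log file names.
    """
    utc_time = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    utc_time = utc_time.replace(tzinfo=timezone.utc)
    return utc_time.astimezone(timezone(timedelta(hours=offset_hours)))


if __name__ == "__main__":
    local = utc_to_local("2020-08-03 02:30:00", 10)
    print(local.strftime("%Y-%m-%d %H:%M:%S"))
```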

If we open any of these log files, they look like this:

QMC-LogFile-02.png

They contain the messages shown when we run the load script. Every log file starts with the message "Execution started." and finishes with the message "Execution finished."; I do not expect the offending process to be able to write the last message, as it is trapped in the never-ending loop.

You need to find the folder on your QMC server containing these log files. They could be in the folder identified in my first link, but be prepared for them to be elsewhere. (Once again, in 2018 our site had 4 Qlik Sense servers, i.e. 4 nodes, and one of the servers contained these log files; it was not straightforward to find them, but as always a DIR /S *.log is wonderful. I suggest using the old DIR /S *.log instead of File Explorer, which is way too slow.)

Once you have located the folder containing the log files, it is time to analyze them; in other words, let's search for error messages and for log files missing the message "Execution finished.". You need these batch files to help with this task:
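
If a scripting language is available on the server, the same recursive search as DIR /S *.log can be sketched in a few lines of Python. This is an illustrative alternative, not something used in the thread; the function name `find_log_files` is hypothetical, and the starting folder in the usage part is only a placeholder.

```python
from pathlib import Path


def find_log_files(root):
    """Return every *.log file under root, searching subfolders too."""
    return sorted(p for p in Path(root).rglob("*.log") if p.is_file())


if __name__ == "__main__":
    # Placeholder starting point; use the drive or share you want to scan.
    for log_file in find_log_files("."):
        print(log_file)
```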

findstr /M "Error:" *.log > C:\Temp\Script-Errors.log

findstr is a Windows command-line tool; it finds the text ("Error:") inside the log files (*.log). The switch /M tells the command to print only the names of the files containing the text. We redirect its output to the file Script-Errors.log (located in the C:\Temp folder) for easy analysis.

I named the batch file FindStrError.bat and saved it in the folder containing the script log files. The output file (Script-Errors.log) contains the name of every script log that has recorded an error since the last time the log folder was cleaned (my advice is to never clean the log folders). If you open one of the listed log files and scroll down to its end, you will find a more detailed error message, like the one shown below:

QMC-LogFile-03.png
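
The findstr step can also be sketched in Python, which may be handy if the list of file names needs further processing. This is a rough equivalent under assumptions: it assumes the script logs are readable as UTF-8 compatible text, and the function name `files_containing` is hypothetical. Like findstr without /I, the match is case-sensitive.

```python
from pathlib import Path


def files_containing(folder, needle="Error:"):
    """Return names of *.log files in folder whose text contains needle."""
    names = []
    for path in sorted(Path(folder).glob("*.log")):
        # Assumption: logs are UTF-8 compatible; undecodable bytes are replaced.
        text = path.read_text(encoding="utf-8", errors="replace")
        if needle in text:
            names.append(path.name)
    return names


if __name__ == "__main__":
    for name in files_containing("."):
        print(name)
```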

Now, as I wrote earlier, script jobs running an infinite loop may not generate any error, because they are busy looping forever, never able to write "Execution finished."; in this scenario we need a different batch file using the find command, as shown:

C:\Windows\SysWOW64\find /c "Execution finished." *.log > C:\Temp\Script-Finished.log

I had to fully qualify my find command with the folder C:\Windows\SysWOW64\ because I have likely installed multiple versions of this command on my laptop; the output file, Script-Finished.log, looks like this:

QMC-LogFile-03-5.png

The log files with a zero after their names are the ones missing the message "Execution finished.". I named the batch file FindFinished.bat and saved it in the folder with the log files. You should run this batch file before killing/terminating the 'Qlik Engine service'.

I do not know what happens to the logs after terminating the 'Qlik Engine service'; that is why I suggest running the batch file before the termination. It would also be good to run it after the termination; remember to change its output file name so you can compare the results before and after the termination.

If you identify the script getting into an infinite loop, you should open its log file, scroll to its end, and analyse what it is doing and why it is not finishing! If the script is retrieving data from a backend database, it is possible that the QlikSense script is waiting for the database to release a lock. Restarting the 'Qlik Engine service' frees Qlik from its wait, but it would be an important clue if this is the case.

I hope my suggestions make sense and your team can use them to troubleshoot your issue. Please get back to this thread and share the findings; issues like these are interesting challenges, and understanding them is priceless.
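
As an illustrative alternative to reading the zero counts by eye, the check for logs that never wrote "Execution finished." could be sketched in Python as follows. The function name `unfinished_logs` is hypothetical, and the sketch assumes the logs are readable as UTF-8 compatible text.

```python
from pathlib import Path


def unfinished_logs(folder, marker="Execution finished."):
    """Return names of *.log files in folder that never logged the marker."""
    missing = []
    for path in sorted(Path(folder).glob("*.log")):
        # Assumption: logs are UTF-8 compatible; undecodable bytes are replaced.
        text = path.read_text(encoding="utf-8", errors="replace")
        if marker not in text:
            missing.append(path.name)
    return missing


if __name__ == "__main__":
    for name in unfinished_logs("."):
        print(name)
```

A log that is missing the marker is only a candidate: a reload that is still running normally will also appear in this list until it completes.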
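
The before/after comparison suggested here comes down to comparing two lists of log file names. A minimal sketch, assuming each run's list of unfinished logs has already been collected (for example from the two output files); the function name `compare_snapshots` and the file names in the usage part are hypothetical.

```python
def compare_snapshots(before, after):
    """Compare the unfinished-log names seen before and after a restart."""
    before, after = set(before), set(after)
    return {
        # Unfinished before the restart, finished (or gone) afterwards.
        "finished_since": sorted(before - after),
        # Unfinished in both runs: the strongest suspects.
        "still_unfinished": sorted(before & after),
        # Only unfinished after the restart, e.g. re-triggered jobs still running.
        "new_unfinished": sorted(after - before),
    }


if __name__ == "__main__":
    result = compare_snapshots(["app1.log", "app2.log"], ["app2.log", "app3.log"])
    for label, names in result.items():
        print(label, names)
```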

Hope this helps,

Arnaldo Sandoval
A journey of a thousand miles begins with a single step.
ArnadoSandoval
Specialist II

@AnjaliRai 

My reply includes two DOS batch scripts that should help you identify:

  1. Script Jobs having errors.
  2. Script Jobs never ending.

They cover two possible reasons for your issue. The main task is finding the location of the log files on your QMC server; each script generates an output file that should be analyzed.

Please keep in mind I faced an issue like yours in 2018, and I do not have access to a QMC server at the moment to give you better advice.

hth 

Arnaldo Sandoval
A journey of a thousand miles begins with a single step.
marcus_sommer

I agree with Arnado that there should be some hints within the various log files. You should check whether all possible log files are enabled and/or whether their level of detail could be increased. In addition to his suggestion to search for certain error/success messages with the help of batch files, I recommend loading them (partly) into Qlik. And here I mean nearly all log files (script logs, task logs, event logs, … and also the various OS logs) from all nodes. The idea behind it is to unify all timestamps and to make the chains of events visible over time, in order to find the pattern that caused the issue.

Of course it's quite a lot of work, but in the mid and long term you will benefit from such a systematic approach. Further, I think you don't need to develop everything from scratch, because AFAIK various governance tools/examples for the log files already exist. With them you could see which Qlik / OS / third-party tasks ran when and how long they took (min/max/avg time when they worked and when they failed). I assume there could be some interesting patterns to be found. Several years ago I did something similar with QlikView because of seemingly randomly failing tasks; nevertheless there was a pattern, and I detected a clash between QlikView, the Windows shadow copies and another backup routine. You wrote that everything was checked and that no such clash is possible, but without continuous monitoring any small issue might easily be overlooked.

Regarding my earlier suggestion of monitoring the accessing processes, it should also be possible to do this with Sense. Even though Sense handles this a bit differently from View and doesn't use separate instances of qvb.exe, I assume that there are now separate threads/handles of the service account, and these should be visible with the process monitor. For an advanced IT admin it shouldn't be too difficult to look at it. My other suggestion, monitoring the NTFS metadata, is probably much more expensive.

All of the above is aimed at finding the breaking point, when and where it failed, on the assumption that finding this point hints very directly at the real cause. Nevertheless, here is a reverse approach (with some trial and error and searching within the documentation) towards a reason which I could imagine as the cause: too many parallel tasks in combination with missing and/or wrongly set configurations. Depending on the number of nodes and the number of available cores, Qlik can run n tasks in parallel, and everything above that will be queued. There are multiple settings with which this behaviour, including various timeouts, can be configured. My assumption is that something is not properly configured and/or that it causes some kind of conflict (you may call it a bug if that's the case and it doesn't return a direct error message), so that the original task isn't closed, remains open, and therefore keeps the QVD locked. In particular, your remark that this failure increased as the Qlik environment grew points to a cause in that direction.
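
The idea of unifying timestamps into one chain of events can be sketched outside Qlik as well. The Python sketch below merges events from several sources into a single UTC-ordered timeline. Everything about the input is an assumption for illustration: the source names, the ISO-style timestamp strings, and the rule that a timestamp without an offset is already UTC. The function name `build_timeline` is hypothetical, and real log lines would first need to be parsed into this shape.

```python
from datetime import datetime, timezone


def build_timeline(events_by_source):
    """Merge (timestamp, message) events from several logs into UTC order.

    events_by_source maps a source name to a list of (iso_timestamp, message).
    """
    timeline = []
    for source, events in events_by_source.items():
        for stamp, message in events:
            when = datetime.fromisoformat(stamp)
            if when.tzinfo is None:
                # Assumption: timestamps without an offset are already UTC.
                when = when.replace(tzinfo=timezone.utc)
            timeline.append((when.astimezone(timezone.utc), source, message))
    return sorted(timeline)


if __name__ == "__main__":
    sample = {
        "script-log": [("2020-08-03T01:00:05", "Execution started.")],
        "os-log": [("2020-08-03T11:00:00+10:00", "backup started")],
    }
    for when, source, message in build_timeline(sample):
        print(when.isoformat(), source, message)
```

With all sources on one time axis, a reload that starts a few seconds after a backup or another writer touches the same file becomes visible as adjacent rows.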

- Marcus

AnjaliRai
Contributor III
Author

Hi Arnado,

Again, thank you for the detailed analysis and explanation of the issue!
I will try both scripts the next time the issue happens and will get in touch with you if I run into any trouble. I hope that works.

Anjali

Anjali Rai
ArnadoSandoval
Specialist II

Hi @AnjaliRai 

Cool, yes, now we have to wait for the issue to happen again, and hopefully those scripts will help to pinpoint it!

Good Luck, I will be waiting for the outcome!

Arnaldo Sandoval
A journey of a thousand miles begins with a single step.
AnjaliRai
Contributor III
Author

Hi Arnado,

Hope you are doing great!

We faced the issue again today with two of our jobs. I ran both the scripts before and after doing service restart and captured the needed details, as suggested by you.

I am trying to attach the files here, but with no success (I tried almost every format supported here). May I send them over email, if that works?

Please help me go further with it if possible. Thanks again for all the help! 🙂

--Anjali

 

Anjali Rai
Brett_Bleess
Former Employee

@ArnadoSandoval Just shouting out, as I am not sure you will see things otherwise. You two could potentially use the private message feature, but there might be a size limit impacting things here; sadly I do not know what the attachment limit is in Community, but that is the only thing I can think of. Sorry I am not more help on this one. If you do see any particular messages in any of the logs, feel free to post those; I might be able to track something down on them.

Regards,
Brett

To help users find verified answers, please do not forget to use the "Accept as Solution" button on any post(s) that helped you resolve your problem or question.
I now work a compressed schedule, Tuesday, Wednesday and Thursday, so those will be the days I will reply to any follow-up posts.