I'm trying to get my head around this problem. I have a production environment with 4 QVS and 3 QDS. There are over 100 jobs, and almost all of them have some sort of dependency on another job, which creates huge chains of jobs.
There are also a lot of jobs being triggered by timer early in the morning and some later in the evening.
The problem is that some jobs are not triggered. No failure, no log, nothing. So every morning I need to check this huge list of jobs and make sure that all of them have run. Right now I think I have to manually start 10-15 jobs or so. And when one is not triggered, the other 10 jobs under it don't run. It affects both time-triggered jobs and sometimes jobs that wait for another job to finish. Some of them were working before and just stopped working. Some were modified with a delay (to spread out the jobs using the load balancer) and then got this problem.
The servers are running version 11 R2. Any help would be much appreciated. Thank you.
There may be cases when dozens of tasks are triggered at the same time and the QDS or QMS does not have capacity allocated for them to run. They keep trying to get triggered, but a timeout happens and the task never runs. And of course, this task has not failed, since it was never started.
In the QMC, go to System, Setup, Distribution Services, Advanced tab, and make sure that the number under "Max number of simultaneous QlikView engines for distribution:" is the lowest of 10 or the number of cores minus 1 (for 32 cores, 31 concurrent tasks).
I actually increased the number from 10 to 15 yesterday and it had no effect. Is this number supposed to be the number of cores in the cluster or for a single machine? Say that I have 16 cores on each of three machines. Should this be set to 15 or to 47?
One more thing. Is there any way to be sure that the above happened? Perhaps a note in a log? Something like "No engines available. Retrying"? If there is, where can I find it? And if there isn't, there really should be!
That is sort of a rule of thumb based on my experience. Consulting Services will be able to give you a more accurate answer for your specific scenario. The real limit is actually set by the OS, and it depends on the memory available, the other processes running, the maximum number of batch processes allowed (e.g. account limits), etc.
And yes, there actually are logs to search for those kinds of errors. Go to the folder
Where X is the number of the QDS node (if you only have one, it will be 1) and Date is the date the task was triggered, in the form YYYYMMDD\hhmmss, for example 20130806\103500. Open TaskLog.txt and search for "Failed to allocate new QlikView Engine". I'd recommend increasing the verbosity level of the logs while you search.
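If there are many dated folders to go through, the search can be scripted. Below is a minimal Python sketch, not an official tool: it walks a log folder you pass in and reports every TaskLog.txt line containing the message quoted above. The function name, the `log_root` parameter and the UTF-8 encoding are my assumptions; if your logs are written in another encoding, adjust the `encoding` argument.

```python
import os

# Message quoted in the reply above
NEEDLE = "Failed to allocate new QlikView Engine"


def find_allocation_failures(log_root, needle=NEEDLE, log_name="TaskLog.txt"):
    """Return (path, line_number, line) for every matching line under log_root.

    log_root is a hypothetical parameter: point it at your own QDS log folder.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(log_root):
        for filename in filenames:
            # Only look at task logs; compare case-insensitively
            if filename.lower() != log_name.lower():
                continue
            path = os.path.join(dirpath, filename)
            # Assumption: UTF-8 text; undecodable bytes are replaced, not fatal
            with open(path, "r", encoding="utf-8", errors="replace") as handle:
                for line_number, line in enumerate(handle, start=1):
                    if needle.lower() in line.lower():
                        hits.append((path, line_number, line.strip()))
    return hits
```

Calling `find_allocation_failures(r"D:\path\to\your\logs")` (the path is a placeholder) returns a list you can print or count per day. An empty list only tells you that no logged task hit this error; it says nothing about tasks that never produced a log at all.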
Are you running 11.0 SR2? Please provide the exact version number, as there have been improvements when it comes to task chaining and triggers.
Running 100 tasks during 24 hours also demands really good time planning for the schedule. This means that you have to keep in mind how long each task runs, etc., to avoid having a very high number of tasks running at the same time.
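To make that planning concrete, here is a rough Python sketch that estimates how many tasks overlap at the busiest moment, given start times and typical durations. It reads nothing from QlikView; the task list is something you would type in yourself from the QMC, and all names and numbers in the example are made up.

```python
from datetime import datetime, timedelta


def peak_concurrency(tasks):
    """tasks: iterable of (name, start datetime, duration timedelta).

    Returns (peak_count, time_of_peak); (0, None) for an empty schedule.
    """
    events = []
    for _name, start, duration in tasks:
        events.append((start, 1))              # task starts
        events.append((start + duration, -1))  # task ends
    # Sort by time; at the same instant, ends (-1) are handled before starts (+1)
    events.sort(key=lambda event: (event[0], event[1]))

    running = 0
    peak = 0
    peak_time = None
    for moment, delta in events:
        running += delta
        if running > peak:
            peak, peak_time = running, moment
    return peak, peak_time


# Hypothetical schedule, for illustration only
day = datetime(2013, 8, 6)
example_tasks = [
    ("A", day + timedelta(hours=5), timedelta(minutes=30)),
    ("B", day + timedelta(hours=5), timedelta(minutes=20)),
    ("C", day + timedelta(hours=5, minutes=15), timedelta(minutes=30)),
    ("D", day + timedelta(hours=5, minutes=30), timedelta(minutes=10)),
]
example_peak, example_peak_time = peak_concurrency(example_tasks)
```

If the peak comes out higher than the number of simultaneous engines configured, that is a hint to move some triggers apart. Chained tasks would have to be entered with their expected start times, since the sketch knows nothing about dependencies.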
Alright. But is the setting "Max number of simultaneous QlikView engines for distribution" for the cluster, or for every machine in the cluster? If it's for the whole cluster, then I guess 15 is a bit low. I read somewhere that 9 is the OS max, which would then translate to at least 9 per server (27 in my case).
I also checked the logs, but since the job has not been triggered, there is no log...