Hi All
I have been tasked with modifying a script that is no longer loading due to a very large dataset; it uses our ITSM's APIs to generate a report based on the ticket count.
My predecessor created a load script that pulls the data and stores it as QVDs. However, the script fails: the QVDs run to in excess of 500,000 files, some of them for deleted tickets, and the load generally takes over 3 hours before it fails. Is there a way to
1) Make the QVDs load faster
2) Skip past deleted tickets
I have tried using
set ErrorMode = 0;
For counter = 541272 to 580000  // update counter for newer tickets
    RestAPI:
    LOAD *
    FROM [lib:/url.......$(counter)*.qvd] (qvd);
Next counter
but it still takes up to 3 hours and still fails.
With just this information it isn't possible to give detailed suggestions. In most scenarios a weakly designed workflow can't be patched; you need to start again at the beginning, gather all the essential requirements, and then plan how to get the raw data and in how many layers it needs to be processed. Everything else is treating the symptoms rather than addressing the cause.
Nevertheless, you could take the combined ticket data, slice it by a period field, and store each slice with that period in its filename. This doesn't have to be a fully automated loop routine; the initial QVD creation could be done manually.
Further, as mentioned in the first paragraph, you will probably need multiple layers of processing. This could mean not adding a second loop behind the first one within the same application, but rather using n applications for the task. Each one is simpler to develop and monitor, and they can run independently of each other.
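For illustration, a minimal sketch of that manual slicing step, assuming the combined ticket data already sits in one QVD with a date field; Tickets.qvd, TicketDate, PeriodKey and the lib://Data path are all hypothetical names:

// Manually-run slicing of the combined ticket data into per-month QVDs.
Tickets:
LOAD *,
     Date(MonthStart(TicketDate), 'YYYY-MM') AS PeriodKey
FROM [lib://Data/Tickets.qvd] (qvd);

// Repeat a block like this per period (or wrap it in a loop later):
Slice:
NoConcatenate
LOAD * RESIDENT Tickets WHERE PeriodKey = '2024-01';

STORE Slice INTO [lib://Data/Tickets_2024-01.qvd] (qvd);
DROP TABLE Slice;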
Hi,
Please add more context.
What exactly are you trying to set up here?
Is this your initial data load or a daily incremental load?
I would suggest splitting your data into multiple QVDs based on a suitable dimension, i.e. one QVD per dimension value instead of one large QVD.
For example, if the field Dim in the file has values A, B, C, E, split the data into QVDs as:
1) QvdName_A
2) QvdName_B
3) QvdName_C
4) QvdName_E
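A minimal sketch of that split, assuming the large file is SourceData.qvd and the splitting field is called Dim (both hypothetical names):

// Split one large QVD into one QVD per distinct value of Dim.
Source:
LOAD * FROM [lib://Data/SourceData.qvd] (qvd);

DimValues:
LOAD Distinct Dim AS DimValue RESIDENT Source;

FOR i = 0 TO NoOfRows('DimValues') - 1
    LET vDim = Text(Peek('DimValue', $(i), 'DimValues'));

    Part:
    NoConcatenate
    LOAD * RESIDENT Source WHERE Dim = '$(vDim)';

    STORE Part INTO [lib://Data/QvdName_$(vDim).qvd] (qvd);
    DROP TABLE Part;
NEXT i

DROP TABLES DimValues, Source;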
QVDs load very fast, especially when the load can be performed optimized, which means no processing is applied during the load. Nothing loads faster, not in Qlik and not in any other tool.
If you experience poor load performance and load errors when loading QVDs, it is caused by the way the load workflow is designed and/or by a lack of RAM.
In your case you can avoid load errors from missing files by including an if-check that tests for the existence of a file with FileSize() or FileTime(); if the check doesn't return true, the load is skipped. But the poor performance will remain because, IMO, the workflow is badly designed. In your example the loop iterates nearly 40k times, which means I/O operations to the OS are triggered, and loads initialized and finished, exactly that often.
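As a minimal sketch of that check inside the counter loop from the question (the filename pattern is a placeholder, since the original path is truncated, and FileSize() needs a concrete filename rather than a wildcard):

FOR counter = 541272 TO 580000
    // FileSize() returns NULL when the file doesn't exist (e.g. a
    // deleted ticket), so the load is only attempted for real files.
    IF NOT IsNull(FileSize('lib://Data/Ticket_$(counter).qvd')) THEN
        RestAPI:
        LOAD *
        FROM [lib://Data/Ticket_$(counter).qvd] (qvd);
    END IF
NEXT counter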
Therefore I wouldn't be surprised if more than 90% of the load time goes to overhead for administering the task rather than to actually loading the data. I suggest rethinking the whole approach: don't divide the data into single tickets, but into period fields like year-month, or maybe categories, channels or similar (slicing data is fine and very common, but not into hundreds of thousands of files). Of course this requires a lot of work to reorganize the source loading, the intermediate storage level and the appropriately adjusted incremental methods.
What I did was: after each loaded file I wrote another file with just the date and the file number I was at.
Then, at the start of the script, I looked up that file and decided whether the previous run had crashed and I should resume from the number stored in the file, or whether I should start from scratch.
But this solution was for just short of 2,000 files; I'm not sure whether it works for 500,000 files.
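A minimal sketch of that checkpoint idea; Checkpoint.qvd, LastCounter and the counter range are hypothetical placeholders:

// Try to read the checkpoint from the previous run; ErrorMode 0 keeps
// the script alive if the file doesn't exist yet.
SET ErrorMode = 0;
LET vStart = 541272;  // default start when there is no checkpoint
Checkpoint:
LOAD LastCounter FROM [lib://Data/Checkpoint.qvd] (qvd);
SET ErrorMode = 1;

IF NoOfRows('Checkpoint') > 0 THEN
    LET vStart = Peek('LastCounter', 0, 'Checkpoint') + 1;  // resume
    DROP TABLE Checkpoint;
END IF

FOR counter = $(vStart) TO 580000
    // ... load the ticket file for $(counter) here ...

    // Record progress after each file so a crash can resume from here.
    Progress:
    LOAD $(counter) AS LastCounter, Now() AS SavedAt AutoGenerate 1;
    STORE Progress INTO [lib://Data/Checkpoint.qvd] (qvd);
    DROP TABLE Progress;
NEXT counter

Note that the extra STORE per file adds its own I/O, which is exactly the overhead concern raised above, so for 500,000 files you might checkpoint only every n files.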
Also: I use the following when I'm not sure whether a file exists. You can use this trick for one specific file too, in which case you add the filename you are expecting to the path mask. 'For each File in FileList' doesn't crash when there are 0 matching files.
// FileList returns the full path of each matching file, so $(File)
// can be used directly in the FROM clause.
FOR each File in FileList('lib://@Input/*.xlsm')
    LOAD
        Date,
        Employee,
        Presence
    FROM [$(File)]
    (ooxml, embedded labels, header is 5 lines, table is TimeCard);
NEXT File;
Capacity is the issue on my end. I am working alone and don't have the capacity to rewrite the entire load; the predecessor has since left, and I would have to get approval from my CIO.
Could you perhaps provide sample code where the QVDs are split into year-month categories?
I'm new to the environment and could use as much help as possible.
Also worth noting that the load statement is followed by a Next statement and then further loads that pull data from 7 other fields. Would a WHERE NOT Exists() condition as part of the load statement help? At this point I just want to be able to pull the QVDs and implement an incremental load, i.e.
load ...
from ...
next

table1:
load ...
from ...
etc.
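For reference, the incremental pattern being asked about usually looks like the sketch below; with the single-argument form of Exists() the QVD load even stays optimized. TicketID, the field list and the paths are hypothetical placeholders:

// Load the fresh records first, then append only those historic rows
// whose TicketID hasn't been seen yet, and store the result back.
Tickets:
LOAD TicketID, Status, ModifiedDate
FROM [lib://Data/NewTickets.qvd] (qvd);

Concatenate (Tickets)
LOAD TicketID, Status, ModifiedDate
FROM [lib://Data/TicketsHistory.qvd] (qvd)
WHERE NOT Exists(TicketID);   // skip rows already loaded above

STORE Tickets INTO [lib://Data/TicketsHistory.qvd] (qvd);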