Nku16
Contributor II

Skipping past qvd not found

Hi All

I have been tasked with modifying a script that no longer loads because of a very large dataset. The script uses our ITSM's APIs to generate a report based on ticket counts.

My predecessor created a load script that pulls the data and stores it as QVDs. However, the script fails because the QVDs run to more than 500,000 files, many of which belong to tickets that have since been deleted. The load generally takes over 3 hours and then fails. Is there a way to:

1) Make the QVDs load any faster?
2) Skip past deleted tickets?



I have tried using:

Set ErrorMode = 0; // continue past script errors instead of halting

For counter = 541272 To 580000 // update counter for newer tickets

RestAPI:
LOAD *
FROM [lib://url.......$(counter)*.qvd] (qvd);

Next counter

 

but it still takes up to 3 hours and still fails.




9 Replies
vinieme12
Champion III

Hi,

Please add more context.

What exactly are you trying to set up here?

Is this your initial data load or a daily incremental load?

I would suggest segmenting your QVD into multiple parts, for example one QVD for each dimension group instead of one large QVD.

 

Vineeth Pujari
If a post helps to resolve your issue, please accept it as a Solution.
Nku16
Contributor II
Author

I am first trying to get past the initial data load, which fails because the quantity of QVDs exceeds 500,000. Should that succeed, I will then try to implement an incremental load.
vinieme12
Champion III

Split your data into multiple QVDs based on a suitable dimension.

For example, if a dimension Dim in the file has values A, B, and C, split the data into QVDs such as:

1) QvdName_A

2) QvdName_B

3) QvdName_C
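A minimal sketch of that split in load script, assuming a single combined source QVD and a splitting field Dim (all paths, table names, and fields here are hypothetical):

// Hypothetical combined source.
Source:
LOAD * FROM [lib://Data/BigTable.qvd] (qvd);

// One filtered copy per dimension value, each stored as its own QVD.
Part_A:
NoConcatenate LOAD * Resident Source Where Dim = 'A';
Store Part_A Into [lib://Data/QvdName_A.qvd] (qvd);
Drop Table Part_A;

Part_B:
NoConcatenate LOAD * Resident Source Where Dim = 'B';
Store Part_B Into [lib://Data/QvdName_B.qvd] (qvd);
Drop Table Part_B;

// ...and likewise for C. A loop over the distinct values of Dim
// avoids repeating this block when there are many values.

Drop Table Source;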

 

 

Vineeth Pujari
If a post helps to resolve your issue, please accept it as a Solution.
marcus_sommer

QVDs load very fast, especially when the load can be performed optimized, meaning no processing is applied during the load. Nothing loads faster, not in Qlik and not in any other tool.

If you are seeing low performance and errors when loading QVDs, the cause is the way the load workflow is designed and/or a lack of RAM.

In your case you could avoid load errors from missing files by adding an if-condition that checks whether a file exists, using FileSize() or FileTime(), and skips the load when the check fails (see the sketch below). But the low performance will remain because, in my opinion, the workflow is badly designed. In your example the loop iterates nearly 40,000 times, which means I/O operations to the OS are triggered and loads initialized and finished exactly that often.

Therefore I wouldn't be surprised if more than 90% of the load time goes to overhead to administer the task instead of actually loading the data. I suggest rethinking the whole approach: don't divide the data into single tickets, but by period fields like year-month, or maybe by categories, channels, or similar (slicing data is fine and very common, but not into hundreds of thousands of files). Of course this requires a lot of work to reorganize the source loading, the intermediate storage level, and appropriately adjusted incremental methods.
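As a sketch, the existence check inside the loop might look like the following. It assumes one QVD per ticket number under a hypothetical path (FileSize() takes a literal path, not a wildcard):

Set ErrorMode = 0;

For counter = 541272 To 580000
    // Hypothetical location; substitute the real ticket QVD path.
    Let vFile = 'lib://Data/ticket_$(counter).qvd';

    // FileSize() returns null for files that don't exist, so
    // deleted tickets are skipped without raising an error.
    If Not IsNull(FileSize('$(vFile)')) Then
        RestAPI:
        LOAD * FROM [$(vFile)] (qvd);
    End If
Next counter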

Lemac
Contributor III

What I did was: after each loaded file I wrote another file with just the date and the file number I was at.

Then, at the start of the script, I looked up that file and decided whether the script had crashed and I should resume from the number stored in the file, or should start from scratch.

But this solution was for just short of 2,000 files; I'm not sure it works for 500,000 files.
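A minimal sketch of that checkpoint idea (the checkpoint path and field name are hypothetical):

// Resume from a previous checkpoint if one exists,
// otherwise start from the beginning of the range.
Let vStart = 541272;
If Not IsNull(FileSize('lib://Data/checkpoint.qvd')) Then
    Checkpoint:
    LOAD LastCounter FROM [lib://Data/checkpoint.qvd] (qvd);
    Let vStart = Peek('LastCounter', 0, 'Checkpoint') + 1;
    Drop Table Checkpoint;
End If

For counter = $(vStart) To 580000
    // ... load the ticket file for $(counter) here ...

    // Record progress after each successful load.
    Progress:
    LOAD $(counter) as LastCounter AutoGenerate 1;
    Store Progress Into [lib://Data/checkpoint.qvd] (qvd);
    Drop Table Progress;
Next counter

Note that writing the checkpoint after every file adds its own I/O, so at larger scales it may be worth checkpointing only every n files.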

 

Also: I use this when I am not sure whether a file exists. You can also use this trick for one specific file, in which case you add the filename you are expecting to the path. The 'For each File in FileList' loop doesn't crash when there are 0 matching files.

 

// FileList returns the full path of each matching file,
// so $(File) can be used directly in the FROM clause.
For Each File in FileList('lib://@Input/*.xlsm')

TimeCards:
LOAD
    Date,
    Employee,
    Presence
FROM [$(File)]
(ooxml, embedded labels, header is 5 lines, table is TimeCard);

Next File;

Nku16
Contributor II
Author

Capacity is the issue on my end. I am working alone and don't have the capacity to rewrite the entire load. My predecessor has since left, and I would have to get approval from my CIO.

Nku16
Contributor II
Author

Could you perhaps provide sample code where the QVDs are split into year-month categories?

I'm new to the environment and could use as much help as possible.

Nku16
Contributor II
Author

Also worth noting that the load statement is followed by a Next statement, after which the script pulls data from 7 other fields. Would a Where Not Exists condition as part of the load statement help? At this point I just want to be able to pull the QVDs and implement an incremental load, i.e.:

load ...
from ...;

Next

table1:
load ...
from ...;

etc.
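For reference, a common incremental pattern with Where Not Exists() looks roughly like this; the key field and paths below are hypothetical. A single-field Exists() check still allows an optimized QVD load:

// Load the newest tickets first (hypothetical source and key field).
Tickets:
LOAD TicketID, Status, ModifiedDate
FROM [lib://Data/Tickets_New.qvd] (qvd);

// Then append historic rows whose TicketID wasn't already loaded.
Concatenate (Tickets)
LOAD TicketID, Status, ModifiedDate
FROM [lib://Data/Tickets_History.qvd] (qvd)
Where Not Exists(TicketID);

// Persist the merged result for the next run.
Store Tickets Into [lib://Data/Tickets_History.qvd] (qvd);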

 

 

marcus_sommer (Accepted Solution)

With just this information it's not possible to provide detailed suggestions. In most scenarios a weakly designed workflow can't be fixed in place; instead you need to start at the beginning, gather all essential requirements, and plan how to get the raw data and in how many layers it needs to be processed. Everything else is treating the symptoms rather than the cause.

Nevertheless, you could take the combined ticket data, slice it by a period field, and store each slice with that period in the filename. This doesn't have to be a fully automated loop routine; for the initial QVD creation such work might be done manually.

Further, as mentioned in the first paragraph, you will probably need multiple processing layers. This could mean not adding a second loop behind the first one within the same application, but using n applications for the task. Each one is simpler to develop and to monitor, and they can run independently of each other.
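To illustrate the period slicing, a minimal sketch assuming the combined ticket data already exists as one table with a date field (all paths and field names here are hypothetical):

// Combined ticket data with a year-month tag derived from the date.
Tickets:
LOAD *, Text(Date(MonthStart(CreatedDate), 'YYYY-MM')) as PeriodTag
FROM [lib://Data/Tickets_Combined.qvd] (qvd);

// Distinct periods to iterate over.
Periods:
LOAD Distinct PeriodTag as Period Resident Tickets;

For i = 0 To NoOfRows('Periods') - 1
    Let vPeriod = Peek('Period', $(i), 'Periods');

    // One month of tickets per QVD, period embedded in the filename.
    Slice:
    NoConcatenate LOAD * Resident Tickets Where PeriodTag = '$(vPeriod)';
    Store Slice Into [lib://Data/Tickets_$(vPeriod).qvd] (qvd);
    Drop Table Slice;
Next i

Drop Tables Periods, Tickets;

A consuming app can then load only the periods it needs, e.g. FROM [lib://Data/Tickets_2024-*.qvd], instead of hundreds of thousands of per-ticket files.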