innashna
Contributor II

Reload Long Time

Hello, 

We have a model that loads about 200 QVDs. The reload takes around 3 hours; each QVD takes roughly 1 minute, but with 200 QVDs that adds up. I need the reload to finish in 1 hour.

Any idea to help me?

Thanks,

Inna

1 Solution

Accepted Solutions
marcus_sommer

I don't think there is much to optimize - at least not at this point in your workflow / environment. It's actually rather surprising that it finishes in around 3 hours, which hints at a quite powerful machine and/or a rather small dataset.

Therefore I suggest a complete re-design of your ETL. This means splitting the task into multiple tasks and also applying some incremental approaches - at least for the heavier parts. You mentioned that the loads depend on each other - that may be true and may make things more difficult, but it's not impossible (nearly all environments have dependencies in their data and must respect certain load orders). This not only opens up the possibility of running various parts in parallel, it also lets you refresh different parts of the data in different time frames and/or at different frequencies. Even more important is the gain in readability and maintainability when such a task is distributed across multiple tasks.
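
For illustration only, a minimal incremental-load sketch in Qlik script. The connection, variable, table, field and file names (vLastReload, Orders, OrderID, ModifiedDate, Orders.qvd) are assumptions, not taken from the attached script:

// Assumes an existing source connection (e.g. LIB CONNECT TO 'Source';)
// and that vLastReload holds the timestamp of the previous successful reload.
Orders:
SQL SELECT OrderID, Customer, Amount, ModifiedDate
FROM dbo.Orders
WHERE ModifiedDate >= '$(vLastReload)';

// Append the unchanged history from the existing QVD.
// Note: Where Not Exists() is not an optimized load, but it only touches this one table.
Concatenate (Orders)
LOAD OrderID, Customer, Amount, ModifiedDate
FROM [lib://QVD/Orders.qvd] (qvd)
WHERE NOT Exists(OrderID);

// Store the refreshed QVD and free the RAM.
STORE Orders INTO [lib://QVD/Orders.qvd] (qvd);
DROP TABLE Orders;

Run this way, only the delta comes from the source, while the bulk of the rows is simply copied forward from the previous QVD.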

The next point is the heavy transformations within the loads. Why do them all here? I think many of them should be done earlier - when the QVDs are created. You may also need more layers within a multi-tier data architecture. Quite common is the use of three layers: generator --> datamodel --> report, but if there are many dependent transformations and/or complex requirements and/or several incremental logics within a chain, you may need more layers.
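
As a rough, hedged sketch of how the layers might divide the work (the app split and names like Facts_Raw.qvd / Facts_Clean.qvd are placeholders, not from your model):

// Generator app (layer 1): the heavy transformations happen once, when the QVD is created
Facts_Clean:
LOAD *,
     Date(Floor(OrderTimestamp)) AS OrderDate    // example of a transformation moved out of the final load
FROM [lib://QVD/Facts_Raw.qvd] (qvd);

STORE Facts_Clean INTO [lib://QVD/Facts_Clean.qvd] (qvd);
DROP TABLE Facts_Clean;

// Datamodel app (layer 2): only a plain load of the prepared QVD, with nothing left to compute
Facts:
LOAD OrderID, OrderDate, Amount
FROM [lib://QVD/Facts_Clean.qvd] (qvd);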

Of course the use of several layers and the split into multiple reload tasks adds some overhead, because they must not only be developed but also administered - but nevertheless this is the easier way. The larger the datasets and/or the more complex the requirements, the more the work should be divided.

Further, from a quick glance at the code it looks as if big crosstables are created. Quite often "normal" tables are much more suitable - not only in regard to performance but also for handling the data within the UI. Connected with this are the applied joins, which may not really be necessary or may be better replaced with mappings and/or associated tables in the later data models. Also noticeable are the multiple conditions within the where clauses, which could probably be reduced (especially if some of the transformation is done a step earlier) - at least the biggest QVD loads should be done in optimized mode, which means that only a single where exists(SingleField) condition is allowed.
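
To make the last two points concrete, a hedged sketch with placeholder names (Customers.qvd, ActiveCustomers.qvd, Facts.qvd, CustomerID - not from the attached script): a mapping replaces a join, and the big QVD read stays optimized because its only condition is a single Exists():

// Mapping instead of a join
Map_CustomerName:
MAPPING LOAD CustomerID, CustomerName
FROM [lib://QVD/Customers.qvd] (qvd);

// Load the keys to keep first, so the big load below can stay optimized
Keys:
LOAD CustomerID
FROM [lib://QVD/ActiveCustomers.qvd] (qvd);

// Optimized QVD load: plain field list, only a single Exists() on one field
Facts:
LOAD OrderID, CustomerID, Amount
FROM [lib://QVD/Facts.qvd] (qvd)
WHERE Exists(CustomerID);

// Apply the mapping afterwards in a resident load (this step is no longer optimized,
// but it works on the already reduced table)
Facts_Final:
LOAD *,
     ApplyMap('Map_CustomerName', CustomerID, 'unknown') AS CustomerName
RESIDENT Facts;

DROP TABLES Keys, Facts;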

- Marcus     


6 Replies
QFabian
Specialist III

Hi! You can separate the script into 3 scripts in 3 apps, so you can then schedule them to run in parallel.

The only thing to worry about is the resources (RAM and CPU) on your server.

Do you do a DROP TABLE every time after every STORE?
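
For what it's worth, the usual pattern behind that question, with a placeholder table name; writing the QVD and dropping the in-memory table right away frees the RAM before the next load:

// Placeholder example: write the QVD, then release the resident table immediately
STORE Sales INTO [lib://QVD/Sales.qvd] (qvd);
DROP TABLE Sales;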

QFabian
rwunderlich
Partner Ambassador/MVP

Are your QVD loads optimized or non-optimized? (Post your load statement.)
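
For context, a hedged illustration of the difference, with placeholder names (Sales.qvd, OrderID, CustomerID, Amount):

// Optimized QVD load: a plain field list straight from the QVD, no expressions;
// an optional Where Exists(SingleField) is the only condition that keeps it optimized
Sales:
LOAD OrderID, CustomerID, Amount
FROM [lib://QVD/Sales.qvd] (qvd);
DROP TABLE Sales;

// Non-optimized QVD load: any expression, rename or other where-condition
// forces Qlik to unpack the QVD row by row, which is much slower
Sales:
LOAD OrderID,
     Upper(CustomerID) AS CustomerKey,
     Amount
FROM [lib://QVD/Sales.qvd] (qvd)
WHERE Amount > 0;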

innashna
Contributor II
Author

Hi, I can't, because every load depends on the previous one.

innashna
Contributor II
Author

Hi,

Please see the attached.
