Digvijay_Singh

Concurrent reload of multiple apps having data model copied through binary statement

Hi Friends,

I have a core data model app with a disk size of approximately 7 GB. It is used by 20-25 front-end apps; the standard process is to binary load the core model in each front-end app and then drop the tables that are not relevant to that app. So once their reloads complete, the front-end apps are not that big, mostly less than 1 GB on disk.
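For clarity, each front-end app's script starts roughly like this (the app path and table names below are placeholders, and the exact Binary reference syntax differs between QlikView and Qlik Sense):

    Binary [lib://Apps/CoreDataModel.qvw]; // must be the first statement; pulls in the full ~7 GB model
    // keep only what this app needs
    Drop Tables ReturnsDetail, WarehouseMovements;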

I task-chained my front-end apps to the core data model, so when the core model reload finishes it triggers all 20-25 front-end apps for reload (we have configured 14 concurrent reloads in QMC; the rest remain in the queue). So effectively the first statement in the script, the binary statement, is executed in all these apps at the same time.

My question is: would I have huge memory usage temporarily while the binary load statement is executed (14 times, effectively, in 14 different front-end apps)? Would I have the 7 GB data model in all of my front-end apps before my drop statements are executed?

We have >100 GB RAM with 16 cores, and we have observed slowness (sometimes even server crashes) when the core model load finishes and the front-end app reloads start. So I am trying to understand whether I should start widening my task chain so that fewer apps are reloaded concurrently.

@marcus_sommer 

@rwunderlich 

@stevedark 

Thanks in advance for your help!!

 

7 Replies
Mark_Little
Luminary

Hi @Digvijay_Singh

Certainly not an expert in this case, but I'd say your assumption is correct. I would think that as far as the server is concerned they are different apps with different data.

stevedark
Partner Ambassador/MVP

Hi @Digvijay_Singh 

Are the various front end apps chained one after the other, or are they all chained to the one model load?

If they all go concurrently there will certainly be a point where all apps have all of the data and that is going to put a strain on things.

Also, I am not convinced that dropping the tables will necessarily give an app that is as small as if the data were never there in the first place. This would need testing though to prove or disprove.

I tend never to use binary loads and opt instead for a QVD layer. That way, in the extraction layer each table can be written to a QVD as soon as it is completed and then dropped from memory, so the extraction layer never reaches the 7 GB size and is empty when it finishes.

Each app can then cherry pick what QVDs it needs. I'm not sure how it will compare time wise to the running of the reload chain as you currently have it (it will probably take a little longer) but it will certainly be more efficient in memory use.

If you want a 'quick fix' to get from what you have to what you need, you will be able to find scripts on Qlik Community that loop through every table in your extraction data model, write each one to a QVD and then drop it. In each app you could have a spreadsheet or inline table listing the QVDs required for that app, which is then looped around with a LOAD * from each QVD. A slightly more bespoke approach would allow you to pick fields from the QVDs rather than loading everything, and with a single WHERE EXISTS clause on each QVD you can also cut down the number of rows loaded into each app if required.
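As a rough, untested sketch of both halves (the connection name and QVD names below are placeholders):

    // End of the extraction app: store every table to QVD, then drop it.
    For i = NoOfTables() - 1 To 0 Step -1
      Let vTable = TableName($(i));
      Store [$(vTable)] Into [lib://QVDs/$(vTable).qvd] (qvd);
      Drop Table [$(vTable)];
    Next i

    // In each front-end app: list the QVDs this app needs, then loop and load.
    QvdList:
    Load * Inline [
    QvdName
    Sales
    Customers
    ];

    For i = 0 To NoOfRows('QvdList') - 1
      Let vQvd = Peek('QvdName', $(i), 'QvdList');
      Load * From [lib://QVDs/$(vQvd).qvd] (qvd); // plain load keeps it optimized
    Next i
    Drop Table QvdList;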

This blog post may also be of interest on this topic:
https://www.quickintelligence.co.uk/qlikview-optimised-qvd-loads/

Hope that this helps.

Kind regards,

Steve

marcus_sommer

I think there are two aspects which could improve the reload behaviour within your environment.

The most obvious would be not to run all these binary loads in parallel but sequentially. Parallel tasks not only increase the peak CPU + RAM demand within the environment; the OS is also challenged to handle all these threads and to switch between multiple I/O streams. Further, the network/storage must be capable of providing/storing all this data without causing longer wait times and/or a long queue. So parallel tasks don't always bring benefits; it depends on the number and kind of tasks and on the biggest bottleneck in the system.

The next would be to optimize the reload process itself. Personally I use a lot of binary loads - mostly 1:1, just as a replica which then contains the UI and the final section access (the table already exists within the data model, but it is only activated within the user report) - and only in some cases to adjust and/or add some data.

Of course it's also possible to remove unwanted data after the binary load, but I'm not sure it's expedient to load around 7 GB of data and then remove 80-90% of it again - especially as this approach forces Qlik to re-build the data model, and the ready-built model is probably the main benefit of a binary load over loading optimized from QVDs.

As already hinted, I love the binary feature, so I wouldn't go as far as @stevedark and skip it completely in favour of loading everything optimized from QVDs. Rather I would provide a binary load of around 1 GB that all applications have in common and then add the app-specific data with optimized QVD loads.
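A minimal sketch of that hybrid (the app path, QVD name and key field are placeholders; the key must already exist in the common model for the load to stay optimized):

    Binary [lib://Apps/CommonCoreModel.qvw]; // the ~1 GB model shared by all apps
    // add the app-specific data on top; a single exists() keeps the load optimized
    Load * From [lib://QVDs/AppSpecificFacts.qvd] (qvd)
    Where Exists(OrderID);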

Beside this, I suggest checking whether all your data is suitably stored within the QVDs. This means the data is sliced at an appropriate granularity - YYYYMM and/or countries/channels/products or similar - and that this information is included in the file names. The point is to be able to pick the right files within a filelist loop without touching the irrelevant ones, and without having to rely on exists(), the only condition that keeps a QVD load optimized. (The filelist overhead on the OS side is not worth mentioning, and if you keep a YYYYMM granularity over the last n years, the extra work of initialising n loads is far smaller than the benefit of loading the data extremely flexibly.)
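For example (a sketch, assuming monthly slices named like Sales_202401.qvd in a placeholder folder):

    // pick only the relevant months via the file mask - no exists() needed
    For Each vFile In FileList('lib://QVDs/Sales_2024*.qvd')
      Load * From [$(vFile)] (qvd); // plain load, stays optimized
    Next vFile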

In the end there is probably no perfect solution; you will need to balance the various technical possibilities and their performance/resource impact against the administrative needs of developing, maintaining and controlling the applications and their workflow.

- Marcus

Digvijay_Singh
Author

Thanks so much @stevedark for sharing your thoughts on this.

To answer your question in the beginning, they are all chained to the one data model.

The idea was to keep all the complex business logic in the core model so that the front-end apps refresh faster (the core model binary loads in a few seconds!) and contain only what each app needs, but it's been 4 years now and the core model has kept growing bigger and bigger. Keeping all the business logic in one central place also helps with integrity: earlier, the front-end apps did not have consistent business logic because the logic was duplicated across different apps and developers used to forget to update some of them. Admittedly the app governance process wasn't helping, but after building the core model we had a single source of truth.

We already have a QVD layer, and the core model uses those QVDs to build the centralized data layer containing all the complex business transformations. The idea was to keep both the QVD layer and the front-end data layer execution as light as possible.

I liked the ideas you shared at the end and will propose them internally - thanks a ton for those! I think I was right that if 10 front-end apps concurrently execute a binary load of a 7 GB app, it consumes a huge amount of RAM momentarily until the drop statements execute to clear the unused tables in the front-end apps. But I am not quite sure how long Sense keeps data loaded through a binary load in RAM - are the tables removed from RAM instantly when the drop statements are executed?

Appreciate your time on this! It will certainly help in deciding the next course of action!

 

Thanks,

Digvijay

Digvijay_Singh
Author

Thanks @Mark_Little, I feel the same. For now I will reorganize the task chain so that fewer apps reload concurrently, until we decide on a permanent solution.

Digvijay_Singh
Author

Thanks so much, @marcus_sommer, for so many great ideas!!

Looks like we need to break our core model into 2-3 different apps, re-evaluate the business logic vis-à-vis its usage in the front-end apps to keep the size reasonable, and optimize the task chain. We will see whether some front-end apps can be fed QVDs directly where the business logic doesn't feed multiple apps. It's been 3-4 years and it's time to reorganize things!

Thanks again for your time and support!!!!

Digvijay

 

rwunderlich
Partner Ambassador/MVP

@stevedark "Also, I am not convinced that dropping the tables will necessarily give an app that is as small as if the data were never there in the first place."

Here's my understanding of how this works. Dropping a table will release the index (records) space allocated to the table. It will release the symbol space only if the field(s) are not used in any other table being kept.

If you drop a table, make sure you drop key fields from the remaining data model that were only used to point to the dropped table. That will release the symbol space for the field. It will *not* release the index space for this field in the remaining table.  If you want to release the space allocated to the dropped fields, you must "Load * Resident.." the remaining table(s). 
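In script terms (the table and field names below are hypothetical), that sequence might look like:

    Drop Table OrderDetails; // releases the table's record (index) space
    Drop Field %DetailKey;   // releases the key's symbol space
    // reclaim the index space the key occupied in the remaining table
    // by rebuilding it with a resident load:
    TmpOrders:
    NoConcatenate Load * Resident Orders;
    Drop Table Orders;
    Rename Table TmpOrders To Orders;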

-Rob
http://www.easyqlik.com
http://masterssummit.com
http://qlikviewcookbook.com