dselgo_eidex
Partner - Creator III

Is it a bad idea to split my 3.5GB app into ~500 copies of itself with reduced data?

Hello, this question needs some explanation before I begin.

I manage a web application that uses the Qlik Sense Engine and Visualization APIs to provide dashboards to our clients. We currently use Section Access to restrict the data that our clients can see, and our data model is set up in a way that there isn't much crossover in what data our users are allowed to see. So far, Section Access has allowed us to partition the app according to our needs.

The issue with this is that our data model has grown to a size of approximately 3.5GB and takes about an hour to reload. As the amount of data increases, it takes longer and longer for our users to load into our app. Right now, it takes around a full minute for one of our users to get into our application. Once they are in and their data has been reduced, it isn't so much of an issue. But that loading time has started to frustrate our users.

To resolve this issue, I was considering a method similar to the Loop and Reduce functionality that was available in QlikView. While it isn't a feature in Qlik Sense, I figured that I might be able to produce something similar.

Essentially, my plan is to partition our app into several copies, one for each of our users (approximately 500 users). Each of these apps would have the same load script, but would include a variable specifying what data that user has access to. That variable would be used in a WHERE clause when loading from our database, so each app would only load the data that user is allowed to see. This should drastically reduce the amount of data each app contains, thereby reducing the amount of time it takes for a user to load into our web app. It would also drastically reduce the reload time for each individual app (most likely a max of 2 minutes per app).
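
For concreteness, here is a rough sketch of the kind of per-app script I have in mind, generated from a template. The connection name, table, and field names below are placeholders rather than our real data model; the point is only the pattern of injecting a per-user variable into the WHERE clause.

```typescript
// Hypothetical helper: builds the load script for one user's copy of the app.
// 'OurDatabase', 'dbo.Facts', and 'PartitionKey' are placeholder names.
function buildUserScript(partitionKey: string): string {
  return `
LIB CONNECT TO 'OurDatabase';

// The only per-app difference: which partition of data to pull.
SET vPartitionKey = '${partitionKey}';

Facts:
LOAD *;
SQL SELECT *
FROM dbo.Facts
WHERE PartitionKey = '$(vPartitionKey)';
`;
}
```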

What I need to know is if this is a really bad idea that is doomed to fail. Below are some issues I can foresee some people raising and how I plan to address them:

  • Creating/Managing so many apps could be very cumbersome
    • I plan to automate this process programmatically. It shouldn't be too difficult to create an app, edit its Load Script, reload it, and publish it to a stream via the Engine API (see the sketch after this list).
  • Duplicating an app that many times would increase memory usage since every sheet/app object would get copied as well
    • This isn't an issue because our app doesn't actually have any app objects saved to it. We use the CreateSessionObject method to generate transient app objects on the fly.
  • This could result in data being duplicated across several different apps, increasing the overall storage and memory usage
    • As I said above, there really isn't much overlap in the data that our users need to access. In 95% of use cases, each user is the only one with permission to view that partition of data. There might be a slight increase in memory and storage, but it should be negligible.
  • Why not just use ODAG?
    • I've already looked into ODAG as an alternative and found that it really does not meet our needs.
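
To illustrate the automation mentioned above, here is a rough sketch using enigma.js against the Engine API. The server URL, stream ID, user list, and the buildUserScript helper from earlier are assumptions for illustration; authentication, error handling, and throttling are omitted.

```typescript
import enigma from 'enigma.js';
import schema from 'enigma.js/schemas/12.170.2.json';
import WebSocket from 'ws';

// Placeholder inputs: in reality these would come from our user administration.
const users = [{ name: 'Client A', partitionKey: 'A' }]; // ~500 entries
const streamId = 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee'; // target stream (placeholder)

async function buildAllApps(): Promise<void> {
  const session = enigma.create({
    schema,
    url: 'wss://our-qlik-server.example/app/engineData', // placeholder URL, auth omitted
    createSocket: (url) => new WebSocket(url),
  });
  const global: any = await session.open();

  for (const user of users) {
    // 1. Create an empty app for this user.
    const { qAppId } = await global.createApp(`Dashboard - ${user.name}`);
    const app = await global.openDoc(qAppId);

    // 2. Same load script for every copy, differing only in the partition variable.
    await app.setScript(buildUserScript(user.partitionKey));

    // 3. Reload the reduced dataset and persist it.
    await app.doReload();
    await app.doSave();

    // 4. Publish the app to the stream our web app reads from.
    await app.publish(streamId, `Dashboard - ${user.name}`);
  }

  await session.close();
}
```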

Is there anything that I'm not accounting for? Like, is there a limit to how many apps you can have on Qlik Sense? Or something along those lines?


4 Replies
marcus_sommer

I wouldn't say it's a bad idea; rather, it may be the last measure to avoid hitting the response-time threshold of your environment. But IMO it should really be the last step, taken only after all other possibilities to optimize the app and the environment have been exhausted.

Usually it's recommended to develop an application's data model toward a star schema, but there may be other schemas with a smaller RAM footprint - if RAM really is the biggest bottleneck in the end. Before that, check whether all of the included data is really needed, and take a close look at the cardinality of the fields. There are probably many more opportunities to optimize this app. At the very least, the script runtime should be reducible with (more/better) incremental load approaches.

Beside this, I suggest considering whether you could reduce the dataset with a mixed granularity within the fact tables - maybe something like the last two months at an atomic level, the next 24 months aggregated at a daily level, and older data at a monthly level. Also consider whether it's practical to split the application into aggregated/detailed datasets, or younger/older periods, or similar, instead of splitting it per user.
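
Just to make the idea concrete - with placeholder table and field names, not your real model - such a mixed-granularity fact load could be built along these lines (shown here simply as the script string that would go into the app):

```typescript
// Illustrative only: a fact table mixing three granularities.
// Table/field names and the QVD path are placeholders.
const mixedGranularityFactScript = `
Facts:
LOAD OrderId, CustomerId, OrderDate, Amount          // atomic detail: last 2 months
FROM [lib://Data/orders.qvd] (qvd)
WHERE OrderDate >= AddMonths(Today(), -2);

Concatenate (Facts)
LOAD CustomerId, OrderDate, Sum(Amount) as Amount    // daily aggregate: months 3-26
FROM [lib://Data/orders.qvd] (qvd)
WHERE OrderDate < AddMonths(Today(), -2)
  and OrderDate >= AddMonths(Today(), -26)
GROUP BY CustomerId, OrderDate;

Concatenate (Facts)
LOAD CustomerId,
     MonthStart(OrderDate) as OrderDate,             // monthly aggregate: older data
     Sum(Amount) as Amount
FROM [lib://Data/orders.qvd] (qvd)
WHERE OrderDate < AddMonths(Today(), -26)
GROUP BY CustomerId, MonthStart(OrderDate);
`;
```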

In regard to the opening times within the access point: QlikView Server has an option to pre-load an application into RAM. Maybe a similar feature is available in Sense.

- Marcus

rwunderlich
Partner Ambassador/MVP

I like it. I think it's worth doing a POC. I did something similar with QV - which of course had Loop and Reduce - and it was very successful. We also had concerns about duplicating storage for charts, but found overall a large net savings, because out of the ~300 customers (apps), only 50-100 were active in a given week, and in a given day maybe only 25-50, spread across several time zones. So the Qlik Server managed the RAM very efficiently throughout the day.

Customers got good response because we were loading a small app on demand rather than repeatedly executing a Section Access reduce against a large set.

I don't think it would be too hard to craft a custom loop-and-reduce for QS. 

-Rob
http://masterssummit.com
http://qlikviewcookbook.com
http://www.easyqlik.com

dselgo_eidex
Partner - Creator III
Author

Thanks for the suggestions! And I agree, it should be the last resort. Unfortunately, I don't think any of these suggestions will be practical for our application.

I have tried to optimize the app's datamodel as much as possible, but at this point I'm not sure if there is more that I can do. All of the fields that we have in the app are needed and we have taken steps to reduce the data as much as possible while still maintaining the level of granularity that we need.

We have talked about introducing incremental loading and it would help to improve the script runtime, but at this point it is not as big of a concern to us. We are more concerned with the efficiency of the app for the client. I included the fact about the long load times to demonstrate the scale of the data.

Mixing the granularity within the fact tables also won't work for us. We've built our app on the idea of showing high-level aggregations while also being able to dig down into individual data points. We also show trended data quite often, so we can't split the dataset into younger/older periods.

mborsadw
Partner - Creator

This is an interesting way to deal with large volumes. Was it successful for you? Were you able to support it for a larger set of users (larger than 500 users)?

You mentioned ODAG would not work for you, could you clarify why?