
Loading a lot of data incrementally with deletion (1 billion+ rows)

Hello,

I am currently working on an application for Qlik Sense server; however, I have some technical and architectural issues to figure out.

We have an application today where our customers can create a new project, set different parameters, calculate the project, and get a large amount of statistical data as output. The output consists of X datasets.

The total output for each project can be between 2 and 10 million rows of extremely detailed data.

After a project is run to completion in the application, its data has to be in Qlik Sense within minutes.

The application has approximately 1,500 projects today, and it grows by 10-15 new projects each day.

Each customer can "recalculate" any project: the old data in Qlik has to be replaced by the new data.
Each customer can delete any project at any time: the project has to be deleted in Qlik Sense as well.
New project data must be in Qlik Sense within minutes.


I have created a service for extracting all the data from our application; the problem is how to load this data into Qlik Sense, taking into account that projects can get new data or be deleted.

I have tried MongoDB, and it works fine for the whole incremental process, including deletion of changed/deleted projects, but as the collections grow the load time becomes very slow. I have tried both the Qlik Mongo connector and the official MongoDB connector; the official one was a bit faster. Load speed: 200-300k rows/sec.

I have also tried flat files, which are fast, but I have no way to delete projects there. Load speed: 4 million rows/sec.
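
For reference, the flat-file load is essentially just a wildcard load over the export directory; a minimal sketch, with a hypothetical lib:// path and file naming:

// Minimal sketch of the flat-file approach; path and naming are hypothetical.
// One wildcard load over all exported files is very fast, but it leaves no
// per-project handle for removing a deleted project's rows afterwards.
Projects:
LOAD *
FROM [lib://ProjectData/projects_*.csv]
(txt, utf8, embedded labels, delimiter is ',');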

What is the best strategy to implement? The data can easily grow into billions of rows, and it can be changed or deleted by the customer at any time.

Would it be better to "split" the data into a separate app for each customer, so that each load handles less data?

1 Reply
marcus_sommer

The fastest way to get data into Qlik is to use optimized QVD loads. This means you will need to store the data as QVD files, so that only the new/deleted data/projects are queried from the database and then also stored as QVDs. For this, each project needs a unique ID. I think the various links to incremental loads and optimized exists() loads here will be quite useful for you:

Advanced topics for creating a qlik datamodel
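
For illustration, a minimal sketch of that pattern, assuming a connection named 'DB', a source table 'projects', a unique key 'ProjectID', a 'ModifiedDate' column, and a variable vLastExecTime holding the previous reload time (all hypothetical names):

// Incremental load with insert, update and delete, keyed on ProjectID.
LIB CONNECT TO 'DB';

// 1) Pull only the projects changed since the last successful reload.
Projects:
SQL SELECT ProjectID, ProjectData, ModifiedDate
FROM projects
WHERE ModifiedDate >= '$(vLastExecTime)';

// 2) Append the unchanged history from the existing QVD. A where-clause
//    with a single-field exists() keeps this QVD load optimized.
Concatenate (Projects)
LOAD * FROM [lib://Data/Projects.qvd] (qvd)
WHERE NOT Exists(ProjectID);

// 3) Handle deletions: keep only the IDs that still exist in the source.
Inner Join (Projects)
SQL SELECT ProjectID FROM projects;

// 4) Persist the refreshed dataset for the next run.
STORE Projects INTO [lib://Data/Projects.qvd] (qvd);

At billions of rows it can also pay off to keep one QVD per project (or per customer): a recalculation then rewrites only one small file, a deletion is just removing that file, and it maps naturally onto a per-customer app split.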

- Marcus