Loading large data volumes incrementally with deletion (1 billion+ rows)
I am currently working on an application for Qlik Sense server; however, I have some technical and architectural issues to figure out.
We have an application today where our customers can create a new project, set different parameters, calculate the project, and get a large amount of statistical data as output. The output consists of X datasets.
The total output for each project can be between 2 and 10 million rows of extremely detailed data.
After a project is run to completion in the application, the data has to be in Qlik Sense within minutes.
The application has approximately 1,500 projects today, and it grows by 10-15 new projects each day.
The requirements are:

- Each customer can recalculate any project: the old data in Qlik has to be replaced by the new.
- Each customer can delete any project at any time: the project has to be deleted from Qlik Sense.
- New project data must be in Qlik Sense within minutes.
I have created a service for extracting all the data from our application; the problem is how to load this data into Qlik Sense, taking into account that projects can receive new data or be deleted.
I have tried MongoDB, and the whole incremental process works fine, including deletion of changed/deleted projects, but as the collections grow, the load time becomes very slow. I have tried both the Qlik Mongo connector and the official connector; the official connector was a bit faster than the Qlik one. Load speed: 200-300k rows/sec.
I have also tried flat files, which are fast, but they give me no way to delete projects. Load speed: 4 million rows/sec.
What is the best strategy to implement? The data can easily grow into billions of rows, and it can be changed or deleted by the customer at any time.
Would it be better to split the data into a separate app for each customer, so that each load handles less data?
Re: Loading large data volumes incrementally with deletion (1 billion+ rows)
The fastest way to get data into Qlik is to use optimized QVD loads. This means you will need to store the data as QVD files, and on each run query only the new/changed or deleted data/projects from the database and store them as QVDs as well. For this, each project needs a unique ID. I think the various links here on incremental loads and optimized loads using Exists() will be quite useful for you:
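A minimal sketch of that pattern, handling both recalculation and deletion in one pass (all field names, file names, and library paths here are illustrative assumptions, not from the thread):

```
// SKETCH ONLY: ProjectID, Metric, lib://Source and lib://QVD are
// placeholder names assumed for illustration.

// 1. Load deleted project IDs FIRST into the ProjectID field, so the
//    optimized Not Exists() load below skips their history too.
DeletedProjects:
LOAD ProjectID
FROM [lib://Source/deleted_projects.csv] (txt);

// 2. Pull only new and recalculated projects from the source database.
Facts:
LOAD ProjectID, Metric
FROM [lib://Source/changed_projects.csv] (txt);

// 3. Append history for every project that was neither recalculated
//    nor deleted. A single-field Exists() is the only where clause an
//    optimized QVD load allows; keep it optimized, because in an
//    unoptimized load Not Exists() would keep only the first row per
//    ProjectID.
Concatenate (Facts)
LOAD ProjectID, Metric
FROM [lib://QVD/Facts.qvd] (qvd)
WHERE NOT Exists(ProjectID);

// 4. The deleted-ID key list has done its job; remove it.
DROP TABLE DeletedProjects;

// 5. Persist the refreshed history for the next incremental run.
STORE Facts INTO [lib://QVD/Facts.qvd] (qvd);
```

An alternative worth considering at this scale is one QVD per project (e.g. `Project_<ID>.qvd`): a recalculation overwrites that one file, a deletion simply deletes it, and the app reload is a loop of optimized loads over the folder, with no Exists() filtering needed.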