1 Reply | Latest reply: Sep 8, 2017 8:09 AM by Marcus Sommer

    Loading large amounts of data incrementally with deletion (1 billion+ rows)

    Freddy Baardsen

      Hello,


      I am currently working on an application for Qlik Sense server; however, I have some technical and architectural issues to figure out.


      We have an application today where our customers can create a new project, set different parameters, calculate the project, and get a large amount of statistical data as output. The output consists of X datasets.


      The total output for each project can be between 2 and 10 million rows of extremely detailed data.


      After a project has run to completion in the application, the data has to be in Qlik Sense within minutes.


      The application has approximately 1,500 projects today, and it grows by 10-15 new projects each day.


      Each customer can "recalculate" any project: the old data in Qlik Sense has to be replaced by the new data.
      Each customer can delete any project at any time: the project has to be deleted in Qlik Sense as well.
      New project data must be in Qlik Sense within minutes.
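
      To make the requirement concrete, this is roughly the load-script pattern I have in mind for the merge step: one accumulated QVD per table, plus two hypothetical delta files from my extract service, changed_projects.csv (rows for new or recalculated projects) and deleted_projects.csv (IDs of removed projects). All paths and field names here are placeholders, not our real model.

        // 1. Rows for new and recalculated projects delivered since the last reload.
        Facts:
        LOAD ProjectID, RowID, Value
        FROM [lib://Extracts/changed_projects.csv]
        (txt, utf8, embedded labels, delimiter is ',');

        // 2. Seed the Exists() filter with the deleted ProjectIDs as well,
        //    so the historic load below skips them. Dropped again afterwards.
        DeletedProjects:
        LOAD ProjectID
        FROM [lib://Extracts/deleted_projects.csv]
        (txt, utf8, embedded labels, delimiter is ',');

        // 3. Append the history, skipping projects that were recalculated
        //    (already loaded in step 1) or deleted (seeded in step 2).
        Concatenate (Facts)
        LOAD ProjectID, RowID, Value
        FROM [lib://Extracts/Facts.qvd] (qvd)
        WHERE NOT Exists(ProjectID);

        DROP TABLE DeletedProjects;

        // 4. Persist the merged result for the next run.
        STORE Facts INTO [lib://Extracts/Facts.qvd] (qvd);

      One caveat with this pattern: WHERE NOT Exists() forces a non-optimized QVD read, which is considerably slower than an optimized one, and with billions of rows that difference matters.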


      I have created a service for extracting all the data from our application; the problem is how to load this data into Qlik Sense, taking into account that projects can get new data or be deleted.


      I have tried MongoDB, and it works fine for the whole incremental process, including deletion of changed/deleted projects, but as the size of the collections grows, the load time becomes very slow. I have tried both the Qlik connector and the official connector; the official connector was a bit faster than the Qlik Mongo connector. Load speed: 200-300k rows/sec.
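
      Part of the slowness may be that I reload whole collections. What I have been considering instead is to pull only the delta from MongoDB and merge it with QVDs as above; something like this, assuming the connector accepts SQL pass-through and the documents carry a ModifiedDate field (both assumptions on my part, and 'MongoDB' is just the connection name):

        // vLastReload would in practice be persisted between runs
        // (e.g. read from a QVD or a small text file).
        LET vLastReload = '2017-09-08 00:00:00';

        LIB CONNECT TO 'MongoDB';

        // Fetch only documents changed since the last reload.
        Delta:
        LOAD ProjectID, RowID, Value;
        SQL SELECT ProjectID, RowID, Value
        FROM projects_output
        WHERE ModifiedDate > '$(vLastReload)';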


      I have also tried flat files, which are fast, but with them I have no way to delete projects. Load speed: 4 million rows/sec.
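
      One idea I have toyed with to get deletes working with files: let the extract service write one file per project, e.g. project_<id>.qvd (the naming is just an example). Recalculation then overwrites the project's file, deletion removes it, and a wildcard load simply never sees deleted projects:

        // Each project lives in its own QVD; a deleted project has no file,
        // so the wildcard load handles deletion implicitly.
        Facts:
        LOAD ProjectID, RowID, Value
        FROM [lib://Extracts/project_*.qvd] (qvd);

      The obvious worry is the file count: with 1,500+ projects, that is a lot of small QVDs to open on every reload.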


      What is the best strategy to implement? The data can easily grow into billions of rows, and any of it can be changed or deleted by the customer at any time.


      Would it be better to "split" the data into a separate app for each customer, so that each load handles less data?
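
      If splitting is the way to go, I imagine each customer app would reduce to something like this, with a per-app variable and a per-customer folder (again, a hypothetical layout):

        // vCustomerID would be set individually in each customer's app.
        LET vCustomerID = 'customer42';

        // Load only this customer's project QVDs.
        Facts:
        LOAD ProjectID, RowID, Value
        FROM [lib://Extracts/$(vCustomerID)/project_*.qvd] (qvd);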