Hello,

We are working on a project that requires analysing 120 GB of data (which will grow to 6 TB in a couple of years). Our current approach is SQL database (DW) ---> QVD ---> QS, but we are not comfortable with it because we are doing change capture in Qlik (update as insert, using the Exists function on ID and date) and storing the data in QVDs. This process takes 5-6 hrs daily (sometimes more than that).

Here are the detailed steps:

Step 1: extract.qvw --> execute stored procedure, compare with past QVDs, and update the QVDs with new data --> 5-6 hrs
Step 2: Transform.qvw --> load all QVDs and create a star schema data model --> 30 mins to 1 hr
Step 3: QS app --> binary load Transform.qvw into the QS app --> 10 mins (the QS server has 1 TB of RAM)

Issues:
1. Process time: a daily 5-8 hr process means the business won't get data in time. If the business logic in the SQL stored procedure changes, we need a full history reload and have to recreate all the QVDs; this would take months if not years 😞
2. Scalability: in a couple of years it will reach around 6 TB (last 6 years of data plus future data), duplicating the same amount of data again in QVD format. The infrastructure and architecture teams are not happy.
3. Performance of the final app: loading this 6 TB into Qlik Sense memory (no idea how much memory is required) will definitely cause performance issues.

To resolve these issues, we would like to use a big data approach and QABDI:
1. Move all change capture logic and data loads to SQL Server --> Talend --> data lake (Hadoop)
2. Use QABDI and provide different kinds of options (live, ODAG, live apps)

Is this the right approach, or are we just moving the problem from one area to another?
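For context, our Step 1 change capture follows the usual Qlik update-as-insert pattern, roughly like this (heavily simplified; table, field, and path names are illustrative, not our real ones):

```qlik
// Illustrative sketch of the change-capture step in extract.qvw.
// 1) Pull new/changed rows from the stored procedure output:
Facts:
SQL EXEC dbo.usp_GetFacts;          // hypothetical SP name

// 2) Append historical rows from the existing QVD, skipping any ID
//    that was just re-extracted (update as insert). A single-field
//    Exists() filter keeps this an optimized QVD load.
Concatenate (Facts)
LOAD *
FROM [lib://QVDs/Facts.qvd] (qvd)
WHERE Not Exists(ID);

// 3) Write the merged result back:
STORE Facts INTO [lib://QVDs/Facts.qvd] (qvd);
```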
This does sound like a good approach. However, there are definitely some intermediate steps you can take to try to tackle this data size.
Have you implemented ODAG already? I would suggest doing that now, utilizing QVDs. I personally haven't found a use case where a user needs access to that much data at once, so implementing ODAG is usually an easy change-management process. If you build your template script to leverage optimized QVD loads, it should still be extremely quick. ODAG can also talk directly to your DB or data lake, which, if they have the resources, could potentially be even faster.
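To illustrate the optimized-load point, an ODAG template script can land the user's selections in a one-column table and then filter the QVDs with a single-field Exists(), which keeps the load optimized. A minimal sketch (field names and the numeric binding $(odn_Year) are illustrative; check the ODAG binding syntax for your field types):

```qlik
// Hypothetical ODAG template: Qlik expands $(odn_Year) to the
// Year values the user selected in the selection app.
Selected:
LOAD * INLINE [
Year
$(odn_Year)
];

// Single-field Exists() on Year -> the QVD load stays optimized.
Facts:
LOAD *
FROM [lib://QVDs/Facts_*.qvd] (qvd)
WHERE Exists(Year);

DROP TABLE Selected;
```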
The second thing that may be worth looking into is your extract. Do you know why it is taking 6 hours? Is there any way to speed this up?
QABDI sounds like a great use case here, and I think it is worth doing a POC. What's great is that everything mentioned above will compound the performance gains from QABDI. It will also help in scenarios where QABDI isn't a great fit or where a feature is currently lacking.
We haven't implemented ODAG yet. We need access to the history data (which is very large) for trend analysis, positions, and BM analysis. The extract takes 6 hours because the SQL SP output is created as multiple small tables (like data_20190101, 20190102, ...), and these are compared with the existing QVDs using the Exists function to add new and updated rows.
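Roughly, the extract loops over those daily tables like this (simplified sketch; in the real script the date list is generated dynamically, and names are illustrative):

```qlik
// Illustrative loop over per-day SQL tables (data_YYYYMMDD).
FOR Each vDate in '20190101', '20190102'   // real list is built dynamically

  // New/updated rows for this day:
  Updates:
  SQL SELECT * FROM data_$(vDate);

NEXT

// Then keep only historical QVD rows not superseded by the new extract:
Concatenate (Updates)
LOAD *
FROM [lib://QVDs/History.qvd] (qvd)
WHERE Not Exists(ID);

STORE Updates INTO [lib://QVDs/History.qvd] (qvd);
```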
I have one more question related to QABDI: does it support MS Azure Data Lake Gen2?