We have a huge number of reload tasks in Qlik Sense, which we use to parallelize the data load. The short version of this question: does anyone know a good approach to maintaining or reducing these tasks?
The long version 🙂
1: We have two environments (Production & Development), which contain the same tasks. This provides the production data as well as fresh development data. Also, it is hard to separate the task building according to our app building (I will explain shortly).
2: We have a task chain for every data source, so we can handle changes in the sources better. For example, two of these sources load hourly, while the rest of the tasks are started daily via web services.
3: Here is our approach to the app/data load build: for almost every required table of a data source, a separate app is built. This approach lets us load the data in parallel while keeping the scripts maintainable.
You can see the coarse structure of our task chain in the following image.
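To make the structure concrete, here is a rough sketch of how one app, and therefore one reload task, is derived per table and per environment (all source and table names below are made up for illustration):

```python
# Hypothetical layout; our real source and table names differ.
sources = {
    "ERP": ["Orders", "Customers", "Invoices"],
    "CRM": ["Leads", "Contacts"],
}
environments = ["PROD", "DEV"]

# One extract app per table, duplicated for each environment.
tasks = {
    env: {
        source: [f"Reload {env} {source} {table}" for table in tables]
        for source, tables in sources.items()
    }
    for env in environments
}

print(tasks["PROD"]["ERP"][0])  # Reload PROD ERP Orders
```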
This, however, means that our task count is calculated as follows:
[Number of environments] * [Total required tables across all data sources]
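Plugged with some example numbers (purely illustrative, not our real counts), the formula works out like this:

```python
environments = 2          # Production + Development
tables_per_source = {     # example values only
    "ERP": 3,
    "CRM": 2,
}

# Every table becomes one app/task, once per environment.
total_tasks = environments * sum(tables_per_source.values())
print(total_tasks)  # 10
```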
So, with all that said: does anyone have a better idea or scheme?
Btw, the serial load time is about 50-65 min, while the current parallelized load takes about 12-16 min. The maximum number of concurrent reloads is set to 6. But the task count will only increase over time, and the speed-up alone does not solve the poor maintainability.
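For reference, those figures translate to roughly a 3x-5x speed-up:

```python
serial_min, serial_max = 50, 65      # minutes, serial load
parallel_min, parallel_max = 12, 16  # minutes, parallelized load

worst_case_speedup = serial_min / parallel_max   # 50 / 16
best_case_speedup = serial_max / parallel_min    # 65 / 12
print(round(worst_case_speedup, 1), round(best_case_speedup, 1))  # 3.1 5.4
```

With the concurrency cap at 6, about 6x is the theoretical ceiling anyway, so the chain itself is reasonably efficient; the pain point is the number of tasks, not the runtime.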