Looks like all 5 fields in your AggregatedFact table are also in your FactCSV table.
So this will attempt to generate a 5-field synthetic key, which, depending on your actual data, could end up huge, gobble up your RAM and take ages to load.
Maybe change your AggregatedFact load to something like this (a sketch, assuming FactCSV is already loaded so the aggregation can be a Resident load):

AggregatedFact:
LOAD
    StoreId,
    GoodsId as AGGRGoodsId,
    RegionSomeMappedID as AGGRRegionSomeMappedID,
    sum(SellingSum) as AGGRSellingSum,
    sum(PurchaseSum) as AGGRPurchaseSum
Resident FactCSV    // adjust the source if FactCSV is loaded differently
GROUP BY StoreId, GoodsId, RegionSomeMappedID;

Renaming every shared field except StoreId means the two tables no longer have five field names in common, so no synthetic key gets built.
Even if this loads in a reasonable time, I doubt it will be your end game, as the only association between the tables will be on StoreId. I expect you may need a composite key to associate your two tables (see the sketch below).
Or, if you only need the aggregations, you could simply drop the FactCSV table at the end of the script.
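If you do end up needing that composite key, a common Qlik pattern is autonumber() over the shared dimensions. A minimal sketch, again assuming FactCSV is already loaded (the %AggrKey name and the '|' separator are just my choices):

// 1) Aggregate as before, keeping the three dimensions for key building.
AggregatedFactTmp:
LOAD
    StoreId, GoodsId, RegionSomeMappedID,
    sum(SellingSum) as AGGRSellingSum,
    sum(PurchaseSum) as AGGRPurchaseSum
Resident FactCSV
GROUP BY StoreId, GoodsId, RegionSomeMappedID;

// 2) Replace the shared dimensions with one compact integer key on both
//    sides, so the tables associate on %AggrKey only.
AggregatedFact:
LOAD
    autonumber(StoreId & '|' & GoodsId & '|' & RegionSomeMappedID) as %AggrKey,
    AGGRSellingSum,
    AGGRPurchaseSum
Resident AggregatedFactTmp;
DROP TABLE AggregatedFactTmp;

Fact:
LOAD *,
    autonumber(StoreId & '|' & GoodsId & '|' & RegionSomeMappedID) as %AggrKey
Resident FactCSV;
DROP TABLE FactCSV;    // or stop here and keep only AggregatedFact if the aggregations are all you need

autonumber() returns the same integer for the same input string within one reload, so the keys on both sides line up, and sequential integers are far lighter on RAM than long concatenated strings at these row counts.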
Yup, in Qlik script a Group By will run single-threaded, whereas an RDBMS will often multithread it.
Group By aggregations will always be heavy on machine resources, be it Qlik or a database, and one has to make a considered choice of which machine's resources to hammer. If it is a Production database, then taking load off it is often preferred, as otherwise End User OLTP performance can suffer; plus, if you have RDBMS licensing per core, the financial difference can be huge.
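For comparison, pushing the Group By down to the database from the Qlik load script looks roughly like this (the connection and table names are placeholders; in QlikView it would be ODBC CONNECT TO rather than LIB CONNECT TO):

LIB CONNECT TO 'MyDWH';    // placeholder data connection

AggregatedFact:
LOAD
    StoreId,
    GoodsId as AGGRGoodsId,
    RegionSomeMappedID as AGGRRegionSomeMappedID,
    SellingSum as AGGRSellingSum,
    PurchaseSum as AGGRPurchaseSum;
SQL SELECT
    StoreId, GoodsId, RegionSomeMappedID,
    SUM(SellingSum) AS SellingSum,
    SUM(PurchaseSum) AS PurchaseSum
FROM FactTable
GROUP BY StoreId, GoodsId, RegionSomeMappedID;

Here the database's multithreaded Group By does the heavy lifting and only the aggregated rows cross the wire, so the choice really is about whose CPU cycles and licences you spend.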
Some years back I gave a presentation to the board and said I estimated that offloading such workload from the Oracle database servers to the Qlik server would save them approx GBP 5 million over five years. That was quite a claim, and the CFO picked up on it and asked the CIO what he thought of it. The CIO responded that it was probably in the right order of magnitude. The board approved the Qlik project and has not regretted the decision.
Notwithstanding all that, one of the strengths of Qlik only becomes apparent when data is loaded at the transactional level rather than pre-aggregated: the End Users in the dashboards can then dynamically slice 'n dice and select dimensions interactively, in ways that quite likely would never have been thought of whilst specifying which aggregations were needed.
Doing this with 300 million fact rows would need a well-designed and tuned data model and front-end GUI. But I have a Qlik app with over a billion fact rows, and it performs adequately and the End Users love it. It does have a fair stash of RAM and processor cores for Qlik, though.
Thank you, Bill, for your answer, really helpful!
Money always matters :)
My real problem is that the underlying transactional fact table is way bigger than 300M rows; that is actually just the monthly increment. I have serious doubts that raw transactions should be put into Qlik even for one year in this case, as the amount of memory needed for such a cluster would be very expensive. Hence, I have to aggregate the data.
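For what it's worth, a common pattern for that is to aggregate each monthly increment as it arrives and accumulate the results in a QVD, so the raw transactions never all sit in RAM at once. A rough sketch, where the file names and the vMonth variable are my inventions:

// Aggregate only the new month's transactions, never the full history.
NewAggr:
LOAD *, '$(vMonth)' as Month;    // preceding load tags the increment with its month
LOAD
    StoreId, GoodsId, RegionSomeMappedID,
    sum(SellingSum) as AGGRSellingSum,
    sum(PurchaseSum) as AGGRPurchaseSum
FROM [lib://Data/Fact_$(vMonth).csv] (txt, embedded labels, delimiter is ',')
GROUP BY StoreId, GoodsId, RegionSomeMappedID;

// Append the previously aggregated months, if the history QVD exists, then persist.
If not IsNull(QvdCreateTime('lib://Data/AggrHistory.qvd')) then
    Concatenate (NewAggr)
    LOAD * FROM [lib://Data/AggrHistory.qvd] (qvd);
End If
STORE NewAggr INTO [lib://Data/AggrHistory.qvd] (qvd);

The app then only ever loads AggrHistory.qvd, which grows by the aggregated rows per month rather than by the raw transaction count.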
OK, so Group By runs single-threaded. What about the formulas in measure calculations when a user makes a selection in the app? Do they run single- or multi-threaded?