Hi everyone,
I hope you're doing well.
One option to see the 'total' RAM usage during a reload is to set up a Perfmon data collector set, log everything to a file, and then manually work out the peak RAM usage between the start and end of the reload:
https://techcommunity.microsoft.com/blog/askperf/windows-performance-monitor-overview/375481
Thank you @lachlanwcast
I like your suggestion. I'll give it a try.
I'll leave this thread open to see if there are other solutions.
-Youssef
You will get this information in the Qlik Cloud Readiness app.
@PrashantSangle
Nope, all ETL apps (apps that include drop tables) show "zero" for the used base RAM, both in the Readiness app and in all the other monitoring apps out there.
Thanks anyway.
-Youssef
You may derive the base RAM by reading the information from the stored files via FileSize(), and/or by temporarily storing and dropping every load step as a file and fetching that information, too.
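A minimal sketch of that first approach in load-script form, assuming the intermediate result is stored as a QVD; the table name Orders and the lib://DataFiles path are just placeholders:

// Store the intermediate table as a QVD and capture its file size.
Store Orders into [lib://DataFiles/tmp_orders.qvd] (qvd);

SizeLog:
Load
    'Orders'                                   as TableLogged,
    FileSize('lib://DataFiles/tmp_orders.qvd') as BytesOnDisk,
    Now()                                      as LoggedAt
AutoGenerate 1;

// Drop the temporary table again once measured; keep in mind the QVD size
// is only an approximation of the in-RAM footprint, not an exact match.
Drop Table Orders;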
Another approach might be to read the metadata of the tables by looping over all fields: FieldValueCount() * an assumed average of bytes per value, plus the size of the table index per NoOfRows(). The results would never be exact, but they are suitable for getting an order of magnitude.
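A rough sketch of the second approach as a loop over the data model; vAvgBytesPerValue and vBytesPerPointer are assumed averages, not measured values, so the result is only an order of magnitude:

Let vAvgBytesPerValue = 8;       // assumption: average bytes per distinct value
Let vBytesPerPointer  = 4;       // assumption: bytes per row/field pointer in the table index
Let vNumTables = NoOfTables();   // capture before the log table below is added

For t = 0 to $(vNumTables) - 1
    Let vTable = TableName($(t));
    Let vRows  = NoOfRows('$(vTable)');
    Let vBytes = 0;

    // symbol tables: distinct values per field * assumed bytes per value
    For f = 1 to NoOfFields('$(vTable)')
        Let vField = FieldName($(f), '$(vTable)');
        Let vBytes = $(vBytes) + FieldValueCount('$(vField)') * $(vAvgBytesPerValue);
    Next f

    // table index: one pointer per row and field
    Let vBytes = $(vBytes) + $(vRows) * NoOfFields('$(vTable)') * $(vBytesPerPointer);

    RamEstimate:
    Load
        '$(vTable)' as EstTable,
        $(vRows)    as EstRows,
        $(vBytes)   as EstBytes
    AutoGenerate 1;
Next t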
Neither method is without some effort: the first one is rather simple and the second quite complex. But it won't be mandatory to apply them to everything, just the few heaviest applications - I assume you already know which ones those are.
Besides the above, be aware that the RAM consumption during any joins/aggregations/crosstable/... might be much bigger than the final output data. To fetch this kind of information you will need to look from the outside at the events of the ETL, like mentioned by @lachlanwcast (you may also look for usable information within the performance logs of the QS server).
Thank you @marcus_sommer
Very interesting approaches.
I'll take both and give them a try.
I don't think I will need the RAM consumption during joins/aggregations/crosstables etc., because it will not be handled by the resources of the machines that will host the Qlik Data Gateway; it will instead use the resources of the Qlik Cloud tenant.
My need is basically to calculate the amount of on-prem data uploaded to the tenant, in order to size the Data Gateway machines correctly.
-Youssef
This means that mainly the sources of the raw data define the sizing, and that the SQL/storage system should rather be the one to answer the question. This might be further impacted by any incremental approaches or data compression, whereby such measurements would also take some effort.
It would help if it were exactly the same size as the raw data, but every SQL script contains an explicit SELECT statement, not really a SELECT *.
They select specific columns and apply filters and aggregations, which reduces the amount of data compared to the raw/total data.
I think the most pragmatic approach would be to look manually at the 5 biggest loads and do a simple calculation of:
((F1 * DataType) + (F2 * DataType) + (F3 * DataType) + ....) * Records
whereby Records is then an approximate assumption of the row count after the % reduction from the filter conditions and aggregations.
It should be enough in regard to the max peak, because the load tasks are very likely scheduled in defined chains: the number of parallel tasks is restricted by the performance of the database and the network, and there might also be various dependencies in regard to the task order.
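For illustration, a back-of-the-envelope version of that calculation in script form; the byte widths per field, the row count and the 40% reduction factor are all made-up assumptions:

// Three fields at roughly 8, 4 and 20 bytes per value, 10 million source rows,
// filters/aggregations assumed to keep about 40% of them.
Let vBytesPerRecord = 8 + 4 + 20;
Let vRecords        = 10000000 * 0.4;
Let vEstimatedMB    = $(vBytesPerRecord) * $(vRecords) / Pow(1024, 2);
Trace Estimated data per reload: $(vEstimatedMB) MB;   // roughly 122 MB in this example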