Hello,
We have a big document, about 50 GB in memory.
Opening this document in a "clear" state, or via binary reload, takes about 3-5 minutes. However, if any selection was made before the document was closed, it can take an hour or more. While the document is coming up, Task Manager shows up to 100% CPU utilization for 1-2 minutes and 48 GB of memory in use (out of 384 GB); then CPU utilization falls to 3-4% and it just waits.
CPU is Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz.
The application itself has 8 charts, with an average calculation time of about 30 seconds per chart.
Would you provide any suggestions?
Thanks!
Boris
Hello, Boris
I suggest splitting this document using document chaining.
More information: Best Practices for Data Modelling (page 19).
This is a really big app. What is the app size on disk?
That is an impressive document.
To improve first-page rendering, try:
You didn't mention how many CPUs and cores your server has. Given a very rough rule of thumb of 8 GB per core, you should have 48 cores (so eight 6-core CPUs, or something like that).
Thanks!
Here are some answers:
The document has been analyzed and redundant fields have been removed. All data is in one fact table (550 million records, 180 columns) plus one small dimension table (60 records). Size on disk is 44 GB.
The server has 16 cores - you mean that's not enough? Filtering before the document was closed doesn't help the open time.
That is a lot of both rows and columns. 180 columns, and you use them all? What do they contain? Text? How many distinct values per column? If you cannot reduce the columns and the distinct values, you should seriously consider splitting the application.
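To see which columns are worth attacking first, one option is to log the distinct-value count of every field at the end of the load script. A sketch, assuming your fact table is named Facts (adjust the name to your model):

```
// Log the distinct-value count of every field in the fact table,
// so high-cardinality columns stand out in the script execution log.
FOR i = 1 TO NoOfFields('Facts')
    LET vField = FieldName($(i), 'Facts');
    TRACE Field $(vField): $(=FieldValueCount('$(vField)')) distinct values;
NEXT i
```

Fields with millions of distinct values (IDs, timestamps, free text) are the ones that inflate both RAM footprint and open time the most.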
If there are high-cardinality fields in your app, like row IDs or timestamps, you could reduce the app size (and open time) a lot by splitting these fields into two or more lower-cardinality fields, like this pattern:
date(floor(Timestamp), FORMAT) as Date,
time(frac(Timestamp), FORMAT) as Time
This logic can be applied to other numeric or string fields, too.
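Putting the pattern above into a load statement might look like this. A sketch only: the source path and the RowID field are hypothetical, so adjust them to your model:

```
// Split a high-cardinality timestamp into two low-cardinality fields:
// Floor() keeps the integer day part, Frac() keeps the fractional time part,
// so Date and Time each have far fewer distinct values than raw Timestamp.
Facts:
LOAD
    Date(Floor(Timestamp)) as Date,
    Time(Frac(Timestamp), 'hh:mm:ss') as Time,
    AutoNumber(RowID) as RowID   // optional: sequential integers compress better than long ID strings
FROM [lib://Data/facts.qvd] (qvd);
```

Note that AutoNumber only helps if you don't need the original ID values for display or export; the mapping is not reversible.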
- Marcus
By the way, it is 44 GB on disk and near 50 GB in memory? What compression format are you using?
Do you use Section Access in the document?
This can increase load time while binary loads are unaffected.
180 columns! Ouch!
When you are dealing with those kinds of row volumes, you need to be looking at 20-30 columns, no more. Especially if you have lots of text values.
Do you have a table/chart that tries to show more than 1000 rows/bars/lines/dots on open?
Is this a VM or physical server?