Query

amitavasen — Sat, 29 Aug 2020 10:34:45 GMT

A company generates 1 GB of ticketing data daily. The data is stored in multiple tables. Business
users need to see trends of tickets processed for the past two years. Users very rarely access the
transaction-level data for a specific date. Only the past two years of data must be loaded, which is 720
GB of data. Which method should a data architect use to meet these requirements?
A. Load only aggregated data for two years and use ODAG for transaction data
B. Load only two years of data in an aggregated app and create a separate transaction app for
occasional use
C. Load only two years of data and use best practices in scripting and visualization to calculate and
display aggregated data
D. Load only aggregated data for two years and use Direct Discovery for transaction data

Re: Query

marcus_sommer — Mon, 31 Aug 2020 09:43:46 GMT

The raw data size in the database isn't equal to the data size in Qlik. First of all, not all fields and records from the database are necessary or sensible in Qlik.

Quite often there are a lot of outdated fields, various record IDs from the tables, and so on, which are seldom needed (sometimes during development for data checks, but not in the final dashboards). Furthermore, Qlik uses a special storage method: it stores only the distinct values of each field in symbol tables and keeps bit-stuffed pointers to those values in the data tables. Depending on your data and requirements, your raw data of approximately 720 GB might end up as 20 - 30 GB in Qlik, which, depending on your environment, may be workable as a single application.
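To make the symbol-table idea concrete, here is a minimal Python sketch of why a low-cardinality field shrinks so much. It is only a rough model, not Qlik's actual internal format: the function `estimate_field_bytes`, the sample status values, and the use of string length as a stand-in for stored size are all assumptions made for illustration.

```python
def estimate_field_bytes(values):
    """Roughly model one field as a symbol table plus bit-stuffed pointers.

    Assumption: a distinct value costs about len(str(value)) bytes, and each
    row stores only a pointer wide enough to address all distinct values.
    """
    distinct = set(values)
    # Symbol table: every distinct value is stored exactly once.
    symbol_bytes = sum(len(str(v)) for v in distinct)
    # Pointer width in bits needed to address len(distinct) symbols (at least 1).
    pointer_bits = max(1, (len(distinct) - 1).bit_length()) if distinct else 0
    # Data table: one pointer per row, packed bit by bit.
    pointer_bytes = (len(values) * pointer_bits + 7) // 8
    return symbol_bytes, pointer_bits, pointer_bytes


# Hypothetical field: 1,000,000 ticket rows with only 4 distinct statuses.
statuses = ["open", "closed", "pending", "escalated"]
rows = [statuses[i % 4] for i in range(1_000_000)]

raw_bytes = sum(len(v) for v in rows)  # size if every row kept its own string
symbol_bytes, pointer_bits, pointer_bytes = estimate_field_bytes(rows)

print(f"raw: {raw_bytes} bytes")
print(f"symbol table: {symbol_bytes} bytes, pointers: {pointer_bits} bits/row, "
      f"{pointer_bytes} bytes total")
```

In this toy case each row needs only a 2-bit pointer, so the field takes a small fraction of its raw size. A field with nearly unique values (such as a record ID or a timestamp) would compress far less, which is one reason dropping such fields matters.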

Whether it's useful to aggregate the data depends on the biggest bottleneck in your environment and on the aggregation rate of your records. Depending on the level of detail needed, 100 records may become 1 record, in which case the aggregation will have a real impact on UI performance, or they may remain at about 50 records, in which case the difference might not be very noticeable in the UI.
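One way to check the aggregation rate before committing to an aggregated app is to count how many rows survive a group-by on the dimensions the dashboards actually need. The following Python sketch shows the idea on made-up data; the field names (`month`, `date`, `queue`, `agent`) and the helper `aggregation_rate` are hypothetical, and in practice the same count could be taken in the database or in the load script.

```python
def aggregation_rate(records, keys):
    """Return (rows_before, rows_after) when grouping records by the given keys."""
    groups = {tuple(r[k] for k in keys) for r in records}
    return len(records), len(groups)


# Hypothetical sample: 10 days x 2 queues x 5 agents = 100 ticket rows.
tickets = [
    {"month": "2020-08", "date": f"2020-08-{day:02d}", "queue": queue, "agent": agent}
    for day in range(1, 11)
    for queue in ("billing", "support")
    for agent in range(5)
]

# Coarse trend level: everything collapses to one row per month.
before, after_month = aggregation_rate(tickets, ["month"])
# Finer level: one row per date and agent, so only half the rows disappear.
_, after_date_agent = aggregation_rate(tickets, ["date", "agent"])

print(f"{before} rows -> {after_month} row(s) by month")
print(f"{before} rows -> {after_date_agent} row(s) by date and agent")
```

If the realistic grouping only halves the row count, as in the second case, the aggregated app is unlikely to feel much faster than the detailed one.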

Therefore your question can't be answered in general. Instead, you need to test carefully against your requirements and the performance and capabilities of your environment to determine which approach is most suitable for your case. Personally, I would lean toward C, then B, then A ...

- Marcus
