Data Load Optimization

chaudharybilal · 2018-09-18

Hi,

We have to load a data set totaling 75 GB. Four tables are involved: the largest has around 1.5 billion records, and the remaining tables have fewer than 100 million each. The data resides in Teradata, and we want to pull all of it, but this is taking far too long. To get an idea of how long a full load would take, we limited the data to 300 million records (around 25 GB). Loading that subset took around 16 hours.

We looked at the logs: fetching the data for the first table alone took around 10 hours, while the query itself executes on the database in no more than 2 minutes.
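A quick back-of-the-envelope calculation makes the scale of the problem concrete. This sketch uses only the figures quoted above and assumes, as a simplification, that load time scales linearly with data volume (1 GB taken as 1024 MB).

```python
# Back-of-the-envelope load estimate using the figures quoted in the post.
# Assumption: load time scales linearly with data volume (a simplification).

SAMPLE_GB = 25               # size of the test subset
SAMPLE_ROWS = 300_000_000    # rows in the test subset
SAMPLE_HOURS = 16            # time taken to load the test subset
FULL_GB = 75                 # size of the full data set
FETCH_HOURS_TABLE1 = 10      # fetch time for the first table (from the logs)
QUERY_MINUTES_TABLE1 = 2     # query execution time on the database

seconds = SAMPLE_HOURS * 3600
mb_per_sec = SAMPLE_GB * 1024 / seconds           # effective transfer rate
rows_per_sec = SAMPLE_ROWS / seconds              # effective row rate
full_load_hours = SAMPLE_HOURS * FULL_GB / SAMPLE_GB  # linear extrapolation
query_share = QUERY_MINUTES_TABLE1 / (FETCH_HOURS_TABLE1 * 60)

print(f"Throughput: {mb_per_sec:.3f} MB/s, {rows_per_sec:,.0f} rows/s")
print(f"Estimated full load: ~{full_load_hours:.0f} h per run")
print(f"Query execution share of table-1 time: {query_share:.1%}")
```

This works out to roughly 0.44 MB/s (about 5,200 rows/s) and about 48 hours per full run, with query execution accounting for only about 0.3% of the first table's time. Almost all of the time is therefore spent moving rows from Teradata into Qlik, not executing the query.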

Please suggest a way to optimize this process. We are currently using all default settings (nothing has been changed) and the Teradata connector that ships with Qlik Sense. The data is not incremental, and we have to load the same volume twice each month.

The data is aggregated by a bi-monthly ID, so it can't be divided by date.

Thanks

Ivan_Bozov · 2018-09-18

Have you looked at Qlik's On Demand App Generation capability? I suggest giving it a try.

Video: Qlik and Big Data: On Demand App Generation (YouTube)


nsetty · 2018-09-18

I assume that once the data is fetched, it is being stored in QVDs?

marcus_sommer · 2018-09-18

I suggest considering an incremental load approach that stores the data in QVDs, so that only new, changed, or deleted records need to be queried. Even if no date field is available, there are probably other ID fields that could be used for this in some way.
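The logic of such an incremental load can be sketched outside Qlik. This is a minimal Python illustration, not Qlik load script: it assumes a hypothetical stable key per record, some way to fetch only the new/changed rows (for example via an ID or version field, as suggested above), and a cheap query returning the keys still present in the source. The stored QVD is modeled as a dict keyed by that key. In Qlik script the same three steps are usually written as a filtered SELECT for the changed rows, a load of the existing QVD with WHERE NOT EXISTS on the key, and an INNER JOIN against the current key list to drop deleted records.

```python
from dataclasses import dataclass


# Hypothetical record shape: a stable key, a version value that changes
# whenever the source row changes, and the rest of the row as a payload.
@dataclass(frozen=True)
class Row:
    key: int
    version: int
    payload: str


def incremental_merge(
    stored: dict[int, Row], changed: list[Row], current_keys: set[int]
) -> dict[int, Row]:
    """Combine the cached snapshot with only the new/changed rows fetched
    from the database, then drop keys that no longer exist in the source."""
    merged = dict(stored)        # start from a copy of the cached (QVD-like) data
    for row in changed:          # new or updated rows replace cached versions
        merged[row.key] = row
    # Deleted records: keep only keys the source still reports.
    return {k: v for k, v in merged.items() if k in current_keys}


if __name__ == "__main__":
    cache = {1: Row(1, 1, "a"), 2: Row(2, 1, "b"), 3: Row(3, 1, "c")}
    delta = [Row(2, 2, "b2"), Row(4, 1, "d")]   # row 2 updated, row 4 new
    result = incremental_merge(cache, delta, current_keys={1, 2, 4})
    print(sorted(result))                       # row 3 was deleted at source
```

The point of the design is that only `delta` and the key list travel over the network on each run; the bulk of the data is read from the local snapshot.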

If this really isn't possible, you need to find your biggest bottleneck, which might be network performance and/or the performance of the driver.

If you can't improve the network performance, it might help to bring a Sense client into the Teradata environment, create the QVDs there (depending on the kind of data, a QVD can be significantly smaller than the raw data), and then transfer them into your Sense environment.

I don't know Teradata and its features, but I could imagine that simply exporting the data to CSV and creating the QVDs from those files could be faster than querying the data through a database driver.

- Marcus
