Anonymous
Not applicable

tDenormalize taking too long and too much memory to run

Hi,

I am using the tDenormalize component to denormalize two columns across 1.3 million rows. It takes more than two hours to run and needs 12 GB of RAM. I'd like to know the complexity of the algorithm and whether there is a way to improve performance for high data volumes.

Thanks!

3 Replies
Anonymous
Not applicable
Author

Denormalising needs to keep all of the data in memory while it scans your 1.3 million records to see whether any links exist between them. That is never going to be easy or efficient. Is there a way you could group the data and chunk it before denormalising each chunk? I am sure that would speed this up.
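A minimal sketch of that group-then-denormalize idea in plain Java (the language Talend jobs compile to). This is not how tDenormalize is implemented internally, just an illustration; the Row record, the key/value field names, and the comma delimiter are hypothetical stand-ins for the real schema:

```java
import java.util.*;

public class ChunkedDenormalize {

    // Hypothetical row shape: one grouping key, one value to concatenate.
    record Row(String key, String value) {}

    // Groups rows by key in a single pass, then joins each group's values.
    // Memory now scales with the number of distinct keys and their values,
    // not with pairwise combinations of all rows.
    static Map<String, String> denormalize(List<Row> rows) {
        Map<String, StringJoiner> groups = new LinkedHashMap<>();
        for (Row r : rows) {
            groups.computeIfAbsent(r.key(), k -> new StringJoiner(","))
                  .add(r.value());
        }
        Map<String, String> out = new LinkedHashMap<>();
        groups.forEach((k, v) -> out.put(k, v.toString()));
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("A", "1"), new Row("B", "2"), new Row("A", "3"));
        // Prints {A=1,3, B=2}
        System.out.println(denormalize(rows));
    }
}
```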

Anonymous
Not applicable
Author

I ended up splitting out just the portion of the data that needed to be denormalized, and performance improved noticeably. The algorithm seems to have very high complexity, which I think could be improved.

Thanks!

Anonymous
Not applicable
Author

Unfortunately the problem requires that every row be potentially linked to every other row, or to none at all. That means everything has to go into memory. With 1,300,000 records you are essentially looking at 1,300,000 × 1,300,000 = 1,690,000,000,000 pairwise comparisons. I'm not sure you can avoid that number of comparisons unless you build heuristics into the algorithm that you would only know about if you know the dataset. It's the developer's job to build in those heuristics.
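One such heuristic, if the data can be pre-sorted on the grouping key (by the source database or an external sort), is to stream the rows and flush each group as soon as the key changes, so only one group is ever held in memory. A rough sketch under that assumption; the "key,value" CSV layout and the file names are hypothetical:

```java
import java.io.*;
import java.util.StringJoiner;

// Streaming denormalize over input pre-sorted by key. With sorted input,
// the job does n sequential reads instead of ~n^2 pairwise comparisons,
// and memory use is bounded by the largest single group.
public class StreamingDenormalize {
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader("sorted_input.csv"));
             PrintWriter out = new PrintWriter(new FileWriter("denormalized.csv"))) {
            String currentKey = null;
            StringJoiner values = new StringJoiner(",");
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.split(",", 2);
                if (currentKey != null && !currentKey.equals(parts[0])) {
                    out.println(currentKey + "," + values); // flush finished group
                    values = new StringJoiner(",");
                }
                currentKey = parts[0];
                values.add(parts[1]);
            }
            if (currentKey != null) {
                out.println(currentKey + "," + values);     // flush last group
            }
        }
    }
}
```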