gargi_bardhan
Creator

Approach to handling huge data

Hi Friends,

I need your advice again.

Here are the details:

  • 2 years of data
  • DB: Netezza
  • QV11 SR2
  • Number of records: 50 billion, coming from one fact table (most of the fields are keys, i.e. foreign keys to other tables)
  • Number of columns: 92 (currently the users want all of them in QV, so no column can be dropped)
  • Reload time for QV to generate a QVD for 1 month: 15 minutes

But while loading data for all 24 months, the QVW gets stuck after reading only 4 months.

Can anyone suggest an approach for managing this volume of data?

Thanks in advance.

Gargi

8 Replies
Anonymous

Maybe you should split the data load by month (e.g. one extractor QVW per slice, so several can run in parallel) and consolidate the QVDs into one in a final step, as sketched below.

Also make sure you actually need all 92 columns in the application.
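A rough sketch of that pattern in QlikView script, assuming a hypothetical fact table FACT_SALES with a TRANS_DATE column and a 1 Jan 2013 start date (all names and dates are placeholders):

// Stage 1: one QVD per month, so no single pull exceeds what the server can hold.
// Running several copies of this extractor over different month ranges gives the
// parallelism mentioned above.
FOR i = 0 TO 23
    LET vFrom = Date(AddMonths(MakeDate(2013, 1, 1), i), 'YYYY-MM-DD');
    LET vTo   = Date(AddMonths(MakeDate(2013, 1, 1), i + 1), 'YYYY-MM-DD');

    MonthFact:
    SQL SELECT *
    FROM FACT_SALES
    WHERE TRANS_DATE >= '$(vFrom)' AND TRANS_DATE < '$(vTo)';

    STORE MonthFact INTO [Fact_$(vFrom).qvd] (qvd);
    DROP TABLE MonthFact;
NEXT i

// Stage 2: a wildcard load auto-concatenates all the monthly QVDs into one table.
Fact:
LOAD * FROM [Fact_*.qvd] (qvd);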

ajaykumar1
Creator III

Hi,

As per my understanding:

First check with the customers whether they really need 2 years of data for their business purposes. If they do, check whether the source data is organised day-wise or month-wise.

Then load one month of data and measure how long it takes. Based on that load time, decide how many extractor/generator QVWs to use for creating the QVDs, and then consolidate them into one QVD (without losing any records coming from the system). Try to avoid loading other files at the same time.

If the server is low on RAM, try to increase it.

Regards,

Ajay

Anonymous

With my business analyst hat on, I'd seriously question why you need to load 50 billion rows!

Unless you are trying to do something particularly bizarre (please excuse my use of the word!), the only benefit you'll get from that volume of data is trends. I'd bring in a subset of the data (say 1 in every 1,000 records); this would still be 50,000,000 records and would give you a reasonable trend realisation. If you're trying to bring together summarised figures (like sales, profits, turnover, etc.) from the data, the better place to aggregate them is inside the data warehouse; after all, 50 billion rows is a lot of network traffic.

Having written the sentence above, I did a bit of maths: 50,000,000,000 rows with 92 fields per row and just 20 bytes per field (excluding database metadata) equates to roughly 92 TB of data. Given this, I can well understand why the server gives up the ghost after 4 months of data is loaded. I think it's probably better to go back to the business and ask the question again. Have a read of my post from 2012 about asking the right question:

http://inspiredbusinessintelligence.me/2012/01/05/the-business-analyst/
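For what it's worth, here is a sketch of the 1-in-1,000 sampling idea in QlikView script. Pushing the filter into the Netezza query matters, because QlikView's own Sample prefix still pulls every row across the network before discarding. The table and column names (FACT_SALES, TXN_ID) are assumptions:

// Push the sampling into the database so only ~50M of the 50B rows cross the wire.
// MOD on a surrogate key gives a deterministic 0.1% slice; Netezza's RANDOM()
// function could be used instead for a true random sample.
SampleFact:
SQL SELECT *
FROM FACT_SALES
WHERE MOD(TXN_ID, 1000) = 0;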

Anonymous

We did something similar with a large load of ~1.5 billion records, but we had another problem: source DB performance. We tried to aggregate the data there without any positive result, and decided to move the row-level data loading to the QVD side with an incremental process.

From the QVD files we aggregate the data to the granularity the analysis needs, and our final application is only 1.5 MB (see the sketch below).
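A sketch of that aggregation step, assuming hypothetical field names (CustomerKey, TransDate, Amount) in a row-level QVD:

// Collapse row-level detail to the monthly granularity the analysis needs,
// then store the much smaller result for the front-end application to load.
AggFact:
LOAD CustomerKey,
     MonthStart(TransDate) AS Month,
     Sum(Amount)           AS TotalAmount,
     Count(Amount)         AS TxnCount
FROM [FactRowLevel.qvd] (qvd)
GROUP BY CustomerKey, MonthStart(TransDate);

STORE AggFact INTO [FactAggregated.qvd] (qvd);
DROP TABLE AggFact;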

israrkhan
Specialist II

Share the log files so we can see where and why it gets stuck.

gargi_bardhan
Creator
Author

Thanks Kozins & Ajay for your replies. I was on vacation, so I couldn't reply earlier.

I have tried that as well, and as mentioned above, the issue remains the same.

Storing each month's QVD took 15 minutes.

But when loading them all at once, it gets stuck after 4 months.

gargi_bardhan
Creator
Author

Thanks Kevin for your time and suggestion, with which I totally agree. I too asked for only the columns relevant to the report being created, but came across a few questions:

1) There are two servers with 16 GB and 64 GB of RAM. What volume of data can QV handle in this configuration?

2) If we increase the RAM, is it guaranteed to handle this much bulk data?

Thanks again; it was a very good read.

Regards,

Gargi 

avinashelite

Hi Gargi,

Try the steps below:

1. Create separate QVDs per year, so that you retain all the historic data.

2. Then, on every reload, load only the current year's data from the database and the rest from the QVDs.

3. Implement an incremental load so that only new or changed records are fetched (see the sketch after this list).

4. Try to do all the complex aggregation calculations in the script rather than in chart expressions.
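A minimal sketch of the incremental pattern from step 3, assuming the fact table exposes a modification timestamp (MOD_DATE) and a unique key (PK_ID); all names here are placeholders:

// Work out when the historic QVD was last built; fall back to a very old
// date on the first run, when no QVD exists yet.
LET vLastRun = If(IsNull(QvdCreateTime('Fact.qvd')),
                  '1900-01-01 00:00:00',
                  Timestamp(QvdCreateTime('Fact.qvd'), 'YYYY-MM-DD hh:mm:ss'));

// Fetch only the rows inserted or changed since then.
Fact:
SQL SELECT *
FROM FACT_SALES
WHERE MOD_DATE >= '$(vLastRun)';

// Re-attach the untouched historic rows, skipping any key that was just
// re-fetched, then overwrite the QVD with the merged result.
IF NOT IsNull(QvdCreateTime('Fact.qvd')) THEN
    Concatenate (Fact)
    LOAD * FROM [Fact.qvd] (qvd)
    WHERE NOT Exists(PK_ID);
END IF

STORE Fact INTO [Fact.qvd] (qvd);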