Not applicable

Problems loading huge data

Currently I'm having a problem with the out-of-virtual-memory message. I want to load a spreadsheet that is about 22 million rows and about 5 columns, and I can get to about 20 million rows before I get the out-of-RAM message (and then my computer runs so slowly I have to restart afterward). Are there any tricks for getting around this? Can I load the first 12 million records and then concatenate-load the last 10 million, or will that just give the same error? I'll post the situation below, which I think might lead to a solution...

The CSV (about 2 GB) is created daily and has all of the invoices sent by my company. There are a few differences in the CSVs from day to day:

1) New invoices from yesterday are on today's CSV

2) Some invoices were voided and those invoices are updated (very few)

3) There is an [Amount Due] field which is updated when the invoice is paid off

That being said, is there some way that I could split the entire CSV into, say, 3 or 4 QVDs, load them all, and then somehow use exists() to get the rows that changed so I can update this every day? I think this is somehow connected to incremental load, but I don't have a "modification date" field... I would just want to load "what is different" and delete the old record, but I'm not sure how to do that, and I don't really trust the exists() function for some reason.
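Just to make the splitting idea concrete, here is roughly what I had in mind (the file name, field names, and row boundaries below are made up, not my actual data):

// Carve the daily CSV into QVD buckets by record number.
Bucket1:
LOAD @1 as [Invoice Number],
     @2 as [Account Number],
     @3 as [Invoice Date],
     @4 as [Amount Due]
FROM whatever.csv (txt, no labels, delimiter is ',')
WHERE RecNo() <= 12000000;
STORE Bucket1 INTO Invoices_1.qvd (qvd);
DROP TABLE Bucket1;

// Second half of the file into a second bucket.
Bucket2:
LOAD @1 as [Invoice Number],
     @2 as [Account Number],
     @3 as [Invoice Date],
     @4 as [Amount Due]
FROM whatever.csv (txt, no labels, delimiter is ',')
WHERE RecNo() > 12000000;
STORE Bucket2 INTO Invoices_2.qvd (qvd);
DROP TABLE Bucket2;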

Didn't mean for this post to be this long but I really appreciate any help.

8 Replies
erichshiino
Partner - Master

First, if you just break it into pieces and concatenate them back together, it won't help; you will get the same problem.

So, we need to explore the differences from day to day to reduce the number of rows.

You said you don't have a modification date. If you get two files with the same invoice, how can you tell which one has the most up-to-date information? Can you tell this by the CSV name?

Let's say, for example, that reading the latest information is enough. Then I would read the files in descending order and use exists() to ignore the old info.
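Something like this minimal sketch (the file names and field list are just examples, since I don't know your actual data):

// Newest file first: every invoice number loaded here "wins".
Invoices:
LOAD @1 as [Invoice Number],
     @2 as [Account Number],
     @3 as [Amount Due]
FROM invoices_newest.csv (txt, no labels, delimiter is ',');

// Older file: keep only invoice numbers we have not seen yet.
CONCATENATE (Invoices)
LOAD @1 as [Invoice Number],
     @2 as [Account Number],
     @3 as [Amount Due]
FROM invoices_older.csv (txt, no labels, delimiter is ',')
WHERE NOT Exists([Invoice Number], @1);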

Tell us how your data is organized and then we can continue.

Not applicable
Author

Three things may be responsible for this:

Huge data: try to use aggregation (GROUP BY, plus ORDER BY where needed, in your QVD generator); see the sketch below.

Complex data load design: reduce your joins and concatenations, or do those operations before loading into the dashboard QVW.

Heavy expressions: check whether you are using heavy expressions in your objects, such as multiple nested IFs, or simply too many expressions in your reports.

If everything above is fine, then you need to work on hardware sizing.
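For example, a QVD generator that pre-aggregates could look something like this (the field and file names here are guesses, not your actual data):

// Pre-aggregate in a separate QVD generator so the dashboard
// loads far fewer rows than the raw 22 million.
InvoiceSummary:
LOAD @2 as [Account Number],
     @3 as [Invoice Date],
     Sum(@4) as [Total Due],
     Count(@1) as [Invoice Count]
FROM whatever.csv (txt, no labels, delimiter is ',')
GROUP BY @2, @3;

STORE InvoiceSummary INTO InvoiceSummary.qvd (qvd);
DROP TABLE InvoiceSummary;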

Not applicable
Author

Hello trent.jones,

Your problem does not sound very good. It seems that you don't have any criteria to split your source file into (QVD) buckets. I am surprised that you don't have an invoice date as a potential filter. Do you have a (unique) invoice number? Let's say you can find some "old" invoices which will never be changed again and can therefore be stored once in a QVD. Then load the QVD, and afterwards the CSV data starting from the QVD's highest invoice number + 1.
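Roughly like this (every name below is a placeholder, and it assumes the invoice number is numeric and ever-increasing):

// 1) Load the frozen history from the QVD (a fast, optimized load).
Invoices:
LOAD [Invoice Number], [Account Number], [Invoice Date], [Amount Due]
FROM OldInvoices.qvd (qvd);

// 2) Find the highest invoice number already stored.
MaxKey:
LOAD Max([Invoice Number]) as MaxInvoice
RESIDENT Invoices;
LET vMaxInvoice = Peek('MaxInvoice', 0, 'MaxKey');
DROP TABLE MaxKey;

// 3) Append only the newer rows from the daily CSV.
CONCATENATE (Invoices)
LOAD @1 as [Invoice Number],
     @2 as [Account Number],
     @3 as [Invoice Date],
     @4 as [Amount Due]
FROM whatever.csv (txt, no labels, delimiter is ',')
WHERE @1 > $(vMaxInvoice);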

As you can see, all of the ideas above are "on thin ice". The next question to mention is: what will happen to your application (and computer) during the first (heavy) calculation, if you do manage to load all the data?

So I am sorry, but in my opinion there are two possibilities left: talk to the guy (DB admin, whoever it is) who creates your source file. Or, BEST: talk to your sysadmin to get more RAM and a change to 64-bit Windows.

Regards, Roland

Not applicable
Author

Let's say an invoice with a date of 1/1/2011 is voided on 1/5/2011. If you look at the CSV on any day between 1/1/2011 and 1/4/2011, that invoice will be on there. If you look at the CSV on 1/5/2011 or any day after that, the invoice will just be gone. That being said, there will never be two invoices with the same invoice number on this report (which is different from how it shows up in the real tables, which I can't access at the moment).
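So if I ever do go the QVD route, I guess handling voids would look something like this sketch (the names are made up, and I'm still not sure I trust exists()):

// Today's CSV is the complete current truth, so an invoice that
// disappeared from it has been voided.
TodayKeys:
LOAD @1 as [Invoice Number]
FROM invoices_today.csv (txt, no labels, delimiter is ',');

// Keep historical rows only if the invoice still exists today;
// voided invoices drop out automatically. A single-field Exists()
// in the WHERE clause keeps the QVD load optimized.
Invoices:
LOAD * FROM Invoices.qvd (qvd)
WHERE Exists([Invoice Number]);

DROP TABLE TodayKeys;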

After reading some of these responses I think I need to talk to one of the IT guys b/c that huge csv file is some SQL report they run every day and the way it's set up now makes it pretty tough for me to use it...

Not applicable
Author

Right now I'm running XP on an Athlon 64 X2 dual-core 4400+ at 2.3 GHz with 2 GB of RAM... how much of a difference would a new computer actually make? It won't be easy to pull that off... I think I'll just try to load that CSV on my computer at home and see what happens (if I can even get the file there).

Not applicable
Author

Ok,

this could be a possibility. If you need a memory stick, I can loan you one 😉

But it seems that your computer can be upgraded to 4 GB of physical RAM. That is in any case the maximum under 32-bit Windows, and of that, at most about 3 GB can be used by a QV app (the rest is for Windows XP itself, the buffer cache, and other things).

So good luck

Roland

mongolu
Creator

Can you post the script text of what you're running? (with comments on what it is intended to do, if it's not obvious)

Perhaps some statements could be written slightly differently.

Not applicable
Author

Well, there's really nothing to post, as it's just:


LOAD
    @1 as [Invoice Number],
    @2 as [Account Number],
    @3 as [Invoice Date]    // ...and maybe a couple more fields like that
FROM whatever.csv
(csv tags here etc.);


But we're talking about at least 22 million rows, so maybe I just need a better computer.