Hi,
I'm trying to unload a table with ~1.5 to 2 billion entries. The expected behaviour is: the data is exported, the result is split into multiple CSV files, AND the resulting files are compressed at the same time in order to save disk space.
The tFileOutputDelimited component can compress the resulting file OR split the result into multiple files, but not both at the same time.
So my question is: how can I achieve splitting and compressing a file at the same time? Is there a way to trigger another process the moment one subset of rows has been written to a file?
Best
Arne
Hi Arne,
About the only practical way I can think of to do this would be to get the records from your source database in batches the size of each file you want to output, as follows:
A quick query to get the record count:
And store this in a global variable for later:
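As a rough sketch (the table name `my_table` and the key `recordCount` are my own placeholders, not from the original job), the count query and the storing step could look like this, using Talend's standard `globalMap`:

```
// tDBInput query (table name is a placeholder):
"SELECT COUNT(*) AS cnt FROM my_table"

// tJavaRow (or tSetGlobalVar) on the output row, storing the count
// under a key of your choice for use later in the job:
globalMap.put("recordCount", input_row.cnt);
```

`globalMap` is the job-wide map Talend generates into every job; `input_row` is just the default name of the incoming row.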
Add a context variable for the batch size, as it's used in two components, and so should be maintained in one place:
The tLoop in "for" mode allows us to iterate and get the necessary offset for each batch of records:
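In "for" mode, the tLoop settings might look like the following, reusing the stored count and the `context.batchSize` context variable (again, names are my placeholders):

```
From: 0
To:   (Integer) globalMap.get("recordCount") - 1
Step: context.batchSize
```

Inside the loop, tLoop exposes its current value via `globalMap`, e.g. `((Integer) globalMap.get("tLoop_1_CURRENT_VALUE"))` if the component is named tLoop_1; that value serves as the offset for each batch.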
In the database input component, we build a query with the correct LIMIT and OFFSET (or whatever's appropriate for your DBMS):
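For a MySQL- or PostgreSQL-style database, the query in the input component could be built as a Java string expression along these lines (table name and component name are placeholders; note that LIMIT/OFFSET paging is only deterministic with a stable ORDER BY):

```
"SELECT * FROM my_table ORDER BY id LIMIT " + context.batchSize
  + " OFFSET " + ((Integer) globalMap.get("tLoop_1_CURRENT_VALUE"))
```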
And then we simply output using a tFileOutputDelimited, with zip compression enabled, and a dynamic filename, in my case based on the record offset to keep it simple:
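The dynamic "File Name" setting could then be an expression such as the following (the output path is a placeholder), with the component's zip-compression option enabled:

```
"/path/to/output/batch_" + globalMap.get("tLoop_1_CURRENT_VALUE") + ".csv"
```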
Giving us our zipped CSV files:
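Outside of Talend, the same split-and-compress mechanics can be sketched in plain Java with `java.util.zip`, for anyone who wants to see the idea end to end (all names here are mine, and the in-memory row list stands in for the database batches):

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class SplitAndZip {

    // Writes `rows` into zipped CSV chunks of at most `batchSize` rows each,
    // one zip archive per chunk, named by the starting offset. Returns the
    // number of archives written.
    static int splitAndZip(List<String> rows, int batchSize, Path outDir) throws IOException {
        int files = 0;
        for (int offset = 0; offset < rows.size(); offset += batchSize) {
            Path zipPath = outDir.resolve("batch_" + offset + ".zip");
            try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zipPath));
                 PrintWriter pw = new PrintWriter(zos)) {
                zos.putNextEntry(new ZipEntry("batch_" + offset + ".csv"));
                int end = Math.min(offset + batchSize, rows.size());
                for (int i = offset; i < end; i++) {
                    pw.println(rows.get(i));
                }
                pw.flush();          // push buffered lines into the zip stream
                zos.closeEntry();    // finish the CSV entry before closing
            }
            files++;
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("zipped-csv");
        List<String> rows = List.of("1;a", "2;b", "3;c", "4;d", "5;e");
        int n = splitAndZip(rows, 2, dir);
        System.out.println(n); // 5 rows in batches of 2 -> 3 archives
    }
}
```

The per-chunk loop here plays the role of tLoop, and each `splitAndZip` chunk corresponds to one tFileOutputDelimited run with zip compression enabled.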
Regards,
Chris
Thanks a lot for this solution. I will give it a try. Unfortunately I have to retrieve the data from an old data sink which does not have any indices at all (and it's a data view), so counting the objects may take a while.