Solved: Split source table based on record count for proce... - Qlik Community

Anonymous · ‎2018-11-06

Hi,

I have a very large table with more than 10 billion records.

1. I want to read the first 2 billion records from the input table, process them, and write the results to the output table.

Then I want to fetch the next 2 billion, repeat the same processing, and write to the output table.

How could I design my job so that it loops, fetching 2 billion records at a time? The design should also support restartability at the record level if the job fails (a checkpoint at each row, if that is the best approach).

2. Can I run each batch of 2 billion records as a separate process?

Thanks

Anonymous · ‎2018-11-08

Hi,

Thanks for the suggestion.

I used a table to store the restart and record-split details, which are passed as input to the child job.

I also created a parallel flow that launches multiple jobs reading from the same table.

Thanks.
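
For readers following along, the control-table idea described above can be sketched roughly as follows. This is a minimal illustration in Python with an in-memory SQLite database, not the poster's actual Talend job: the table name `job_control`, its columns, the status values, and the functions `plan_chunks` and `run_pending` are all hypothetical, and it assumes the source table has a numeric key that can be cut into ranges. It restarts at chunk level, and it runs the chunks one after another, whereas the job described above ran its child jobs in parallel.

```python
import sqlite3


def plan_chunks(conn, min_id, max_id, chunk_size):
    """Store one row per key range in a control table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_control ("
        " chunk_no INTEGER PRIMARY KEY,"
        " start_id INTEGER NOT NULL,"
        " end_id INTEGER NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'PENDING')"
    )
    starts = range(min_id, max_id + 1, chunk_size)
    conn.executemany(
        "INSERT INTO job_control (chunk_no, start_id, end_id) VALUES (?, ?, ?)",
        [(n, s, min(s + chunk_size - 1, max_id)) for n, s in enumerate(starts, 1)],
    )
    conn.commit()


def run_pending(conn, process_chunk):
    """Process every chunk not yet marked DONE; can be called again after a failure."""
    pending = conn.execute(
        "SELECT chunk_no, start_id, end_id FROM job_control"
        " WHERE status <> 'DONE' ORDER BY chunk_no"
    ).fetchall()
    for chunk_no, start_id, end_id in pending:
        try:
            # Stand-in for the child job that reads, transforms, and writes one range.
            process_chunk(start_id, end_id)
        except Exception:
            conn.execute(
                "UPDATE job_control SET status = 'FAILED' WHERE chunk_no = ?",
                (chunk_no,),
            )
            conn.commit()
            raise
        conn.execute(
            "UPDATE job_control SET status = 'DONE' WHERE chunk_no = ?",
            (chunk_no,),
        )
        conn.commit()
```

Calling `plan_chunks(conn, 1, 10_000_000_000, 2_000_000_000)` once would record five ranges; calling `run_pending` again after a failure skips the ranges already marked DONE. A chunk that failed halfway may have written part of its output, so the child job would need to clean up or overwrite that range before reprocessing it.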

TRF · ‎2018-11-06

You could also read the table in one pass and redirect the content to delimited files, each holding the desired number of records.
In a separate process, you can start reading the 1st file as soon as the 2nd one is created, and so on.
This way you can start processing the result before the query has finished, and restart from the file of your choice in case of failure.
Restartability at the record level is another story, and I'm not sure a checkpoint for each row is an option with such a large volume.
Parallelization of the inserts should be possible in append mode to avoid locks on the table (as far as I remember), but you will probably need to limit the number of parallel processes.
These are just some ideas. Let us know the rest of the story. Good luck.
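
The file-splitting idea above could look something like the following outside Talend. This is only a sketch in Python under stated assumptions: the function name `split_to_files`, the `chunk_00001.csv` naming, the `;` delimiter, and the `.part` suffix are all made up for illustration, and `rows` stands for whatever iterator the database cursor provides. Each file is written under a temporary name and renamed once complete, so a separate reader never picks up a half-written file.

```python
import csv
import itertools
import os

_END = object()  # marker for an exhausted row iterator


def split_to_files(rows, out_dir, rows_per_file, prefix="chunk"):
    """Stream rows into numbered delimited files of rows_per_file records each.

    Yields each path as soon as that file is complete, so a consumer can
    start on one file while the next is still being written.
    """
    if rows_per_file < 1:
        raise ValueError("rows_per_file must be at least 1")
    it = iter(rows)
    for index in itertools.count(1):
        first = next(it, _END)
        if first is _END:
            return  # no rows left, so no empty trailing file is created
        path = os.path.join(out_dir, f"{prefix}_{index:05d}.csv")
        tmp_path = path + ".part"
        with open(tmp_path, "w", newline="") as handle:
            writer = csv.writer(handle, delimiter=";")
            writer.writerow(first)
            # Rows are written one at a time; the chunk is never held in memory.
            writer.writerows(itertools.islice(it, rows_per_file - 1))
        os.replace(tmp_path, path)  # rename only when the file is complete
        yield path
```

A restart then amounts to reprocessing from a chosen file onward, which gives file-level rather than record-level restartability.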

Anonymous · ‎2018-11-10

Hi TRF,

Like the custom tCheckpoint component, is there any custom component that supports both parallel execution and restartability at the job level?

Thanks,

Revathy.

Split source table based on record count for processing big Oracle tables

REST

Talend Data Integration

v7.x