Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Split source table based on record count for processing big Oracle tables

Hi,

 

I have a very massive table whose record count will be more than 10 billion.

 

1. I wanted to read the data from input table for first 2 billion, process it and write to output table.

Then fetch next 2 billion and do the same process and write to output table.

how could I design my job so that it loops to fetch 2 billion at a time. This design should also consider restartability at record level if it errors(checkpoint at each row if this is the best approach). 

 

2. Can I run as a seperate process each 2 billion records?

Thanks

Labels (3)
1 Solution

Accepted Solutions
Anonymous
Not applicable
Author

Hi,

 

Thanks for the suggestion.

I  have used table to store restart and record split details which will be input to the child job.

And I have created parallel flow creating multiple job reading from the same table.

 

Thanks. 

View solution in original post

3 Replies
TRF
Champion II
Champion II

You may also read the table at once and redirect the content to delimited files with the desired number of records for each.
On a separate process, you can start to read the 1rst file as soon as the 2nd one is created and so on.
This way you can start to process the result before the query is finished and restart on the file of your choice in case of.
Restartability at record level is an other story and I'm not sure checkpoint for each row is an option with such a large volume.
Paralelization for insert should be possible in append mode to avoid locks on table (as far as I remember) but you probably need to limit the number of parallel processes.
That's just some ideas. Let us know the rest of the story. Good luck.
Anonymous
Not applicable
Author

Hi,

 

Thanks for the suggestion.

I  have used table to store restart and record split details which will be input to the child job.

And I have created parallel flow creating multiple job reading from the same table.

 

Thanks. 

Anonymous
Not applicable
Author

Hi TRF,

 

Like tCheckpoint customised component, is there any customised component that support parallel execution as well as restartability at job-level?

 

Thanks,

Revathy.