housseinHTR
Contributor II

Continue the migration process where the job was stopped

Hello,

I have a delimited file containing millions of records that should be migrated to an MS SQL table. For unexpected reasons, my job may be stopped (killed, the server goes down, ...).

To avoid deleting and re-inserting the records that have already been migrated, and to make the most of the time and work already spent, is there a way to resume the migration from the point where the job stopped?

Thanks in advance.


8 Replies
Anonymous
Not applicable

Hi,

 

   Could you please tell us which component you are currently using for this task? If you use the Bulk loading components, the data load will finish very quickly even for millions of records, so you could plan a truncate and load in this case.


 

Please refer to the Talend help documentation for details and sample scenarios on how to use these components.

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

 

 

housseinHTR
Contributor II
Author

Hi Nikhil Thampi,

Thanks for your reply.

Currently, I am using tDBOutput.

I have tFileInputDelimited -> tMap -> 2 tDBOutput (based on a condition, each row is inserted into the corresponding table).

 

Anonymous
Not applicable

Hi,

 

    Whatever you are trying to achieve with tDBOutput can also be achieved with the Bulk components, provided you apply the conditions correctly before the Bulk components.

 

    Are you facing any issues while doing it with the Bulk components? If yes, could you please add screenshots of the job flow, a sample input file, the target schema, etc.?

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

housseinHTR
Contributor II
Author

As I understood it, and please correct me if I am wrong, the Bulk components make a single call to the database to insert the data from the generated file, so either all rows are migrated or none are.

On the other hand, the tDBOutput component inserts rows every time the batch size is reached. So if 1,000,000 records were inserted before the job stopped, there is no need to re-insert them.
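
Roughly speaking, that batching behaviour corresponds to JDBC batch inserts with periodic commits, something like this sketch (the connection string, table, and file reader are hypothetical placeholders, not what Talend actually generates):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchInsertSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection string and table; adjust to your environment.
        try (Connection con = DriverManager.getConnection(
                "jdbc:sqlserver://dbhost;databaseName=target;encrypt=false",
                "user", "password")) {
            con.setAutoCommit(false);
            final int batchSize = 10000; // plays the role of tDBOutput's "Batch Size"
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO target_table (id, payload) VALUES (?, ?)")) {
                int count = 0;
                for (String[] row : readDelimitedFile()) { // hypothetical file reader
                    ps.setLong(1, Long.parseLong(row[0]));
                    ps.setString(2, row[1]);
                    ps.addBatch();
                    if (++count % batchSize == 0) {
                        ps.executeBatch();
                        con.commit(); // rows committed so far survive a crash
                    }
                }
                ps.executeBatch(); // flush the final partial batch
                con.commit();
            }
        }
    }

    private static Iterable<String[]> readDelimitedFile() {
        return java.util.List.of(); // placeholder for the real delimited reader
    }
}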

Anyway, I will update my plan to meet the objectives of the requested scenario under the best conditions, taking your advice into consideration.

Thank you.

Anonymous
Not applicable

Hi,

 

    Even in the failure scenario, the Bulk components will still be beneficial, even if you have to load the entire data set again.

 

    You can run some performance tests to compare the two approaches for your case.

 

     I hope I have answered your query. Please remember to mark the topic as resolved, as it will help other Talend community members too.

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

housseinHTR
Contributor II
Author

Hi,

Thank you again.

So, do you advise always using the Bulk components instead of the simple output components, or does it depend on the situation?

Do the Bulk components have disadvantages compared to the previously mentioned components?

Regards

vapukov
Master II


@nhalicka wrote:

Hi,

Thank you again.

So, do you advise always using the Bulk components instead of the simple output components, or does it depend on the situation?

Do the Bulk components have disadvantages compared to the previously mentioned components?

Regards


Hi

 

Bulk components are the fastest method overall (for any database), but they have a disadvantage: they require access to a disk attached to the database server.

The prepared files must be located on the database server's disk, and unfortunately that is not always possible.

The tDBOutput component does not require disk access, but it is slower 🙂
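
For illustration, a minimal JDBC sketch of a SQL Server bulk load; note that the path in BULK INSERT is resolved on the database server, which is exactly the disk-access limitation mentioned above (host, credentials, table, and file path are assumptions):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class BulkLoadSketch {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:sqlserver://dbhost;databaseName=target;encrypt=false",
                "user", "password");
             Statement st = con.createStatement()) {
            // 'C:\loads\data.csv' must exist on dbhost, not on the machine
            // running the job: the server itself reads the file.
            st.execute(
                "BULK INSERT target_table "
              + "FROM 'C:\\loads\\data.csv' "
              + "WITH (FIELDTERMINATOR = ';', ROWTERMINATOR = '\\n', TABLOCK)");
        }
    }
}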

 

Regarding your original question (how to avoid deleting all rows after a crash while also avoiding duplicates):

It is possible, but it always depends on the details of your job and environment.

An example:

If the data is stable between iterations and the export always processes rows in the same order (a plain SELECT from a database does not guarantee the same row order across queries; add an ORDER BY clause), you could do the following (see the sketch after this list):

  • on the first step of your job, detect the max loaded row (using the sorted column)
  • then add a filter to the input component so that only records not yet exported are processed
  • the logic can be enriched with other steps: delete some rows (not all) from the target, check whether the target is empty, etc. If it is a one-time job, it is easier to start from the beginning; if it runs regularly, it is better to have this logic.
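
A minimal JDBC sketch of the first two steps, assuming the target table has a monotonically increasing id column that matches the file order (the column, table, host, and credentials are assumptions):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ResumeSketch {
    public static void main(String[] args) throws Exception {
        long maxLoaded = 0L;
        // Step 1: detect the max row already loaded into the target.
        try (Connection con = DriverManager.getConnection(
                "jdbc:sqlserver://dbhost;databaseName=target;encrypt=false",
                "user", "password");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT COALESCE(MAX(id), 0) FROM target_table")) {
            if (rs.next()) {
                maxLoaded = rs.getLong(1);
            }
        }
        // Step 2: in a Talend job this value would go into a context variable,
        // and a filter (tMap expression or the input query's WHERE clause)
        // would keep only rows with id > maxLoaded.
        System.out.println("Resuming after id = " + maxLoaded);
    }
}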

 

You can of course use the insert-if-not-exists option of the output component (tDBOutput), but it is the slowest of all possible approaches.
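
For completeness, a sketch of the kind of per-row statement that option boils down to; the extra existence check and round trip on every single row is why it is the slowest (table and column names are assumptions):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class InsertIfNotExistsSketch {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:sqlserver://dbhost;databaseName=target;encrypt=false",
                "user", "password");
             PreparedStatement ps = con.prepareStatement(
                 "IF NOT EXISTS (SELECT 1 FROM target_table WHERE id = ?) "
               + "INSERT INTO target_table (id, payload) VALUES (?, ?)")) {
            ps.setLong(1, 42L);
            ps.setLong(2, 42L);
            ps.setString(3, "example row");
            ps.executeUpdate(); // one lookup plus one round trip per row
        }
    }
}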

housseinHTR
Contributor II
Author

Hi,

Thank you for your response and your detailed explanation.