Hello,
I have a delimited file containing millions of records which should be migrated to an MS SQL table. For unexpected reasons, my job may be stopped (killed, the server goes down, etc.).
To avoid deleting and re-inserting the records that were already migrated, and to save the time and work already done, is there a way to resume the migration from where the job was stopped?
Thanks in advance.
Hi,
Could you please tell us which component you are currently using for this task? If you are using the Bulk loading components, the data load will finish very quickly even for millions of records, so you could plan for a truncate-and-load in this case.
Please refer to the Talend help documentation for details and sample scenarios on how to use these components.
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂
Hi Nikhil Thampi,
Thanks for your reply.
Currently, I am using tDBOutput.
I have tFileInputDelimited -> tMap -> 2 tDBOutput (based on a condition, each row is inserted into the corresponding table).
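For readers unfamiliar with that flow, this is roughly what it does: read a delimited file and route each row to one of two tables based on a condition. A minimal Python sketch using sqlite3 as a stand-in for MS SQL; the column names, the `status` condition, and the table names are invented, since the real ones are specific to the job:

```python
import csv
import io
import sqlite3

# Invented sample input standing in for the delimited file.
data = io.StringIO("id;status\n1;active\n2;archived\n3;active\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE active_rows (id INTEGER, status TEXT)")
conn.execute("CREATE TABLE archived_rows (id INTEGER, status TEXT)")

# tFileInputDelimited -> tMap -> two outputs, split on a condition.
for row in csv.DictReader(data, delimiter=";"):
    table = "active_rows" if row["status"] == "active" else "archived_rows"
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (row["id"], row["status"]))
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM active_rows").fetchone()[0])    # 2
print(conn.execute("SELECT COUNT(*) FROM archived_rows").fetchone()[0])  # 1
```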
Hi,
Whatever you are trying to achieve with tDBOutput can also be achieved with the Bulk components, provided you add the conditions properly before the bulk components.
Are you facing any issues doing it with the Bulk components? If yes, could you please add screenshots of the job flow, a sample input file, the target schema, etc.?
Warm Regards,
Nikhil Thampi
As I understand it (please correct me if I am wrong), the Bulk components make one call to the database to insert the data from the generated file, so either all rows are migrated or none.
On the other hand, the tDBOutput component inserts rows every time the batch size is reached. So if 1,000,000 records were already inserted when the job stopped, there is no need to re-insert them.
In any case, I will update my plan to meet the requirements of the scenario under the best conditions, taking your advice into consideration.
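The resume idea above can be sketched concretely: since committed batches survive a crash, a restarted job can count the rows already in the target and skip that many input rows. A minimal Python sketch using sqlite3 as a stand-in for MS SQL (table and column names are invented); it assumes the input arrives in the same order on every run:

```python
import sqlite3

def resume_load(conn, rows, batch_size=3):
    """Insert rows, skipping those already committed by an earlier run.

    Assumes `rows` arrives in the same order on every run, so the count of
    rows already in the target tells us exactly how many input rows to skip.
    """
    cur = conn.cursor()
    (already_loaded,) = cur.execute("SELECT COUNT(*) FROM target").fetchone()
    pending = rows[already_loaded:]          # skip what a previous run committed
    for i in range(0, len(pending), batch_size):
        cur.executemany("INSERT INTO target (id, name) VALUES (?, ?)",
                        pending[i:i + batch_size])
        conn.commit()                        # commit per batch, as tDBOutput does

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
rows = [(i, "row%d" % i) for i in range(10)]

# Simulate a first run that was killed after committing 4 rows...
conn.executemany("INSERT INTO target VALUES (?, ?)", rows[:4])
conn.commit()

# ...then resume: only the remaining 6 rows are inserted, no duplicates.
resume_load(conn, rows)
print(conn.execute("SELECT COUNT(*) FROM target").fetchone()[0])  # 10
```

Running `resume_load` again after completion is a no-op, since the row count already matches the input.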
Thank you.
Hi,
Even in the failure scenario, the Bulk component will still be beneficial, even if you have to load the entire data set again.
You can run some performance tests to verify this for your case.
Hope I have answered your query. Please remember to mark the topic as resolved, as it will help other Talend community members too.
Warm Regards,
Nikhil Thampi
Hi,
Thank you again.
So, do you advise always using the Bulk components instead of the plain output components, or does it depend on the situation?
Do the Bulk components have any disadvantages compared to the ones previously mentioned?
Regards
@nhalicka wrote:
Hi,
Thank you again.
So, do you advise always using the Bulk components instead of the plain output components, or does it depend on the situation?
Do the Bulk components have any disadvantages compared to the ones previously mentioned?
Regards
Hi
Bulk components are the fastest method overall (for any database), but they have a disadvantage: they need access to a disk attached to the database server.
The prepared files must be located on the database server's disk, and unfortunately that is not always possible.
The tDBOutput component does not require disk access, but it is slower 🙂
Regarding your original question - how to avoid deleting all rows after a crash while also avoiding duplicates?
It is possible, but it always depends on the details of your job and environment.
An example:
If the data is stable between runs and the export always processes the rows in the same order (a plain SELECT does not guarantee the same order across queries; use an ORDER BY clause), you could count the rows already loaded in the target and skip that many input rows on restart.
You can of course use the "insert if not exists" option of the output component (tDBOutput), but it is the slowest of all the possible approaches.
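The "insert if not exists" option mentioned above can also be emulated in plain SQL, which shows where the slowness comes from: every row costs a key lookup. A minimal Python sketch using sqlite3 as a stand-in (SQLite spells it `INSERT OR IGNORE`; on MS SQL the equivalent would typically be `INSERT ... WHERE NOT EXISTS` or `MERGE`), with a primary key guarding against duplicates; table and column names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")

# First run: three rows committed before a simulated crash.
rows = [(1, "a"), (2, "b"), (3, "c")]
conn.executemany("INSERT INTO target VALUES (?, ?)", rows)
conn.commit()

# Re-running the whole load after the crash: existing keys are silently
# skipped, at the cost of a key lookup per row (hence the slowness).
rerun = rows + [(4, "d")]
conn.executemany("INSERT OR IGNORE INTO target VALUES (?, ?)", rerun)
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM target").fetchone()[0])  # 4
```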
Hi,
Thank you for your response and your thorough explanation.