Hi all,
I'm using TOS DI 5.2.2 on Red Hat. I have 25,000 files to load. My job is as follows:
tfileList----> tFileInputDelimited---->tMap---->tVerticaOutput
The "Action on table" setting is "Insert".
It's taking 35 minutes to load just 50 files, and each file is less than 5 MB! But if the same job writes to a file (instead of Vertica), it takes far less time.
What am I missing here? Do I need to change any settings?
Please suggest a solution.
Regards,
Akki
Hi Akki,
What is the need for tMap? If there are no joins, you can replace it with tJavaRow to add new columns or do other operations... What commit size and batch size have you given? Is it set to auto commit?
vaibhav
Inserting may be slow because referential or other constraints are being evaluated by the database prior to committing these records...
You can also try increasing your Java heap size to make more memory available for processing.
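In TOS you can set this in the Run view under Advanced settings ("Use specific JVM arguments"); the values below are only illustrative:
    -Xms512m
    -Xmx2048m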
As an alternative, use the tVerticaOutputBulk and tVerticaBulkExec components for faster loading. They're designed to load larger datasets quickly and typically leverage the database's native bulk utilities on the back end. For more info, see
https://help.talend.com/search/all?query=tVerticaOutputBulkExec&content-lang=en
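Under the hood, bulk loading into Vertica comes down to its COPY command; a minimal sketch of an equivalent manual load (table and file names here are hypothetical):
    COPY sales_staging
    FROM LOCAL '/data/incoming/file001.csv'
    DELIMITER ',' NULL ''
    DIRECT
    ABORT ON ERROR;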
Hi sanvaibhav,
I'm using tMap because there are two flows out of it: one flow inserts all fields from the input files into the Vertica table, and the other takes just one required field from each file and writes it to another file.
I have set commit every 100 rows in tVerticaOutput; other than that I'm not using any settings in that component.
I don't think tMap is affecting my job speed in this case, because as I already mentioned, if I replace the Vertica component with a file output component, the job processes at high speed.
Please let me know if there is any solution.
Hi willm,
As Vertica recommends, bulk insert is the way to go for large files, but in this case each file is less than 5 MB.
Regards,
Akki
You can increase the commit size to 1000 rows and check the performance; every commit cycle takes time, so we can reduce the number of commit cycles by increasing the number of rows per commit...
- Also try the auto commit option; it will use its default batch size, and you can observe the difference...
You can use the tChronometerStart and tChronometerStop components to check the exact execution time...
Is a single job loading all the files, or do you have multiple subjobs to do this?
vaibhav
Have you tried using tVerticaConnection in your job design? The auto commit option is available in the tVerticaConnection component. BTW, what's your job's row rate?
Best regards
Sabrina
Hi Sabrina,
Thank you for the reply. I have tried tVerticaConnection, but the problem stayed the same. Now I think we have found the cause.
Our Vertica table had sequence number generation, which wasn't there before. So we removed the sequence from that table and checked the loading again, and loading got fast!
I still don't understand why a sequence number affects loading into the table from Talend. Is there any possibility of using a Vertica sequence with the tVerticaOutput component?
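If the sequence values are still needed, one option might be to populate them afterwards with a set-based SQL statement instead of per row during the load; a rough sketch (table, column, and sequence names are placeholders):
    -- Talend loads into a staging table that has no sequence column,
    -- then one INSERT ... SELECT applies the sequence in a single pass
    INSERT INTO orders (order_id, customer, amount)
    SELECT NEXTVAL('order_id_seq'), customer, amount
    FROM orders_staging;
    COMMIT;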
Advice: if we can't use Vertica sequences in our Talend jobs, please try to add this feature in an upcoming release.
Regards,
Akki
Hi, I am facing an issue with Talend inserts. I am using Oracle 11g and inserting data into 5 tables. The Talend batch insert works fine for 4 of the 5 tables, but for 1 table it is taking 10 times longer than the others, even though the number of records to insert is the same for all tables.
Can anyone help me find a solution for this?
Vertica is a column-based database, and it is typical for such databases to be very slow with single-row statements.
To improve performance, I suggest you first load the input data without any checks into a staging table in the Vertica database, then use pure SQL statements to move the staging data into your target table.
Column-based databases usually work best with bulk loads instead of row-by-row inserts/updates.
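A rough sketch of that pattern in Vertica SQL (all names are placeholders):
    -- 1) fast raw load into an unconstrained staging table
    COPY sales_staging FROM LOCAL '/data/incoming/*.csv' DELIMITER ',' DIRECT;

    -- 2) apply checks/transformations in one set-based statement
    INSERT INTO sales
    SELECT *
    FROM sales_staging
    WHERE amount IS NOT NULL;  -- example of a check moved into SQL
    COMMIT;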