Conditional Split of Rows based on Condition and insert them to table

Hi all,
As I am new to Talend, with no prior experience using it, I want to know how to split rows read from a "*.tsv" file based on a condition (a conditional split) and insert them into a table.
As part of my job I receive zip files containing a lot of raw data in "*.tsv" format, which I have to process on the source side. On the target side I have a table with some 400 columns to accommodate the processed raw data. Before inserting into the table I have to separate out the bad rows and insert only the good rows. My plan is to define the schema of the .tsv file, extract the data, use a tMap, take the schema of the Oracle table, and map it in the tMap. In one line, my job would look like this:
tFileInputDelimited --------> tMap -----------> tOracleOutput
I have to implement a few conditions before inserting the good rows into the table. There are two situations that determine how the insertion should be done:
First, if my .tsv file contains more than 5000 bad rows, the job should not insert any rows; the system should write the bad rows to a .bad file and also write a log file.
Second, if the .tsv file contains fewer than 5000 bad rows, the job should insert only the good rows into the table, skip the bad rows, and create only a log file; no .bad file is needed in this situation.
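To make the intended logic concrete, here is a rough Java-style sketch of the decision I have in mind (only an illustration of the requirement, not actual Talend code; badRows, goodRows, and the helper methods are hypothetical names):

// Hypothetical sketch of the required decision, not a Talend component.
// badRows and goodRows would come from whatever splits the .tsv records.
int badRowCount = badRows.size();
if (badRowCount >= 5000) {
    // Too many bad rows: load nothing, keep the bad rows and a log as evidence.
    writeBadFile("input.bad", badRows);                          // hypothetical helper
    writeLogFile("input.log", "rejected, bad rows = " + badRowCount);
} else {
    // Acceptable: load only the good rows and write a log, no .bad file.
    insertIntoOracle(goodRows);                                  // hypothetical helper
    writeLogFile("input.log", "loaded, good rows = " + goodRows.size());
}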
Is the above thought process correct? If so, what extra needs to be added to the job to handle the two situations I mentioned? Can anyone explain, step by step, which components should be used to accomplish the whole process?
Kindly guide me with your knowledge.
Thanks and Regards,
Pavan
8 Replies

Hi all,
I think tOracleBulkExec_1 satisfies my requirement: I see an option to use a control file to split the rows and insert them into the table. I have a control file that I used in an SSIS job to accomplish the situation described above, and now I want to move that SSIS job to Talend, since it is easier to handle and implement here. In my experience this is the best tool I have ever seen: simpler, easier, faster, and smarter.
Coming back to our discussion: how do I use the Advanced settings tab to accomplish this? I think my requirement can be met with tOracleBulkExec_1, but I am still not confident whether to use it, or which other components need to be included in my job to accomplish the requirement.
Kindly guide me.
Thanks and Regards,
Pavan

Hi Pavan
According to your requirement, you can create a job like this.
Use tFilterRow instead of tMap.
tHashOutput and tHashInput can save and load data in memory.
Code in Run-if trigger 1 (bad rows):
((Integer)globalMap.get("tHashOutput_1_NB_LINE"))>=5000

Code in Run-if trigger 2 (good rows):
((Integer)globalMap.get("tHashOutput_1_NB_LINE"))<5000
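For reference, the same counter can also be read in a tJava step after the filter subjob; a minimal sketch, assuming tHashOutput_1 holds the bad rows (the REJECT_FILE key is just a name I made up):

// Talend publishes a row counter into globalMap after each component runs,
// under the key <componentName>_NB_LINE.
Integer badRows = (Integer) globalMap.get("tHashOutput_1_NB_LINE");
int badCount = (badRows == null) ? 0 : badRows;
// Same decision as the Run-if triggers above: 5000+ bad rows rejects the file.
boolean rejectFile = badCount >= 5000;
globalMap.put("REJECT_FILE", rejectFile);
System.out.println("Bad rows: " + badCount + ", reject whole file: " + rejectFile);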

Regards,
Pedro

Hi Pedro,
I have a small question: what if I use tOracleBulkExec instead of all these tHash components and the other pieces? Since tOracleBulkExec uses SQL*Loader, I think processing and loading huge volumes of data would be faster. This is just my opinion; I don't know how each component performs, which is why I am asking you. What do you say?
Thanks and Regards,
Pavan

Hi Pedro,
Any suggestions on the above?
Thanks and Regards,
Pavan

Hi Pavan
Yes, using tOracleOutputBulkExec will give better performance.
What I show in the image fits the requirement above and the job logic:
First, if the .tsv file contains 5000 or more bad rows, the job inserts nothing; it writes the bad rows to a .bad file and also writes a log file.
Second, if the .tsv file contains fewer than 5000 bad rows, the job inserts only the good rows into the table and creates only a log file; no .bad file is needed.
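Either way, the log file can be written from a small tJava step at the end of the job; a minimal sketch, assuming the counter keys from the job above (the log path is a placeholder):

// Append a one-line load summary to a log file; adjust path and format as needed.
Integer bad = (Integer) globalMap.get("tHashOutput_1_NB_LINE");
Integer good = (Integer) globalMap.get("tHashOutput_2_NB_LINE");
String line = new java.util.Date() + " bad=" + bad + " good=" + good;
java.nio.file.Files.write(
    java.nio.file.Paths.get("C:/talend/logs/load.log"),   // placeholder path
    java.util.Arrays.asList(line),
    java.nio.charset.StandardCharsets.UTF_8,
    java.nio.file.StandardOpenOption.CREATE,
    java.nio.file.StandardOpenOption.APPEND);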

You can optimize this job as you want.
Regards,
Pedro

Hi Pedro,
I have a few questions about the example you gave me with tHashInput and the other components:
1. How does tFilterRow filter the rows, and based on which condition? If it filters based on the schema definition of tFileInputDelimited, why am I getting type mismatch errors even though my schema definition and data types are correct when cross-checked in the Edit schema property?
2. What is the output of tHashInput_2? If it stores all the bad rows based on the max error condition, will it output all the bad rows to tFileOutputDelimited?
3. Simply adding tFileOutputDelimited does not work; a "No schema has been defined yet to this component" error is displayed. To get past this, assuming tHashInput_2 holds all the bad rows, I added the schema of my input file as the output file schema and gave the file a different name. Is this approach correct?
4. If you don't mind, could you please give me another workaround for this entire process using tOracleOutputBulkExec?
I am sorry if I am troubling you too much!
Thanks and Regards,
Pavan

Hi Pedro,
I have attached images of the columns and data types of the file and of the database table (STOMNITURE1). What is wrong there, and what should be changed?
Thanks and Regards,
Pavan

Hi Pavan
Because of the limited information, my job is just an example; you should optimize it yourself.
Question 1: this is what I wanted to ask you. How do you want to filter these rows? Is there any rule? (See the sketch after question 4 for what a filter condition can look like.)
Question 2: the output of tHashInput is the data stored in tHashOutput2 (the good rows). There is a dropdown list in the tHashInput component for you to choose which tHashOutput to read from.
Question 3: you had better add the columns to tHashInput2 manually.
Question 4: in my opinion, you can replace tOracleOutput with tOracleOutputBulkExec.
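On question 1: in tFilterRow's advanced mode, the condition is a plain Java boolean expression over the incoming row; a minimal sketch, assuming hypothetical columns QTY (Integer) and NAME (String) in your schema:

// Advanced-mode condition in tFilterRow: a row is kept when this is true.
// input_row is the incoming flow; QTY and NAME are made-up column names,
// so replace them with your own rule for what counts as a good row.
input_row.QTY != null && input_row.QTY >= 0
    && input_row.NAME != null && !input_row.NAME.trim().isEmpty()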
I really want to help you, but I think you can handle simple issues and errors like these yourself.
Regards,
Pedro