
Anonymous, 2012-04-08 03:22 AM
Conditional split of rows based on a condition and inserting them into a table
Hi all,
As I am new to Talend, with no experience using it, I want to know how to split rows read from a *.tsv file based on a condition (a conditional split).
As part of my job I receive zip files containing a lot of raw data in *.tsv format, which I have to process on the source side. On the target side I have a table with some 400 columns to hold the processed raw data. Before inserting into this table I have to separate out the bad rows and insert only the good rows. My plan is to define the schema of the .tsv file, extract the data through a tMap, and map it to the schema of the Oracle table. In one line, as I picture it, my job will look like this:
tFileInputDelimited --------> tMap -----------> tOracleOutput
I have to implement a few conditions before inserting the good rows into the table. There are two situations on which the insertion depends:
First, if my .tsv file contains more than 5000 bad rows, the job should not insert any rows; it should write the bad rows to a .bad file and write a log file.
Second, if the .tsv file contains fewer than 5000 bad rows, the job should insert the good rows into the table, skip the bad rows, and create only a log file; no .bad file is needed in this situation.
Is this thought process correct? If so, what extra needs to be added to the job to handle the two situations I mentioned? Can anyone explain, step by step, which components should be used to accomplish the whole process?
Kindly guide me with your knowledge.
Thanks and Regards,
Pavan

Anonymous, 2012-04-08 03:40 AM
Hi all,
I think tOracleBulkExec_1 satisfies my requirement: I see an option to use a control file to split the rows and insert them into the table. I have a control file that I used in an SSIS job for this same situation, and now I want to move the SSIS job to Talend, since it is easier to handle and implement there. In my experience it is the best tool I have ever seen: simpler, easier, faster, and smarter.
Coming back to our discussion: how do I use the Advanced settings tab to accomplish this? I think my requirement can be met with tOracleBulkExec_1, but I am still not confident whether to use it, or which other components need to be included in the job to meet the requirement.
Kindly guide me.
Thanks and Regards,
Pavan

Anonymous, 2012-04-09 03:43 AM
Hi Pavan
According to your requirement, you can create a job like this.
Use tFilterRow instead of tMap.
tHashOutput and tHashInput can save and load data in memory.
Code in Run-if trigger 1 (bad rows):
((Integer)globalMap.get("tHashOutput_1_NB_LINE"))>=5000
Code in Run-if trigger 2 (good rows):
((Integer)globalMap.get("tHashOutput_1_NB_LINE"))<5000
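To make the flow concrete, here is a minimal standalone Java sketch of the same split-and-threshold logic. This is illustrative only, not Talend-generated code; the file names, the 400-column validity rule, and the logging are assumptions:

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class SplitTsv {
    static final int EXPECTED_COLUMNS = 400; // hypothetical validity rule
    static final int MAX_BAD_ROWS = 5000;    // threshold from the requirement

    public static void main(String[] args) throws IOException {
        List<String> good = new ArrayList<String>();
        List<String> bad = new ArrayList<String>();
        for (String line : Files.readAllLines(Paths.get("input.tsv"))) {
            // A row counts as "good" if it has the expected number of
            // tab-separated fields; substitute your real rule here.
            if (line.split("\t", -1).length == EXPECTED_COLUMNS) {
                good.add(line);
            } else {
                bad.add(line);
            }
        }
        PrintWriter log = new PrintWriter(new FileWriter("load.log"));
        if (bad.size() >= MAX_BAD_ROWS) {
            // Too many rejects: write the .bad file and insert nothing.
            Files.write(Paths.get("input.bad"), bad);
            log.printf("Aborted: %d bad rows (threshold %d)%n", bad.size(), MAX_BAD_ROWS);
        } else {
            // Few rejects: insert only the good rows (the Oracle load goes here)
            // and write the log; no .bad file in this case.
            log.printf("Loaded %d good rows, rejected %d bad rows%n", good.size(), bad.size());
        }
        log.close();
    }
}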
Regards,
Pedro

Anonymous, 2012-04-09 07:03 AM
Hi Pedro,
I have a small question here: what if I use tOracleBulkExec instead of all the tHash components and the other stuff? Since tOracleBulkExec uses SQL*Loader, I would think processing and loading huge data would be faster. That is just my opinion; I don't know how the components compare performance-wise, which is why I am asking you. What do you say?
Thanks and Regards,
Pavan

Anonymous, 2012-04-09 07:30 AM
Hi Pedro,
Any suggestions on the above?
Thanks and Regards,
Pavan

Anonymous, 2012-04-09 11:10 PM
Hi Pavan
Yes, using tOracleOutputBulkExec will give better performance.
What I show in the image fits the requirements and the job logic:
First, if the .tsv file contains more than 5000 bad rows, the job should not insert any rows; it should write the bad rows to a .bad file and write a log file.
Second, if the .tsv file contains fewer than 5000 bad rows, the job should insert only the good rows into the table and create a log file; no .bad file is needed in this situation.
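For reference, if you take the SQL*Loader route, the threshold and the .bad/.log files map onto sqlldr options fairly directly. A minimal control-file sketch follows; the table and column names are placeholders, and note that ERRORS makes SQL*Loader abort once that many rows have been rejected, but rows committed before the abort are not rolled back, so this only approximates the "insert nothing" requirement:

OPTIONS (ERRORS=5000)
LOAD DATA
INFILE 'input.tsv'
BADFILE 'input.bad'
APPEND
INTO TABLE stomniture1
FIELDS TERMINATED BY X'09'
TRAILING NULLCOLS
(col1, col2, col3)

Invoked, for example, as: sqlldr userid=user/password@db control=stomniture1.ctl log=load.log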
You can optimize this job as you want.
Regards,
Pedro

Anonymous, 2012-04-10 03:02 AM
Hi Pedro,
I have a few questions about the example you gave me with tHashInput and the other components:
1. How does tFilterRow filter the rows; based on which condition? If it filters based on the schema definition of tFileInputDelimited, why am I getting type-mismatch errors even though my schema definition and data types are correct when I cross-check them in the Edit schema dialog?
2. What is the output of tHashInput_2? If it stores all the bad rows based on the max-error condition, will it output all the bad rows to tFileOutputDelimited?
3. Simply adding the tFileOutputDelimited does not work; a "No schema has been defined yet to this component" error is displayed. To overcome it, assuming that tHashInput_2 holds all the bad rows, I added the schema of my input file as the output file and gave the file a different name. Is this approach correct?
4. If you don't mind, can you please give me another walkthrough of this entire process with tOracleOutputBulkExec?
I am sorry if I am troubling you too much!
Thanks and Regards,
Pavan

Anonymous, 2012-04-10 03:28 AM
Hi Pedro,
I have attached images of the columns and data types of the file and of the database table (STOMNITURE1). What is wrong there, and what should be changed?
Thanks and Regards,
Pavan

Anonymous, 2012-04-10 03:29 AM
Hi Pavan
Because of the limited info, my job is just an example; you had better optimize it yourself.
Question 1: this is what I wanted to ask you. How do you filter these rows; is there any rule? (See the example expression below.)
Question 2: the output of tHashInput is the data in tHashOutput_2 (good rows). There is a drop-down list in the tHashInput component for you to choose from.
Question 3: you had better add the columns in tHashInput_2 manually.
Question 4: in my opinion, you can replace tOracleOutput with tOracleOutputBulkExec.
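For example, a condition in tFilterRow's advanced mode is just a Java boolean over the incoming row, in the same style as the Run-if expressions above. A hypothetical rule that rejects rows whose first column is empty (the column name col1 is an assumption) could be:

input_row.col1 != null && !input_row.col1.trim().isEmpty()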
I really want to help you, but I think you can handle simple issues and errors like this yourself.
Regards,
Pedro
