Hi, I am new to Talend. I am using it to parse financial files into a state that is ready for DB insertion. The files were most likely exported from PDF to Excel by my source.
Anyhow, the records I get are in this format:
Group A Header
record1 someinfoA
record1_id someinfoB
record2 someinfoC
record2_id someinfoD
Group B Header etc etc
The output I want is:
record1 record1_id someinfoA "Group A Header" someinfoB
record2 record2_id someinfoC "Group A Header" someinfoD
So I want to merge the data in each record pair and also add the group header to each record. There is nothing linking the two lines of a pair except that, in the extract, each record consists of a line 1 followed by a line 2!
Any ideas would be really appreciated!
My solution was to read the header and save it in a context variable, then read record part 1/2 and save its values in context parameters. When I read record part 2/2, I stamp the context parameters onto the end of the record. I did this in a tJavaRow, then used a tMap to clean things up.
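As a rough illustration of this carry-state-between-rows approach, here is a plain-Java sketch that runs outside Talend. Local variables stand in for the context variables, and the `InputLine` record (with its `isHeader` flag) is a hypothetical simplification of the real schema, not something taken from the original job. It needs Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "save line 1, stamp it onto line 2" idea in plain Java.
// Local variables replace the Talend context variables (hypothetical setup).
public class PairMerger {

    // Hypothetical input shape: a header row, or a record line with one info field.
    record InputLine(String name, String info, boolean isHeader) {}

    static List<String> merge(List<InputLine> lines) {
        List<String> out = new ArrayList<>();
        String currentHeader = null;   // last group header seen
        InputLine firstPart = null;    // line 1 of the current record pair

        for (InputLine line : lines) {
            if (line.isHeader()) {
                currentHeader = line.name();  // applies to all following records
                firstPart = null;             // discard any unpaired line 1
            } else if (firstPart == null) {
                firstPart = line;             // line 1/2: hold until line 2 arrives
            } else {
                // line 2/2: combine saved line 1, this line and the current header
                out.add(String.join(" ",
                        firstPart.name(), line.name(), firstPart.info(),
                        "\"" + currentHeader + "\"", line.info()));
                firstPart = null;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<InputLine> input = List.of(
                new InputLine("Group A Header", null, true),
                new InputLine("record1", "someinfoA", false),
                new InputLine("record1_id", "someinfoB", false),
                new InputLine("record2", "someinfoC", false),
                new InputLine("record2_id", "someinfoD", false));
        merge(input).forEach(System.out::println);
    }
}
```

Run on the sample rows from the question, this prints the two target lines in the requested order.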
Can you please clarify the relation between the input format and the output format? It is not very clear in your message.
Hi,
Since you do not have a common input ID for both line items, I would try adding a sequence number to the input flow. Then you can treat rows 1 & 2 as the same record, rows 3 & 4 as the next one, and so on. Based on this numeric grouping, you can use the tDenormalize component to combine rows 1 & 2 into the same output record. Could you please try it?
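A minimal plain-Java sketch of the sequence-number idea, assuming the header rows have already been removed and each content row is a single string. The `pairId` helper and the `" | "` delimiter are illustrative choices, and the concatenation loop only approximates what tDenormalize does when grouping on a key column. Inside a Talend job, the sequence itself would more likely come from something like `Numeric.sequence` in a tMap.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Number content rows 1, 2, 3, ... and treat rows 1&2, 3&4, ... as one record.
public class PairBySequence {

    // 1-based sequence -> pair id: 1,2 -> 0; 3,4 -> 1; 5,6 -> 2 ...
    static int pairId(int sequence) {
        return (sequence - 1) / 2;
    }

    // Group rows by pair id and concatenate each group (tDenormalize-like).
    static List<String> denormalize(List<String> contentRows, String delimiter) {
        Map<Integer, List<String>> groups = new LinkedHashMap<>(); // keeps input order
        int sequence = 0;
        for (String row : contentRows) {
            sequence++;  // stands in for a generated sequence column
            groups.computeIfAbsent(pairId(sequence), k -> new ArrayList<>()).add(row);
        }
        List<String> merged = new ArrayList<>();
        for (List<String> pair : groups.values()) {
            merged.add(String.join(delimiter, pair));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("record1 someinfoA", "record1_id someinfoB",
                                    "record2 someinfoC", "record2_id someinfoD");
        denormalize(rows, " | ").forEach(System.out::println);
    }
}
```

With a 1-based sequence, the integer division has to be applied to `sequence - 1`; dividing the raw number by 2 would leave row 1 on its own and pair row 2 with row 3.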
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time on your query. If your query is answered, please mark the topic as resolved.
Thank you for your input. I was going to use tJavaRow to add a row counter to just the content rows rather than the group header rows.
I am using context variables to keep track of the group header and insert it into each record until another header is encountered:
if (row20.Quantity == null) {                // group header row (no quantity)
    context.Temp_CCP = row20.Description;    // remember the current group header
} else {                                     // content row: part 1 or 2 of a record
    row21.Record_Counter = context.Counter;  // position of this record part
    row21.CCP = context.Temp_CCP;            // stamp the current group header
    context.Counter++;
}
(CCP is the group header.)
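To see what the snippet does row by row, here is a plain-Java simulation. The `Row`/`OutRow` records, their field names, and the quantity values are hypothetical stand-ins for the row20/row21 schemas, and the two instance fields replace `context.Temp_CCP` and `context.Counter` (assumed to start at 0). As far as I know, tJavaRow emits one output row per input row, so header rows still come out with `Record_Counter` unset; the sketch drops them the way a downstream filter (for example tFilterRow) would.

```java
import java.util.List;

// Plain-Java simulation of the tJavaRow logic (hypothetical schemas).
public class CounterSimulation {

    record Row(String description, Integer quantity) {}
    record OutRow(String description, Integer recordCounter, String ccp) {}

    private String tempCcp = null;  // stands in for context.Temp_CCP
    private int counter = 0;        // stands in for context.Counter

    OutRow process(Row in) {
        if (in.quantity() == null) {            // group header row
            tempCcp = in.description();
            return new OutRow(in.description(), null, null); // still emitted, counter null
        }
        OutRow out = new OutRow(in.description(), counter, tempCcp);
        counter++;
        return out;
    }

    // With a 0-based counter: pair = counter / 2, part = counter % 2 + 1.
    static int pairId(int recordCounter) { return recordCounter / 2; }
    static int part(int recordCounter) { return recordCounter % 2 + 1; }

    public static void main(String[] args) {
        List<Row> input = List.of(
                new Row("Group A Header", null),
                new Row("record1", 1), new Row("record1_id", 1),
                new Row("record2", 2), new Row("record2_id", 2));
        CounterSimulation sim = new CounterSimulation();
        for (Row r : input) {
            OutRow o = sim.process(r);
            if (o.recordCounter() == null) continue; // filter out header rows
            System.out.println("pair " + pairId(o.recordCounter())
                    + ", part " + part(o.recordCounter())
                    + ": " + o.description() + " [" + o.ccp() + "]");
        }
    }
}
```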
Any tips for setting up the tDenormalize component, which I have not used before, when I go to join up the records?
Hi,
You can store the previous record in a context variable and join based on that condition. If the tJavaRow is working, then you do not need the denormalize method. For the denormalize method, you would generate a sequence number and then divide it by 2 (integer division, after subtracting 1 if the sequence starts at 1) to identify whether a row belongs to the same group as the previous one or to a new group.
Then denormalize based on this group ID. But I would say your current method is easier than my option, so no worries 🙂
Warm Regards,
Nikhil Thampi
Thanks Nikhil. I switched to using context params to capture part 1 of each record. My job was working fine up to the tReplicate, with all 333 records flowing through. I then worked on my tJavaRow component, and I am happy with its logic, but now only one record flows through and everything stops there once I activate the tJavaRow in the flow. What might be going on? Do I possibly need a wait somewhere?
I have a couple of ideas, but I would need an actual example of the data you are working with. Your comment about how the records are (or are not) linked needs a bit more clarification, and an actual example might provide it. There is no need to (in fact, please do not) include real data; just some pseudo data with realistic values.
Thank you. I am confused about how my job runs when I activate my tJavaRow component as opposed to when I turn it off. I don't understand why the 333 records no longer flow through even up to the tJavaRow. Can you help me understand this? The two runs are shown in the screenshots.
Here is an example of my data, using test data only.
The first part of your problem is linking record A with record B. I think you need another dataset that provides a linking record for both of them. For example, you could have a table holding something like the following:
RecordA  | RecordB | Key |
APPLE    | APPL_   | 1   |
VODAFONE | VF_     | 2   |
         | FB_     | 3   |
You would then use these to match against RecordA and against the beginning of RecordB, and add the numeric key to each record. Send the two sets of records to two different tHashOutput components. Then you can join the data back together using the key you have created.
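The lookup-table approach might look roughly like this in plain Java. The two maps play the role of the two tHashOutput stores and the final loop is an inner join on the generated key. The `LinkRow` record and the suffixed sample values (`APPL_456`, `VF_123`) are made up for illustration; matching RecordA exactly and RecordB by prefix follows the description above, but this is a sketch, not how the Talend job itself would be wired.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Assign keys from a link table, then join A and B rows on that key.
public class KeyLookupJoin {

    record LinkRow(String recordA, String recordBPrefix, int key) {}

    // RecordA rows match the link table by exact value.
    static Optional<Integer> keyForA(List<LinkRow> links, String value) {
        return links.stream()
                .filter(l -> l.recordA() != null && l.recordA().equals(value))
                .map(LinkRow::key)
                .findFirst();
    }

    // RecordB rows match by prefix (e.g. "APPL_").
    static Optional<Integer> keyForB(List<LinkRow> links, String value) {
        return links.stream()
                .filter(l -> value.startsWith(l.recordBPrefix()))
                .map(LinkRow::key)
                .findFirst();
    }

    static Map<Integer, String> join(List<LinkRow> links, List<String> aRows, List<String> bRows) {
        Map<Integer, String> aByKey = new LinkedHashMap<>(); // first "hash" store
        for (String a : aRows) {
            keyForA(links, a).ifPresent(k -> aByKey.put(k, a));
        }
        Map<Integer, String> bByKey = new LinkedHashMap<>(); // second "hash" store
        for (String b : bRows) {
            keyForB(links, b).ifPresent(k -> bByKey.put(k, b));
        }
        Map<Integer, String> joined = new LinkedHashMap<>();
        aByKey.forEach((k, aValue) -> {
            String bValue = bByKey.get(k);
            if (bValue != null) {
                joined.put(k, aValue + " " + bValue); // inner join on key
            }
        });
        return joined;
    }

    public static void main(String[] args) {
        List<LinkRow> links = List.of(
                new LinkRow("APPLE", "APPL_", 1),
                new LinkRow("VODAFONE", "VF_", 2),
                new LinkRow(null, "FB_", 3)); // RecordA cell is blank in the table
        join(links, List.of("APPLE", "VODAFONE"), List.of("VF_123", "APPL_456"))
                .forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```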
Regarding your job, I think it will need to be rewritten to accommodate that flow.
Thank you, I will try that.
It is my first time using tJavaRow. I am still stumped as to why a job that works up to that component changes behaviour in the earlier components when I add the tJavaRow (as per the screenshots above). Any tricks to working with tJavaRow?