Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi, I'm trying to create a job to process csv data input as follow :
The Input :
ID Start_Date End_Date Action Cost
ID1 Date_S1 Date_E1 Action1 1000
ID1 Date_S2 Date_E2 Action2 200
The Output :
If Action2=Value1, the output will be :
ID Start_Date End_Date Action_Date Action Cost
ID1 Date_S1 Date_E1 Date_S2 Action2 1000
ID1 Date_S2+1 Date_E2 1200
I need to apply this changes for each ID with several rows
Any suggestions?
Hi,
Before going to solution part, I have some queries related to input data. Could you please share your thoughts for these scenarios?
a) If we are having a single record, what sould we do for matched records for Action and non matched records?
ID Start_Date End_Date Action Cost
ID1 Date_S1 Date_E1 Action1 1000
ID2 Date_S2 Date_E2 Action2 200
b) If we are having more than two records, what should we do for matched and non matched ids?
ID Start_Date End_Date Action Cost
ID1 Date_S1 Date_E1 Action1 1000
ID1 Date_S2 Date_E2 Action2 200
ID1 Date_S3 Date_E3 Action3 300
c) Is there any chance of duplication with same Action for same id? If yes, what should we do here?
ID Start_Date End_Date Action Cost
ID1 Date_S1 Date_E1 Action1 1000
ID1 Date_S2 Date_E2 Action2 200
ID1 Date_S3 Date_E3 Action2 300
There might be lot of other permutations and combinations also. Could you please first list down the valid combinations before going to the solution part?
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂
Hi and thank you for your reply.
a) if we have a single record for an ID it will be written as it is.
b) If we are having more than two, the output for your example will be :
ID Start_Date End_Date Action_Date Action Cost
ID1 Date_S1 Date_E1 Date_S2 Action2 1000
ID1 Date_S2+1 Date_E2 Date_S3 Action3 1200
ID1 Date_S3+1 Date_E3 1500
(PS : I have some actions with cost <>0 that I shouldn't add to the cost so my solution is to change its cost to 0. Is there a smarter solution for this?)
c) The same ID can have duplication with same action and they're considered as different action since they have different dates
ID Start_Date End_Date Action_Date Action Cost
ID1 Date_S1 Date_E1 Date_S2 Action2 1000
ID1 Date_S2+1 Date_E2 Date_S3 Action2 1200
ID1 Date_S3+1 Date_E3 1500
I used the "tMemorizeRows" in my job but it's too long since I have to use an asc sorting for the "cost" and a desc sorting for the "Action_date" and I'm still not sure if I got the right results.
I really appreciate your help.
Best Regards
Nihel Haddad
Hi,
Could you please share the current job screenshots and the component screenshots of how you are handling the tmemorize?
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂
Hi,
I created two jobs :
1- First job : to calculate the cost
tfileInputDelimited--> tmap-->tmemorizeRow-->tfileOutputDelimited
2- Second job : to update the action_date
tfileInputDelimited-->tSortRow1-->tmap-->tmemorizeRow-->tSortRow2-->tfileOutputDelimited
Best Regards
Nihel Haddad