Hello All,
I have a requirement where I get data from a source file and need to load it into a target table.
In the source file, each row per ID and per TERM has an APPLICATION_START_DATE and a DECISION_DATE.
Each source row should be loaded into the target as multiple rows, based on the APPLICATION_START_DATE and max(DECISION_DATE).
For example, below is how my source file looks. The max(DECISION_DATE) in the source file is 8-Aug.
If we consider ID 100, this row should be loaded as 8 rows into the target, since APPLICATION_START_DATE = 1-Aug and max(DECISION_DATE) = 8-Aug; the SNAPSHOT_DATE should be populated with the date range between APPLICATION_START_DATE and max(DECISION_DATE).
Below is the expected output.
Please help me with this, as it is a critical requirement for me.
Thanks a lot.
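For reference, the expansion being asked for can be sketched in plain Java. This is a minimal illustration, not the Talend job itself; the class name, the 2023 dates, and the single-ID shape are all assumptions for the example.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Sketch: one (ID, APPLICATION_START_DATE) row becomes one output row per
// day from the start date up to and including max(DECISION_DATE), with
// SNAPSHOT_DATE taking each date in that range.
public class SnapshotExpander {

    public static List<LocalDate> snapshotDates(LocalDate start, LocalDate maxDecision) {
        List<LocalDate> out = new ArrayList<>();
        // Walk the range inclusively: start, start+1, ..., maxDecision.
        for (LocalDate d = start; !d.isAfter(maxDecision); d = d.plusDays(1)) {
            out.add(d);
        }
        return out;
    }

    public static void main(String[] args) {
        // ID 100: start 1-Aug, max(DECISION_DATE) 8-Aug -> 8 snapshot rows.
        List<LocalDate> rows = snapshotDates(LocalDate.of(2023, 8, 1), LocalDate.of(2023, 8, 8));
        System.out.println(rows.size()); // prints 8
    }
}
```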
Hi,
I believe you are looking for the output below.
Please refer to the screenshots below for the steps.
The first subjob reads the input file to find the maximum end date in the file and stores it in a context variable.
The next step is to find the interval between the start date and the maximum date for each ID and store it in an interim hash table. Read the original input file again and pass it to tMap as shown below.
TalendDate.diffDate(context.max_decision_date,row3.START_DATE ,"dd") +1
Change the data type to Integer.
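If it helps to see what that tMap expression computes, here is a plain-Java equivalent of `TalendDate.diffDate(context.max_decision_date, row3.START_DATE, "dd") + 1`, written with `java.time` (the class name and sample dates are illustrative):

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Equivalent of diffDate(max, start, "dd") + 1: the number of whole days
// between the two dates, plus one to make the range inclusive.
public class IntervalCount {

    public static int intervalDays(LocalDate maxDecisionDate, LocalDate startDate) {
        return (int) ChronoUnit.DAYS.between(startDate, maxDecisionDate) + 1;
    }

    public static void main(String[] args) {
        // 1-Aug through 8-Aug is an 8-day inclusive range.
        System.out.println(intervalDays(LocalDate.of(2023, 8, 8), LocalDate.of(2023, 8, 1))); // prints 8
    }
}
```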
Now, read the file from the hash as the driving table, as shown below.
Use iteration to process the data for each line, as shown below.
Add the original file as a lookup to tMap and use the addDate function as shown in the screenshot.
The function to be used is shown below.
TalendDate.addDate(row7.START_DATE,row6.count,"dd")
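As a plain-Java reference for `TalendDate.addDate(row7.START_DATE, row6.count, "dd")`: it adds the given number of days to the start date, which is how each iteration produces the next SNAPSHOT_DATE. The class name and sample values here are illustrative, not taken from the job:

```java
import java.time.LocalDate;

// Equivalent of addDate(start, count, "dd"): shift the start date forward
// by `count` days; iterating count over the interval yields every snapshot date.
public class AddDays {

    public static LocalDate addDays(LocalDate start, int count) {
        return start.plusDays(count);
    }

    public static void main(String[] args) {
        // Offsets 0..7 over a 1-Aug start yield 1-Aug..8-Aug.
        System.out.println(addDays(LocalDate.of(2023, 8, 1), 7)); // prints 2023-08-08
    }
}
```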
And you will get the output! Please spare a second to mark the topic as resolved; Kudos would be a bonus 🙂
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved
Any update, please? Could someone help me with this?
I believe I already answered your query!
Warm Regards,
Nikhil Thampi
Hi Nikhil,
Sorry, I was posting my reply at the same time you replied to my post. I am following the steps you provided and will update once I get the expected result. Thanks a lot for the quick response.
Hi Nikhil,
My apologies; I just realised that the incoming file I am getting will contain not just one term but multiple terms.
So I need to populate the data based on max(DECISION_DATE) for each ID per TERM.
Below is the file.
Below is the expected output.
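To make the revised requirement concrete: the maximum decision date must now be computed per (ID, TERM) pair rather than over the whole file. A minimal plain-Java sketch of that grouping, assuming rows of {id, term, decisionDate} and an illustrative `"id|term"` key format:

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

// Sketch: compute max(DECISION_DATE) per (ID, TERM) pair.
public class MaxPerIdTerm {

    public static Map<String, LocalDate> maxDecisionDates(String[][] rows) {
        Map<String, LocalDate> max = new HashMap<>();
        for (String[] r : rows) {
            String key = r[0] + "|" + r[1]; // composite ID|TERM key (illustrative)
            LocalDate d = LocalDate.parse(r[2]);
            // Keep whichever date is later for this ID/TERM pair.
            max.merge(key, d, (a, b) -> a.isAfter(b) ? a : b);
        }
        return max;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"100", "T1", "2023-08-05"},
            {"100", "T1", "2023-08-08"},
            {"100", "T2", "2023-08-03"},
        };
        System.out.println(maxDecisionDates(rows).get("100|T1")); // prints 2023-08-08
    }
}
```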
Also, I tried the first flow you mentioned in your post and am getting the error below. Not sure what I'm missing. Please suggest.
Hi,
For the original job, you will have to add the context variable. Sorry! I missed it in the original screenshot list.
For the new requirement, you can do it in a couple of ways. The easiest way is to segregate the data into multiple input files based on the term and follow the current method. You would keep the current steps and call the current job from a parent job.
In the parent job, you can send one input file name at a time (assuming you have separated the input data into multiple files). There are various other methods too. I would recommend you play with this requirement for a couple of days, since you already have the logic for the core process. Consider it homework, or a practical use-case experiment to put the ideas into practice 🙂
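The segregation step being suggested can be sketched in plain Java: partition the rows by TERM so each partition can be written out as its own input file and fed to the existing job. The class name and the assumption that TERM sits in column index 1 are illustrative:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: group input rows by TERM so each group can become its own
// input file for the existing per-file job.
public class SplitByTerm {

    // rows: {id, term, startDate, decisionDate}; TERM assumed at index 1.
    public static Map<String, List<String[]>> partitionByTerm(List<String[]> rows) {
        Map<String, List<String[]>> byTerm = new HashMap<>();
        for (String[] r : rows) {
            byTerm.computeIfAbsent(r[1], t -> new ArrayList<>()).add(r);
        }
        return byTerm;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[]{"100", "T1", "2023-08-01", "2023-08-08"},
            new String[]{"100", "T2", "2023-08-01", "2023-08-03"},
            new String[]{"101", "T1", "2023-08-02", "2023-08-05"});
        // Two distinct terms -> two partitions (i.e. two input files).
        System.out.println(partitionByTerm(rows).size()); // prints 2
    }
}
```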
Warm Regards,
Nikhil Thampi
Thank you, Nikhil. I will try splitting the data into multiple files and follow the approach you recommended. Thanks again.
Hi,
Thanks! Since I have resolved your original query, I would really appreciate it if you could mark the post as answered by marking my earlier post containing the resolution. It will help Talend community members during their reference.
Of course, it will also give me a moment of happiness to have helped a fellow Talend community member 🙂
Warm Regards,
Nikhil Thampi
Re: Insert one source row as multiple rows
Example: a single inserted source row
INSERT: A | A1 | A11 | B1 | B11 | C1 | C11
should be saved in the database as multiple rows:
A1 | A11
B1 | B11
C1 | C11