So, I have several XML files of a particular kind that I need to map to a template.
Sample file:
<root>
<header>
<generated>2013-11-29 00:00:00</generated>
<somestuff/>
</header>
<catalog/>
<article/>
<id>111111</id> xxx<!-- split / aggregate by this -->
<article-details>/>
<description-short>nice article</description-short>
<description-long>really nice article</description-long>
<keyword>keyword A</keyword>
<keyword>keyword B</keyword>
<keyword>keyword C</keyword>
<keyword>keyword D</keyword>
<keyword>keyword E</keyword>
<keyword>keyword F</keyword>
<keyword>keyword G</keyword>
</root>
I need to map each keyword individually, together with some of the details from the header. The detail fields stay the same from file to file, but their values do not. For example:
<root>
<header>
<generated>2013-11-29 00:00:00</generated>
<somestuff/>
</header>
<catalog/>
<article/>
<id>date</id>
<article-details>/>
<description-short>nice article</description-short>
<description-long>really nice article</description-long>
<keyword>keyword A</keyword>
</root>
However, the files have a varying number of keywords, and the keyword values vary as well.
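Outside of Talend, the split being asked for can be sketched in plain Java with the JDK's DOM API. This is only an illustration under assumptions: the class name `KeywordSplitter` is hypothetical, and the input is assumed to be a well-formed version of the sample in which the `<keyword>` elements are nested inside `<article-details>`. Each output document keeps the shared fields and exactly one keyword, however many keywords the input has.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not a Talend component.
class KeywordSplitter {

    // Returns one document per <keyword>; everything else is copied unchanged.
    static List<Document> split(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document source = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            int count = source.getElementsByTagName("keyword").getLength();
            List<Document> result = new ArrayList<>();
            for (int keep = 0; keep < count; keep++) {
                // Deep-copy the whole tree, then drop every keyword except one.
                Document copy = builder.newDocument();
                copy.appendChild(copy.importNode(source.getDocumentElement(), true));
                NodeList keywords = copy.getElementsByTagName("keyword");
                // The NodeList is live, so walk it backwards while removing nodes.
                for (int i = keywords.getLength() - 1; i >= 0; i--) {
                    if (i != keep) {
                        Node k = keywords.item(i);
                        k.getParentNode().removeChild(k);
                    }
                }
                result.add(copy);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not split XML", e);
        }
    }

    public static void main(String[] args) {
        // Assumed well-formed variant of the sample file.
        String xml = "<root><header><generated>2013-11-29 00:00:00</generated></header>"
                + "<article><id>111111</id><article-details>"
                + "<description-short>nice article</description-short>"
                + "<keyword>keyword A</keyword><keyword>keyword B</keyword>"
                + "<keyword>keyword C</keyword>"
                + "</article-details></article></root>";
        for (Document doc : split(xml)) {
            System.out.println(doc.getElementsByTagName("keyword").item(0).getTextContent());
        }
    }
}
```

The sketch only handles standalone `<keyword>` elements; child fields that belong to a keyword would need to be grouped with it before the split.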
What would be the best way to approach this?
I could do:
tFiles_Input---->
tFiles_Extract------>
(Not sure what to have here)------>tMap_xml {I don't know if the splitting could be done from here either}
-------->tFilesOutputXML.
I would be grateful for any help.
Thank you.
Hi
You are iterating over multiple files, so make sure the 'Clear cache after reading' box is checked on the tHashInput component, so that the data for the current file is cleared after it has been read.
I don't understand the job design in your first screenshot. I see you are using a tRunJob to call a child job, but you haven't moved the processing to the child job as I suggested.
Hi
I just want to confirm your expected result: do you want to generate a different XML file for each keyword? And what should the output file name look like?
Regards
Shong
Hello @Shicong Hong, thanks so much for the quick reply.
In this case, my expected result would be a different XML file for each keyword, as you mentioned, each one containing the constant fields that are shared among all the keywords.
The expected output file name would be {INPUTFILE_NAME + (Counter + n)}.xml. I think this would be the best way to solve the output name issue: the input file name plus a counter that follows the order in which the keywords are read, e.g. InputFileName_1.xml for Keyword A, InputFileName_2.xml for Keyword B, InputFileName_3.xml for Keyword C, etc.
But if InputFileName_KeywordA.xml for Keyword A and InputFileName_KeywordB.xml for Keyword B are also possible, I would be glad to be guided on how to get both kinds of output.
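The two naming schemes described above can be sketched in plain Java. The class `OutputFileNames` and its methods are hypothetical names used only for illustration. The keyword-based variant strips every character that is not a letter or digit, which is an assumption about what is safe in a file name.

```java
// Hypothetical helper for illustration; not part of Talend.
class OutputFileNames {

    // Strips directory and extension: "C:/in/InputFileName.xml" -> "InputFileName".
    static String baseName(String path) {
        int slash = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        String name = path.substring(slash + 1);
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }

    // Scheme 1: input file name plus a counter in keyword read order.
    static String byCounter(String inputPath, int counter) {
        return baseName(inputPath) + "_" + counter + ".xml";
    }

    // Scheme 2: input file name plus the keyword value, reduced to letters and digits.
    static String byKeyword(String inputPath, String keyword) {
        String safe = keyword.replaceAll("[^A-Za-z0-9]+", "");
        return baseName(inputPath) + "_" + safe + ".xml";
    }

    public static void main(String[] args) {
        String[] keywords = {"keyword A", "keyword B", "keyword C"};
        int counter = 1;
        for (String keyword : keywords) {
            System.out.println(byCounter("InputFileName.xml", counter++)
                    + "  |  " + byKeyword("InputFileName.xml", keyword));
        }
    }
}
```

The counter scheme always yields unique names within one input file. The keyword scheme only does so if no two keywords reduce to the same string.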
Thanks so much @Shicong Hong
One detail I did not mention is that under each keyword there are other child fields (sub-keywords) that provide details about the keyword,
e.g.
<keyword>keyword A</keyword>
<sub1-keyword>sub1</sub1-keyword>
<sub2-keyword>sub2</sub2-keyword>
<sub3-keyword>sub3</sub3-keyword>
<sub3-keyword>sub4</sub3-keyword>
<keyword>keyword B</keyword>
<sub1-keyword>sub_1</sub1-keyword>
<sub2-keyword>sub_2</sub2-keyword>
<sub3-keyword>sub_3</sub3-keyword>
<sub3-keyword>sub_4</sub3-keyword>
<keyword>keyword C</keyword>
<sub1-keyword>sub_K</sub1-keyword>
<sub2-keyword>sub_X</sub2-keyword>
<sub3-keyword>sub_G</sub3-keyword>
<sub3-keyword>sub_L</sub3-keyword>
These sub-keyword fields are the same for every keyword; only their values differ.
So I tried to split it in this format, but the tExtract is not looping through the repeating field as I would expect. The extract loops the right number of times, but the values are the same on every iteration; none of them changes, as seen in the 2nd screenshot. As seen above, the values of the fields DG1_1, CE_1, CE_2, CE_3, CE_1 differ, but they are encapsulated under a main DG1 field, which is the one being looped over. E.g. in this specific file, DG1 appears 4 times, which matches the number of loop iterations, but the loop does not change the values.
I have now been able to split the data, and will have to join each pair and then map it to an output. I would like that output to be named fileName + (DG1_1_Number), as seen in the logRow output screenshot. This is where I most probably need some help and guidance.
Log RowOutput
@Shicong Hong So, I found that it would be better to build the file name from the file's own parameters, hence I would like the output name to be {file_name + (DG_1_Number)}. The DG_1_Number is the connecting value that links a pair, as seen in the screenshot below.
Final and expected output. The only issue now is naming the output based on InputFile_Name + (FieldValue), as stated in the previous post. I will be happy for any guidance from anybody.
Thank you.🙏
Hi
Please redesign your job a bit as seen below:
main job:
tFileList--iterate--tRunJob
tRunJob: calls the child job and passes the current file path to it. Please refer to this article to learn how to pass the file path to the child job:
https://help.talend.com/r/HavZ1pLN5PZ~FuTJY51TEQ/~c3ZfKEEXBag51N2ux_tBQ
Child job:
tFileInputXML.........tHashOutput1
             .........tHashOutput2
   |onsubjobok
tHashInput1-----main---tMap---out1-tFlowToIterate--iterate--tFixedFlowInput---main--tXMLMap---tFileOutputXML
                        |lookup
                    tHashInput2
In the tMap, do an inner join on DG_1_Number to merge the columns from the main flow and the lookup flow.
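As a rough picture of what such an inner join does, here is a minimal plain-Java sketch, not Talend's generated code. Rows are modelled as column-name to value maps. The class `InnerJoinSketch` and the sample values are hypothetical. When several lookup rows share a key, the sketch keeps the last one, which is an assumption made for simplicity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of an inner join on one key column.
class InnerJoinSketch {

    static List<Map<String, String>> innerJoin(List<Map<String, String>> main,
                                               List<Map<String, String>> lookup,
                                               String key) {
        // Index the lookup rows by key value (last row wins on duplicates).
        Map<String, Map<String, String>> index = new HashMap<>();
        for (Map<String, String> row : lookup) {
            if (row.get(key) != null) {
                index.put(row.get(key), row);
            }
        }
        List<Map<String, String>> joined = new ArrayList<>();
        for (Map<String, String> row : main) {
            Map<String, String> match = index.get(row.get(key));
            if (match == null) {
                continue; // inner join: main rows without a lookup match are dropped
            }
            Map<String, String> merged = new HashMap<>(match);
            merged.putAll(row); // main-flow columns take precedence on name clashes
            joined.add(merged);
        }
        return joined;
    }

    static Map<String, String> row(String... pairs) {
        Map<String, String> r = new HashMap<>();
        for (int i = 0; i + 1 < pairs.length; i += 2) {
            r.put(pairs[i], pairs[i + 1]);
        }
        return r;
    }

    public static void main(String[] args) {
        // Sample values are made up for the demonstration.
        List<Map<String, String>> main = Arrays.asList(
                row("DG_1_Number", "1", "CE_1", "main-1"),
                row("DG_1_Number", "3", "CE_1", "main-3"));
        List<Map<String, String>> lookup = Arrays.asList(
                row("DG_1_Number", "1", "CE_2", "lookup-1"),
                row("DG_1_Number", "2", "CE_2", "lookup-2"));
        System.out.println(innerJoin(main, lookup, "DG_1_Number"));
    }
}
```

Only the row with DG_1_Number equal to "1" exists on both sides, so it is the only one that reaches the output.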
To generate one file for each row, use a tFlowToIterate to iterate over the rows. You can then access the current field value in the file path of tFileOutputXML, e.g.:
"D:/file/filename_"+(String)globalMap.get("out1.DG_1_Number")+".xml"
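To show how this kind of expression evaluates, here is a small standalone Java sketch. It uses an ordinary `HashMap` as a stand-in for Talend's `globalMap`; the class name `IterateFilePath` and the value "4711" are made up for the demonstration. It assumes the column is of type String, since the `(String)` cast would fail for an Integer column.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in demonstration; in a real job Talend provides globalMap itself.
class IterateFilePath {

    // Same shape as the file path expression, with the lookup key as a parameter.
    static String outputPath(Map<String, Object> globalMap, String key) {
        return "D:/file/filename_" + (String) globalMap.get(key) + ".xml";
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // Entry of the form "rowName.columnName", as written for the current row.
        globalMap.put("out1.DG_1_Number", "4711");

        // Exact key: the value is found.
        System.out.println(outputPath(globalMap, "out1.DG_1_Number"));

        // Key with a stray trailing space: no entry, so null is concatenated.
        System.out.println(outputPath(globalMap, "out1.DG_1_Number "));
    }
}
```

The key must match exactly, including case and whitespace. A key that does not match yields `null`, and the file name then contains the literal text "null".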
tFixedFlowInput: generates the current row. Define the same schema as out1 from the tMap, and configure the values in 'Use Single Table' mode, e.g.:
Column         Value
DG_1_Number    out1.DG_1_Number
Please try and let me know if you have any questions.
Regards
Shong
Hi @Shicong Hong, thanks so much for your help. After the redesign, the process is failing at the tFixedFlowInput. I'm not sure if I did something wrong.
Also, here is the tMap inner join screenshot.
Here is the error I get:
The expression in the value field is not right. The correct expression has the form rowName.columnName; for example, in your case:
MainOut.DG_1_Number