MAnywar
Contributor III

Help with split-mapping xml file to template schema to produce multiple output xml files from a single file.

I have a special kind of XML file (several of them, in fact) that I need to map to a template.

Sample file.

<root>
  <header>
    <generated>2013-11-29 00:00:00</generated>
    <somestuff/>
  </header>
  <catalog/>
  <article/>
  <id>111111</id> <!-- split / aggregate by this -->
  <article-details/>
  <description-short>nice article</description-short>
  <description-long>really nice article</description-long>
  <keyword>keyword A</keyword>
  <keyword>keyword B</keyword>
  <keyword>keyword C</keyword>
  <keyword>keyword D</keyword>
  <keyword>keyword E</keyword>
  <keyword>keyword F</keyword>
  <keyword>keyword G</keyword>
</root>

I need to map each keyword individually, together with some of the details from the header. The detail fields stay the same, but their values do not. For example:

<root>
  <header>
    <generated>2013-11-29 00:00:00</generated>
    <somestuff/>
  </header>
  <catalog/>
  <article/>
  <id>date</id>
  <article-details/>
  <description-short>nice article</description-short>
  <description-long>really nice article</description-long>
  <keyword>keyword A</keyword>
</root>

However, the number of keywords and their values vary from file to file.

What would be the best way to approach this?

I could do:

tFiles_Input ----> tFiles_Extract ----> (not sure what to have here) ----> tMap_xml (I don't know whether the splitting could be done from here either) ----> tFilesOutputXML

I would be grateful for any assistance or help offered.

Thank you.

1 Solution

Accepted Solutions
Anonymous
Not applicable

Hi

You are iterating over multiple files, so make sure the 'Clean cache after reading' box is checked on the tHashInput component; this clears the data for the current file once it has been read.
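To illustrate why the cache matters when several files are processed in one run, here is a minimal plain-Java sketch. It is only a simplified model, not Talend's actual implementation: the class `HashCacheSketch`, its methods, and the row values are all hypothetical, and it assumes the cache keeps accumulating rows until it is explicitly cleared.

```java
import java.util.ArrayList;
import java.util.List;

class HashCacheSketch {
    // Hypothetical stand-in for the shared in-memory cache that a hash output writes to.
    private static final List<String> cache = new ArrayList<>();

    // Returns a copy of everything currently cached; optionally clears the cache afterwards.
    static List<String> readCache(boolean cleanAfterReading) {
        List<String> rows = new ArrayList<>(cache);
        if (cleanAfterReading) {
            cache.clear();
        }
        return rows;
    }

    // Simulates two file iterations and returns how many rows the second read sees.
    static int rowsSeenForSecondFile(boolean cleanAfterReading) {
        cache.clear();
        cache.add("file1-row1");
        cache.add("file1-row2");
        readCache(cleanAfterReading);      // first file is read
        cache.add("file2-row1");
        return readCache(cleanAfterReading).size();
    }

    public static void main(String[] args) {
        System.out.println("with cleaning:    " + rowsSeenForSecondFile(true));
        System.out.println("without cleaning: " + rowsSeenForSecondFile(false));
    }
}
```

In this model the second file sees only its own row when the cache is cleaned, and the first file's rows as well when it is not.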

I don't understand the job design in your first screenshot. I see you are using a tRunJob to call a child job, but you have not moved the processing into the child job as I suggested.

0693p00000AcOySAAV.png 


12 Replies
Anonymous
Not applicable

Hi

I just want to confirm your expected result: do you want to generate a different XML file for each keyword? And what should the output file name look like?

 

Regards

Shong

MAnywar
Contributor III
Author

Hello @Shicong Hong​  Thanks so much for the quick reply.

 

In this case, my expected result would be a different XML file for each keyword, as you mentioned, with each file also containing the constant fields that are shared among all the keywords.
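Outside of Talend, the expected result can be sketched in plain Java with the JDK's DOM API. This is only an illustration under assumptions: the class `KeywordSplitSketch` and the method `split` are hypothetical names, and the sample input is a shortened version of the file above in which every `<keyword>` is a direct child of `<root>`.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class KeywordSplitSketch {
    // Returns one XML string per <keyword>; each keeps the shared elements and exactly one keyword.
    static List<String> split(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            Document first = builder.parse(new InputSource(new StringReader(xml)));
            int count = first.getElementsByTagName("keyword").getLength();

            List<String> outputs = new ArrayList<>();
            for (int keep = 0; keep < count; keep++) {
                // Parse a fresh copy for every output so removals never affect other outputs.
                Document copy = builder.parse(new InputSource(new StringReader(xml)));
                NodeList keywords = copy.getElementsByTagName("keyword");
                // Walk backwards: the NodeList is live and shrinks as nodes are removed.
                for (int i = keywords.getLength() - 1; i >= 0; i--) {
                    if (i != keep) {
                        Node extra = keywords.item(i);
                        extra.getParentNode().removeChild(extra);
                    }
                }
                StringWriter writer = new StringWriter();
                transformer.transform(new DOMSource(copy), new StreamResult(writer));
                outputs.add(writer.toString());
            }
            return outputs;
        } catch (Exception e) {
            throw new IllegalStateException("Could not split the XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><header><generated>2013-11-29 00:00:00</generated></header>"
                + "<id>111111</id>"
                + "<keyword>keyword A</keyword><keyword>keyword B</keyword></root>";
        for (String part : split(xml)) {
            System.out.println(part);
        }
    }
}
```

Each returned string could then be written to its own file; the file naming is a separate concern.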

 

The expected output file name would be {INPUTFILE_NAME + (Counter + n)}.xml. I think this would be the best way to solve the output naming issue: the input file name plus a counter that follows the order in which the keywords are read, e.g. InputFileName_1.xml for Keyword A, InputFileName_2.xml for Keyword B, InputFileName_3.xml for Keyword C, etc.

But if InputFileName_KeywordA.xml for Keyword A and InputFileName_KeywordB.xml for Keyword B is also possible, I would be glad to be guided towards both outputs.
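The two naming schemes can be sketched in plain Java as follows. The class `OutputNameSketch` and its methods are hypothetical, and the keyword-based variant assumes that stripping everything except letters and digits is an acceptable way to make a keyword safe for a file name (so "keyword A" becomes "keywordA").

```java
import java.util.ArrayList;
import java.util.List;

class OutputNameSketch {
    // Counter-based names: InputFileName_1.xml, InputFileName_2.xml, ... in read order.
    static List<String> byCounter(String inputFileName, List<String> keywords) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < keywords.size(); i++) {
            names.add(inputFileName + "_" + (i + 1) + ".xml");
        }
        return names;
    }

    // Keyword-based names: the keyword itself, reduced to letters and digits, is the suffix.
    static List<String> byKeyword(String inputFileName, List<String> keywords) {
        List<String> names = new ArrayList<>();
        for (String keyword : keywords) {
            String safe = keyword.replaceAll("[^A-Za-z0-9]", "");
            names.add(inputFileName + "_" + safe + ".xml");
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> keywords = List.of("keyword A", "keyword B", "keyword C");
        System.out.println(byCounter("InputFileName", keywords));
        System.out.println(byKeyword("InputFileName", keywords));
    }
}
```

Note that the keyword-based scheme only produces distinct names if the cleaned keywords are themselves distinct within a file; the counter-based scheme does not have that restriction.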

 

Thanks so much @Shicong Hong​ 

 

 

 

MAnywar
Contributor III
Author

One detail I did not mention is that under each keyword there are other child fields (sub-keywords) that provide details about the keyword.

e.g

<keyword>keyword A</keyword>
<sub1-keyword>sub1</sub1-keyword>
<sub2-keyword>sub2</sub2-keyword>
<sub3-keyword>sub3</sub3-keyword>
<sub3-keyword>sub4</sub3-keyword>

<keyword>keyword B</keyword>
<sub1-keyword>sub_1</sub1-keyword>
<sub2-keyword>sub_2</sub2-keyword>
<sub3-keyword>sub_3</sub3-keyword>
<sub3-keyword>sub_4</sub3-keyword>

<keyword>keyword C</keyword>
<sub1-keyword>sub_K</sub1-keyword>
<sub2-keyword>sub_X</sub2-keyword>
<sub3-keyword>sub_G</sub3-keyword>
<sub3-keyword>sub_L</sub3-keyword>

 

The sub-keyword fields are the same for every keyword; only their values differ.

MAnywar
Contributor III
Author

So I tried to split it in this format, but the tExtract is not looping through the repeating field as I would expect. The extract loops the right number of times, but the values are the same on every pass; none of them change, as seen in the second screenshot.

0693p00000Ac5nBAAR.png

0693p00000Ac5rhAAB.png

As seen above, the values of the fields DG1_1, CE_1, CE_2, CE_3, CE_1 are different, but they are encapsulated under a main DG1 field, which is the one being looped through. For example, in this specific file DG1 appears 4 times, and that is how many times the loop runs, but the loop does not change the values.

MAnywar
Contributor III
Author

I have been able to split the data and will now have to join each pair and then map it to an output. I would like the output to be named fileName + (DG1_1_Number), as seen in the logRow output screenshot. This is where I most probably need some help and guidance.

0693p00000Ac9fRAAR.png

Log row output

@Shicong Hong So, I found that it would be better to build the file name from the file's own parameters, hence I would like the output name to be {file_name + (DG_1_Number)}. The DG_1_Number is the connecting value that links a pair, as seen in the screenshot below.

0693p00000Ac9v4AAB.png

MAnywar
Contributor III
Author

This is the final, expected output. The only issue now is naming the output based on InputFile_Name + (FieldValue), as stated in the previous post. I will be happy for any guidance from anybody.

Thank you.🙏

Anonymous
Not applicable

Hi

Please redesign your job a bit as seen below:

main job:

tFileList--iterate--tRunJob

 

tRunJob: call the child job and pass the current file path to it. Please refer to this article to learn how to pass the file path to the child job:

https://help.talend.com/r/HavZ1pLN5PZ~FuTJY51TEQ/~c3ZfKEEXBag51N2ux_tBQ

 

Child job:

tFileInputXML.........tHashOutput1
             .........tHashOutput2
   |
   onSubjobOk
   |
tHashInput1---main---tMap---out1---tFlowToIterate---iterate---tFixedFlowInput---main---tXMLMap---tFileOutputXML
                      |
                    lookup
                      |
                  tHashInput2

 

On tMap, do an inner join based on DG_1_Number to merge the columns from the main flow and the lookup flow.
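For readers who want to see what such a join does to the rows, here is a minimal plain-Java sketch of an inner join on DG_1_Number. It is not what tMap generates: the class `InnerJoinSketch` is hypothetical, rows are modelled as simple maps, and the columns DG1_1 and CE_1 are used only as illustrative payload.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InnerJoinSketch {
    // Joins main rows to lookup rows on DG_1_Number; main rows without a match are dropped.
    static List<Map<String, String>> innerJoin(List<Map<String, String>> mainRows,
                                               List<Map<String, String>> lookupRows) {
        // Index the lookup flow by the join key for constant-time matching.
        Map<String, Map<String, String>> byKey = new HashMap<>();
        for (Map<String, String> row : lookupRows) {
            byKey.put(row.get("DG_1_Number"), row);
        }
        List<Map<String, String>> joined = new ArrayList<>();
        for (Map<String, String> row : mainRows) {
            Map<String, String> match = byKey.get(row.get("DG_1_Number"));
            if (match == null) {
                continue; // inner join: no lookup partner, no output row
            }
            Map<String, String> merged = new HashMap<>(match);
            merged.putAll(row); // columns from both flows end up in one row
            joined.add(merged);
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Map<String, String>> mainRows = List.of(
                Map.of("DG_1_Number", "1", "DG1_1", "first"),
                Map.of("DG_1_Number", "2", "DG1_1", "second"));
        List<Map<String, String>> lookupRows = List.of(
                Map.of("DG_1_Number", "1", "CE_1", "detail for 1"));
        System.out.println(innerJoin(mainRows, lookupRows).size() + " joined row(s)");
    }
}
```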

To generate one file for each row, use a tFlowToIterate to iterate over the rows; you can then use the current field value in the file path on tFileOutputXML, e.g.:

"D:/file/filename_"+(String)globalMap.get("out1.DG_1_Number")+".xml"
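The lookup key has to match the stored name exactly. The sketch below uses a plain `HashMap` as a stand-in for globalMap, on the assumption that tFlowToIterate stores each column under a key of the form rowName.columnName; the class `OutputPathSketch` and the value "1" are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class OutputPathSketch {
    // Builds the output path the same way as the file path expression above.
    static String outputPath(Map<String, Object> globalMap, String key) {
        return "D:/file/filename_" + (String) globalMap.get(key) + ".xml";
    }

    public static void main(String[] args) {
        // Stand-in for globalMap, filled as if the current row had DG_1_Number = "1".
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("out1.DG_1_Number", "1");

        // Exact key: prints D:/file/filename_1.xml
        System.out.println(outputPath(globalMap, "out1.DG_1_Number"));
        // Key with a stray trailing space: no entry is found, so this prints D:/file/filename_null.xml
        System.out.println(outputPath(globalMap, "out1.DG_1_Number "));
    }
}
```

A file named filename_null.xml in the output folder is therefore a useful hint that the key in the expression does not match the row and column names.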

 

tFixedFlowInput: generates the current row. Define the same schema as out1 from tMap, and configure the values in 'Use Single Table' mode, e.g.:

Column         Value

DG_1_Number    out1.DG_1_Number

 

Please try and let me know if you have any questions.

 

Regards

Shong

 

MAnywar
Contributor III
Author


Hi @Shicong Hong, thanks so much for your help. After the redesign, the process is failing at the tFixedFlowInput. I am not sure if I did something wrong.

0693p00000AcFnLAAV.png

Also, here is the tMap inner join screenshot.

0693p00000AcFo4AAF.png 

Here is the error I get

0693p00000AcFzRAAV.png 

Anonymous
Not applicable

The expression in the value field is not right. The right expression is rowName.columnName; for example, in your case:

MainOut.DG_1_Number