Informatique1
Contributor III

Full Extraction XML field

Hello,

I have this kind of XML architecture:

<?xml version="1.0"?>
<CATALOG>
  <PRODUCTS>
    <PRODUCT>
      <ID>1111</ID>
      <CODE>6666666</CODE>
      <ATTRIBUTES>
        <LABEL>test product</LABEL>
      </ATTRIBUTES>
      <MEDIAS>
        <MEDIA>
          <FILENAME>666666_test_product.jpg</FILENAME>
          <MEDIA_TYPE>==</MEDIA_TYPE>
          <URL>https://6666_test_product.jpg?1643353703</URL>
        </MEDIA>
        <MEDIA>
          <FILENAME>666666_test_product2.jpg</FILENAME>
          <MEDIA_TYPE>==</MEDIA_TYPE>
          <URL>https://666666_test_product2.jpg?1595193757</URL>
        </MEDIA>
        <MEDIA>
          <FILENAME>666668_test_product3.jpg</FILENAME>
          <MEDIA_TYPE>==</MEDIA_TYPE>
          <URL>https://666666_test_product3.jpg?1595193758</URL>
        </MEDIA>
      </MEDIAS>
      <AAAA>
        <BBB>
          <ID/>
          <CODE/>
          <ATTRIBUTES>
            <LABEL/>
          </ATTRIBUTES>
        </BBB>
      </AAAA>
      <CCCC>
        <DD>
          <ID/>
          <CODE/>
          <ATTRIBUTES>
            <LABEL/>
          </ATTRIBUTES>
        </DD>
      </CCCC>
    </PRODUCT>
  </PRODUCTS>
</CATALOG>

 

How can I extract every field when the number of elements in the MEDIAS loop can differ between two extractions?

Should I loop on MEDIAS first, then loop on the products, and join the two loops in a tMap, for example?

Thanks a lot

1 Solution

Accepted Solutions
Anonymous
Not applicable

Yes, the tExtractXMLFields component does not support extracting multiple loop elements at a time, so you have to do multiple extractions and join the columns back together if needed.

 


9 Replies
Informatique1
Contributor III
Author

0695b00000fI6uKAAS.png

Anonymous
Not applicable

Hello

What do you mean when you say the MEDIAS loop could be different? Can you give an example?

In this example file, you can extract all fields under MEDIA element.

0695b00000fICE0AAO.png 

Regards

Shong

Informatique1
Contributor III
Author

Hello,

 

Thanks for the answer.

 

The number of <MEDIA> elements inside <MEDIAS> can differ from one day to another.

It can be just one media one day and 10 media the next.

I know I can loop on the medias like you do, but I'm concerned about getting the <ID> and <CODE> (for example) at the same time.

That is because I can also have a different number of <PRODUCT> elements inside <PRODUCTS>.

 

My main root is <PRODUCTS>.

But it can contain many <PRODUCT> elements, and each product can in turn contain many <MEDIA> elements.

This is where I don't know how to extract all the data at the same time, because I have loops inside loops...

 

Thanks

Anonymous
Not applicable

From your description, I think that if you loop on the MEDIA element as shown in the screenshot, you can get all the media data plus <ID> and <CODE> at the same time for the different products. The output would look like:

ID;CODE;FILENAME;URL

1;code1;filename1;URL1

1;code1;filename2;URL2

1;code1;filename3;URL3

2;code2;filename4;URL4

2;code2;filename5;URL5

...

This works because your file structure is loops inside loops, not different loop elements at the same level, isn't it?
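For readers who want to see the idea outside Talend, here is a minimal Python sketch of looping on the inner MEDIA element while reading ID and CODE from the enclosing PRODUCT. It only illustrates the nested-loop flattening described above and is not how tExtractXMLFields is implemented; the sample values mirror the placeholder output in this reply, and the function name `media_rows` is made up for the example.

```python
import xml.etree.ElementTree as ET

# Placeholder data shaped like the XML in the question (values are illustrative).
SAMPLE = """<CATALOG><PRODUCTS>
  <PRODUCT><ID>1</ID><CODE>code1</CODE><MEDIAS>
    <MEDIA><FILENAME>filename1</FILENAME><URL>URL1</URL></MEDIA>
    <MEDIA><FILENAME>filename2</FILENAME><URL>URL2</URL></MEDIA>
    <MEDIA><FILENAME>filename3</FILENAME><URL>URL3</URL></MEDIA>
  </MEDIAS></PRODUCT>
  <PRODUCT><ID>2</ID><CODE>code2</CODE><MEDIAS>
    <MEDIA><FILENAME>filename4</FILENAME><URL>URL4</URL></MEDIA>
    <MEDIA><FILENAME>filename5</FILENAME><URL>URL5</URL></MEDIA>
  </MEDIAS></PRODUCT>
</PRODUCTS></CATALOG>"""


def media_rows(xml_text):
    """Return one (ID, CODE, FILENAME, URL) tuple per MEDIA element."""
    root = ET.fromstring(xml_text)
    rows = []
    for product in root.iter("PRODUCT"):
        # Parent-level fields are read once per product...
        product_id = product.findtext("ID", default="")
        code = product.findtext("CODE", default="")
        # ...and repeated on every row of the inner loop, however many MEDIA exist.
        for media in product.findall("MEDIAS/MEDIA"):
            rows.append((
                product_id,
                code,
                media.findtext("FILENAME", default=""),
                media.findtext("URL", default=""),
            ))
    return rows


rows = media_rows(SAMPLE)
for row in rows:
    print(";".join(row))
```

The number of rows per product simply follows the number of MEDIA elements found, so a file with one media today and ten tomorrow needs no change to the extraction.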

Informatique1
Contributor III
Author

Yes, this is what I have so far, but I do not want to repeat the ID and CODE many times.

 

I'm not sure how to convert it into a single row per product in a CSV, something like this:

ID;CODE;FILENAME;URL

1;code1;filename1,filename2,filename3,filename4,filename5;url1,url2,url3,url4,url5

 

Thanks

Anonymous
Not applicable

OK, I think you need to extract the data twice.

The first time:

extract the id, code and filename columns;

1;code1;filename1

1;code1;filename2

2;code2;filename1

2;code2;filename2

 

then use the tDenormalize component to convert the multiple rows into one row per product:

1;code1;filename1,filename2

2;code2;filename1,filename2

 

The second time:

extract the id, code and url columns;

1;code1;url1

1;code1;url2

2;code2;url1

2;code2;url2

 

then use the tDenormalize component to convert the multiple rows into one row per product:

1;code1;url1,url2

2;code2;url1,url2

 

In the next subjob, do an inner join between the two results above to merge all the columns.
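The two-pass approach above can be sketched in plain Python as follows. This is only an illustration of the logic (group rows by key, join the values with a separator, then inner-join the two results); it is not Talend code, and the helper name `denormalize` is invented for the example. The input rows are the placeholder values from this reply.

```python
def denormalize(rows, sep=","):
    """Collapse rows sharing the same (id, code) key into one entry,
    joining the third column with `sep` in order of appearance."""
    grouped = {}
    for product_id, code, value in rows:
        grouped.setdefault((product_id, code), []).append(value)
    return {key: sep.join(values) for key, values in grouped.items()}


# First extraction: id, code, filename
filename_rows = [
    ("1", "code1", "filename1"),
    ("1", "code1", "filename2"),
    ("2", "code2", "filename1"),
    ("2", "code2", "filename2"),
]
# Second extraction: id, code, url
url_rows = [
    ("1", "code1", "url1"),
    ("1", "code1", "url2"),
    ("2", "code2", "url1"),
    ("2", "code2", "url2"),
]

filenames_by_key = denormalize(filename_rows)
urls_by_key = denormalize(url_rows)

# Inner join: keep only the keys present in both denormalized results.
merged = [
    (product_id, code, filenames_by_key[(product_id, code)], urls_by_key[(product_id, code)])
    for (product_id, code) in filenames_by_key
    if (product_id, code) in urls_by_key
]

print("ID;CODE;FILENAME;URL")
for row in merged:
    print(";".join(row))
```

Because the grouping is done per key, products with different numbers of media simply end up with lists of different lengths in the joined column.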

 

Regards

Shong

 

Informatique1
Contributor III
Author

And what if I have another loop in my XML?

For example:

<ID/>
<CODE/>
<PRODUCTS>
  <product>
    <filename/>
    <url/>
  </product>
</PRODUCTS>
<ATTRIBUTES>
  <label/>
  <size/>
</ATTRIBUTES>

 

For example, if I need to loop on PRODUCTS and also on ATTRIBUTES, should I do 3 extractions?

First extraction: ID and CODE

2nd extraction: PRODUCTS

3rd extraction: ATTRIBUTES

 

And then join all in a tMap?

 

Thanks a lot

Anonymous
Not applicable

Yes, the tExtractXMLFields component does not support extracting multiple loop elements at a time, so you have to do multiple extractions and join the columns back together if needed.
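As a rough illustration of "one extraction per loop element, then join", here is a small Python sketch. The input is hypothetical: the <ITEMS>/<ITEM> wrappers and the repeated <attribute> entries are assumptions added so the example is well-formed, since the structure sketched in the question is only an outline. The helper names are also invented. Each extraction carries the parent ID so the results can be joined afterwards.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the <ITEMS>/<ITEM> wrappers and the repeated <attribute>
# entries are assumptions made for this sketch, not taken from the thread.
SAMPLE = """<ITEMS>
  <ITEM>
    <ID>1</ID><CODE>code1</CODE>
    <PRODUCTS>
      <product><filename>f1</filename><url>u1</url></product>
      <product><filename>f2</filename><url>u2</url></product>
    </PRODUCTS>
    <ATTRIBUTES>
      <attribute><label>color</label><size>S</size></attribute>
    </ATTRIBUTES>
  </ITEM>
</ITEMS>"""

root = ET.fromstring(SAMPLE)


def extract_headers(root):
    """Extraction 1: the non-repeating fields, one row per item."""
    return [
        (item.findtext("ID", default=""), item.findtext("CODE", default=""))
        for item in root.findall("ITEM")
    ]


def extract_loop(root, path, fields):
    """Extractions 2 and 3: one row per element matched by `path`,
    prefixed with the parent ID so it can serve as a join key."""
    rows = []
    for item in root.findall("ITEM"):
        item_id = item.findtext("ID", default="")
        for node in item.findall(path):
            rows.append((item_id,) + tuple(node.findtext(f, default="") for f in fields))
    return rows


headers = extract_headers(root)
products = extract_loop(root, "PRODUCTS/product", ["filename", "url"])
attributes = extract_loop(root, "ATTRIBUTES/attribute", ["label", "size"])

# Join the three results back together on the ID key.
joined = {
    item_id: {
        "code": code,
        "products": [row[1:] for row in products if row[0] == item_id],
        "attributes": [row[1:] for row in attributes if row[0] == item_id],
    }
    for item_id, code in headers
}
print(joined)
```

Keeping the join key in every extraction is the important part; without it the separately extracted loops cannot be matched back to their parent record.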

 

Informatique1
Contributor III
Author

Hello,

 

I tried to build this by extracting from my MEDIA loop and then denormalizing, expecting to get this:

1;code1;filename1,filename2

2;code2;filename1,filename2

 

The problem is that this is what I actually get from my XML, since the number of MEDIA elements differs from one product to another:

1;code1;filename1,filename2,filename3

2;code2;filename1

3;code3;filename1,filename2

 

What I am trying to do is recreate the XML after changing a few things in a tXMLMap, but I don't understand how to loop into a new XML.

For the moment the job looks like this:

0695b00000hsFbbAAE.png

I have to read the XML file, apply some rules in the tXMLMap, then re-create the XML.

In this version the structure of the output XML is OK, but the problem is in the MEDIA loop: it does not recreate the loop and instead puts everything into the same element, because it all comes from the same column:

0695b00000hsFcAAAU.png 

I don't understand what I am supposed to do when the number of columns varies constantly, or how to recompose the XML in the output.
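To make the underlying problem concrete, here is a minimal Python sketch of rebuilding the nested XML from denormalized rows like the ones above. It is not a Talend solution, just the logic: the comma-separated column has to be split back into one value per MEDIA before the output loop can be recreated. The function name `build_catalog` is invented, and only the FILENAME field is rebuilt to keep the example short. In a Talend job the rough counterpart would presumably be to feed the output one row per media (for example by skipping the denormalize step) and loop on MEDIA in the output tree, but that mapping is an assumption, not something confirmed in this thread.

```python
import xml.etree.ElementTree as ET

# Denormalized rows as shown in the post: the list length varies per product.
rows = [
    ("1", "code1", "filename1,filename2,filename3"),
    ("2", "code2", "filename1"),
    ("3", "code3", "filename1,filename2"),
]


def build_catalog(rows, sep=","):
    """Rebuild CATALOG/PRODUCTS/PRODUCT/MEDIAS/MEDIA from denormalized rows."""
    catalog = ET.Element("CATALOG")
    products = ET.SubElement(catalog, "PRODUCTS")
    for product_id, code, filenames in rows:
        product = ET.SubElement(products, "PRODUCT")
        ET.SubElement(product, "ID").text = product_id
        ET.SubElement(product, "CODE").text = code
        medias = ET.SubElement(product, "MEDIAS")
        # Split the joined column back into individual values:
        # one MEDIA element is created per value, whatever the count.
        for name in filenames.split(sep):
            if name:  # skip empty entries, e.g. a product with no media
                media = ET.SubElement(medias, "MEDIA")
                ET.SubElement(media, "FILENAME").text = name
    return catalog


catalog = build_catalog(rows)
xml_text = ET.tostring(catalog, encoding="unicode")
print(xml_text)
```

The output structure never depends on a fixed number of columns: the loop over the split values produces as many MEDIA elements as each product has.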