Solved: Re: Full Extraction XML field - Qlik Community

Informatique1 · ‎2023-03-22

Hello,

I have an XML file with this kind of structure:

<?xml version="1.0"?>

<LABEL>test product</LABEL>

</ATTRIBUTES>

<MEDIA>

<FILENAME>666666_test_product.jpg</FILENAME>

<MEDIA_TYPE>==</MEDIA_TYPE>

<URL>https://6666_test_product.jpg?1643353703</URL>

</MEDIA>

<MEDIA>

<FILENAME>666666_test_product2.jpg</FILENAME>

<MEDIA_TYPE>==</MEDIA_TYPE>

<URL>https://666666_test_product2.jpg?1595193757</URL>

</MEDIA>

<MEDIA>

<FILENAME>666668_test_product3.jpg</FILENAME>

<MEDIA_TYPE>==</MEDIA_TYPE>

<URL>https://666666_test_product3.jpg?1595193758</URL>

</MEDIA>

</MEDIAS>

<AAAA>

<BBB>

<ID/>

<CODE/>

</ATTRIBUTES>

</BBB>

</AAAA>

<CCCC>

<DD>

<ID/>

<CODE/>

</ATTRIBUTES>

</DD>

</CCCC>

</PRODUCT>

</PRODUCTS>

</CATALOG>

How can I extract every field when the number of elements in the MEDIAS loop can differ between two extractions?

Should I loop on MEDIAS first, then loop on the products, and join the two loops in a tMap, for example?

Thanks a lot

Anonymous · ‎2023-03-22

Hello

What do you mean by the MEDIAS loop being different? Can you give an example?

In this example file, you can extract all fields under the MEDIA element.

Regards

Shong

Informatique1 · ‎2023-03-27

Hello,

Thanks for the answer.

The number of <MEDIA> elements inside <MEDIAS> can differ from one day to the next.

There may be just one media one day and 10 the next.

I know I can loop on the medias as you do, but I'm concerned about getting the <ID> and <CODE> (for example) at the same time.

The number of <PRODUCT> elements inside <PRODUCTS> can also vary.

My main root is <PRODUCTS>.

It can contain many <PRODUCT> elements, and each product can in turn contain many <MEDIA> elements.

This is where I don't know how to extract all the data at the same time, because I have loops inside loops...

Thanks

Anonymous · ‎2023-03-27

From your description, I think that if you loop on the MEDIA element as shown in the screenshot, you can get all the media data plus <ID> and <CODE> at the same time, across the different products. The output looks like this:

ID;CODE;FILENAME;URL

1;code1;filename1;URL1

1;code1;filename2;URL2

1;code1;filename3;URL3

2;code2;filename4;URL4

2;code2;filename5;URL5

...

This works because your file structure is loops nested inside loops, not different loop elements at the same level, isn't it?
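
As a rough illustration of the nested-loop extraction described above, here is a small Python sketch. It is not Talend code, only the same logic written out. The sample document and the placement of ID and CODE under ATTRIBUTES are assumptions, since the XML posted in this thread is incomplete.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element placement is assumed, not taken from the real file.
SAMPLE = """<CATALOG><PRODUCTS>
  <PRODUCT>
    <ATTRIBUTES><ID>1</ID><CODE>code1</CODE></ATTRIBUTES>
    <MEDIAS>
      <MEDIA><FILENAME>filename1</FILENAME><URL>URL1</URL></MEDIA>
      <MEDIA><FILENAME>filename2</FILENAME><URL>URL2</URL></MEDIA>
    </MEDIAS>
  </PRODUCT>
  <PRODUCT>
    <ATTRIBUTES><ID>2</ID><CODE>code2</CODE></ATTRIBUTES>
    <MEDIAS>
      <MEDIA><FILENAME>filename3</FILENAME><URL>URL3</URL></MEDIA>
    </MEDIAS>
  </PRODUCT>
</PRODUCTS></CATALOG>"""


def extract_media_rows(xml_text):
    """Return one (ID, CODE, FILENAME, URL) tuple per MEDIA element."""
    root = ET.fromstring(xml_text)
    rows = []
    for product in root.iter("PRODUCT"):
        # Product-level fields are read once, then repeated on every media row.
        pid = product.findtext("ATTRIBUTES/ID", default="")
        code = product.findtext("ATTRIBUTES/CODE", default="")
        for media in product.iterfind("MEDIAS/MEDIA"):
            rows.append((
                pid,
                code,
                media.findtext("FILENAME", default=""),
                media.findtext("URL", default=""),
            ))
    return rows


if __name__ == "__main__":
    print("ID;CODE;FILENAME;URL")
    for row in extract_media_rows(SAMPLE):
        print(";".join(row))
```

The number of MEDIA elements per product does not need to be known in advance: each product simply contributes as many rows as it has medias.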

Informatique1 · ‎2023-03-30

Yes, this is what I have so far, but I don't want to repeat the ID and CODE many times.

I'm not sure how to collapse the rows into a single row per product in a CSV, something like this:

ID;CODE;FILENAME;URL

1;code1;filename1,filename2,filename3,filename4,filename5;url1,url2,url3,url4,url5

Thanks

Anonymous · ‎2023-03-30

OK, I think you need to extract data two times:

The first time, extract the id, code and filename columns:

1;code1;filename1

1;code1;filename2

2;code2;filename1

2;code2;filename2

Then use the tDenormalize component to convert the multiple rows into one row per product:

1;code1;filename1,filename2

2;code2;filename1,filename2

The second time, extract the id, code and url columns:

1;code1;url1

1;code1;url2

2;code2;url1

2;code2;url2

Then use the tDenormalize component again to convert the multiple rows into one row per product:

1;code1;url1,url2

2;code2;url1,url2

In the next subjob, do an inner join between the two results above to merge all the columns.

Regards

Shong
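
The two-pass approach above (extract, denormalize, then inner join) can be sketched in Python to show the data flow. This only mimics what tDenormalize and a join would do; the function names `denormalize` and `inner_join` are made up for this illustration, and the rows are the example values from this post.

```python
def denormalize(rows, sep=","):
    """Group (id, code, value) rows by (id, code) and concatenate the values."""
    grouped = {}
    for pid, code, value in rows:
        grouped.setdefault((pid, code), []).append(value)
    # Dicts keep insertion order, so products stay in first-seen order.
    return {key: sep.join(values) for key, values in grouped.items()}


def inner_join(left, right):
    """Keep only the keys present on both sides and merge their columns."""
    return [
        (pid, code, left[(pid, code)], right[(pid, code)])
        for (pid, code) in left
        if (pid, code) in right
    ]


# First extraction: id, code, filename
filename_rows = [
    ("1", "code1", "filename1"),
    ("1", "code1", "filename2"),
    ("2", "code2", "filename1"),
    ("2", "code2", "filename2"),
]
# Second extraction: id, code, url
url_rows = [
    ("1", "code1", "url1"),
    ("1", "code1", "url2"),
    ("2", "code2", "url1"),
    ("2", "code2", "url2"),
]

merged = inner_join(denormalize(filename_rows), denormalize(url_rows))

if __name__ == "__main__":
    print("ID;CODE;FILENAME;URL")
    for row in merged:
        print(";".join(row))
```

Because the grouping is keyed on (id, code), a product with one media and a product with ten both end up as exactly one output row.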

Informatique1 · ‎2023-04-04

And what if I have another loop in my XML?

For example:

<ID>

<CODE>

<PRODUCTS>

<url>

</PRODUCTS>

<ATTRIBUTES>

<label>

<size>

</ATTRIBUTES>

For example, if I need to loop on PRODUCTS and also on ATTRIBUTES, should I do 3 extractions?

First extraction: ID and CODE

2nd extraction: PRODUCTS

3rd extraction: ATTRIBUTES

And then join all in a tMap?

Thanks a lot

Anonymous · ‎2023-04-04

Yes, the tExtractXMLFields component does not support extracting multiple loop elements at a time, so you have to do multiple extractions and join the columns back together if needed.
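
To make the "one extraction per loop element, then join" idea concrete, here is a Python sketch. It is not Talend code. The wrapper elements `ROOT` and `ITEM`, the sample values, and the helper names are all assumptions made up for this illustration, loosely based on the structure in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two sibling loops (PRODUCTS and ATTRIBUTES) per ITEM.
SAMPLE = """<ROOT>
  <ITEM>
    <ID>1</ID><CODE>code1</CODE>
    <PRODUCTS><url>url1</url></PRODUCTS>
    <PRODUCTS><url>url2</url></PRODUCTS>
    <ATTRIBUTES><label>red</label><size>S</size></ATTRIBUTES>
    <ATTRIBUTES><label>blue</label><size>M</size></ATTRIBUTES>
    <ATTRIBUTES><label>green</label><size>L</size></ATTRIBUTES>
  </ITEM>
</ROOT>"""


def extract_loop(root, loop_tag, fields):
    """One extraction pass: a row per loop_tag element, prefixed with the item ID."""
    rows = []
    for item in root.iter("ITEM"):
        item_id = item.findtext("ID", default="")
        for node in item.iterfind(loop_tag):
            rows.append((item_id,) + tuple(node.findtext(f, default="") for f in fields))
    return rows


def collapse(rows, sep=","):
    """Denormalize: one entry per ID, each remaining column joined with sep."""
    out = {}
    for item_id, *values in rows:
        cols = out.setdefault(item_id, [[] for _ in values])
        for col, value in zip(cols, values):
            col.append(value)
    return {key: [sep.join(col) for col in cols] for key, cols in out.items()}


def join_extractions(xml_text):
    """Run the three extractions separately, then join them on ID."""
    root = ET.fromstring(xml_text)
    heads = {i.findtext("ID", default=""): [i.findtext("CODE", default="")]
             for i in root.iter("ITEM")}
    products = collapse(extract_loop(root, "PRODUCTS", ["url"]))
    attributes = collapse(extract_loop(root, "ATTRIBUTES", ["label", "size"]))
    return [
        [key] + heads[key] + products[key] + attributes[key]
        for key in heads
        if key in products and key in attributes
    ]


if __name__ == "__main__":
    for row in join_extractions(SAMPLE):
        print(";".join(row))
```

Each pass carries the ID along so that the results can be matched up afterwards, even though the two loops have different lengths.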

Informatique1 · ‎2023-05-17

Hello,

I tried extracting from my MEDIA loop and then denormalizing, to get this:

1;code1;filename1,filename2

2;code2;filename1,filename2

The problem is that my XML actually gives me this, because the number of MEDIA elements differs from one product to another:

1;code1;filename1,filename2,filename3

2;code2;filename1

3;code3;filename1,filename2

What I'm trying to do is recreate the XML after changing some things in a tXMLMap, but I don't understand how to loop into a new XML.

For the moment, the job looks like this:

I have to read the XML file, apply some rules in the tXMLMap, then recreate the XML.

In that version the structure of the output XML is OK, but the problem is in the MEDIA loop: it does not recreate the loop and instead puts everything into the same element, since it all comes from the same column:

I don't understand what I'm supposed to do when the number of columns is constantly different, or how to recompose the XML in the output.
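
One way to think about this last step is sketched below in Python. It is not Talend code: the idea is to split the denormalized column back into one value per row before writing, so that each value becomes its own MEDIA element. The output layout (ID and CODE directly under PRODUCT) and the function name `build_products` are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def build_products(rows, sep=","):
    """Rebuild a PRODUCTS tree from (id, code, 'f1,f2,...') rows."""
    products = ET.Element("PRODUCTS")
    for pid, code, filenames in rows:
        product = ET.SubElement(products, "PRODUCT")
        ET.SubElement(product, "ID").text = pid
        ET.SubElement(product, "CODE").text = code
        medias = ET.SubElement(product, "MEDIAS")
        # Splitting the list means the number of medias can differ per product.
        for name in filter(None, filenames.split(sep)):
            media = ET.SubElement(medias, "MEDIA")
            ET.SubElement(media, "FILENAME").text = name
    return products


rows = [
    ("1", "code1", "filename1,filename2,filename3"),
    ("2", "code2", "filename1"),
    ("3", "code3", "filename1,filename2"),
]

if __name__ == "__main__":
    print(ET.tostring(build_products(rows), encoding="unicode"))
```

The value list is never treated as a fixed set of columns: it is split at write time, so a product with one filename and a product with three both produce a valid MEDIAS loop.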

Tags: Talend Data Integration · v8.x · XML