Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
See why IDC MarketScape names Qlik a 2025 Leader! Read more
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Select a specifc XML node (from a repeating node) with TXMLMap

I'm currently building a DataWarehouse and I have to fetch all sales data from POS into DB. I am able to extract and load the data to MSSQL DB, but for one of the xml node I actually would like to get the second node instead. The XML schema with the desire output as below:

 

0683p000009Lvo8.png

 

The job is very simple like below.

0683p000009LvoI.png

 

If you notice here the <CloseTime> having 2 child nodes one for previous day business day and EOD, another for the current day bizday and eod.

I need to fetch the second node to combine with the sales header information. 

 

0683p000009Lv93.png


Currently only the first node is fetch and successfully combine with the sales header information, but it should be the second node to be combine with the sales header. How do I achieve this in TXMLMap? Or do I need to use another component just to do that?

 

This is my first time using Talend so if you could recommend a solution with steps it would be best!

 

Thanks.

Labels (3)
18 Replies
Anonymous
Not applicable
Author

It's difficult to say for sure because I don't have your data (and this takes a little bit of trial and error in a lot of cases), but I would suggest putting your tMap after the tExtractXMLField (line only).

 

Another thing that might solve a few issues for you. Lets say the your first tExtractXMLField component output the following columns....

 

id (int)
name (String)

age (int)

header (Document)

 

...and you want to process the header document in the next tExtractXMLField component BUT want to output the id, name and age. You can simply add the id, name and age columns to your next tExtractXMLField component and leave them blank where you would normally configure the XPath. This will act as a pass through. So your next column config in the second tExtractXMLField component might look like below....

id (int)
name (String)

age (int)

header_data_1 (String)

header_data_2 (String)

header_data_3 (String)

header_data_4 (String)

 

This is quite an abstracted example, but have a play and see what you get.

 

Anonymous
Not applicable
Author


@rhall wrote:

It's difficult to say for sure because I don't have your data (and this takes a little bit of trial and error in a lot of cases), but I would suggest putting your tMap after the tExtractXMLField (line only).

 

Another thing that might solve a few issues for you. Lets say the your first tExtractXMLField component output the following columns....

 

id (int)
name (String)

age (int)

header (Document)

 

...and you want to process the header document in the next tExtractXMLField component BUT want to output the id, name and age. You can simply add the id, name and age columns to your next tExtractXMLField component and leave them blank where you would normally configure the XPath. This will act as a pass through. So your next column config in the second tExtractXMLField component might look like below....

id (int)
name (String)

age (int)

header_data_1 (String)

header_data_2 (String)

header_data_3 (String)

header_data_4 (String)

 

This is quite an abstracted example, but have a play and see what you get.

 


Oh man, the highlighted part is the life saver! I tried to figure it out and to no avail and your clue just come in time!

 

 

Anonymous
Not applicable
Author

I thought that might be a problem. It all seems to logical until you try to figure out how to pass extracted data through a tExtractXMLField component for the first time 🙂

Talend is a really powerful tool, but there is a lot that is not made immediately obvious

Anonymous
Not applicable
Author

Hmm I think this is not what I want to achieve still..

 

Because the tmap after tExtractXMLField (line only) will not get the fields for header. Means I have to get back to one tExtractXMLField with the loop on <line>. So for e.g. my xml ideally have 400 <header> and 1000 <line> (it's a 1-m relationship), but since I loop on <line> I will get 1000 <header> with 1000 <line>.

 

If I put this in tMap, will it be able to separate into 400 sales header and 1000 sales line into DB? If so how do I configure tMap to do so?

 

Anonymous
Not applicable
Author

I will need to see a complete XML structure. You mention "Line" but I am not sure how that relates to "Header". You say that when you get the Line records you get the same amount of Header records. This makes sense. You are flattening the data into rows. This will mean that the following structure....

<xml>
<data1>d1</data1>
<data2>d2</data2>
<data3>d3</data3>
 <loops>
<loop>
<ldata1>ld1a</ldata1>
<ldata2>ld2a</ldata2>
</loop>
<loop>
<ldata1>ld1b</ldata1>
<ldata2>ld2b</ldata2>
</loop>
<loop>
<ldata1>ld1c</ldata1>
<ldata2>ld2c</ldata2>
</loop>
</loops>
</xml>

....will look like below when it is flattened....

d1,d2,d3,ld1a,ld2a
d1,d2,d3,ld1b,ld2b
d1,d2,d3,ld1c,ld2c

The first 3 columns are multiplied by the number of loops. This actually makes perfect sense. If you only want to see the data once, then you need to output to two tables (in this case).

You can use logic to do that in your case. You can either use an existing piece of data as a "key" to see when each loop has finished, or you can add one. Now if I were to split this data to two outputs (which would make sense....two tables), I would put these columns into a tMap and send the first 3 to one output and the last 2 to the other output. I would then use a tAggregateRow to group my data (reduce duplicates to 1 row) using the "key" I mentioned earlier.

 

It takes a bit of thinking about, but it is quite straight forward once you've logically solved it. You have to consider XML as a mini relational database. Think of it like that (considering possible primary keys and foreign keys) and you will crack this.

 

 

Anonymous
Not applicable
Author

Thanks for that! Will definitely try on that.
One side question. I'm actually building a data warehouse based on the sales data from POS, what do you think should I store the data in the flattened data (denormalised) or should I normalise it? These data is actually store for future reference in case I need to build different data mart to answer different questions.

After the first load from XML to DB, I'm going to create different table for different summarise data such as sales by item by store per day, sales by transaction count by item by store per day etc.

I understand that with denomalize data it is better in read performance, which I supposed Datewarehouse is more heavily on reading for analytics purpose. What is your saying on this? (This is my first time building data warehouse so need some advise)
Anonymous
Not applicable
Author

You shouldn't really ask multiple questions in one question because useful answers to the community can get lost. However, this isn't really specific to Talend. The question of normalized vs denormalized data is a lot more complicated than what I'd feel comfortable answering in a few lines. It also requires a lot more information. You have a grasp of the basics (from your description), what I suggest you do is learn more about your requirements (both functional and non-functional) and decide based on that. There is little point normalizing data that will ultimately be queried in the same way it was entered....unless of course space is an issue. It's always a weigh up between space and time

Anonymous
Not applicable
Author

I'm able to achieve what I wanted, but sometimes the CloseTime <Day> element only appear once (when the store is newly opened and no previous <Day> element). Currently I am using TMap and the expression is /CloseTime/Day[2]/bizDate, but if the <Day> node appear once then this will return null. How do I specify if it is null then take Day[1]/bizDate instead? 

Anonymous
Not applicable
Author

Return both values and then decide (maybe in a tMap) based on whether the 2nd date returned is null or not.