Anonymous
Not applicable

Combining data from delimited files to produce xml doc for posting to solr

Hi All,

I am new to Talend.

I am trying to combine two delimited files to produce a document that can be submitted to Solr.
My inputs are roughly as follows (simplified, but hopefully enough to communicate the requirement).
I expect this requirement to grow eventually to cover more than two delimited files.

prod.csv
prod_id  name

sku.csv
sku_id  prod_id  name

For each row in prod.csv, I would like to generate a document that can be posted to Solr in the following format:

<doc>
  <field name="prod_uid">prod.csv.prod_id</field>
  <field name="prod_skus">[sku.csv.sku_id, sku.csv.sku_id, sku.csv.sku_id]</field>
</doc>

 

I have spent time working with tMap and tXMLMap (and various other components). I have made some progress, but I typically end up posting (via tRESTClient) an individual document for every product+sku combination, rather than grouping the sku_ids into a single element per product.
There are lots of examples in the forum, but nothing I have been able to adapt to solve what I assume is a simple problem.


All help welcomed, thanks

1 Reply
nfz11
Creator III

The key component you need here is tAggregateRow. If you group by prod_id and aggregate the sku_ids into a list, the rest should be straightforward.

Here is my test input and output:

Starting job TestXmlArray at 00:17 07/07/2019.

[statistics] connecting to socket on port 3925
[statistics] connected
.--------------------.
|   #1. tLogRow_1    |
+-----------+--------+
| key       | value  |
+-----------+--------+
| prod_id   | 1      |
| prod_name | Orange |
+-----------+--------+

.-------------------.
|   #2. tLogRow_1   |
+-----------+-------+
| key       | value |
+-----------+-------+
| prod_id   | 2     |
| prod_name | Apple |
+-----------+-------+

.-------------------.
|   #3. tLogRow_1   |
+-----------+-------+
| key       | value |
+-----------+-------+
| prod_id   | 3     |
| prod_name | Pear  |
+-----------+-------+

.-----------------.
|  #1. tLogRow_3  |
+---------+-------+
| key     | value |
+---------+-------+
| sku_id  | 10    |
| prod_id | 1     |
+---------+-------+

.-----------------.
|  #2. tLogRow_3  |
+---------+-------+
| key     | value |
+---------+-------+
| sku_id  | 21    |
| prod_id | 2     |
+---------+-------+

.-----------------.
|  #3. tLogRow_3  |
+---------+-------+
| key     | value |
+---------+-------+
| sku_id  | 22    |
| prod_id | 2     |
+---------+-------+

.-----------------.
|  #4. tLogRow_3  |
+---------+-------+
| key     | value |
+---------+-------+
| sku_id  | 31    |
| prod_id | 3     |
+---------+-------+

.-----------------.
|  #5. tLogRow_3  |
+---------+-------+
| key     | value |
+---------+-------+
| sku_id  | 32    |
| prod_id | 3     |
+---------+-------+

.-----------------.
|  #6. tLogRow_3  |
+---------+-------+
| key     | value |
+---------+-------+
| sku_id  | 33    |
| prod_id | 3     |
+---------+-------+

.--------------------------------------------------------------------------------------------------------------------------------------------------------.
|                                                                     #1. tLogRow_2                                                                      |
+-----+--------------------------------------------------------------------------------------------------------------------------------------------------+
| key | value                                                                                                                                            |
+-----+--------------------------------------------------------------------------------------------------------------------------------------------------+
| xml | <?xml version="1.0" encoding="ISO-8859-15"?>

<doc>
  <prod_id>1</prod_id>
  <prod_name>Orange</prod_name>
  <prod_skus>[10]</prod_skus>
</doc>
 |
+-----+--------------------------------------------------------------------------------------------------------------------------------------------------+

.-----------------------------------------------------------------------------------------------------------------------------------------------------------.
|                                                                       #2. tLogRow_2                                                                       |
+-----+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| key | value                                                                                                                                               |
+-----+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| xml | <?xml version="1.0" encoding="ISO-8859-15"?>

<doc>
  <prod_id>2</prod_id>
  <prod_name>Apple</prod_name>
  <prod_skus>[21, 22]</prod_skus>
</doc>
 |
+-----+-----------------------------------------------------------------------------------------------------------------------------------------------------+

.--------------------------------------------------------------------------------------------------------------------------------------------------------------.
|                                                                        #3. tLogRow_2                                                                         |
+-----+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| key | value                                                                                                                                                  |
+-----+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| xml | <?xml version="1.0" encoding="ISO-8859-15"?>

<doc>
  <prod_id>3</prod_id>
  <prod_name>Pear</prod_name>
  <prod_skus>[31, 32, 33]</prod_skus>
</doc>
 |
+-----+--------------------------------------------------------------------------------------------------------------------------------------------------------+

[statistics] disconnected

Job TestXmlArray ended at 00:17 07/07/2019. [exit code=0]

See the attached screenshot of the test job, which shows the tAggregateRow settings. The prod_sku_ids column is of type List in the schema.

0683p000009M6C6.png
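
Since the screenshot may not be visible to everyone, here is a rough plain-Java sketch of what the tAggregateRow step does conceptually: group the SKU rows by prod_id and collect each group's sku_id values into a List. This is not Talend's generated code. The class and method names (SkuGrouping, groupSkusByProduct) are made up for illustration, and the rows are the test data from the log above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Talend: mimics "group by prod_id, list sku_id".
class SkuGrouping {

    /**
     * Groups sku_id values by prod_id. Each input row is {sku_id, prod_id}.
     * A LinkedHashMap keeps the products in first-seen order.
     */
    static Map<Integer, List<Integer>> groupSkusByProduct(int[][] skuRows) {
        Map<Integer, List<Integer>> grouped = new LinkedHashMap<>();
        for (int[] row : skuRows) {
            int skuId = row[0];
            int prodId = row[1];
            // Create the list the first time a prod_id is seen, then append.
            grouped.computeIfAbsent(prodId, k -> new ArrayList<>()).add(skuId);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Same sku rows as in the tLogRow_3 output above.
        int[][] skuRows = { {10, 1}, {21, 2}, {22, 2}, {31, 3}, {32, 3}, {33, 3} };
        // prints {1=[10], 2=[21, 22], 3=[31, 32, 33]}
        System.out.println(groupSkusByProduct(skuRows));
    }
}
```

The bracketed values match what shows up in the prod_skus element of the test output, which is simply how a Java List prints.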

In the tMap, do an inner join on prod_id:
0683p000009M6CB.png
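
For anyone who cannot see the screenshot, the join and the final document can be sketched in plain Java as below. This is only an illustration of the logic, not what tMap generates. SolrDocSketch, buildDocs and escapeXml are hypothetical names. It also assumes that prod_skus is declared as a multi-valued field in your Solr schema, in which case Solr's XML update format takes one field element per value rather than a single bracketed string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: inner join of products with grouped SKUs, rendered as Solr <doc> XML.
class SolrDocSketch {

    /** Escapes the characters that are special in XML text and attribute values. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /**
     * products: prod_id -> prod_name; skusByProduct: prod_id -> list of sku_id.
     * Inner join semantics: a product with no SKUs produces no document.
     */
    static List<String> buildDocs(Map<String, String> products,
                                  Map<String, List<String>> skusByProduct) {
        List<String> docs = new ArrayList<>();
        for (Map.Entry<String, String> p : products.entrySet()) {
            List<String> skus = skusByProduct.get(p.getKey());
            if (skus == null || skus.isEmpty()) {
                continue; // no match on prod_id, so the row is dropped
            }
            StringBuilder xml = new StringBuilder("<doc>\n");
            xml.append("  <field name=\"prod_uid\">").append(escapeXml(p.getKey())).append("</field>\n");
            xml.append("  <field name=\"prod_name\">").append(escapeXml(p.getValue())).append("</field>\n");
            // Assumption: prod_skus is multi-valued, so emit one element per SKU.
            for (String sku : skus) {
                xml.append("  <field name=\"prod_skus\">").append(escapeXml(sku)).append("</field>\n");
            }
            xml.append("</doc>");
            docs.add(xml.toString());
        }
        return docs;
    }

    public static void main(String[] args) {
        Map<String, String> products = new LinkedHashMap<>();
        products.put("1", "Orange");
        products.put("2", "Apple");
        products.put("3", "Pear");

        Map<String, List<String>> skus = new LinkedHashMap<>();
        skus.put("1", Arrays.asList("10"));
        skus.put("2", Arrays.asList("21", "22"));
        skus.put("3", Arrays.asList("31", "32", "33"));

        for (String doc : buildDocs(products, skus)) {
            System.out.println(doc);
        }
    }
}
```

If prod_skus is a single-valued string field instead, you would replace the inner loop with one field element containing the joined list, as in the test output above. When posting to Solr, the doc elements would normally be wrapped in an add element.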