Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi All,
New to Talend
I am attempting to combine 2 delimited files to produce a document that can be submitted to solr
My inputs are approx. as follows (simplified, but hopefully communicates the requirement)
I would eventually expect this requirement to increase to cover more than just x2 delimited files
prod.csv
prod_id name
sku.csv
sku_id prod_id name
For each row in prod.csv, I would like to generate a document structure that can be posted to solr in the following format
<doc>
<field name="prod_uid">prod.csv_prod_id</field>
<field name="prod_skus">[sku.csv.sku_id, sku.csv.sku_id,sku.csv.sku_id,]</field>
</doc>
I have spent time trying to work with tMAP and tXMLMap (and various other components) and whilst I have got somewhere, I typically end up posting (viatRESTClient) individual documents for every product+sku combination, rather than grouping the sku_ids in a single element for a product
There are lots of examples in the forum, but nothing I have been table to take to solve this (what I am assuming is) simple problem
All help welcomed, thanks
The key component you need here is tAggregateRow. If you group by prod_id and aggregate the sku_ids into a list it should be straightforward:
Here is my test input and output:
Starting job TestXmlArray at 00:17 07/07/2019. [statistics] connecting to socket on port 3925 [statistics] connected .--------------------. | #1. tLogRow_1 | +-----------+--------+ | key | value | +-----------+--------+ | prod_id | 1 | | prod_name | Orange | +-----------+--------+ .-------------------. | #2. tLogRow_1 | +-----------+-------+ | key | value | +-----------+-------+ | prod_id | 2 | | prod_name | Apple | +-----------+-------+ .-------------------. | #3. tLogRow_1 | +-----------+-------+ | key | value | +-----------+-------+ | prod_id | 3 | | prod_name | Pear | +-----------+-------+ .-----------------. | #1. tLogRow_3 | +---------+-------+ | key | value | +---------+-------+ | sku_id | 10 | | prod_id | 1 | +---------+-------+ .-----------------. | #2. tLogRow_3 | +---------+-------+ | key | value | +---------+-------+ | sku_id | 21 | | prod_id | 2 | +---------+-------+ .-----------------. | #3. tLogRow_3 | +---------+-------+ | key | value | +---------+-------+ | sku_id | 22 | | prod_id | 2 | +---------+-------+ .-----------------. | #4. tLogRow_3 | +---------+-------+ | key | value | +---------+-------+ | sku_id | 31 | | prod_id | 3 | +---------+-------+ .-----------------. | #5. tLogRow_3 | +---------+-------+ | key | value | +---------+-------+ | sku_id | 32 | | prod_id | 3 | +---------+-------+ .-----------------. | #6. tLogRow_3 | +---------+-------+ | key | value | +---------+-------+ | sku_id | 33 | | prod_id | 3 | +---------+-------+ .--------------------------------------------------------------------------------------------------------------------------------------------------------. | #1. tLogRow_2 | +-----+--------------------------------------------------------------------------------------------------------------------------------------------------+ | key | value | +-----+--------------------------------------------------------------------------------------------------------------------------------------------------+ | xml | <?xml version="1.0" encoding="ISO-8859-15"?> <doc> <prod_id>1</prod_id> <prod_name>Orange</prod_name> <prod_skus>[10]</prod_skus> </doc> | +-----+--------------------------------------------------------------------------------------------------------------------------------------------------+ .-----------------------------------------------------------------------------------------------------------------------------------------------------------. | #2. tLogRow_2 | +-----+-----------------------------------------------------------------------------------------------------------------------------------------------------+ | key | value | +-----+-----------------------------------------------------------------------------------------------------------------------------------------------------+ | xml | <?xml version="1.0" encoding="ISO-8859-15"?> <doc> <prod_id>2</prod_id> <prod_name>Apple</prod_name> <prod_skus>[21, 22]</prod_skus> </doc> | +-----+-----------------------------------------------------------------------------------------------------------------------------------------------------+ .--------------------------------------------------------------------------------------------------------------------------------------------------------------. | #3. tLogRow_2 | +-----+--------------------------------------------------------------------------------------------------------------------------------------------------------+ | key | value | +-----+--------------------------------------------------------------------------------------------------------------------------------------------------------+ | xml | <?xml version="1.0" encoding="ISO-8859-15"?> <doc> <prod_id>3</prod_id> <prod_name>Pear</prod_name> <prod_skus>[31, 32, 33]</prod_skus> </doc> | +-----+--------------------------------------------------------------------------------------------------------------------------------------------------------+ [statistics] disconnected Job TestXmlArray ended at 00:17 07/07/2019. [exit code=0]
See attached screenshot of the test job with the details of the tAggregateRow. prod_sku_ids is of type List in the schema.
In the tMap you do an inner join on the prod id: