Normalize csv file : from rows to columns

_AnonymousUser · ‎2008-11-19

Hello,
I have a complicated thing to do with Talend..
My input is a csv file. Here is its schema :
Id_shop;Id_cust;Id_prod;Year;Month;day1;day2;day3;...;day30;
The fields "day1" to "day30" contain quantities. The quantities of products ("Id_prod") sold by a shop ("Id_Shop") to a customer ("Id_cust") at the date "Month/Year". The field "day1" contains the quantity of the first day of the month and "day30" the quantity of the last day.
I would like to load those data in a oracle table which has this schema :
Id_shop;Id_cust;Id_prod;Date;Quantity
No problem for the Id_shop, Id_cust and Id_prod (it's the primary key) but the field Date is defined by the month, the year and the days from the csv file.
For example :
in the csv file : month = 05 and year = 2008.
in the oracle table, i should have for each key all dates from 01/05/2008 to 30/05/2008
Then, for the date 01/05/2008 the field Quantity contains the quantity written in the field "day1" of the csv file.
Is that clear?
With informatica, there is a module called normalizer which allows to do that.. Is there something equivalent in Talend?
Thanks!
Nico

_AnonymousUser · ‎2008-11-24

Thanks a lot! This component is perfect for my needs.
Just one question : the final schema is composed of 3 columns :
row_key
pivot_key
pivot_value
How can we have at the end the same schema as the original schema? By exploding the "row_key" column thanks to a PERL regular expression?
Thanks,
Nicolep

Anonymous · ‎2008-11-24

How can we have at the end the same schema as the original schema? By exploding the "row_key" column thanks to a PERL regular expression?

In , you will find tExtract*Fields. Currently (Talend Open Studio 3.0), you have tExtractDelimitedFields, tExtractPositionalFields and tExtractRegexFields. tUnpivotRow was designed to be followed by a tExtractDelimitedFields.

_AnonymousUser · ‎2008-11-24

Ok thanks..
This component allows to split the "row_key" but it doesn't allow to load the "pivot_key" and the "pivot_value" and it doesn't support 2 outputs..
How can we split the "row_key" and then load all the columns (with "pivot_key" and "pivot_value")? With a TMap ?

Anonymous · ‎2008-11-24

This component allows to split the "row_key" but it doesn't allow to load the "pivot_key" and the "pivot_value" and it doesn't support 2 outputs..
How can we split the "row_key" and then load all the columns (with "pivot_key" and "pivot_value")? With a TMap ?

tExtractDelimitedFields fills several output fields from a single input field + keep other input fields unchanged (tExtractRegexFields un component reference guide shows this way to use tExtract*Fields component).

_AnonymousUser · ‎2008-11-24

Ok that's right, when we put the same column name ("pivot_key" and "pivot_value") in the output schema of the tExtractDelimitedFile, the column are automatically loaded.
It works fine!
Thanks

Other

Talend Data Integration