Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
From a CSV File, using a tFileInputdelimited, I want to transform a column’s elements to then send them to a database. The data has to be transformed before reaching the database by using a “dictionnary” file containing the terms to replace.
Context :
My initial CSV file looks like this :
I want to change the “State” and “Criticity” value to an id (integer) defined like this in my dictionnary :
The id set is a reference to a parameter table in my database that will indicate me what it means:
In my mapping, the term « faible » should take id “2” so that it is translated as « Low » in my database. I simplified the case for the example but I got more or less 20 ids to define.
How could I succeed creating this job in Talend ?
Thx for your help !
Hello !
Okay so I checked with more details all the examples you provided. Very helpful for me to understand your problem.
@Antoine01 wrote:So we need 3 columns in output but we don't know how to do this.
Is it possible to do this with only one Dictionary or should we create a Dictionary for every Column that we want to translate and in this is the case, how can we do it properly ?
It is possible with only one dictionary : I don't think so. For me Talend cannot do that natively, maybe with some custom java's class you can achieve this point, but I don't see, or I don't know, how to do that with only one dictionary.
With your examples I made a job which uses 3 lookup, one for each column. Here's the screenshot below.
How does it works :
Reading your input csv, lookup the 3 columns from your dictionary. In the tMap I'm just using the same as you did before, inner join with the value and return the ID.
The only added a few java code to "match" if your Priority or Criticite in the input file, is present in the "Valeur" column of your dictionary.
Here's the second screenshot :
Details of the "Var" column :
varPriority : row2.VALEUR.contains(row3.Priority)?row2.ID:null
varCriticte : row8.VALEUR.contains(row3.Criticite)?row8.ID:null
I didn't use a var for "IssueType" because it matched perfectly between the input and your dictionary.
The result of that is not optimized, and honestly I don't know how to do it better.
I'm using a UniqRow, because in the tMap there is no key on the left, so I have a big cartesian product between the 2 columns and the input file. (lines_nb_dictionary_priority * lines_nb_dictionary_criticite * lines_nb_nput_files)
Then I've added a tFilter, many lines contained "null" values, because of the cartesian product, so I squizzed them.
The final screenshot with the result :
4 - 11 - 2 : High - Improvement - Low (CHAMP -> PRIORITY)
4 - 11 - 7 : High - Improvement - Low (CHAMP -> CRITICITY)
It might be a bug, I don't know if I had to deal with the column "CHAMP".
Find in attachment the job, so you can import it into your Talend, It may help, keep my in touch
Hey there,
How you can succeed to do this ? It's not hard, here's the solution I made with your example :
First tFileInputDelimited is your input with the following data.
@Antoine01My initial CSV file looks like this :
I'm adding a first Log just to see the result of reading this file. Then you put a tMap, and add a second tFileOutDelimited.
Above this is the tMap. Row3 corresponding to the first input, row2 is your dictionnary. You drag&drop the column with Priority of your input, to the column "Criticity" which is the "key" between theses two files ("faible" in your example). Then you just want to keep this id, so drag&drop the ID to your output 'out_mapping'.
The result is 2 when Talend sees 2 times "faible" or 5 when "haute" etc... Else null (look at the console).
And that's all !
Check the attachments, the two files I used are available
EDIT : syntax error
Hi,
Thanks for your answer ! I apologize because our situation is more complex than the example that I created and your answer doesn't work with our case.
In fact we want to do that with many column at the same time.
My CSV file contains many columns with values that I want to translate into an ID:
My real dictionary look like :
I want to translate many column at the same time with only one dictionary.
My tMap look like (State is not use in this example) :
My output has to look like :
So we need 3 columns in output but we don't know how to do this.
Is it possible to do this with only one Dictionary or should we create a Dictionary for every Column that we want to translate and in this is the case, how can we do it properly ?
We found a subject who look like our situation :
Thanks for your help !
Regards,
Hello !
Okay so I checked with more details all the examples you provided. Very helpful for me to understand your problem.
@Antoine01 wrote:So we need 3 columns in output but we don't know how to do this.
Is it possible to do this with only one Dictionary or should we create a Dictionary for every Column that we want to translate and in this is the case, how can we do it properly ?
It is possible with only one dictionary : I don't think so. For me Talend cannot do that natively, maybe with some custom java's class you can achieve this point, but I don't see, or I don't know, how to do that with only one dictionary.
With your examples I made a job which uses 3 lookup, one for each column. Here's the screenshot below.
How does it works :
Reading your input csv, lookup the 3 columns from your dictionary. In the tMap I'm just using the same as you did before, inner join with the value and return the ID.
The only added a few java code to "match" if your Priority or Criticite in the input file, is present in the "Valeur" column of your dictionary.
Here's the second screenshot :
Details of the "Var" column :
varPriority : row2.VALEUR.contains(row3.Priority)?row2.ID:null
varCriticte : row8.VALEUR.contains(row3.Criticite)?row8.ID:null
I didn't use a var for "IssueType" because it matched perfectly between the input and your dictionary.
The result of that is not optimized, and honestly I don't know how to do it better.
I'm using a UniqRow, because in the tMap there is no key on the left, so I have a big cartesian product between the 2 columns and the input file. (lines_nb_dictionary_priority * lines_nb_dictionary_criticite * lines_nb_nput_files)
Then I've added a tFilter, many lines contained "null" values, because of the cartesian product, so I squizzed them.
The final screenshot with the result :
4 - 11 - 2 : High - Improvement - Low (CHAMP -> PRIORITY)
4 - 11 - 7 : High - Improvement - Low (CHAMP -> CRITICITY)
It might be a bug, I don't know if I had to deal with the column "CHAMP".
Find in attachment the job, so you can import it into your Talend, It may help, keep my in touch