Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello all,
I want to perform some quality control on my CSV files. One of the controls is to check the number of fields for each row.
Here is one example of a CSV file :
Adresse_1,Adresse_2,CP,Ville 23 rue de la pierre,,75001,Paris 25 rue de la pierre,,75001,Paris 27, rue de la pierre,,75001,Paris 29 rue de la pierre,,75001,Paris 31 rue de la pierre,,75001,Paris
Field delimiter is the Comma. And there is supposed to be 4 fields.
The line with 27 has an extra comma which creates a mess.
In the TOS, I defined that the Headers must be used as column names.
But the result I have is (for the resulting columns) :
Adresse_1 Adresse_2 CP Ville Column 4
This is not what I want.
What I want to have is a process that would remove the lines the the incorrect number of separators from the input file, and create 2 output files :
- One with all the correct lines
- One with the wrong lines (to correct the errors)
Is it possible ?
Thanks for your help.
Hi.
What do you need to filter the lines that have more than 4 separators? that is, in this case, do you want to have two files with the lines.
23 rue de la pierre,,75001,Paris 25 rue de la pierre,,75001,Paris
29 rue de la pierre,,75001,Paris 31 rue de la pierre,,75001,Paris
On the other file:
27, rue de la pierre,,75001,Paris
Because you have the control of separator count's and you can filter by count.
This is what you need?
Regards!
Hello..
Apply filter on tmap
corrected lines : StringHandling.COUNT(row1.ADRESSE_1,",") !=1
Wrong lines : StringHandling.COUNT(row1.ADRESSE_1,",")==1
Hope this solves your purpose...
Regards
When a file has an extra comma, Talend will create as many new column as extra commas are found in runtime. Having said this, you could create an extra column on the metadata. Then using a tmap to check if that extra column contains any value. If it does, send that record out to a different output. If not, it means the record is fine and you can continue the processing.