Skip to main content
Announcements
July 15, NEW Customer Portal: Initial launch will improve how you submit Support Cases. IMPORTANT DETAILS
cancel
Showing results for 
Search instead for 
Did you mean: 
SanyaBLR
Contributor
Contributor

how to reduce tFileDelimited by actual number of columns

hi guys,

I'm a newbie in Talend and need some help.

I have a job which converts xlsx file to csv file with 100 columns like (col1;col2;col3;...;col100;). There is in the red section (1st) this file. 

In the blue section (2nd) I want to reduce this file to by actual number of columns (header)…

0695b00000N3BfmAAF.png

I use tFileInputFullRow component ("read raw file") to convert all rows to one string

then use tMap-expression to trim all the last “;” characters and add one “;” like below:

0695b00000N3BfJAAV.png

TalendString.talendTrim(row10.newColumn,';',-1)+";"

so, for majority of my files this flow works correct, but some of them is not correct.. cause I have missed some values in some rows.. and its reduce not correct.. for example, I have actually header with 18 columns, and some of rows have cells with missed values.. and as a result it only contains 17 “;” .. and finally I have file with incorrect structure…

1st row is header

2nd and 3rd are correct

4-6th are incorrect and have only 17 instead of 18 separators.

0695b00000N3BkIAAV.png

can anybody help me how to do it?

Labels (3)
1 Solution

Accepted Solutions
Anonymous
Not applicable

I don't understand why you are removing the last ";" and re-adding it. Is this because you end up with some rows like this....

 

col1;col2;col3;col4;col5;col6;;;;;;;;;;

aa;bb;cc;dd;ee;ff;;;;;;;;;;

aa;bb;cc;dd;ee;ff;;;;;;;;;;

aa;bb;cc;dd;ee;ff;;;;;;;;;;

aa;bb;cc;dd;ee;ff;;;;;;;;;;

 

...and you want to remove the excess ";" characters? If so (and I presume that is the reason), I have knocked up a quick example. You will need to extrapolate from this to fit it to what you are doing, but it works for data like above.

 

Here is a screenshot of my job....

 

0695b00000N3CrtAAF.pngI have simply add a dataset like yours. It looks like this. It is 3 rows and a header. The header is the first row.

 

The tMap makes use of a routine I have written to do this with a bit of Java. Java is a MASSIVE advantage when trying to accomplish some tricky stuff like this. I'll show the Java next, but first I'll show the tMap......0695b00000N3CsrAAF.pngWhat is happening is 3 things. First, I am identifying the first row. The header row with a bit of code using the Numeric.sequence method. This counts the records that pass through the tMap. This sets the "RowNum" variable. The next thing I do is calculate the number of valid columns from the first row. I use the "RowNum" variable to decide whether to run this code or not. If it is 1, then it runs. Otherwise it uses the same value for the rest of the rows. The code I use for this I have written and will share below. The final thing I do is to use some more code to restrict all rows (even the header) to a strict number of columns (using the "ColumnCount" variable).

 

The code for each of the variables looks like this.....

 

RowNum

routines.Numeric.sequence("filename", 1, 1) 

 

"filename" is just a String, but if you are using this for several files, you can use the filename to reset the counter for each file.

 

ColumnCount

Var.RowNum==1 ? routines.ValidColumnHeader.countValidColumns(row14.newColumn, ";") : Var.ColumnCount 

 

This is an in-line IF condition. It says "IF the RowNum is equal to 1, use the "countValidColumns" method. Otherwise, use the last value of this variable.

 

FixedRow

routines.ValidColumnHeader.restrictValidColumns(row14.newColumn,";", Var.ColumnCount) 

 

This processes each row (even the header) and limits it to the number of columns counted in the previous variable.

 

The routine that I have quickly knocked up can be seen below. It basically has two methods. One to count the number of valid columns in the header and the other to restrict all rows to that number of columns.

 

*******************************************************************************************

package routines;

 

public class ValidColumnHeader {

 

  public static int countValidColumns(String header, String separator) {

    String[] data = header.split(separator);

     

    int counter = 0;

     

    for(int i=0; i<data.length; i++) {

     if(data[i].trim().compareTo("")!=0) {

     counter++;

     }

    }

     

    return counter;

  }

   

   

  public static String restrictValidColumns(String row, String separator, int columns) {

    String[] data = row.split(separator,-1);

     

    String returnString = "";

     

    for(int i=0; i<columns; i++) {

     returnString = returnString + data[i]+";";

     

    }

     

    return returnString;

  }

}

*******************************************************************************************

 

You can create a Routine, call it ValidColumnHeader and paste this code in there to use it.

 

This should solve your issue, but may need a bit of tweaking to exactly suit your requirement. As an idea of what it does, this is the output of my job running, with the input data just before.....

 

Input Data

col1;col2;col3;col4;col5;col6;;;;;;;;;;;;;;;

aa;bb;cc;dd;ee;ff;;;;;;;;;;;

aa;bb;cc;dd;ee;ff;;;;;;;;;;;

aa;;cc;;ee;;;;;;;;;;;;;;

 

Output Data

col1;col2;col3;col4;col5;col6;

aa;bb;cc;dd;ee;ff;

aa;bb;cc;dd;ee;ff;

aa;;cc;;ee;;

 

 

View solution in original post

7 Replies
Anonymous
Not applicable

I don't understand why you are removing the last ";" and re-adding it. Is this because you end up with some rows like this....

 

col1;col2;col3;col4;col5;col6;;;;;;;;;;

aa;bb;cc;dd;ee;ff;;;;;;;;;;

aa;bb;cc;dd;ee;ff;;;;;;;;;;

aa;bb;cc;dd;ee;ff;;;;;;;;;;

aa;bb;cc;dd;ee;ff;;;;;;;;;;

 

...and you want to remove the excess ";" characters? If so (and I presume that is the reason), I have knocked up a quick example. You will need to extrapolate from this to fit it to what you are doing, but it works for data like above.

 

Here is a screenshot of my job....

 

0695b00000N3CrtAAF.pngI have simply add a dataset like yours. It looks like this. It is 3 rows and a header. The header is the first row.

 

The tMap makes use of a routine I have written to do this with a bit of Java. Java is a MASSIVE advantage when trying to accomplish some tricky stuff like this. I'll show the Java next, but first I'll show the tMap......0695b00000N3CsrAAF.pngWhat is happening is 3 things. First, I am identifying the first row. The header row with a bit of code using the Numeric.sequence method. This counts the records that pass through the tMap. This sets the "RowNum" variable. The next thing I do is calculate the number of valid columns from the first row. I use the "RowNum" variable to decide whether to run this code or not. If it is 1, then it runs. Otherwise it uses the same value for the rest of the rows. The code I use for this I have written and will share below. The final thing I do is to use some more code to restrict all rows (even the header) to a strict number of columns (using the "ColumnCount" variable).

 

The code for each of the variables looks like this.....

 

RowNum

routines.Numeric.sequence("filename", 1, 1) 

 

"filename" is just a String, but if you are using this for several files, you can use the filename to reset the counter for each file.

 

ColumnCount

Var.RowNum==1 ? routines.ValidColumnHeader.countValidColumns(row14.newColumn, ";") : Var.ColumnCount 

 

This is an in-line IF condition. It says "IF the RowNum is equal to 1, use the "countValidColumns" method. Otherwise, use the last value of this variable.

 

FixedRow

routines.ValidColumnHeader.restrictValidColumns(row14.newColumn,";", Var.ColumnCount) 

 

This processes each row (even the header) and limits it to the number of columns counted in the previous variable.

 

The routine that I have quickly knocked up can be seen below. It basically has two methods. One to count the number of valid columns in the header and the other to restrict all rows to that number of columns.

 

*******************************************************************************************

package routines;

 

public class ValidColumnHeader {

 

  public static int countValidColumns(String header, String separator) {

    String[] data = header.split(separator);

     

    int counter = 0;

     

    for(int i=0; i<data.length; i++) {

     if(data[i].trim().compareTo("")!=0) {

     counter++;

     }

    }

     

    return counter;

  }

   

   

  public static String restrictValidColumns(String row, String separator, int columns) {

    String[] data = row.split(separator,-1);

     

    String returnString = "";

     

    for(int i=0; i<columns; i++) {

     returnString = returnString + data[i]+";";

     

    }

     

    return returnString;

  }

}

*******************************************************************************************

 

You can create a Routine, call it ValidColumnHeader and paste this code in there to use it.

 

This should solve your issue, but may need a bit of tweaking to exactly suit your requirement. As an idea of what it does, this is the output of my job running, with the input data just before.....

 

Input Data

col1;col2;col3;col4;col5;col6;;;;;;;;;;;;;;;

aa;bb;cc;dd;ee;ff;;;;;;;;;;;

aa;bb;cc;dd;ee;ff;;;;;;;;;;;

aa;;cc;;ee;;;;;;;;;;;;;;

 

Output Data

col1;col2;col3;col4;col5;col6;

aa;bb;cc;dd;ee;ff;

aa;bb;cc;dd;ee;ff;

aa;;cc;;ee;;

 

 

SanyaBLR
Contributor
Contributor
Author

@rhall, thank you very much for such a detailed explanation! Very understandable! I've never used a routine before. Your solution is pretty good!

Anonymous
Not applicable

No problem, I hope it helps!

ddduser1643959971
Contributor
Contributor

Helleo I m new with Talend and i need to get the number of Clumns in csv file Input to make condition

 

if NumberColumns==y strat do..

if NumberColumns==x strat do..

else die the job

 

 

please any person can help

thank you in advance

Anonymous
Not applicable

The routine I posted above will help with that. You would need to use the "countValidColumns" method to get the number.

ddduser1643959971
Contributor
Contributor

Thank you rhall for your response , yes i tried with your solution but i don't know how to configure my file csv with the componente tFixedFlowIput

0695b00000N4fzyAAB.png 

 

0695b00000N4g0NAAR.png 

 

 

0695b00000N4g0hAAB.png 

 

 

 

 

Anonymous
Not applicable

You will need to use a tFileInputDelimited component for the CSV. You can see an example here....

https://help.talend.com/r/en-US/8.0/delimited/tfileinputdelimited-tlogrow-reading-data-from-delimited-file-and-display-output-standard-component-the