Normalize multiple columns

Anonymous · ‎2018-08-20

Hi,

Need to normalize multiple columns.

Below is my data

ID | Col1  | Col2
1  | a,b,c | d,e,f
2  | g,h,i | j,k,l

Expected result

ID | Col1 | Col2
1  | a    | d
1  | b    | e
1  | c    | f
2  | g    | j
2  | h    | k
2  | i    | l

Below is my attempt:

String[] value1 = input_row.newColumn.split(",");
String[] value2 = input_row.newColumn1.split(",");
int index = input_row.dummy-1;
for (int i = 0; i < value1.length; i++) {
	for (int j = i; j <= i; j++) {
		System.out.println(" Id="+input_row.dummy+"  Col1="+value1[i]+"  Col2="+value2[j]);

		output_row.dummy = input_row.dummy;
		output_row.newColumn = value1[i];
		output_row.newColumn1 = value2[j];
	}
}

When I print with the statement

System.out.println(" Id="+input_row.dummy+"  Col1="+value1[i]+"  Col2="+value2[j]);

the arrays give four results as expected.

But when the values are passed to output_row, only the last array element is passed. Only one entry is available in the result.

Could anyone help out?

Below is the flow:

Input -> tJavaRow -> output

fdenis · ‎2018-08-20

You have to use tJavaFlex.

fdenis · ‎2018-08-20

A tJavaRow runs once for each input row and produces one output row each time; you need 1 row --> 3 rows!
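To illustrate the "1 row --> 3 rows" point outside Talend: in the original loop every iteration overwrites the same output_row, so only the last assignment survives when the component emits its single row. The standalone sketch below is plain Java, not component code; the class and method names are made up for illustration. It pairs the i-th item of Col1 with the i-th item of Col2 and creates a new row for every pair.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch (not Talend component code): pair the i-th item of Col1
// with the i-th item of Col2 and emit one output row per pair.
class PairwiseNormalize {

    // Returns rows of the form {id, col1Item, col2Item}.
    static List<String[]> normalize(String id, String col1, String col2) {
        String[] left = col1.split(",");
        String[] right = col2.split(",");
        if (left.length != right.length) {
            throw new IllegalArgumentException("Col1 and Col2 must have the same number of items");
        }
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < left.length; i++) {
            // A new row object per iteration; reusing one object would keep only the last pair.
            rows.add(new String[] {id, left[i].trim(), right[i].trim()});
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.addAll(normalize("1", "a,b,c", "d,e,f"));
        rows.addAll(normalize("2", "g,h,i", "j,k,l"));
        for (String[] row : rows) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Inside a job, the same idea needs a component or a combination of components that can emit several rows per input row, which is what the replies in this thread are about.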

fdenis · ‎2018-08-20

Did you want 9 rows?
1 |a | d
1 |b | d
1 |c | d
1 |a | e
1 |b | e
1 |c | e
1 |a | f
1 |b | f
1 |c | f
In this case you can use tNormalize.
Regards,
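The difference between the 9-row result above and the 3-row result the question asks for can be sketched in plain Java. This is only an illustration with made-up method names: cross() mimics what normalizing each column independently would produce, while pairwise() keeps items aligned by position and assumes both columns hold the same number of items.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch: cross product versus positional pairing of two split columns.
class CrossVersusPairwise {

    // Every item of col1 combined with every item of col2 (3 x 3 = 9 rows for the sample).
    static List<String> cross(String id, String[] col1, String[] col2) {
        List<String> rows = new ArrayList<>();
        for (String second : col2) {
            for (String first : col1) {
                rows.add(id + " |" + first + " | " + second);
            }
        }
        return rows;
    }

    // Item i of col1 with item i of col2 only (3 rows for the sample).
    static List<String> pairwise(String id, String[] col1, String[] col2) {
        if (col1.length != col2.length) {
            throw new IllegalArgumentException("columns must have the same number of items");
        }
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < col1.length; i++) {
            rows.add(id + " |" + col1[i] + " | " + col2[i]);
        }
        return rows;
    }

    public static void main(String[] args) {
        String[] col1 = "a,b,c".split(",");
        String[] col2 = "d,e,f".split(",");
        System.out.println("cross: " + cross("1", col1, col2).size() + " rows");
        System.out.println("pairwise: " + pairwise("1", col1, col2).size() + " rows");
    }
}
```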

Ganshyam · ‎2018-10-06

Hello,

Your attempt is on the right track. Use a tMap component to generate a sequence, use that sequence as the index, and then assign the matching item to Col2.

tJavaRow logic:

String[] values2 = input_row.COL2.split(",");
int index = input_row.seq - 1;

output_row.ID = input_row.ID;
output_row.COL1 = input_row.COL1;
output_row.COL2 = values2[index];

Hope this helps your purpose.

Regards

Ganshyam
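For readers who want to trace this approach end to end, here is a rough plain-Java simulation of the three steps. It is not component code, and it rests on assumptions: Col1 has already been normalized into one row per item, the sequence restarts at 1 for each ID, and each ID occurs in only one input row. The per-ID counter map is just a stand-in for whatever generates seq in the tMap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Standalone simulation: normalize Col1, number the rows per ID, index into Col2.
class SequenceIndexSketch {

    // Input rows are {id, col1, col2}; output rows are {id, col1Item, col2Item}.
    static List<String[]> normalizeBoth(List<String[]> input) {
        Map<String, Integer> counters = new HashMap<>(); // stand-in for a per-ID sequence
        List<String[]> out = new ArrayList<>();
        for (String[] row : input) {
            String id = row[0];
            for (String item : row[1].split(",")) {            // step 1: one row per Col1 item
                int seq = counters.merge(id, 1, Integer::sum); // step 2: 1, 2, 3, ... per ID
                String[] values2 = row[2].split(",");          // step 3: the tJavaRow logic
                int index = seq - 1;
                out.add(new String[] {id, item, values2[index]});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> input = new ArrayList<>();
        input.add(new String[] {"1", "a,b,c", "d,e,f"});
        input.add(new String[] {"2", "g,h,i", "j,k,l"});
        for (String[] row : normalizeBoth(input)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

If the sequence did not restart for each ID, the index would run past the end of values2 on the second ID, so the restart is the part worth checking first.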

Anonymous · ‎2019-03-07

It works... Great

Anonymous · ‎2019-07-19

Hi Ganshyam,

I tried the Java code you provided in a tJavaRow using the following steps, but I am getting

java.lang.ArrayIndexOutOfBoundsException: -1

I am also attaching screenshots of the tMap and tJavaRow components I used.

Thanks & Regards

normolize multi columns.docx
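A note on the exception: with index = input_row.seq - 1, an index of -1 means seq was 0 when the row reached the tJavaRow. The attached screenshots are not reproduced here, so the cause is a guess; a sequence that starts at 0, or a seq column that is never populated, would both give this result. A guard like the one sketched below (plain Java, hypothetical method name) fails with a message that shows the offending value, which can make this easier to diagnose.

```java
// Standalone sketch: validate a 1-based sequence before using it as an array index.
class SeqGuard {

    // Returns the item of col2 at 1-based position seq, or fails with a descriptive message.
    static String pick(String col2, int seq) {
        String[] values2 = col2.split(",");
        int index = seq - 1;
        if (index < 0 || index >= values2.length) {
            throw new IllegalArgumentException(
                "seq=" + seq + " is outside 1.." + values2.length + " for COL2=" + col2);
        }
        return values2[index];
    }

    public static void main(String[] args) {
        System.out.println(pick("d,e,f", 2));
        try {
            pick("d,e,f", 0); // a sequence starting at 0 would reach this case
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```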

GRomain · ‎2021-06-15

Hi, you can do it this way:

My data is:

lang = "EN,FR"

label = "sweatshirt,pull"

In the tJavaRow:

String [] listing_lang = {""};
String [] listing_label = {""};

listing_lang = input_row.lang.split(",");
listing_label = input_row.label.split(",");

nb = StringHandling.COUNT(input_row.lang, ",");

In the tJavaFlex:

row9 being the output of my tJavaFlex

David_Underdown · ‎2025-04-04

Though this query is a few years old, it helped point me in the right direction.

You can actually achieve everything after the initial normalization within a tMap; you don't need a tJavaRow or anything else.

My tMap looks like

and my overall job flow is

The initial tJavaRow splits both fields with String.split, using the same separator as the normalization, to check that the two fields to be normalized contain the same number of items; otherwise it would not be clear how to align them. It throws an IllegalArgumentException if the two string arrays produced by the split differ in length. If the lengths are equal, it just passes the row's data through.

String [] firstColumnArray = input_row.firstColumnToNormalize.split(context.parameter_separator_regex);
String [] secondColumnArray = input_row.secondColumnToNormalize.split(context.parameter_separator_regex);

if (firstColumnArray.length == secondColumnArray.length) {
	output_row.rowNumber = input_row.rowNumber;
	output_row.firstColumnToNormalize = input_row.firstColumnToNormalize;
	output_row.secondColumnToNormalize = input_row.secondColumnToNormalize;
} else {
	throw new IllegalArgumentException("firstColumnToNormalize and secondColumnToNormalize must have the same number of items in them");
}
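One general Java caveat with a length check like this: String.split(regex) drops trailing empty strings, so a field such as "a,b," counts as two items rather than three. Whether that matters depends on the data and on how the normalization step in the job treats trailing empty items, which is not something this sketch can know. The example below is plain Java with made-up method names and shows the difference the limit argument -1 makes.

```java
// Standalone sketch: how String.split counts items when a field ends with the separator.
class SplitLengthCheck {

    // Default split: trailing empty strings are removed from the result.
    static int count(String value, String separatorRegex) {
        return value.split(separatorRegex).length;
    }

    // A negative limit keeps trailing empty strings.
    static int countKeepingEmpties(String value, String separatorRegex) {
        return value.split(separatorRegex, -1).length;
    }

    public static void main(String[] args) {
        System.out.println(count("a,b,", ","));               // 2
        System.out.println(countKeepingEmpties("a,b,", ",")); // 3
        System.out.println(count("d,e,f", ","));              // 3
    }
}
```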

The key in the tMap is that a new sequence is created each time the value in the first column changes; this is done by setting the sequence name to the value of the first column. Making the sequence start at zero means we can immediately use it as the index into the array obtained by splitting the second column of interest.
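The named-sequence idea can be sketched in plain Java. The next() helper below is a hypothetical stand-in for a named sequence generator, not the Talend routine itself, and the sketch assumes the sequence name is a value that stays constant across all rows produced from one input row (here a rowNumber-like key).

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch: one zero-based counter per sequence name, used as an array index.
class NamedSequenceSketch {

    private static final Map<String, Integer> SEQUENCES = new HashMap<>();

    // First call for a given name returns start; later calls add step to the previous value.
    static int next(String name, int start, int step) {
        Integer current = SEQUENCES.get(name);
        int value = (current == null) ? start : current + step;
        SEQUENCES.put(name, value);
        return value;
    }

    public static void main(String[] args) {
        // Rows as they might look after the first field has been normalized:
        // {key, firstItem, secondFieldStillJoined}
        String[][] rows = {
            {"1", "a", "d,e,f"}, {"1", "b", "d,e,f"}, {"1", "c", "d,e,f"},
            {"2", "g", "j,k,l"}, {"2", "h", "j,k,l"}, {"2", "i", "j,k,l"},
        };
        for (String[] row : rows) {
            int index = next(row[0], 0, 1); // restarts at 0 whenever the key is new
            System.out.println(row[0] + " | " + row[1] + " | " + row[2].split(",")[index]);
        }
    }
}
```

Starting at zero avoids the "- 1" adjustment used earlier in the thread, so the sequence value and the array index are the same number.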

Java

Talend Data Integration

v7.x