Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
See why IDC MarketScape names Qlik a 2025 Leader! Read more
cancel
Showing results for 
Search instead for 
Did you mean: 
DM_J
Contributor II
Contributor II

Normalizing more than one column without using a custom routine

Hi,

 

tNormalize normalize only one row, I want to normalize 3 columns which are related to each other:

 

Sample data:

 

 

id address email phone
1 add1~add2~add3 a@a.com~a@b.com 12345678~8765432~12345
2 add1~add2 b@b.com~c@c.com 12345678~8765432
3 add1~add2~add3 a@a.com~a@b.com~d@d.com 12345678~8765432~12345
4 add1~add2 b@b.com 123456~8765
5 add1~add2~add3    
6 add1 b@b.com 8765432
7 add1~add2~add3 b@b.com~c@c.com 8765432

 

I want to normalize this table in a way which

  • first phone and first email go to the first address
  • the second phone and the second email go to the second address if they exist.
  • and so on...

 

Desired output for sample data

 

 

id address email phone
1 add1 a@a.com 12345678
1 add2 a@b.com 8765432
1 add3   12345
2 add1 b@b.com 12345678
2 add2 c@c.com 8765432
3 add1 a@a.com 12345678
3 add2 a@b.com 12345678
3 add3 d@d.com 12345
4 add1 b@b.com 123456
4 add2   8765
5 add1    
5 add2    
5 add3    
6 add1 b@b.com 8765432
7 add1 b@b.com 8765432
7 add2 c@c.com  
7 add3    



I there any other component that I can use? 

Or any java thought that I can put in a tjava?

 

Thank you.

 

 

Labels (3)
1 Solution

Accepted Solutions
lennelei
Creator III
Creator III

Hi,

 

I see 2 options :

 

1) use a tJavaFlex. Assuming you are confident that you'll never have more that 3 (for example) address, you can use a tJavaFlex followed by a tFilterRow to produce the desired result:

<data_source> --(main)--> tJavaFlex --(main)--> tFilterRow

The idea is to create 3 output rows for each input rows and in each output row, put the nth item of the input row if it exists. The tFilterRow will simply discard rows with no data at all (or no address or whatever).

 

The tJavaFlex code will contain something like this :
BEGIN (replace 3 with the maximum number of data you can have) :

for (int i=0; i<3; i++) {

MAIN (row1 is the input row, row2 the output):

	String[] add;
	String[] email;
	String[] phone;
	
	row2.id = row1.id;
	
	add=row1.add.split("~", -1);
	if (add != null && add.length > i)
		row2.add=add[i];
	else
		row2.add="";
		
	email=row1.email.split("~", -1);
	if (email != null && email.length > i)
		row2.email=email[i];
	else
		row2.email="";		
		
	phone=row1.phone.split("~", -1);
	if (phone != null && phone.length > i)
		row2.phone=phone[i];
	else
		row2.phone="";

Please note that it's certainly possible to optimize this code! Take this as a guideline 0683p000009MACn.png

The idea is to split each input data and take the nth item if it exists.

 

2) Another option would be to first split your data into 3 table (with a tMap for example) :
id,address / id,email / id,phone
You then normlize each table separatly and add a sequence for each "row" of the same id:
id,row,address / id,row,email / id,row,phone
with row being something such as Numeric.sequence("address_"+id,1,1)
And at the end, you join all 3 datasets back.

I think it should also work assuming the order is preserved will going through the tNormalize.

 

In doubt, I would probably prefer the first option: it has more java code but it's probably safer and more efficient.

 

Regards.

View solution in original post

1 Reply
lennelei
Creator III
Creator III

Hi,

 

I see 2 options :

 

1) use a tJavaFlex. Assuming you are confident that you'll never have more that 3 (for example) address, you can use a tJavaFlex followed by a tFilterRow to produce the desired result:

<data_source> --(main)--> tJavaFlex --(main)--> tFilterRow

The idea is to create 3 output rows for each input rows and in each output row, put the nth item of the input row if it exists. The tFilterRow will simply discard rows with no data at all (or no address or whatever).

 

The tJavaFlex code will contain something like this :
BEGIN (replace 3 with the maximum number of data you can have) :

for (int i=0; i<3; i++) {

MAIN (row1 is the input row, row2 the output):

	String[] add;
	String[] email;
	String[] phone;
	
	row2.id = row1.id;
	
	add=row1.add.split("~", -1);
	if (add != null && add.length > i)
		row2.add=add[i];
	else
		row2.add="";
		
	email=row1.email.split("~", -1);
	if (email != null && email.length > i)
		row2.email=email[i];
	else
		row2.email="";		
		
	phone=row1.phone.split("~", -1);
	if (phone != null && phone.length > i)
		row2.phone=phone[i];
	else
		row2.phone="";

Please note that it's certainly possible to optimize this code! Take this as a guideline 0683p000009MACn.png

The idea is to split each input data and take the nth item if it exists.

 

2) Another option would be to first split your data into 3 table (with a tMap for example) :
id,address / id,email / id,phone
You then normlize each table separatly and add a sequence for each "row" of the same id:
id,row,address / id,row,email / id,row,phone
with row being something such as Numeric.sequence("address_"+id,1,1)
And at the end, you join all 3 datasets back.

I think it should also work assuming the order is preserved will going through the tNormalize.

 

In doubt, I would probably prefer the first option: it has more java code but it's probably safer and more efficient.

 

Regards.