Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
tNormalize normalize only one row, I want to normalize 3 columns which are related to each other:
Sample data:
id | address | phone | |
1 | add1~add2~add3 | a@a.com~a@b.com | 12345678~8765432~12345 |
2 | add1~add2 | b@b.com~c@c.com | 12345678~8765432 |
3 | add1~add2~add3 | a@a.com~a@b.com~d@d.com | 12345678~8765432~12345 |
4 | add1~add2 | b@b.com | 123456~8765 |
5 | add1~add2~add3 | ||
6 | add1 | b@b.com | 8765432 |
7 | add1~add2~add3 | b@b.com~c@c.com | 8765432 |
I want to normalize this table in a way which
Desired output for sample data
id | address | phone | |
1 | add1 | a@a.com | 12345678 |
1 | add2 | a@b.com | 8765432 |
1 | add3 | 12345 | |
2 | add1 | b@b.com | 12345678 |
2 | add2 | c@c.com | 8765432 |
3 | add1 | a@a.com | 12345678 |
3 | add2 | a@b.com | 12345678 |
3 | add3 | d@d.com | 12345 |
4 | add1 | b@b.com | 123456 |
4 | add2 | 8765 | |
5 | add1 | ||
5 | add2 | ||
5 | add3 | ||
6 | add1 | b@b.com | 8765432 |
7 | add1 | b@b.com | 8765432 |
7 | add2 | c@c.com | |
7 | add3 |
I there any other component that I can use?
Or any java thought that I can put in a tjava?
Thank you.
Hi,
I see 2 options :
1) use a tJavaFlex. Assuming you are confident that you'll never have more that 3 (for example) address, you can use a tJavaFlex followed by a tFilterRow to produce the desired result:
<data_source> --(main)--> tJavaFlex --(main)--> tFilterRow
The idea is to create 3 output rows for each input rows and in each output row, put the nth item of the input row if it exists. The tFilterRow will simply discard rows with no data at all (or no address or whatever).
The tJavaFlex code will contain something like this :
BEGIN (replace 3 with the maximum number of data you can have) :
for (int i=0; i<3; i++) {
MAIN (row1 is the input row, row2 the output):
String[] add; String[] email; String[] phone; row2.id = row1.id; add=row1.add.split("~", -1); if (add != null && add.length > i) row2.add=add[i]; else row2.add=""; email=row1.email.split("~", -1); if (email != null && email.length > i) row2.email=email[i]; else row2.email=""; phone=row1.phone.split("~", -1); if (phone != null && phone.length > i) row2.phone=phone[i]; else row2.phone="";
Please note that it's certainly possible to optimize this code! Take this as a guideline
The idea is to split each input data and take the nth item if it exists.
2) Another option would be to first split your data into 3 table (with a tMap for example) :
id,address / id,email / id,phone
You then normlize each table separatly and add a sequence for each "row" of the same id:
id,row,address / id,row,email / id,row,phone
with row being something such as Numeric.sequence("address_"+id,1,1)
And at the end, you join all 3 datasets back.
I think it should also work assuming the order is preserved will going through the tNormalize.
In doubt, I would probably prefer the first option: it has more java code but it's probably safer and more efficient.
Regards.
Hi,
I see 2 options :
1) use a tJavaFlex. Assuming you are confident that you'll never have more that 3 (for example) address, you can use a tJavaFlex followed by a tFilterRow to produce the desired result:
<data_source> --(main)--> tJavaFlex --(main)--> tFilterRow
The idea is to create 3 output rows for each input rows and in each output row, put the nth item of the input row if it exists. The tFilterRow will simply discard rows with no data at all (or no address or whatever).
The tJavaFlex code will contain something like this :
BEGIN (replace 3 with the maximum number of data you can have) :
for (int i=0; i<3; i++) {
MAIN (row1 is the input row, row2 the output):
String[] add; String[] email; String[] phone; row2.id = row1.id; add=row1.add.split("~", -1); if (add != null && add.length > i) row2.add=add[i]; else row2.add=""; email=row1.email.split("~", -1); if (email != null && email.length > i) row2.email=email[i]; else row2.email=""; phone=row1.phone.split("~", -1); if (phone != null && phone.length > i) row2.phone=phone[i]; else row2.phone="";
Please note that it's certainly possible to optimize this code! Take this as a guideline
The idea is to split each input data and take the nth item if it exists.
2) Another option would be to first split your data into 3 table (with a tMap for example) :
id,address / id,email / id,phone
You then normlize each table separatly and add a sequence for each "row" of the same id:
id,row,address / id,row,email / id,row,phone
with row being something such as Numeric.sequence("address_"+id,1,1)
And at the end, you join all 3 datasets back.
I think it should also work assuming the order is preserved will going through the tNormalize.
In doubt, I would probably prefer the first option: it has more java code but it's probably safer and more efficient.
Regards.