Anonymous
Not applicable

tExtractRegex usage and escaping for talend or java

I have a column in my data that I am trying to break into 4 columns on a | delimiter. I ended up using tExtractRegexFields and finally got a pattern to work as groups in regex testers, but the Talend regex won't escape the pipe ( | ), and I end up getting odd results after the tExtractRegex and tConvert (split into strings, then try to cast).
The regex tester is here 
Here is my pattern: ^([0-9\.]*) \| ([0-9\.]*) \| ([0-9\.]*) \| (.*) \| (.*)$
Here is the terrible sample data column: 1 | 6.39 | 9.76 | FL500S | FILTER ASY - OIL


In the debug console, the tLog after the regex uses [] as the separator in place of |, so I can see whether the pipes were removed and new columns made.
The tLog row before the regex uses :: as the separator instead of |.

Repair_Order SoLine SoPartLine Qty Cost List Part Part_Description
Repair_Order SoLine SoPartLine Qty_Cost_List_Part_Desc
6262880::3::1::1 | 1736.33 | 2315.11 | 7L3Z7000ABRM | AUTOMATIC T
6262880 [] 3 [] 1 [] 1 []  []  []  [] 
6262880 [] 3 [] 1 []  [] 1736.33 | 2315.11 | 7L3Z7000ABRM | AUTOMATIC []  []  [] 
6262880::3::2::1 | 600.00 | 600.00 | 7L3Z7000ABRM-C | 7L3Z 7000 A
6262880 [] 3 [] 2 [] 1 []  []  []  [] 
6262880 [] 3 [] 2 []  [] 600.00 | 600.00 | 7L3Z7000ABRM-C | 7L3Z 7000 []  []  [] 
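
The two output rows per record in the log above are consistent with the pipes reaching the regex engine unescaped, so that each | is read as alternation. The following is a minimal plain-Java sketch of that behaviour, not Talend code: the class and method names (UnescapedPipeDemo, rows) are hypothetical, and it assumes the component applies the pattern repeatedly in the manner of Matcher.find(), which the thread does not confirm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnescapedPipeDemo {
    // Pipes left unescaped: the engine sees five alternatives, not one pattern.
    static final Pattern UNESCAPED =
        Pattern.compile("^([0-9]*) | (.*) | (.*) | (.*) | (.*)$");

    // Collects every successive match as one "row" of five group values.
    static List<String[]> rows(String line) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = UNESCAPED.matcher(line);
        while (m.find()) {
            String[] row = new String[m.groupCount()];
            for (int i = 0; i < row.length; i++) {
                String g = m.group(i + 1);
                // Groups that belong to an alternative that did not match are null.
                row[i] = (g == null) ? "" : g;
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String line = "1 | 1736.33 | 2315.11 | 7L3Z7000ABRM | AUTOMATIC T";
        for (String[] row : rows(line)) {
            System.out.println(String.join(" [] ", row));
        }
    }
}
```

On that input the sketch yields two rows: one with only the first group filled ("1"), and one with only the second group filled ("1736.33 | 2315.11 | 7L3Z7000ABRM | AUTOMATIC"), which is the same shape as the logged output.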

3 Replies
Anonymous
Not applicable
Author

_AnonymousUser
Specialist III

So the answer is a double backslash, to escape the escape in Java, I guess. I was getting 2 or 3 rows per record because the grouping wasn't working: a pipe that is not escaped correctly means "either/or" (alternation) in regex. That made the issue harder for me to catch while debugging.
"^([0-9]*) \\| (.*) \\| (.*) \\| (.*) \\| (.*)$" used to parse "6 | 3.75 | 7.86 | XO5W20BFS | MOTORCRAFT SAE 5W-20"
gave me the results I wanted.

6298055::3::2::6 | 3.75 | 7.86 | XO5W20BFS | MOTORCRAFT SAE 5W-20

6298055 [] 3 [] 2 [] 6 [] 3.75 [] 7.86 [] XO5W20BFS [] MOTORCRAFT SAE 5W-20
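
For anyone who wants to check the escaped pattern outside Talend, here is a minimal plain-Java sketch. It uses java.util.regex directly rather than tExtractRegexFields, and the class and method names (PipeRegexDemo, extract) are hypothetical. The point it illustrates is that "\\|" in a Java string literal becomes the two-character regex \|, a literal pipe.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PipeRegexDemo {
    // "\\|" in Java source is the regex \| : a literal pipe, not alternation.
    static final Pattern ESCAPED =
        Pattern.compile("^([0-9]*) \\| (.*) \\| (.*) \\| (.*) \\| (.*)$");

    // Returns the five captured groups, or null when the line does not match.
    static String[] extract(String line) {
        Matcher m = ESCAPED.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] f = extract("6 | 3.75 | 7.86 | XO5W20BFS | MOTORCRAFT SAE 5W-20");
        // prints: 6 [] 3.75 [] 7.86 [] XO5W20BFS [] MOTORCRAFT SAE 5W-20
        System.out.println(String.join(" [] ", f));
    }
}
```

Because the whole line must match as a single pattern, each record produces exactly one set of five groups.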
cterenzi
Specialist

Glad to hear you fixed your issue.  I think you can also split a field on a delimiter character using the tExtractDelimitedFields component.
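
As a rough plain-Java analogue of splitting on a delimiter instead of matching groups, a sketch along these lines may help. It is not how tExtractDelimitedFields is implemented, and the names PipeSplitDemo and splitFields are hypothetical. It assumes the delimiter is always the three characters " | ".

```java
import java.util.regex.Pattern;

public class PipeSplitDemo {
    // Pattern.quote turns the delimiter into a literal, so the pipe needs no manual escaping.
    // The -1 limit keeps trailing empty fields instead of silently dropping them.
    static String[] splitFields(String line) {
        return line.split(Pattern.quote(" | "), -1);
    }

    public static void main(String[] args) {
        String[] f = splitFields("1 | 6.39 | 9.76 | FL500S | FILTER ASY - OIL");
        // prints: 1 [] 6.39 [] 9.76 [] FL500S [] FILTER ASY - OIL
        System.out.println(String.join(" [] ", f));
    }
}
```

Splitting avoids the grouping problem entirely, at the cost of not validating that the first fields are numeric.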