Solved: tExtractRegexFields not working - Qlik Community

Anonymous · 2019-05-27

Hello!

I want to split one column. For example, given 21.A-01-BTA or 21.A03-01-BTA, the split I want is:

BDel: 21

Group: A

UnderGroup: if there is a number right after Group, it goes into UnderGroup. So for the first example it is null, and for the second it is 03

Remaining string1: 01

Remaining string2: BTA

I tried to use tExtractRegexFields with the following expression, but I get no values:

"([0-9][0-9]).([A-Z])([0-9][0-9])?-([0-9])-([A-Z])"

-- I used '?' because UnderGroup may or may not be present for the group.

What is the correct syntax for this?
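Why the expression returns no values: `([0-9])` matches exactly one digit and `([A-Z])` exactly one letter, so the pattern cannot consume "01" and "BTA". Also, the unescaped `.` matches any character, not only a literal dot. The short check below runs outside Talend with plain java.util.regex; it tests both a full match and a substring search, so the result does not depend on which one the component uses. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

public class OriginalRegexCheck {
    // The expression from the question, unchanged.
    static final Pattern ORIGINAL =
            Pattern.compile("([0-9][0-9]).([A-Z])([0-9][0-9])?-([0-9])-([A-Z])");

    // True if the whole value matches the pattern.
    static boolean fullMatch(String value) {
        return ORIGINAL.matcher(value).matches();
    }

    // True if any substring of the value matches the pattern.
    static boolean partialMatch(String value) {
        return ORIGINAL.matcher(value).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"21.A-01-BTA", "21.A03-01-BTA"}) {
            // Both checks print false: the pattern expects one digit and one letter
            // around the last hyphen, but the data has "01" and "BTA".
            System.out.println(s + " matches=" + fullMatch(s) + " find=" + partialMatch(s));
        }
    }
}
```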

Regards

Priyadarshini

Anonymous · 2019-05-28

Another alternative is to use this regex:

"^"+
"([0-9]{2})\\.([A-Z])([0-9][0-9])?-([0-9][0-9])-([A-Z]{3})" +
".*"

Make sure you create at least 5 columns in your output schema (one per capturing group).
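A minimal sketch of how this pattern splits the two samples, using plain java.util.regex outside Talend and assuming each capturing group fills one schema column in order (which is why at least 5 columns are needed). Here the dot is written `\\.` so it matches only a literal dot. The class name and column labels are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AcceptedRegexDemo {
    static final Pattern P = Pattern.compile(
            "^" +
            "([0-9]{2})\\.([A-Z])([0-9][0-9])?-([0-9][0-9])-([A-Z]{3})" +
            ".*");

    // Returns the five captured groups, or null if the value does not match.
    static String[] split(String value) {
        Matcher m = P.matcher(value);
        if (!m.matches()) {
            return null;
        }
        String[] cols = new String[m.groupCount()];
        for (int i = 0; i < cols.length; i++) {
            cols[i] = m.group(i + 1); // an optional group that did not participate is null
        }
        return cols;
    }

    public static void main(String[] args) {
        String[] names = {"BDel", "Group", "UnderGroup", "Remaining1", "Remaining2"};
        for (String s : new String[] {"21.A-01-BTA", "21.A03-01-BTA"}) {
            String[] cols = split(s);
            System.out.println(s);
            for (int i = 0; i < names.length; i++) {
                System.out.println("  " + names[i] + ": " + cols[i]);
            }
        }
    }
}
```

When UnderGroup is absent, `group(3)` returns null, which gives the null UnderGroup asked for in the question.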


Anonymous · 2019-05-27

@priyadarshiniv

Please refer to the details below for parsing the data. Note that I have only considered the happy path, so you will need to test with various inputs and add null checks and string-length checks as needed. Solutions for both are readily available on Stack Overflow, so I am not covering that aspect here and leave it to you as a hands-on exercise.

Here are the Java expressions:

var1       -> row1.input.substring(row1.input.indexOf(".") + 1, row1.input.indexOf("-")).replaceAll("\\D+", "")

BDel       -> row1.input.substring(0, row1.input.indexOf("."))
Group      -> row1.input.substring(row1.input.indexOf(".") + 1, row1.input.indexOf("-")).replaceAll("[^A-Za-z]+", "")
UnderGroup -> Var.var1.equals("") ? null : Var.var1
R_string1  -> row1.input.substring(row1.input.indexOf("-") + 1, row1.input.indexOf("-", row1.input.indexOf("-") + 1))
R_string2  -> row1.input.substring(row1.input.indexOf("-", row1.input.indexOf("-") + 1) + 1)
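To try these expressions outside Talend, here is a plain-Java transcription as a sketch: `input` stands in for `row1.input`, and the class and local variable names are illustrative. Like the original, it covers only the happy path.

```java
import java.util.Arrays;

public class SubstringSplit {
    // Returns {BDel, Group, UnderGroup, R_string1, R_string2} using the same
    // substring/indexOf logic as the expressions above.
    static String[] split(String input) {
        int dot = input.indexOf(".");
        int dash1 = input.indexOf("-");
        int dash2 = input.indexOf("-", dash1 + 1);

        String between = input.substring(dot + 1, dash1);    // e.g. "A03"
        String var1 = between.replaceAll("\\D+", "");        // digits only, e.g. "03"

        String bDel = input.substring(0, dot);               // "21"
        String group = between.replaceAll("[^A-Za-z]+", ""); // letters only, "A"
        String underGroup = var1.equals("") ? null : var1;   // null when no digits
        String rString1 = input.substring(dash1 + 1, dash2); // "01"
        String rString2 = input.substring(dash2 + 1);        // "BTA"
        return new String[] {bDel, group, underGroup, rString1, rString2};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("21.A-01-BTA")));   // [21, A, null, 01, BTA]
        System.out.println(Arrays.toString(split("21.A03-01-BTA"))); // [21, A, 03, 01, BTA]
    }
}
```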

Hope you are happy with the resolution. Please spare a minute to give kudos and mark the topic as resolved 🙂

Warm Regards,
Nikhil Thampi


Anonymous · 2019-05-28

Thank you so much @nthampi !!! I will check it today!

Regards

Priya

Anonymous · 2019-05-28

Hello @nthampi

When I use this, I get this error:

String index out of range: -4
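The post does not show which value triggered this error, so the cause is not confirmed here. In general, `substring` throws StringIndexOutOfBoundsException when a delimiter is missing (`indexOf` returns -1) or when the computed indices come out in the wrong order, for example a '-' before the '.'. Below is a hypothetical guarded version of the expressions above that returns null for such values instead of throwing. Unlike the original, it searches for the first '-' after the '.'; the class name and the malformed sample value are illustrative.

```java
import java.util.Arrays;

public class SafeSubstringSplit {
    // Returns {BDel, Group, UnderGroup, R_string1, R_string2}, or null when the value
    // lacks a '.' followed by two '-' separators (where the original expressions throw).
    static String[] split(String input) {
        if (input == null) {
            return null;
        }
        int dot = input.indexOf('.');
        if (dot < 0) {
            return null;
        }
        int dash1 = input.indexOf('-', dot + 1);   // first '-' after the '.'
        if (dash1 < 0) {
            return null;
        }
        int dash2 = input.indexOf('-', dash1 + 1); // second '-'
        if (dash2 < 0) {
            return null;
        }
        String between = input.substring(dot + 1, dash1); // e.g. "A03"
        String digits = between.replaceAll("\\D+", "");
        return new String[] {
            input.substring(0, dot),
            between.replaceAll("[^A-Za-z]+", ""),
            digits.isEmpty() ? null : digits,
            input.substring(dash1 + 1, dash2),
            input.substring(dash2 + 1)
        };
    }

    public static void main(String[] args) {
        String malformed = "21A-01-BTA"; // hypothetical value with no '.'
        try {
            // Same call as the BDel expression: indexOf returns -1, so substring throws.
            System.out.println(malformed.substring(0, malformed.indexOf(".")));
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("original BDel expression throws: " + e.getMessage());
        }
        System.out.println(Arrays.toString(split(malformed)));       // null
        System.out.println(Arrays.toString(split("21.A03-01-BTA"))); // [21, A, 03, 01, BTA]
    }
}
```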

Anonymous · 2019-05-28

Thank you @dgm01 for your reply! I need help with one more thing. Instead of having Remaining_1 and Remaining_2, I want everything after UnderGroup to go into one column. I tried this, but it gives the wrong result:

"^"+
"([0-9]{2})?(\\.[A-Z])?([0-9][0-9])?(-[0-9][0-9])?(-[.]*)?" +
".*"

For 21.A03-01-BTA it gives me:

BDel: 21

Group: A

UnderGroup: 03

Rest1: 01

It doesn't capture the last part, -BTA. Maybe "-" is not treated as a character? What should the expression be?
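In this attempt, `[.]` is a character class containing a literal dot, so `(-[.]*)?` matches "-" followed only by dots; group 5 ends up holding just "-", and "BTA" is swallowed by the final, uncaptured `.*`. The sketch below shows this with plain java.util.regex outside Talend, and adds a hypothetical four-group pattern whose last group takes everything after UnderGroup as one value, assuming the leading '-' should be dropped. With that pattern the schema would need four columns. Class and field names are illustrative.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RemainderRegexDemo {
    // The attempt from the post: [.]* matches only runs of literal dots, never "BTA".
    static final Pattern ATTEMPT = Pattern.compile(
            "^([0-9]{2})?(\\.[A-Z])?([0-9][0-9])?(-[0-9][0-9])?(-[.]*)?.*");

    // Hypothetical fix: the last group captures everything after UnderGroup
    // (without the leading '-') as a single column.
    static final Pattern FIXED = Pattern.compile(
            "^([0-9]{2})\\.([A-Z])([0-9]{2})?-(.*)$");

    // Returns all captured groups, or null if the value does not match.
    static String[] groups(Pattern p, String value) {
        Matcher m = p.matcher(value);
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[m.groupCount()];
        for (int i = 0; i < out.length; i++) {
            out[i] = m.group(i + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "21.A03-01-BTA";
        System.out.println(Arrays.toString(groups(ATTEMPT, s))); // [21, .A, 03, -01, -]
        System.out.println(Arrays.toString(groups(FIXED, s)));   // [21, A, 03, 01-BTA]
    }
}
```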

Regards

Priya

Anonymous · 2019-05-28

Hello @priyadarshiniv

Please try this:

"^"+
"([0-9]{2})?(\\.[A-Z])?([0-9][0-9])?(-[0-9][0-9])?" +
"(.*)"

Don't forget to create at least 5 columns in the schema

Please also post:

inputString:
expected Result:

Then I will help you write the regex.
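For reference, the regex suggested in this reply keeps the delimiters inside its groups, so Group comes back as ".A" and the last two columns as "-01" and "-BTA". The sketch below checks this with plain java.util.regex outside Talend, and shows a hypothetical variant that wraps the delimiters in non-capturing `(?:...)` groups so only the values are captured, keeping the same five columns. Class and field names are illustrative.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SuggestedRegexDemo {
    // The pattern from this reply; '.' and '-' are captured inside groups 2, 4 and 5.
    static final Pattern SUGGESTED = Pattern.compile(
            "^" + "([0-9]{2})?(\\.[A-Z])?([0-9][0-9])?(-[0-9][0-9])?" + "(.*)");

    // Hypothetical variant: (?:...) consumes each delimiter without capturing it,
    // and -? drops the hyphen before the last part.
    static final Pattern VARIANT = Pattern.compile(
            "^" + "([0-9]{2})?(?:\\.([A-Z]))?([0-9][0-9])?(?:-([0-9][0-9]))?-?" + "(.*)");

    // Returns all captured groups, or null if the value does not match.
    static String[] groups(Pattern p, String value) {
        Matcher m = p.matcher(value);
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[m.groupCount()];
        for (int i = 0; i < out.length; i++) {
            out[i] = m.group(i + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "21.A03-01-BTA";
        System.out.println(Arrays.toString(groups(SUGGESTED, s))); // [21, .A, 03, -01, -BTA]
        System.out.println(Arrays.toString(groups(VARIANT, s)));   // [21, A, 03, 01, BTA]
    }
}
```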


Talend Data Integration
