Filter output rows

stvn · ‎2019-02-13

Hello Talend Community,

I'd like your help with an issue I am facing. The following job takes a path as input (/Default/Tools/Articles/Documents/Documents_Extract/) and produces these rows as output:
/Default/
/Default/Tools/
/Default/Tools/Articles/
/Default/Tools/Articles/Documents/
/Default/Tools/Articles/Documents/Documents_Extract/

Now I'd like to filter those output rows so that each value appears only once, such as:

/Default/
/Tools/
/Articles/
/Documents/
/Documents_Extract/

Could you help me with the appropriate components to use? I don't know which component I should use.

Thank you in advance

Steve

fdenis · ‎2019-02-13

You can try using this expression in a tMap:
row1.filepath --> row1.filepath.replaceAll("(/[^/]+/)$", "$1")

stvn · ‎2019-02-14

Hi François,

thank you for your reply.

My first question is: What does "$1" represent?

If there is a second row starting with /Default/Tools/Articles/Files/FilesAfterExtract/, is the tMap going to give me the following result as output?

/Default/Tools/Articles/Documents/Documents_Extract/

/Default/Tools/Articles/Files/FilesAfterExtract/

/Default/
/Tools/
/Articles/
/Documents/
/Documents_Extract/
/Files/
/FilesAfterExtract/

I want to keep only the values that have not already appeared.

Thank you for your time and your help.

fdenis · ‎2019-02-15

You can find help on Java regular expressions on many sites.
In our case, $1 represents the value captured inside the parentheses: (/[^/]+/)
/ is a literal /
[^/]+ is one or more characters other than /
$ is the end of the line

row1.filepath.replaceAll("(/[^/]+/)$", "$1")
is meant to give you only the last element between / and /
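For readers who want to check the behaviour outside Talend, here is a minimal plain-Java sketch. The class and method names are hypothetical. As written, the expression matches only the final /segment/ and replaces it with itself, so the path comes back unchanged. The second method is an assumed variant, not something posted in this thread: it adds ^.* so that everything before the last segment is consumed as well.

```java
class LastSegmentDemo {
    // Expression from the thread: the match covers only the last "/segment/",
    // which is then replaced by group 1 (itself), so nothing changes.
    static String original(String path) {
        return path.replaceAll("(/[^/]+/)$", "$1");
    }

    // Assumed variant: "^.*" also matches the leading part of the path,
    // so after the replacement only group 1 (the last segment) remains.
    static String lastSegment(String path) {
        return path.replaceAll("^.*(/[^/]+/)$", "$1");
    }

    public static void main(String[] args) {
        String path = "/Default/Tools/Articles/Documents/Documents_Extract/";
        System.out.println(original(path));    // /Default/Tools/Articles/Documents/Documents_Extract/
        System.out.println(lastSegment(path)); // /Documents_Extract/
    }
}
```

In a tMap, the variant would presumably be written as row1.filepath.replaceAll("^.*(/[^/]+/)$", "$1"). It still does not remove duplicates across rows.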

Regards,

stvn · ‎2019-02-18

Hi François,

Thanks for your help. I tried the regex you recommended and unfortunately it did not remove any duplicate values.

You will find below the result:

My idea was to have at the end

/Default/
/Tools/
/Articles/
/Documents/
/DocFirst/

If there is a second line starting with /Default/Tools/Articles/DocumentsFirst/DocumentsSecond, then the additional output would be:

/DocumentsFirst/

/DocumentsSecond/
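The expected result described above can be sketched in plain Java as follows. This is only an illustration of the logic, assuming each path is split on / and each segment is kept the first time it is seen. The class and method names are hypothetical, and the wiring into a Talend job (for example through a custom routine) is not shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class UniqueSegments {
    // Returns every path segment once, wrapped in slashes,
    // in the order in which it first appears across all rows.
    static List<String> uniqueSegments(List<String> paths) {
        Set<String> seen = new LinkedHashSet<>(); // keeps insertion order, drops repeats
        for (String path : paths) {
            for (String part : path.split("/")) {
                if (!part.isEmpty()) {            // skip the empty piece before the leading "/"
                    seen.add("/" + part + "/");
                }
            }
        }
        return new ArrayList<>(seen);
    }
}
```

Note that this compares segment names only, so two folders with the same name under different parents would be treated as one value.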

Could you suggest any other approach? Thanks for your help.

Warm Regards,

Steeve

manodwhb · ‎2019-02-19

@stvn, Please check the below solution for your case.

stvn · ‎2019-02-19

@manodwhb

Hi, I'd like to thank you for the solution you provided. It works pretty well, but the thing is that I did not express my needs very well. I apologize for that. I have a CSV file as input:

/Default/Tools/Documents/Articles/Factures/
/Default/Tools/Documents/Articles/Factures/Factures_Comptable/
/Default/Tools/Documents/Articles/Factures/Factures_Comptable/Année

And I'd like this as output:

/Default/
/Default/Tools/
/Default/Tools/Documents/
/Default/Tools/Documents/Articles/Factures/

/Factures_Comptable/

/Année

After the first line, I don't have any other duplicate values. I implemented the job below in order to filter my rows and keep only the data shown in the example above. Could you recommend any other solutions?
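One possible reading of this requirement, sketched in plain Java: the first row is expanded into all of its cumulative prefixes, and every later row contributes only the part that extends the previous row. This interpretation is an assumption, and so are the class and method names. The sketch emits every prefix of the first row, including /Default/Tools/Documents/Articles/, which the example above does not list.

```java
import java.util.ArrayList;
import java.util.List;

class IncrementalPaths {
    // All cumulative prefixes of a path, e.g. "/a/b/" gives "/a/" and "/a/b/".
    static List<String> prefixes(String path) {
        List<String> out = new ArrayList<>();
        int idx = path.indexOf('/', 1);
        while (idx != -1) {
            out.add(path.substring(0, idx + 1));
            idx = path.indexOf('/', idx + 1);
        }
        if (!path.endsWith("/")) {
            out.add(path); // final element without a trailing slash
        }
        return out;
    }

    // First row: every prefix. Later rows: only what the row adds to the previous one.
    static List<String> reduce(List<String> rows) {
        List<String> out = new ArrayList<>();
        String previous = null;
        for (String row : rows) {
            if (previous == null) {
                out.addAll(prefixes(row));
            } else {
                String base = previous.endsWith("/") ? previous : previous + "/";
                if (row.startsWith(base)) {
                    out.add("/" + row.substring(base.length()));
                } else {
                    out.add(row); // unrelated row: kept whole in this sketch
                }
            }
            previous = row;
        }
        return out;
    }
}
```

With the three input rows above, reduce returns the prefixes of the first row followed by /Factures_Comptable/ and /Année. The sketch relies on each row extending the one directly before it, as in the sample file.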

Thank you for your time and consideration.

Warm regards,

Stevn

Talend Data Integration

v7.x