
Removing question marks "?" in Talend
I have several rows whose text consists entirely of question marks. Here is some sample data:
id            text
1328qdfjhase  This is a text
1038qdfjhase  ???? ?? ????
1114qdfjhase  This is also text
1455qdfjhase  Another text
1376qdfjhase  Extra text
I want to get rid of the second row, since it contains only question marks and the data is of no use to me. I tried using the StringHandling.EREPLACE function in a tMap to replace the question marks with blanks:
StringHandling.EREPLACE(out3.text,"?","")
Next, I plan to filter out the rows that are blank. However, I am getting this error at the tMap component:
Exception in component tMap_1
java.util.regex.PatternSyntaxException: Dangling meta character '?' near index 0
?
^
    at java.util.regex.Pattern.error(Pattern.java:1955)
    at java.util.regex.Pattern.sequence(Pattern.java:2123)
    at java.util.regex.Pattern.expr(Pattern.java:1996)
    at java.util.regex.Pattern.compile(Pattern.java:1696)
    at java.util.regex.Pattern.<init>(Pattern.java:1351)
    at java.util.regex.Pattern.compile(Pattern.java:1028)
    at java.lang.String.replaceAll(String.java:2223)
    at routines.StringHandling.CHANGE(StringHandling.java:96)
    at routines.StringHandling.EREPLACE(StringHandling.java:189)
    at local_project.clean_crmjl2_0_1.Clean_CRMJL2.tFileInputExcel_1Process(Clean_CRMJL2.java:4743)
    at local_project.clean_crmjl2_0_1.Clean_CRMJL2.runJobInTOS(Clean_CRMJL2.java:7478)
    at local_project.clean_crmjl2_0_1.Clean_CRMJL2.main(Clean_CRMJL2.java:7335)
Can anyone help?
Accepted Solutions

Sorry, I was on vacation. I don't know why, but this worked instead in the tMap expression builder. I think the issue was something else, though I'm not sure what. I am now taking the input from Excel files instead of CSV; could it be because of encoding?
StringHandling.EREPLACE(out3.text,"\\?","")

As the error message suggests, a question mark is a metacharacter in regular expression patterns. You get around this by escaping it. Because your string literal is interpreted by Java before it is used as a pattern, you have to type "\\?".
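The escaping rules can be checked outside Talend with a small plain-Java sketch. It relies only on the JDK; the class and method names are made up for illustration. It shows that an unescaped "?" is rejected when compiled as a regex, while "\\?", Pattern.quote("?") and the non-regex String.replace all remove the literal character.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class QuestionMarkEscape {
    // "\\?" in Java source is the regex \? which matches a literal '?'.
    static String removeQuestionMarks(String text) {
        return text.replaceAll("\\?", "");
    }

    public static void main(String[] args) {
        String text = "???? ?? ????";

        // A bare "?" is a quantifier with nothing to quantify, so compiling it fails,
        // just like the EREPLACE call in the question.
        try {
            text.replaceAll("?", "");
        } catch (PatternSyntaxException e) {
            System.out.println("Unescaped pattern rejected: " + e.getDescription());
        }

        // Three equivalent ways to strip the literal character; only the spaces remain.
        System.out.println("[" + removeQuestionMarks(text) + "]");
        System.out.println("[" + text.replaceAll(Pattern.quote("?"), "") + "]");
        System.out.println("[" + text.replace("?", "") + "]"); // no regex involved
    }
}
```

Since the stack trace shows EREPLACE delegating to String.replaceAll, the same escaping should apply inside the tMap expression, e.g. StringHandling.EREPLACE(out3.text, "\\?", "").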

I tried that, and it's not removing the question-mark row for me.

Have you tried something like row5.newColumn.replaceAll("\\?", "") ?

So, the string replacement will only make that value blank; it won't remove the entire row from the data flow. For that you'll need to filter, using a tFilterRow component or a tMap. If you trim() the text after replacing all of the question marks, you can set up an output filter like:
!rowX.text.isEmpty()
to pass through only the records that aren't empty (assuming you don't have other empty values you want to preserve).
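The replace, trim and isEmpty steps can be collapsed into one predicate. Below is a minimal plain-Java sketch run over the sample rows from the question; the class name and the keepRow helper are hypothetical, and in a tMap the body of keepRow would go into the output filter expression with rowX.text in place of the parameter.

```java
import java.util.ArrayList;
import java.util.List;

class QuestionMarkRowFilter {
    // Strip every literal '?', trim the leftover blanks, keep the row only if text remains.
    static boolean keepRow(String text) {
        return !text.replaceAll("\\?", "").trim().isEmpty();
    }

    public static void main(String[] args) {
        // Sample rows from the question as {id, text} pairs.
        String[][] rows = {
            {"1328qdfjhase", "This is a text"},
            {"1038qdfjhase", "???? ?? ????"},
            {"1114qdfjhase", "This is also text"},
            {"1455qdfjhase", "Another text"},
            {"1376qdfjhase", "Extra text"},
        };

        List<String> kept = new ArrayList<>();
        for (String[] row : rows) {
            if (keepRow(row[1])) {
                kept.add(row[0] + "|" + row[1]);
            }
        }
        // Prints every row except 1038qdfjhase.
        kept.forEach(System.out::println);
    }
}
```

Note that this drops the whole row rather than just blanking the value, which is what the question asks for; a row with other text around the question marks is kept.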

@douglaszickuhr wrote:
Have you tried something like row5.newColumn.replaceAll("\\?", "") ?
I am getting a new error:
Exception in component tMap_1
java.lang.NullPointerException
    at local_project.clean_crmjl2_0_1.Clean_CRMJL2.tFileInputExcel_1Process(Clean_CRMJL2.java:4743)
    at local_project.clean_crmjl2_0_1.Clean_CRMJL2.runJobInTOS(Clean_CRMJL2.java:7477)
    at local_project.clean_crmjl2_0_1.Clean_CRMJL2.main(Clean_CRMJL2.java:7334)

It seems that the value is null. Are you sure that column actually has a value?
Are your components connected correctly?
Please paste a screenshot of your job here.
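If the null comes from empty cells in the Excel input (an assumption; the thread never confirms the cause), the filter has to check for null before calling any String method on the value. A sketch with a hypothetical helper name; it treats a null text as a row to drop, which may not be what every job wants.

```java
class NullSafeFilter {
    // Keep a row only if its text has something besides '?' and blanks.
    // A null text (assumed here to mean an empty cell) is treated as "nothing to keep".
    static boolean keepRow(String text) {
        if (text == null) {
            return false;
        }
        return !text.replaceAll("\\?", "").trim().isEmpty();
    }

    public static void main(String[] args) {
        String[] samples = {"This is a text", "???? ?? ????", null, "   "};
        for (String s : samples) {
            System.out.println(s + " -> " + keepRow(s));
        }
    }
}
```

As a single tMap filter expression this would read something like row5.text != null && !row5.text.replaceAll("\\?", "").trim().isEmpty(), where row5.text stands in for the actual input column; because && short-circuits, replaceAll is never called on a null value.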

tMap is all you need.
Here is the expression I used to filter output rows:
!(StringHandling.BTRIM(row109.text.replaceAll("\\?*", ""))).equals("")
StringHandling.BTRIM is there to remove any extra leading or trailing blanks left in the text.
And here is the result (note the last line, which contains "?" but also other characters, so it is kept):
Starting job test at 22:00 23/06/2017.
[statistics] connecting to socket on port 3599
[statistics] connected
1328qdfjhase|This is a text
1114qdfjhase|This is also text
1455qdfjhase|Another text
1376qdfjhase|Extra text
999999999999|An extra ??? ?? ???? text to keep
[statistics] disconnected
Job test ended at 22:00 23/06/2017. [exit code=0]
Hope this helps.
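A variation not used in this thread: instead of replacing and trimming, test directly whether the text consists only of question marks and whitespace. Inside a regex character class "?" is an ordinary character, so no escaping is needed. A sketch with a hypothetical helper name:

```java
class QuestionMarkOnlyCheck {
    // "[?\\s]*" matches strings made up only of '?' and whitespace, including "".
    static boolean isOnlyQuestionMarks(String text) {
        return text.matches("[?\\s]*");
    }

    public static void main(String[] args) {
        String[] samples = {
            "This is a text",
            "???? ?? ????",
            "An extra ??? ?? ???? text to keep",
        };
        for (String s : samples) {
            // Rows for which this prints "true" would be filtered out.
            System.out.println(s + " -> " + isOnlyQuestionMarks(s));
        }
    }
}
```

The corresponding tMap filter would be !row109.text.matches("[?\\s]*"). Like the expression above, it assumes the text is never null, and because the empty string also matches, empty rows would be dropped as well.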

Please let us know, and mark the case as solved if it is.

