Anonymous
Not applicable

Clean accented character and white space in column

I have the workflow shown below. In the column 'summary', I want to:

1. remove question marks (?)
2. remove white space from the text
3. replace accented letters with their English equivalents, for example é with e.

0683p000009LtrX.jpg

Input

?? at Shenzhen Xingjiexun Electronics Co.Ltd
Designer at FabUnion | ????????
Jinanhaolu Ñ manager

Output

at Shenzhen Xingjiexun Electronics Co.Ltd
Designer at FabUnion |
Jinanhaolu N manager

The accented letter above is just a sample. It can be any accented letter, and I do not have a finite list to give as an example.

 

Thanks in advance!!



18 Replies
TRF
Champion II

What's the encoding of the tFileInputDelimited?
Anonymous
Not applicable
Author

UTF-8


@TRF wrote:
What's the encoding of the tFileInputDelimited?

 

TRF
Champion II

But is your file encoded as utf8?
I just tested on my side and it works fine.
vboppudi
Partner - Creator III

Hi,

The following steps might help you.

Step 1: Change the encoding used to read the file

0683p000009Ltxd.png

 

Step 2: Create a new routine named stripAccents with the script below.

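package routines;

import java.text.Normalizer;

public class stripAccents {

    // Replaces accented letters with their unaccented base letters (for example, é becomes e).
    public static String stripAccents(String s) {
        if (s == null) {
            return null; // pass null column values through unchanged
        }
        // NFD splits each accented letter into a base letter plus combining marks.
        s = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Drop the combining marks so only the base letters remain.
        return s.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }
}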

 

0683p000009Lu0M.png

 

Step 3: Create a job: src --> tMap --> tLogRow

0683p000009Ltse.png

COL is the input column in the source and arrives in the tMap as row1.COL. COL is also the output column of the tMap.

 

output COL --> stripAccents.stripAccents(row1.COL).replaceAll("[?]", "").replaceAll("^ ", "") 
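For anyone who wants to try the cleaning logic outside of a Talend job, here is a minimal standalone sketch of the same idea. The class name SummaryCleaner and the method clean are hypothetical and not part of the routine above. It also uses trim() rather than replaceAll("^ ", ""), which is a swap on my part: trim() removes all leading and trailing white space, while the tMap expression only removes a single leading space.

```java
import java.text.Normalizer;

// Hypothetical helper for illustration; not the routine from the post above.
final class SummaryCleaner {

    static String clean(String s) {
        if (s == null) {
            return null;
        }
        // Decompose accented letters into base letter + combining marks.
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Remove the combining marks, leaving the plain base letters.
        String noAccents = decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        // Remove literal question marks (plain replace, no regex needed).
        String noQuestionMarks = noAccents.replace("?", "");
        // Assumption: "white space" in the question means leading/trailing white space.
        return noQuestionMarks.trim();
    }

    public static void main(String[] args) {
        System.out.println(clean("?? at Shenzhen Xingjiexun Electronics Co.Ltd"));
        System.out.println(clean("Designer at FabUnion | ????????"));
        // \u00D1 is the letter N with a tilde, written as an escape to avoid source-encoding issues.
        System.out.println(clean("Jinanhaolu \u00D1 manager"));
    }
}
```

In a tMap, the equivalent expression would presumably be stripAccents.stripAccents(row1.COL).replace("?", "").trim(), with a null check added if the column is nullable.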

 

Input Data:

?? at Shenzhen Xingjiexun Electronics Co.Ltd
Designer at FabUnion | ????????
Jinanhaolu Ñ manager
aaaéééàààçççbbbb
Shenzhen WenTong electronic co.Ltd Ñ power adapter

 

Output Data:

0683p000009Ltxx.png

Hope this helps!

Regards,

Anonymous
Not applicable
Author

@TRF, can you post a screenshot? @vboppudi, the file is in UTF-8 format, and if I change the encoding on the input, the file is not read properly. I ran into this issue before and it took me a week to understand the cause. Once I switched to UTF-8, the data was read properly.

vboppudi
Partner - Creator III

Hi,
If I change the encoding to UTF-8, I am not able to read the data properly. I get output like this:
|at Shenzhen Xingjiexun Electronics Co.Ltd |
|Designer at FabUnion | |
|Jinanhaolu � manager |
|aaa���������bbbb |
|Shenzhen WenTong electronic co.Ltd � power adapter
Regards,
TRF
Champion II

Here is the job with the tFileInputDelimited:

0683p000009LtaG.png

The Advanced settings tab of the tFileInputDelimited:

0683p000009Ltut.png

The input file with the Encoding menu (from Notepad++):

0683p000009LrmP.png

Finally, the result:

0683p000009Lt36.png

@Enthusiast, let us know the encoding system for your file.

 

Regards,

Anonymous
Not applicable
Author

It's appearing as ANSI when I open it in Notepad++.

TRF
Champion II

So just select ISO-8859-15 as the encoding system in the Advanced settings tab.

It works (I've tried).
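
To illustrate why the encoding setting matters here, below is a small standalone sketch (the class name EncodingDemo and the method decode are hypothetical). It assumes the file stores Ñ as the single byte 0xD1, which is how ISO-8859-15 encodes it. Decoding those bytes as UTF-8 gives the replacement character seen in the broken output earlier in the thread, while decoding them as ISO-8859-15 gives the expected letter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only mimics what the input component's encoding setting does.
final class EncodingDemo {

    // Decode raw file bytes using the named charset.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Build the bytes as a single-byte (ISO-8859-15) file would store them.
        // \u00D1 is the letter N with a tilde.
        byte[] raw = "Jinanhaolu \u00D1 manager".getBytes(Charset.forName("ISO-8859-15"));

        // Wrong charset: the lone byte 0xD1 is not valid UTF-8, so Java substitutes U+FFFD.
        String asUtf8 = new String(raw, StandardCharsets.UTF_8);
        // Matching charset: the byte maps back to the original letter.
        String asLatin9 = decode(raw, "ISO-8859-15");

        System.out.println(asUtf8);
        System.out.println(asLatin9);
    }
}
```

Note that "ANSI" in Notepad++ refers to the Windows system code page, which on Western European systems is usually Windows-1252 rather than ISO-8859-15. The two agree on most accented letters, so either setting may work for this file, but that is an assumption worth checking against your own data.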