Anonymous
Not applicable

Clean accented characters and white space in a column

I have a workflow as follows. In the column 'summary', I want to:

1. remove question marks (?)
2. remove white space from the text
3. replace accented letters with their English equivalent, for example é with e.

[Screenshot: the job workflow]

Input

?? at Shenzhen Xingjiexun Electronics Co.Ltd
Designer at FabUnion | ????????
Jinanhaolu Ñ manager

Output

at Shenzhen Xingjiexun Electronics Co.Ltd
Designer at FabUnion |
Jinanhaolu N manager

For the accented letters, the above is just a sample: they can be anything, and I do not have a finite list to give as an example.

Thanks in advance!!




Accepted Solutions
vboppudi
Partner - Creator III

Hi,

The following steps might help you.

Step 1: Change the file read encoding.

[Screenshot: file read encoding setting]
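The exact encoding value in the screenshot is not reproduced here. Assuming the input file is saved as UTF-8, the following plain-Java sketch (the class EncodingDemo and method roundTrip are hypothetical, not Talend API) illustrates why the read encoding matters: when the same bytes are decoded with a non-matching charset, the Ñ is already gone before any accent-stripping code runs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Simulates saving text to a file in one charset and reading it back in another
    static String roundTrip(String text, Charset fileCharset, Charset readCharset) {
        byte[] fileBytes = text.getBytes(fileCharset);
        return new String(fileBytes, readCharset);
    }

    public static void main(String[] args) {
        String line = "Jinanhaolu \u00D1 manager"; // U+00D1 is N with tilde

        // Matching charsets: the N-tilde survives and can later be turned into N
        System.out.println(roundTrip(line, StandardCharsets.UTF_8, StandardCharsets.UTF_8));

        // Mismatch: the two UTF-8 bytes of N-tilde (0xC3 0x91) are not valid US-ASCII,
        // so each one is decoded as the replacement character U+FFFD
        String broken = roundTrip(line, StandardCharsets.UTF_8, StandardCharsets.US_ASCII);
        System.out.println(broken.contains("\u00D1")); // false
    }
}
```

US-ASCII is used only to make the mismatch obvious; other wrong charsets produce different garbage (for example ISO-8859-1 yields two Latin-1 characters instead of one).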

 

Step 2: Create a new routine named stripAccents with the script below.

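package routines;

import java.text.Normalizer;

public class stripAccents {

    /**
     * Removes diacritics from s. NFD normalization splits each accented letter
     * into its base letter plus combining marks (e.g. "é" becomes "e" + U+0301),
     * and the regex then deletes those combining marks.
     */
    public static String stripAccents(String s) {
        if (s == null) {
            return null; // avoid a NullPointerException on null column values
        }
        s = Normalizer.normalize(s, Normalizer.Form.NFD);
        s = s.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        return s;
    }
}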
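To see why the routine works, and where it stops working, here is a small standalone sketch (the class name NfdDemo is hypothetical) using the same Normalizer calls. NFD turns a precomposed letter into a base letter followed by combining marks, and the regex removes only those marks. Letters with no canonical decomposition, such as ø, æ, ß or ł, are therefore left unchanged, which may matter here since the accented characters "can be anything".

```java
import java.text.Normalizer;

class NfdDemo {
    // Same two steps as the stripAccents routine
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        String eAcute = "\u00E9"; // precomposed e-acute, a single char
        String nfd = Normalizer.normalize(eAcute, Normalizer.Form.NFD);

        // After NFD: 'e' followed by U+0301 COMBINING ACUTE ACCENT
        System.out.println(eAcute.length() + " -> " + nfd.length()); // 1 -> 2
        System.out.printf("U+%04X%n", (int) nfd.charAt(1));          // U+0301

        System.out.println(stripAccents("Jinanhaolu \u00D1 manager")); // Jinanhaolu N manager

        // o-stroke, ae, sharp s, l-stroke: no canonical decomposition, so nothing is removed
        String noDecomposition = "\u00F8\u00E6\u00DF\u0142";
        System.out.println(stripAccents(noDecomposition).equals(noDecomposition)); // true
    }
}
```

If such letters do occur in the data, they would need an explicit replacement on top of this routine.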

 

[Screenshot: the stripAccents routine]

 

Create a job: src --> tMap --> tLogRow

[Screenshot: the job]

Use COL as the input column in the source, row1.COL as the input in tMap, and COL as the tMap output column.

 

Expression for the output column COL in tMap:

stripAccents.stripAccents(row1.COL).replaceAll("[?]", "").replaceAll("^ ", "")

 

Input Data:

?? at Shenzhen Xingjiexun Electronics Co.Ltd
Designer at FabUnion | ????????
Jinanhaolu Ñ manager
aaaéééàààçççbbbb
Shenzhen WenTong electronic co.Ltd Ñ power adapter

 

Output Data:

[Screenshot: output data]
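The tMap expression can be checked outside Talend with a plain-Java sketch such as the one below. It copies the routine logic into a hypothetical SummaryCleaner class (the method name clean is also hypothetical) and applies the same two replaceAll calls to the sample input from this post.

```java
import java.text.Normalizer;

class SummaryCleaner {
    // Same logic as the stripAccents routine (null-safe)
    static String stripAccents(String s) {
        if (s == null) return null;
        String nfd = Normalizer.normalize(s, Normalizer.Form.NFD);
        return nfd.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // Equivalent of the tMap output expression:
    // stripAccents.stripAccents(row1.COL).replaceAll("[?]", "").replaceAll("^ ", "")
    static String clean(String col) {
        if (col == null) return null;
        return stripAccents(col)
                .replaceAll("[?]", "")  // drop every question mark
                .replaceAll("^ ", "");  // drop a single leading space
    }

    public static void main(String[] args) {
        String[] input = {
            "?? at Shenzhen Xingjiexun Electronics Co.Ltd",
            "Designer at FabUnion | ????????",
            "Jinanhaolu \u00D1 manager",
            "aaa\u00E9\u00E9\u00E9\u00E0\u00E0\u00E0\u00E7\u00E7\u00E7bbbb",
            "Shenzhen WenTong electronic co.Ltd \u00D1 power adapter"
        };
        for (String line : input) {
            // Brackets make leading and trailing spaces visible
            System.out.println("[" + clean(line) + "]");
        }
    }
}
```

Because ^ anchors at the start of the string, the trailing space left on the FabUnion line after its question marks are removed is kept; appending .trim() would drop it as well, if that is what you want.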

Hope this helps!

Regards,


18 Replies
vboppudi
Partner - Creator III

Hi,

Please provide some sample data and the expected output.

Regards,

TRF
Champion II

Hi,

Here is an example of how to do it:

[Screenshot: example job]

First, load the commons-lang3-3.4.jar file and import org.apache.commons.lang3.StringUtils.

To do that, in the tLibraryLoad Basic settings select "commons-lang3-3.4.jar", then in the Advanced settings enter import org.apache.commons.lang3.StringUtils; in the Import field.

In tJavaRow, enter the following (or something similar in tMap, depending on your use case):

output_row.line = StringUtils.stripAccents(input_row.line);

tFixedFlowInput is only there to generate data for the flow ("aaaéééàààçççbbbb" in my example), and the result is:

aaaeeeaaacccbbbb

Hope this helps,

 

TRF
Champion II

Sorry, I forgot the "?" and the space.

Just replace:

output_row.line = StringUtils.stripAccents(input_row.line);

with:

output_row.line = StringUtils.stripAccents(input_row.line).replaceAll("[? ]", "");
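Note that [? ] is a regex character class matching either a question mark or a space, so this version deletes every space in the text, including those between words. A quick plain-Java check (hypothetical class and method names; StringUtils.stripAccents is left out so it runs without commons-lang3):

```java
class RemoveAllSpaces {
    // "[? ]" matches '?' OR ' ', so every space is removed, not only leading ones
    static String removeQuestionMarksAndSpaces(String s) {
        return s.replaceAll("[? ]", "");
    }

    public static void main(String[] args) {
        // Prints atShenzhenXingjiexunElectronicsCo.Ltd (word boundaries are lost)
        System.out.println(removeQuestionMarksAndSpaces("?? at Shenzhen Xingjiexun Electronics Co.Ltd"));
    }
}
```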

 

That's all.

 

Anonymous
Not applicable
Author

How should I connect tLibraryLoad and tJavaRow in my workflow?

Should it be as follows? Please suggest if I should arrange these components differently.

tMap -> tLibraryLoad -> tJavaRow -> tFileOutputDelimited

TRF
Champion II

Well, if you just want to remove leading white space (not all of it), use:

output_row.line = StringUtils.stripAccents(input_row.line).replaceAll("[?]", "").replaceAll("^ ", "");

There may be a shorter form, but this works:

[Screenshot: job output]
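The regex variants differ in how much white space they remove. The following sketch (hypothetical class and method names, accent step omitted) compares three options after the question marks are removed: "^ " removes exactly one leading space, "^\\s+" removes any run of leading white space, and String.trim() also removes trailing white space.

```java
class LeadingSpaceVariants {
    // As in the post: removes a single leading space only
    static String removeOneLeadingSpace(String s) {
        return s.replaceAll("[?]", "").replaceAll("^ ", "");
    }

    // Removes any run of leading whitespace (spaces, tabs, ...)
    static String removeAllLeadingWhitespace(String s) {
        return s.replaceAll("[?]", "").replaceAll("^\\s+", "");
    }

    // Removes leading AND trailing whitespace
    static String trimBothEnds(String s) {
        return s.replaceAll("[?]", "").trim();
    }

    public static void main(String[] args) {
        String[] samples = { "? ? at Shenzhen", "Designer at FabUnion | ????????" };
        for (String s : samples) {
            // Brackets make leading and trailing spaces visible
            System.out.println("[" + removeOneLeadingSpace(s) + "]");
            System.out.println("[" + removeAllLeadingWhitespace(s) + "]");
            System.out.println("[" + trimBothEnds(s) + "]");
        }
    }
}
```

For "? ? at Shenzhen" the first variant still leaves one leading space, and for the FabUnion line only trim() removes the trailing space.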

Regards,

TRF
Champion II

Usually, place the tLibraryLoad at the beginning of the job.
In my example, because there is nothing else in the job, it's the first component, and the following tFixedFlowInput is connected to it with an onSubjobOk (or onComponentOk) trigger.

Don't forget to mark the topic as solved (if it is). Kudos are also welcome.
Anonymous
Not applicable
Author

I downloaded the jar file from http://book2s.com/java/jar/c/commons-lang3/download-commons-lang3-3.4.jar.html, tried the suggested solution, and made tLibraryLoad the first component. Below is how tLibraryLoad is configured:

[Screenshot: tLibraryLoad Basic settings]
[Screenshot: tLibraryLoad Advanced settings]

And this is how tJavaRow is configured. I added the column name 'summary' after output_row and input_row in the code, as follows:

 

[Screenshot: tJavaRow code]

However, I am getting this error:

Execution failed : Job compile errors 
At least job "Test2_Copy" has a compile errors, please fix and export again.
Error Line: 49
Detail Message: Syntax error on token ""org.apache.commons.lang3.StringUtils;"", delete this token
There may be some other errors caused by JVM compatibility. Make sure your JVM setup is similar to the studio.

 

TRF
Champion II

You must load the library first: tLibraryLoad - onSubjob OK -> tFileList

Also verify the Advanced settings of tLibraryLoad: the Import field must contain import org.apache.commons.lang3.StringUtils;

Edit: OK, forget that. Just remove both double quotes (") in the Import field (that's Java code, not a string).

Anonymous
Not applicable
Author

I entered import org.apache.commons.lang3.StringUtils; in the Advanced settings field and the job ran without any error; however, the output is not what I need. It simply replaces the accented Ñ with a question mark (?):

 

Shenzhen WenTong electronic co.Ltd Ñ power adapter

 

is converted into 

 

Shenzhen WenTong electronic co.Ltd ? power adapter
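When a character shows up as ?, it can help to check what the string actually contains before any cleanup. A minimal diagnostic sketch (the helper name dump is hypothetical) prints each code point: U+00D1 means the Ñ was read correctly, U+003F is a literal question mark, and U+FFFD is the replacement character produced by decoding with the wrong charset. A console that cannot display a character may also print it as ?, which is another reason to look at code points rather than the printed text.

```java
class CodePointDump {
    // Lists every code point as U+XXXX so look-alike characters can be told apart
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(dump("\u00D1"));  // U+00D1: N-tilde read correctly
        System.out.println(dump("?"));       // U+003F: a real question mark
        System.out.println(dump("\uFFFD"));  // U+FFFD: bytes decoded with the wrong charset
    }
}
```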