spr654
Creator

[resolved] Encoding Problem with Special Characters

Hi,

I have two files: one XML and one JSON. Both are UTF-8 encoded, and I store both of them in a tHash. After this step, I compare them. When I open the file of non-matched records (which I have also set to UTF-8), I see that all of the special characters are displayed as ?.

 

How can I fix this issue?

 

Any help is welcome
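For reference, "?" is the classic replacement character produced when text is re-encoded somewhere with a charset that cannot represent it; a genuine UTF-8 round trip is lossless. A minimal Python sketch of the mechanism (the sample string is made up):

```python
# UTF-8 round-trips special characters losslessly...
text = "Łukas André"
assert text.encode("utf-8").decode("utf-8") == text

# ...but encoding to a charset that lacks those characters (for example,
# a platform-default encoding somewhere in the pipeline) replaces each
# unrepresentable character with "?".
lossy = text.encode("ascii", errors="replace").decode("ascii")
print(lossy)  # ?ukas Andr?
```

In a Talend job the fix is usually to set the encoding explicitly (e.g. "UTF-8") on every file input/output component instead of relying on the platform default.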

1 Solution

Accepted Solutions
Anonymous
Not applicable

@spr654 

 

This was an interesting problem. Your input JSON has two parts. The first part is addresses, which is an array:

 

"addresses":[{"country":"LT","streetAndNumber":"My Street N 2-10","postCode":"00028","validFrom":182200000000,"type":"ATYPE","city":"ACITY"}]

The second part sits one level up, at the same level as addresses:

"referenceNumber":"AREFNUM","referenceType":"AREFTYPE","fullName":"My.Full.Name","createdByTransactionId":123456,"_class":"my.class","validFrom":182200000000,"validTo":282200000000

 

So if you need to read the address fields, you will need to parse the JSON at a different level than the name fields.
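To make the two levels concrete, here is a minimal Python sketch of the same two-step parse (the combined record is reassembled from the two fragments above; all values are the placeholders from this thread):

```python
import json

# Combined record, reassembled from the two fragments above
# (assumption: "addresses" sits inside the same top-level object).
record = json.loads("""{
  "referenceNumber": "AREFNUM",
  "fullName": "My.Full.Name",
  "validTo": 282200000000,
  "addresses": [{"country": "LT", "city": "ACITY", "postCode": "00028"}]
}""")

# Step 1: parse the outer level; keep the addresses array as one raw
# string, which is what the first extraction step passes along.
outer = {k: v for k, v in record.items() if k != "addresses"}
addresses_raw = json.dumps(record["addresses"])

# Step 2: parse the addresses string separately, one row per array
# element; the outer columns simply pass through alongside.
for addr in json.loads(addresses_raw):
    print(outer["fullName"], addr["country"], addr["city"])
    # My.Full.Name LT ACITY
```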

 

The job below reads and parses the data correctly. I have not added all the columns, but I have included the first and last columns of both sections to make sure they parse correctly.

0683p000009M2gv.png

 

0683p000009M2h0.png

I believe you will be happy with the output as well. Now let's look at the details.

 

The first tExtractJSONFields parses the outer layer and pushes the entire addresses array along as a single string.

0683p000009M2fe.png

 

In the second step, I parse the addresses column and leave the other columns untouched, so their data (fullName, validTo, etc.) passes through automatically.

 

0683p000009M2U7.png

 

Now we have parsed the data from both structures, and you have the output shown in the earlier tLogRow.

 

Hope I have answered your query 🙂

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂


10 Replies
Anonymous
Not applicable

Hi,

 

   Could you please add tLogRow components at strategic points in your job? I would check the results after reading the source files, after reading from the hash, before matching, and after matching.

 

   This will pinpoint the exact area where the UTF-8 data is not being converted properly. Could you please do this and share the results? Screenshots of your job flow along with sample data would be appreciated, as they will give us a better idea of your job.

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

spr654
Creator
Author

Hi,

 

I have done several tests. I had a problem with Unix/DOS line-ending conversion. Now I have only one record in error, and I cannot understand the reason; I have tried almost everything. If the record is in the initial file, the tMap component finds it, but tExtractJSONFields does not. If I copy and paste it into a separate file, it is found by both components. If I copy a segment of the file containing the failing record, it is also found by both components. If you have any clue what this could be, any help is welcome. What else can I check?

Anonymous
Not applicable

Hi,

 

    Could you please share some screenshots so that we can get a better understanding of the flow?

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

spr654
Creator
Author

Hi,

 

Thanks for your support. I have captured the output that goes into tExtractJSONFields with a tFileOutputDelimited. In the cases where the failing record goes missing, I can still see it in the tFileOutputDelimited_1 component, which means it passes the tMap and enters the tExtractJSONFields component, but disappears there (only when I use the original file).

0683p000009M2f0.png

Anonymous
Not applicable

Hi,

 

     Now I see the issue 🙂

 

     Could you please share some sample records from the tFileOutputDelimited_1 component, along with a screenshot of your tExtractJSONFields component for customers? Please also include the input and output schemas of this component. I suspect that the JSON is not being parsed in the right manner.

 

     If you could also show the expected output, that would be really helpful, and we can try to fill in the missing gaps.

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

spr654
Creator
Author

Hi @nthampi,

My file contains more than 2,000,000 records. Only when I run the full file do I lose that specific record along the way, and only that record. When I take a subset of the records, that record works fine, so sample records won't help in this case, and the data is confidential. Nevertheless, I have attached an example record.

Here are the mapping and the schema:

 

0683p000009M2fJ.png

 

0683p000009M2fO.png


Example.csv
Anonymous
Not applicable

Hi,

 

   Could you please try using JSONPath instead of XPath?

 

   Please also note that you will have to parse through multiple nested lists. So try to parse the address-related info first, and once you have those details, go for the rest.
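For the structure discussed in this thread, the JSONPath configuration in tExtractJSONFields might look like the following (a sketch only; the field names are the placeholders from the sample record, and your schema may differ):

```
Read By             : JsonPath
Loop Jsonpath query : "$.addresses[*]"

Column mapping (Json query per column):
  country   -> "country"
  city      -> "city"
  postCode  -> "postCode"
```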

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

spr654
Creator
Author

Hi,

I had to switch to XPath because I couldn't do my mapping with JSONPath. Can you show how it is done with JSONPath?

Thanks in advance