Hi,
I have two files, one XML and one JSON. Both are UTF-8, and I store both of them in a tHash component. After this step, I compare them. When I open the non-matched file (which I have also set to UTF-8), I see that all the special characters are displayed as ?
How can I fix this issue?
Any help is welcome
Hi,
Could you please add tLogRow components at strategic points of your job? I would check the results after reading from the source files, after reading from the hash, before matching, and after matching.
This will show you exactly where the UTF-8 characters stop being handled properly. Could you please do that and share the results? Screenshots of your job flow along with sample data would be appreciated, as they will give us a better idea of your job.
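To make the stage-by-stage check concrete, here is a minimal sketch in Python rather than in Talend itself. The sample value and the `round_trip` helper are hypothetical and only illustrate the symptoms: a literal ? usually points to a stage that writes with a charset that cannot represent the character, while garbled letters point to a stage that reads UTF-8 bytes with the wrong charset.

```python
# Sample value with non-ASCII characters (hypothetical, not taken from the thread's data).
original = "Žalgirio gatvė"

def round_trip(text: str, write_charset: str, read_charset: str) -> str:
    """Encode the text as a writer stage would, then decode it as a reader stage would."""
    raw = text.encode(write_charset, errors="replace")
    return raw.decode(read_charset, errors="replace")

# Each entry imitates one writer/reader pair in a job.
stages = {
    "utf-8 -> utf-8": round_trip(original, "utf-8", "utf-8"),
    "ascii -> utf-8": round_trip(original, "ascii", "utf-8"),
    "utf-8 -> latin-1": round_trip(original, "utf-8", "latin-1"),
}

for name, value in stages.items():
    status = "ok" if value == original else "DAMAGED"
    print(f"{name:18} {status:8} {value!r}")
```

Comparing the value against the original after every stage is the same idea as placing a tLogRow after every component: the first stage that reports damage is the one whose encoding setting needs attention.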
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂
Hi,
I have done several tests. I had a problem with Unix/DOS line-ending conversion. Now I have only one record in error, but I cannot understand the reason. I have tried almost everything. If the record is in the initial file, the tMap component finds it but the tExtractJSONFields component doesn't. If I copy and paste it into a separate file, it is found by both components. If I copy a segment of the file containing the record in error, it is also found by both components. If you have any clue what it can be, any help is welcome. What else can I check?
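One thing that can be checked outside Talend is whether the original file still mixes line endings or hides control characters inside a record. The following is a minimal Python sketch under that assumption; `scan_lines` and the three-line sample are hypothetical, and for a real file the stream would come from `open(path, "rb")`.

```python
import io
from collections import Counter

def scan_lines(stream):
    """Count line-ending styles and collect lines that contain control characters.

    `stream` must be opened in binary mode so that no newline translation happens.
    """
    endings = Counter()
    suspicious = []
    for number, raw in enumerate(stream, start=1):
        if raw.endswith(b"\r\n"):
            endings["CRLF"] += 1
            body = raw[:-2]
        elif raw.endswith(b"\n"):
            endings["LF"] += 1
            body = raw[:-1]
        else:
            endings["none"] += 1
            body = raw
        # Any byte below 0x20 other than TAB is unexpected inside a record.
        if any(byte < 0x20 and byte != 0x09 for byte in body):
            suspicious.append((number, body))
    return endings, suspicious

# Hypothetical sample: line 2 has a DOS line ending and a stray CR inside a value.
sample = io.BytesIO(b'{"id":1}\n{"id":2,"name":"a\rb"}\r\n{"id":3}\n')
endings, suspicious = scan_lines(sample)
print(dict(endings))
for number, body in suspicious:
    print(number, body)
```

Because the file is read line by line, this also works on a file with millions of records. A record that only fails inside the full file may simply look different there (for example, a leftover CR) than it does after being copied and pasted into a new file.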
Hi,
Could you please share some screenshots so that we can get a better understanding of the flow?
Warm Regards,
Nikhil Thampi
Hi,
Thanks for your support. I have captured the input of the tExtractJSONFields component in a tFileOutputDelimited. In the runs where the record in error goes missing, I can still see it in the tFileOutputDelimited_1 component, which means that it passes the tMap and enters the tExtractJSONFields component, but disappears there (only when I use the original file).
Hi,
Now I understand the issue 🙂
Could you please share the sample records from the tFileOutputDelimited_1 component and a screenshot of your tExtractJSONFields component for customers? Please also include the input and output schemas of this component. I suspect that the JSON is not being parsed in the right manner.
If you could also show the expected output values, that would be really helpful and we can try to fill in the gaps.
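As a quick cross-check of that suspicion, the captured records can be run through an independent JSON parser. This is a minimal Python sketch, assuming one JSON document per record; `find_unparseable` and the sample records are hypothetical, and Python's parser is not the one Talend uses, so a failure here only marks a candidate.

```python
import json

def find_unparseable(records):
    """Return (position, error text) for every record that is not valid JSON."""
    failures = []
    for position, text in enumerate(records, start=1):
        try:
            json.loads(text)
        except json.JSONDecodeError as error:
            failures.append((position, str(error)))
    return failures

# Hypothetical records: the second one has a raw line break inside a string value.
records = [
    '{"fullName":"My.Full.Name","validTo":282200000000}',
    '{"fullName":"My.Full\nName","validTo":282200000000}',
]

failures = find_unparseable(records)
for position, message in failures:
    print(f"record {position}: {message}")
```

A record that passes a string-based lookup in tMap can still fail here, because a lookup never needs the text to be well-formed JSON.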
Warm Regards,
Nikhil Thampi
Hi @nthampi,
My file contains more than 2,000,000 records. Only when I run this full file do I lose that specific record along the way, and only that record. When I take a subset of the records, that record works fine, so sample records won't help in this case, and the data is confidential. Nevertheless, I have attached an example record.
Here are the mapping and the schema:
Hi,
Could you please try JSONPath instead of XPath?
Please also note that you will have to parse through multiple lists. So parse the address-related information first, and once you have those details, go for the rest.
Warm Regards,
Nikhil Thampi
It was an interesting problem. Your input JSON has two parts. The first part is the addresses element, which is an array.
"addresses":[{"country":"LT","streetAndNumber":"My Street N 2-10","postCode":"00028","validFrom":182200000000,"type":"ATYPE","city":"ACITY"}]
The second part is one level above, at the same level as the addresses element itself.
"referenceNumber":"AREFNUM","referenceType":"AREFTYPE","fullName":"My.Full.Name","createdByTransactionId":123456,"_class":"my.class","validFrom":182200000000,"validTo":282200000000
So if you need to read the address part, you will need to parse the JSON at a different level than for the name part.
The job below reads the data and parses it correctly. I have not added all the columns, but I have added the first and last fields of both sections to make sure that the columns are parsed correctly.
I believe the output is what you expect. Now let's look at the details.
The first tExtractJSONFields component parses the outer layer and passes the entire address information on as a single string.
In the second component, I parse the address column and leave the other data without any processing, so those columns (such as fullName and validTo) pass through automatically.
Now we have parsed the data from both structures, and you have the output data as shown in the previous tLogRow.
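For readers without the screenshots, the two-step extraction described above can be sketched in Python. This is only an illustration of the idea, not the Talend job itself: the function names are hypothetical, and the record is assembled by wrapping the two quoted fragments in one object, which is an assumption about the full layout.

```python
import json

# The two fragments quoted above, wrapped in a single object (assumed layout).
record = (
    '{"addresses":[{"country":"LT","streetAndNumber":"My Street N 2-10",'
    '"postCode":"00028","validFrom":182200000000,"type":"ATYPE","city":"ACITY"}],'
    '"referenceNumber":"AREFNUM","referenceType":"AREFTYPE",'
    '"fullName":"My.Full.Name","createdByTransactionId":123456,'
    '"_class":"my.class","validFrom":182200000000,"validTo":282200000000}'
)

def extract_outer(text):
    """Step 1: read top-level fields and keep the addresses array as one string."""
    outer = json.loads(text)
    return {
        "referenceNumber": outer["referenceNumber"],
        "validTo": outer["validTo"],
        "addresses": json.dumps(outer["addresses"]),
    }

def extract_addresses(row):
    """Step 2: parse the addresses string; all other columns pass through unchanged."""
    passthrough = {key: value for key, value in row.items() if key != "addresses"}
    for address in json.loads(row["addresses"]):
        yield {**passthrough, "country": address["country"], "city": address["city"]}

rows = list(extract_addresses(extract_outer(record)))
print(rows)
```

Each element of the addresses array becomes its own output row, and the outer columns are repeated on every row, which mirrors how the second extraction step only touches the address column.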
Hope I have answered your query 🙂
Warm Regards,
Nikhil Thampi