<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: UTF-8 BOM Encoded File Processing in Data Quality</title>
    <link>https://community.qlik.com/t5/Data-Quality/UTF-8-BOM-Encoded-File-Processing/m-p/2278531#M3358</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;We are not able to process UTF-8 BOM files. When we run a job over 10 files, it skips the first row of every file, every time. We are waiting for the Talend team to respond to our issue.&lt;/P&gt;</description>
    <pubDate>Wed, 09 Aug 2017 06:39:43 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2017-08-09T06:39:43Z</dc:date>
    <item>
      <title>UTF-8 BOM Encoded File Processing</title>
      <link>https://community.qlik.com/t5/Data-Quality/UTF-8-BOM-Encoded-File-Processing/m-p/2278528#M3355</link>
      <description>&lt;P&gt;We receive a daily file in UTF-8 BOM encoding, and because of this our Talend ETL job always misses the first row of the file.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Sample Data in File:&lt;/P&gt;
&lt;P&gt;P, 1234, $10&lt;/P&gt;
&lt;P&gt;Q,1235,$20&lt;/P&gt;
&lt;P&gt;R, 1236, $15&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Our actual flow is:&lt;/P&gt;
&lt;P&gt;tFileList ==&amp;gt; tFileInputDelimited ==&amp;gt; tReplicate ==&amp;gt; tFilterRow ==&amp;gt; tMSSqlSCD&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;tFileInputDelimited itself is able to process all rows, but once we add tFilterRow, the first row of every file is always missed.&lt;/P&gt;
&lt;P&gt;The tFilterRow condition is column0 Equals "P".&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When we added a tLogRow, we found a few special characters prefixed to the first row of every file, for example ???P.&lt;/P&gt;
&lt;P&gt;Also, when we opened our CSV files in Notepad++, we discovered that they are encoded as UTF-8-BOM.&lt;/P&gt;
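&lt;P&gt;A minimal Java sketch of why the Equals "P" condition likely misses that row, assuming the file is decoded as plain UTF-8 (the class and method names are hypothetical, not Talend code): the three BOM bytes decode to the invisible character U+FEFF, which stays glued to the first field.&lt;/P&gt;

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Talend.
final class BomFilterDemo {
    // Decode raw file bytes the way a plain UTF-8 reader would.
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // First field of the first row: UTF-8 BOM (EF BB BF) followed by 'P'.
        byte[] firstField = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'P'};
        String column0 = decode(firstField);

        System.out.println(column0.length());              // 2
        System.out.println(column0.charAt(0) == '\uFEFF'); // true
        System.out.println(column0.equals("P"));           // false
    }
}
```

&lt;P&gt;Java's UTF-8 decoder keeps the BOM as a character instead of dropping it, so the value compared by the filter is not the one-character string "P".&lt;/P&gt;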
&lt;P&gt;We only see an option for UTF-8 in the Advanced settings of tFileInputDelimited.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Please let us know how we can process a UTF-8-BOM file in a Talend job.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks &amp;amp; Regards&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 16 Nov 2024 09:29:53 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/UTF-8-BOM-Encoded-File-Processing/m-p/2278528#M3355</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-11-16T09:29:53Z</dc:date>
    </item>
    <item>
      <title>Re: UTF-8 BOM Encoded File Processing</title>
      <link>https://community.qlik.com/t5/Data-Quality/UTF-8-BOM-Encoded-File-Processing/m-p/2278529#M3356</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt; 
&lt;P&gt;So far, the Talend tFileInputDelimited component uses "UTF-8" without BOM. There is a "Custom" option in the Encoding setting.&lt;/P&gt; 
&lt;P&gt;Could you please try it to see if it works?&lt;/P&gt; 
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="utf-bom.png" style="width: 668px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009Lw7g.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/151949i07A9F2C0BE43F25B/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009Lw7g.png" alt="0683p000009Lw7g.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;Best regards&lt;/P&gt; 
&lt;P&gt;Sabrina&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 28 Jul 2017 10:19:47 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/UTF-8-BOM-Encoded-File-Processing/m-p/2278529#M3356</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2017-07-28T10:19:47Z</dc:date>
    </item>
    <item>
      <title>Re: UTF-8 BOM Encoded File Processing</title>
      <link>https://community.qlik.com/t5/Data-Quality/UTF-8-BOM-Encoded-File-Processing/m-p/2278530#M3357</link>
      <description>Hi Sabrina,&lt;BR /&gt;I tried the Custom encoding type with "UTF-BOM", but it didn't work.&lt;BR /&gt;I also tried "UTF-8-BOM", and that didn't work either.&lt;BR /&gt;Please suggest a working solution.&lt;BR /&gt;Awaiting your kind response.</description>
      <pubDate>Mon, 31 Jul 2017 08:34:00 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/UTF-8-BOM-Encoded-File-Processing/m-p/2278530#M3357</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2017-07-31T08:34:00Z</dc:date>
    </item>
    <item>
      <title>Re: UTF-8 BOM Encoded File Processing</title>
      <link>https://community.qlik.com/t5/Data-Quality/UTF-8-BOM-Encoded-File-Processing/m-p/2278531#M3358</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;We are not able to process UTF-8 BOM files. When we run a job over 10 files, it skips the first row of every file, every time. We are waiting for the Talend team to respond to our issue.&lt;/P&gt;</description>
      <pubDate>Wed, 09 Aug 2017 06:39:43 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/UTF-8-BOM-Encoded-File-Processing/m-p/2278531#M3358</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2017-08-09T06:39:43Z</dc:date>
    </item>
    <item>
      <title>Re: UTF-8 BOM Encoded File Processing</title>
      <link>https://community.qlik.com/t5/Data-Quality/UTF-8-BOM-Encoded-File-Processing/m-p/2278532#M3359</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;Talend uses "UTF-8" without BOM. A UTF-8 BOM encoded file starts with a three-byte pattern (0xEF 0xBB 0xBF), which is probably not parsed successfully by the tFileInputDelimited component.&lt;/P&gt;
&lt;P&gt;Have you already tried the tChangeFileEncoding component to see if it works?&lt;/P&gt;
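&lt;P&gt;If that does not help, a minimal Java sketch of removing the three-byte pattern by hand might look like the following. This is not Talend's own code; the class and method names are hypothetical, and calling it from a custom code step before tFileInputDelimited is an assumption.&lt;/P&gt;

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, not part of Talend.
final class Utf8BomBytes {
    // The UTF-8 byte order mark.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    static boolean startsWithBom(byte[] data) {
        // copyOf pads inputs shorter than 3 bytes with zeros, which can never match the BOM.
        return Arrays.equals(Arrays.copyOf(data, 3), UTF8_BOM);
    }

    // Returns the content without a leading BOM; data without a BOM is returned unchanged.
    static byte[] stripBom(byte[] data) {
        return startsWithBom(data) ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'P', ',', '1'};
        System.out.println(new String(stripBom(raw), StandardCharsets.UTF_8)); // P,1
    }
}
```

&lt;P&gt;The cleaned bytes could then be written to a temporary file and read as ordinary UTF-8.&lt;/P&gt;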
&lt;P&gt;Best regards&lt;/P&gt;
&lt;P&gt;Sabrina&lt;/P&gt;</description>
      <pubDate>Wed, 09 Aug 2017 09:26:24 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/UTF-8-BOM-Encoded-File-Processing/m-p/2278532#M3359</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2017-08-09T09:26:24Z</dc:date>
    </item>
    <item>
      <title>Re: UTF-8 BOM Encoded File Processing</title>
      <link>https://community.qlik.com/t5/Data-Quality/UTF-8-BOM-Encoded-File-Processing/m-p/2278533#M3360</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;tChangeFileEncoding turns the "&amp;lt;U+FEFF&amp;gt;" of a UTF-8-BOM file into "?" at the start of the first header, which doesn't help. I need to remove the first 4 characters.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;I use a dynamic schema to load the CSV file into the DB, and the DB load component reads the header line to get the column names. The extra "&amp;lt;U+FEFF&amp;gt;" makes the DB load component fail. Is there any way to deal with this?&lt;/SPAN&gt;&lt;/P&gt;
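&lt;P&gt;One possible direction, sketched in plain Java under assumptions: clean the header line before the column names are derived from it. The class name, method name, and sample header below are hypothetical, and how this would be wired into a dynamic-schema job is not covered here.&lt;/P&gt;

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Talend.
final class HeaderCleaner {
    // Splits a header line into trimmed column names, dropping one leading U+FEFF if present.
    static String[] columnNames(String headerLine, String delimiter) {
        String cleaned = headerLine.startsWith("\uFEFF") ? headerLine.substring(1) : headerLine;
        return Arrays.stream(cleaned.split(Pattern.quote(delimiter), -1))
                .map(String::trim)
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Made-up header as it would look after decoding a UTF-8-BOM file.
        String header = "\uFEFF" + "id, name,price";
        System.out.println(Arrays.toString(columnNames(header, ","))); // [id, name, price]
    }
}
```

&lt;P&gt;Only the first column name is affected by the BOM, so the rest of the header passes through unchanged apart from trimming.&lt;/P&gt;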
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Thanks,&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Bin&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 23 Mar 2019 00:04:03 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/UTF-8-BOM-Encoded-File-Processing/m-p/2278533#M3360</guid>
      <dc:creator>wangbinlxx</dc:creator>
      <dc:date>2019-03-23T00:04:03Z</dc:date>
    </item>
    <item>
      <title>Re: UTF-8 BOM Encoded File Processing</title>
      <link>https://community.qlik.com/t5/Data-Quality/UTF-8-BOM-Encoded-File-Processing/m-p/2278534#M3361</link>
      <description>&lt;P&gt;Same problem here. Nothing from Talend? We need to deal with UTF-8 XML files with a BOM.&lt;/P&gt;</description>
      <pubDate>Mon, 13 Jul 2020 12:54:51 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/UTF-8-BOM-Encoded-File-Processing/m-p/2278534#M3361</guid>
      <dc:creator>SncJt</dc:creator>
      <dc:date>2020-07-13T12:54:51Z</dc:date>
    </item>
  </channel>
</rss>

