<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Can Data Quality analyse unstructured data, such as data in csv file? in Data Quality</title>
    <link>https://community.qlik.com/t5/Data-Quality/Can-Data-Quality-analyse-unstructured-data-such-as-data-in-csv/m-p/2270989#M2633</link>
    <description>Hi,
&lt;BR /&gt;I would like to use Data Quality (DQ) to analyse and validate data in CSV files, i.e. to highlight invalid data based on user-defined rules and constraints.
&lt;BR /&gt;I have read the Data Quality documentation. Talend Open Studio for DQ provides a powerful data profiling tool, with a well-designed UI, for analysing database tables, rows and columns. However, I could not find anything that describes how to analyse unstructured data, such as the content of a CSV file.
&lt;BR /&gt;If DQ cannot validate data in CSV files, do you have any suggestions for how to approach my data validation goal? Since it is an open source project, would it be possible to extend it to read text files and then reuse the existing data profiling components (define rules/constraints, validate, highlight invalid data)?
&lt;BR /&gt;Is this trunk the right place to look?
&lt;A href="http://www.talendforge.org/trac/top/browser/trunk" target="_blank" rel="nofollow noopener noreferrer"&gt;http://www.talendforge.org/trac/top/browser/trunk&lt;/A&gt;. 
&lt;BR /&gt;
&lt;BR /&gt;Thank you in advance.
&lt;BR /&gt;Yukun</description>
    <pubDate>Sat, 16 Nov 2024 11:44:53 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2024-11-16T11:44:53Z</dc:date>
    <item>
      <title>Can Data Quality analyse unstructured data, such as data in csv file?</title>
      <link>https://community.qlik.com/t5/Data-Quality/Can-Data-Quality-analyse-unstructured-data-such-as-data-in-csv/m-p/2270989#M2633</link>
      <description>Hi,
&lt;BR /&gt;I would like to use Data Quality (DQ) to analyse and validate data in CSV files, i.e. to highlight invalid data based on user-defined rules and constraints.
&lt;BR /&gt;I have read the Data Quality documentation. Talend Open Studio for DQ provides a powerful data profiling tool, with a well-designed UI, for analysing database tables, rows and columns. However, I could not find anything that describes how to analyse unstructured data, such as the content of a CSV file.
&lt;BR /&gt;If DQ cannot validate data in CSV files, do you have any suggestions for how to approach my data validation goal? Since it is an open source project, would it be possible to extend it to read text files and then reuse the existing data profiling components (define rules/constraints, validate, highlight invalid data)?
&lt;BR /&gt;Is this trunk the right place to look?
&lt;A href="http://www.talendforge.org/trac/top/browser/trunk" target="_blank" rel="nofollow noopener noreferrer"&gt;http://www.talendforge.org/trac/top/browser/trunk&lt;/A&gt;. 
&lt;BR /&gt;
&lt;BR /&gt;Thank you in advance.
&lt;BR /&gt;Yukun</description>
      <pubDate>Sat, 16 Nov 2024 11:44:53 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/Can-Data-Quality-analyse-unstructured-data-such-as-data-in-csv/m-p/2270989#M2633</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-11-16T11:44:53Z</dc:date>
    </item>
    <item>
      <title>Re: Can Data Quality analyse unstructured data, such as data in csv file?</title>
      <link>https://community.qlik.com/t5/Data-Quality/Can-Data-Quality-analyse-unstructured-data-such-as-data-in-csv/m-p/2270990#M2634</link>
      <description>Hello Yukun, 
&lt;BR /&gt;The studio can analyze CSV files. If your CSV fields contain unstructured text and you want to dig into that text, I would suggest having a look at how to create your own Java indicator at
&lt;A href="https://help.talend.com/pages/viewpage.action?pageId=20824858#Raa27234" rel="nofollow noopener noreferrer"&gt;https://help.talend.com/pages/viewpage.action?pageId=20824858#Raa27234&lt;/A&gt; 
&lt;BR /&gt;Then you could share your indicators with the community by uploading them to the Talend Exchange website. 
&lt;BR /&gt;In the enterprise version of the studio, we provide a component that parses text and extracts data from it according to parser rules:
&lt;A href="https://help.talend.com/search/all?query=tStandardizeRow&amp;amp;content-lang=en" rel="nofollow noopener noreferrer"&gt;https://help.talend.com/search/all?query=tStandardizeRow&amp;amp;content-lang=en&lt;/A&gt;</description>
      <pubDate>Mon, 17 Feb 2014 11:42:00 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/Can-Data-Quality-analyse-unstructured-data-such-as-data-in-csv/m-p/2270990#M2634</guid>
      <dc:creator>Sebastiao_Qlik</dc:creator>
      <dc:date>2014-02-17T11:42:00Z</dc:date>
    </item>
    <item>
      <title>Re: Can Data Quality analyse unstructured data, such as data in csv file?</title>
      <link>https://community.qlik.com/t5/Data-Quality/Can-Data-Quality-analyse-unstructured-data-such-as-data-in-csv/m-p/2270991#M2635</link>
      <description>Hi Scorreia, thank you for your reply. I have now found the fileDelimited connection option in DQ, so I am able to analyse my CSV files.
&lt;BR /&gt;Cheers 
&lt;BR /&gt;Yukun</description>
      <pubDate>Tue, 18 Feb 2014 22:22:36 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/Can-Data-Quality-analyse-unstructured-data-such-as-data-in-csv/m-p/2270991#M2635</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-02-18T22:22:36Z</dc:date>
    </item>
  </channel>
</rss>

