
Can Data Quality analyse unstructured data, such as data in csv file?

Hi ,
I would like to use Data Quality (DQ) to analyse and validate data in CSV files, i.e. to highlight invalid data based on user-predefined rules/constraints.
I have read the Data Quality documentation. Talend Open Studio for DQ provides a powerful data profiling tool for analysing database tables, rows and columns, with a great UX design. However, I could not find anything that describes how to analyse unstructured data, such as the content of a CSV file.
If DQ does not provide such functionality for validating data in CSV files, do you have any suggestions on how to approach my data validation goal? Since it is an open-source project, is it possible to extend it to read text files and then reuse the existing data profiling components (defined rules/constraints + validation + highlighting of invalid data)?
Is this trunk the right place to look? http://www.talendforge.org/trac/top/browser/trunk

Thank you in advance.
Yukun
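The rule-based validation described in the question (one user-defined constraint per column, with invalid values reported) can be sketched outside Talend in plain Java. This is a minimal illustration only, not part of Talend Open Studio for DQ: the class name `CsvRuleCheck`, the sample columns and the rules are hypothetical, and the parser assumes a simple comma-delimited file with a header row and no quoted fields.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class CsvRuleCheck {

    // One failed check: 1-based data row (header excluded), column name and offending value.
    public static final class Violation {
        public final int row;
        public final String column;
        public final String value;

        Violation(int row, String column, String value) {
            this.row = row;
            this.column = column;
            this.value = value;
        }

        @Override
        public String toString() {
            return "row " + row + ", " + column + ": \"" + value + "\"";
        }
    }

    // Applies one rule per column name to every data row.
    // Assumes plain comma-delimited lines with a header row and no quoted or escaped commas.
    public static List<Violation> validate(List<String> lines, Map<String, Predicate<String>> rules) {
        List<Violation> violations = new ArrayList<>();
        if (lines.isEmpty()) {
            return violations;
        }
        String[] header = lines.get(0).split(",", -1);
        for (int r = 1; r < lines.size(); r++) {
            if (lines.get(r).isBlank()) {
                continue; // ignore empty lines, e.g. a trailing newline
            }
            String[] fields = lines.get(r).split(",", -1);
            for (int c = 0; c < header.length; c++) {
                Predicate<String> rule = rules.get(header[c]);
                if (rule == null) {
                    continue; // no constraint defined for this column
                }
                String value = c < fields.length ? fields[c] : ""; // short rows count as empty values
                if (!rule.test(value)) {
                    violations.add(new Violation(r, header[c], value));
                }
            }
        }
        return violations;
    }

    public static void main(String[] args) throws IOException {
        // Read the file named on the command line, or fall back to a small sample.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Path.of(args[0]))
                : List.of("id,email,age", "1,a@example.com,34", "2,not-an-email,29", "3,c@example.com,-5");

        // Hypothetical user-defined rules: a rough e-mail shape and an age between 0 and 150.
        Map<String, Predicate<String>> rules = new LinkedHashMap<>();
        rules.put("email", v -> v.matches("[^@\\s]+@[^@\\s]+\\.[^@\\s]+"));
        rules.put("age", v -> v.matches("\\d{1,3}") && Integer.parseInt(v) <= 150);

        for (Violation v : validate(lines, rules)) {
            System.out.println(v);
        }
    }
}
```

With the built-in sample, this prints two violations: `row 2, email: "not-an-email"` and `row 3, age: "-5"`. Keeping the rules as a map of column name to `Predicate<String>` lets new constraints be added without touching the parsing loop; a real CSV library would be needed for quoted fields.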
Sebastiao_Qlik (Employee)

Hello Yukun,
the Studio can analyse CSV files. However, if your CSV fields contain unstructured text and you want to dig into that text, I would suggest having a look at how to create your own Java indicator: https://help.talend.com/pages/viewpage.action?pageId=20824858#Raa27234
Then you could share your indicators with the community by uploading them to the Talend Exchange website.
In the enterprise version of the Studio, we provide a component, tStandardizeRow, that parses text and extracts information based on parser rules: https://help.talend.com/search/all?query=tStandardizeRow&content-lang=en
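To give an idea of what such a custom Java indicator does, here is a minimal, hypothetical sketch of the general pattern: the indicator is fed one column value at a time and accumulates a statistic, here the number of free-text values that contain something shaped like an e-mail address. The class name `EmailMentionIndicator` and its methods are illustrative assumptions, not Talend's indicator API; a real user-defined Java indicator must follow the contract described on the Talend help page linked in the reply.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical indicator-style class; NOT the Talend user-defined indicator API.
public class EmailMentionIndicator {
    // Rough e-mail shape, searched for anywhere inside the free text.
    private static final Pattern EMAIL = Pattern.compile("[^@\\s]+@[^@\\s]+\\.[^@\\s]+");

    private long matchingRows;
    private long totalRows;

    // Clears the accumulated counts before a new analysis run.
    public void reset() {
        matchingRows = 0;
        totalRows = 0;
    }

    // Called once per column value; null values are counted but never match.
    public void handle(Object data) {
        totalRows++;
        if (data != null && EMAIL.matcher(data.toString()).find()) {
            matchingRows++;
        }
    }

    public long getMatchingRows() {
        return matchingRows;
    }

    public long getTotalRows() {
        return totalRows;
    }

    public static void main(String[] args) {
        EmailMentionIndicator indicator = new EmailMentionIndicator();
        for (String text : List.of("contact me at bob@example.org please", "no address here", "")) {
            indicator.handle(text);
        }
        System.out.println(indicator.getMatchingRows() + " of " + indicator.getTotalRows());
    }
}
```

Run as-is, the sample prints `1 of 3`. Using `find()` rather than `matches()` is what lets the indicator look inside unstructured text instead of requiring the whole field to be an address.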
Anonymous (Author)

Hi Scorreia, thank you for your reply. I have now found the fileDelimited connection option in DQ, so I am able to analyse my CSV files.
Cheers
Yukun