JuttuSenthil
Contributor

Talend Open Studio for Data Quality

Dear Community,

I would appreciate your help with the following questions.

We are evaluating Talend Open Studio (TOS) for Data Quality for our data quality (DQ) profiling. The source data we need to profile is stored in Azure Data Lake Storage (ADLS).

Is there a way to connect to ADLS, directly or indirectly, from TOS for Data Quality?

Also, is there an option to load Parquet, ORC, or Avro files in TOS for Data Quality? If so, could you point me to the relevant documentation?

Please let me know if you have any questions.

Thanks,

Senthil

Accepted Solutions
Anonymous

Hello,

I'm afraid this feature is not available in Talend Open Studio for Data Quality.

Regarding your question on how to connect to Azure storage from the data profiling perspective: I checked with our DQ experts, and they recommended creating an HDInsight cluster to connect to Azure storage from the DQ side (this solution requires an HDInsight installation on Azure storage).

Profiling Parquet files in the Profiling perspective would be a new feature for us.

In addition, you can read Parquet and CSV files from Azure storage in the Integration perspective of Talend Studio, in both Standard Jobs and Big Data Batch Jobs. The tFileInputParquet component is available for both DI and BD (Batch/Streaming) Jobs.

Hope it helps.

Best regards

Sabrina


4 Replies
JuttuSenthil
Contributor
Author

Thank you very much for the response. It really helped me understand which of these features are available.

Anonymous

Hello,

Feel free to let us know if there is any further help we can give.

Best regards

Sabrina

msjian
Employee

Hello,

We support profiling ADLS Gen2 files through a JDBC driver; see TDQ-20315 and TDQ-18068.

Documentation on profiling ADLS Gen2 is available here: https://help.talend.com/r/en-US/Cloud/studio-user-guide-api-services-platform/profiling-adls-databri...

Thanks