Best way to extract the schema from s3 objects

We are trying to use Talend to create a data lake. We will be receiving CSV files from our landing zone, which is S3. However, the schemas need to be dynamic, because this has to work for numerous tables or files and we want to avoid defining a schema for each one. We need the schema because we have to convert the CSV files to Parquet before writing to the curated layer of our data lake. So my question is: what is the best practice for extracting the schema? Should we ask the source to send us a manifest file that contains the schema? Is that good practice? If so, how can we extract the schema (column names along with data types) from the manifest file and pass it to the Talend components? Are there any other alternatives?

1 Reply

Hello,

Are you able to use the tFileInputDelimited component to read your files? There is a dynamic schema feature in the Talend subscription solution.

Please refer to this article for more details.

https://community.talend.com/t5/Design-and-Development/How-to-process-changing-data-structure/ta-p/2...
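
If no manifest is available, one alternative is to infer column names and types from the CSV header and a sample of rows. The following is only a rough Java sketch of that idea, not Talend's dynamic schema feature. The class name `CsvSchemaSketch`, the type labels (`LONG`, `DOUBLE`, `BOOLEAN`, `STRING`), and the naive comma split (no quoted fields) are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: infers a column -> type mapping from CSV lines.
public class CsvSchemaSketch {

    // Classify a single non-empty cell. The labels are illustrative only,
    // not Talend or Parquet constants.
    static String inferType(String value) {
        if (value.matches("[-+]?\\d{1,18}")) return "LONG";
        if (value.matches("[-+]?\\d+\\.\\d+")) return "DOUBLE";
        if (value.equalsIgnoreCase("true") || value.equalsIgnoreCase("false")) return "BOOLEAN";
        return "STRING";
    }

    // Combine the type seen so far with a new observation,
    // widening LONG to DOUBLE and falling back to STRING on any other conflict.
    static String widen(String current, String observed) {
        if (current == null || current.equals(observed)) return observed;
        boolean numeric = (current.equals("LONG") || current.equals("DOUBLE"))
                && (observed.equals("LONG") || observed.equals("DOUBLE"));
        return numeric ? "DOUBLE" : "STRING";
    }

    // First line is the header; remaining lines are sample rows.
    // Assumes a plain comma delimiter with no quoted or escaped fields.
    static Map<String, String> inferSchema(List<String> lines) {
        String[] header = lines.get(0).split(",", -1);
        String[] types = new String[header.length];
        for (String line : lines.subList(1, lines.size())) {
            String[] cells = line.split(",", -1);
            for (int i = 0; i < header.length && i < cells.length; i++) {
                String cell = cells[i].trim();
                if (cell.isEmpty()) continue; // empty cells are treated as nulls
                types[i] = widen(types[i], inferType(cell));
            }
        }
        Map<String, String> schema = new LinkedHashMap<>();
        for (int i = 0; i < header.length; i++) {
            // Columns with no non-empty sample values default to STRING.
            schema.put(header[i].trim(), types[i] == null ? "STRING" : types[i]);
        }
        return schema;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "id,name,price,active",
                "1,apple,0.5,true",
                "2,pear,3,false",
                "3,,2.25,true");
        // prints {id=LONG, name=STRING, price=DOUBLE, active=BOOLEAN}
        System.out.println(inferSchema(sample));
    }
}
```

Inference from samples can guess wrong (for example, an ID column with leading zeros would be read as a number), so a manifest supplied by the source is generally the more reliable input when you can get one.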

Best regards

Sabrina