We are trying to use Talend to build a data lake. We will receive CSV files from a landing zone in S3. I need the schemas to be dynamic, because this has to work for many tables and files and we want to avoid defining a schema for each one by hand. We need a schema at all because the CSV files must be converted to Parquet before they are written to the curated layer of the data lake. My question is: what is the best practice for extracting the schema? Should we ask the source to send a manifest file that contains the schema, and is that good practice? If so, how can we extract the schema (column names and data types) from the manifest file and pass it to the Talend components? Are there any other alternatives?
Hello,
Are you able to use the tFileInputDelimited component to read your files? There is a dynamic schema feature in the Talend subscription solution.
Please refer to this article for more details.
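On the manifest question, here is a rough sketch in Python of how column names and data types could be read from a manifest file. It is only an illustration under assumptions: the JSON layout (a "columns" list with "name" and "type" keys), the type names, the mapping to Java types, and the function name parse_manifest are all hypothetical and would need to match whatever your source actually sends. It does not use any Talend API.

```python
import json

# Hypothetical mapping from manifest type names to Java types.
# Adjust it to the type names your source really uses.
TYPE_MAP = {
    "string": "String",
    "integer": "Integer",
    "long": "Long",
    "double": "Double",
    "boolean": "Boolean",
    "date": "java.util.Date",
}


def parse_manifest(text):
    """Return a list of (column_name, java_type) pairs from a manifest JSON string."""
    manifest = json.loads(text)
    columns = []
    for col in manifest["columns"]:
        declared = col["type"].strip().lower()
        if declared not in TYPE_MAP:
            # Fail loudly instead of guessing a type for an unknown name.
            raise ValueError(
                f"unsupported type {col['type']!r} for column {col['name']!r}"
            )
        columns.append((col["name"], TYPE_MAP[declared]))
    return columns


# Assumed manifest layout, invented for this example.
EXAMPLE_MANIFEST = """
{
  "file": "customers.csv",
  "columns": [
    {"name": "id", "type": "integer"},
    {"name": "name", "type": "string"},
    {"name": "signup_date", "type": "date"}
  ]
}
"""

if __name__ == "__main__":
    for name, java_type in parse_manifest(EXAMPLE_MANIFEST):
        print(name, java_type)
```

The resulting list of name and type pairs could then be handed to whatever step builds the Parquet schema. Rejecting unknown type names up front keeps a bad manifest from silently producing a wrong column type in the curated layer.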
Best regards
Sabrina