<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic tFileInputParquet - How to read generic parquet files and extract schema? in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/tFileInputParquet-How-to-read-generic-parquet-files-and-extract/m-p/2302191#M74206</link>
    <description>&lt;P&gt;Hello.&lt;/P&gt;&lt;P&gt;I want to build a job that reads a Parquet file in order to apply transformations to the data. This job needs to be generic, which means that I cannot know the column names of my Parquet files in advance.&lt;/P&gt;&lt;P&gt;Unfortunately, tFileInputParquet supports neither dynamic fields nor reading the lines as a single string, either of which would let me work around this limitation.&lt;/P&gt;&lt;P&gt;How can I solve this issue without resorting to a custom Spark program? Thank you in advance.&lt;/P&gt;&lt;P&gt;IMHO, I am very surprised that schema guessing is not already a feature for most components. Since Spark's error stacktrace actually lists the column names in its message, this looks technically feasible. Even default index numbers would help tremendously. As it stands, industrializing any of our jobs through Talend is difficult or nearly impossible.&lt;/P&gt;</description>
    <pubDate>Sat, 16 Nov 2024 01:14:22 GMT</pubDate>
    <dc:creator>CPorrot1602485748</dc:creator>
    <dc:date>2024-11-16T01:14:22Z</dc:date>
    <item>
      <title>tFileInputParquet - How to read generic parquet files and extract schema?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/tFileInputParquet-How-to-read-generic-parquet-files-and-extract/m-p/2302191#M74206</link>
      <description>&lt;P&gt;Hello.&lt;/P&gt;&lt;P&gt;I want to build a job that reads a Parquet file in order to apply transformations to the data. This job needs to be generic, which means that I cannot know the column names of my Parquet files in advance.&lt;/P&gt;&lt;P&gt;Unfortunately, tFileInputParquet supports neither dynamic fields nor reading the lines as a single string, either of which would let me work around this limitation.&lt;/P&gt;&lt;P&gt;How can I solve this issue without resorting to a custom Spark program? Thank you in advance.&lt;/P&gt;&lt;P&gt;IMHO, I am very surprised that schema guessing is not already a feature for most components. Since Spark's error stacktrace actually lists the column names in its message, this looks technically feasible. Even default index numbers would help tremendously. As it stands, industrializing any of our jobs through Talend is difficult or nearly impossible.&lt;/P&gt;</description>
      <pubDate>Sat, 16 Nov 2024 01:14:22 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/tFileInputParquet-How-to-read-generic-parquet-files-and-extract/m-p/2302191#M74206</guid>
      <dc:creator>CPorrot1602485748</dc:creator>
      <dc:date>2024-11-16T01:14:22Z</dc:date>
    </item>
  </channel>
</rss>