tFileInputParquet - How to read generic parquet files and extract schema?
Hello.
I want to build a job that reads a Parquet file and applies transformations to the data. The job needs to be generic, which means that I cannot know the column names of my Parquet files in advance.
Unfortunately, tFileInputParquet is compatible neither with dynamic fields nor with reading each line as a single string, either of which would allow me to work around this issue.
How can I solve this issue without resorting to a custom Spark program? Thank you in advance.
IMHO, it is very surprising that schema guessing is not already a feature of most components. Since Spark's stack trace actually lists the column names in the error message, this looks technically feasible. Even default index numbers as column names would help tremendously. As far as I am concerned, industrializing any of our jobs through Talend is either difficult or almost impossible.