Hi Everyone,
One of the questions I get whenever I visit a client for a Talend project / assignment is:
Can I have one single job which can process multiple files with different schemas, formats, etc.? I should be able to carry out the following transformations, for example -
1) Read Data from File
2) Sort based on column(s)
3) Filter records on some condition(s)
4) Aggregate data
5) Store it in an individual table (each file will have a separate table)
Which file to process, which column to use for sorting, the condition to filter data, the columns to aggregate on - it should be possible to pass all of this information to the Talend job through a configuration file / table (see the sketch below).
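Just to make the idea concrete, here is a minimal sketch in plain Java (Talend jobs compile to Java anyway) of what such a configuration-driven job could look like. Everything in it - the JobConfig record, the column and file names, and the single sum aggregation - is a hypothetical assumption for illustration only, not an existing Talend feature; a real implementation would load one config entry per source file from a configuration table and insert the result into the configured target table.

// Hypothetical sketch of a metadata-driven file processor in plain Java
// (not Talend-generated code). Config fields, column names and the sum
// aggregation are illustrative assumptions only.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GenericFileJob {

    // One config entry per source file, e.g. loaded from a configuration table.
    record JobConfig(String file, String sortColumn, String filterColumn,
                     String filterValue, String groupByColumn, String sumColumn,
                     String targetTable) {}

    public static void main(String[] args) throws IOException {
        // Hypothetical configuration row; in a real setup this would come
        // from a file or database table maintained by the support team.
        JobConfig cfg = new JobConfig("sales.csv", "region", "status", "ACTIVE",
                                      "region", "amount", "stg_sales");

        // 1) Read data from the configured file
        List<String> lines = Files.readAllLines(Path.of(cfg.file()));
        List<String> header = Arrays.asList(lines.get(0).split(","));
        int sortIdx   = header.indexOf(cfg.sortColumn());
        int filterIdx = header.indexOf(cfg.filterColumn());
        int groupIdx  = header.indexOf(cfg.groupByColumn());
        int sumIdx    = header.indexOf(cfg.sumColumn());

        // 2) Sort and 3) filter, driven entirely by the config entry
        List<String[]> rows = lines.stream().skip(1)
                .map(l -> l.split(","))
                .sorted(Comparator.comparing((String[] r) -> r[sortIdx]))
                .filter(r -> r[filterIdx].equals(cfg.filterValue()))
                .collect(Collectors.toList());

        // 4) Aggregate: sum of sumColumn per groupByColumn value
        Map<String, Double> aggregated = rows.stream().collect(
                Collectors.groupingBy((String[] r) -> r[groupIdx],
                        Collectors.summingDouble((String[] r) ->
                                Double.parseDouble(r[sumIdx]))));

        // 5) "Store" per-file output; a real job would insert into cfg.targetTable()
        aggregated.forEach((key, sum) ->
                System.out.printf("%s -> %s: %.2f%n", cfg.targetTable(), key, sum));
    }
}

The point is that nothing about the transformation logic is hard-coded: swapping the config row makes the same code process a different file into a different table, which is essentially what metadata injection gives you in Pentaho.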
Those who have worked on other ETL tools like Pentaho or Ab Initio will know this is very much possible with those tools. In the case of Pentaho, the metadata injection feature allows you to achieve this.
I understand there is a feature available in the form of dynamic schema (Enterprise Edition), but it does not really allow you to implement the use case mentioned above.
I've had a hard time explaining this to one of my clients. But on second thought, it appears a feature like this would be useful in situations where multiple source files need to go through a set number of transformations.
Therefore I just want to understand what community members and the Talend team think about this.
Hi Shong,
Thank you very much for the quick response. A few reasons I mentioned to the client - the more generic a job you try to design:
1) The overall design becomes complex and difficult to maintain
2) Testing such jobs also becomes difficult
3) A massive configuration table / file means you need to train people to provide accurate information to the job
4) If such a job breaks down, debugging also becomes challenging
Thanks,
Nishad Joshi.