Mapping file types in Talend Open Studio for Big Data
Hi, I have installed Talend Open Studio for Big Data 5.3.1, and what I want to achieve is explained below.
Suppose I have three files in different formats: CSV, XML and JSON. The first time I load and read these files, I will create the job components and define the schema for each file. I also want to write an external script so that, the next time a CSV, XML or JSON file arrives with the same field structure but different data, the script calls Talend and executes the job for that file format. In other words, if the second file is XML, the job should use the schema created for XML the first time, and if it is CSV, it should use the schema created for CSV the first time.
My script can be a .sh or .bat file. Can I have it launch Talend and run the jobs based on the file type (CSV, XML, JSON)? Is that possible?
Note: Talend Open Studio does not provide a Metadata tab under the repository manager. Do we need to use context variables in this case? Please advise on what can be done in this scenario.
Thanks, ShreeCS
There are many ways to accomplish what you've described...
Starting with what you have in mind: yes, you can create a job that has a context variable for the file type, then call the default Talend .sh or .bat launcher and override that variable, like this:
./MyTalendJob_run.sh --context_param TODAYDATE=2014-03-24 --context_param FileExt=json
I'm not sure what you mean by TOS not having Metadata in the Repository. It does, at least in my Studio.
Alternatively, instead of passing a particular context value, you could design the job with three flows inside. The job first checks the extension of the file (.csv, .xml or .json) and, depending on the extension, takes the path that uses the matching schema. You could design the job to simply read all files in a starting directory, pick one at a time, and process it, for example:
tFileList --> tJava (decide extension) --> onSubJob OK --> tFileInputDelimited (in case of .csv) ---> read contents with schema --> do something...
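As a rough sketch of the "decide extension" step, the Java below shows one way the check could look. It is plain Java that runs outside Talend: the `globalMap` here is an ordinary HashMap standing in for Talend's, and the class name, the method name, the key `fileExt` and the example path are all made up for illustration. Inside a job I would expect the same logic to sit in tJava (or a routine), with the boolean expressions used as conditions on the triggers leaving tJava, but treat that wiring as an assumption to verify in your Studio version.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class ExtensionRouting {

    // Returns the lower-cased extension without the dot, or "" when there is none.
    static String extensionOf(String filePath) {
        String name = new File(filePath).getName();   // drop any directory part
        int dot = name.lastIndexOf('.');
        if (dot <= 0 || dot == name.length() - 1) {
            return "";                                // no extension, or a name like ".hidden"
        }
        return name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Stand-in for Talend's globalMap (assumption: a plain HashMap for this demo).
        Map<String, Object> globalMap = new HashMap<>();

        // In a job this path would come from tFileList; here it is a made-up example.
        String currentFile = args.length > 0 ? args[0] : "/data/in/orders.CSV";

        // What the tJava step could do: store the extension under a hypothetical key.
        globalMap.put("fileExt", extensionOf(currentFile));

        // Candidate conditions for the branches leaving tJava (null-safe comparisons).
        boolean isCsv  = "csv".equals(globalMap.get("fileExt"));
        boolean isXml  = "xml".equals(globalMap.get("fileExt"));
        boolean isJson = "json".equals(globalMap.get("fileExt"));

        System.out.println("csv=" + isCsv + " xml=" + isXml + " json=" + isJson);
    }
}
```

Writing the comparison as `"csv".equals(...)` rather than calling `.equals` on the map value means a missing key evaluates to false instead of throwing a NullPointerException.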
Hi willm,
I'm using TOS for Big Data 5.3.1, and I can't see the OnSubjobOk option in the Trigger menu. I have attached a screenshot below.
What should I do in this case?
Hi Willm,
I followed the job design you suggested.
It now finds the file type (extension) and reads the file. So far I have used only CSV and XML files; I'll add JSON later. However, I'm getting an error, "Content is not allowed in prolog. Nested exception: Content is not allowed in prolog", and the XML file is not being read properly.
Also, after the tJava component I have two flows, one for CSV and the other for XML. I have used tFileInputDelimited and tFileInputXML and connected them to tJava using OnComponentOk. I'm not sure what I should use here, as I don't get the OnSubjobOk option in the Trigger menu.
I have also attached screenshots of my job and the error.
Thanks,
ShreeCS
Hi,
I was able to resolve the XML error by changing the XML type to Document.
I need help with one more thing: how do I write the expression in the If condition so that a CSV file goes to tFileInputDelimited and an XML file goes to tFileInputXML? The expressions I tried in the If condition ended up with errors.
In my case, the If link is between tJava and tFileInputDelimited.
Thanks,
ShreeCS