Mapping file types in Talend Open Studio for Big Data
Hi,
I have installed Talend Open Studio for Big Data 5.3.1, and what I want to achieve is explained below.
Suppose I have three files in different formats: CSV, XML and JSON. The first time I load and read these files, I will create the job components and define the schema for each file. I also want to write an external script so that, the second time a CSV, XML or JSON file arrives with the same field structure but different data, the script calls Talend and executes the job for that file format. That is, if the second file is XML, the job should use the schema created for XML the first time, and if it is CSV, it should use the schema created for CSV the first time.
The script can be a .sh or .bat file. Can I have it open Talend and run the jobs based on the file type (CSV, XML, JSON)? Is this possible?
Note: Talend Open Studio does not provide a Metadata tab under the repository manager. Do we need to use context variables in this case?
Please advise on what can be done in this scenario.
Thanks,
ShreeCS
Another good way is to set the extension directly in a tJava component:
context.File_Ext = ((String)globalMap.get("tFileList_1_CURRENT_FILEEXTENSION"));
context.File_Ext = StringHandling.UPCASE(context.File_Ext);
This is much simpler.
Vaibhav
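For anyone who wants to try the two tJava lines outside Talend, here is a minimal sketch in plain Java. It assumes a `HashMap` as a stand-in for Talend's `globalMap`, and the class and method names (`FileExtContext`, `normalizedExtension`) are made up for illustration. Unlike the snippet above, it also handles a missing value and a leading dot, which may or may not matter in your job.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class FileExtContext {

    // Reads the extension stored under the given key and returns it in
    // upper case. Returns an empty string when the key is absent, so the
    // caller never has to deal with null.
    static String normalizedExtension(Map<String, Object> globalMap, String key) {
        Object raw = globalMap.get(key);
        if (raw == null) {
            return "";
        }
        String ext = raw.toString().trim();
        if (ext.startsWith(".")) {
            ext = ext.substring(1); // tolerate ".csv" as well as "csv"
        }
        return ext.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Stand-in for the map that Talend fills while tFileList iterates.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFileList_1_CURRENT_FILEEXTENSION", "csv");

        String fileExt = normalizedExtension(globalMap, "tFileList_1_CURRENT_FILEEXTENSION");
        System.out.println(fileExt);
    }
}
```

Inside a real job the result would be assigned to `context.File_Ext`, as in the post above.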
Hi,
One more thing: for CSV files I defined the schema (field structure) the first time. The second time, a CSV file with the same field structure is read using the schema already defined. But for an XML file, when reading it the first time I have to specify the Loop XPath query, where I give the root tag of the XML file. The second time, an XML file with a different root tag will not be read. What can I do in this case? How can I achieve this?
Also, I want to save those files after reading them. I'm currently using tLogRow to see the output in the console, but I want to use tFileOutputDelimited for CSV files and tFileOutputXML for XML files. If I read only one CSV file and one XML file, I'm able to save them. If I read more than one CSV or XML file, I don't get the result. For that too, it seems I again need to use a tJava component and write code for the different output files. How can this be done?
Please guide me on this.
Thanks,
ShreeCS
Hi Shree,
Answer to your first question: a root node change means a metadata change, so the job can't read the file.
Answer to your second question: the answer lies in the same thread above. There is a screenshot by willm; refer to it and use similar logic. Use an Iterate link from tFileList to read multiple files one at a time, and change the flow based on the extension using an If condition.
Thanks,
Vaibhav
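The iterate-then-branch logic described above can be sketched in plain Java. This is only an illustration under assumptions: the list of file names stands in for what tFileList would iterate over, and the names `ExtensionRouter`, `Branch` and `branchFor` are hypothetical. In a Talend job, each branch would presumably correspond to an If condition such as `"CSV".equals(context.File_Ext)` leading to the matching reader and output component.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ExtensionRouter {

    // One value per subjob the flow could switch to; SKIP covers
    // files with no extension or an unsupported one.
    enum Branch { CSV, XML, JSON, SKIP }

    // Decides which branch a file belongs to, based only on its extension.
    static Branch branchFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Branch.SKIP;
        }
        String ext = fileName.substring(dot + 1).toUpperCase(Locale.ROOT);
        switch (ext) {
            case "CSV":  return Branch.CSV;
            case "XML":  return Branch.XML;
            case "JSON": return Branch.JSON;
            default:     return Branch.SKIP;
        }
    }

    public static void main(String[] args) {
        // Stand-in for the files tFileList would hand over one at a time.
        List<String> incoming = Arrays.asList(
                "orders.csv", "orders.xml", "orders.json", "notes.txt");

        for (String name : incoming) {
            System.out.println(name + " -> " + branchFor(name));
        }
    }
}
```

Because every file is routed independently, several CSV and XML files in the same folder each reach their own output branch, which is the case that was failing in the question.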