Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello, i have a large document in tfileinputExcel that pass trough an tMap
in metadata all the fields are String with only two exceptions(prices)
This is my metadata, and when i run my job i got this stupid issue "For input string"
The Field "1 ano" is inside [Garantia_Balcao]
You can check this out in my xls file above
Anyone can help me in this issue?
Wow, that is a nasty Excel file to work with. You have your work cut out. The problems I noticed were as follows...
1) The sheets are littered with merged cells
2) The format of the data is somewhat random in parts. For example, some columns have primarily numbers, but then randomly have text. The Talend metadata can fall over on this sometimes.
3) The sheets have frozen cells
4) The sheets never start at the top of the page
However, I got round this by removing ALL formatting from all sheets, removing the blanks rows at the top and then generating the schemas for all sheets. Then I ran those schemas against the original file and debugged from there.
I found that some of the columns said they were floats, but further down the data seemed to have text. So I manually changed the column types for those. My schema ended up looking like yours, but I used float instead of double.
I then set the tFileInputExcel config up as below. Notice I set one sheet and point to it. If you read all sheets you will get errors. I also set the header to be row 8.
My schema looks like this....
Once I had this setup, I didn't suffer from the issue you were suffering from anymore
Yeah mate, thats a nasty Excel file hahaha. For me whose is starting in talend were insane to work with.
1) I know that excel contains Merged Cells (the file is sent every month from the partner, if i have to treat the data was not fit to my process)
2) I know that issue, i think this make Metadata confuse and get some errors
3) The same issue i told u on topic 1(if i have to treat data, will not fit to my process)
4) My Job already starts at 8 cell, i know that too .
Lets go,
I need to read All Sheets, because i need all those fields, i was working with one metadata for every Sheet, but reading in tFileList(One file)
There is a way to make "dynamic metadata" to use only one Metadata for all columns of each sheet? I believe u saw that has like 8 Sheets and in some cases 2 Sheets contain the same category of another one.
In the fields were float or something else i changed everything to String with only 2 exceptions(Street Price and Gross Price)
Unfortunally i need to read all sheets in Excel Files(Because need to me Dynamic, if someday i got from Partner the excel which contain "Acessorio" instead of "Acessorios" i will have to make all the whole job again. Imagine to do this job every month
Thanks for all the support everyone gave me in this Issue.
Btw i sent you a private message containing the screenshot of whole job to you see how insane it is.
Warms Regards,
What you could try is this. I am not promising it will solve your problem, but I think there is a good chance it will.
1) Create a job which has a different flow for each each Excel sheet. So if you have 10 sheets, create 10 flows (Subjobs). One for each sheet. When you set up the Excel input file for each sheet, limit it to the sheet you want and specify only the schema for that sheet.
2) Create a parent job which will use the tFileList to pull the files in one at a time and iterate through the job I have described above. So essentially you have a tFileList---->tRunJob (calling the child job described above).
First test this with 1 flow in the child job. So something like this....
tFileInputExcel (only sheet1) -----> tLogRow
Then if that works, add the following....
tFileInputExcel (Sheet 1)----->tLogRow
|
| (OnSubJobOk)
|
tFileInputExcel (Sheet 2) -----> tLogRow
Then add a third sheet and fourth, etc.
Please be aware that any changes to the file that is supplied to you will create issues for your job. So it might be a good idea to try and negotiate a slightly better format from your supplier of files.
Thanks for this method of differents sub Jobs, i will try that soon as possible (but i will have to specified every input of sheet right? Cant use "All sheets")
About the issues with the files supplied has changes will have trouble is a known issue. For my boss is not, he thinks can be dynamic as well to "understand" the changes from the Input.
Thanks mate, i will try to convert everything to String and later convert with Parse Double in tMap, i believe this could be the correct form to do this when working with many schemas in the same job(while reading all the file, all the time but using some sheets in each row)
@everyone
There is a way to accept the two answers as solution? Haha you both help me a lot in this issue, Thanks
Kinds Regards,
Yes you will need to untick all sheets and only use one sheet at a time to test my solution. I suspect that the All Sheets option may be contributing to your issues.
We all have somebody we work with who thinks that computers can do magic and "understand" what a human does without A LOT of training/programming. My degree was in artificial intelligence and if computers could "understand" as much as people assume they can, we wouldn't need people like you and me to build things for them.....in fact we wouldn't need them 😉
With regard to accepting multiple solutions, you can do that. As the person who raised the question you can just click on the Accept Solution button at the bottom of each post you think has helped you resolve this.
Ok
Yes i think in IT progamming the hardest thing is to explain to ppl did not know progaming as well some things isnt easy to make. All progaming require some train and a lot of time.
Yes if computers could make all the stuff ppl can make is no needed ppl anymore.