Hello, I have a file that is not delimited and I would like to parse it. Would it be possible to split my file (according to fixed field lengths) by using a RegEx? For example, I want to say: the 1st field is from character 1 to 7, the 2nd is from 8 to 12, and so on. Is it possible? Where can I configure it? Thank you in advance.
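For reference, a regular expression with fixed-length capture groups can do this kind of split. Here is a minimal plain-Java sketch, assuming the example widths from the question (7 characters, then 5 characters); the class and field names are only illustrative:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FixedWidthSplit {
    public static void main(String[] args) {
        // Example record: chars 1-7 are the first field, chars 8-12 the second
        String line = "ABCDEFG12345 rest of the record";

        // One capture group per fixed-width field
        Pattern p = Pattern.compile("^(.{7})(.{5})");
        Matcher m = p.matcher(line);
        if (m.find()) {
            String field1 = m.group(1); // "ABCDEFG"
            String field2 = m.group(2); // "12345"
            System.out.println(field1 + " | " + field2);
        }
    }
}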
Hi Sabrina, thank you for your attention. So, I will use tHDFSInput (with a single-column schema, raw string) -> tJavaMR (with my real CSV columns) -> tLogRow.
Finally, I have used a tHDFSInput followed by a tMap. The tMap does a substring on the input rows. Do you think it is a good solution? I am working with a very big file (90 GB).
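In plain Java, the substring logic one might put in the tMap output expressions would look roughly like the sketch below (column and field names are made up, and the widths are the example values from the question). The length guard is worth keeping: on a 90 GB file, a single short line would otherwise fail the whole Job with a StringIndexOutOfBoundsException.

public class SubstringSplit {
    public static void main(String[] args) {
        String line = "ABCDEFG12345 rest of the record";

        // Guard against lines shorter than the expected fixed width
        String field1 = line.length() >= 7  ? line.substring(0, 7)  : null; // chars 1-7
        String field2 = line.length() >= 12 ? line.substring(7, 12) : null; // chars 8-12

        System.out.println(field1 + " | " + field2);
    }
}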
Hi,
In case there is any memory issue caused by the big file in your Job, could you please take a look at the online KB article
TalendHelpCenter:ExceptionoutOfMemory.
Best regards
Sabrina
Hi,
The tMap component is a cache component that consumes a lot of memory. You'd better store the temp data on disk.
If I insert the data directly into a database, it shouldn't happen, should it?
It depends on your input data and your design.
There are several possible reasons for an OutOfMemory Java exception to occur. The most common ones include:
1. Running a Job which contains a number of buffer components, such as tSortRow, tFilterRow, tMap, tAggregateRow, or tHashOutput.
2. Running a Job which processes a very large amount of data.
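In either case, a common first remedy, if the Job is simply running out of heap, is to give its JVM more memory (in Talend Studio this is typically set in the Run view's Advanced settings as JVM arguments). The values below are only example figures, not a recommendation for your specific Job:

-Xms1024m
-Xmx4096m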