_AnonymousUser
Specialist III

tStandardizeRow Usage?

Hello, I have a file that is not delimited and I would like to parse it.
Would it be possible to split my file (according to fixed field lengths) by using RegEx?
For example, I want to say:
the first field runs from character 1 to 7, the second from 8 to 12, and so on.
Is this possible? Where can I configure it?
Thank you in advance.
6 Replies
Anonymous
Not applicable

Hi,
Regarding your previous post https://community.talend.com/t5/Design-and-Development/Big-Data-Positional-File/td-p/85416, it seems you have to use a MapReduce Job.
If so, note that TalendHelpCenter:tFileInputRegex is not supported for MapReduce yet.
Here is a solution for your use case: put your file into HDFS first, then use tHDFSInput ---> tMap (or tHDFSInput ---> tJavaMR).
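To make the suggested design concrete, here is a minimal sketch (plain Java, not tied to any specific component) of the fixed-width split that the tMap or the tJavaMR logic would perform, assuming the positions from your first post; the method and field names are only placeholders:

// Split one raw line into fixed-width fields.
// substring() is 0-based and end-exclusive, so characters 1-7 map to (0, 7).
public static String[] splitFixedWidth(String line) {
    String field1 = line.substring(0, 7);   // characters 1 to 7
    String field2 = line.substring(7, 12);  // characters 8 to 12
    return new String[] { field1.trim(), field2.trim() };
}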
Best regards
Sabrina
_AnonymousUser
Specialist III
Author

Hi Sabrina,
Thank you for your attention.
So, I will use tHDFSInput (with a single-column schema, raw string) -> tJavaMR (with my real CSV columns) -> tLogRow.

Does anything look wrong with that, in your opinion?
_AnonymousUser
Specialist III
Author

Finally,
I used a tHDFSInput followed by a tMap.
The tMap does a substring on the input rows.
Do you think this is a good solution?
I am working with a very big file (90 GB).
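For reference, the substring expressions in such a tMap are plain Java calls on the single input column; the connection and column names below (row1, line) are only examples, not your actual schema:

row1.line.substring(0, 7)    // first field, characters 1 to 7
row1.line.substring(7, 12)   // second field, characters 8 to 12

One caveat with this approach: substring() throws an exception on lines shorter than the expected length, so a length check (or a filter before the tMap) may be worth adding for a 90 GB file.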

Best regards
Anonymous
Not applicable

Hi,
In case there is any memory issue caused by the big file in your Job, please take a look at the online KB article
TalendHelpCenter:ExceptionoutOfMemory.
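For a Job run locally in the Studio, the usual first step covered by that kind of article is to raise the JVM heap; the values below are only illustrative and are typically set as JVM arguments in the Run view's Advanced settings:

-Xms1024M
-Xmx4096M

(For the part of the Job that actually runs as MapReduce on the cluster, memory is governed by the Hadoop side rather than these Studio JVM arguments.)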
Best regards
Sabrina
_AnonymousUser
Specialist III
Author

Thank you, Sabrina.
Can you confirm one last thing for me?
The MapReduce Jobs do run on my cluster, don't they?

So the memory exception should happen because of the tLogRow? If I insert the data directly into a database, it shouldn't happen, should it?

Thanks a lot for your help, Sabrina.
Anonymous
Not applicable

Hi,
The tMap component is a cache component that can consume too much memory; you'd better store its temp data on disk (see the note at the end of this reply).

"If I insert the data directly into a database, it shouldn't happen, should it?"

It depends on your input data and your design.
There are several possible reasons for an outOfMemory Java exception to occur. The most common ones include:
1. Running a Job which contains a number of buffer components such as tSortRow, tFilterRow, tMap, tAggregateRow, or tHashOutput.
2. Running a Job which processes a very large amount of data.
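For reference, the disk buffering mentioned at the top of this reply is a tMap option; roughly (the exact labels can differ between Studio versions):

In the tMap configuration:
  Store temp data          = true
  Temp data directory path = "/some/local/dir"   // a path with enough free space

With that enabled, the data tMap buffers is spilled to disk instead of being held entirely in memory.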

Best regards
Sabrina