_AnonymousUser
Specialist III

tStandardizeRow Usage?

Hello, I have a non-delimited file and I would like to parse it.
Would it be possible to split my file (according to row lengths) by using a RegEx?
For example, I want to say:
the 1st row is from 1 to 7 characters, the 2nd is from 8 to 12 ...
Is that possible? Where can I configure it?
Thank you in advance
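To illustrate what I mean, fixed-width fields can in principle be captured with a regex. Here is a minimal standalone Java sketch: the lengths 7 and 5 follow my example above, the third group is just a catch-all for the rest of the line, and the field names are invented.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FixedWidthRegexDemo {
    // Illustrative pattern: chars 1-7 -> group 1, chars 8-12 -> group 2,
    // anything that follows -> group 3. Adjust the lengths to the real layout.
    private static final Pattern ROW = Pattern.compile("^(.{7})(.{5})(.*)$");

    public static void main(String[] args) {
        String line = "ABCDEFG12345rest-of-the-record"; // sample positional line
        Matcher m = ROW.matcher(line);
        if (m.matches()) {
            System.out.println("field1 = " + m.group(1)); // ABCDEFG
            System.out.println("field2 = " + m.group(2)); // 12345
            System.out.println("rest   = " + m.group(3)); // rest-of-the-record
        }
    }
}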
6 Replies
Anonymous
Not applicable

Hi,
Regarding your previous post https://community.talend.com/t5/Design-and-Development/Big-Data-Positional-File/td-p/85416, it seems you have to use a MapReduce job.
If so, TalendHelpCenter:tFileInputRegex is not supported for MapReduce yet.
Here is a solution for your use case: put your file into Hadoop first, then tHDFSInput ---> tMap (or tHDFSInput ---> tJavaMR).
Best regards
Sabrina
_AnonymousUser
Specialist III
Author

Hi Sabrina,
Thank you for your attention,
So, I will use tHDFSInput (with a single-column schema, raw string) -> tJavaMR (with my real CSV columns) -> tLogRow.

Does anything look wrong to you?
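As an illustration of the splitting logic only (not the exact code that goes into the tJavaMR component), a plain Hadoop mapper doing the same thing could look roughly like this; the positions follow my example above and the field names are invented:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper: turns one positional line into a semicolon-delimited record.
// Field positions (1-7 and 8-12) follow the example in the question; adjust as needed.
public class PositionalLineMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        if (line.length() < 12) {
            return; // skip short/ragged lines in this sketch
        }
        String field1 = line.substring(0, 7);   // characters 1 to 7
        String field2 = line.substring(7, 12);  // characters 8 to 12
        context.write(new Text(field1), new Text(field1 + ";" + field2));
    }
}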
_AnonymousUser
Specialist III
Author

Finally,
I have used a tHDFSInput followed by a tMap.
The tMap does a substring on the input rows.
Do you think it is a good solution?
I am working with a very big file (90 GB).
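To make it concrete, the output expressions in the tMap look roughly like this (assuming the incoming flow from tHDFSInput is named row1 and carries a single String column called line; the real names and positions may differ):

// Illustrative tMap output expressions (tMap expressions are plain Java).
row1.line.substring(0, 7)    // output column 1: characters 1 to 7
row1.line.substring(7, 12)   // output column 2: characters 8 to 12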

Best regards
Anonymous
Not applicable

Hi,
In case there is any memory issue caused by the big file in your Job, please take a look at the online KB article
TalendHelpCenter:ExceptionoutOfMemory.
Best regards
Sabrina
_AnonymousUser
Specialist III
Author

Thank you Sabrina.
Can you confirm one last thing for me?
MapReduce jobs run on my cluster, don't they?

So the memory exception should happen because of the tLogRow? If I insert the data directly into a database, it shouldn't happen, should it?

Thank you very much for your help, Sabrina.
Anonymous
Not applicable

Hi,
The tMap component is a cache component and can consume too much memory. You'd better store temp data on disk.
"If I insert the data directly into a database, it shouldn't happen, should it?"

It depends on your input data and your design.
There are several possible reasons for an OutOfMemory Java exception to occur. The most common include:
1. Running a Job which contains a number of buffer components such as tSortRow, tFilterRow, tMap, tAggregateRow, or tHashOutput
2. Running a Job which processes a very large amount of data.
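Since a MapReduce job runs on the cluster, the memory that matters is usually the map-task memory rather than the Studio JVM. As a rough sketch only, the standard Hadoop 2.x properties below control it; the values are illustrative, and in a Talend MapReduce job they would normally be set through the job's Hadoop properties rather than in code:

import org.apache.hadoop.conf.Configuration;

public class MapMemoryExample {
    public static void main(String[] args) {
        // Standard Hadoop 2.x properties controlling map-task memory.
        // Values are illustrative only.
        Configuration conf = new Configuration();
        conf.set("mapreduce.map.memory.mb", "4096");      // YARN container size
        conf.set("mapreduce.map.java.opts", "-Xmx3686m"); // mapper JVM heap
    }
}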

Best regards
Sabrina