I am reading a 15 GB input file in Talend. It has 200 "|"-delimited fields (columns), of which I need only 5 arbitrary fields.
To get these 5 fields, I read the whole 15 GB file with all 200 columns using the tFileInputDelimited component, then drop the 195 unwanted columns with the tFilterColumns component. This is time consuming: it takes approximately 4 to 5 minutes to read the whole 15 GB file.
Can anyone suggest an alternative way of implementing this?
More specifically, is there any way to read only specific fields from a delimited file?
Try tFileInputRegex:
https://help.talend.com/reader/KxVIhxtXBBFymmkkWJ~O4Q/CxhF82OjiQKwpRJriKB6~g
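To make the regex suggestion concrete, here is a minimal sketch in plain Java of a pattern that skips the unwanted fields and captures only the wanted ones. It assumes tFileInputRegex maps each capturing group to a schema column in order, which is my understanding of the component and not something confirmed in this thread. The class name RegexFieldPicker, the helper names and the sample indices are made up for illustration. Note that the whole file still has to be read; whether this is actually faster than tFileInputDelimited would need to be measured.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFieldPicker {

    // Builds a regex that captures only the fields at the given 0-based,
    // strictly ascending indices of a '|'-delimited line.
    // Unwanted fields are skipped with non-capturing groups.
    static Pattern buildPattern(int[] wanted) {
        StringBuilder sb = new StringBuilder("^");
        int next = 0; // index of the field the regex is positioned at
        for (int i = 0; i < wanted.length; i++) {
            if (i > 0) {
                sb.append("\\|"); // delimiter after the previously captured field
            }
            int skip = wanted[i] - next;
            if (skip > 0) {
                // possessive quantifier (*+) avoids useless backtracking
                sb.append("(?:[^|]*+\\|){").append(skip).append("}");
            }
            sb.append("([^|]*+)"); // the wanted field
            next = wanted[i] + 1;
        }
        return Pattern.compile(sb.toString());
    }

    // Returns the captured fields, or null if the line has too few fields.
    static String[] pick(Pattern p, String line) {
        Matcher m = p.matcher(line);
        if (!m.lookingAt()) {
            return null;
        }
        String[] out = new String[m.groupCount()];
        for (int i = 0; i < out.length; i++) {
            out[i] = m.group(i + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical example: keep the 2nd and 4th field (indices 1 and 3).
        Pattern p = buildPattern(new int[] {1, 3});
        System.out.println(p.pattern());
        System.out.println(Arrays.toString(pick(p, "a|b|c|d|e")));
    }
}
```

The string printed by p.pattern() is what you would adapt for the component's regex setting, adjusting the escaping to whatever quoting that field expects.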
Try reading the data in stream mode; this could improve performance.
Also enable parallelization (if you have the subscription version):
https://help.talend.com/reader/TKUQ4WRBbYZRnl9OyAgr5w/cSnwqkJCdsct_heLy3lrAQ
I'm afraid that in this case you can hardly improve the time.
A delimited format means the file is read row by row: even if you need only a few columns, you must read the whole row.
With an average disk (not NVMe), a simple read alone will take 2+ minutes for a 15 GB file (15 GB in 2 minutes is already about 125 MB/s of sequential throughput),
plus some time for parsing and filtering.
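The disk read cannot be avoided, but the parse/filter part can be kept small. Below is a rough sketch in plain Java (for example for a tJava/tJavaFlex component or a custom routine) that scans each row for the delimiter and only creates strings for the wanted columns, stopping as soon as the last wanted column has been seen. This is an illustration under my own assumptions, not how tFileInputDelimited works internally. The class name SparseColumnReader and the column indices are hypothetical, and it assumes no quoted fields or escaped delimiters.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SparseColumnReader {

    // Extracts the fields at the given 0-based, strictly ascending indices.
    // Fields that are missing in a short row stay null.
    static String[] pick(String line, int[] wanted, char delim) {
        String[] out = new String[wanted.length];
        int start = 0; // start offset of the current field
        int field = 0; // index of the current field
        int w = 0;     // position in the wanted array
        while (w < wanted.length) {
            int end = line.indexOf(delim, start);
            if (end < 0) {
                end = line.length(); // last field of the row
            }
            if (field == wanted[w]) {
                out[w++] = line.substring(start, end);
            }
            if (end == line.length()) {
                break; // no more fields in this row
            }
            start = end + 1;
            field++;
        }
        return out;
    }

    // Reads rows from any Reader. Collecting into a list is only for the demo;
    // for a 15 GB file each row should be processed and discarded instead.
    static List<String[]> readAll(Reader source, int[] wanted, char delim) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source, 1 << 20)) {
            String line;
            while ((line = in.readLine()) != null) {
                rows.add(pick(line, wanted, delim));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        String sample = "a|b|c|d|e\n1|2|3|4|5\n";
        for (String[] row : readAll(new StringReader(sample), new int[] {1, 3}, '|')) {
            System.out.println(String.join(",", row));
        }
    }
}
```

For a real file you would pass a FileReader (or an InputStreamReader with the right charset) instead of the StringReader. The gain, if any, comes only from not splitting and converting the other 195 columns; the 15 GB still has to come off the disk.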
Just to confirm my understanding: when we read a delimited file using tFileInputDelimited, it reads the data row by row and creates an object for each field according to its type. Correct me if I am wrong.