Hi,
What is the best way to store intermediate data in a job: in a CSV file or in memory (tHashOutput)?
In my scenario, I get 4.4 million records from the source and need to perform some operations on them. Because my job contains multiple subjobs, I store the data partway through the job.
I am weighing several aspects: performance, storage space, and avoiding memory issues.
Please suggest the best method.
Thanks in advance.
Hi,
Given the number of records, splitting the data into multiple intermediate files may help if the operations you need to perform on these records can be parallelized.
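To illustrate the split-and-parallelize idea, here is a minimal sketch in plain Java (not a Talend component). The class and method names, the `part_N.csv` file naming, the round-robin distribution, and the row-counting "operation" are all hypothetical placeholders; substitute your real processing:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedIntermediateFiles {

    /** Writes records round-robin into `chunks` intermediate files (hypothetical naming) and returns their paths. */
    static List<Path> split(List<String> records, int chunks, Path dir) throws IOException {
        List<Path> paths = new ArrayList<>();
        List<BufferedWriter> writers = new ArrayList<>();
        try {
            for (int i = 0; i < chunks; i++) {
                Path p = dir.resolve("part_" + i + ".csv");
                paths.add(p);
                writers.add(Files.newBufferedWriter(p));
            }
            for (int i = 0; i < records.size(); i++) {
                BufferedWriter w = writers.get(i % chunks);
                w.write(records.get(i));
                w.newLine();
            }
        } finally {
            for (BufferedWriter w : writers) {
                w.close();
            }
        }
        return paths;
    }

    /** Processes each file on its own pool thread; the placeholder "operation" just counts rows. */
    static long processInParallel(List<Path> files, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Path p : files) {
                results.add(pool.submit(() -> {
                    // Stream the file line by line; only one row is in memory per thread.
                    try (BufferedReader r = Files.newBufferedReader(p)) {
                        return r.lines().count();
                    }
                }));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("intermediate");
        // A real job would stream rows from the source; a list keeps the sketch self-contained.
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            records.add(i + ";value" + i);
        }
        List<Path> parts = split(records, 4, dir);
        System.out.println("rows processed: " + processInParallel(parts, 4)); // prints: rows processed: 1000
    }
}
```

This only pays off if the per-record work is independent across chunks; if the operation needs all rows together (a global sort or dedup, for example), the chunks would have to be merged afterwards.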
Otherwise, keeping all the records in memory can cause memory issues, but that depends mostly on the overall data size rather than on the number of records (are the records long or short?) and, of course, on the available physical memory.
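As a back-of-the-envelope check of that point, the sketch below multiplies the record count by an average record size and an overhead factor, then compares the result with the JVM's maximum heap. The overhead factor of 3 and the 70% safety fraction are assumptions chosen for illustration, not measured Talend or JVM figures; calibrate them against your own job:

```java
public class HeapEstimate {

    /**
     * Rough heap estimate for holding every record in memory.
     * overheadFactor is an ASSUMPTION covering JVM object headers, references,
     * and per-row container overhead; measure your own job to calibrate it.
     */
    static long estimateBytes(long records, long avgRecordBytes, double overheadFactor) {
        return (long) (records * avgRecordBytes * overheadFactor);
    }

    /** True if the estimate stays within a chosen fraction of the maximum heap. */
    static boolean fitsInHeap(long estimatedBytes, long maxHeapBytes, double safetyFraction) {
        return estimatedBytes <= (long) (maxHeapBytes * safetyFraction);
    }

    public static void main(String[] args) {
        long records = 4_400_000L;
        long maxHeap = Runtime.getRuntime().maxMemory(); // typically reflects the -Xmx setting
        for (long avg : new long[] {100, 1_000}) {
            long est = estimateBytes(records, avg, 3.0);
            System.out.printf("avg %d B/record -> ~%d MB, fits in 70%% of heap: %b%n",
                    avg, est / (1024 * 1024), fitsInHeap(est, maxHeap, 0.7));
        }
    }
}
```

The same 4.4 million rows come to roughly 1.3 GB at 100 bytes per record but roughly 13 GB at 1,000 bytes (with the assumed factor), which is why record length matters more than record count.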
Also, text (or CSV) files are processed very quickly by the standard tFileInputDelimited or tFileInputFullRow components, so you don't "really" need to worry about response time with them (in my opinion, unless you want to gain a few seconds, and I don't think that is your main concern here).
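Reading a delimited file is cheap partly because it can be streamed: only the current row needs to be in memory, whatever the file size. The sketch below shows that pattern in plain Java; it is not how tFileInputDelimited is implemented internally, and the `sumColumn` method and the column layout are hypothetical. The naive split does not handle quoted fields that contain the delimiter:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.regex.Pattern;

public class StreamingDelimitedRead {

    /** Streams a delimited source row by row and sums one numeric column. */
    static long sumColumn(Reader source, String delimiter, int column, boolean hasHeader) throws IOException {
        long sum = 0;
        // Pattern.quote so delimiters such as "|" are not treated as regex syntax.
        Pattern splitter = Pattern.compile(Pattern.quote(delimiter));
        try (BufferedReader reader = new BufferedReader(source)) {
            if (hasHeader) {
                reader.readLine(); // skip the header row
            }
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = splitter.split(line, -1);
                sum += Long.parseLong(fields[column].trim());
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        // In a real job the source would be a file, e.g. Files.newBufferedReader(path).
        String csv = "id;amount\n1;10\n2;20\n3;30\n";
        System.out.println(sumColumn(new StringReader(csv), ";", 1, true)); // prints 60
    }
}
```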
Hope this helps.