talendtester
Creator III

tMap - Is there a way to force the Lookup's temp files to store on disk after job runs?

Our reference file has over 100M rows of data.

The job compares a file with new data against the reference file and rejects the new data to a rejects file.

The tMap stores the reference file's data to a temp folder on disk before the lookup is done.


How can I force Talend DI to keep the reference data in the .bin files in the temp folder on disk after the job finishes running, so the job runs quicker next time? Currently, the files in the temp folder are deleted when the job is done and they have to be recreated the next time the job runs.


The job looks like:


tFileInputDelimited_2 (lookup)
           |
           v
tFileInputDelimited_1 > tMap > tFileOutputDelimited
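For reference, here is a minimal Java sketch of what the tMap lookup effectively does once the reference data is loaded: build a key set from the reference file, then stream the new file and route matched and unmatched rows separately. The names and one-key-per-line layout are illustrative only, not Talend's generated code — loading the key set is the step that tMap spills to the temp .bin files and repeats on every run.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

public class LookupRejectSketch {

    // Build an in-memory key set from the reference file (one key per line).
    // With ~100M rows this load step dominates start-up time, which is why
    // rebuilding the lookup on every run is so costly.
    static Set<String> loadReferenceKeys(Path refFile) throws IOException {
        return new HashSet<>(Files.readAllLines(refFile));
    }

    // Stream the new file: rows whose key exists in the reference go to one
    // list, the rest to the other (mirrors tMap's match/reject output links).
    static void filter(Path newFile, Set<String> refKeys,
                       List<String> matched, List<String> unmatched) throws IOException {
        for (String row : Files.readAllLines(newFile)) {
            if (refKeys.contains(row)) {
                matched.add(row);
            } else {
                unmatched.add(row);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path ref = Files.createTempFile("reference", ".txt");
        Path incoming = Files.createTempFile("incoming", ".txt");
        Files.write(ref, List.of("A100", "A200", "A300"));
        Files.write(incoming, List.of("A200", "B999"));

        List<String> matched = new ArrayList<>();
        List<String> unmatched = new ArrayList<>();
        filter(incoming, loadReferenceKeys(ref), matched, unmatched);
        System.out.println("matched=" + matched + " unmatched=" + unmatched);
    }
}
```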

 

 

3 Replies
Anonymous
Not applicable

Hi,

 

    You cannot keep the temp data permanently, as that would defeat the purpose of clearing the temp space once processing is complete. Since the temp data is also stored in files, there will still be file I/O operations that you cannot avoid. Another downside of your approach is the overhead and file-management issues that arise whenever the lookup file is modified.
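If you do stay on a standard DI job, one possible workaround — not a built-in tMap option, since the .bin files are deleted by design — is to persist the lookup key set yourself once (for example from a tJava component or a custom routine) using plain Java serialization, and on later runs deserialize it instead of re-parsing the 100M-row file. A minimal sketch with illustrative names; note it inherits exactly the file-management caveat above, so the cache must be invalidated whenever the reference file changes:

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

public class LookupCacheSketch {

    // Serialize the lookup key set once; later runs deserialize it instead of
    // re-parsing the raw reference file. Invalidate the cache whenever the
    // reference file changes (e.g. compare lastModified timestamps).
    static void saveCache(Set<String> keys, Path cacheFile) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(Files.newOutputStream(cacheFile))) {
            out.writeObject(new HashSet<>(keys));
        }
    }

    @SuppressWarnings("unchecked")
    static Set<String> loadCache(Path cacheFile)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(Files.newInputStream(cacheFile))) {
            return (Set<String>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path cache = Files.createTempFile("lookupCache", ".ser");
        Set<String> keys = new HashSet<>(List.of("A100", "A200"));
        saveCache(keys, cache);
        Set<String> reloaded = loadCache(cache);
        System.out.println(reloaded.equals(keys)); // round-trip preserves the set
    }
}
```

Whether this actually beats re-reading the delimited file depends on disk speed and key size, so it is worth measuring before committing to it.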

 

     Considering the file size and processing needs, why don't you do this operation with a Big Data Spark Batch job? It will be much faster for this type of huge file operation.

 

Warm Regards,

 

Nikhil Thampi

talendtester
Creator III
Author

I just downloaded and installed TOS Big Data. 

 

Is the next step that I need to find or create an HDFS Hadoop cluster?

 

How is TOS Big Data different from the normal TOS DI? 

Do the Big Data components perform differently?

Anonymous
Not applicable

Hi,

 

     You are right. You will have to create a cluster to host your files.

 

      TOS Big Data contains all the features of TOS DI, plus specialized components and job flows for running big data batch jobs.

 

      Big Data jobs use either the MapReduce or Spark framework to process the data flows.

 

Warm Regards,

 

Nikhil Thampi