Hello,
I have a question.
I'm using a tFileInputMSDelimited and two tMaps in my job (for now). In the tMap lookup settings, I have a get_dates component that retrieves dates from the database. To improve performance, the lookup is set to "Reload at each row (cache)", so the SELECT statement is not run against the database more than once for the same date. I'm using the same get_dates component in two different tMaps. Will the dates cached by the first tMap also be available to the second, or does each tMap create its own cache?
Thank you!
I'm afraid the caches are separate for each tMap.
Maybe you should query the dates once, store the result in a tHashOutput, and replace the existing tDBInput for this table with a tHashInput.
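To illustrate the idea of sharing one cache instead of keeping one per tMap, here is a minimal Java sketch. It is not Talend's generated code: the class `SharedLookupCache`, its `loader` function (standing in for the SELECT against the database), and the call counter are all hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of Talend: a memoizing lookup that
// several consumers (for example, two mapping steps) can share.
class SharedLookupCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> loader; // stands in for the database SELECT
    private int loaderCalls = 0;         // counts how often the loader really ran

    SharedLookupCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value, calling the loader only on the first request for a key.
    V get(K key) {
        if (!cache.containsKey(key)) {
            loaderCalls++;
            cache.put(key, loader.apply(key));
        }
        return cache.get(key);
    }

    int loaderCalls() {
        return loaderCalls;
    }
}
```

If both consumers hold a reference to the same `SharedLookupCache` instance, a key requested by the first one is not loaded again for the second. With two separate instances, each would call the loader once per key, which matches the per-tMap behaviour described above.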
Thank you @TRF,
Yeah, I thought of this solution too. The problem is that I would need to read all the data in the file twice: once to load the tHashOutput and once to do what the job already does. Do you know another solution that avoids reading the file twice?
Luiz Ramos
Yeah, I know.
What I'm saying is that I would need to read the data from the original file twice: first to populate the tHashOutput, and then to do what my job is doing. Here is an image of what I think I would need to do if I use tHashOutput:
Luiz Ramos
Should be like this, no?
Ahhhh ok, sorry for that, let me share the scenario.
I'm using get_dates as an example, but I want to do the same thing for all the other lookups in the first image. The date table has 10,000 records right now (and this will grow), but the other lookups will have more than 1 million records. So I don't want to read all the records from each table and insert them into a tHashOutput, because that will exhaust the memory. What I have in mind is to populate the tHashOutput only with the records that exist in the file I'm reading, to avoid running out of memory. But for that, I think I need to do what I show in the second image.
For example, if the date table has 10,000 records but my file contains only 2 different dates, I want to store only those 2 in the tHashOutput, not all 10,000.
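As a rough Java sketch of that idea (outside Talend, and under assumed names): first collect the distinct keys that occur in the file, then query the lookup table only for those keys. The class `DistinctKeyQuery` and the table and column names `dim_date` and `date_value` are hypothetical. The approach still needs one pass over the file to collect the keys.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only.
class DistinctKeyQuery {

    // Keeps each key once, in the order it first appears in the file.
    static Set<String> distinctKeys(List<String> fileKeys) {
        return new LinkedHashSet<>(fileKeys);
    }

    // Builds a parameterized query with one "?" placeholder per distinct key.
    // Table and column names are supplied by the caller and must be trusted values.
    static String buildInQuery(String table, String column, int keyCount) {
        if (keyCount < 1) {
            throw new IllegalArgumentException("at least one key is required");
        }
        String placeholders = String.join(", ", Collections.nCopies(keyCount, "?"));
        return "SELECT * FROM " + table + " WHERE " + column + " IN (" + placeholders + ")";
    }
}
```

With a file that contains only 2 different dates, `buildInQuery("dim_date", "date_value", 2)` produces a query with two placeholders, so only those 2 rows would be fetched and held in memory. For very large key sets the IN list would have to be split into batches, since databases limit the number of parameters.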
Taking the same example, I have another question. One way to avoid all the workarounds from our last posts would be to remove the second tMap and find a way to send the 2 different outputs from the original file to the same tMap. That way I wouldn't need the second tMap, and "Reload at each row (cache)" would work as intended. However, the 2 outputs have different schemas, and I know that Talend doesn't support a "circular execution flow". Below is an image of my entire job:
Do you have any suggestion for that?
Thank you!
Luiz Ramos