Hi, we currently have this set up in our Talend job. See below:
source (lookup) table (tMSSQLInput) ---> insert into staging table (tMSSQLOutput)
OnSubjobOk --> delete from destination table where data is not in the source (lookup) table (tMSSQLRow)
Basically, we just need to delete rows from the destination table when they are missing from the lookup table. The lookup table can hold a huge amount of data, around 200 million rows.
When we tested this at that volume, it took more than an hour just to delete the rejected records for a single table.
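The delete in our tMSSQLRow is roughly like this (table and column names here are simplified placeholders, not our real schema):

DELETE FROM dbo.destination_table
WHERE business_key NOT IN (SELECT business_key FROM dbo.staging_table);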
What is the right approach for this? Should we use tMap to delete the rejected data?
source large data (tMap lookup)
|
destination table (tMSSQLInput) --- tMap --- delete rejects in destination table (tMSSQLOutput)
Or is there a component that can do bulk delete?
Thanks in advance!
xdshi wrote:
Hi,
Have you tried storing the lookup data on disk instead of in memory in tMap?
Best regards
Sabrina
tal00000 wrote:
I don't think the issue is whether or not Talend can handle 200m records; rather, I think 200m is too many rows to have in a lookup.
I think you need to review your overall architecture and see if there is an alternative approach for identifying rows that should not be loaded.
If you have no alternative, then there are some cases where it makes sense to push the join down to your database, and this may be one of them.
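For example, since your first sub-job already lands the lookup keys in a staging table, you could let SQL Server do the anti-join itself and delete in batches, so the transaction log and locks stay manageable. A minimal sketch, assuming a shared business_key column that is indexed on both tables (all names and the batch size are placeholders):

-- Delete destination rows whose key no longer exists in staging,
-- 50,000 rows per batch; loop until nothing is left to delete.
WHILE 1 = 1
BEGIN
    DELETE TOP (50000) d
    FROM dbo.destination_table AS d
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.staging_table AS s
                      WHERE s.business_key = d.business_key);

    IF @@ROWCOUNT = 0 BREAK;  -- no more orphaned rows
END

You could run this from a single tMSSQLRow; in autocommit mode each batch commits separately, and the 200m-row lookup never has to pass through Talend's memory at all.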