_AnonymousUser
Specialist III

Using tMap lookups on large data and deleting rejected rows

Hi, currently we have this set up in our Talend job. See below:
source (lookup) table (tMSSQLInput) ---> insert into staging table (tMSSQLOutput)
OnSubjobOk --> delete from destination table where the data is not in the source (lookup) table (tMSSQLRow)

Basically we just need to delete data from the destination table if it is missing from the lookup table. Our lookup table can contain a huge amount of data, around 200 million rows.
When we tested on that volume, it took more than an hour just to delete the rejected records for a single table.
What is the right approach here? Should we use tMap to delete the rejected data, as below?

source large data (tMap lookup table)
                                      |
destination table (tMSSQLInput) ---> tMap ---> delete rejects in destination table (tMSSQLOutput)


Or is there a component that can do bulk delete?
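For context, the delete we run in the tMSSQLRow step is roughly of this shape (the table and key names here are placeholders, not our real ones):

    DELETE FROM dbo.destination_table
    WHERE business_key NOT IN (SELECT business_key
                               FROM dbo.lookup_staging);

With 200 million rows in the staging table this runs as one giant statement, which is presumably why it takes over an hour.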
Thanks in advance!

6 Replies
Anonymous
Not applicable

Hi,
Have you tried storing the lookup data on disk instead of in memory in tMap?
Best regards
Sabrina
_AnonymousUser
Specialist III
Author

xdshi wrote:
Hi,
Have you tried storing the lookup data on disk instead of in memory in tMap?
Best regards
Sabrina

Yes, we have tried that too. We're also having a problem: when we run the job in Talend Studio we can see it running, but when the Talend job is called from the web application it appears to get stuck on the SELECT statement (we checked in SQL Server Management Studio). Is there a known issue with that? The two behave differently.
Anonymous
Not applicable

Hi,
Is there any error message printed on the console? Could you please post screenshots of your current job settings to the forum? That would help us address your issue.
Best regards
Sabrina
_AnonymousUser
Specialist III
Author

Unfortunately, I cannot upload files. I do have a username here, but when I log in on this site and get redirected to the forum it says "You are not logged in"; when I click Log in again, my username appears.
Anyway, it's just a simple tMSSQLInput component feeding a tMap with a lookup table, and the rejects are then deleted:

lookup table --------
                    |
tMSSQLInput ----> tMap ----> tMSSQLOutput

The difference is that the lookup table has more than 200 million rows. We're using the store-on-disk option in tMap, but still no luck. The running job gets stuck on the SELECT, and the eventual error is a closed connection. I think the connection to the database stalls because of the data volume. What should we do about it? How do you handle millions of rows?

Thanks!
Anonymous
Not applicable

I don't think this is an issue of whether or not Talend can handle 200m records; rather, I think 200m is too many rows to have in a lookup.
I think you need to review your overall architecture and see if there is an alternative approach for identifying rows that should not be loaded.
If you have no alternative, then there are some cases where it makes sense to push the join down to your database, and this may be one of them.
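To illustrate the push-down (a sketch only; dbo.destination_table, dbo.lookup_keys, and business_key are assumed names, not from the original job): if the lookup keys are available in the same SQL Server database, the whole reject-and-delete step collapses into a single anti-join statement, run for example from a tMSSQLRow:

    DELETE d
    FROM dbo.destination_table AS d
         LEFT JOIN dbo.lookup_keys AS k
                ON k.business_key = d.business_key
    WHERE k.business_key IS NULL;

With an index on business_key on both sides, SQL Server can resolve this as a hash or merge anti-join instead of streaming 200m rows through the Talend job.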
_AnonymousUser
Specialist III
Author

tal00000 wrote:
I don't think this is an issue of whether or not Talend can handle 200m records; rather, I think 200m is too many rows to have in a lookup.
I think you need to review your overall architecture and see if there is an alternative approach for identifying rows that should not be loaded.
If you have no alternative, then there are some cases where it makes sense to push the join down to your database, and this may be one of them.

I agree with you, but we have no choice: the lookup table is in a different database, which is why we're thinking of using tMap.
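Perhaps a middle ground would be to use Talend only to copy the key column from the remote lookup into a local staging table (tMSSQLInput on the remote side, tMSSQLOutput locally), and then push the delete down as you suggest. A batched delete would also keep each transaction small on a table this size. A sketch only, again with placeholder names:

    -- run in tMSSQLRow once the keys have been staged locally
    WHILE 1 = 1
    BEGIN
        DELETE TOP (100000) d
        FROM dbo.destination_table AS d
        WHERE NOT EXISTS (SELECT 1
                          FROM dbo.lookup_keys_staging AS s
                          WHERE s.business_key = d.business_key);
        IF @@ROWCOUNT = 0 BREAK;  -- stop once no rejects remain
    END;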