Dear All,
I am using tMap to look up against two different databases, and using an expression I am narrowing the lookup values; however, I am getting a throughput of just 7-8 rows per second.
I want to process around 1 Million records.
Attached is the design. Is there anything more that can be done to improve performance?
Note: All indexes are in place.
Thanks
Vidya
OK, your problem is the reload at each row. I suspect that your query being fired is looking through a lot of data and you are firing it a million times. That is guaranteed to be slow. From your diagram it looks like the main source of data and the lookup query are from the same database. If that is the case, do the lookup in the main query. There is absolutely no point joining in Talend if your data starts off in the same database. If it is not in the same database it might make sense to add the lookup data to your main data's database somehow.
You will not get round this with simple tweaks I'm afraid. 1 million queries is a lot of queries. You have to deal with the latency of building, sending and receiving the data for every single row in your main source.
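To illustrate why the round-trip count dominates here, below is a minimal, runnable Java sketch (Talend jobs compile to Java) contrasting "Reload at each row" with a load-once lookup. The table contents and the latency model are made up for illustration; the point is simply that the first strategy pays one database round trip per main-flow row, while the second pays one in total.

```java
import java.util.HashMap;
import java.util.Map;

public class LookupStrategies {
    // Counts simulated database round trips: the real cost driver
    // behind "Reload at each row".
    static int roundTrips = 0;

    // Hypothetical lookup table standing in for the remote database.
    static Map<Integer, String> remoteTable = new HashMap<>();

    // "Reload at each row": one query fired per main-flow row.
    static String perRowLookup(int key) {
        roundTrips++; // every row pays network + parse + execute latency
        return remoteTable.get(key);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) remoteTable.put(i, "val" + i);
        int mainRows = 1000;

        // Strategy 1: reload at each row -> mainRows round trips.
        roundTrips = 0;
        for (int i = 0; i < mainRows; i++) perRowLookup(i);
        System.out.println("Per-row lookups: " + roundTrips + " round trips");

        // Strategy 2: load once (or join in the source query) -> 1 round trip,
        // then every row is resolved from local memory.
        roundTrips = 0;
        roundTrips++; // single bulk fetch
        Map<Integer, String> cached = new HashMap<>(remoteTable);
        for (int i = 0; i < mainRows; i++) cached.get(i);
        System.out.println("Load-once lookup: " + roundTrips + " round trip");
    }
}
```

With a million main rows the gap is the same shape, just a million times worse, which is why joining in the source query (or loading the lookup once) matters more than any tMap tuning.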
It looks like your Main row is quite slow. Can you test this by removing the other components and running with just a tLogRow? Also, can you show us your DB component configuration, both Basic and Advanced?
There are quite a few things that can cause a job like this to be slow. You might try creating a test job with just the database connection and a tLogRow (no tMap) and see whether it is significantly faster. If it isn't, then tMap isn't the issue.
If tMap is likely the issue, try rewriting your SELECT query so you don't need an expression filter. You can include a value from globalMap in the query statement; that way the database's query engine does the filtering, rather than tMap (which is necessarily slower, because it processes one row at a time, much like a cursor).
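As a sketch of that idea: a Talend query field is just a Java string expression, so a value stored in globalMap can be concatenated into the WHERE clause. The `current_id` key and the `customers` table/column names below are hypothetical, and the globalMap here is faked locally so the example runs standalone (in a real job the framework provides it).

```java
import java.util.HashMap;
import java.util.Map;

public class QueryFromGlobalMap {
    public static void main(String[] args) {
        // In a Talend job, globalMap is supplied by the generated code;
        // it is faked here so the sketch is self-contained.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("current_id", 42);

        // Hypothetical tDBInput query expression: the filter now runs
        // inside the database instead of in a tMap expression filter.
        String query =
            "SELECT c.id, c.name FROM customers c "
          + "WHERE c.customer_id = " + globalMap.get("current_id");

        System.out.println(query);
    }
}
```

Note that plain string concatenation is only safe for trusted, numeric values; for anything user-supplied, a component that supports prepared statements is the safer route.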
Hope this helps.
Hi Rhall,
Attached is the job with just the tLogRow and DB connection, along with the Basic and Advanced settings.
Thanks
Go back to your original job and switch on the "Use Cursor" tick box. I think you will see an improvement.
Hi, ticking "Use Cursor" had no impact on performance; it's still the same.
Do you recommend any particular cursor size? I tried the range 100-10000.
How is your tMap configured? Can you show us a screenshot of this configuration please?
Here is the tMap config and DB query.
With a cursor size of 100, performance improved slightly from 7 rows/s to 11 rows/s.
Can it be improved more?