Anonymous
Not applicable

Improve performance when doing a lookup

Hi team,

When I run a subjob, the main table has more than 200,000 records. I used the "Reload at each row" lookup model to filter the lookup table in tMap, and also added conditions on the main table. However, the whole job still takes about 20 minutes.

[screenshot: job design]

I applied some filters through globalMap in the lookup table query.

[screenshot: lookup query filtered via globalMap]
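For illustration, a "Reload at each row" lookup query of this shape typically looks something like the following in the lookup input's Query field, referencing the globalMap keys set in tMap (the table, column, and key names here are invented for the example):

"SELECT lookup_id, lookup_value " +
"FROM lookup_table " +
"WHERE customer_id = '" + (String)globalMap.get("customer_id") + "'" +
"  AND region      = '" + (String)globalMap.get("region") + "'"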

 

In tMap, I use a "Left Outer Join" with the "Reload at each row" lookup model.

[screenshot: tMap join and lookup settings]

 

Running the whole job takes a long time (about 20 minutes).

Do you have any ideas for performance tuning?

BTW, are there any good ways to use a connection pool in Talend?

Thank you in advance!

1 Solution

Accepted Solutions
Anonymous
Not applicable
Author

In the end we resolved the performance issue by adding more filters to the lookup table, which was the simplest option. After that change the job runs about 50% faster. It's much better than before.

Many thanks for your help!


16 Replies
Anonymous
Not applicable
Author

First of all, are you getting the data from the same database? If so, you should consider joining and filtering in the DB. There is little point bringing in more data than you need only to throw it away.
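If both tables did live in the same database, the join could be pushed into a single input component's query along these lines (table and column names are placeholders, not taken from the job above):

"SELECT m.order_id, m.customer_id, l.lookup_value " +
"FROM main_table m " +
"JOIN lookup_table l ON l.customer_id = m.customer_id " +
"WHERE m.status = 'ACTIVE'"

That way the database performs the join and only matching rows travel over the wire.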

TRF
Champion II

Hi,
Did you measure the response time of these queries outside of Talend?
Also (this will not solve the problem): you should remove from the SELECT clause any fields whose values you already know because they appear in the WHERE clause. You'll have less data to transfer from the DB server.
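For example (hypothetical names, not the actual query): if the lookup query is

"SELECT customer_id, region, lookup_value " +
"FROM lookup_table " +
"WHERE region = '" + (String)globalMap.get("region") + "'"

then region is already known for every returned row, so it can be dropped from the SELECT list and taken from the main flow instead.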
Anonymous
Not applicable
Author

The data comes from different databases.

I think I have already added enough conditions to filter in the DB, and I have also done some processing in tMap, so I have no idea how to improve the performance any other way!

Anonymous
Not applicable
Author

OK, it looks like you are doing the sort of things I would try to speed this up. Another thing to test is where the bottleneck is. Can you disconnect your DB write/update components and just run the part where the join happens? Does that massively speed things up? If so (as I have seen before), this could very well be an insert/update issue. Also, are you inserting or updating... or both?

Anonymous
Not applicable
Author

I just tested the joining part with the update/insert disconnected. I think the bottleneck is in the joining part.

[screenshot: job run with update/insert disconnected]

Anonymous
Not applicable
Author

OK. One place you might get a slight improvement is by removing the joins on the columns you are already filtering your lookup on. They are not needed: the lookup will only return rows that hold the same values as the incoming row in your main flow. Therefore you can remove those columns from the query and from the join. This won't resolve the issue on its own, but it will be more efficient.


However, before doing that, are you sure that reloading the lookup on every row is the most efficient approach? Sometimes it is, without question, but if your main flow is 200,000 rows, that is 200,000 queries fired at the DB. I assume you have probably looked into this, but it is worth considering if you haven't... and it is easier to test before removing the joins I suggested above.
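Just as a rough illustration of why this matters (the 5 ms figure is invented, not measured): at ~5 ms of round-trip latency per query, 200,000 queries × 5 ms ≈ 1,000 s, i.e. roughly 17 minutes spent on query latency alone, before any rows are processed. That is already in the region of the 20 minutes you are seeing.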

 

 

Anonymous
Not applicable
Author

I tried removing the joins, but it doesn't look like it is any more efficient.

[screenshots: tMap after removing the joins]

And I think I have to reload the lookup on every row because the lookup table has 428,634,831 rows. Otherwise it would be even slower.

Anonymous
Not applicable
Author

OK, do you know how many rows come back each time the lookup query fires? If you are going to fire 200,000 queries, you should in theory only return the rows you actually need. If you are returning more rows and having to filter them in Java, maybe you can tighten the filtering in your query?

Another solution will require some playing around, but may just work. I suspect that you can preemptively find some values to filter your lookup query on, and then carry out your lookup once. To do this you will need to change the job to dump your main flow into a tHash component (store it in memory). You have some columns that you need to join on or filter by; if you load the main flow first, you *may* be able to find some values to filter the lookup with. For example, let's say your lookup data has a column called "alphabet" which holds every letter from a to z, but your main flow only ever contains "a", "g" and "y". If you know this before running the lookup query, you can add an "IN" filter to the query and pass in the comma-separated list 'a','g','y'. That alone might remove three quarters of your lookup data. If you are only loading the lookup once, and into memory, you lose the latency of firing a query for every row, and the number of rows to be checked each time is also massively reduced. There is a sketch of the pattern below.
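To make that concrete, here is a minimal standalone Java sketch of the two-pass pattern (all connection details, table and column names are invented; in a real Talend job the first pass would be a tHashOutput plus an aggregate feeding the distinct keys into globalMap, and I am using a single connection for brevity even though your two tables live in different databases):

import java.sql.*;
import java.util.*;
import java.util.stream.Collectors;

public class PrefilteredLookup {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/demo", "user", "pass")) {

            // Pass 1: scan the main flow once and collect the distinct join keys
            // (in Talend: tHashOutput + an aggregate, or a tJavaFlex building a Set).
            Set<String> keys = new HashSet<>();
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(
                         "SELECT DISTINCT customer_id FROM main_table")) {
                while (rs.next()) {
                    keys.add(rs.getString(1));
                }
            }

            // Build the IN list, e.g. 'a','g','y'. Quoting is naive here; real
            // code should escape properly or use a temporary table for huge lists.
            String inList = keys.stream()
                    .map(k -> "'" + k.replace("'", "''") + "'")
                    .collect(Collectors.joining(","));

            // Pass 2: load the lookup ONCE, restricted to keys that actually occur,
            // instead of firing one lookup query per main-flow row.
            Map<String, String> lookup = new HashMap<>();
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(
                         "SELECT customer_id, lookup_value FROM lookup_table"
                         + " WHERE customer_id IN (" + inList + ")")) {
                while (rs.next()) {
                    lookup.put(rs.getString(1), rs.getString(2));
                }
            }

            // The main flow can now be joined against the in-memory map.
            System.out.println("Loaded " + lookup.size() + " lookup rows");
        }
    }
}

If the distinct-key list runs to many thousands of values, most databases cope better with a temporary table joined in the query than with an enormous IN clause.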

 

Another thing I have just noticed: do you really need the "All matches" match model on the lookup? If you don't, switch it to "Unique match" or "First match". "All matches" means all lookup data has to be checked for every incoming row.