Jacco_De_Zwart
Creator

Update source table in job

Hi there,

I think I'm doing something wrong, but I don't know what.

Here is my job

[screenshot: job design]

tDBInput_1 does a full select of the source table.

The tMap component routes the columns to the components HashTheKey (which produces a hash over the two key columns) and HashTheRow (which produces a hash over all the columns).

At the end I want to update the selected rows with the two new columns, HashKey and HashRow. This update goes to the same table that is used in tDBInput_1.

The table in the database has primary key columns defined, including a clustered index on those fields. Throughout the job those fields are marked as key.
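
For context, what the two hash routines do is roughly this (a simplified sketch, not the actual routine code; the column values are just examples):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashRoutines {

    // Hash any number of column values with MD5; nulls and separators are
    // encoded so that adjacent values cannot run together and collide.
    public static String md5Of(Object... columns) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (Object col : columns) {
                String piece = (col == null) ? "\u0000" : col.toString() + "\u0001";
                md.update(piece.getBytes(StandardCharsets.UTF_8));
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // HashKey: hash over the two key columns only (example values).
        String hashKey = md5Of("CUST001", "NL");
        // HashRow: hash over every column of the row (example values).
        String hashRow = md5Of("CUST001", "NL", "Jacco", 42, null);
        System.out.println(hashKey + " / " + hashRow);
    }
}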

After waiting quite a while (more than 1.5 minutes) the execution looks like this:

[screenshot: job execution showing row counts]

My questions are:

1. Why are only 1000 rows selected?

2. Why does it take so long to update just 1000 rows?

3. Is there a workaround?

4. Does anybody know what mistake I'm making?

Please help!

Thanks in advance.

Jacco

7 Replies
Jacco_De_Zwart
Creator
Author

Okay, answering my own question: I found out why only 1000 rows seem to be selected. It has to do with the Batch Size of the tDBOutput component. I changed that to 10000 and now 10000 rows are selected.
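
In JDBC terms the batch size behaves roughly like this (my own illustrative sketch, not the code Talend generates; the connection string and table/column names are made up). Nothing reaches the database until a full batch has been collected, which is why the row counter moves in batch-size steps:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;

public class BatchUpdateSketch {
    public static void main(String[] args) throws Exception {
        int batchSize = 10000; // mirrors the tDBOutput "Batch Size" advanced setting
        // Dummy rows standing in for the flow coming out of the tMap:
        // { HashKey, HashRow, Key1, Key2 }
        List<String[]> rows = List.of(
                new String[] {"hk1", "hr1", "A", "1"},
                new String[] {"hk2", "hr2", "B", "2"});
        try (Connection con = DriverManager.getConnection(
                     "jdbc:sqlserver://localhost;databaseName=demo", "user", "pass");
             PreparedStatement ps = con.prepareStatement(
                     "UPDATE my_table SET HashKey = ?, HashRow = ? WHERE Key1 = ? AND Key2 = ?")) {
            int buffered = 0;
            for (String[] row : rows) {
                for (int i = 0; i < row.length; i++) ps.setString(i + 1, row[i]);
                ps.addBatch();
                if (++buffered == batchSize) { // nothing is sent until a batch fills
                    ps.executeBatch();
                    buffered = 0;
                }
            }
            if (buffered > 0) ps.executeBatch(); // flush the remainder
        }
    }
}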

But the problem of the long wait still remains 😔

Anonymous
Not applicable

Well spotted regarding the commit. Sorry I didn't see this earlier.

 

About the time it takes: can you tell whether the time is spent reading the data or writing it? You can test this by replacing the output with a tLogRow and running the job to see how fast the data is written to the output window. If it is definitely the writing, could you show us your output DB component settings?

 

I should say that you will likely get better performance if you allow the DB components to create their own connections and let the output component commit by itself. Going by your job design, it doesn't look like you need to wait until the end to commit. This might be worth a try.

Jacco_De_Zwart
Creator
Author

Thanks for reply !

I tried to strip the job down to its essence. It looks like this now:

[screenshot: stripped-down job]

When executing the job, it stalls after the first 10000 rows, as shown in the screenshot.

 

After waiting half a minute the next batch goes through, then another long wait, and then the following 10000.

 

When I look at the Activity Monitor in SQL Server Management Studio, I see that the insert is an expensive query. The execution plan looks like this:

[screenshot: execution plan]

Using an index on the columns, the execution plan looks like this:

[screenshot: execution plan with the index]

To me it is strange that the index is being updated, so now I'm lost.

 

Jacco

 

Anonymous
Not applicable

Just so that I understand: I take it that you have set the component to commit by itself here and to use its own connection? If so, did you see any improvement?

 

Also, how big is the table being updated? Updates do generally take a bit longer, but from what you said I presume you already knew that. How often is the table reindexed? Your plans do look expensive for a simple update. Do you have similar performance issues if you replicate the updates using just SQL Server (i.e., without Talend)?

 

Other things you can try: modify the batch size (advanced settings), and also, are you using the "update or insert" or "insert or update" option for the table action? That option can put a pretty heavy load on the job if the table is massive. In that case it is more efficient to determine whether a row is an insert or an update before loading it, and then use two DB components, one to insert and one to update, roughly as sketched below.
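
Outside of the Talend components, the idea looks roughly like this (an illustrative sketch only; the keys, rows and composite-key format are invented):

import java.util.List;
import java.util.Set;

public class SplitInsertUpdate {
    public static void main(String[] args) {
        // Keys already present in the target table (in Talend this would be a lookup flow).
        Set<String> existingKeys = Set.of("A|1", "B|2");

        // Incoming rows: { Key1, Key2, HashRow }
        List<String[]> incoming = List.of(
                new String[] {"A", "1", "hash-a"},   // key exists -> update
                new String[] {"C", "3", "hash-c"});  // new key    -> insert

        for (String[] row : incoming) {
            String key = row[0] + "|" + row[1];
            if (existingKeys.contains(key)) {
                System.out.println("route to UPDATE output: " + String.join(",", row));
            } else {
                System.out.println("route to INSERT output: " + String.join(",", row));
            }
        }
    }
}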

Jacco_De_Zwart
Creator
Author

Yes, the components have their own connection.

 

Right now the table holds about 900,000 records, but this is the test phase. In production there will be an estimated 5 million records in the table, which need to be updated with a hash over the key fields and a hash over the whole record.

 

The table is not reindexed often. The index is created only once, after the initial insert of all the records.

 

Could it be an issue that the index is on two character fields (one is char, the other varchar)? I will test whether an integer index speeds things up a bit.

 

I will also check whether I have the same performance issue with another job doing the same actions.

Jacco_De_Zwart
Creator
Author (Accepted Solution)

I think I've got it!!

 

I changed the initial load of the table and added a new identity field with auto increment.

Then I created a clustered index on the new ID field.

 

I used the ID field as the key in my job, and... it went from 145 rows/s to 8192 rows/s!!
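
In code terms, what I did comes down to something like this (a sketch with made-up table and column names; note that the old clustered primary key first has to be rebuilt as nonclustered, because a table can only have one clustered index):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AddIdentityKey {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                     "jdbc:sqlserver://localhost;databaseName=demo", "user", "pass");
             Statement st = con.createStatement()) {
            // New surrogate key, filled automatically for existing and new rows.
            st.execute("ALTER TABLE my_table ADD Id BIGINT IDENTITY(1,1) NOT NULL");
            // Cluster the table on the ID so every UPDATE ... WHERE Id = ? is a cheap seek
            // (assumes the old clustered index has already been rebuilt as nonclustered).
            st.execute("CREATE UNIQUE CLUSTERED INDEX IX_my_table_Id ON my_table (Id)");
        }
        // In the job the Id column is then the only field marked as key, so the
        // output component issues: UPDATE my_table SET HashKey = ?, HashRow = ? WHERE Id = ?
    }
}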

[screenshot: job running at 8192 rows/s]

Thanks for the hints!!

 

Jacco

Anonymous
Not applicable

I think you did all the work there 😉

 

Glad you got it sorted and thanks for coming back to give your solution!