guilherme-matte
Partner - Creator

MS-CDC SQL to Azure not applying changes

Hello guys!

I'm having issues with Qlik Replicate and would like to get some insights on it.

We have a MS-CDC SQL on premises configured as source and the target is an Azure Hyperscale database.

We have a job scheduled for a full load at 2 AM, which executes flawlessly and is followed by CDC activation.

The issue is that at around 6 AM, Qlik Replicate stops applying any changes to the target and accumulates them on disk, while the Azure SQL DB's CPU goes to 100% non-stop.

Below are a few images:

This Azure DB is dedicated to this Qlik Replicate task. You can clearly see the 2 AM full load; at 6 AM the non-stop CPU utilization begins and changes are no longer applied.

guilhermematte_0-1683677602455.png

Replicate side:

guilhermematte_1-1683677719352.png

 

It happens every day around 6 AM.

guilhermematte_2-1683677739928.png

 

On the source side, a few jobs run on the server between 2 AM and 6 AM; could these be responsible for breaking the task? Is there any kind of configuration that could bypass this? I know that some tables are truncated and recreated on the source side (in case that helps in understanding the root cause).

There are no logs or messages that shed light on it. The task seems to be running normally, but something here might be causing the Azure issues that stop CDC changes from being applied.

 

Cheers

 

 


7 Replies
Heinvandenheuvel
Specialist III

Seems to me you are misinterpreting what you are seeing.

>> Replicate stops applying any changes to the Target.

What makes you say that? It did not stop! It is just waiting for the target to finish before submitting the next change request, but the target is never going to complete that change; it is stuck in its loop.

>> but would this be the responsible for breaking the Task?

The task is not broken. The target fails to perform the requested applies; it is trying, but spinning its wheels. Replicate gave it work and it never responded. Your debug task is to enable detailed (verbose) TARGET_APPLY logging in Replicate to hopefully catch what it is asking the DB to do, and at the same time ask the DB for the active top-CPU query. Do those two match, and does that query give a clue as to why it never finishes?

Can you run that query in isolation (after taking a copy of the CSV file or the attrep_changes table, perhaps)? I'm not saying Replicate might not be doing something wrong, generating a wrong query, but for now Replicate is right to wait for the DB to finish what it was asked to do.

>> the task seems to be running normally, 

Correct.

>> Azure issues which stop receiving CDC changes.

It didn't stop receiving; it failed to finish the last request, as it is still working on it.

>> ? I know that some tables are truncated and recreated 

That might be trivial on the source, but from the graphs you provided it looks like 40M changes. At, say (just a WAG), 10,000 changes/second, that would be 4,000 seconds, over an hour of work. Not instant, but not the 4+ hours you appear to see either.
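Spelled out (both numbers are rough estimates from the discussion, not measurements):

```python
changes = 40_000_000   # approximate change volume read off the graphs
rate = 10_000          # guessed apply rate, changes per second (a WAG)
seconds = changes / rate
hours = seconds / 3600
print(seconds, round(hours, 1))  # 4000.0 1.1
```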

Good luck,

Hein

 

guilherme-matte
Partner - Creator
Author

Hello Hein

Thanks for the reply. I've tried to look a bit deeper into Azure, and it seems this query is responsible for the whole thing:

(@P1 uniqueidentifier, @P2 varchar(36), @P3 int, @P4 int, @P5 int, @P6 int,
 @P7 float, @P8 float, @P9 datetime2(7), @P10 datetime2(7),
 @P11 nvarchar(4000), @P12 nvarchar(4000))

DELETE FROM [dbo].[fact_devicereadings]
WHERE ([DeviceKey]=@P1    or ([DeviceKey]   is NULL and CAST ( @P2  AS VARCHAR(36) )    is NULL))
  AND ([DateKey]=@P3      or ([DateKey]     is NULL and CAST ( @P4  AS INT )            is NULL))
  AND ([TimeKey]=@P5      or ([TimeKey]     is NULL and CAST ( @P6  AS INT )            is NULL))
  AND ([Value]=@P7        or ([Value]       is NULL and CAST ( @P8  AS FLOAT )          is NULL))
  AND ([DateTime]=@P9     or ([DateTime]    is NULL and CAST ( @P10 AS DATETIME2(7) )   is NULL))
  AND ([ReadingType]=@P11 or ([ReadingType] is NULL and CAST ( @P12 AS NVARCHAR(4000) ) is NULL))

 

And speaking with the client: this table's content is deleted and re-inserted every morning, so there is a DELETE FROM / INSERT INTO statement. The target probably got stuck at the DELETE FROM.

guilhermematte_0-1683687132871.png

 

Does this help?

Since it only happens at 6 AM, I will run a full load earlier in the morning with verbose logging enabled to see if we get more information. Will keep in touch.

Thanks again,

Matte

Heinvandenheuvel
Specialist III

No need for a verbose log from Replicate - you have the query causing the issue from the DB. Good work.

It appears that the table in question, [dbo].[fact_devicereadings], does NOT have a proper primary key, only a unique key that allows NULLs. I suspect the DB was doing a table scan for each change, because a table scan is needed to find a NULL value: the NULL-tolerant predicate cannot be satisfied by an index seek.
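The effect is easy to reproduce in miniature. Below is a SQLite stand-in (an assumption for illustration; SQL Server's optimizer differs in detail, but the seek-versus-scan distinction is the same idea), with simplified column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE readings_no_pk
                (DeviceKey TEXT, DateKey INTEGER, Value REAL)""")
conn.execute("""CREATE TABLE readings_pk
                (ID INTEGER PRIMARY KEY, DeviceKey TEXT, DateKey INTEGER, Value REAL)""")

# Without a PK, Replicate identifies the row by matching every column while
# tolerating NULLs -- no index can satisfy that, so each DELETE scans the table.
plan_no_pk = conn.execute("""EXPLAIN QUERY PLAN
    DELETE FROM readings_no_pk
    WHERE (DeviceKey = ? OR (DeviceKey IS NULL AND ? IS NULL))
      AND (DateKey   = ? OR (DateKey   IS NULL AND ? IS NULL))
      AND (Value     = ? OR (Value     IS NULL AND ? IS NULL))""",
    ("d1", "d1", 1, 1, 0.5, 0.5)).fetchall()
print(plan_no_pk[0][3])  # e.g. "SCAN readings_no_pk"

# With a real primary key, the same logical delete becomes a single index seek.
plan_pk = conn.execute(
    "EXPLAIN QUERY PLAN DELETE FROM readings_pk WHERE ID = ?", (42,)).fetchall()
print(plan_pk[0][3])     # e.g. "SEARCH readings_pk USING INTEGER PRIMARY KEY (rowid=?)"
```

The exact plan wording varies by SQLite version, but the no-PK delete always reports a SCAN and the keyed delete a SEARCH.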

Just clone this table in DEV, fill it with (real) sample values, and test. Set the CDC task setting "Limit the number of changes applied per change processing statement to" a low number like 100 or 1,000, see how long each chunk takes in the reptask log (with TARGET_APPLY set to DEBUG), and compare with the target DB's top-SQL statistics.

You wrote "tables are truncated": was it really a proper TRUNCATE, or perhaps a full/partial table DELETE? Could they use a TRUNCATE on the source if a DELETE is currently being used? (Maybe this is a consequence of using MS-CDC; I don't know.)

Can they change to a real primary key?

As it is, you may want to exclude the table from the CDC task and put it into a full-load-only task run after the source-side daily reload (at 6 AM).

Hope this helps some,

Hein.

guilherme-matte
Partner - Creator
Author

Hello Hein

They have added an ID as a primary key to the table. I've also isolated it in a single task for better troubleshooting, but still, whenever it gets to the reload part for that table, latency builds up and nothing happens.

But now the query using the CPU has changed from a DELETE to an INSERT:

(@P1 int, @P2 int)
INSERT INTO [dbo].[fact_devicereadings]
    ([ReadingType],[DateTime],[DeviceKey],[Value],[DateKey],[TimeKey],[ID])
SELECT
    CAST ( [attrep_changes391EBA80F118EA57].[col1] as nvarchar(4000)) COLLATE Latin1_General_CI_AS ,
    CAST ( [attrep_changes391EBA80F118EA57].[col2] as datetime2(7)) ,
    CAST ( [attrep_changes391EBA80F118EA57].[col3] as uniqueidentifier) ,
    CAST ( [attrep_changes391EBA80F118EA57].[col4] as float) ,
    CAST ( [attrep_changes391EBA80F118EA57].[col5] as int) ,
    CAST ( [attrep_changes391EBA80F118EA57].[col6] as int) ,
    CAST ( [attrep_changes391EBA80F118EA57].[col7] as int)
FROM [attrep_changes391EBA80F118EA57]
WHERE [attrep_changes391EBA80F118EA57].[seq] >= @P1
  AND [attrep_changes391EBA80F118EA57].[seq] <= @P2
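For context, that statement is Replicate's batch-apply pattern: changes are staged in the attrep_changes net-changes table and flushed to the target in seq ranges. A minimal SQLite sketch of the same pattern (table and column names simplified for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, value REAL)")
# stand-in for Replicate's attrep_changes staging table
conn.execute("CREATE TABLE attrep_changes (seq INTEGER, col1 INTEGER, col2 REAL)")
conn.executemany("INSERT INTO attrep_changes VALUES (?, ?, ?)",
                 [(i, i, i * 0.5) for i in range(10)])

# Batch apply: one INSERT ... SELECT over a seq range, as in the query above
lo, hi = 0, 4
conn.execute("""INSERT INTO target (id, value)
                SELECT col1, col2 FROM attrep_changes
                WHERE seq >= ? AND seq <= ?""", (lo, hi))
applied = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(applied)  # 5
```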

 

They are not using a TRUNCATE on the source but a DELETE statement (would a TRUNCATE work with MS-CDC? I thought it was one of its limitations).

Would scheduling a reload of this table after they run their delete/insert on the source be a solution?

Kind regards

guilherme-matte
Partner - Creator
Author

I've separated the table into another task, and apparently it ran alright this morning. The primary key might have solved the problem? Let me add the table back to the regular task and wait for the next morning; I will keep you posted and mark the answer. Weirdly enough, on Friday we still had the issue AND the primary key in place.

Heinvandenheuvel
Specialist III

>> They are not using a TRUNCATE on the source, but a DELETE statement (would a Truncate work with MS-CDC? I thought it was one of the limitations)

Right, that's what I thought. The DELETE will come in and is treated as one delete per row. Without a PK this is a table scan per row, which is very slow. With a PK it is still slow, but better.

Indeed, truncate may well be an MS-CDC restriction. I don't use/understand MS-CDC so I don't know for a fact.

Another thing to consider is to pre-delete (or indeed truncate) the rows on the target manually (scripted, of course). You can possibly even do this on the fly when you see it is 'stuck', to put it out of its misery; those table scans will still happen, but the table will be empty. Much easier :-).
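A miniature of that pre-delete idea (SQLite as a hypothetical stand-in for the Azure target; names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_devicereadings (DeviceKey TEXT, Value REAL)")
conn.executemany("INSERT INTO fact_devicereadings VALUES (?, ?)",
                 [(f"d{i}", i * 0.1) for i in range(1000)])

# Scripted pre-delete: empty the target in one statement before the
# replicated row-by-row deletes arrive.
conn.execute("DELETE FROM fact_devicereadings")

# The per-row deletes Replicate issues afterwards still execute (and may
# still scan), but against an empty table each one returns immediately.
cur = conn.execute(
    "DELETE FROM fact_devicereadings "
    "WHERE (DeviceKey = ? OR (DeviceKey IS NULL AND ? IS NULL))",
    ("d1", "d1"))
print(cur.rowcount)  # 0
```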

Hein.

guilherme-matte
Partner - Creator
Author

Perfectly put. I've just checked the isolated table today, and it does indeed seem to be a slow delete even with the PK in place: it takes about 3 hours to scan and delete the 30 million rows. I'm going to tweak the configs a bit and maybe schedule a reload for those DELETE/INSERT tables from the task, or maybe schedule a script as you proposed!

Anyway, this issue seems to be resolved. Thank you for the help and input on the matter.

Kind Regards!