- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Qlik Replicate: Implementing UPSERT and MERGE modes by applying a Conflicts Handling Policy
This video demonstrates how to implement UPSERT and MERGE concept in Qlik Replicate by applying Conflicts Handling Policy settings to tasks.
Transcript
ello my name is Michael Litz i work in the Qlik qdi support group today i want to talk about upsert merge which is located in the error handling apply conflicts section of the task four key points i want to go over today are the first one default setting is to log the record to the exceptions table so unless you enable upsert merge all we're going to do is log a record an error if you will to the exceptions table that'll be trying to insert a record and there's a duplicate key on the target we'll log that record to the exceptions table or if you're doing an update and there's no record found we'll log that record to the exceptions table and the task log file will show a warning that there were unprocessed changes the second key point is when upsert merge is enabled it will essentially update a duplicate or insert a missing record so if there's a duplicate key when applying an insert and you're in upsert merge mode you've selected and told the task to update the existing target record so that insert won't fail you won't get an error message you will just flip that insert statement to be an update and you'll update the existing record essentially leaving in place then if there's no record found for applying an update instead of throwing an error the task will switch the update statement to be an insert statement and will insert that record into the target our third key point is change processing mode there are two ways tasks run either in batch apply or transactional apply and the way up certain merge works is radically different depending on which of these modes you're in in batch apply mode the way the upcert merge works it's an unconditional delete and insert of every record in the batch so the task will not try to do an insert and see if there's a primary key or duplicate key out there it'll just delete everything and insert everything and even with an update it won't try to update a record and see that it fails because there's no record in the target it will just delete everything in the batch and then change all the updates to inserts essentially when you're in batch apply mode you never get an update on a record all you get is delete and then insert of a record so it's a brand new record as far as the target's concerned and fourth key concept is when you're in the transaction apply mode upset merge works more like try and catch so what's nice about this is an update statement will actually be issued to the target and if there is a record that record will be updated if there's no record the update statement gets switched to an insert and that record is now inserted where it was missing same sort of thing with an insert statement it'll try to do the insert if there's no primary key error then the insert happens like normal if there is a primary key error a record already exists with the primary key that you're trying to insert again then the task will switch that insert statement to be an update statement and it will update the existing record essentially leaving it in place but updating it with the new information from the insert so what i'd like to do now is switch over show you the database i'm going to be working with and then run through three scenarios of upsert merge one of them in default mode second scenario will be in batch apply mode and the third scenario will be running in transactional apply and we'll look at the results we get in the target database for each and every one of these scenarios okay so let's have a look at our source database and our customer table which i'm going to be using in my examples today it's pretty simple table has cusk key customer name and there are five records the target table has got two additional fields an update time and a create date these two fields are going to make it easier to demonstrate the differences when a task is running in transactional mode or batch mode and has upset merge enabled remember it's significantly different the way the task implements that functionality for the create date i've decided to put in a default value right in the target database it's a function called getdate and this function will only run when a record is inserted into the target the updatetime field is in the target database but i'll populate it using a transformation inside the task which i'll be showing you shortly and the last table on the target that we're going to be looking at is the apply exceptions table because when i run my first pass in default mode where i log errors or exceptions i'm going to do an insert that will fail and i'll do an update that'll fail and we'll take a look at those logged records in the apply exceptions table so let's go have a look over at the task okay here we are back in the task i've called it upsert merge i'll open it up and we can review the settings it's sequel to sql i want to go into task settings i want to get down to error handling and i want to look at apply conflicts this is where we would enable upset merge the default if you remember is to just log record to the exception table and log record to the exception table so our default is to actually alert or warn about the errors when we try to apply an insert and there's a duplicate key or we try to apply an update and there is no record found out there i'll say okay to this the other thing i just want to briefly touch on is the customer table you'll remember i mentioned that in the update time i decided to populate that field from inside of a transformation i'm using a operation indicator and this middle parameter is where you would set what you want to get written to the field when the operation is an update and you can see that if i just load this up over there you can see the first parameter is value on a delete second is value on update and the last one is value on insert so what i've got on value on update is to put a date time out there so we know when the record was updated and remember the create date that's set with the default value out in the actual target database all right so i'll say okay to this i'll say okay to this i'll save my changes and now i'm going to go reload my target and that completed really quick as you remember there's only five rows in the source so i want to go back over the database and just take a quick look at our source and target and then we'll apply some updates to rows that do not exist and some inserts to rows that already do exist that'll generate an error in the log and we'll also be able to look at the error messages in the apply exceptions table okay just to review on our source we had five records we just ran a full load so on our target we should now have the same five records and we've got our create date that triggered showing us they're all at the same create date and nothing's in this exceptions table we haven't done anything wrong yet no bad records were tried to be applied so give me a minute and i'm going to mismatch the source and the target so that we end up with the ability to insert a row that already exists and then update a row that doesn't exist and we'll see all the test behaves when it's in a default log records only mode let me go back and make sure my task is stopped very good so first thing i want to do is i'm going to delete my fifth row on my source then i'm going to come out to my target
and i'm going to delete my second row and i'll delete my fourth row
at this point i'm going to go back into the task i'm going to get the task up and running i'm going to do an advanced run option let it know that the tables are already loaded i'm going to basically skip over the t log events that were just recorded on the source because i don't want to play those i want to leave my database in a mismatched fashion so i'll say okay to the run it's starting up and now i'll go back out to the source database and the first thing i'm going to do is insert record number five which is not on the source but it is on the target and i'll put in the words um number five and mike and then i want to go over to record number two which is not in the target remember i deleted it and i'm going to change this record and next thing we're going to do is take a look at what we've got in the target i expect there to be actually no changes i've got record one three and five so i tried to insert a five i didn't change the record i didn't do anything to five i generated an error in the task we'll see that in a minute and then i tried to update record number two but there was no number two out here so the update failed so let's go have a quick look at what we see in the task and you'll notice over here in the log messages that we got just innocuous little informational warning source changes that would have had no impact were not applied to the target database refer to the att rep apply exceptions table for details so let's go back out to that database and let's look at this att rep apply exceptions table and we're going to see two errors out here that got logged the first error was that insert remember i tried to insert row number five and there's the insert into statement the value number five i tried to change ted to mike and this failed the task was set to log errors so it logged this record in the apply exceptions table and the same sort of thing happened with our update well before i do that i would like to show the actual sql error message for our insert and you can see here that in this sql it's a native error passed back to the task it says there is a violation of primary key constraint remember record 5 already existed in the target didn't exist in the source and it said it cannot insert duplicate key in the object so that all makes total sense and that's because we're logging the errors and if we look at the error code for the update statement it just said zero rows are affected because there wasn't a record out there nothing could be found the update statement affected nothing we got the zero rows affected so that's what to expect and what will happen when your task is set to default mode so what i'd like to do now is go back over to the task and set it up so that i've enabled upsert merge and i'm going to set it up to run in batch mode first and uh let's go take a look at that okay i'm back in the task and what i want to do right now is enable upset merge so once again i go back into error handling apply conflicts and i'm going to set the log record to update and i'm going to set this one to insert those two settings are what enable upset merge to work so now when i insert a row that already exists i'll actually be updating the row when i update a row that doesn't exist i'll be inserting the row and one other thing i want to show you here is under the same task settings if we go to change processing and we look at change processing tuning you'll see in this case i'm working under batch optimized apply remember this is the setting that tells replicate to go out and delete any records that are being updated or inserted any records in the batch delete them and then just do an insert and we'll see how that works out by looking at a create dates in the target so i'll say okay to this i'll say save here and i want to run the task in reload mode or full load and i'm going to repopulate so my targets are identical and now i'll stop my task and we'll go over and take a look at the source and target database and we'll apply update and inserts that will not fail this time they will actually translate to a delete insert and a delete update pair alrighty and back in our database we have the same five rows in the source which i just ran a full load so these same five pros will be over in the target so what i want to do right now is create the same mismatch that i had before so i can demonstrate the same insert and the same update statements and we'll be able to look at the target and see what happened during those updates and inserts because remember now we're in upsert mode running under batch apply okay so what i've done now is i've gone in and i've created mismatches like i did before on the source i deleted record number five and out on the target i deleted records two and four so we're in that same mismatch state record five already exists i'm going to try to insert it and we'll get a demonstration of how upset merge works i'm going to also try to update record number two and i'm going to update record number three and we'll like to look at the correct dates and the update times as these go through so first off let me do the insert of an existing row i'll insert five call it mike i'm going to update amy to be holly and remember that row doesn't exist in the target so that should be inserted and i'm on update three which does exist and i'm going to update three so that you can see what happens when a record does exist and the update goes through normal so if we go out and execute over here we do a refresh we'll notice that record number two that did not exist before now says holly remember over on the source i change it to holly and you'll notice that the create date is different and there is an update time and the reason that is remember in batch mode we do a delete and an insert but since it was an update statement coming in we we populated this in the task the update time but we did a brand new insert of the record that's the key point i'm trying to illustrate here this record was not just updated this record was created on record number three we change it to john you'll notice there's an update time and you'll notice that this record even though he already existed was also created freshly and then obviously with record number five it was a primary key violation it should have failed but in ups merge mode it didn't fail and what we got there was a brand new record created with the new value so let's configure the test for the final run upsert merge in transactional apply mode we'll want to go to task settings under change processing change process tuning and we're going to switch from batch optimized apply to transaction apply i'll save that and i'll start up my task
all right the task is running let's go over to the database side and we'll make a couple of changes and we'll see the results in our target table all right let's have one more look at our mismatch database on the source side i have got rows one two three and four and i do not have a row five on my target side i only have rows one three and five so just like before i'm going to do an insert for row five and it already exists on the target but that's okay we're in upset merge mode so what will happen now is this record will be actually updated transaction mode if you remember works a little differently with upsert merge we don't just do a unconditional delete and insert we actually do a try catch we look for the record if we see the record then we switch the statement to whatever it has to be switched to in this case an insert will get switched to an update and now let's go out and update the missing row in the target and i'll update an existing row in the target
okay so there are updates let's go have a quick look in our target so here's the key thing to look at we updated holly this row this second row didn't exist so the task came out and it attempted to update a row that didn't exist and because it didn't exist it had to create it so that makes sense we have a different create date but because we're in transactional apply mode upsert merge doesn't just do a delete followed by an insert so when we updated row 3 we changed the name to Henry it did not recreate the record it left the existing record once again this is a significant difference in upsert merge between batch apply and transactional mode patch apply unconditional delete and insert transaction apply we do a try catch and on the insert of what should have been a duplicate row because five already existed out in the target it didn't go ahead and delete it and recreate it it just did an update to it so we still have the same create date i hope that clarifies the difference between patch apply and transactional mode when you're running in upsert merge configuration thank you very much!