Hi Folks
I have a requirement to avoid dirty reads of a database upon reload. We have an ETL process which is fired off by a trigger; this updates a flag that forces a series of QVW refreshes. At the start of the QVW refresh the flag is reset, and it is entirely possible that the ETL process will kick off again as a result of new transactions in the D/B.
This means that it is possible the D/B will be updated during the QVW reload process, and therefore, dirty reads are possible.
Does anybody know if there is a way to take a D/B snapshot inside the QVW script prior to launching the load?
Thanks for any help,
So far as I know, the only way to get a clean read is via your database management system. You have to tell the DBMS itself that you want a clean read. It should then take the appropriate locks on the data, or make a separate copy and iteratively process updates that occurred while it was reading, or however the DBMS wants to handle it.
In DB2, for instance, I believe you have to specify an isolation level of RR for "repeatable read" to get a clean read. We're doing dirty reads, and use an isolation level of CS for "cursor stability".
Actually, there may be a way to get a "clean-enough" read even when technically doing dirty reads. In one of my systems (not QlikView), I MUST have a complete and consistent set of data. I NEED a clean snapshot. Yet by policy, we NEVER use repeatable read. I'm sure I could have convinced someone that this was an exception to the policy, but instead, I read in the data iteratively. I read all of the data with a dirty read. Then I looked for data changes since the cursor was opened. I reprocessed everything that was affected by these changes. Then I looked for data changes since THAT cursor was opened. This process continued until I detected that NO data had changed during the most recent iteration. I then knew that my internal data set was a clean snapshot of the real database. Theoretically, this process might never complete. In practice, in this particular program, three or four iterations is typical, each iteration much faster than the last.
I would think you could do the same thing with an iterative loop in QlikView. Do a dirty read, then keep processing changes until you detect that nothing has changed since the last round of changes. Your database, of course, has to be set up in a way that lets you detect changes. Typically, our databases have a last-update timestamp on them. So I would read in all records with a timestamp >= when I opened the previous cursor, replacing the records in my existing data set. If deletes are actual deletes rather than a status change, you could inner join back to all of the IDs in the database, but that might be slow enough to risk never finishing the load. You could create a separate table of deleted IDs with a timestamp and an index on the timestamp, and then purge it after a couple of weeks, but that would be a lot of trouble. Hopefully you don't have deletes.
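For what it's worth, here is a minimal sketch of what that loop might look like in QlikView script. All names are hypothetical (a source table dbo.Facts with key Id and a LastUpdate timestamp, staged through Facts.qvd), and the iteration cap and timestamp handling would need adjusting to your own environment:

LET vCutoff = Timestamp(Now(), 'YYYY-MM-DD hh:mm:ss');

Facts:
SQL SELECT Id, Amount, LastUpdate FROM dbo.Facts;  // initial (dirty) read
STORE Facts INTO [Facts.qvd] (qvd);
DROP TABLE Facts;

SET vLoops = 0;
DO
    LET vPrev   = '$(vCutoff)';
    LET vCutoff = Timestamp(Now(), 'YYYY-MM-DD hh:mm:ss');

    Delta:
    SQL SELECT Id, Amount, LastUpdate FROM dbo.Facts
    WHERE LastUpdate >= '$(vPrev)';

    LET vChanged = Alt(NoOfRows('Delta'), 0);   // 0 if nothing came back

    IF $(vChanged) > 0 THEN
        // rows whose keys were not re-read come from the previous snapshot
        Concatenate (Delta)
        LOAD * FROM [Facts.qvd] (qvd) WHERE NOT Exists(Id);
        STORE Delta INTO [Facts.qvd] (qvd);
        DROP TABLE Delta;
    END IF

    LET vLoops = $(vLoops) + 1;
LOOP WHILE $(vChanged) > 0 AND $(vLoops) < 10   // cap it rather than loop forever

Facts:
LOAD * FROM [Facts.qvd] (qvd);   // clean snapshot, barring the caveats above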
Or maybe I'm way off base, or someone has a better idea.
If the source is a transaction processing system, enforcing a lock at the DB level during the ETL load may not be practical. So I also feel that the ideal way is to handle it with QV logic. I'm adopting two strategies to minimize the impact of this:
1. Refresh the most active fact tables during the best off-peak time slot (mostly in the early-morning nightly schedule).
2. After the full & incremental loads, I trigger the incremental load once again (my incremental loads are based on timestamps in the source transaction system). So the fallout percentage is minimal or negligible, because the second partial load finishes its fetch from the DB in a few seconds, against the first incremental load which used to take minutes. (A sketch of this double pass follows below.)
One thing: I am compromising on deletions during the incremental load to compensate for the performance overhead.
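A sketch of that double pass, with hypothetical names throughout (an Orders source table with key OrderId and a LastModified timestamp, staged through Orders.qvd); the second CALL typically finds only the handful of rows committed while the first one ran:

SUB DoIncrementalLoad
    LET vThisLoad = Timestamp(Now(), 'YYYY-MM-DD hh:mm:ss');

    Delta:
    SQL SELECT OrderId, Amount, LastModified FROM dbo.Orders
    WHERE LastModified >= '$(vLastLoad)';

    IF Alt(NoOfRows('Delta'), 0) > 0 THEN
        // unchanged rows come from the previous staged snapshot
        Concatenate (Delta)
        LOAD * FROM [Orders.qvd] (qvd) WHERE NOT Exists(OrderId);
        STORE Delta INTO [Orders.qvd] (qvd);
        DROP TABLE Delta;
    END IF

    LET vLastLoad = '$(vThisLoad)';
END SUB

// vLastLoad is assumed to hold the high-water mark persisted from the previous run
CALL DoIncrementalLoad;    // main pass: may take minutes
CALL DoIncrementalLoad;    // second pass: catches rows committed meanwhile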
--Arun
Thanks to both of you for the responses. I am using SQL Server as the source for this, so I should be able to set the isolation level to Read Committed. My problem, I suppose, is where I would do this in the script; can I simply place a command at the beginning of the load script?
Cheers,
Well, I don't know exactly how you would set the isolation level in the script, or even if you can. You would probably need to figure out what SQL command you could use to set the isolation level, then execute it like any other SQL statement in the script. Or maybe it's something you set in the ODBC connection, or something you set in the DBMS more directly. I'm afraid I don't know.
Still, perhaps you would be better off doing it entirely inside of QlikView as Arun suggests, due to the database locking issues. SQL Server may well accomplish the clean read by locking everyone else out of the database while it reads the data. That seems unlikely to be acceptable. So one solution is to do an incremental reload in a loop until nothing new is read or some maximum loop count is reached (at which point you error out). Arun is just doing two incremental loads in a row, which for his data is apparently enough.
Compromising on deletions is probably common. I don't think we have anything like that at this point, if we ever did, but I do remember considering processing deletes on only a weekly basis for some data where we processed adds/changes daily.
Hi John
Thanks again for the very detailed response.
I found a way to set the isolation level from within the QVW. I'm not sure if this applies to other DBMSs, but in SQL Server we have a standard stored procedure that allows us to run ad-hoc SQL, so adding the following line(s) should work for a single script:
SQL exec sp_executesql N'SET TRANSACTION ISOLATION LEVEL SNAPSHOT';
SQL exec sp_executesql N'BEGIN TRANSACTION';
And then at the end of the load script, add the following command:
SQL exec sp_executesql N'COMMIT TRANSACTION';
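Put together, the bracketing of a single load script would look roughly like this (a sketch, with hypothetical connection and table names). Two caveats worth checking: SNAPSHOT isolation must first be enabled on the database (ALLOW_SNAPSHOT_ISOLATION), and SQL Server documents that an isolation level set inside a stored procedure or dynamic batch reverts when that batch returns, so sending the SET statement directly may be the safer variant:

ODBC CONNECT TO WarehouseDSN;   // hypothetical DSN

SQL exec sp_executesql N'SET TRANSACTION ISOLATION LEVEL SNAPSHOT';
// or, to avoid the dynamic-batch scoping issue:
// SQL SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
SQL exec sp_executesql N'BEGIN TRANSACTION';

Facts:
SQL SELECT * FROM dbo.FactSales;     // hypothetical tables; every SELECT
Customers:
SQL SELECT * FROM dbo.DimCustomer;   // inside the transaction sees one snapshot

SQL exec sp_executesql N'COMMIT TRANSACTION';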
This method (where SQL Server is the source) would work where the load script is contained in a single QVW/QVS file. However, upon completion of our warehouse refresh we have 22 (and increasing) QVW load scripts to run, and the above solution would not work in this case, simply because each one would be dealt with on its own, not as a group.
So, the solution for us this time is to ignore the available QVW functionality and use programming outside it to ensure there are no dirty reads. The warehouse refresh will set a QVW Refresh flag that kicks off the QVW refresh process; when the QVW refresh process starts, it will reset this flag but set another flag to say it is running. The warehouse refresh will not be allowed to run again until the QVW refresh process has completed.
A simple solution, but still, one that we wanted to avoid if possible.
PS: Deletions are not an issue for us, as the application does not allow deletions of any kind!
Anyway, thanks to all for the contributions here; as always, it's much appreciated.
Try the same method to set a three-state flag: Start, In Progress, Completed. You can have a separate ETL Status Table that contains the status for all objects in the ETL process, and you could use this to manage ETL dependencies. The QVSs in the "working set" could also read this table.
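For illustration, a sketch of such a status table and how each QVW might touch it, provided your driver allows pass-through updates. All names are hypothetical; the table itself would be created once on the server, e.g. CREATE TABLE dbo.EtlStatus (ObjectName varchar(128) PRIMARY KEY, Status varchar(20), LastUpdated datetime):

// At the top of each QVW load script:
SQL UPDATE dbo.EtlStatus
    SET Status = 'In Progress', LastUpdated = GETDATE()
    WHERE ObjectName = 'Sales.qvw';

// ... the actual load ...

// At the end of the script:
SQL UPDATE dbo.EtlStatus
    SET Status = 'Completed', LastUpdated = GETDATE()
    WHERE ObjectName = 'Sales.qvw';

// A QVW with upstream dependencies can read the table before loading:
Status:
SQL SELECT ObjectName, Status FROM dbo.EtlStatus;

The warehouse refresh would likewise wait until no row has a status other than Completed before setting the QVW Refresh flag again.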
I will need a similar approach for Oracle, apparently.
Colin R
In DB2 you could add WITH UR (or whatever isolation level you need) to the query. You could set the isolation level at the DBM level as well.
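In a QlikView script that clause just goes on the end of the pass-through SQL; a minimal example (table name hypothetical):

Facts:
SQL SELECT * FROM MYSCHEMA.FACT_SALES WITH UR;   // uncommitted (dirty) read for this query only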
Hi John,
Quick question: how do you set the isolation level inside the DB2 connection? Instead of having to add WITH UR to every SELECT statement, you can apparently set this in your connection string. Do you perhaps have an idea how to do this?
Nigel,
If you're using SQL Server, you should look into the "Database Snapshot" feature of the product. It takes a snapshot of the entire database at a point in time, and the snapshot is guaranteed not to change even if the underlying database is changing.
http://msdn.microsoft.com/en-us/library/ms175158.aspx
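Creating and dropping the snapshot is plain T-SQL run on the server around the reloads; a sketch with hypothetical database names and file path (note that in older SQL Server versions database snapshots were an Enterprise-edition feature, so check your edition):

-- Run before the QVW reloads start:
CREATE DATABASE Warehouse_Snap ON
    ( NAME = Warehouse_Data,                       -- logical file name of the source DB
      FILENAME = 'D:\Snapshots\Warehouse_Data.ss' )
AS SNAPSHOT OF Warehouse;

-- Point the QVW connection strings at Warehouse_Snap, reload, then clean up:
DROP DATABASE Warehouse_Snap;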
Gary