6 Replies Latest reply: Jun 11, 2014 6:20 AM by Alex Gunawan RSS

    Real-time analysis of new incoming transactions with DD

    Christof Schwarz

      An opening statement first: QlikView's Direct Discovery was initally made as one possible way to address Big Data by connecting to its storages. The aspect of real-time analysis was not the initial driver, so some basic assumptions were made. This article is about taking the current Direct Discovery feature behind it's concept and hence is a bit unorthodox. As newer versions will come out (>=QV11.2 SR6) i will revisit this article.

       

      Direct Discovery 2.0 (this is the redesigned Direct Discovery since QV 11.2 SR5 only!) is separating fields from an external SQL table into DIMENSION and MEASURE fields. The help file explains, that QlikView will read the distinct values of dimension fields only(!) at the execution of the reload, NOT later. This means, if new dimension members will arrive after the reload, those will not show up.

       

      This is good for new fact rows to arrive for already known dimensions, like a known MachineID and some incoming machine information. But I like to achieve more than this with Direct Discovery:

       

      Let's assume new orders are coming in every other minute and I like to see those new transactions in QV in real-time (whenever I write about "new" orders I refer to such orders which have been created in the sql source table after the last QV reload; those are not in the memory data model). Yes, I need the unique OrderID as an dimension to work with it. Impossible or not? Well, I wouldn't have started this article, if there was no possibility ;-). Here are a number of steps required to get this work, but finally it will fly!

       

      ═►download "Endresult_Presentation_5min.swf" for a Flash-Video showing the final result of this thread in action

       

      Prerequisite: One prerequisite at this point is that the new OrderIDs must be able to be "guessed". This is the case, when the OrderID is a unique incremental number, so that you know, if the latest OrderID for now is 1024, the next one will be 1025 and so on. If the ID of a new Order cannot be "guessed", I am afraid, then this article cannot help (however, the unique ID could be a new ID column with an increment. It does not need to be the Order ID from the system itself ...)

       

      This is my simple order table "SalesOrderTable" on a SQL Server. OrderID is an auto-incremental key column. The CreationDate is system-filled with the timestamp when the record was inserted. CustomerID, ProductID are foreign keys.

      Clipboard01.gif

      Command to create this table on SQL Server is

      CREATE TABLE SalesOrderTable

      (

          OrderID bigint IDENTITY(1000,1) NOT NULL,

          CreationDate datetime NOT NULL,

          CustomerID int NOT NULL,

          ProductID int NOT NULL,

          OrderValue money NULL,

      CONSTRAINT PK_SalesOrderTable PRIMARY KEY CLUSTERED

           ( OrderID ASC ) ON [PRIMARY]

      )

       

      ALTER TABLE SalesOrderTable

      ADD CONSTRAINT DF_SalesOrderTable_CreationDate

      DEFAULT (getdate()) FOR CreationDate

       

       

       

      Step 1: Get a text message about new orders and list all orders

       

      The approach seemed clear to me, since I just want to count the new orders, I will load all fields (even the OrderID) as MEASURE fields, but Direct Discovery requires at least one DIMENSION field. I created a View "vw_SalesOrderTable" to show all columns plus one calulated column called "AllRecords", which always shows 1.

       

      CREATE VIEW vw_SalesOrderTable AS

      SELECT 1 AS AllRecords, * FROM SalesOrderTable

       

      Now I load that view with DIRECT SELECT command and my only dimension is the AllRecords column. By setting a filter "1" on that field later in the GUI, I will automatically refer to every order in the view.

       

      The data model has two separated tables, the memory orders and the DirectTable orders.

      Clipboard01.gif

      The script sets the Direct Discovery cache to expire every second with that line:

      SET DirectCacheSeconds = 1;

       

      ═►download "DD Show New Orders-0.qvw" for this status of the game

       

      There are only two objects, which can deal with new orders like this: A textbox and a table-object. A chart-object would not!

      • I created a table object, which contains the colum "AllRecords"
      • I created a button with the label "Refresh" which toggles the selection of value "1" in field "AllRecords" (=sets and removes 1 from that field)
      • I created a text object which displays a message as soon as (after clicking Refresh) new orders where found. Formula is

      =If(Count(sql.OrderID) > Count(mem.OrderID), Count(sql.OrderID) - Count(mem.OrderID) & ' new order(s) found since last reload')

       

       

      Clearly, this was not good enough yet. I wanted to select with one click in a field the new orders or the orders found in memory.

       

      Step 2: Get a dimension "IsLoadedInQV" as selection box

       

      The idea is like follows:

      • After loading all Orders into memory I will write back the creation timestamp of the "youngest" order (the one with the highest OrderID) into a separate SQL table "Last_QV_Load"
      • The SQL view, which is subject to the Direct Query, will get another calculated column, which will print "Mem" in case the record  creation timestamp is equal or lower than the saved timestamp from the table "Last_QV_Load", otherwise it will print "New". In fact, not QlikView can tell the "new" from the "mem" Orders, the view itself does!
      • During reload, I will already add the value "New" to a memory table in QV, although at that very moment, there is no "New" in the view yet. But there will be, as soon as more orders will be created after the reload (again: "guessing" a dimension value in advance)

       

      In this step I need two new objects in SQL: A new table "Last_QV_Load" and a stored procedure "sp_Set_QV_LoadTime":

       

      CREATE TABLE Last_QV_Load

      (

          TableName nvarchar(50) NOT NULL,

          Last_Loaded_Record datetime NOT NULL,

          QV_Load_Time datetime NOT NULL,

          CONSTRAINT PK_Last_QV_Load PRIMARY KEY CLUSTERED

          ( TableName ASC    ) ON [PRIMARY]

      )

       

       

      CREATE PROCEDURE sp_Set_QV_LoadTime

          @TableName NVARCHAR(50)

          ,@Cutofftime DATETIME = NULL

      AS

      BEGIN

          DELETE FROM Last_QV_Load WHERE TableName = @TableName;

       

          INSERT INTO Last_QV_Load

          SELECT @TableName, @Cutofftime, GetDate();

      END

       

      I need to alter the view "vw_SalesOrderTable" as well and add the new column:

       

      CREATE VIEW vw_SalesOrderTable AS

      SELECT

          1 AS AllRecords

          ,CASE WHEN o.CreationDate <= l.Last_Loaded_Record THEN 'Mem' ELSE 'New' END AS IsLoadedInQV

          ,o.*

      FROM

          SalesOrderTable AS o

      LEFT OUTER JOIN

          Last_QV_Load AS l

          ON l.TableName = 'SalesOrderTable'

       

       

      The load script is creating a data model as follows. If you like to see the Load Script,

      ═►download "DD Show New Orders-1.qvw"

       

      My frontend has now a new field "%IsLoadedInQV" with two values "Mem" and "New" which I can click to see the right Orders in the table object. But please note: You could not use a chart object to display orders here, because there is no dimensionality on the OrderID (this is a measure right now) ...!

       

       

      Step 3: Dynamically add new OrderIDs to a listbox and create a chart object


      The final steps are to link the two data islands via the common OrderID and to "guess" some future OrderIDs already in the script. As written above already, it is necessary to do that in the script at load-time. The ideal situation is an auto-incremental OrderID field, so you will easily know the number range of the next IDs.

       

      Yes, Direct Discovery will be able to link the next new orders to the in-memory model as they come. BUT ... there will only one 'disturbance' in the scene: The QlikView native caching of in-memory queries. We turned off the SQL cache, yes, but the new OrderIDs (the ones arriving after the reload) are in memory, not on SQL. And the memory cache can NOT be set to expire! It can only be invalidated by creating new and new artificial selection combinations, so that QlikView never can use that cache (this is the job of table "InvalidateCache" in the below model). For the current Service Release (11.2 SR5), this is my "clean" workaround (not mentioning dirty ways of disabling the cache with specific internal settings)

      The field InvalidateCacheToken contains for each user some hundred of "tokens". Each time, the user clicks on "Refresh" button in the frontend, the next token will be selected and QlikView will (re)start both, the in-memory and the SQL queries to find out about the new situation.

       

      By the way, I don't like to show the field "%OrderID" in a listbox in the frontend, because this will show not only existing OrderIDs but also the next 1000 (or so) "guessed" OrderIDs. I only want to see such %OrderIDs in my listbox which are at the very point in time already existing in the SQL table (before or since the reload). The formula for a selection box to do so is this:

      =Aggr(Min(sql.OrderID), %OrderID)

       

       

       

      download "DD Show New Orders-Final.qvw" to see the load script and frontend

      ═►download "Endresult_Presentation_5min.swf" for a Flash-Video showing the final result of this article in action


      Oh, almost forgot to mention, that the view "vw_SalesOrderTable" got one more column, a copy of the OrderID called "OrderID_Dim". OrderID is loaded twice in the data model, as a MEASURE FIELD (as in previous steps) but also as a DIMENSION field in the DIRECT SELECT statement. Therefore, the view shows the OrderID twice, as "OrderID" and as "OrderID_Dim".

       

      CREATE VIEW vw_SalesOrderTable AS
      SELECT
          1 AS AllRecords
          ,CASE WHEN o.CreationDate <= l.Last_Loaded_Record THEN 'Mem' ELSE 'New' END AS IsLoadedInQV
          ,o.*
          ,o.OrderID AS OrderID_Dim
      FROM
          SalesOrderTable AS o
      LEFT OUTER JOIN
          Last_QV_Load AS l
          ON l.TableName = 'SalesOrderTable'