7 Replies Latest reply: Mar 9, 2010 2:01 PM by Corrado Lorefice

    Enterprise level scale (10,000+ reports)

    alexdorman

       

      I need a recommendation on how to build a report-processing system that would scale to 10,000+ account reports. The requirements are as follows: there are 10,000 accounts with history going back 1 year. On average each account has 400,000 historic records, but some accounts have as many as 20 million records. Reports are accessed through our Web portal using an AJAX client to QV. Data in reports should be relatively fresh, so a report should be no older than 1 hour. All reports share the same design. Many new accounts are created on a daily basis.

      What we have done so far:

      Data preparation: segment all data into AccountId_Day text files. Convert each to a QVD using QVE.EXE /vAccountId=12345. Combine all but the last 2 days into AccountId_historic.QVD. Generate AccountId.QVW using QVE.EXE based on a Master.QVW that has the layout:

      LOAD AccountId_historic.QVD,

      LOAD AccountId_today-1.QVD

      LOAD AccountId_today.QVD

      Every time any of the source text files changes, we regenerate the daily AccountId_Day.QVD, AccountId_Historic.QVD, and AccountId.QVW. Depending on the application, this is done either automatically or on request from the portal/user.
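      The flow above can be sketched as a small orchestration script. This is only an illustration of the sequence, not our production code: the install path, the `/vDay` switch, and the helper QVW names (`Convert_TXT_to_QVD.qvw`, `Build_Historic_QVD.qvw`) are assumptions, while `/vAccountId` and `Master.qvw` come from the post.

```python
import subprocess
from datetime import date, timedelta

QVE = r"C:\Program Files\QlikView\QVE.EXE"  # assumed install path

def conversion_cmd(account_id: int, day: date) -> list[str]:
    """Build the QVE.EXE command that converts one AccountId_Day
    text file into its daily QVD (hypothetical reload document)."""
    return [QVE, "/r", f"/vAccountId={account_id}",
            f"/vDay={day.isoformat()}", "Convert_TXT_to_QVD.qvw"]

def rebuild_cmds(account_id: int, today: date) -> list[list[str]]:
    """Commands to regenerate the two most recent daily QVDs, the
    historic QVD, and finally the account's QVW from Master.QVW."""
    cmds = [conversion_cmd(account_id, today - timedelta(days=1)),
            conversion_cmd(account_id, today)]
    cmds.append([QVE, "/r", f"/vAccountId={account_id}",
                 "Build_Historic_QVD.qvw"])   # all but the last 2 days
    cmds.append([QVE, "/r", f"/vAccountId={account_id}", "Master.qvw"])
    return cmds

def run(cmds: list[list[str]]) -> None:
    for cmd in cmds:
        # check=True surfaces failures instead of silently leaving
        # half-baked QVDs behind; timeout guards against frozen QVE.EXE
        subprocess.run(cmd, check=True, timeout=600)
```

Running each step with an explicit timeout and a non-zero-exit check is one way to detect the frozen-process and half-baked-QVD failure modes described below.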

      Now, while it does work, this solution is not stable. We have it in production for just a few of our customers, and at least twice a week we have to manually intervene to "push" the flow or reload the box. Issues:

      • No matter what flags we use to suppress error messages in QVE.EXE, in certain scenarios it still shows a modal popup dialog box: LOAD FAILED. And the QVE.EXE process becomes frozen.
      • Any small temporary blip in the network and QVE fails with no good message explaining why. In other cases we end up with half-baked QVD files with not all the data processed.
      • We ended up building our own processing framework to ensure that QlikView boxes are evenly loaded, but not overloaded. We really don't want to be in this business.
      • We would love to use QV Publisher to manage this and trigger reloads using EDX, but neither we nor Qliktech have so far been able to work out how to use QV Publisher to refresh a QVW without creating a task for each individual QVW (10,000 of them, and new accounts are added every day through our portal app). We cannot pass a variable to EDX that would specify the AccountId the way we do with QVE.EXE. Also, EDX is an asynchronous service, so I cannot get a success/error status and error code from it.
      • Most of the processing time (90+%) goes to TXT->QVD conversion. If there were a utility just to convert a text file to QVD, we could offload this simple step to a cluster of machines and process it much faster without dealing with QVE.EXE or QV Publisher.
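      If such a standalone converter existed (or with QVE.EXE wrapped as the worker), fanning the conversions out over a pool could look like this sketch; `convert_txt_to_qvd` is a placeholder for whatever performs the actual conversion:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path

def convert_txt_to_qvd(txt_path: Path) -> Path:
    # Placeholder: in reality this would shell out to QVE.EXE (or a
    # dedicated converter) and write AccountId_Day.QVD next to the input.
    return txt_path.with_suffix(".qvd")

def convert_all(txt_files, workers: int = 8):
    """Convert many daily text files in parallel, collecting failures
    so one bad file cannot stall the whole batch."""
    done, failed = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(convert_txt_to_qvd, f): f for f in txt_files}
        for fut in as_completed(futures):
            src = futures[fut]
            try:
                done.append(fut.result())
            except Exception as exc:
                failed.append((src, exc))
    return done, failed
```

The same pattern distributes across a cluster by swapping the thread pool for a job queue; the key point is that failures are reported per file instead of freezing the pipeline.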

       

      We did try to get help from Qliktech at different levels, but 18 months later there is not much progress. I would love to hear from other people who have either run into the same architectural issues or, even better, were able to resolve them.

       

      thanks

      Alex Dorman

      ContextWeb

       

       

        • Enterprise level scale (10,000+ reports)

          Hello Alex,

          As you will understand, it is hard for me to give you fully relevant advice when I'm not in front of your system.

          However, there are a few things I can tell you:

          - QV.exe is less robust than Publisher. That means in some cases it freezes or crashes, and sometimes the server ends the day with QV.exe still in memory.

          - If you need to run as many QVWs as you have IDs (for example account IDs, customer IDs, or anything else), you can use the Loop process in Publisher to automatically create as many applications as you need. Besides, you can reduce the data in each of them in order to save memory, file size, and access time for users.

          - Regarding EDX: yes, it is asynchronous; however, I made an application that can run an EDX transaction and wait for the result. It then takes the result and writes it to a log (Success, Failed, Unknown). This works fine, but you need to implement it yourself. (This script is developed for 8.5 for the time being.)

          - Regarding the different QVDs you create, it seems a good approach, but you also need to think about incremental updates, aggregating your data, etc. There should be a not-too-complicated solution for your needs.

           

          Finally, I would advise that you ask for an external eye, maybe an external consultant, to give you some advice and tell you what is wrong.

          I hope my suggestions can help you.

          Anyway, feel free to contact me if I can help in more detail.

          Rgds,

          Sebastien.

           

           

            • Enterprise level scale (10,000+ reports)
              alexdorman

              Thank you Sebastien,


              LOOP might help somewhat. The problem I see with this approach is that it would require refreshing all accounts (10,000+). Let's do a quick back-of-the-envelope calculation: assuming it takes 10 seconds per average account, it would take about 28 CPU hours to process all 10,000. This is why we need to be able to reload just the accounts with changed data. EDX with accountId as a variable would do the trick; I would be able to reload only the accounts that changed in the last hour.


              Also, correct me if I am wrong, but all the discussion above assumes that I have one QVW, and LOOP would distribute that one QVW to a list of recipients with optional reduction of data. Let's do the math to see if all my data can fit in one QVW: on average 400,000 records per account, times 10,000 accounts, is 4 billion records. I don't think that would fit into memory.
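              The two back-of-the-envelope figures in this reply check out:

```python
# 10 s per account x 10,000 accounts, expressed in CPU hours
cpu_hours = 10 * 10_000 / 3600
print(round(cpu_hours, 1))   # 27.8, i.e. roughly 28 CPU hours

# 400,000 average records x 10,000 accounts
total_records = 400_000 * 10_000
print(total_records)         # 4000000000 -> 4 billion records
```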


              Cheers
              Alex

                • Enterprise level scale (10,000+ reports)

                  You're right, the loop is done on one document containing everything.

                  I see different options:

                  1) Optimize the reload of your data by creating already-aggregated data in QVDs.

                  2) Maybe create one QVD per kind of account and per day in order to keep them split and be able to clean some of them easily.

                  3) Customize the EDX reload by adding an account ID variable located in an external file that is updated when needed. In this case, you could keep the loop feature based on this account (because only the accounts specified in the external file or database would be used).

                  I think this last point should not be hard to do. You just have to create a process that updates a file with the account IDs to refresh, then store it (using a simple QV.exe + macro in 8.5, or store the variables using the Store feature in 9.0).

                  Then your normal QV application will be reloaded by EDX or the automatic Publisher in order to load all accounts specified in the external file.
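                  The "process that updates a file with the account IDs to refresh" could be as small as the sketch below. The drop-directory layout and the AccountId_Day.txt naming are assumptions based on this thread:

```python
import time
from pathlib import Path

def accounts_changed_since(source_dir: Path, since_epoch: float) -> set[str]:
    """Scan the AccountId_Day.txt drop directory and collect the
    AccountIds of files modified after the given timestamp."""
    changed = set()
    for f in source_dir.glob("*_*.txt"):
        if f.stat().st_mtime >= since_epoch:
            changed.add(f.name.split("_", 1)[0])   # AccountId prefix
    return changed

def write_refresh_list(source_dir: Path, out_file: Path,
                       window_s: int = 3600) -> list[str]:
    """Write the accounts touched in the last hour, one ID per line,
    for the Publisher/EDX-driven reload to pick up."""
    ids = sorted(accounts_changed_since(source_dir, time.time() - window_s))
    out_file.write_text("\n".join(ids) + "\n")
    return ids
```

The reloading QVW would then load this one-column file and restrict its LOAD to the listed accounts.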

                   

                  Do you think this would be efficient?

                  Rgds,

                  Sébastien

                   

                    • Enterprise level scale (10,000+ reports)
                      alexdorman

                      While this might work, I will again end up writing my own QVD/QVW re-processing framework. It would not be efficient, because I have to compromise between reprocessing only what has changed vs. having too many small entities (QVDs, QVWs) to manage, with no direct way to request that QV Publisher reprocess X and only X.

                      I am not clear on proposal 3). It sounds like I still have to bundle data for multiple accounts into one QVW and will then be able to publish an individual account out of it.

                      Basically it comes back to the issue of scale.

                       

                      Thanks

                      alex

                       

                        • Enterprise level scale (10,000+ reports)

                          Using the third point, Publisher will be able to open the QVD corresponding to a particular account.

                          The QVW opened by Publisher will load the external file and then take the account number to reload.

                          Then it will load only the QVD referencing that account number (even if you have a lot of QVDs, that is not a problem as far as I know).

                          Using a Loop would only be there to specify the name of your QVW file to display in the portal.

                          It seems to me that this idea is not too bad, but maybe there is some information I don't know that could prevent it from working...

                           

                            • Enterprise level scale (10,000+ reports)
                              alexdorman


                              I just got some good news from Qliktech. It would make our life easier once released.

                               

                              "I have just found out that QlikView 9 SR2 has the facility to take a parameter in the EDX request that can be used as a QlikView variable. This could be one of the key pieces that we have been missing. Of course we will want to prove it out in production. Please note that the QV version 9 way of calling an EDX job is quite different from prior versions. Here is the info from the reference manual. I'm working on getting you more detailed info."

                               

                               

                              26.2 Triggering EDX Enabled Tasks

                              In order to start the tasks that have an external event as trigger, you must make two POST type request calls to the QlikView Distribution Service that has been assigned the task.

                              The body of the first request call must contain the following:

                              <Global method="GetTimeLimitedRequestKey" />

                              The reply will contain the following important entry:

                              <GetTimeLimitedRequestKey>
                              <GetTimeLimitedRequestKeyResult>zLavfNlancWoyhACGlpaE5sWOy8kicLa</GetTimeLimitedRequestKeyResult>
                              </GetTimeLimitedRequestKey>

                              The value of the entry GetTimeLimitedRequestKey is then used for the next request:

                              <Global method="RequestEDX" key="zLavfNlancWoyhACGlpaE5sWOy8kicLa">
                              <i_TaskIDOrTaskName>MyTask</i_TaskIDOrTaskName>
                              <i_Password>MyPassword</i_Password>
                              <i_VariableName />
                              <i_VariableValueList />
                              </Global>

                              The status of the task is returned in XML format.

                              The attributes in the second request are:

                              i_TaskIDOrTaskName - The name or ID of the task you want to start.

                              i_Password - The password you set when you created the trigger. If no password was set, the attribute must still be included but can be left empty.

                              i_VariableName - The name of the variable you wish to change. The attribute may be left empty.

                              i_VariableValueList - The values you want to assign to the variable. The variables are entered according to the same pattern as in QEMC. The attribute may be left empty.
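                              Following that excerpt, the two-step EDX trigger could be scripted roughly as below. The service URL/port is an assumption (adjust to your Distribution Service endpoint), the variable name "vAccountId" is hypothetical, and only the XML shapes come from the manual excerpt:

```python
import re
import urllib.request

# Assumed QlikView Distribution Service endpoint - adjust as needed.
EDX_URL = "http://qvserver:4720/qtxs.asmx"

def key_request_body() -> str:
    """First POST: ask for a time-limited request key."""
    return '<Global method="GetTimeLimitedRequestKey" />'

def edx_request_body(key: str, task: str, password: str = "",
                     var_name: str = "", var_values: str = "") -> str:
    """Second POST: trigger the task, optionally passing a variable
    (the QV 9 SR2 feature discussed above)."""
    return (f'<Global method="RequestEDX" key="{key}">'
            f'<i_TaskIDOrTaskName>{task}</i_TaskIDOrTaskName>'
            f'<i_Password>{password}</i_Password>'
            f'<i_VariableName>{var_name}</i_VariableName>'
            f'<i_VariableValueList>{var_values}</i_VariableValueList>'
            f'</Global>')

def post(body: str) -> str:
    req = urllib.request.Request(EDX_URL, data=body.encode("utf-8"),
                                 headers={"Content-Type": "text/xml"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")

def trigger(task: str, account_id: str, password: str = "") -> str:
    """Run the two-call sequence, passing the account ID as a variable.
    'vAccountId' is a hypothetical variable name for illustration."""
    reply = post(key_request_body())
    m = re.search(r"<GetTimeLimitedRequestKeyResult>(.*?)</", reply)
    if not m:
        raise RuntimeError("no request key in reply: " + reply)
    return post(edx_request_body(m.group(1), task, password,
                                 "vAccountId", account_id))
```

The returned XML would then be parsed for success/failure, which addresses the earlier complaint about EDX giving no status back.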