I need a recommendation on how to build a report processing system that would scale to 10,000+ account reports. The requirements are as follows: there are 10,000 accounts with history going back 1 year. On average each account has 400,000 historic records, but some accounts have as many as 20 million. Reports are accessed through our Web portal using an AJAX client to QV. Data in reports should be relatively fresh, so a report should be no older than 1 hour. All reports share the same design. Many new accounts are created on a daily basis.
What we have done so far:
Data preparation: segment all data into AccountId_Day text files. Convert each to a QVD using QVE.EXE /vAccountId=12345. Combine all but the last 2 days into AccountId_Historic.QVD. Generate AccountId.QVW using QVE.EXE based on a Master.QVW that has the layout.
Every time any of the source text files changes, we regenerate the daily AccountId_Day.QVD, AccountId_Historic.QVD, and AccountId.QVW. Depending on the application, this is done either automatically or on request from the portal/user.
Now, while it does work, this solution is not stable. We have it in production for just a few of our customers, and at least twice a week we have to manually intervene to "push" the flow or reload the box.
We did try to get help from Qliktech at different levels, but 18 months later there has been not much progress. I would love to hear from other people who either ran into the same architectural issues or, even better, were able to resolve them.
As you will understand, it is hard for me to give you fully relevant advice when I'm not in front of your system.
However, there are a few things I can tell you:
- QV.exe is less robust than Publisher. That means in some cases it freezes or crashes, and sometimes the server ends the day with QV.exe still in memory.
- If you need to run as many QVWs as you have IDs (for example account IDs, customer IDs, or anything else), you can use the Loop process in Publisher to automatically create as many applications as you need. Besides, you can reduce the data in each of them in order to save memory and size and improve access time for users.
- Regarding EDX: yes, it is asynchronous; however, I made an application that can run an EDX transaction and wait for the result. It then takes the result and writes it to a log (Success, Failed, Unknown). This works fine, but you need to implement it yourself. (This script is developed for 8.5 for the time being.)
- Regarding the different QVDs you create: it seems a good approach, but you also need to think about incremental updates, aggregating your data, etc. A not-too-complicated solution should exist for your needs.
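The incremental-update idea in the last point is, in concept, the following (illustrated here in Python rather than QlikView script; in a load script this would typically be a load with a "where not Exists(Key)" clause against the existing QVD):

```python
def incremental_update(historic, fresh):
    """Merge freshly loaded records into the historic store, keeping only
    records whose key is not already present, so each pass processes just
    the new data instead of rebuilding everything."""
    merged = dict(historic)
    for key, record in fresh.items():
        if key not in merged:
            merged[key] = record
    return merged

historic = {1: "rec-a", 2: "rec-b"}
todays_load = {2: "rec-b-reloaded", 3: "rec-c"}
merged = incremental_update(historic, todays_load)  # keeps 1 and 2, adds 3
```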
Finally, I would advise you to ask for an external eye, maybe an external consultant, to give you some advice and tell you what is wrong.
I hope my suggestions can help you.
Anyway, feel free to contact me if I can help in more detail.
Thank you Sebastien,
LOOP might help somewhat. The problem I see with this approach is that it would require refreshing all 10,000+ accounts. Let's do a quick back-of-the-envelope calculation: assuming it takes 10 seconds per average account, it would take about 28 CPU hours to process all 10,000. This is why we need to be able to reload just the accounts with changed data. EDX with accountId as a variable would do the trick; I would be able to reload only the accounts that changed in the last hour.
Also, correct me if I am wrong, but all the discussion above assumes that I have one QVW, and LOOP would distribute that one QVW to a list of recipients with optional reduction of data. Let's do the math to see if all my data would fit in one QVW: on average 400,000 records per account across 10,000 accounts is 4 billion records. I don't think that would fit into memory.
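The two back-of-the-envelope figures above check out:

```python
# Full-refresh cost: 10,000 accounts at roughly 10 seconds each.
total_seconds = 10_000 * 10
cpu_hours = total_seconds / 3600
assert round(cpu_hours) == 28  # about 28 CPU hours per full pass

# Row count if every account's data lived in a single QVW.
total_records = 400_000 * 10_000
assert total_records == 4_000_000_000  # 4 billion records
```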
You're right, the loop is done on one document containing everything.
I see different options:
1) Optimize the reload of your data by creating already-aggregated data in QVDs.
2) Maybe create one QVD per kind of account and per day in order to keep them split and be able to clean some of them easily.
3) Customize the EDX reload by adding an account ID variable kept in an external file that is updated when needed. In this case, you could keep the loop feature based on this account (because only the accounts specified in the external file or database would be used).
I think this last point should not be hard to do. You just have to create a process that updates a file with the account IDs to refresh (using a simple QV.exe + macro in 8.5, or the Store feature in 9.0).
Then your normal QV application will be reloaded by EDX or the Publisher schedule in order to load all accounts specified in the external file.
Do you think this would be efficient?
While this might work, I would end up writing my own QVD/QVW re-processing framework again. It would not be efficient because I have to compromise between reprocessing only what has changed vs. having too many small entities (QVDs, QVWs) to manage, with no direct way to ask QlikView Publisher to reprocess X and only X.
I am not clear on proposal 3). It sounds like I still have to bundle data for multiple accounts into one QVW and would then be able to publish an individual account out of it.
Basically it comes back to the issue of scale.
Using the third point, Publisher will be able to open the QVD for a particular account.
The QVW opened by Publisher will load the external file, then take the account number to reload.
It will then load only the QVD referencing that account number (even if you have a lot of QVDs, that is not a problem as far as I know).
A Loop would only be used to specify the name of the QVW file to display in the portal.
It seems to me that this idea is not too bad, but maybe there is some information I don't know that could prevent it from working...
I just got some good news from Qliktech. It would make our life easier once released.
"I have just found out that QlikView 9 SR2 has the facility to take a parameter in the EDX request that can be used as a QlikView variable. This could be one of the key pieces that we have been missing. Of course we will want to prove it out in production. Please note that the QV version 9 way of calling an EDX job is quite different from prior versions. Here is the info from the reference manual. I'm working on getting you more detailed info."
26.2 Triggering EDX Enabled Tasks
In order to start tasks that have an external event as trigger, you must make two POST type request calls to the QlikView Distribution Service that has been assigned the task.
The body of the first request call must contain the following:
<Global method="GetTimeLimitedRequestKey" />
The reply contains an important entry, GetTimeLimitedRequestKey, whose value is then used for the next request.
The status of the task is returned in XML format.
The attributes in the second request are:
- TaskIDOrTaskName - The name or ID of the task you want to start.
- i_Password - The password you set when you created the trigger. If no password was set the attribute must still be included but can be left empty.
- i_VariableName - The name of the variable you wish to change. The attribute may be left empty.
- i_VariableValueList - The values you want to assign the variable. The variables are entered according to the same pattern as in QEMC. The attribute may be left empty.