Hi,
I'm looking for some data handling strategy feedback.
Scenario:
I have regional data (same format) sitting on 5 separate servers. Each region has 1 month of log data, made available in Hive. The data volume is huge, so I need to:
(1) transform/aggregate it during load and
(2) store the aggregated content for up to a year.
Current Status:
I have a working incremental load script (using a qvd file) for one region (one ODBC connection).
Challenge:
Because loads from each region can fail independently, I would like to keep the regional data in separate qvd files, so that each can be corrected/updated on the subsequent incremental load execution. This means that for EACH connection/region I have to track start/end dates for both qvd file and the current hive load.
...I'm assuming I would have to edit [date] variable names so they're different for each connection e.g. vHiveLoadStartDateRegionA, vHiveLoadEndDateRegionA, vHiveLoadStartDateRegionB, vHiveLoadEndDateRegionB, etc. (I understand QV does not have a method of restricting variable scope).
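For what it's worth, rather than hand-editing five sets of variable names, dollar-sign expansion can build the name from a region variable at script run time. A rough sketch (vRegion and the date values here are placeholders, not your actual code):

```
// Assume vRegion is set earlier, e.g. by a loop over regions
LET vRegion = 'RegionA';

// Dollar-sign expansion constructs the variable name itself,
// producing vHiveLoadStartDateRegionA, vHiveLoadEndDateRegionA, etc.
LET vHiveLoadStartDate$(vRegion) = Num(MakeDate(2015, 6, 1));
LET vHiveLoadEndDate$(vRegion)   = Num(Today());
```

The same expansion reads them back later, e.g. $(vHiveLoadStartDateRegionA).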
Question:
What's the best way to handle this?
Should I have 5 copies of the same connection script but each with different connection, file, and variable names?
Or should I apply some sort of loop, where the connection, file, and variable names are auto-generated on each iteration?
Regardless of the strategy, what's the best way to merge the regional data for QV visualization, once incremental loads are done?
Thanks,
J.
How about :
I create discrete QVDs for each region, each having a column for [Region]. It is done in a loop using a spreadsheet holding the Region, Connection String etc.
Then in the dashboard qvw I concatenate them all together.
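A rough sketch of that loop, assuming a Regions.xlsx control sheet with Region and ConnectionString columns, where ConnectionString holds an ODBC DSN name (all file, field, and table names here are placeholders):

```
// Control table: one row per region
Regions:
LOAD Region, ConnectionString
FROM Regions.xlsx (ooxml, embedded labels, table is Sheet1);

FOR i = 0 TO NoOfRows('Regions') - 1
    LET vRegion  = Peek('Region', i, 'Regions');
    LET vConnect = Peek('ConnectionString', i, 'Regions');

    ODBC CONNECT TO [$(vConnect)];

    Data:
    LOAD '$(vRegion)' as Region, *;   // tag each row with its region
    SQL SELECT * FROM logs;           // placeholder Hive query

    STORE Data INTO [Logs_$(vRegion).qvd] (qvd);
    DROP TABLE Data;
NEXT i
```

In the dashboard qvw, a wildcard load (LOAD * FROM Logs_*.qvd (qvd);) will then auto-concatenate the regional QVDs, provided the field lists match.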
Use a loop as you are thinking. You can maintain the Region and Connection string in a table, inline or external as Bill suggested.
However, I wouldn't try to keep track of start/end range with variables, as the variables are in the QVW and that doesn't reliably match the QVD. Instead get the start/end by looking at the QVD each time. Here's a fast performing method to manage that:
Super Fast Method to Retrieve QVD High Value | Qlikview Cookbook
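For context, a simple (though not necessarily "super fast") way to derive the next start date directly from the QVD itself, assuming a LogDate field (file and field names are assumptions):

```
// Aggregating over the QVD forgoes the optimized load,
// but keeps the logic self-contained per region.
MaxDate:
LOAD Max(LogDate) as MaxLogDate
FROM [Logs_RegionA.qvd] (qvd);

LET vHiveLoadStartDate = Peek('MaxLogDate', 0, 'MaxDate') + 1;
DROP TABLE MaxDate;
```

The linked cookbook article describes a faster technique for large QVDs; the point either way is that the range comes from the QVD, not from variables persisted in the QVW.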
Thanks bill.markham & rwunderlich,
I will commence spreadsheet / temp table experimentation tomorrow.
I expect variable management will be the biggest headache...so I'll be back here crying for help in no time!
For some added context, I attached my single-connection code.
J.
If I understand correctly, you have your source data on 5 different servers (Hive).
You can shrink the load window by starting 5 parallel tasks; a final (6th) task then synchronizes (waits for all the regional tasks) and merges the QVDs for visualization.
Running the jobs asynchronously would be preferred, but based on what I can find in the QV docs, an open connection is automatically disconnected when a new 'connect' statement is encountered (from the help doc on the 'disconnect' keyword). Is there a way to force parallel execution with multiple connections open simultaneously?
You're right if you think of a single .qvw.
But in a server environment you can use 5 parallel tasks (5 different .qvw files loading data in parallel).
You can run tasks [aka jobs] in parallel, assuming you have enough server grunt to handle the load.
You could still have a single qvw for the loads, with a parameter for Region so it picks up the connection string etc. from your connection details spreadsheet / table for that Region's row.
For keeping track of start / end dates I store them all in a qvd, concatenating on extra rows when needed. This qvd can then be read and variables set to the values stored in it.
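A sketch of that run-history approach, assuming a RunHistory.qvd with Region, LoadStartDate, and LoadEndDate columns (all names are placeholders):

```
// Read the last recorded range for the current region
RunHistory:
LOAD Region, LoadStartDate, LoadEndDate
FROM [RunHistory.qvd] (qvd)
WHERE Region = '$(vRegion)';

LET vHiveLoadStartDate = Peek('LoadEndDate', -1, 'RunHistory') + 1;
DROP TABLE RunHistory;

// ...incremental load runs here...

// After a successful load, append the new row and store back
NewRow:
LOAD '$(vRegion)'              as Region,
     $(vHiveLoadStartDate)     as LoadStartDate,
     Num(Today())              as LoadEndDate
AUTOGENERATE 1;

Concatenate (NewRow)
LOAD Region, LoadStartDate, LoadEndDate
FROM [RunHistory.qvd] (qvd);

STORE NewRow INTO [RunHistory.qvd] (qvd);
DROP TABLE NewRow;
```

Because the history lives in a QVD rather than in QVW variables, a failed regional load simply leaves its last row untouched, and the next run picks up from there.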
I think I can wrap my head around the multi qvw implementation maxgro suggested (assuming I can set task dependencies in QV when scheduling the final/aggregate qvw), but I haven't quite figured out how a parametrized solution will help overcome the sequential nature of ODBC connect statements. Any chance you can manufacture some high-level pseudo code to demo the flow?
(Keep in mind this is my 2nd QV dashboard, with the 1st one dating back about 2 years, so I may at times be missing the obvious).
Is there just a single ODBC connection for each of your Regions?
And is this different for each Region ?