jwaligora
Creator II

Incremental loading and merging of data from multiple connections

Hi,

I'm looking for some data handling strategy feedback.

Scenario:

I have regional data (same format) sitting on 5 separate servers. Each region has 1 month of log data, made available in Hive. The data volume is huge, so I need to:

(1) transform/aggregate it during load and

(2) store the aggregated content for up to a year.

Current Status:

I have a working incremental load script (using a qvd file) for one region (one ODBC connection).

Challenge:

Because loads from each region can fail independently, I would like to keep the regional data in separate qvd files, so that each can be corrected/updated on the subsequent incremental load execution. This means that for EACH connection/region I have to track start/end dates for both the qvd file and the current Hive load.

...I'm assuming I would have to edit [date] variable names so they're different for each connection e.g. vHiveLoadStartDateRegionA, vHiveLoadEndDateRegionA, vHiveLoadStartDateRegionB, vHiveLoadEndDateRegionB, etc. (I understand QV does not have a method of restricting variable scope).

Question:

What's the best way to handle this?

Should I have 5 copies of the same connection script but each with different connection, file, and variable names?

Should I apply some sort of a loop, where the connection, file, and variable names are auto-generated on each iteration?

Regardless of the strategy, what's the best way to merge the regional data for QV visualization, once incremental loads are done?

Thanks,

J.

1 Solution

Accepted Solutions
Anonymous
Not applicable

How about:

  • Input parameter of Region to a single qvw

  • Spreadsheet with 1 row per Region
    • 1st column is Region
    • 2nd column is the ODBC connection string for the region
    • ... any other required columns

  • In qvw read in spreadsheet row where Region = parameter region
    • This will produce just one row
    • Put the ODBC connection string into a variable
    • Dollar-expand this variable to execute it and make the connection (see the script sketch after this list)

  • Create a Publisher Task for each Region
    • Passing in the Region as a parameter
    • Run all these Publisher Tasks in parallel


  • Have your dashboard reload Task dependent on all the Region Tasks having run successfully
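
A minimal sketch of that flow in QlikView script, assuming the region arrives in a variable vRegion (set by the task) and a Regions.xlsx holding Region and ConnectionString columns (all names here are illustrative):

    // Read only the control row for this region.
    RegionConfig:
    LOAD Region,
         ConnectionString
    FROM Regions.xlsx (ooxml, embedded labels, table is Sheet1)
    WHERE Region = '$(vRegion)';

    // One row expected; pull the connection string into a variable.
    LET vConnect = Peek('ConnectionString', 0, 'RegionConfig');
    DROP TABLE RegionConfig;

    // Dollar-expansion executes the stored statement, e.g.
    // ODBC CONNECT TO HiveRegionA (XUserId is ..., XPassword is ...);
    $(vConnect);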


13 Replies
Anonymous
Not applicable

I create discrete qvds for each region, each with a column for [Region]. It is done in a loop using a spreadsheet holding the Region, Connection String, etc.

Then in the dashboard qvw I concatenate them all together.
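
A rough sketch of that dashboard-side merge, assuming the regional files follow a log_<Region>.qvd naming pattern (names are illustrative). Because the tables share one structure, the successive loads auto-concatenate:

    // One row per region; in practice read this from the spreadsheet.
    Regions:
    LOAD * INLINE [
    Region
    A
    B
    C
    D
    E
    ];

    // Load every regional QVD; identical structures auto-concatenate.
    FOR i = 0 TO NoOfRows('Regions') - 1
        LET vRegion = Peek('Region', $(i), 'Regions');
        Logs:
        LOAD *
        FROM log_$(vRegion).qvd (qvd);
    NEXT i

    DROP TABLE Regions;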

rwunderlich
Partner Ambassador/MVP

Use a loop as you are thinking. You can maintain the Region and Connection string in a table, inline or external as Bill suggested.
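
A skeleton of such a loop, assuming a Regions control table with Region and ConnectionString fields; the SQL table and column names are hypothetical:

    FOR i = 0 TO NoOfRows('Regions') - 1
        LET vRegion  = Peek('Region', $(i), 'Regions');
        LET vConnect = Peek('ConnectionString', $(i), 'Regions');

        $(vConnect);   // runs e.g. ODBC CONNECT TO HiveRegionA (...);

        // Incremental slice since this region's QVD high value
        // (vQvdHighDate derived from the QVD itself; see below).
        Data:
        LOAD *;
        SQL SELECT log_date, metric, value
        FROM logs
        WHERE log_date > '$(vQvdHighDate)';

        // Merge with what is already on disk, then store back
        // (skip the concatenate on the very first run).
        CONCATENATE (Data)
        LOAD * FROM log_$(vRegion).qvd (qvd);

        STORE Data INTO log_$(vRegion).qvd (qvd);
        DROP TABLE Data;
        DISCONNECT;
    NEXT i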

However, I wouldn't try to keep track of the start/end range with variables, as the variables live in the QVW and that doesn't reliably match the QVD. Instead, get the start/end by looking at the QVD each time. Here's a fast-performing method to manage that:

Super Fast Method to Retrieve QVD High Value | Qlikview Cookbook
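
The gist of that technique, sketched for an illustrative log_date field: load just the one field from the QVD (which keeps the load optimized), then aggregate over its distinct symbol values instead of every row:

    TempDates:
    LOAD log_date
    FROM log_$(vRegion).qvd (qvd);   // single-field load stays optimized

    // FieldValue() walks the distinct values of log_date, so the
    // Max() scans far fewer rows than the full QVD.
    MaxDate:
    LOAD Max(FieldValue('log_date', RecNo())) AS MaxLogDate
    AUTOGENERATE FieldValueCount('log_date');

    LET vQvdHighDate = Peek('MaxLogDate', 0, 'MaxDate');
    DROP TABLES TempDates, MaxDate;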

jwaligora
Creator II
Author

Thanks bill.markham & rwunderlich,

I will commence spreadsheet / temp table experimentation tomorrow.

I expect variable management will be the biggest headache...so I'll be back here crying for help in no time!

For some added context, I attached my single-connection code.

J.

maxgro
MVP

If I understand correctly, you have your source data on 5 different servers (Hive).

You can reduce the load window by starting 5 parallel tasks; the last task (a 6th) will synchronize (wait for all the regional tasks) and merge the qvds for the visualization.

jwaligora
Creator II
Author

Running the jobs asynchronously would be preferred, but based on what I can find in the QV docs, an open connection is automatically disconnected when a new 'connect' statement is encountered (from the help doc on the 'disconnect' keyword). Is there a way to force parallel execution with multiple connections open simultaneously?

maxgro
MVP

You're right if you're thinking of a single .qvw.

But in a server environment you can use 5 parallel tasks (5 different .qvw files loading data in parallel).

Anonymous
Not applicable

You can run tasks [aka jobs] in parallel, assuming you have enough server grunt to handle the load.

You could still have a single qvw for the loads with a parameter for Region, so it picks up the connection string etc. from your connection details spreadsheet/table for the row for that Region.

For keeping track of start/end dates I store them all in a qvd, concatenating extra rows on when needed. This qvd can then be read and variables set to the values stored in it.
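
A sketch of that bookkeeping, assuming a LoadHistory.qvd with Region, StartDate and EndDate columns (names illustrative; the first run would need to create the file):

    // Append this run's window to the history.
    LoadHistory:
    LOAD Region, StartDate, EndDate
    FROM LoadHistory.qvd (qvd);

    CONCATENATE (LoadHistory)
    LOAD '$(vRegion)'    AS Region,
         '$(vStartDate)' AS StartDate,
         '$(vEndDate)'   AS EndDate
    AUTOGENERATE 1;

    STORE LoadHistory INTO LoadHistory.qvd (qvd);
    DROP TABLE LoadHistory;

    // Later: read the most recent window for a region into variables.
    LastRun:
    LOAD StartDate, EndDate
    FROM LoadHistory.qvd (qvd)
    WHERE Region = '$(vRegion)';

    LET vPrevStart = Peek('StartDate', -1, 'LastRun');   // -1 = last row
    LET vPrevEnd   = Peek('EndDate', -1, 'LastRun');
    DROP TABLE LastRun;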

jwaligora
Creator II
Author

I think I can wrap my head around the multi-qvw implementation maxgro suggested (assuming I can set task dependencies in QV when scheduling the final/aggregate qvw), but I haven't quite figured out how a parameterized solution will help overcome the sequential nature of ODBC connect statements. Any chance you can manufacture some high-level pseudo-code to demo the flow?

(Keep in mind this is my 2nd QV dashboard, with the 1st one dating back about 2 years, so I may at times be missing the obvious).

Anonymous
Not applicable

Is there just a single ODBC connection for each of your Regions?

And is this different for each Region?