13 Replies Latest reply: Dec 2, 2015 2:04 PM by Joanna Waligora

    Incremental loading and merging of data from multiple connections

    Joanna Waligora

      Hi,

       

      I'm looking for some data handling strategy feedback.

       

      Scenario:

      I have regional data (same format) sitting on 5 separate servers. Each region has 1 month of log data, made available in Hive. The data volume is huge, so I need to:

      (1) transform/aggregate it during load and

      (2) store the aggregated content for up to a year.

       

      Current Status:

      I have a working incremental load script (using a QVD file) for one region (one ODBC connection).

       

      Challenge:

      Because loads from each region can fail independently, I would like to keep the regional data in separate QVD files, so that each can be corrected/updated on the next incremental load. This means that for EACH connection/region I have to track start/end dates for both the QVD file and the current Hive load.

      I'm assuming I would have to give the [date] variables a different name for each connection, e.g. vHiveLoadStartDateRegionA, vHiveLoadEndDateRegionA, vHiveLoadStartDateRegionB, vHiveLoadEndDateRegionB, etc. (as I understand it, QV has no way of restricting variable scope).

       

      Question:

      What's the best way to handle this?

      Should I have 5 copies of the same connection script but each with different connection, file, and variable names?

      Should I use some sort of loop, where the connection, file, and variable names are auto-generated on each iteration?
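
      To make the loop option concrete, here is a rough sketch in Python rather than QlikView script, purely to show the shape of what I have in mind. The region names, the connection and file-name patterns, the 30-day stand-in for "1 month", and the helpers `region_config` and `plan_loads` are all made up for illustration; only the variable-name pattern follows the examples above.

```python
from datetime import date, timedelta

# Hypothetical region identifiers; the real ones would match the 5 servers.
REGIONS = ["RegionA", "RegionB", "RegionC", "RegionD", "RegionE"]


def region_config(region, last_loaded, today):
    """Derive per-region names and the date window for the next incremental load.

    last_loaded is the latest date already stored in that region's QVD,
    or None if the region has never loaded successfully.
    """
    # Assumption: "1 month" of Hive data is approximated as 30 days.
    earliest_available = today - timedelta(days=30)
    if last_loaded is None:
        start = earliest_available
    else:
        # Resume the day after the last stored date, but never ask Hive
        # for data older than it still holds.
        start = max(last_loaded + timedelta(days=1), earliest_available)
    return {
        "connection": f"Hive_{region}",          # assumed ODBC connection name
        "qvd_file": f"Aggregated_{region}.qvd",  # assumed file-name pattern
        "start_var": f"vHiveLoadStartDate{region}",
        "end_var": f"vHiveLoadEndDate{region}",
        "start": start,
        "end": today,
    }


def plan_loads(last_loaded_by_region, today):
    """Build one independent load plan per region."""
    return [
        region_config(region, last_loaded_by_region.get(region), today)
        for region in REGIONS
    ]
```

      The point is that each region gets its own date window and its own file, so a region whose load failed last time simply resumes from its own last stored date without affecting the others.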

      Regardless of the strategy, what's the best way to merge the regional data for QV visualization, once incremental loads are done?
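
      For the merge, what I picture is roughly the following, again sketched in Python rather than QlikView script and only as an assumption about how it could work. The field names `date` and `hits`, the `region` tag, and the helper `merge_regions` are hypothetical; the 365-day cutoff stands in for the "up to a year" retention mentioned above.

```python
from datetime import date, timedelta


def merge_regions(rows_by_region, today, keep_days=365):
    """Concatenate per-region aggregated rows into one list.

    Each row is tagged with its region, and rows older than the
    retention window are dropped.
    """
    cutoff = today - timedelta(days=keep_days)
    merged = []
    # Sort by region name so the output order is stable between runs.
    for region, rows in sorted(rows_by_region.items()):
        for row in rows:
            if row["date"] >= cutoff:
                merged.append({**row, "region": region})
    return merged
```

      Tagging each row with its region would keep the regional files independent on disk while still giving a single table to visualize.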

       

      Thanks,

      J.