Data Reduction After 3-Tier Load

steverosebrook — Sat, 21 Apr 2018 16:15:31 GMT

We've developed a fairly comprehensive 3-tier data model app that includes all of our company's products, channels, sales transaction data, account balance data, etc. For our end-user apps, we then just do a binary load from that 3-tier data model. The problem we have is that most of our apps focus on specific products, channels, etc. - meaning we don't need every bit of data in our 3-tier data model.

So we end up doing a data reduction step in our end-user apps, to subset the data down to just what we need. Removing data model width (tables, columns, etc.) is easy - we can quickly and easily drop whole tables we don't need, drop unneeded fields, etc. using drop statements.

Reducing the "depth" of the data has been problematic, though. When we want to build an app for a specific product, for example, we typically reload the main dimension table using a resident load, and use a where clause to limit the data. Unfortunately, that leaves us with a lot of orphan fact table data: sales transactions for other products, account balances for accounts no longer included, etc. So when we put up a KPI showing Total Sales YTD, for example, the KPI includes all that orphan data and overstates the total.

We would rather not have to do a lot of set analysis coding in measures in the app, so we've been trying to use code in the load script to remove the orphan data from the various fact tables. We've been using right joins to do this - but it's code I found online somewhere and I don't really understand how or why it works - and the problem is that it doesn't always seem to work:

//Reduce overall data model to Product A.

//Filter main dimension table down.

Main_Dimension_Table_New:

NOCONCATENATE load *

resident Main_Dimension_Table

where Product='Product A';

DROP TABLE Main_Dimension_Table;

Rename Table Main_Dimension_Table_New to Main_Dimension_Table;

//Do right join with Sales Data to get rid of records that aren't in the main dimension table.

right Join (SalesData)

SalesData_Dedup:

load distinct

Primary_Key

resident Main_Dimension_Table;

So my question is: Is the above code the best way to reduce data after a binary load, or are there better alternatives?

Thanks,

Steve

topic Data Reduction After 3-Tier Load in App Development

Data Reduction After 3-Tier Load