Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
DennisNorton
Partner - Contributor III
Partner - Contributor III

Combining two data sources into single fact table

Trying to design a star schema and having trouble, still struggling understanding how to join multiple sources into a single fact table. I have 2 Fact-type tables and 1 dimension table.

HEADER:

TestIdNum & Module & Enrolldate = PrimeKey,

endate      AS EnrollmentDate,

testdate     AS TestingDate,

PersonID,

DiscAmount

//EARNED:

Join (HEADER)

TestIdNum & Module & Enrolldate       AS PrimeKey,

enrollment_date                                  AS EnrollmentDate,

test_date                                            AS TestingDate,

EarnAmount                                       AS [Earn Amount]

PERSON_DIMENSION

PersonID,

PersonFullName,

PersonAddress,

PersonCity

I basically want to be able to show the sum of Earn Amount for each PersonFullName in a table. Also the two date fields in each table should relate. I thought I would need to concatenate but then Person's Name doesn't have a field from the Earned table data.

1 Solution

Accepted Solutions
marcus_sommer

Maybe a mapping could be an alternative to the join, I mean something like this:

Map_EARNED:

mapping load

TestIdNum & Module & Enrolldate       AS PrimeKey,

EarnAmount                                       AS [Earn Amount]

From EARNED;

HEADER:

load

applymap('Map_EARNED', TestIdNum & Module & Enrolldate, '#NV') as [Earn Amount],

TestIdNum & Module & Enrolldate AS PrimeKey,

endate      AS EnrollmentDate,

testdate     AS TestingDate,

PersonID,

DiscAmount

from HEADER;

You could use a quite similar mapping approach to add the PersonID to the EARNED table while concatenating them to the HEADER table whereby I assume that the pure mapping approach would be more suitable for you.

ps: You need to ensure that your key is really valid - most often you need to use a delimiter like ... & '|' & ... between the fields of this composite key to be sure of them.

- Marcus

View solution in original post

3 Replies
marcus_sommer

Maybe a mapping could be an alternative to the join, I mean something like this:

Map_EARNED:

mapping load

TestIdNum & Module & Enrolldate       AS PrimeKey,

EarnAmount                                       AS [Earn Amount]

From EARNED;

HEADER:

load

applymap('Map_EARNED', TestIdNum & Module & Enrolldate, '#NV') as [Earn Amount],

TestIdNum & Module & Enrolldate AS PrimeKey,

endate      AS EnrollmentDate,

testdate     AS TestingDate,

PersonID,

DiscAmount

from HEADER;

You could use a quite similar mapping approach to add the PersonID to the EARNED table while concatenating them to the HEADER table whereby I assume that the pure mapping approach would be more suitable for you.

ps: You need to ensure that your key is really valid - most often you need to use a delimiter like ... & '|' & ... between the fields of this composite key to be sure of them.

- Marcus

DennisNorton
Partner - Contributor III
Partner - Contributor III
Author

Thanks Marcus, I think the mapping of PersonID on the Earned and concatenation sounds like what I'm looking for.

In general, when I am combining multiple sources into one fact table, I have a hard time understanding when I should concatenate and when to just join. If I had more dimension keys in my HEADER table, and missing some from the table I was concatenating in, is it best to somehow map all the keysbefore concatenating another table with additional measures?

marcus_sommer

There is no general rule when to concatenate fact-tables or to merge them per joining or mapping - it will always depend on your data. Joining is the most difficult approach because you could quite easily and unwanted remove or duplicate records which are problems which mapping would avoid (and you could also use several mappings parallel and/or using combined mapping-values which could be then splitted per subfield() again).

Most often is concatenating the easiest way to merge the facts whereby you will need in each case a strategy how to handle missing key-values - maybe by adding further load-steps to get them, by replacing them with a default value like 'Missing' or just by removing the record or by flagging them as invalid.

Although I personally prefer to use a single fact-table (mostly created by a concatenation) there are cases where a link-table model with two or more fact-tables could be more suitable - but like said before it depends ...

- Marcus