3 Replies Latest reply: Jan 17, 2017 9:55 AM by Marcus Sommer RSS

    Combining two data sources into single fact table

    Dennis Norton

      Trying to design a star schema and having trouble, still struggling understanding how to join multiple sources into a single fact table. I have 2 Fact-type tables and 1 dimension table.

       

      HEADER:

      TestIdNum & Module & Enrolldate = PrimeKey,

      endate      AS EnrollmentDate,

      testdate     AS TestingDate,

      PersonID,

      DiscAmount

       

      //EARNED:

      Join (HEADER)

      TestIdNum & Module & Enrolldate       AS PrimeKey,

      enrollment_date                                  AS EnrollmentDate,

      test_date                                            AS TestingDate,

      EarnAmount                                       AS [Earn Amount]

       

       

      PERSON_DIMENSION

      PersonID,

      PersonFullName,

      PersonAddress,

      PersonCity

       

      I basically want to be able to show the sum of Earn Amount for each PersonFullName in a table. Also the two date fields in each table should relate. I thought I would need to concatenate but then Person's Name doesn't have a field from the Earned table data.

        • Re: Combining two data sources into single fact table
          Marcus Sommer

          Maybe a mapping could be an alternative to the join, I mean something like this:

           

          Map_EARNED:

          mapping load

          TestIdNum & Module & Enrolldate       AS PrimeKey,

          EarnAmount                                       AS [Earn Amount]

          From EARNED;

           

          HEADER:

          load

          applymap('Map_EARNED', TestIdNum & Module & Enrolldate, '#NV') as [Earn Amount],

          TestIdNum & Module & Enrolldate AS PrimeKey,

          endate      AS EnrollmentDate,

          testdate     AS TestingDate,

          PersonID,

          DiscAmount

          from HEADER;

           

          You could use a quite similar mapping approach to add the PersonID to the EARNED table while concatenating them to the HEADER table whereby I assume that the pure mapping approach would be more suitable for you.

           

          ps: You need to ensure that your key is really valid - most often you need to use a delimiter like ... & '|' & ... between the fields of this composite key to be sure of them.

           

          - Marcus

          • Re: Combining two data sources into single fact table
            Dennis Norton

            Thanks Marcus, I think the mapping of PersonID on the Earned and concatenation sounds like what I'm looking for.

             

            In general, when I am combining multiple sources into one fact table, I have a hard time understanding when I should concatenate and when to just join. If I had more dimension keys in my HEADER table, and missing some from the table I was concatenating in, is it best to somehow map all the keysbefore concatenating another table with additional measures?

              • Re: Combining two data sources into single fact table
                Marcus Sommer

                There is no general rule when to concatenate fact-tables or to merge them per joining or mapping - it will always depend on your data. Joining is the most difficult approach because you could quite easily and unwanted remove or duplicate records which are problems which mapping would avoid (and you could also use several mappings parallel and/or using combined mapping-values which could be then splitted per subfield() again).

                 

                Most often is concatenating the easiest way to merge the facts whereby you will need in each case a strategy how to handle missing key-values - maybe by adding further load-steps to get them, by replacing them with a default value like 'Missing' or just by removing the record or by flagging them as invalid.

                 

                Although I personally prefer to use a single fact-table (mostly created by a concatenation) there are cases where a link-table model with two or more fact-tables could be more suitable - but like said before it depends ...

                 

                - Marcus