Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I am looking for some feedback on best way to organize Star schema. In my use case I have 3 Entities Organization, Employee, Claims filed by Employee.
Org <> Employee Relation
Employee <> Claim Relation
When looking to create FACT, DIMENSION Table how do I establish link between Org & EE. Is link between Dimension Table allowed or appropriate?
Or do i just combine Org & EE Table. Org table will have all active organization. EE Table will have all EE information of those active org. Not all EE will be on claim. Claim Fact table will claims of EE
Hi @Sib,
That will act like you are expecting. It will not populate any null values in the fact table.
Hi @Sib ,
As per my knowledge in start schema you don't need to connect the dimension tables with each other.
you can connect both tables with a fact table that will create a star schema.
Above attached diagram in your question is a perfect example of a Star Schema.
In above only problem is connection between Organization & Employee is then established via FACT table. For example not every Employee of that organization would have filed claim. Since no direct relation or Link exist between Org Dim and EE Dim that means i cannot find EE of an organization who have never filed a claim.
Thanks @abhijitnalekar . In above model all will work fine if i wanted to know about EE to Org connection for those EE who are on claim. If person is not on claim then since their wont be any entry in FACT table will it still make the connection. I dont want to populate FACT tables with lot of Nulls rows simply to know EE <> Organization connection. Thank you for your response.
Hi @Sib,
That will act like you are expecting. It will not populate any null values in the fact table.
For many scenarios your data-model will work fine. Quite often is it even preferable to remove dim-values which have no fact-data, for example with a where exists() clause. An exception are NULL analysis but IMO the usefulness of them are heavily overestimated. If this isn't an important must-requirement you shouldn't implement it.
But if you need it is populating the appropriate data within the fact-table often the most practically way - especially as you could often create them on a higher granularity as the atomic level or that these information are only valuable for the most current periods. This means the amount of the populating data might not too big as to become a showstopper.
- Marcus