These therapeutic areas are in disparate data sources and have disparate relationships with other entities (doctor, hospital,...). I don't have a unique data source with all those areas, so what I'm doing is a concatenation of all the different data sources where the therapeutic areas appear (with all the transformation operations it needs). Like this:
Now, associated with these areas are the diseases/patologies. These, too, are being concatenated in the same table of the areas, like a father-child hierarchy:
The problem I'm facing is: not all the areas appear in ALL the data sources, and the same happens with the diseases. I need to have the combination (composite key) of every TA-disease pair, including the combination where the disease doesn't occur, in order to map relationships with all the other entities in my model that are only associated with a therapeutic area, not a disease. Like:
Imagine that the key Immunology|<NULL> doesn't appear in any data source, only Immunology|Rheumatoid Arthritis, but an entity (like the doctor) only requires Immunology|<NULL> (without diseases)... How can I model this relationship?
I sense I will have to apply the concept ofGeneric Keys, but I don't know exactly how to apply! Any direction?
Re: Modeling all the different combination of dimension fields using composite key
@Henric_Cronström, so sorry to insist this way, but I've read that you are the expert in these matters. Can you give your input regarding the use of Generic Keys for this case? Would greatly appreciate.
I'm attaching an excel file with random data of the different data sources I'm trying to integrate. The data sources are numbered. The 4 therapeutic areas in case are Immuno, Neuro, Vaccines and Onco.
For the dimension Therapeutic Area, I'm concatenating all the data sources from 1 to 3 to have dimension values as complete as possible. Actually, Disease should be the name for this dimension since it is more granular (one Therapeutic Area can have multiple Diseases associated, but one Disease is associated with one Therapeutic Area only).
At the end of this concatenation, I'm creating the composite key as follows:
[Therapeutic Area]&'|'&Disease as %ID_Disease
Now consider these 2 situations: 1) datasource3 goes to therapeutic area level only (no diseases) and does not have 'Onco' therapeutic area associated with it. Because of that, when I integrate this data source with the rest, I will only come up with these %ID_Disease:
How do I come up with the Onco| key?
2) I have the datasource4 which gives me doctor-related information (his opinions on a subject). He's related to one therapeutic area only (his specialty). How can I map the relationship Doctor-Therapeutic Area? How do I create the %ID_Disease for the doctor? I have a dimension Doctor with some attributes and don't know if Therapeutic Area should be another one of its attributes or if I should collapse it in the fact table.