Discussion board where members can learn more about Qlik Sense App Development and Usage.
I'm trying to come up with a uniform table/dimension of therapeutic areas around an organization. Like this:
THERAPEUTIC AREA: Immunology Neuroscience Vaccines Oncology
These therapeutic areas are in disparate data sources and have disparate relationships with other entities (doctor, hospital,...).
I don't have a unique data source with all those areas, so what I'm doing is a concatenation of all the different data sources where the therapeutic areas appear (with all the transformation operations it needs). Like this:
Therapeutic Area: LOAD Therapeutic Area ,... ,... FROM datasource1; CONCATENATE([Therapeutic Area]) LOAD Therapeutic Area, ,... ,... FROM datasource2;
Now, associated with these areas are the diseases/patologies. These, too, are being concatenated in the same table of the areas, like a father-child hierarchy:
Therapeutic Area Disease
Immunology Rheumatoid arthritis
Immunology Ankylosing spondylitis
Neuroscience Parkinson’s Disease
Neuroscience Multiple Sclerosis
The problem I'm facing is: not all the areas appear in ALL the data sources, and the same happens with the diseases. I need to have the combination (composite key) of every TA-disease pair, including the combination where the disease doesn't occur, in order to map relationships with all the other entities in my model that are only associated with a therapeutic area, not a disease. Like:
<NULL>|<NULL> Immunology|<NULL> Immunology|Rheumatoid Arthritis Immunology|Ankylosing spondylitis Immunology|Osteoarthritis Neuroscience|<NULL> Neuroscience|Parkinson’s Disease Neuroscience|Multiple Sclerosis
and so on...
Imagine that the key Immunology|<NULL> doesn't appear in any data source, only Immunology|Rheumatoid Arthritis, but an entity (like the doctor) only requires Immunology|<NULL> (without diseases)... How can I model this relationship?
I sense I will have to apply the concept of Generic Keys, but I don't know exactly how to apply!
@Henric_Cronström, so sorry to insist this way, but I've read that you are the expert in these matters. Can you give your input regarding the use of Generic Keys for this case? Would greatly appreciate.
From your description it sounds as if Generic Keys could be a solution. Or perhaps a Hierarchy Load...
Could you post some sample files?
I'm attaching an excel file with random data of the different data sources I'm trying to integrate. The data sources are numbered. The 4 therapeutic areas in case are Immuno, Neuro, Vaccines and Onco.
For the dimension Therapeutic Area, I'm concatenating all the data sources from 1 to 3 to have dimension values as complete as possible. Actually, Disease should be the name for this dimension since it is more granular (one Therapeutic Area can have multiple Diseases associated, but one Disease is associated with one Therapeutic Area only).
At the end of this concatenation, I'm creating the composite key as follows:
[Therapeutic Area]&'|'&Disease as %ID_Disease
Now consider these 2 situations:
1) datasource3 goes to therapeutic area level only (no diseases) and does not have 'Onco' therapeutic area associated with it. Because of that, when I integrate this data source with the rest, I will only come up with these %ID_Disease:
Immuno| Neuro| Vaccines|
How do I come up with the Onco| key?
2) I have the datasource4 which gives me doctor-related information (his opinions on a subject). He's related to one therapeutic area only (his specialty). How can I map the relationship Doctor-Therapeutic Area? How do I create the %ID_Disease for the doctor? I have a dimension Doctor with some attributes and don't know if Therapeutic Area should be another one of its attributes or if I should collapse it in the fact table.
This is my model (for now):