Joining tables issue

slihp67 · ‎2020-04-24

I have three tables I want to set up, Student with an inner join to schools (each student has a school) and also left join to Address (each student may/may not have an address).

[Student$]:
LOAD * INLINE [
SchoolID, AddID, Name
1, 3, Paul Smith
2, 2, William Philp
3, 1, Mark Turner
4, 7, Simon Easton
5, 9, Lubo Moravcik];

//Inner Keep ([Student$])

[School$]:
LOAD * INLINE [
SchoolID, SchoolName
1, St Johns
2, St Pats
3, Our Ladys
4, Belgrove Academy ];

//Left Keep ([Student$])

[Home Address$]:
LOAD * INLINE [
AddID, AddressName, Country
1, London, England
2, Amsterdam, Holland
3, Paris, France];

As I understand it, qlik will automatically join tables with similar key cols using a full outer join (keeps commented out). When I look at individual tables I see all the rows I expect but when I compare fields from two tables I see the full outer join result.

If I then specify inner and left joins between tables in my script (remove comments in script) my multi table-field results look correct but my single table counts are wrong now. is there anything that can be done to get the best of both worlds

the example above is a representation of the type of things users will want to do and as I cant model the data further just now I am trying to evaluate if qlik will be a good tool for this project with this data as-is.

thankyou in advance.

ArnadoSandoval · ‎2020-04-24

Hi @slihp67

Welcome to the Qlik's world of associative joins data models, it takes a while to get used to its benefits; when we implement traditional SQL driven data models, they seem robust, everything it tight together, the relationships between entities enforce that the data represented by the model is complete, a great experience for the user.

Let's analyze the traditional SQL data model (second screenshot):

It tells us the school has 4 students, instead of 5; "Lubo Moravcik" disappeared; whoever is planning for this school will take decisions for 4 students, in other words, just 80% of the school population; if the school has 1000 students, any plan will be based just in 800 of them.
The same scenario could happen with other dimensions, like customers or products and transnational data could be obscure.
Let's pretend that Mrs Moravcik calls the school to inquire something about her child, or the school is planning some activities for its student, Moravcik will miss all the fun as he does not exist.
Flaws with the business applications supplying this data are hidden as well, it will take a bit longer to resolve them, somehow "Lubo Moravcik" record is not complete for many reasons, but the planners and decisions makers for this school (or Supply and Chain managers) could base their work with an incomplete data set.
Of course, the IT team could write their SQL queries in ways to assign defaults to those incomplete records, some extra work, and perhaps a reactive approach.

Associative data models:

It contains all the records available at the data source application, good records, bad records, broken records. It seems dirty, but it will add value to the solution. The stakeholders will appreciate that, they know their business.
It is not that the dirty records jeopardize the outcome of the model, you can handle them, as well as stakeholders and endusers with the Qlik User Interface. It is not that hard for them, they do wonders with Excel already.
"Lubo Moravcik" record will be there, broken or incomplete, whoever is working with his/her record will be able to find that the record is missing the address or school, perhaps they will be able to pinpoint the reason as a missing school or address in the data source, I am sure they know who is responsible for maintaining schools or addresses in their application.

New fields, dimensions or entities:

They happen all the time, think about additional data attributes that departments have, they could be regarding students, products, customers, etc. these bits of information did not make it into the business application, but they are key for the successful operation of teams or departments. Well, with traditional SQL data models the department or team will have a hard time getting their precious attributes, while with Qlik they could easily integrate with the associative data model. They are wizards doing that.
With new trends, I am sure the current crisis with COVID-19 will introduce new entities for some business, additional attributes to be included with their existing data models, with Qlik they could quickly implement them and make decisions instead of a long wait for traditional SQL data models architectures.

While researching for this reply I found these articles:

The learning curve to work with associative data models is not that big once we understand the value added by the way it integrates data and the flexibility stakeholders and endusers gain in return.

Hope this helps,

Arnaldo Sandoval

Arnaldo Sandoval
A journey of a thousand miles begins with a single step.

slihp67 · ‎2020-04-30

Spoiler

Thanks, Arnaldo for the great and detailed example. So this just of your post is to accept the full outer join relationships between tables and handle these within Qlik? this leads me to another question then - How is best to do this? I have only seen the "Include Null values" checkbox option for each dimension.

sorry for the delay in my reply.

Thanks, Arnaldo for the great and detailed example. So this just of your post is to accept the full outer join relationships between tables and handle these within Qlik? this leads me to another question then - How is best to do this? I have only seen the "Include Null values" checkbox option for each dimension.sorry for the delay in my reply.

ArnadoSandoval · ‎2020-04-30

@slihp67

Qlik uses column-name driven association between your tables, so your tables presenting the data may look ugly with those holes, that is the traditional way of thinking working with conventional SQL data model, everything is normalized; now you should introduce your users with this extra information, you still can present your data nicely with the Include Nulls, but if your users become aware that the data model will be able to answer questions about students with incomplete attributes, they will start to ask new questions, and you will be able to provide the answers, because the data model may produce a table with students pending their school assignation or students missing their address; this is meaningful information for them, and they know the team able to resolve these issues, and believe me, it will be operations and no information technologies.

The Null handling (exclude or include them) add power to the data model, as it will be able to answer a broader range of questions, you only need to handle the nulls when building the dashboard.

With the Students-Schools association you will be able to answers these questions:

Total number of students (this is by counting all the students with and without associated school).
Total Students enrolled in schools. (just exclude the nulls)
Total Students pending enrollment to schools. (just include only nulls)

I hope this helps,

Arnaldo Sandoval
A journey of a thousand miles begins with a single step.

Related Topics