Solved: Re: Similarity analysis - Qlik Community

mattphillip · ‎2023-02-22

I'm hoping someone might be able to assist. We have a dataset featuring academic programme titles in one column and modules noted in another. A programme may have up to ten modules. Some of these modules are shared between programmes. I have been asked to find a way to deliver a heat map/matrix detailing how similar (by %) each programme is to one another.

I can't seem to get my head round this so would appreciate any help. We have apps which look at which modules are shared (by a Y/N flag) and have a table which features a count of distinct programme titles per module.

Any help would be most appreciated.

Yours,

Matt

mattphillip · ‎2023-02-24

I'm pleased to confirm we now have a solution. Looks like I was too busy expecting a complicated solution and a far more simplistic one was available.

Basically we took the programme information data table and qualified it within the load script. This table contains info including school, stage, modules etc. We then created a composite key by combining the module field with the programme code field ([Mod-Prog Combo]).

We then loaded a second table from the same source, this time loading and renaming the programme code field (from [<table name>.Prog Code1] to [<table name>.Prog Code2]), loading the module field whilst retaining the same name, and finally loading the same composite key field but adjusting the name slightly ([Mod-Prog Combo 2]). We joined those tables with a basic join statement.

Finally, we added in a third table, again from the original source and reloaded the programme structures field with the same name as the second table (e.g. [<table name>.Prog Code2]) and added a new calculated field (count(distinct [Module]) as [Module count]), grouped by [<table name>.Prog Code1].

Our table then includes both the [<table name>.Prog Code1] field, labelled as 'Programme Code (Primary)' and the [<table name>.Prog Code2] field, labelled as 'Programme Code (Comparator). Our 'Similarity %' measure then comprises of Count(Distinct[<table name>.Module])/[Module Count].

Hope this helps someone!

View solution in original post

petter · ‎2023-02-22

The load script:

I can't help thinking in bits (and bytes) so I made an example showing how to use certain bit positions to represent each module. As long as you have just a limited number of modules bit # 1 which represents 1 in the binary system is module #1, 2^0 or 2^(module#-1).

Module #2 would be 2^1, module # 3 would be 2^2, module #4 would be 2^3 and so forth.

Then BitCount() function and BitAnd() can be used to good effect to calculate the number of overlapping or common modules between two or more programmes.

I have included the application from which I took the screenshots above...

mattphillip · ‎2023-02-23

Thanks for your help Petter. I like the idea. However, I've just checked and we've got over 100 programmes and around 600 modules so I'm not sure it's going to work.

I had another chat with my director and they've complicated things further. They would like something like the below. I don't even know how to do this in several steps to make it possible. Don't suppose you have any ideas how this could work? I'm leaning towards pushing for some sort of simplification.

Thanks again for your time.

Matt

% Similarity of Programmes (Based on Modules)
	Programme 1	Programme 2	Programme 3	Programme 4	Programme 5
Programme 1	100%	50%	60%	10%	0%
Programme 2	50%	100%	25%	25%	10%
Programme 3	60%	25%	100%	60%	80%
Programme 4	10%	25%	60%	100%	5%
Programme 5	0%	10%	80%	5%	100%

mattphillip · ‎2023-02-24

I'm pleased to confirm we now have a solution. Looks like I was too busy expecting a complicated solution and a far more simplistic one was available.

Basically we took the programme information data table and qualified it within the load script. This table contains info including school, stage, modules etc. We then created a composite key by combining the module field with the programme code field ([Mod-Prog Combo]).

We then loaded a second table from the same source, this time loading and renaming the programme code field (from [<table name>.Prog Code1] to [<table name>.Prog Code2]), loading the module field whilst retaining the same name, and finally loading the same composite key field but adjusting the name slightly ([Mod-Prog Combo 2]). We joined those tables with a basic join statement.

Finally, we added in a third table, again from the original source and reloaded the programme structures field with the same name as the second table (e.g. [<table name>.Prog Code2]) and added a new calculated field (count(distinct [Module]) as [Module count]), grouped by [<table name>.Prog Code1].

Our table then includes both the [<table name>.Prog Code1] field, labelled as 'Programme Code (Primary)' and the [<table name>.Prog Code2] field, labelled as 'Programme Code (Comparator). Our 'Similarity %' measure then comprises of Count(Distinct[<table name>.Module])/[Module Count].

Hope this helps someone!

Similarity analysis

Developers

dimension

expression

Script