Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hey QlikView Community
I'm trying to create a matrix where the two axis are the same field, and the calculation is of how similar each object is to another determined by a certain algorithm. I.e. something like:
| A | B | C | |
| A | 100% | 30% | 55% |
| B | 30% | 100% | 70% |
| C | 55% | 70% | 100% |
But I cannot get each value to be equal to all values. This means I can only get it to evaluate down the middle diagonal (i.e. the 100% values). Like:
| A | B | C | |
| A | 100% | Missing | Missing |
| B | Missing | 100% | Missing |
| C | Missing | Missing | 100% |
As the algorithm is determined by an aggregation over a set of attribute fields from another connected table, it is not possible to just create a alias field with the second names. How do I get A, B, and C to be an option of A, so that the A-B, A-C and so forth calculations are not just missing?
Still not entirely it, but this helped me get it right!
I extracted my data so I have one field of equipment and then one field per area with the count of locations per equipment. Using crosstable I was then able to create a table with equipment, area and location count.
I saved this table in a temp qvd then loaded it in twice in qualified versions as table_1 and table_2 with equipment unqualified.
The pivot table then has the dimensions table_1.area and table_2.area, with the table_2.area field across.
The expression is then the following:
= If( Table_1.area = Table_2.area, 100, Num((Sum( Aggr((Sum( Table_1.No_Of_Locs) + Sum( Table_2.No_Of_Locs)) * (Count( if( Table_1.No_Of_Locs <> 0, 1)) + Count( if( Table_2.No_Of_Locs <> 0, 1))), Table_1.area, Table_2.area, Equipment)))
/( Sum(Table_1.No_Of_Locs) + Sum(Table_2.No_Of_Locs) )
/ (Count( if( Table_1.No_Of_Locs <> 0, 1)) + Count( If( Table_2.No_Of_Locs <> 0, 1))) * 100, '##,00'))
As the algorithm cannot calculate the similarity between two similar areas (it will not end up being 100%), the if statement sets this value. The num() function helps limit the number of decimalpoints.
can you share your app with us ?
Unfortunately not, the data is classified. The expression I'm using to calculate is the following:
Num((Sum( Aggr(Count( {[System]} DISTINCT Location) * Count( {[System]} DISTINCT Area), Category_1, Category_2, Category_3))
/ Sum( Aggr(Count( {[System]} DISTINCT Location), Category_1, Category_2, Category_3)))
/Sum( Aggr( Count( {[System]} DISTINCT Area), Category_1, Category_2, Category_3))*100, '#,00')
I'm calculating the commonality among areas of equipment. The areas are the ones indicated by A, B, and C in my original post. The categories are descriptions of the locations of equipment. {[System]} refers to a listbox, which is a dynamic choice of which collection of equipment is being viewed.
When I try to reload the area names under a different field name (e.g. Area_Name_2) there is no connection to the equipment descriptions, because the field that connects the original area field to the table with the descriptions is not loaded in. When I do load the connecting field I get the same problem as in the second table, because now they are once again only linked A-A, B-B, and C-C.
Hi
yes could you provide some data and the algoritm
it should be possible with an outer join of table on itself in script first
regards
Hi
just a dummy example,
in your matrix above, how do tou count the % (in english not in aggr) !
for A-B by example : 30% is count of B with commun category with A / count (A+B) ?
thanks
I multiply the number of locations with the number of areas for each combination of category 1, 2, and 3, and then sum those values together. This is then divided by the sum of the number of locations per combination of category 1, 2, and 3.
As the boundary of this calculation is the sum of the number of areas per combination of the categories, I then divide by this sum to get the percentage of the max possible amount.
If all locations are the same in two areas, the percentage should end at 100.
Hi,
this a first step to make a cross table of area on itself.
in script load twice data :
qualify *;
data:
LOAD * FROM your data source
data2:
LOAD * FROM your data source
unqualify *;
with these dummy data :

then as dimensions :
data.area
data2.area
a) first solution (asymetric) :
expression :
sum(aggr( max( if(data.category=data2.category,1,0)), data2.area,data.ident))
/ count(total <data.area> data.ident)

b) second (symetric) :
(sum(aggr( max( if(data.category=data2.category,1,0)), data2.area,data.ident)) +
sum(aggr( max( if(data.category=data2.category,1,0)), data.area,data2.ident)) )
/ (count( total <data.area> data.ident)+ count( total <data2.area> data2.ident))

without an example, I'm not sure to understand exact calculation
regards
That is unfortunately not entirely how the categories work.
The categories (1, 2, and 3) are based on an ISO-Standard and go together to describe the equipment at a location. To simplify, I have created a dummy data set where the categories are collected in one field. The data is structured as below:
Equipment* | Area A | Area B | Area C |
Valve | 3 | 3 | 6 |
Crane | 0 | 1 | 0 |
Motor | 2 | 3 | 0 |
Where the numbers in the area columns is the number of locations for each piece of equipment on each of the areas.
To calculate the similarity across each of the areas the algorithm comes in.
E.g. to calculate the similarity between area A and B in excel it would look like:
Equipment | Area A | Area B | Sum of locations | No of areas with equipment | Locations * Areas |
Valve | 3 | 3 | 6 | 2 | 12 |
Crane | 0 | 1 | 1 | 1 | 1 |
Motor | 2 | 3 | 5 | 2 | 10 |
Sum | 12 | 5 | 23 |
The commonality index would then be:
The sum of Locations * areas divided by the sum of the locations, i.e. 23/12 = 1,9. The maximum value of the calculation is the sum of the number of areas with the equipment, i.e. 5. This means, that the commonality percentage ends at 1,9/5*100 = 38%.
The percentage is what I want shown in the matrix.
The matrix should end up symmetric, as the commonality between A-B and B-A should be the same.
I have attached a QlikView file where the dummy data is loaded in.
* Using the ISO-Standard the names would be something like, "Mechanical, Valve, Manual Valve" for category 1, Category 2, and Category 3 respectively. The names here are simplified.
Hi Julie,
thank's for the complements, but unfortunately I can't read qvw I've free version QV.
I made test with these data :

to make a cross product pivot table, in script :
qualify *;
data:
LOAD * from your data source;
data2:
LOAD * from your data source;
unqualify *;
equip:
load distinct data.equipment resident data;
area:
load distinct data.area resident data;
outer join (equip) load * resident area;
drop table area;
exit script;
then make a pivot table with as dimension :
data.area
data.equipment
data2.area
I retrieve 38% for A/B :

expressions :
result :
=locxarea/totarea/[totH+totV]
locxarea :
sum(total <data.area,data2.area> aggr((if(count(distinct data.id)>0,1,0) +
if(aggr(sum(if(data2.equipment=data.equipment,1,0)),data.area,data.equipment,data2.area)>0,1,0))
* (aggr(sum(if(data2.equipment=data.equipment,1,0)),data.area,data.equipment,data2.area)
+ count(distinct data.id)),data.area,data.equipment,data2.area)
)
totarea :
sum(total <data.area,data2.area> aggr(count(distinct data.id),data.area,data.equipment,data2.area))
+ sum(total <data.area,data2.area> aggr(sum(if(data2.equipment=data.equipment,1,0)),data.area,data.equipment,data2.area))
[TotH + TotV] :
sum(total <data.area> if(aggr(count(distinct data.id),data.area,data.equipment)>0,1,0)) +
sum(total <data.area,data2.area> aggr(max(if(aggr(sum(if(data2.equipment=data.equipment,1,0)),data.area,data.equipment,data2.area)>0,1,0)),data.area,data.equipment,data2.area))
I hope you can continue with this,
otherwise attach just few rows in xslx
regards
could you attach too some aspirin 😉 ?