Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello everbody,
i have the following problem:
My data is sorted into different orders which contain various binary attributes.
I want to display the data in a graphic which shows the corrleations of single attributes to other single attributes.
The size of the Bubble should be a measurement for correlation. The problem is, that i have the same axe in one diagram twice.
I tried it with the Matrixdiagram and the Pointdiagram.
The Problem of the Pointdiagram is that its axes are formulas and I am not able to categorize to the specific attributes.
The Pointdiagram looks pretty fine, but im not able to insert the same dimension twice.
It would be very nice if you can give me any hints or tipps or just the side of the manual
Down is displayed the logic for the correlation:
Sodele,
=if(SACode<>SACode5,
count(aggr(NODISTINCT if(ProdNummer=%ProdNummer5,1),ProdNummer,%ProdNummer5)) / count(ProdNummer)
)
One possible approach is to load your dimension twice, something like
Transactions:
LOAD
recno() as TransID,
ceil(RAND()*100) as PersonID,
chr(floor(RAND()*10+65)) as Category
AutoGenerate 150;
LOAD
PersonID,
Category as Category2
resident Transactions;
if you want to use Category for axis. You can easily relabel Category2 to Category in dimension tab.
See also attached,
Stefan
Thank you very much swuehl for the fast reply 😃
I'll try to load the dimension twice.
But there is another cirumstand:
There is a difference between the Correlation A to B and B to A.
example:
order1: a , b , c
order2 a , b , d
order3 a , c , d
now is A to B 2/3 ~ 66 % correlation
B to A is 100% correlation
how can I implemt this into my model?
Thank you very much 😃
Another problem is that i have a 100% correlation between the identical attributes:
How can I exclude them from the selection?
Maybe like attached?
Thank you very much swuehl.
I am very impressed that you get the right solutions that fast! Im working with QV for about 1 Month but you are amazing ! =). The model is exactly what i need - Respect.
Im trying to adopt your model to my main model, but i have a question im struggeling with for hours..
In the picture i described your formula with my comments, in red is the part of the formula i don't understand.
It would be very helpful if you can explain that part to me, or correct me if i explained something wrong.
You got it mostly right, except the part with the aggr() dimensions.
In my expression
if(Bin<>Bin2,count(aggr(NODISTINCT if(Order=Order2,1),Bin,Bin2,Order,Order2)) / count(Order))
I used an advanced aggregation with expression if(Order=Order2,1) and dimensions Bin,Bin2, Order, Order2.
You can imagine the advanced aggregation like a table inside the expression.
In fact, since we are already using Bin and Bin2 as dimesions to our chart, you don't need to state them here (but it won't harm, the results of this aggr() table will be automatically limited to the values of the outer dimensions.
Alternatively, if you keep all four dimensions, you might remove the NODISTINCT qualifier.
So if you look at one cell in our chart, e.g. Bin=a and Bin2=b, the aggr() will iterate over all possible combinations of Order and Order2 values and check the expression, which results in two records with 1:
Order Order2 Bin Bin2 count(if(Order2=Order,1))
2
order1 order1 a b 1
order1 order2 a b 0
order2 order1 a b 0
order2 order2 a b 1
order3 order1 a b 0
order3 order2 a b 0
Then you devide by the count of Order to get the percentage (which is 3, not 6, because the count is outside the aggr(), so not all combinations are taken into account). So we get a result for our chart of 2/3.
To iterate over all possible combinations, it is important that our tables are not linked, this is an important change to my first example.
I hope I made myself somewhat clear.
Regards,
Stefan
Hello Stafan,
thank you very much for your solution.
But i wasn't able to assign it to my solution.
I uploaded my model, the structure is a bit different. It would be very helpfull for me because im searching the mistake for about 2 days =(.
In my model are 3 Pages, the first for the overview, the second is a time analysis and the third the correlation.
In the correlation tab i tried to implement your second solution, but it didnt work.
In the time correlation i search for the correct expression. in the first graph is shown the sold orders. In the second graph i want to display the percentage of single elements of the order.
When i display it absolutely the singel elemets follow the deviation of the total sales so i cant get any information out it.
So i want to divide it through the sales to get a percentage. But it divides only through the sales which contain that specift element so i get a correlation from 100%.
Thank you very much for your expert knowledge
greetings marko
I think you need to keep the Prodnummer also in your data island table for the correlation, not only the SACode.
Something like
//Laden unverknüpft für Korrelation Signifikanz
LOAD
[Produktionsnummer] as %ProdNummer5,
text(subfield(purgechar([Prod.codes],'" '),'#')) as SAcode5
FROM
(txt, codepage is 1252, embedded labels, delimiter is ';', no quotes);
Then you should be able to change
=if(SACode<>SAcode5,count(aggr(NODISTINCT if(ProdNummer<>ProdNummer,1),Prodnummer,) / count(ProdNummer))
to
=if(SACode<>SACode5,
count(aggr(NODISTINCT if(ProdNummer=ProdNummer5,1),ProdNummer,ProdNummer5)) / count(ProdNummer)
)
I implemented the changes but i didnt mange it to run.
I added the data file so you can start imports..
It would be very nice if you could help me again 😃