Well... it's not perfect, but I think it's pretty close.
The attached example creates a radial chart from the raw data you showed (though I substituted more meaningful product names). For each product, it calculates evenly spaced X and Y coordinates on a circle. A scatter chart then displays those data points, which puts the products around a circle. To get the connections, I use a separate dimension, a field that identifies the two products purchased together plus the count. I then color each line based on the number of times the pair was purchased together, so it highlights the stronger connections.
Unfortunately, there are no labels, which seems like the biggest drawback. You can see the products and counts data when you mouse over a data point, but it's not very well formatted.
Still, maybe it's close enough, or maybe you or someone else can figure out how to clean it up a little better.
Thank you. This answer is extremely helpful, John. I'll get back in touch after I have the chance to reverse engineer what you submitted. I see what you mean about the labels, but this is nevertheless very close.
One more question for you. The ideal solution would adjust the thickness or appearance of the edges/lines based on the value in the upload list, with thicker lines or gradients representing combinations with a higher value of interactions. It looks like your chart is already emulating that effect. Could you point out where I adjust the formatting for this effect? To complicate things a bit further, the scale should be logarithmic since some pairs I use will be very skewed (a scale of 1000+ to 1).
I used a linear scale for simplicity, but I agree that a logarithmic scale makes more sense for most real-world examples. The expression that controls this is hiding on the Expressions tab. Go there, then click on the little + next to the X expression. It will expand to show a list of things that you have some additional control over. One of these is the Background Color. Click on that, and over in the Definition box, you'll see the expression that I entered for it:
Actually, I should probably do the darkness with just the first parameter for simplicity. Yeah, the below looks pretty good for the example data. You'll probably need a much smaller multiplier out in the real world, but our sample data set maxed out at 15:
You may also want to play with the Line Style option, which is several down from the Background Color. In addition to darker colors, you may want to make wider lines. I think you can use any real number between 0.5 and 8 for the line width, <W2.5> for instance. DOH! Except that "Only applicable on line, combo and radar charts." Ah, well. Hopefully the color will be enough for you.
Ah! Presentation tab, "Labels on Datapoints". Select that to get the labels on the data points. Getting closer.
Excellent. This is so much farther than I expected to get for my first question. This is so helpful. This solution may take me a while to digest. I'm new to QV and not particularly technical.
Question 1: Looking at the script, I'm curious what role this section plays. I see some trigonometric references. Are you possibly creating an origin point in the chart and constructing a ring around it? And what does the "recno" portion of the syntax represent?
LOAD Product
,cos(2*pi()*recno()/$(vProductCount)) as X
,sin(2*pi()*recno()/$(vProductCount)) as Y
RESIDENT RawData
Question 2: Interactivity is crucial for this graph. The ideal solution would allow me to click on any two or more vertices on the dial and restrict all additional output to the new conditions. For instance, if I click the egg and dog vertices together, I would like the dial to preserve the lines for the selected vertices and the tables at the right to display results for the intersection of egg and dog. Unfortunately, as soon as I click any single point on the dial (egg), the tables at the right adjust accordingly, but I lose the other vertices. Only egg appears. Any ideas for a solution?
Question 3: I may want to represent 100-250 vertices around this dial. That will be a formatting nightmare. Are you aware of any limitations that might prevent me from displaying 100 or more vertices on the dial?
BTW-I would be glad to write a recommendation for your effort. I appreciate your help.
1) I am indeed constructing an origin point and creating a ring around it. The origin in this case is trivial, (0,0). The ring contains the products. Recno() is a function that returns the number of the record.
We have five records in our RawData table, and since we have one row per product, that means we have five products. We store this count in the variable vProductCount. So as we look at each point, recno() increases from 1 to 5.
The formula is then saying that the first point should be 1/5 of the way around the circle, 1/5 of 2 pi, the second point 2/5 and so on. Add more products, and it will further subdivide the circle as necessary.
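To make the arithmetic concrete, here is a rough Python sketch of the same calculation (Python rather than QlikView script, purely for illustration). The function name and the product list are made up for the example; only egg and dog come from this thread.

```python
import math

def ring_coordinates(products):
    """Place each product at an evenly spaced angle on the unit circle."""
    count = len(products)  # plays the role of vProductCount
    coords = {}
    # recno() starts at 1, so the enumeration starts at 1 as well
    for recno, product in enumerate(products, start=1):
        angle = 2 * math.pi * recno / count
        coords[product] = (math.cos(angle), math.sin(angle))
    return coords

# Hypothetical product names, one row per product
coords = ring_coordinates(["egg", "dog", "cat", "milk", "bread"])
for product, (x, y) in coords.items():
    print(f"{product}: X={x:.3f} Y={y:.3f}")
```

With five products, the fifth one lands at a full turn, which is the same place as angle zero, so it sits at the rightmost point of the circle.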
2) Hmmm, I'll need to think about this one. There are various tricks for not restricting what you show as much as what the user selected, both with set analysis and with model changes. Nothing is jumping to mind, but I'd also bet there's a clean solution, and I (or someone else) just needs to think of it.
3) No, there should be no problem for QlikView itself showing 250 vertices and the connections between them. This "radial chart" is at its core a scatter chart. I have scatter charts with probably tens of thousands of points. But it will be a serious problem for a person to make sense of what will most likely just be a big black circle on their screen. So we'll need to do something.
If appropriate for your data, what about creating a drill down group as the dimension instead of Product? At its simplest, this would just be "Product Group" and "Product". So at the product group level, you'd see people who bought from two product groups at the same time. Then if you clicked on a product group, you'd see only the products within that group.
Alternatively, and perhaps more usefully, how about limiting the number of data points dynamically to the N highest-ranked connections based on current selections? I'd have to think about exactly how to pull that off, but I believe it's possible. So when you first view the chart, you might see only the 50 most common pairs of products. But as you make selections that narrow the number of products you're looking at, you'd see less and less common pairs.
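Just to pin down what "top N based on current selections" would mean, here is a small Python sketch of the logic. It is not how it would be built in QlikView, and the function name, the pair counts and the product names are all assumptions for illustration.

```python
def top_n_connections(pair_counts, selected=None, n=50):
    """Return the n most frequent pairs among the currently selected products.

    pair_counts maps (product_a, product_b) to the number of times the
    pair was purchased together. selected=None means no selection is made.
    """
    if selected is not None:
        # Keep only pairs where both products are in the current selection
        pair_counts = {
            pair: count
            for pair, count in pair_counts.items()
            if pair[0] in selected and pair[1] in selected
        }
    ranked = sorted(pair_counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

# Hypothetical counts
pair_counts = {
    ("egg", "dog"): 15,
    ("egg", "milk"): 7,
    ("dog", "milk"): 3,
    ("cat", "milk"): 1,
}
print(top_n_connections(pair_counts, n=2))
print(top_n_connections(pair_counts, selected={"dog", "milk", "cat"}, n=2))
```

With no selection, only the two strongest pairs come back. Once the selection is narrowed to dog, milk and cat, the weaker pairs among those products move into the top two, which is the behavior described above.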
I think either way will take fiddling with the X and Y coordinates to calculate them on the fly instead of calculating them in script. But this should still be possible. Again, I'd need to think about it.
I should mention that there MIGHT be a much easier way. QlikView 10 allows for extensions, which is to say you can invent new kinds of charts. I'm still using QlikView 9, so I don't have any experience with the extensions. But it's possible that you could create a radial chart much more easily using them, and it's even possible that someone has already created a radial chart extension that you could simply use.
I might have to explore the external chart options that you mention for QV 10. I assumed that if I resorted to external charts (Schemaball and Circos were used in the examples) that I would lose the interactivity, which would be a major sacrifice for this project. Hence, I'm most interested in finding a solution contained within QV. I prefer your scatter plot suggestion though. It should work. I would just need to expand upon it.
Regarding your other comments:
1) The dropped vertices issue (clicking one vertex in the chart hides the remaining data points) isn't a deal killer, but it would be more intuitive for users to understand if the connections to the other data points were preserved in the chart.
2) I'm thinking about your drill down group suggestion. The part about limiting data points to the N highest ranked connections sounds ideal if I could pull it off. For our company (electronics software), it's common for customers to buy products from multiple product groups. One goal of this project is to reveal those cross-product group selling opportunities. Hence, I would like the initial presentation to be somewhat granular (greater visibility of connected products across product groups), but without overwhelming the user with too many points. I know the solution is out there.
Thanks again for all your great advice, John. I'll remain in touch.
I poked at it a little today and made some progress. Color is assigned so that the strongest connection is black, and the others are shaded relative to it, regardless of the actual values. You can select the specific products of interest, and it will redistribute those specific products around the circle, and show only the connections between them. Product groups are supported and are the default view. You can enter the number of products, groups and the chance that any two products are purchased together, and reload to generate a new random data set.
Known issues:
- Clicking on data point removes all connections.
- No "Top N Connections" functionality.
- Shading for product groups is not correct.
- Sorting is alphabetic so not very useful.
testRadialChart3.qvw 132.2 K
As an option to logarithmic shading, consider square root shading.
Visualize the connection between two products as a pipe, and the number of connections as the size of the pipe. A pipe's capacity is proportional to its cross-sectional area, which grows with the square of its width. So if one pipe is handling twice as many connections as another, the ratio of the widths of the pipes is sqrt(2). With the darkness of the line corresponding to the width of the pipe as seen from above, it would make sense to use the square root instead of the logarithm for shading.
A logarithmic scale might be easier to explain, though. There's no "square root scale" in common use.
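For comparison, here is a rough Python sketch of linear, square root and logarithmic shading side by side. This is not the expression used in the chart. The function name is made up, darkness is assumed to run from 0 to 1 with the strongest connection at 1, and the use of log(1 + count) is my own choice so that a count of 1 doesn't vanish.

```python
import math

def darkness(count, max_count, scale="sqrt"):
    """Map a pair count to a darkness in [0, 1], where 1 is the strongest pair."""
    if count <= 0 or max_count <= 0:
        return 0.0
    if scale == "linear":
        return count / max_count
    if scale == "sqrt":
        return math.sqrt(count / max_count)
    if scale == "log":
        # log1p(count) = log(1 + count), so a count of 1 still gets some shade
        return math.log1p(count) / math.log1p(max_count)
    raise ValueError(f"unknown scale: {scale}")

# A skewed case like the 1000-to-1 range mentioned earlier in the thread
for scale in ("linear", "sqrt", "log"):
    print(scale, round(darkness(1, 1000, scale), 4))
```

On data that skewed, the linear scale makes the weakest pairs practically invisible, the square root lifts them a little, and the logarithm lifts them the most.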
Added slider to control how many connections between products are displayed. Slider varies between 1 and the actual number of connections in the current selections. Slider doesn't work properly at product group level, as it is still tracking the product connections separately. This is also why the shading is incorrect. If a product has no connections due to the slider, it is removed from the circle, but space is still reserved for it on the circle so that products don't jump around as you move the slider back and forth. They still jump a little due to the disappearing labels and points. I'd probably rather keep the products on the circle, but I haven't thought how to do that, and maybe we don't want the clutter anyway.
Figured out how to sort numerically, but then decided that alphabetic order probably makes the most sense in the real world, where products aren't given numeric descriptions.
testRadialChart3.qvw 135.0 K
I think I agree with the square root scale. For my group's purposes, the scale does not have to be exceedingly precise. It will merely be a visual reference. It would be simple enough to explain that thicker lines represent more collaborations between a given pair of vertices. Good enough.
I looked at your previous submission from earlier this afternoon. I have a few observations and questions:
1) I inadvertently tricked the chart to preserve the lines for a given data point. For instance, when I clicked on the Group 2 data point, the tables to the left restricted the results as QV does. However, when I then unchecked Group 2 in the Product Group table to the left, the chart adjusted to display the Group 2 vertex, with four edges to the Group 1, Group 8, Group 6 and Group 3 vertices, respectively. I wish I could show a screenshot. I assume it's generating the correct result. It could probably provide enough of the final effect I was hoping for. In any case, it retains the connection information without bombarding the user with the full results. That's good. I'll look into it more.
2) The input box labeled "Enter values and reload for new data set" shows a variable called vChanceTogether. One specification I wanted for the final report was a calculation that indicated whether the connection between a given pair of vertices was in some way statistically significant. For instance, if a lot of people buy product A and a lot of people buy product B, some small set of people will by chance buy both together. But this doesn't necessarily mean that the purchase of one influences the purchase of the other. I wanted to build in a calc that would highlight if the connection under review was significant. I'm revisiting my statistics resources to find the right calculation. Was it just coincidence that one variable was called vChanceTogether, or were you anticipating a step to identify data points that were unusually well correlated with each other?
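One possible starting point, sketched in Python under my own assumptions, is to compare the observed number of joint purchases with the number expected if the two products were bought independently. The ratio of the two is often called lift. This is only a rough indicator and not a formal significance test, and the function names and numbers below are invented for illustration.

```python
def expected_together(buyers_a, buyers_b, total_customers):
    """Joint purchases expected by chance if A and B are bought independently."""
    return buyers_a * buyers_b / total_customers

def lift(observed_together, buyers_a, buyers_b, total_customers):
    """Ratio of observed to expected joint purchases; about 1 means no association."""
    expected = expected_together(buyers_a, buyers_b, total_customers)
    return observed_together / expected

# Hypothetical numbers: 1000 customers, 200 bought A, 300 bought B, 120 bought both
print(expected_together(200, 300, 1000))
print(lift(120, 200, 300, 1000))
```

A ratio well above 1 would suggest the pair is bought together more often than chance alone explains, but whether the difference is statistically significant would still need a proper test on top of this.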
You identified some great possible solutions. Thanks, John.