Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I have a simple table consisting of invoice ids and invoice fruits.
Invoice ID | Fruit |
A | Apple |
A | Orange |
A | Plum |
A | Tomato |
B | Peach |
B | Apple |
B | Plum |
C | Kumquat |
C | Grapes |
C | Orange |
C | Pear |
C | Plum |
Now I want to put the co-occurrences of all fruits into a pivot table where each co-occurrence is based on the distinct number of invoices where they co-occur. So, using Apples and Plums as an example, the apple/plum or plum/apple combination would appear on 2 invoices (A and B) while the apple/orange or orange/apple would appear on 1 invoice (A only). The straight table result set would look something like the following table. I do not want to hard-code fruits in my set analysis. I just want to be able to take the set of data and return the distinct invoice count based on all of the co-occurrences. To date I have not come up with anything using set analysis. Maybe there is something elegant that eludes me. Any ideas that may be helpful? Thank you.
Item | Co-occuring Item | Count of Invoices |
Apple | Orange | 1 |
Apple | Peach | 1 |
Apple | Plum | 2 |
Apple | Tomato | 1 |
Grapes | Kumquat | 1 |
Grapes | Orange | 1 |
Grapes | Pear | 1 |
Grapes | Plum | 1 |
Kumquat | Grapes | 1 |
Kumquat | Orange | 1 |
Kumquat | Pear | 1 |
Kumquat | Plum | 1 |
Orange | Apple | 1 |
Orange | Grapes | 1 |
Orange | Kumquat | 1 |
Orange | Pear | 1 |
Orange | Plum | 2 |
Orange | Tomato | 1 |
Peach | Apple | 1 |
Peach | Plum | 1 |
Pear | Grapes | 1 |
Pear | Kumquat | 1 |
Pear | Orange | 1 |
Pear | Plum | 1 |
Plum | Apple | 2 |
Plum | Grapes | 1 |
Plum | Kumquat | 1 |
Plum | Orange | 2 |
Plum | Peach | 1 |
Plum | Pear | 1 |
Plum | Tomato | 1 |
Tomato | Apple | 1 |
Tomato | Orange | 1 |
Regards,
Mark
Want to try this? ...
source_data:
load
*
from https://community.qlik.com/t5/App-Development/Co-occurrence-table-Challenge/td-p/1853232
(html, utf8, embedded labels, table is @1)
;
left join(source_data)
load
[Invoice ID],
Fruit as Co_occuring_item
resident source_data
;
remove_dup:
noconcatenate load *
resident source_data
where Fruit <> Co_occuring_item;
drop table source_data;
Table dimensions: Fruit & Co_occuring_item
Measure: Count(distinct [Invoice ID])
Hi Steve,
Thank you for your quick response! I would normally consider this approach or some cross join attempt or even having it done very early at the SQL server level, but I am inheriting an app where, although not ideal, it would be preferable to do it in set analysis. If such an approach is not possible, then I will retool using a scripting approach.
If anyone has suggestions, let me know.
Thanks,
Mark
As far as I know this can't be done in set analysis alone. You will need to do something in the load script to create a secondary Fruit field. But you can leave it open to see if anyone else has an idea on a trick.
I'm almost positive sure you won't be able to get another Fruit to field that is a cross join with original without doing in the load script, but i've seen crazier tricks done by others so maybe...