Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Counting instances

Hi folks,

First let me explain what kind of data I'm working with.

The original data set has the following variables and setups

[Taxonomy Category], Diag1, [Diag1 Category], Diag2, [Diag2 Category], Diag3, [Diag3 Category], Diag4, [Diag4 Category]

All of the categories are text and have 20 to 50 different categories. All of the Diag (Diagnosis Codes) are codes, mostly numeric, some with a letter at the beginning (ie E909).

I have 2 versions of the qualified data (partly because I don't know how to get a combined version)

DiagnosisTable:
CROSSTABLE (ICD, ICDCategory, 2)
LOAD recno() as ID, [Taxonomy Category],
[Diag1 Cat],
[Diag2 Cat],
[Diag3 Cat],
[Diag4 Cat]
Resident RawDiagnosisTable;

DiagnosisCode:
Crosstable (ICDDes, ICDNumber, 2)
Load recno() as CodeNumber, [Taxonomy Category],
Diag1,
Diag2,
Diag3,
Diag4
Resident RawDiagnosisTable;

I want to explore the combinations of nonblank cells between the diagnosis codes by row.

For example, how many rows have values in Diag1 AND Diag2, how many with Diag1 and diag2 and diag3? etc

I know in Excel you can use a function called COUNTBLANK and a few if/then statements to achieve this effect with the original data structure.

In Qlikview I am not aware of a function that can do this, nor am I well acquainted with the syntax for expressions.

Can anyone help?

Thanks!

16 Replies
Not applicable
Author

So for that data, we should see these kinds of numbers:

Number with 145
Number with 2 pairs
201
Number with 3 pairs
34
Number with 4 pairs
19

Sum 299

Currently I'm getting a total of 546 using the expression count(combination)

Not applicable
Author

I think I know the reason why we're seeing 546 counts instead of 299.

Consider a row that has entries in Diag1, Diag2, Diag3, Diag4.

I have a suspicion that for this row (and others), it is counting it for all combinations, thus a count for

1-2

2-3

3-4

1-4 ... and so forth. In other words, the row is not given a distinct label.

How do we remedy that?

johnw
Champion III
Champion III

Your suspicion is correct. That's what I was TRYING to do, and what made everything complicated. OK, just because we've come this far, to do what I was thinking with what your data really looks like, you just need to add an ID for the row, and to use the ID instead of the Taxonomy Category when building the extra table. I have that working just fine in an example with your data.

But I didn't realize you wanted a DISTINCT label for the row. That's why I have a separate table. I thought you WANTED to see all those combinations. If you only want a distinct label for the row, that's TRIVIAL by comparison.

if(len(Diag1),'1')&if(len(Diag2),'2')&if(len(Diag3),'3')&if(len(Diag4),'4') as Combination

Is that what you mean?

And actually, it looks like you ALWAYS fill in the codes from left to right? If so, then the ONLY possible combinations are these:

1
12
123
1234

NOT the ones we both listed before. And at that point, all you're doing is counting the number of diagnostic codes. You might as well just do this:

-rangesum(len(Diag1)>0,len(Diag2)>0,len(Diag3)>0,len(Diag4)>0) as NumberOfCodes

Here's every single one of those approaches. Maybe what you're asking for is one of them. Maybe not.

Not applicable
Author

Well I can't view your document unfortunately,

But yes I do want to see all of the combinations because we don't know a great deal about the data we're playing with. Maybe for this 300 that's how it worked out, but for the other 2 mil+ it might not be, maybe it's different in another taxonomy, I don't know yet.

You gave me an idea though, which is what I think you were trying to tell me in your last post.

RawDiagnosisTable:
LOAD [Taxonomy Category],
[Diag1 Cat], [Diag2 Cat], [Diag3 Cat], [Diag4 Cat],
Diag1, Diag2, Diag3, Diag4,
if(len(Diag1),'1')&if(len(Diag2),'2')&if(len(Diag3),'3')&if(len(Diag4),'4') as Combination
FROM C:\Users\kkorynta\Desktop\Sample.xlsx
(ooxml, embedded labels, table is Sheet1);


Load * Inline [Combination
12,
13,
14,
13,
14,
23,
24,
34,
123,
124,
134,
234,
1234
];

My output is a little funky though which I see why you mentioned it looked like it only loads from left to right.

Combination Count (Combination)

307
145
12201
131
141
231
241
341
12334
1241
1341
2341
123419

What I don't understand, is how/why the combos with a count of 1 are reported that way, shouldn't they show as zeros? Otherwise, yes this is what I was looking for.

pover
Luminary Alumni
Luminary Alumni

To make up for my poor answer yesterday, here's some script you can try:

SET ThousandSep=',';
SET DecimalSep='.';
SET MoneyThousandSep=',';
SET MoneyDecimalSep='.';
SET MoneyFormat='$#,##0.00;-$#,##0.00';
SET TimeFormat='hh:mm:ss TT';
SET DateFormat='DD/MM/YYYY';
SET TimestampFormat='DD/MM/YYYY hh:mm:ss[.fff] TT';
SET MonthNames='ene;feb;mar;abr;may;jun;jul;ago;sep;oct;nov;dic';
SET DayNames='lun;mar;mié;jue;vie;sáb;dom';

Tmp_Mapeo:
LOAD * INLINE [
Combination_Num, Map
1, 1 Pair
2, 1 Pair
3, 1 Pair
4, 1 Pair
12, 2 Pair
13, 2 Pair
14, 2 Pair
23, 2 Pair
24, 2 Pair
34, 2 Pair
123, 3 Pair
124, 3 Pair
234, 3 Pair
1234, 4 Pair
];

RawDiagnosisTable:
LOAD rowno() as RowID,
[Taxonomy Category],
Diag1,
[Diag1 Cat],
Diag2,
[Diag2 Cat],
Diag3,
[Diag3 Cat],
Diag4,
[Diag4 Cat]
FROM
Sample.xlsx
(ooxml, embedded labels, table is Sheet1);

AllCombinations:
Replace
CROSSTABLE (Combination,Exists,2)
LOAD RowID,
"Taxonomy Category",
len(Diag1) and not len(Diag2) and not (len(Diag3)) and not (len(Diag4)) as "1",
not len(Diag1) and len(Diag2) and not (len(Diag3)) and not (len(Diag4)) as "2",
not len(Diag1) and not len(Diag2) and (len(Diag3)) and not (len(Diag4)) as "3",
not len(Diag1) and not len(Diag2) and not (len(Diag3)) and (len(Diag4)) as "4",
len(Diag1) and len(Diag2) and not (len(Diag3)) and not (len(Diag4)) as "12",
len(Diag1) and not len(Diag2) and (len(Diag3)) and not (len(Diag4)) as "13",
len(Diag1) and not len(Diag2) and not (len(Diag3)) and (len(Diag4)) as "14",
not len(Diag1) and len(Diag2) and (len(Diag3)) and not (len(Diag4)) as "23",
not len(Diag1) and len(Diag2) and not (len(Diag3)) and (len(Diag4)) as "24",
not len(Diag1) and not len(Diag2) and (len(Diag3)) and (len(Diag4)) as "34",
len(Diag1) and len(Diag2) and len(Diag3) and not (len(Diag4)) as "123",
len(Diag1) and len(Diag2) and not (len(Diag3)) and len(Diag4) as "124",
len(Diag1) and not (len(Diag2)) and len(Diag3) and len(Diag4) as "134",
not (len(Diag1)) and len(Diag2) and len(Diag3) and len(Diag4) as "234",
len(Diag1) and len(Diag2) and len(Diag3) and len(Diag4) as "1234"
RESIDENT RawDiagnosisTable;

INNER JOIN (AllCombinations)
LOAD -1 as Exists AUTOGENERATE 1;
DROP FIELD Exists;

LEFT JOIN
LOAD text(Combination_Num) as Combination,
Map
Resident Tmp_Mapeo;

Drop table Tmp_Mapeo;

johnw
Champion III
Champion III


bryankoch wrote:What I don't understand, is how/why the combos with a count of 1 are reported that way, shouldn't they show as zeros?


Remove the inline table of possible combinations. I suspect that will fix the results. I think what's happening is that even when the combination doesn't exist in your diagnosis table, it's still counting the one occurrence in your inline table.

Not applicable
Author

Figured my little issue out with the counts. I overlooked that I was using a Key, which was counting from the other table. 😕 Otherwise, using

if(len(Diag1),'1')&if(len(Diag2),'2')&if(len(Diag3),'3')&if(len(Diag4),'4') as Combinations, in my load statement with

Load * Inline [Combination
12,
13,
14,
13,
14,
23,
24,
34,
123,
124,
134,
234,
1234
];

Showed me what I needed to know.

John, thanks for all the help. I really appreciate it, and if sometime in the near future I need to see all possible combinations within the row I'll know how!

And Karl, I tried yours and it works as well.

Thanks