16 Replies Latest reply: Sep 23, 2010 3:40 PM by bryankoch RSS

    Counting instances

    bryankoch

      Hi folks,

       

      First let me explain what kind of data I'm working with.

       

      The original data set has the following variables and setups

      [Taxonomy Category], Diag1, [Diag1 Category], Diag2, [Diag2 Category], Diag3, [Diag3 Category], Diag4, [Diag4 Category]

      All of the categories are text and have 20 to 50 different categories. All of the Diag (Diagnosis Codes) are codes, mostly numeric, some with a letter at the beginning (ie E909).

       

      I have 2 versions of the qualified data (partly because I don't know how to get a combined version)

      DiagnosisTable:
      CROSSTABLE (ICD, ICDCategory, 2)
      LOAD recno() as ID, [Taxonomy Category],
      [Diag1 Cat],
      [Diag2 Cat],
      [Diag3 Cat],
      [Diag4 Cat]
      Resident RawDiagnosisTable;

      DiagnosisCode:
      Crosstable (ICDDes, ICDNumber, 2)
      Load recno() as CodeNumber, [Taxonomy Category],
      Diag1,
      Diag2,
      Diag3,
      Diag4
      Resident RawDiagnosisTable;

       

      I want to explore the combinations of nonblank cells between the diagnosis codes by row.

      For example, how many rows have values in Diag1 AND Diag2, how many with Diag1 and diag2 and diag3? etc

       

      I know in Excel you can use a function called COUNTBLANK and a few if/then statements to achieve this effect with the original data structure.

       

      In Qlikview I am not aware of a function that can do this, nor am I well acquainted with the syntax for expressions.

       

      Can anyone help?

      Thanks!

        • Counting instances
          Karl Pover

          This would be the equivalent to COUNTBLANK in QlikView:

          count(if(trim([Diag1])='' or isnull([Diag1]),[Diag1]))

          because count(null()) = 0 and the if statement returns null if the condition is not met and you don't define a value when the condition is false.

          so to count how many rows have values in Diag1 and Diag2 you could try

          count(ICDNumber)

          -

          count(if(trim([Diag1]&[Diag2])='' or (isnull([Diag1]) and isnull([Diag2])),[Diag1]&[Diag2]))

          Regards.

           

            • Counting instances
              bryankoch

              Hmm

              What kind of dimension would you recommend I set up with that?

               

                • Counting instances
                  John Witherspoon

                  I'm sorry; I don't understand the question. If you want a count of the number of rows with those three codes, that's just a single number. There's no dimension implied. You can have a dimension if you want. I just wouldn't have any idea what dimension you might want.

                    • Counting instances
                      bryankoch

                      well at some point I would like to have them grouped.

                       

                      I think for now, selecting which chart to show this in is the better question for me. Because I cannot get any output to display for either answer.

                        • Counting instances
                          bryankoch

                          I guess to better explain, the output I am looking for is

                           

                          Combo Count

                          1-2

                          1-3

                          1-4

                          2-3

                          3-4

                          1-2-3

                          2-3-4

                          ... and so forth

                           

                          And somewhere down the line be able throw a sort to where I can look at persay all the combos in a specific Taxonomy or all the combos for a specific Diagnosis Category.

                            • Counting instances
                              John Witherspoon

                              Ah, so you want to see every POSSIBLE combination of at least two diagnostic codes? All of these? As values for a dimension?

                              12
                              13
                              14
                              23
                              24
                              34
                              123
                              124
                              134
                              234
                              1234

                              I don't know of any good way to make QlikView iterate over all the combinations of a dimension's values as a dimension. Below is pure brute force, but it might work, particularly since the number of combinations is small enough to get away with this:

                              AllCombinations:
                              CROSSTABLE (Combination,Exists)
                              LOAD
                              "Taxonomy Category"
                              ,len(Diag1) and len(Diag2) as "12"
                              ,len(Diag1) and len(Diag3) as "13"
                              ...
                              ,len(Diag1) and len(Diag2) and len(Diag3) and len(Diag4) as "1234"
                              RESIDENT RawDiagnosisTable
                              ;
                              INNER JOIN (AllCombinations)
                              LOAD -1 as Exists
                              AUTOGENERATE 1
                              ;
                              DROP FIELD Exists
                              ;

                              At this point, AllCombinations should be a linkage table between the Taxonomy Category and all existing combinations of your diagnosis codes. At that point, you should be able to do this:

                              Dimension = Combination
                              Expression = count(distinct "Taxonomy Category")

                              All of this is very untested. If it doesn't work like I wrote it and you can't fix it, maybe upload a file with some sample data for me to fiddle with.

                                • Counting instances
                                  bryankoch

                                  Well, creating the dimensions worked perfectly:

                                  AllCombinations:
                                  Replace
                                  CROSSTABLE (Combination,Exists)
                                  LOAD "Taxonomy Category",
                                  len(Diag1) and len(Diag2) as "12",
                                  len(Diag1) and len(Diag3) as "13",
                                  len(Diag1) and len(Diag4) as "14",
                                  len(Diag2) and len(Diag3) as "23",
                                  len(Diag2) and len(Diag4) as "24",
                                  len(Diag3) and len(Diag4) as "34",
                                  len(Diag1) and len(Diag2) and len(Diag3) as "123",
                                  len(Diag1) and len(Diag2) and len(Diag4) as "124",
                                  len(Diag1) and len(Diag3) and len(Diag4) as "134",
                                  len(Diag2) and len(Diag3) and len(Diag4) as "234",
                                  len(Diag1) and len(Diag2) and len(Diag3) and len(Diag4) as "1234"
                                  RESIDENT RawDiagnosisTable;

                                  INNER JOIN (AllCombinations)
                                  LOAD -1 as Exists AUTOGENERATE 1;
                                  DROP FIELD Exists;

                                   

                                  However, it's not counting correctly.

                                  When I use distinct it returns 6 (the number of taxonomies I have currently loaded in the file) and when I remove it, enters in the total in every entry for the dimension. I thought perhaps it was because that Taxonomy is now a "key" value and wasn't counting the correct field, but turns out that wasn't the case I don't think.

                                  I did attach the excel file (a much smaller version) but there's 100 rows from 3 different taxonomies.

                                    • Counting instances
                                      bryankoch

                                      Disregard that previous sample, use this one instead:

                                      • Counting instances
                                        bryankoch

                                        So for that data, we should see these kinds of numbers:

                                         

                                        Number with 145
                                        Number with 2 pairs
                                        201
                                        Number with 3 pairs
                                        34
                                        Number with 4 pairs
                                        19

                                        Sum 299

                                         

                                        Currently I'm getting a total of 546 using the expression count(combination)

                                         

                                         

                                         

                                          • Counting instances
                                            bryankoch

                                            I think I know the reason why we're seeing 546 counts instead of 299.

                                             

                                            Consider a row that has entries in Diag1, Diag2, Diag3, Diag4.

                                            I have a suspicion that for this row (and others), it is counting it for all combinations, thus a count for

                                            1-2

                                            2-3

                                            3-4

                                            1-4 ... and so forth. In other words, the row is not given a distinct label.

                                             

                                            How do we remedy that?

                                              • Counting instances
                                                John Witherspoon

                                                Your suspicion is correct. That's what I was TRYING to do, and what made everything complicated. OK, just because we've come this far, to do what I was thinking with what your data really looks like, you just need to add an ID for the row, and to use the ID instead of the Taxonomy Category when building the extra table. I have that working just fine in an example with your data.

                                                But I didn't realize you wanted a DISTINCT label for the row. That's why I have a separate table. I thought you WANTED to see all those combinations. If you only want a distinct label for the row, that's TRIVIAL by comparison.

                                                if(len(Diag1),'1')&if(len(Diag2),'2')&if(len(Diag3),'3')&if(len(Diag4),'4') as Combination

                                                Is that what you mean?

                                                And actually, it looks like you ALWAYS fill in the codes from left to right? If so, then the ONLY possible combinations are these:

                                                1
                                                12
                                                123
                                                1234

                                                NOT the ones we both listed before. And at that point, all you're doing is counting the number of diagnostic codes. You might as well just do this:

                                                -rangesum(len(Diag1)>0,len(Diag2)>0,len(Diag3)>0,len(Diag4)>0) as NumberOfCodes

                                                Here's every single one of those approaches. Maybe what you're asking for is one of them. Maybe not.

                                                  • Counting instances
                                                    bryankoch

                                                    Well I can't view your document unfortunately,

                                                    But yes I do want to see all of the combinations because we don't know a great deal about the data we're playing with. Maybe for this 300 that's how it worked out, but for the other 2 mil+ it might not be, maybe it's different in another taxonomy, I don't know yet.

                                                     

                                                    You gave me an idea though, which is what I think you were trying to tell me in your last post.

                                                    RawDiagnosisTable:
                                                    LOAD [Taxonomy Category],
                                                    [Diag1 Cat], [Diag2 Cat], [Diag3 Cat], [Diag4 Cat],
                                                    Diag1, Diag2, Diag3, Diag4,
                                                    if(len(Diag1),'1')&if(len(Diag2),'2')&if(len(Diag3),'3')&if(len(Diag4),'4') as Combination
                                                    FROM C:\Users\kkorynta\Desktop\Sample.xlsx
                                                    (ooxml, embedded labels, table is Sheet1);


                                                    Load * Inline [Combination
                                                    12,
                                                    13,
                                                    14,
                                                    13,
                                                    14,
                                                    23,
                                                    24,
                                                    34,
                                                    123,
                                                    124,
                                                    134,
                                                    234,
                                                    1234
                                                    ];

                                                    My output is a little funky though which I see why you mentioned it looked like it only loads from left to right.

                                                     

                                                    Combination Count (Combination)

                                                    307
                                                    145
                                                    12201
                                                    131
                                                    141
                                                    231
                                                    241
                                                    341
                                                    12334
                                                    1241
                                                    1341
                                                    2341
                                                    123419

                                                    What I don't understand, is how/why the combos with a count of 1 are reported that way, shouldn't they show as zeros? Otherwise, yes this is what I was looking for.

                                                      • Counting instances
                                                        Karl Pover

                                                        To make up for my poor answer yesterday, here's some script you can try:

                                                        SET ThousandSep=',';
                                                        SET DecimalSep='.';
                                                        SET MoneyThousandSep=',';
                                                        SET MoneyDecimalSep='.';
                                                        SET MoneyFormat='$#,##0.00;-$#,##0.00';
                                                        SET TimeFormat='hh:mm:ss TT';
                                                        SET DateFormat='DD/MM/YYYY';
                                                        SET TimestampFormat='DD/MM/YYYY hh:mm:ss[.fff] TT';
                                                        SET MonthNames='ene;feb;mar;abr;may;jun;jul;ago;sep;oct;nov;dic';
                                                        SET DayNames='lun;mar;mié;jue;vie;sáb;dom';

                                                        Tmp_Mapeo:
                                                        LOAD * INLINE [
                                                        Combination_Num, Map
                                                        1, 1 Pair
                                                        2, 1 Pair
                                                        3, 1 Pair
                                                        4, 1 Pair
                                                        12, 2 Pair
                                                        13, 2 Pair
                                                        14, 2 Pair
                                                        23, 2 Pair
                                                        24, 2 Pair
                                                        34, 2 Pair
                                                        123, 3 Pair
                                                        124, 3 Pair
                                                        234, 3 Pair
                                                        1234, 4 Pair
                                                        ];

                                                         

                                                        RawDiagnosisTable:
                                                        LOAD rowno() as RowID,
                                                        [Taxonomy Category],
                                                        Diag1,
                                                        [Diag1 Cat],
                                                        Diag2,
                                                        [Diag2 Cat],
                                                        Diag3,
                                                        [Diag3 Cat],
                                                        Diag4,
                                                        [Diag4 Cat]
                                                        FROM
                                                        Sample.xlsx
                                                        (ooxml, embedded labels, table is Sheet1);

                                                         

                                                        AllCombinations:
                                                        Replace
                                                        CROSSTABLE (Combination,Exists,2)
                                                        LOAD RowID,
                                                        "Taxonomy Category",
                                                        len(Diag1) and not len(Diag2) and not (len(Diag3)) and not (len(Diag4)) as "1",
                                                        not len(Diag1) and len(Diag2) and not (len(Diag3)) and not (len(Diag4)) as "2",
                                                        not len(Diag1) and not len(Diag2) and (len(Diag3)) and not (len(Diag4)) as "3",
                                                        not len(Diag1) and not len(Diag2) and not (len(Diag3)) and (len(Diag4)) as "4",
                                                        len(Diag1) and len(Diag2) and not (len(Diag3)) and not (len(Diag4)) as "12",
                                                        len(Diag1) and not len(Diag2) and (len(Diag3)) and not (len(Diag4)) as "13",
                                                        len(Diag1) and not len(Diag2) and not (len(Diag3)) and (len(Diag4)) as "14",
                                                        not len(Diag1) and len(Diag2) and (len(Diag3)) and not (len(Diag4)) as "23",
                                                        not len(Diag1) and len(Diag2) and not (len(Diag3)) and (len(Diag4)) as "24",
                                                        not len(Diag1) and not len(Diag2) and (len(Diag3)) and (len(Diag4)) as "34",
                                                        len(Diag1) and len(Diag2) and len(Diag3) and not (len(Diag4)) as "123",
                                                        len(Diag1) and len(Diag2) and not (len(Diag3)) and len(Diag4) as "124",
                                                        len(Diag1) and not (len(Diag2)) and len(Diag3) and len(Diag4) as "134",
                                                        not (len(Diag1)) and len(Diag2) and len(Diag3) and len(Diag4) as "234",
                                                        len(Diag1) and len(Diag2) and len(Diag3) and len(Diag4) as "1234"
                                                        RESIDENT RawDiagnosisTable;

                                                        INNER JOIN (AllCombinations)
                                                        LOAD -1 as Exists AUTOGENERATE 1;
                                                        DROP FIELD Exists;

                                                        LEFT JOIN
                                                        LOAD text(Combination_Num) as Combination,
                                                        Map
                                                        Resident Tmp_Mapeo;

                                                        Drop table Tmp_Mapeo;

                                                          • Counting instances
                                                            bryankoch

                                                            Figured my little issue out with the counts. I overlooked that I was using a Key, which was counting from the other table. :-/ Otherwise, using

                                                            if(len(Diag1),'1')&if(len(Diag2),'2')&if(len(Diag3),'3')&if(len(Diag4),'4') as Combinations, in my load statement with

                                                            Load * Inline [Combination
                                                            12,
                                                            13,
                                                            14,
                                                            13,
                                                            14,
                                                            23,
                                                            24,
                                                            34,
                                                            123,
                                                            124,
                                                            134,
                                                            234,
                                                            1234
                                                            ];

                                                             

                                                            Showed me what I needed to know.

                                                            John, thanks for all the help. I really appreciate it, and if sometime in the near future I need to see all possible combinations within the row I'll know how!

                                                             

                                                            And Karl, I tried yours and it works as well.

                                                            Thanks

                                                          • Counting instances
                                                            John Witherspoon

                                                             


                                                            bryankoch wrote:What I don't understand, is how/why the combos with a count of 1 are reported that way, shouldn't they show as zeros?


                                                            Remove the inline table of possible combinations. I suspect that will fix the results. I think what's happening is that even when the combination doesn't exist in your diagnosis table, it's still counting the one occurrence in your inline table.

                                      • Counting instances
                                        John Witherspoon

                                        Assuming you're working with your new tables instead of your raw table, maybe something like this to count how many with Diag1, Diag2 and Diag3?

                                        -sum(aggr(count({<ICDDes={'Diag1','Diag2','Diag3'}>} distinct ICDDes)=3,CodeNumber))

                                        The set analysis says we're "selecting" the three diagnosis codes of interest. The aggr(...,CodeNumber) says we're making a "table" by CodeNumber. For each row in this table, we check to see if the number of distinct diagnosis codes is 3, which means that that "row" has all three diagnosis codes of interest. A "true" value is -1, so if we sum these up through the whole table, then flip the sign, we should have a count of all CodeNumbers that have these three diagnosis codes.