15 Replies Latest reply: May 16, 2012 3:16 PM by B Aydin RSS

Aggregation problem

Esa Tikka

Hi all,

 

I have data as QUARTER, MONTH, RESID. For each RESID I need to count its occurrences in the data set but only if it occurs more than, say, 5 times. I have figured out how to accomplish this when counting all occurrences of RESID with the above mentioned condition by using the following expression:

 

=sum(aggr(nodistinct if (count(RESID)>5, 1, 0), MONTH, RESID))

 

However, if I remove the nodistinct keyword (or change it to distinct) to count distinct RESIDs instead of all occurrences I get very odd results as many RESID that do occur more than five times don't get any value on RESID level and thus sum up as zero.

Edit: I know it should not be zero as I also have another expression in the pivot showing the number of occurrences ( =count(RESID) ).

 

This might be something rather basic as I feel I haven't really grasped some of concepts of QV that behave differently from more traditional products...

 

Thanks in advance,

 

Esa

  • Aggregation problem
    Bin Xue

    Hi Esa,

     

    A simple example would illustrate what's the difference between distinct and nodistinct aggr. Say if we have a table

    Table1:

    key, value

    1,a

    1,b

    2,c

    2,d

    3,a

     

    aggr(distinct count(value),key) would produce the following output

    key, distinct count(value)

    1,2

    2,2

    3,1

     

    aggr(nodistinct count(value),key) would produce the following output

    key, nodistinct count(value)

    1,2

    1,2

    2,2

    2,2

    3,1

     

    If you do not specify distinct or nodistinct keyword, distinct would be assumed. That's why when you remove your nodistinct keyword, the sum would be different as a different table is produced by aggr function. Hope it helps.

     

    Regards,

    Xue Bin

  • Re: Aggregation problem
    Henric Cronström

    I assume that count(RESID)>5 should be defined globally and not just within each grouping defined by the group by fields? If so, I suggest you put the if function outside the aggregation:

     

    if(count(RESID)>5, <aggregation>, null())

     

    Secondly, you have it in a pivot table. If you have Month and RESID as dimensions, then I do not understand why you use the Aggr()-function. It would work straight off with

     

    if(count(total RESID)>5, count(RESID), null())

     

    HIC

    • Aggregation problem
      Esa Tikka

      I do want my counting to be defined on group level, so that the user can see the number of RESIDs on the desired level. The purpose is to follow the activity of resources on yearly, quarterly and monthly level, and to see the number of resources that have more than minimal amount of activity.

       

      I have RESID as a dimension merely for the purpose to be able to verify my results, in the final version I will drop it off and just have the (time) dimensions plus the number of individual resources that have had more than minimal amount of acitivty within the time period the grouping.

       

      Trying

       

      =if(count(total RESID)>5, count(RESID), null())

       

      produces exactly the same results as having just

       

      =count(RESID)

       

      but without "total"

       

      =if(count(RESID)>5, count(RESID), null())

       

      it does implement what I want, but still it does not solve the requirement of counting only distinct RESIDs fulfilling the condition. That is why I ended up putting "1" instead of "count(RESID)" in the THEN part of the IF clause, and that's probably why I ended up messing around with the aggregation (to get all the 1s added up).

       

      I'll keep on trying to wrap my brain around this. I might have already solved this in Cognos (which I have more experience on) and I guess the biggest obstacle I have is my failure at forgetting the Cognos way of doing things...

       

      Br,

      Esa

      • Re: Aggregation problem
        Henric Cronström

        Then you should try =if(count(RESID)>5, count(distinct RESID), null())

         

        HIC

        • Aggregation problem
          Esa Tikka

          Henric Cronström wrote:

           

          Then you should try =if(count(RESID)>5, count(distinct RESID), null())

           

          HIC

          Unfortunately that results the same numbers as with having just

          =count(distinct RESID)

          thus disabling the condition count>5 from the aggregation. That's exactly what I had before I received the additional requirement of weeding out those resources that have minimal activity.

          So basically my problem is how I get that additional condition to be applied to all aggregation.

           

           

          I'll try to attach an annotated pic with screen copies from the report. In those,

          "Assigned resources"=if(count(RESID)>5, count(distinct RESID), null())

          and

          "Assignments"=count(RESID)

          http://onelove.doesntexist.org/~etikka/extra/sample_results1.GIF

          In the upper part of the pic you see that for the month "huhti" none of the resources have over 5 assignments (thus count(RESID)<=5 meaning do not count those).

           

          In the lower part where the months have been collapsed "Assigned resources" results 3 for month "huhti" although it should be zero/null. In fact, the results are quite odd since the resource with the ID starting "f9a..." should not be counted at all even on the quarter level as there are not enough assignments (occurrences in the data) to qualify that.

           

          Thanks for your time anyway.

           

          Br,

          Esa

          • Re: Aggregation problem
            Esa Tikka

            I guess that from the screenshot it becomes clear that my issue is actually caused by having the expression evaluated always on the grouped numbers, not the data on the detail level or even on the lowest level in the dimension hierarchy in the pivot.

             

            Could someone please tell me if the thing I'm trying to do is somehow possible to accomplish directly in QV or is it implicitly "forbidden" to do such things with this tool?

             

            Browsing the docs it seems that grouping data is possible in load scripts so should I there calculate the values for some base interval and then just use that in the pivot?

             

            Br,

            Esa

             

            Message was edited by: Esa Tikka after reading the manual...

          • Aggregation problem
            B Aydin

            Hi,

             

            In the pivot tabIe (which is seen in your screenshots above), could you try the following expression for 'Assigned resources'?

             

            =sum(

                 if(aggr(count(ResID)>5, Year,Quarter,Month,ResID)=-1,1,0)

            )

             

             

            Regards,

            • Re: Aggregation problem
              Esa Tikka

              Ibaydin,

              On ResID level that works ok, but on Quarter and Year levels it gives way too high values, which I think is because the same ResID is counted many times. I don't quite know how the aggregation works in QV but I would guess that the same ResID occurs under multiple Months thus affecting Quarter level, and the same thing happens also between Quarter and Year.

               

              But thanks for providing something that works at least with limited level of grouping. I just need to put the different groupings to different pivots - that's a bit ugly but at least we now get figures that look sane. I need to do some double-checking, of course, but surely that looks promising!

               

              Br,

              Esa

  • Re: Aggregation problem
    B Aydin

    Hi Esa,

     

    I tried it with a dataset I made up, and I thought it was working. But, may be, I did not understand exactly what you are asking.

     

    Anyway, if you like to have a look at my test application, it is attached. It is trying to find out the number of RESIDs whose COUNTs are greater than 2 for a given YEAR, QTR, MONTH combination.

     

    Best regards,

    • Re: Aggregation problem
      Esa Tikka

      Hi Idaydin,

      I checked out your dataset and it would seem to me that with it you get actually too little number for 2008/Q1, as it should be two (both bba and fba qualify) but at least my QV shows the result as only one.

       

      The ultimate difference between your data and my data is that in my data most of the resources exceed the limit every month. See what happens if you add three entries for RESID fba to months jan, mar and apr on 2008. With the initial data you get 3 for the yearly total and one and two for Q1 and Q2 respectively. The resources behind the numbers are bba, dba and fba. With the altered data (which does not add any active resources) you get 6 for yearly total and 3 for both quarters even though none of the figures should have changed (fba was qualified for counting already with the initial data).

       

      Br,

      Esa

      • Re: Aggregation problem
        B Aydin

        hi,

         

        I tried another solution. It is attached.

         

        Best regards,

         

        http://quickdevtips.blogspot.com

        • Re: Aggregation problem
          Esa Tikka

          Hi Ibaydin,

          That seems like a great solution judging from your test project, but so far I've had trouble incorporating that in my project. Could this be rather heavy solution as my data size is around 1,7 million rows and after getting all pieces right qv.exe is running 100% on this 16-core piece of HW after almost 20 minutes it does still go on.

           

          Anyway, that was a fresh angle at the task and I learned some new things here, again. Nice. I'll go and double-check my work with a limited data set.

           

          Edit: BTW, is there a specific reason (other than simple typo) for the count expression being

          =pick(Dimensionality(),$(=vLevel2Formula),$(=vLevel3Formula),$(=vLevel4Formula),$(=vLevel4Formula))

          and not

          =pick(Dimensionality(),$(=vLevel1Formula),$(=vLevel2Formula),$(=vLevel3Formula),$(=vLevel4Formula))

           

          Edit 2: Yeah, got it working, and it seems like this approach is very sensitive to the amount of data. With <30000 rows there's no perceivable delay, but with 150 000 rows it already takes several seconds. Unfortunately it's not going to work for our data.

           

          Br,

          Esa

        • Re: Aggregation problem
          Esa Tikka

          Seems like the slowness was due to one additional dimension field that came from an joined table. During my attempt to pre-calculate the conditional counts in load script I moved that field to the same table as the rest of the data and suddenly the solution with pick(Dimensionality(),...) worked like a charm.

           

          Thanks a lot Ibaydin for your help!

           

          Br,

          Esa