4 Replies Latest reply: Apr 23, 2013 1:56 PM by stefanladenthin RSS

    problem with double aggregation

      Hello,

      i'm trying to calculate some avg. within some calculated dimensions.

      Therefor i need to know for every user in my user table how many users share the same channel_key with him.

       

      so in a simplified form my users table would look like this:

       

      user_id|u_mega_monthly_channel_spend_key

       

      Using u_mega_monthly_channel_spend_key as a dimension is not an option in this case, sinc there are already some complex calculated dimensions in place instead.

       

      But actually, from what i understand the following formula should add the user_count per u_mega_monthly_channel_spend_key to the user

       

      aggr(aggr(count({1}DISTINCT user_id),u_mega_monthly_channel_spend_key),user_id)

       

      This should enable me then to calculated the AVGs per user within my calc-dim.

       

      But somehow the value shows up only in one of the columns of the calc-dim.

       

      If i create a test-table which has just the user_id as dimension, then i see, that for each u_mega_monthly_channel_spend_key only one user get's assigned the total count value, the others just show '-' for the expression.

       

      I'm trying to solve this for 2days now, so any help would be really appreciated so much, so huge thanks in advance already .

       

      PS: My data-model is extremely complex, so adding the solution with consistent fake-data is atm not an option

        • Re: problem with double aggregation
          Gysbert Wassenaar

          Maybe this does what you want:

           

          avg(aggr(count({1}DISTINCT user_id),u_mega_monthly_channel_spend_key)

           

          If not please post a sample document. See this document for how to prepare an example with reduced and (optionally) scrambled data.

            • Re: problem with double aggregation

              Hi Gysbert,

              i already tried this as well, doesn't help.

               

              I've created a reduced test-document.

              On the Net Margin % table u can see the AVG users / Channel only being calculated for the "Worse 40%"-Fractile.

              When u Create a table with user_id as dim, u'll see that the aggregation was only computed for one user of the channel.

              I'm MIssing the value to calculate the AVG CPA column correctly.

               

              Any more ideas? (The problem is also that i can't compute this field on sql-side since the grouping of the utm-tags which then leads to a channel key is read from an excel file that get's updated by the marketing guys daily and is joined with the sql-data in the loadscript...)

                • Re: problem with double aggregation
                  Gysbert Wassenaar

                  I'm not sure what you want and I'm even less sure it's possible. I can get you the same AVG user / Channel value for all four fractiles by adding a NODISTINCT to the aggr function:

                   

                  avg( aggr(NODISTINCT count({1} DISTINCT user_id),u_mega_monthly_channel_spend_key))

                   

                  This can only return one value since it's aggregated only over one dimension, i.e.  u_mega_monthly_channel_spend_key. And it's not possible to aggregate over a calculated dimension. So an average for each fractile isn't possible this way.

                   

                  I'm out of ideas on this one.

                    • Re: problem with double aggregation

                      Gysbert u're a genious!

                      That's exactly what i wanted.. of course as long as there is only one channel selected this will get the same result for all fractiles.

                      But when i selected several one, then i have different sets of channels in the fractiles.

                      I still don't fully understand why i have to add nodistinct, which i didn't need to add to the avg spend per user:

                      avg(aggr(sum(Spend),user_id) )

                      there it worked as suspected, but for the users-in-same-channel it didn't.

                      So if you could tell me the difference between this and the users-in-same-channel formel :

                      avg(aggr(aggr(nodistinct count({1}DISTINCT user_id),u_mega_monthly_channel_spend_key),user_id))

                      that would be very enlightening.

                      But huge thx already for the solution, this really helps alot!!!