Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello,
I have a list of data that has muliple rows for each title based on different categories such as
PUBLISHER BOOK CATEGORY PAGES
Publisher A Book A Category A 10
Publisher A Book A Category B 10
Publisher A Book A Category C 10
Publisher A Book B Category A 20
Publisher A Book B Category D 20
Publisher A Book B Category B 20
So when I group this data by PUBLISHER and BOOK, I use the 'sum (distinct Pages)' command to get the real page count per BOOK. In this case it would be 10 total pages for Book A and 20 total pages for Book B.
However, when I attempt to group everything by just PUBLISHER and not include BOOK in the straight table, I get a total of 30 pages for Book A and 20 pages for Book B, which is obviously incorrect.
So how can I create an expression the will sum the pages without double counting the pages due to the categories in each row?
I think I figured it out.
In a straight table with just the PUBLISHER as a dimension, I create the following expression.
sum(aggr(sum(distinct PAGES), BOOK))
We probably need also to know how the fields are related to each other in your data model.
For example, do you have a table with BOOK and PAGES, and another table with PUBLISHER, BOOK and CATEGORY?
And what are you trying to achieve?
However, when I attempt to group everything by just PUBLISHER and not include BOOK in the straight table, I get a total of 30 pages for Book A and 20 pages for Book B, which is obviously incorrect.
I don't really understand this. How do you get results for Book A and Book B, if you don't include these as dimensions?
The data is as it appears, in a excel spreadsheet... so one row for every publisher/book/category combination
Yes, you are right... I misspoke... in the example above I get a total of 90 pages when I include just the Publisher and not the Book or Category in the straight table.
I think I figured it out.
In a straight table with just the PUBLISHER as a dimension, I create the following expression.
sum(aggr(sum(distinct PAGES), BOOK))
u can simply use sum(distinct PAGES) when Publisher is ur only dimension
no need of aggregation !!!
There are possibly multiple books per Publisher with same number of PAGES, so I wouldn't recommend that (though it works for the sample given).
But as far as I understand, one and the same book should have an unambiguous number of pages, so I also would not use sum(distinct PAGES) per book. I would just use
=sum(aggr( only(PAGES), BOOK))
(or sum(aggr(PAGES, BOOK)) )
If you have different number of pages per book, this book will not included in the calculation, while with
sum(aggr(sum(distinct PAGES), BOOK))
the book will be included multiple times, for each distinct occurence of PAGES. Not sure what you need here.
Regards,
Stefan