I have a requirement that is pushing me toward some of the tools for dealing with big data. I have little practical experience with the available methods and was hoping the community could help me get started by vetting which approach might work best.
The source data is a "data cube" (the kind exposed as Analysis Services in Excel), served through Impala. The platform used to build this cube is called AtScale, but that is probably not important.
The cube has 122 dimensions, most of them hierarchical (year > yearmonth, Org L1 > Org L2 > ... > Org L10, Product Category > Product Type > Product); you get the idea. These can be pulled in as separate dimension tables, each with a key at the most granular level tying it to the measures in the fact table.
The cube has around 25 million rows.
Impala query results are limited to 200,000 rows.
My main issue is that any level of detail goes over the 200,000-row limit. Even trying to get a count of rows for every possible combination of dimension keys exceeds it. I'm not sure how to approach this.
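One thing I've considered is splitting a single large pull into many smaller queries, one per key of some dimension, so each result stays under the cap. A minimal sketch of that idea (all names here are placeholders, not the real table or column names, and the query runner is a toy stand-in for an Impala cursor):

```python
# Sketch: chunk a big extract by one dimension's keys so every
# individual query result stays under the server-side row limit.
# fact_sales / year_key are hypothetical names for illustration.

ROW_LIMIT = 200_000

def fetch_by_key(run_query, table, key_col, keys):
    """Issue one query per dimension key and concatenate the results.

    run_query(sql) is assumed to return a list of rows; each per-key
    chunk must itself fit under the row limit, otherwise we would need
    to chunk by a finer-grained dimension instead.
    """
    rows = []
    for key in keys:
        sql = f"SELECT * FROM {table} WHERE {key_col} = '{key}'"
        chunk = run_query(sql)
        if len(chunk) >= ROW_LIMIT:
            raise RuntimeError(
                f"key {key!r} alone hits the limit; chunk by a finer dimension"
            )
        rows.extend(chunk)
    return rows

# Toy stand-in for a real Impala connection, just to show the control flow.
fake_data = {
    "2021": [("2021", i) for i in range(3)],
    "2022": [("2022", i) for i in range(2)],
}

def fake_run_query(sql):
    key = sql.split("= '")[1].rstrip("'")
    return fake_data[key]

result = fetch_by_key(fake_run_query, "fact_sales", "year_key", ["2021", "2022"])
print(len(result))  # 5 rows total, each query well under the cap
```

Whether this scales to 122 dimensions and 25M rows is exactly what I'm unsure about, since the number of per-key queries could explode if no single dimension partitions the data finely enough.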