Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi guys
I'm having quite a challenge integrating a short term fully detailed data set with a long term aggregated data set.
They contain the same core data (Revenue/Cost), but the aggregated data set only contain the overall dimensions and can thus be compressed to 4% of the rows with fewer columns. The TransactionHead and TransactionDetails tables are thus merged into one aggregated table.
The purpose is to have 1 year of full detail (2009-08 to 2010-08, 180 million lines) and three years of aggregated data (2008-01 to 2010-08, 20 million lines), thus saving memory and increasing performance.
The two solutions I see both have pros and cons:
Link Table
Will require a large key with extremely many unique values.
Concatenate
Concatenating aggregated data onto TransactionDetails, will require the CaseNumber key to be added to aggregated data, thus four doubling the number of rows.
This must be a fairly common problem, how do you solve it?
Best regards
Frederik
Still struggling with this issue, anyone got an idea?
Hi,
What is highest level of granularity in TransationAggregated table.
Regards
Ashish Srivastava
Hi Ashish
The TransactionAggregated table has been aggregated using GROUP BY on:
- Department
- Product
- Date (in full months)
- CustomerID (xCaseNumber should have been xCustomerID, sorry)
Revenue and Cost is summed up.
I am not asking about for which filed you have to do group by.
I am asking about what is highest level of granularity level of your TrancationAggregated table.
Regards
Ashish Srivastava
Hi Ashish
Assuming you mean which field has the highest number of unique values, then it is CustomerID with +250.000 unique customers.
If you mean which field has the highest extent of detail, as in Country -> Region -> City, then id say there's no field with the most details, as they they're all of a different kind. (not related directly like Country/Region/City)