I've got an application with around 65mil rows of sales info. These are attached to customers and periods.
I've got a table that is calculating a number of fields for customers that have been "lost" - there's a field that gives me the period each account was lost in.
The table calculates fine even at the highest overall level, except for one expression. This won't load at the highest granularity level, and it's only when I select a member of my sales hierarchy or a region that it produces - it just takes far too long.
The expression is to work out the last period it traded in. There isn't unfortunately a pre-calculated field that measures this (though there remains an option to calculate this in the backend if there are no other fixes), so the expression is:
The issue is that it's having to search through every single record to find the maxium fullperiod that a record attaches to. The field works, it just won't calculate at the highest level. I ignore the PeriodNumber and Year fields as these are pre-selected by the user and evidently would limit my search to just that period!
All thoughts welcome. Happy to take this into the load script if so be it, but just wondered if there was a less resource intensive way of calculating the value in the front end.
My understanding is that this information will not change whilst the data has been loaded.
As a result I think that I would be tempted to calculate on load by creating the maximum PeriodNumber and Year from the existing sales data table (resident) and store it as a table of CustomerNumber and MaxPeriodNumber.
My guess is that you have already gotten to this position already but needed some form of verification ....
Pretty much, but thought I'd throw it out there in case there was a 'simpler' solution, reading this new data in requires a bit of extra formatting and work on the model to attach it how the client was after it, but is probably a worthwhile investment given the situation.
Beside a direct pre-calculation within a table in the script it could be that a change from the datamodel (by minimizing hops between tables) could speed up such calculations. Before you should try it with modifications on your expression, maybe something like this:
I'd tried the FirstSortedValue route which didn't seem to change the response time. The Maxstring I'd assumed would perform worse as it would be a text comparison, where as I assume it was treating FullPeriod as numeric as it's a 201501 field value for example, but will give it a shot. The subfield is an interesting one.
Will give it a whirl, but think I'm busy working on the table links now for that solution. Removing the two fields from the set would need a remodel of the way in which all the dates are working, but that may be a direction to travel in anyway - there's already a fair amount of transformation going on and we want this to be a relatively low maintenance change due to other pressures but will take what you've suggested on board. Thankyou.