Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello;
I am working on a set of data containing multiple dates. The goal is for each item to define what the maximum date is.
I am using therefore the apply map function.
Could you please let me know if this is the most efficient methodology to do so.
Many thanks in advance for your help and have a great day.
Kristel.
Attached you will find the spreadsheet used as an example and the qvd file.
Hope you will be able to read this is.
If not here is the script;
Regiongrouping:
mapping load
Region,
(max(orderdate))
FROM
(biff, embedded labels)
group by Region;
OderData:
LOAD Region,
orderdate,
applyMap ('Regiongrouping', Region) as [Max oderdate],
value,
comment
FROM
(biff, embedded labels);
To make sure the load is focussing on the max date only we could close with:
Where orderdate=[Max oderdate];
But is it the best way???
I'd do it a little bit simpler, with one time access to the source data and no applymap:
OderData:
LOAD
Region,
orderdate as [Max oderdate],
value,
comment
FROM
//
INNER JOIN (OderData) LOAD
Region,
max(orderdate) as [Max oderdate]
RESIDENT OderData
GROUP BY Region;
Another possibility would be to do this:
OrderData:
LOAD Region,
orderdate,
Region&'-'&date(orderdate,'YYYYMMDD') as [RegionIndex],
value,
comment
FROM
(biff, embedded labels);
RegionLast:
LOAD Region,
max(orderdate) as [Max order date]
Resident OrderData
Group by Region;
RegionLast1:
load
Region&'-'&date([Max order date],'YYYYMMDD') as RegionIndex,
'1' as Region_LastFlag
Resident RegionLast;
DROP Table RegionLast;
left join (OrderData)
load
RegionIndex,
Region_LastFlag as Region_IsLast
resident RegionLast1;
drop table RegionLast1;
RegionDetailsLast:
load
Region,
orderdate,
value,
Region_IsLast,
comment
resident OrderData
where Region_IsLast = '1';
drop table OrderData;
What do you think?
And now when you have different scenario, how do you evaluate them on efficiency?
Thanks.
Kristel
Hello Kristel,
Instead of doing it so complex,
just do one thing.Use the aggr() function.
No need to use any mapping load.Use a straight table .add one more calculated dimension.in that use the aggr() function.In the expression just write any 1 and then hide the expression.You will find your desired answer.
the aggr() function will be like =aggr(max(orderdate),Region)
Hope u understood.
Kristel,
I don't see why do you need a long and complex script while you can have it short and simple.
Biren Agrawalla,
As a rule, it is better to have actual fields rather than calculated dimensions - same result but better performance. Still your solution makes sense if the max date is not the absolute max date per Region but within current date selection. Even in this case, I'd rather move this expression from calculated dimension to expression. And, in expression, no need to aggr() because Region is a dimension already, simply:
max(orderdate)
Hi,
the aggr() approach is a fast solution to get results on the fly. However, as data volume increases, having a calculated dimension can become so heavy for your application, impacting in performance times. So if you have access to script is better to make as much pre-calculations as you can in script,
regards
Hi Micheal,
Thanks for the solution.But I know this is not a properway of doing as we have many other ways as well.
But I just posted my thought because this can also be a way of solving.
Its better if we will come to know about as many possible types of solutions for a problem.
Thanks
Regards,
Biren