I need to calculate YTD totals for this year and previous year. Fiscal year runs from october 2010 -->september 2011.
I created a SQL view to calculate YTD totals which works fine but due to performance issues, I am trying to convert it into a ETL. SQL is similar to as below:
select a.yr, a.Dt, a.x1, a.x2, sum(b.cnt) as ytd_total from Fact_tbl a join Fact_tbl b
on a.yr = b.yr and
a.x1 =b.x1 and
a.x2=b.x2 and
a.dt >= b.dt
join time_dim c
on a.yr=c.yr
and a.dt=c.date
and (a.dt between c.year_start_date and c.year_end_date)
group by a.yr, a.Dt, a.x1, a.x2 )
So, in my ETL I am using a tmap and doing a inner join between 2 fact_tbl which have data for current year and previous year. Also, I am doing a inner join between the lookup flow of fact_tbl and time_dim. After this I am having a hard time implementing the below:
1. condition to check if Dt is between the year start & end dates.
I have got a filter on the input file in tmap which verifies that the Dt field is a date after the Year_start_date and before Year_end_date. I feel this condition is not working.
2. Calculating ytd_total.
I am using this calculation for ytd_total.. row9.Dt.after(row8.Dt)|| row9.Dt.equals(row8.Dt)? (row8.cnt+ row9.cnt): 0
I dont think this is the right way to calculate the totals as I need to do a group by too.
Please let me know if you have any thoughts on doing this?
Regards,
What are the settings for each of the lookups in tMap? In particular, the Match Model in both cases should be All Matches (as well as the Join Model being Inner Join).
Hi Alevy,
One last thing..
In the ETL, before aggregation I did a cross join with time dimension table and created a record for each date. This created a huge data set and I am getting a java memory error when I run it. So I decided to aggregate initially and then include the missing /skipped dates. I cannot get the logic that will get the YTD total from previous date for a skipped date.
Any suggestions??
Thanks.
I suggest using tLoop --> tIterateToFlow to create a flow of all the dates you need (in ascending order). Then do a left-join lookup of the values you have by date, giving you a flow of all dates but with the values for some dates being null. Follow that with a tJavaRow with code like the following: output_row.DT = input_row.DT; if (input_row.cnt!=null) output_row.cnt = input_row.cnt; That will pass through the values that you do have but where you are missing a value, the previous row's value will be re-used.
Alevy, I used the code that you mentioned and is working great. But the issue I am having is, I have multiple groups of records as mentioned below and 2 fiscal years for each group. In each group, a missing value should be obtained from previous row until the end of fiscal year. At the begining of fiscal year, if the value is 0 it should remain 0 until there is a value. yr dt x1 x2 cnt 2010 10/1/2009 toy1 ca 1 2010 10/2/2009 toy1 ca 8 ----- ----- --- -- 2010 9/30/2010 toy1 ca 40 2009 10/1/2008 toy2 vt 1 2009 10/2/2008 toy2 vt 1 2009 10/3/2008 toy2 vt 5 - - - - - - - - - - - - - -
But using the above code, for few groups previous row value is being passed to another group or the same group if the value at beginning of fiscal year is 0. I have included the condition to check if the date is in between the fiscal year start date & fiscal year end date. I have been trying couple of things, but it is not working as I expect. Please let me know if you have any thoughts.
Try modifying the tJavaRow as follows:
output_row.DT = input_row.DT;
if (input_row.cnt!=null)
output_row.cnt = input_row.cnt;
else if (<current record's fiscal year end date>.after(context.FiscalYear))
output_row.cnt = 0;
context.FiscalYear = <current record's fiscal year end date>;
where context.FiscalYear is a Date, initialised with any date that is before the end of the earliest fiscal year e.g. 01/01/1900.
That will reset the output to zero each time the fiscal year changes (including on the first record) and then carry that zero forward until you have an incoming value.