Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
load link_Sup_ADCNO, BeginningDateofSupervision + IterNo()-1 as Dup_analysis_date , S as Dup_S
from resident Supervision while IterNo() <= Supervision_End_Date - BeginningDateofSupervision +1 ;
The statement above is running in the application for more than 2 hours. I have posted this issue to others and received some good feedback to pull data from the resident table.
But I am still having performance issues , any ideas on how to make this while statement run faster and more efficiently?
Thanks
Rick
Would be great if you could provide also the context. Maybe 'while' is not the best solution to the overall problem.
I am creating a calendar as the user wants to select dates and date ranges for analysis. The supervision in the database being analyzed has a start date and end date. So I am creating the analysis date one day at a time with the while loop displayed above.
For an adcno the supervision can start in 1985 and end in 2016 so I building dates for 30 years just for one adcno.
Then I am going to the next adcno and creating dates again!!
You get the idea I hope
Let me know what other information I can provide.
Thanks
Rick
You should not mix calendar fields with other fact fields, create 30 years days first in calendar table and link it with other facts through common date field.
I am actually using the build of the dates from the code above to create the master calendar. The dates need to be associated with a specific adcno. So Adcno 1 can have a range of dates of supervision from 1985 to 2015. Adcno2 has range of supervision from 1997 to 1999.
Can you provide any examples?
Thanks
Rick
The data in the database looks like this where the range exists between beginningdateofsupervision and supervision_end_date
select distinct
b.OffenderIdentificationNumber as Sup_ADCNO
BeginningDateofSupervision , Supervision_End_Date
FROM [AIMSDW].[DW_Stage_PCAIMS].[Tbl_DORS07AA] b
As far as I remember you were talking of millions of records of your input table, and each date loop may create thousands of records. So we are talking about billions of output table records. Is this correct?
Some things to consider:
- Do you really need day level granularity?
- Same granularity for the whole date range?
I think creating billions of records may take some time indeed, so not sure if we can get to minutes instead of hours.
- Do you need to create these records once, or do you really need to run over the complete data daily?
You may want to look into incremental load approaches to avoid the need to run the date creation over and over again.
You would then load (most of) the dates from a QVD, which could be optimized.
Perhaps I can create dates every 5 days or 3 days. Yes I believe that that incremental load is the only solution for this volume of data.
If the user did not want a calendar I could have used set analysis and avoided this issue.
Do you have any ideas for creating less granularity?s
Rick
Thanks
The first idea that occurred to me to maintain granularity without having billions of rows would be to still expand the data out, but to a key value, not the specific dates. Something like this:
Original Data:
Adcno, Begin, End
1, 1985-03-10, 2015-11-03
... millions of rows ...
Expand into date key values:
Adcno, DateKey
1, 1985-03-10
1, 1985-03-11
...
1, 1985-03-31
1, 1985-04
1, 1985-05
...
1, 1985-12
1, 1986
1, 1987
1, 1988
...
1, 2014
1, 2015-01
1, 2015-02
...
1, 2015-10
1, 2015-11-01
1, 2015-11-02
1, 2015-11-03
And build a table that links all date keys to associated dates:
DateKey, Date
1985-01-01, 1985-01-01
1985-01-02, 1985-01-02
... all dates thru ...
2015-12-30, 2015-12-30
2015-12-31, 2015-12-31
1985-01, 1985-01-01
1985-01, 1985-01-02
...
1985-01, 1985-01-31
1985-02, 1985-02-01
... dates for all months thru...
2015-12, 2015-12-31
1985, 1985-01-01
1985, 1985-01-02
... dates for all years thru...
2015, 2015-12-31
Not sure quite how to do the expansion into the date key values in step 2, but it's surely doable. And you might include decades as well as years. Maybe quarters. Whatever helps.
You have a Start and End Date and want to select dates and date ranges for analysis.
Have you looked at using the IntervalMatch() function in the load script ?