I am creating a calendar because the user wants to select dates and date ranges for analysis. Each supervision record in the database being analyzed has a start date and an end date, so I am creating the analysis dates one day at a time with the while loop displayed above.
For an adcno, the supervision can start in 1985 and end in 2016, so I am building dates for roughly 30 years just for one adcno.
Then I am going to the next adcno and creating dates again!!
You get the idea, I hope.
Let me know what other information I can provide.
I am actually using the dates built by the code above to create the master calendar. The dates need to be associated with a specific adcno. So Adcno 1 can have a range of supervision dates from 1985 to 2015, while Adcno 2 has a range of supervision from 1997 to 1999.
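For reference, here is roughly what the generation looks like, simplified from my actual script (table and field names are shortened and the sample data is made up):

```
// Simplified sketch of the current approach: one output row per day of supervision.
Supervision:
LOAD
    Adcno,
    Date#(Begin,'YYYY-MM-DD') as Begin,
    Date#(End,'YYYY-MM-DD') as End
INLINE [
Adcno,Begin,End
1,1985-03-10,2015-11-03
2,1997-05-01,1999-02-15
];

MasterCalendar:
LOAD
    Adcno,
    Date(Begin + IterNo() - 1) as SupervisionDate
RESIDENT Supervision
WHILE Begin + IterNo() - 1 <= End;   // one iteration (one row) per day in the range
```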
Can you provide any examples?
As far as I remember, you were talking about millions of records in your input table, and each date loop may create thousands of records per input record. So we are talking about billions of output table records. Is this correct?
Some things to consider:
- Do you really need day-level granularity?
- Do you need the same granularity for the whole date range? Creating billions of records will take some time in any case, so I'm not sure we can get from hours down to minutes.
- Do you need to create these records once, or do you really need to run over the complete data daily? You may want to look into incremental load approaches to avoid running the date creation over and over again. You would then load (most of) the dates from a QVD, which could be an optimized load.
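Here is a rough sketch of what an incremental approach could look like. The file, table, and field names are just assumptions for illustration, and it only handles adcnos that are new since the last run (a changed Begin/End on an existing adcno would need extra handling):

```
// Assumes a Supervision table with Adcno, Begin, End (as real date values)
// has already been loaded from the database earlier in the script.

IF Alt(FileSize('SupervisionDates.qvd'), 0) > 0 THEN

    // Adcnos whose dates were already generated, renamed so that Exists()
    // below only checks values from the QVD, not rows added in this run.
    StoredAdcnos:
    LOAD DISTINCT Adcno as StoredAdcno
    FROM SupervisionDates.qvd (qvd);

    // Reload the previously generated dates. A plain field list with no
    // transformations or where clause keeps this an optimized QVD load.
    SupervisionDates:
    LOAD Adcno, SupervisionDate
    FROM SupervisionDates.qvd (qvd);

ELSE
    // First run: no stored adcnos yet.
    StoredAdcnos:
    LOAD Null() as StoredAdcno AutoGenerate 1;
END IF

// Supervision rows not yet represented in the QVD.
NewSupervision:
LOAD Adcno, Begin, End
RESIDENT Supervision
WHERE NOT Exists(StoredAdcno, Adcno);

// Expand only the new rows; identical field names make this
// auto-concatenate onto SupervisionDates.
SupervisionDates:
LOAD
    Adcno,
    Date(Begin + IterNo() - 1) as SupervisionDate
RESIDENT NewSupervision
WHILE Begin + IterNo() - 1 <= End;

DROP TABLES StoredAdcnos, NewSupervision;

// Store the combined result for the next run.
STORE SupervisionDates INTO SupervisionDates.qvd (qvd);
```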
The first idea that occurred to me to maintain granularity without having billions of rows would be to still expand the data out, but to key values rather than the specific dates. Something like this:
Adcno, Begin, End
1, 1985-03-10, 2015-11-03
... millions of rows ...
Expand that into date key values, one row per Adcno per DateKey: day keys where only part of a month is covered, month keys for the whole months, and year keys for the whole years in between.
And build a table that links all date keys to their associated dates: a day key to its single date, a month key to all the dates in that month, and a year key to the dates for all the months in that year.
Not sure quite how to do the expansion into the date key values in step 2, but it's surely doable. And you might include decades as well as years. Maybe quarters. Whatever helps.
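Here's a rough attempt at scripting the expansion (step 2) and the link table. To keep it short it only uses day and month keys; year keys (and decades or quarters) would follow the same pattern. The key format and all table/field names are made up for illustration:

```
Supervision:
LOAD
    Adcno,
    Date#(Begin,'YYYY-MM-DD') as Begin,
    Date#(End,'YYYY-MM-DD') as End
INLINE [
Adcno,Begin,End
1,1985-03-10,2015-11-03
2,1997-05-01,1999-02-15
];

// 1) Expand each Adcno into DateKeys: day keys ('D...') for the partial months
//    at either end of the range, month keys ('M...') for every whole month between.
AdcnoDateKeys:
// day keys from Begin to the end of Begin's month (or to End, if End comes first)
LOAD
    Adcno,
    'D' & Date(Begin + IterNo() - 1, 'YYYYMMDD') as DateKey
RESIDENT Supervision
WHILE Begin + IterNo() - 1 <= RangeMin(MonthEnd(Begin), End);

Concatenate (AdcnoDateKeys)
// day keys from the start of End's month to End (only when End is in a later month)
LOAD
    Adcno,
    'D' & Date(MonthStart(End) + IterNo() - 1, 'YYYYMMDD') as DateKey
RESIDENT Supervision
WHILE MonthStart(End) > MonthStart(Begin)
  and MonthStart(End) + IterNo() - 1 <= End;

Concatenate (AdcnoDateKeys)
// month keys for every whole month strictly between Begin's month and End's month
LOAD
    Adcno,
    'M' & Date(AddMonths(MonthStart(Begin), IterNo()), 'YYYYMM') as DateKey
RESIDENT Supervision
WHILE AddMonths(MonthStart(Begin), IterNo()) < MonthStart(End);

// 2) Link table: every DateKey maps to the calendar dates it represents.
DateKeyLink:
// a day key maps to exactly one date
LOAD DISTINCT
    DateKey,
    Date(Date#(Mid(DateKey, 2), 'YYYYMMDD')) as CalendarDate
RESIDENT AdcnoDateKeys
WHERE Left(DateKey, 1) = 'D';

// distinct month keys, with the first day of each month
MonthKeys:
LOAD DISTINCT
    DateKey,
    Date#(Mid(DateKey, 2), 'YYYYMM') as MonthStartDate
RESIDENT AdcnoDateKeys
WHERE Left(DateKey, 1) = 'M';

Concatenate (DateKeyLink)
// a month key maps to every date in that month
LOAD
    DateKey,
    Date(MonthStartDate + IterNo() - 1) as CalendarDate
RESIDENT MonthKeys
WHILE MonthStartDate + IterNo() - 1 <= MonthEnd(MonthStartDate);

DROP TABLE MonthKeys;
```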
This isn't ideal (one of the concatenates from the QVD is unoptimized), but attached is a combination of incremental load and the DateKey approach I was talking about. It seems to run quite fast, including the unoptimized concatenate.
I'm getting about 42 DateKey values per row of the original table. To keep from causing trouble, I put that data in its own table so as not to duplicate rows of the original table. If you have 10,000,000 keys, you're going to have about 420,000,000 rows in this table. It may not be as big in memory as it sounds, though. I figure a couple of bytes for the DateKey per row and three bytes for the regular Key, so roughly 420,000,000 rows at about 5 bytes each, or around 2 GB, added to your data model to split all of your keys out into every date associated with those keys. Not sure if you consider that big or not, and also not sure if I've calculated correctly.
GenerateDates3.qvw 374.2 K