Skip to main content
Announcements
See what Drew Clarke has to say about the Qlik Talend Cloud launch! READ THE BLOG
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Future PM Schedules

I have been tasked to determine the Interval between dates of a PM Schedule.  Here is a sample of my data:

   

EventScheduleIDEventStartDateTimeFrequency
1007/6/2015Weekly
1007/13/2015Weekly
1007/20/2015Weekly
1039/1/2015Monthly
1033/1/2016Monthly
1039/1/2016Monthly
1033/1/2017Monthly

In the database of my client stores the frequency you can see Weekly and Monthly in example.  There might be some schedules that are Daily and Yearly as well.  There is no Quarterly, Bi-Monthly, Semi-Annually.  The way they have to setup Quarterly as you can predict is Monthly and then select every 3 months.  Is there a formula that create the words Quarterly, Semi-Annually, etc.  I have all the schedules and the start dates for them going out 2 years.  So in the example above for Schedule 103 I want to change on the front end Monthly to Semi-Annually.  I hope this makes sense.  Any suggestions are welcome.

David

10 Replies
sunny_talwar

Is the fact the there are 6 months between the two rows for the EventScheduleID makes the frequency Semi-Annually? You need this to be on the front end? What kind of object?

Not applicable
Author

To answer your first question the answer is yes.  So if my client runs a report on the schedules what they are going to get in return is:

EventScheduleID        Frequency

100                                 Weekly

103                                 Monthly

100 is fine because it is Weekly.  But 103 on the Front ends shows Monthly when it is actually Semi-Annually.  I hope this makes sense.  I would like to have an Excel output showing Semi-Monthly instead of Monthly.  Currently they are opening each schedule.  Go to the frequency section and they will see every 6 months.  That is a manually process.  That why I was looking for a formula.

David

MarcoWedel

You might want to correct your source data instead of trying to repair it in the front end.

regards

Marco

Not applicable
Author

‌if I could I would.  It is the product data and that is not going to be changed.  If there is no formula I am ok with it. 

Not applicable
Author

Hi David

There is a way you can achieve this by calculating the average period between each event for each EventScheduleID and converting that to an interval which can be decoded into a frequency.

I have only had a quick look at this to prove the method and offer a possible principle of operation so you may find you need to do some sensitivity analysis on it when e.g. events fall out of sync with their frequency or change frequency.  If your intervals are fairly stable it should offer something of a solution. Hopefully it will serve as a starting point to a better solution.

In essence the steps would be:

  • Load your data
  • Ensure your dates are properly formatted and are numerical in the background so they can be sorted
  • Sort your data by EventScheduleID and EventDate in ascending order. (Very important)
  • Calculate the difference in days between each of the events for each of the EventScheduleID's
  • Do a bit of processing to convert these periods into intervals
  • Convert the interval values to text-based frequency values

To try out my proposed solution:

  • Copy the code below into your script editor and load the data.
  • Create a table box sorted by EventScheduleID and EventDate in ascending order using the fields
    • EventScheduleID
    • EventStartDateTime
    • EventDate
    • Frequency
    • DaysSinceLast
    • averageDays
    • averageInterval
    • averageEventsPerYear
    • calculatedFrequency
  • Check out if the results are what you expect to get

Here is the code.  I've written notes to help explain what happens at each stage.

//Step 1: Load the raw data

tblData_temp:

Load * inline [

LineID,EventScheduleID, EventStartDateTime, Frequency

1,100,07/06/2015,Weekly

2,100,07/13/2015,Weekly

3,100,07/20/2015,Weekly

4,103,09/01/2015,Monthly

5,103,03/01/2016,Monthly

6,103,09/01/2016,Monthly

7,103,03/01/2017,Monthly

8,104,02/05/2016,Daily

9,104,02/06/2016,Daily

10,104,02/09/2016,Daily

11,104,02/10/2016,Daily

12,104,02/11/2016,Daily

13,105,02/15/2014,Annually

14,105,02/07/2015,Annually

15,105,02/11/2016,Annually

16,106,02/11/2016,Bi-Monthly

17,106,04/15/2016,Bi-Monthly

18,106,06/05/2016,Bi-Monthly

19,106,08/11/2016,Bi-Monthly

20,107,02/11/2015,Quarterly

21,107,05/05/2015,Quarterly

22,107,08/16/2015,Quarterly

23,107,11/11/2015,Quarterly

24,107,02/11/2016,Quarterly

];

//Step 2: Format the dates to ensure they are proper dates

//(You may not need this step if your dates are already in a proper date format)

left join (tblData_temp)

Load

  LineID

  ,Date(MakeDate(Right(EventStartDateTime,4),left(EventStartDateTime,2),mid(EventStartDateTime,4,2)),'DD/MM/YYYY') as EventDate

Resident tblData_temp;

//Step 3: Sort the raw data to get it into the correct sort order for processing

tblData:

noconcatenate

Load

  LineID,

  EventScheduleID,

  EventStartDateTime,

  EventDate,

  Frequency

Resident tblData_temp

order by EventScheduleID,EventDate asc

;

//Step 4: Drop the old raw data

Drop table tblData_temp;

//Step 5: For each EventScheduleID calculate the number of days since the last event. 

//This sets to 0 for the first instance of each new EventScheduleID

left join (tblData)

Load

  LineID,

  If(EventScheduleID=Previous(EventScheduleID),EventDate-Previous(EventDate),0) as DaysSinceLast

Resident tblData;

//Step 6: Calculate the intervals of the year based on a 365 day year

left join (tblData)

Load

  EventScheduleID,

  Sum(DaysSinceLast)/(Count(DaysSinceLast)-1) as averageDays,

  365/Floor(Sum(DaysSinceLast)/(Count(DaysSinceLast)-1)) as averageInterval,

  Floor(365/Floor(Sum(DaysSinceLast)/(Count(DaysSinceLast)-1))) as averageEventsPerYear

Resident tblData

Group By EventScheduleID;

//Step 7: Decode the averageEventsPerYear values into text versions of the frequency. 

//You may need to change this into ranges rather than absolute values.

left join (tblData)

Load

  LineID

  ,If(averageEventsPerYear=365,'Daily'

  ,If(averageEventsPerYear=52,'Weekly'

  ,If(averageEventsPerYear=6,'Bi-Monthly'

  ,If(averageEventsPerYear=4,'Quarterly'

  ,If(averageEventsPerYear=2,'Semi-Annually'

  ,If(averageEventsPerYear=1,'Annually','Check')))))) as calculatedFrequency

Resident tblData;

The results will look something like this:

         

EventSchedule

ID

EventStart

DateTime

EventDateFrequency

DaysSince

Last

average

Days

average

Interval

averageEvents

PerYear

calculated

Frequency

10007/06/201506/07/2015Weekly0752.1428571452Weekly
10007/13/201513/07/2015Weekly7752.1428571452Weekly
10007/20/201520/07/2015Weekly7752.1428571452Weekly
10309/01/201501/09/2015Monthly0182.33333332.0054945052Semi-Annually
10303/01/201601/03/2016Monthly182182.33333332.0054945052Semi-Annually
10309/01/201601/09/2016Monthly184182.33333332.0054945052Semi-Annually
10303/01/201701/03/2017Monthly181182.33333332.0054945052Semi-Annually
10402/05/201605/02/2016Daily01.5365365Daily
10402/06/201606/02/2016Daily11.5365365Daily
10402/09/201609/02/2016Daily31.5365365Daily
10402/10/201610/02/2016Daily11.5365365Daily
10402/11/201611/02/2016Daily11.5365365Daily
10502/15/201415/02/2014Annually03631.0055096421Annually
10502/07/201507/02/2015Annually3573631.0055096421Annually
10502/11/201611/02/2016Annually3693631.0055096421Annually
10602/11/201611/02/2016Bi-Monthly060.666666676.0833333336Bi-Monthly
10604/15/201615/04/2016Bi-Monthly6460.666666676.0833333336Bi-Monthly
10606/05/201605/06/2016Bi-Monthly5160.666666676.0833333336Bi-Monthly
10608/11/201611/08/2016Bi-Monthly6760.666666676.0833333336Bi-Monthly
10702/11/201511/02/2015Quarterly091.254.0109890114Quarterly
10705/05/201505/05/2015Quarterly8391.254.0109890114Quarterly
10708/16/201516/08/2015Quarterly10391.254.0109890114Quarterly
10711/11/201511/11/2015Quarterly8791.254.0109890114Quarterly
10702/11/201611/02/2016Quarterly9291.254.0109890114Quarterly

As you can see it has corrected 103 from monthly to Semi-Annually.  I hope that offers some food for thought on how you could approach the problem.

Kind regards

Steve

Not applicable
Author

Thanks Steve.  It worked perfectly.  Just had to make some adjustments to the calculatefrequency options but other than that it got me close.  Awesome.....

Not applicable
Author

Hi Steven,

Your solution I finally put into place.  The problem I ran into was it returned a bunch of checks.  Numbers like:

   

averageEventsPerYear
8
3
1
5
26
13
11
0

My question is it I check them and assign a calculated Frequency will those numbers always be the same answer.  You solution has been awesome but I want some additional advice.

David

Not applicable
Author

Hi David

I'm pleased my idea has been of some help.  As mentioned in my original post this solution will work most effectively if your intervals are fairly stable, i.e. there are approximately the same number of days in between each event for each event schedule id.

To answer your question, as this model is based on regular day intervals then it will always provide consistent answers, however, if those answers do not fit with your interval model then the ID's will emerge as 'Check' exceptions.  Consequently, if you hard code the figures shown in your table the larger values are likely to be relatively accurate but the lower values may be more at risk of error.

If you have widely fluctuating event intervals in your data then it will be difficult to categorise them.  From the values shown in your chart it does seem to suggest that you have some EventScheduleID's where the frequencies are quite unstable and/or the frequencies have changed periodically which would skew the results.

For example, you would expect that a weekly event took place every 7 days, providing approximately 52 events per year, while a Semi-annual event took place every 182.5 days providing 2 events per year etc.  So if your data has intervals which are not stable you will find that your average days figure will drift from these expected values and the larger the variation then the wider the drift.

There a few ways to deal with this.  E.g.

  • Create new intervals to catch the odd periods
  • Use ranges to smooth out the fluctuations and catch some of the outliers
  • Use the Check indicator to identify schedule id's with too much variation in their event schedule and manually adjust them
  • Group your data by time periods if you know that frequencies change, thus reducing the overall drift.

Sensitivity analysis on your data would help you to identify firstly whether or not this solution will actually work for you and also whether you can define a set of ranges that will give you reasonable answers.  The check indicator will allow you to identify the outliers in your results set and analyse the degree of drift from the expected values.  There is always a risk that a widely varying data set will produce a valid result suggesting a frequency which is actually not correct.


Once you know the drift you can then assign a set of ranges instead of absolute values in the nested if statement to allow a wider range of values to be categorised into a group.  This will be more forgiving for the longer intervals than the shorter ones as e.g. the averages would have to drift a lot to move from 182.5 to 365 days but could more easily drift from 1 to 7 days

So for example you could replace the nested if code block with something like:

//Step 7: Decode the averageEventsPerYear values into text versions of the frequency using ranges.
left join (tblData)
Load
LineID
,
If(averageEventsPerYear>=335 and averageEventsPerYear <=395,'Daily'
,
If(averageEventsPerYear>=49 and averageEventsPerYear<=60,'Weekly'
,
If(averageEventsPerYear>=5 and averageEventsPerYear<=7,'Bi-Monthly'
,
If(averageEventsPerYear>=3 and averageEventsPerYear<5,'Quarterly'
,
If(averageEventsPerYear>=1.5 and averageEventsPerYear<3,'Semi-Annually'
,
If(averageEventsPerYear>=0.5 and averageEventsPerYear<1.5,'Annually','Check')))))) as calculatedFrequencyRangeMethod
Resident tblData;

I have just used arbitrary values in the above example code to demonstrate the principle, you may need to change them to suit your model.  It effectively creates a series of frequency bands into which the Event Schedule ID's can be categorised.

With the data set used in the previous example plus two new, less stable id's this would give the following results:

   

Event Schedule IDEvent DateDays Since Lastaverage Daysaverage Intervalaverage Events Per Yearcalculated Frequencycalculated Frequency Range Method
10006/07/20150752.1428571452WeeklyWeekly
10013/07/20157752.1428571452WeeklyWeekly
10020/07/20157752.1428571452WeeklyWeekly
10301/09/20150182.33333332.0054945052Semi-AnnuallySemi-Annually
10301/03/2016182182.33333332.0054945052Semi-AnnuallySemi-Annually
10301/09/2016184182.33333332.0054945052Semi-AnnuallySemi-Annually
10301/03/2017181182.33333332.0054945052Semi-AnnuallySemi-Annually
10405/02/201601.5365365DailyDaily
10406/02/201611.5365365DailyDaily
10409/02/201631.5365365DailyDaily
10410/02/201611.5365365DailyDaily
10411/02/201611.5365365DailyDaily
10515/02/201403631.0055096421AnnuallyAnnually
10507/02/20153573631.0055096421AnnuallyAnnually
10511/02/20163693631.0055096421AnnuallyAnnually
10611/02/2016060.666666676.0833333336Bi-MonthlyBi-Monthly
10615/04/20166460.666666676.0833333336Bi-MonthlyBi-Monthly
10605/06/20165160.666666676.0833333336Bi-MonthlyBi-Monthly
10611/08/20166760.666666676.0833333336Bi-MonthlyBi-Monthly
10711/02/2015091.254.0109890114QuarterlyQuarterly
10705/05/20158391.254.0109890114QuarterlyQuarterly
10716/08/201510391.254.0109890114QuarterlyQuarterly
10711/11/20158791.254.0109890114QuarterlyQuarterly
10711/02/20169291.254.0109890114QuarterlyQuarterly
10802/12/201603.333333333121.6666667121CheckCheck
10805/12/201633.333333333121.6666667121CheckCheck
10808/12/201633.333333333121.6666667121CheckCheck
10812/12/201643.333333333121.6666667121CheckCheck
10901/02/2017067.666666675.4477611945CheckBi-Monthly
10901/04/20175967.666666675.4477611945CheckBi-Monthly
10912/06/20177267.666666675.4477611945CheckBi-Monthly
10923/08/20177267.666666675.4477611945CheckBi-Monthly

In the above example the stable events have again been categorised correctly using the range method.  The slightly less stable id 109 has also now been categorise correctly, however id 108 remains at 'Check'.  This is correct as there is no frequency band in your model for events which take place 3 times per year i.e. approximately every 122 days, so the report has correctly highlighted id 108 as an exception.

I hope that helps to get you a step further with your analysis.

Kind regards

Steve

Not applicable
Author

Hi Steve,

Again thank you so much.  I was hoping I could adjust the formula like you did.  I have it now down to just a few outliers that I have to work on.

David