Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
lerhshong
Contributor
Contributor

Calculating persistency dynamically

Hello,

I'm interested in calculating a persistency measure for insurance policies that is defined using cohorts as below:

lerhshong_0-1636361996882.png

Each row is based on policy issue date, and the values are counts of policies. The columns record whether a policy is still in-force n months after issuance. For example, in the first row, it says that of all policies that were issued in 201901, 118 are still in-force after 1 month. 117 are still in-force after 2 months, etc.

What I'm interested in calculating is the measure in blue, which I'll call '13m persistency'. It requires at least 13 months of policies before it, so if 201901 is my first batch, then I can only start calculating it at 202001. The definition is:

persistency_13m_t = PRODUCT(policies in-force at t / policies in-force at t-1) = PRODUCT (yellow / green)

Referring to the image above, 

persistency_13m_202001 = PRODUCT( yellow / green) = 0.92 (the blue cell)

This is done on a rolling window basis, so that for the next month, we only ever take 12 cohorts at a go. My desired output is to get a table of persistency_13m and their values, so they can be visualized as a line:

yyyymm | persistency_13m
202001 | 0.92
202002 | 0.94
202003 | 0.89
....

Raw dataset

The raw dataset is at a policy level, with a policy_paid_to_date, which is equivalent to the date which the policy is in-force up to. So a policy_paid_to_date of today means the policy is in-force until today.

policy_number | policy_issue_date | policy_paid_to_date | inforce indicator
ABC | 2019-01-01 | 2021-01-01 | 1
XYZ | 2019-02-02 | 2021-02-02 | 1
PQR | 2019-03-03 | 2019-03-03 | 0

What I tried

I found the triangle hard to maneuver so I went back to the original meaning of the persistency measure. What it means in plain English is for a given yyyymm, I want to look back 13 months, calculate all the yellow/greens, and then multiply them together. I came up with this pseudocode:

min_policy_issue_date = min(policy_issue_date)    // the smallest policy issue date in the dataset, e.g. 201901
date_start_counting_persistency = min_policy_issue_date + 12 months  // the date where our persistency measure is first defined, e.g. 202001

list_of_persistency = [date_start_counting_persistency, date_start_counting_persistency + 1 month, date_start_counting_persistency + 2 months, ..... today() ], e.g. [202001, 202002, 202003, .... 202111]

for date in list_of_persistency:
    load *,
    if (policy_paid_to_date >= date + 12 months, 1, 0) as inforce_tm,
    if (policy_paid_to_date >= date + 13 months, 1, 0) as inforce_t,
    where policy_issue_date between (date, date + 12 months) // Only include relevant policies

    sum (inforce_t) / sum(inforce_tm) as yellow_green group by policy_issue_date_yyyymm

    // Output by now is a bunch of yellow / greens. Need to take their product and we're home

   persistency_13m = PRODUCT( yellow / greens)

My concern here is that the for loop will make the data load slow and also, as time goes by the list to loop through becomes longer. Given that the past data won't change, can we do this incrementally so that we only calculate the past once, and subsequently there's no more for loop? How can I translate the above pseudocode to Qlik? Thanks.

0 Replies