Bayesian nonlinear asymptotic regression model

A frequent problem in User Acquisition is to estimate ARPU (aka LTV when t->+inf) of the cohort — and a solid retention model is the foundation of any decent ARPU forecast.

A common approach to model retention is nonlinear asymptotic regression: y(t) = a + b/(c + t)

Think of it as: most users bail early, then things flatten out. Here, (a) is your long-term survivors (the loyal 2–3%), (b) is the big early drop, and (c) controls how fast the bleeding stops.

and there is how to implement a fully bayesian version of this model to properly deal with model uncertainty. As a bonus, it's possible to extend heteroskedasticity with cohort size.

"""
 uv tool run \
  --with pymc \
  --with arviz \
  --with numpy \
  --with matplotlib \
  --with seaborn \
  --with jupyter  \
  jupyter notebook --port=8888
"""

import pymc as pm
import arviz as az
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# time
x = np.array([1, 2, 3, 4, 5, 6, 7, 14, 28, 30, 60, 90 ]) 

# cohort retention
y = np.array([0.468, 0.339, 0.28, 0.245, 0.22, 0.202, 0.188, 0.131, 0.0908, 0.0847, 0.052, 0.0391]) 

with pm.Model() as model:
    sigma = pm.HalfNormal('sigma', 0.1)

    a = pm.HalfNormal('a', 0.1) 
    b = pm.HalfNormal('b', 5)
    c = pm.HalfNormal('c', 1)
    # add heteroskedasticity as error in tail should be smaller than in head
    # just make sure it doesn't explode at t=0
    # ln(0+1) + 1 = 1 
    # ln(100 + 1) + 1 = 5.61
    y_sigma = sigma / (np.log(x + 1) + 1)
    mu = a + b/(x + c)
    y = pm.Normal('y', mu=mu, sigma= y_sigma, observed=y)

    idata = pm.sample(1000, target_accept=0.95)
    ppc = pm.sample_posterior_predictive(idata, model=model)

az.plot_posterior(idata)
X = ppc.posterior_predictive.y.stack(z=('chain', 'draw')).to_numpy() # (T, draw)
def plot():
    for i in range(0,500):
        plt.plot(x, X[:, i], color='g')
    plt.plot(x, y,'x', color='b')
    plt.show()
plot()
# uncertainty on 90th day prediction
sns.histplot(X[-1, :])