A frequent problem in User Acquisition is to estimate ARPU of users in a cohort. Cohort — a group of users acquired together (same date, source, channel, etc.).
Individual user revenue follows a heavy-tailed distribution:
Majority (~75%) generate zero revenue
Some (~20%) generate modest revenue
Tiny minority (~5%) generate the significant part (whales)
Assumption: users inside a cohort are i.i.d., individual revenue ∼ LogNormal (or any other heavy-tailed distribution with finite mean and variance).
Because mean μ and variance σ² exist and are finite, we can apply the Central Limit Theorem to the cohort mean.
So the problem is to estimate unknown parameters μ and σ² using data. For modeling purposes, it would be convenient to define σ in terms of some other parameter k and μ , for that we need observed standard deviation of revenue inside cohorts.In practice, revenue SD is roughly proportional to mean (common for positive heavy-tailed data).This leads to the hierarchical Bayesian model that works extremely well as long as cohort size n ≳ 50–100