<!--
.. title: Hassett, chapter 8: Commonly-used continuous distributions
.. date: 2025-07-02 Wed 20:45 UTC-05:00
.. description: notes on Hassett and Stewart, chapter 8
.. type: text
.. has_math: true
-->

{{% hassett_navigation %}}

[TOC]

Excluding 8.6 and 8.7 (Pareto and Weibull).

# 8.1 the Uniform distribution

Given $X$ uniform on the interval $[a,b]$, we have
$$\begin{aligned}
\text{p.d.f.} &&
f(x) &= \begin{cases}
    \frac{1}{b-a} & a \leq x \leq b
    \\\\ 0 & \text{otherwise}
    \end{cases}
\\\\
\text{c.d.f.} &&
F(x) &= \begin{cases}
    0 & \phantom{x<{}} x < a \\\\
    \frac{x-a}{b-a} & a \leq x < b \\\\
    1 & b \leq x
    \end{cases}
\\\\
\text{mean} &&
E(X) &= \frac{a+b}{2}
\\\\
\text{variance} &&
V(X) &= \frac{(b-a)^2}{12}
\end{aligned}$$
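
Here's a quick sanity check of those formulas, as a sketch rather than anything from the book. The endpoints `a = 2`, `b = 5`, the function names, and the sample size are all my own arbitrary choices.

```python
import random

def uniform_pdf(x, a, b):
    """Density of a uniform random variable on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """Cumulative distribution of a uniform random variable on [a, b]."""
    if x < a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

a, b = 2.0, 5.0            # arbitrary illustrative endpoints
rng = random.Random(0)     # fixed seed so the run is repeatable
samples = [rng.uniform(a, b) for _ in range(100_000)]

sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)

exact_mean = (a + b) / 2          # 3.5
exact_var = (b - a) ** 2 / 12     # 0.75
print(sample_mean, exact_mean)
print(sample_var, exact_var)
```

The sample values should land close to the exact ones, with the usual Monte Carlo wobble.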

# 8.2 the Exponential distribution

Integration by parts tells us that, for integer $n \geq 0$
and real $a>0$,
$$\begin{aligned}
\int_0^\infty x^n e^{-ax}\mathrm dx &= \frac{n!}{a^{n+1}}.
\end{aligned}$$

For noninteger $n$, the factorial generalizes to $n! \to
\Gamma(n+1)$, but the integral only converges for $n > -1$: you can't go
past the pole at $\Gamma(0)$.
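
The integral is easy to check numerically. This is a rough sketch using a midpoint rule; the helper names, the truncation point `60/a`, and the test values of `n` and `a` are my own assumptions. It compares the numerical integral against $\Gamma(n+1)/a^{n+1}$, which reduces to $n!/a^{n+1}$ for integer $n$.

```python
import math

def moment_integral(n, a, steps=200_000):
    """Midpoint-rule estimate of the integral of x**n * exp(-a*x) over [0, inf).

    The upper limit is truncated at 60/a, where exp(-a*x) is about 1e-26.
    """
    upper = 60.0 / a
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** n * math.exp(-a * x)
    return total * h

def closed_form(n, a):
    """Gamma(n + 1) / a**(n + 1), which equals n! / a**(n + 1) for integer n."""
    return math.gamma(n + 1) / a ** (n + 1)

# Two integer cases and one noninteger case.
for n, a in [(0, 2.0), (3, 1.5), (2.5, 1.5)]:
    print(n, a, moment_integral(n, a), closed_form(n, a))
```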

The exponential distribution gives the probability for the waiting
time between Poisson-distributed events.

  * An example.

    > Accidents at a busy intersection occur at an average rate of
    > $\lambda = 2\,\text{month}^{-1}$, following a Poisson distribution.
    > The time $T$ between accidents is a random variable with density
    > function
    > $$\begin{aligned}
    f(t) &= 2e^{-2t}, \text{ for } t \geq 0.
    \end{aligned}$$

    [I was bothered here by the dimensionful $\lambda$](/posts/2025/07/02/improbable-units).

The general form of the distribution is
$$\begin{aligned}
\text{p.d.f.:} &&
    f(t) &= \lambda e^{-\lambda t}
\\\\
\text{c.d.f.:} &&
    F(t) &= 1-e^{-\lambda t}
\\\\
\text{survival function:} &&
    S(t) &= 1-F(t) &&= e^{-\lambda t}
\\\\
\text{mean:} &&
    E(T) &= \int_0^\infty \lambda t\, e^{-\lambda t} \mathrm dt &&= \frac{1}{\lambda}
\\\\
\text{mean square:} &&
    E(T^2) &= \int_0^\infty \lambda t^2\, e^{-\lambda t} \mathrm dt &&= \frac{2}{\lambda^2}
\\\\
\text{variance:} &&
    V(T) &= \frac2{\lambda^2} - \frac{1}{\lambda^2} &&= \frac{1}{\lambda^2}
\end{aligned}$$
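
To make this concrete, here's a small sketch that evaluates these functions for the intersection example ($\lambda = 2$ per month) and checks the mean and variance by simulation. The function names and the sample size are my own choices; `random.expovariate` is the standard-library sampler, which takes the rate $\lambda$ as its argument.

```python
import math
import random

def exp_pdf(t, lam):
    """Exponential density, lam * exp(-lam * t) for t >= 0."""
    return lam * math.exp(-lam * t) if t >= 0 else 0.0

def exp_cdf(t, lam):
    """P(T <= t) for an exponential waiting time."""
    return 1.0 - math.exp(-lam * t) if t >= 0 else 0.0

def exp_survival(t, lam):
    """P(T > t), the survival function."""
    return 1.0 - exp_cdf(t, lam)

lam = 2.0  # accidents per month, as in the intersection example

# Chance of going a full month without an accident: exp(-2), about 0.135.
p_quiet_month = exp_survival(1.0, lam)

# Monte Carlo check of the mean and variance.
rng = random.Random(1)
waits = [rng.expovariate(lam) for _ in range(100_000)]
mean = sum(waits) / len(waits)
var = sum((w - mean) ** 2 for w in waits) / len(waits)
print(p_quiet_month)
print(mean, 1 / lam)
print(var, 1 / lam ** 2)
```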

## Failure (or hazard) rates

> Let $T$ be a random variable with density function $f(t)$,
> cumulative distribution function $F(t) = \int_0^t \mathrm dt' f(t')$,
> and survival function $S(t) = 1-F(t)$.
> The "failure rate function" $\lambda(t)$ is defined by
> $$\begin{aligned}
\lambda(t) = \frac{f(t)}{1-F(t)} = \frac{f(t)}{S(t)}.
\end{aligned}$$
> For exponential distributions this is cute:
> $$\begin{aligned}
\lambda(t) &= \frac{\lambda e^{-\lambda t}}{ e^{-\lambda t}} = \lambda.
\end{aligned}$$
> This is a sort of conditional probability:
>$$\begin{aligned}
\lambda(t)\delta t
    & \approx \frac{P(t<T<t+\delta t)}{P(t < T)}
\\\\&= P(t<T<t+\delta t | t < T)
\end{aligned}$$
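
A short sketch of both claims, with a made-up rate `lam = 0.7` and my own helper names: the exponential failure rate comes out the same at every $t$, and $\lambda(t)\,\delta t$ matches the conditional probability for a small $\delta t$.

```python
import math

def hazard(pdf, cdf, t):
    """Failure rate lambda(t) = f(t) / (1 - F(t))."""
    return pdf(t) / (1.0 - cdf(t))

lam = 0.7  # arbitrary rate for the illustration

def pdf(t):
    return lam * math.exp(-lam * t)

def cdf(t):
    return 1.0 - math.exp(-lam * t)

# The exponential failure rate is the same at every t.
rates = [hazard(pdf, cdf, t) for t in (0.1, 1.0, 5.0)]

# Compare lambda(t) * dt with P(t < T < t + dt | t < T) for a small dt.
t, dt = 2.0, 1e-4
conditional = (cdf(t + dt) - cdf(t)) / (1.0 - cdf(t))
print(rates)
print(hazard(pdf, cdf, t) * dt, conditional)
```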

## Why the waiting time is exponential for Poisson-distributed events

One final assumption:

> If the number of events in a time period of unit length is
> Poisson-distributed with parameter $\lambda$, then the number of
> events in a time period with length $t$ is Poisson-distributed with
> parameter $\lambda t$.

This gives, for the number $X$ of events in a period of length $t$,
$$\begin{aligned}
P(X=0) = \frac{e^{-\lambda t} (\lambda t)^0}{0!} = e^{-\lambda t}.
\end{aligned}$$

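But $P(X=0)$ is the probability that no event has happened by time $t$,
that is, that the waiting time $T$ exceeds $t$.  So it must be the same
as the survival function, $S(t)$.
From the discovery that $S(t) = e^{-\lambda t}$, we can derive all the
other features of the exponential distribution.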
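
The argument can be run backwards as a consistency check. This sketch (my own construction, not from the book) generates events with exponential gaps at rate $\lambda$, counts how many land in a window of length $t$, and compares the fraction of empty windows with $e^{-\lambda t}$. The values `rate = 2`, `t = 1.5` and the function name are arbitrary.

```python
import math
import random

def count_events(rate, t, rng):
    """Count events in [0, t] when gaps between events are exponential(rate)."""
    elapsed = rng.expovariate(rate)
    count = 0
    while elapsed <= t:
        count += 1
        elapsed += rng.expovariate(rate)
    return count

rate, t = 2.0, 1.5   # illustrative values, so rate * t = 3
rng = random.Random(2)
counts = [count_events(rate, t, rng) for _ in range(50_000)]

frac_zero = counts.count(0) / len(counts)   # estimate of P(X = 0)
mean_count = sum(counts) / len(counts)      # estimate of E(X)
print(frac_zero, math.exp(-rate * t))
print(mean_count, rate * t)
```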

# 8.3 the Gamma distribution

If an exponential random variable can be used to model the waiting
time before the next independent event, a gamma-distributed random
variable can model the waiting time before the $n$th next event.
Consider modeling the failures of machine parts or the survival time
for a disease.

The gamma density function is
$$\begin{aligned}
f(x) &= \frac{ \beta^\alpha }{ \Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}
\end{aligned}$$

Here $\beta$ is a rate parameter (an inverse time) and $\alpha$, when it
is an integer, corresponds to the number of events of interest.  (Beware
that some authors prefer instead the waiting-time parameter $1/\beta$.)
Note that the
$\alpha=1$ case,

$$\begin{aligned}
f(x) &= \frac{\beta}{\Gamma(1)} e^{-\beta x}
\end{aligned}$$

corresponds to the exponential distribution.

## Theorem: sums of independent exponential random variables

Stated without proof:

> Let $X_1,X_2,\cdots,X_n$ be independent random variables,
> exponentially distributed with the same constant $\beta$.
> Then the sum $\sum X_i$ is gamma-distributed, with the same $\beta$
> and with $\alpha = n$.

## Mean and variance

As one might hope,
$$\begin{aligned}
E(X) &= \frac{\alpha}{\beta} & V(X) &= \frac{\alpha}{\beta^2}
\end{aligned}$$
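
Here's a simulation sketch of the theorem and the moments, with my own arbitrary choices $n = 4$ and $\beta = 2$. One thing to watch: Python's `random.gammavariate(alpha, beta)` takes a *scale* as its second argument, so the rate $\beta$ used in these notes goes in as `1 / beta`.

```python
import random

n, beta = 4, 2.0    # alpha = n events, each gap with rate beta
rng = random.Random(3)
trials = 50_000

# Waiting time to the n-th event: a sum of n independent exponential gaps.
sums = [sum(rng.expovariate(beta) for _ in range(n)) for _ in range(trials)]

def mean_and_var(xs):
    """Sample mean and (population-style) sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)

sum_mean, sum_var = mean_and_var(sums)

# Direct gamma samples; the second argument is a scale, i.e. 1/beta here.
direct = [rng.gammavariate(n, 1.0 / beta) for _ in range(trials)]
direct_mean, direct_var = mean_and_var(direct)

print(sum_mean, direct_mean, n / beta)          # all near 2.0
print(sum_var, direct_var, n / beta ** 2)       # all near 1.0
```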

# 8.4 the Normal distribution

The distribution and its friends are

$$\begin{aligned}
\text{p.d.f.:} &&
    f(x) &= \frac{1}{\sqrt{2\sigma^2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}
\\\\
\text{mean:} &&
    E(X) &= \mu
\\\\
\text{variance:} &&
    V(X) &= \sigma^2
\end{aligned}$$

The cumulative distribution function has no closed form in terms of
elementary functions.
I happen to know that it's some nastiness involving the error
function, but I always have to look it up:

$$\begin{aligned}
F(x) & = \frac12\left(
1 + \text{erf}\left(
        \frac{x-\mu}{\sigma\sqrt 2}
    \right)
\right)
\end{aligned}$$

Linear transformations $Y = aX + b$ remain normally distributed, with

$$\begin{aligned}
E(Y) &= E(aX + b) = a E(X) + b
\\\\
V(Y) &= V(aX + b) = a^2 V(X) = a^2\sigma^2
\\\\
\sigma_Y &= \sqrt{V(Y)} = \lvert a \rvert \cdot\sigma
\end{aligned}$$

We can "standardize" a random variable by considering instead the
random variable

$$\begin{aligned}
Z &= \frac{X-\mu}{\sigma} = \frac1\sigma X - \frac\mu\sigma
\end{aligned}$$

which has $E(Z) = 0$ and $V(Z) = 1$.

Ordinarily one consults a computer or a table for values of the
standard normal c.d.f. $F_Z(z)$.
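
Consulting a computer might look like the sketch below. It writes the c.d.f. with `math.erf`, shows that standardizing first gives the same probability, and cross-checks against `statistics.NormalDist` from the standard library. The parameters $\mu = 100$, $\sigma = 15$ and the function name are illustrative assumptions of mine.

```python
import math
from statistics import NormalDist

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Normal c.d.f. written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu, sigma = 100.0, 15.0   # illustrative parameters
x = 130.0
z = (x - mu) / sigma      # standardized value, here 2.0

print(normal_cdf(x, mu, sigma))          # via erf
print(normal_cdf(z))                     # same probability from the standard normal
print(NormalDist(mu, sigma).cdf(x))      # standard-library cross-check
```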

## The central limit theorem

> The central limit theorem says that, if we have lots of independent
> and identically-distributed (i.i.d.) random variables, say $n$ of
> them, each with mean $\mu$ and variance $\sigma^2$, their sum is
> approximately normally-distributed with mean $n\mu$ and variance
> $n\sigma^2$.
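
A simulation sketch, assuming sums of twelve uniform$(0,1)$ variables (my choice, picked because each term has mean $1/2$ and variance $1/12$, so the sum has mean 6 and variance 1). It compares the empirical probability that the sum is at most 7 with the normal approximation.

```python
import random
from statistics import NormalDist

n = 12                       # number of i.i.d. terms in each sum
mu, var = 0.5, 1.0 / 12.0    # mean and variance of one uniform(0, 1) term
rng = random.Random(4)

sums = [sum(rng.random() for _ in range(n)) for _ in range(20_000)]

# Normal approximation suggested by the theorem: mean n*mu, variance n*var.
approx = NormalDist(n * mu, (n * var) ** 0.5)

cutoff = 7.0
empirical = sum(s <= cutoff for s in sums) / len(sums)
print(empirical, approx.cdf(cutoff))
```

Twelve terms is hardly "lots", but the uniform distribution is symmetric and well-behaved, so the agreement should already be decent.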

## The continuity correction

If we're modeling a discrete problem, consider adjusting the limits of
the continuous normal distribution by half a unit, to include all the
probability associated with real numbers which round to the (included)
integer endpoints.
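
To see what the half-unit adjustment buys, here's a sketch with a binomial example of my own choosing: 100 fair coin flips, and the probability of getting between 45 and 55 heads inclusive. It compares the exact binomial sum with the normal approximation, with and without the correction.

```python
from math import comb
from statistics import NormalDist

n, p = 100, 0.5          # illustrative binomial: 100 fair coin flips
lo, hi = 45, 55          # want P(45 <= X <= 55), endpoints included

# Exact probability from the binomial probability function.
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(lo, hi + 1))

# Normal approximation with matching mean n*p and variance n*p*(1-p).
approx = NormalDist(n * p, (n * p * (1 - p)) ** 0.5)
uncorrected = approx.cdf(hi) - approx.cdf(lo)
corrected = approx.cdf(hi + 0.5) - approx.cdf(lo - 0.5)

print(exact, uncorrected, corrected)
```

In this example the corrected value should sit much closer to the exact one than the uncorrected value does.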


# 8.5 the Lognormal distribution

The log-normal distribution is nice for one-sided phenomena with a
long tail, like insurance claims.
It holds for a random variable $Y = e^X$ if $X$ is normally distributed
with mean $\mu$ and variance $\sigma^2$.
So we have, for $y > 0$,

$$\begin{aligned}
f(y) &=
\frac{1}{\sqrt{(y\sigma)^2 \cdot 2\pi}}
\exp\left( -\frac12\left(\frac{\ln y - \mu}{\sigma}
\right)^2 \right)
\end{aligned}$$

This gives the annoying
$$\begin{aligned}
E(Y) &= \exp\left( \mu + \frac{\sigma^2}{2}\right)
\\\\
V(Y) &= e^{2\mu + \sigma^2} \left(e^{\sigma^2} - 1\right)
\end{aligned}$$

But for cumulative probabilities, we can say
$$\begin{aligned}
F_Y(c)
    &= P(Y \leq c)
\\\\&= P(e^X \leq c)
\\\\&= P(X \leq \ln c)
\\\\&= F_X(\ln c)
\end{aligned}$$
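
In code, the cumulative trick is a one-liner. This sketch uses illustrative parameters $\mu = 0$, $\sigma = 0.5$ and my own function name; it also checks the "annoying" mean formula by simulation with the standard-library `random.lognormvariate`.

```python
import math
import random
from statistics import NormalDist

mu, sigma = 0.0, 0.5    # parameters of X = ln(Y); illustrative values

def lognormal_cdf(c, mu, sigma):
    """P(Y <= c) for Y = exp(X), computed as F_X(ln c)."""
    if c <= 0:
        return 0.0
    return NormalDist(mu, sigma).cdf(math.log(c))

# The median of Y sits at exp(mu), because F_X(mu) = 1/2.
print(lognormal_cdf(math.exp(mu), mu, sigma))

# Monte Carlo check of E(Y) = exp(mu + sigma**2 / 2).
rng = random.Random(5)
ys = [rng.lognormvariate(mu, sigma) for _ in range(100_000)]
sample_mean = sum(ys) / len(ys)
exact_mean = math.exp(mu + sigma ** 2 / 2)
print(sample_mean, exact_mean)
```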

## Lognormal distribution for stock prices

Stock prices over time grow exponentially,
$$\begin{aligned}
A(t) &= A_0 e^{rt}
\end{aligned}$$
Apparently this makes the lognormal distribution a good fit?
That would be the case if the growth rates $r$ were normally
distributed, since then $rt$ is normal and $A(t)/A_0 = e^{rt}$ is
lognormal.
I guess that's pretty reasonable.

# 8.6 (exclude) the Pareto distribution

The one-parameter[^1] Pareto distribution has density

[^1]: There's also a two-parameter version.

$$\begin{aligned}
f(x) &= \frac{\alpha}{\beta}\left( \frac{\beta}{x}\right)^{\alpha+1}
&\text{ with }\begin{cases}
2 < \alpha \\\\
0 < \beta \leq x \\\\
\end{cases}
\end{aligned}$$

If the restriction $\alpha>2$ is relaxed, the mean and the variance
aren't guaranteed to exist.  The parameter $\beta$ defines the domain
of the density function, because the cumulative distribution works out
to be

$$\begin{aligned}
F(x) &= 1 - \left(\frac \beta x\right)^\alpha,
\end{aligned}$$

and $x<\beta$ would have negative cumulative probability.
The Pareto distribution is defined for $x$ above some threshold:

> ![A plot of the Pareto density function](assets/pareto-density.png)
>
> The figure shows the graph of a Pareto density function for loss
> amounts measured in hundreds of dollars, i.e. $\$300$ is represented
> by $x=3$.  This insurance policy has a deductible of $\$300$.
> Claims for under this minimum aren't filed.

The mean and variance are

$$\begin{aligned}
E(X) &= \frac{\alpha\beta}{\alpha - 1}
\\\\
V(X) &= \frac{\alpha\beta^2}{\alpha-2} - \left(\frac{\alpha\beta}{\alpha-1}\right)^2
\end{aligned}$$

We can also define the failure rate[^2]:

[^2]: The gamma, normal, and lognormal distributions apparently have
    failure rates which are nontrivial to derive, so they're not in
    this book.  Cue the standard eye-rolling about how "everything in
    the textbook is trivial."

$$\begin{aligned}
\lambda(x) &= \frac{f(x)}{1-F(x)}
\\\\ &= \frac \alpha\beta \frac{ (\beta/x)^{\alpha+1} }{ (\beta/x)^\alpha }
\\\\ &= \frac \alpha x
\end{aligned}$$

The failure rate here is useful for things like insurance claim amounts.
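
A sketch of the failure rate and the mean, assuming $\alpha = 3$ and a threshold $\beta = 3$ (the $\$300$ deductible in units of hundreds of dollars; the shape value is my own pick). Python's `random.paretovariate(alpha)` samples a Pareto variable with threshold 1, so I scale it by $\beta$.

```python
import random

alpha, beta = 3.0, 3.0   # illustrative: shape 3, threshold 3 (i.e. $300)

def pareto_pdf(x):
    """Density (alpha/beta) * (beta/x)**(alpha+1) for x >= beta."""
    return (alpha / beta) * (beta / x) ** (alpha + 1) if x >= beta else 0.0

def pareto_cdf(x):
    """Cumulative distribution 1 - (beta/x)**alpha for x >= beta."""
    return 1.0 - (beta / x) ** alpha if x >= beta else 0.0

def failure_rate(x):
    """f(x) / (1 - F(x)), which should simplify to alpha / x."""
    return pareto_pdf(x) / (1.0 - pareto_cdf(x))

for x in (3.0, 6.0, 12.0):
    print(x, failure_rate(x), alpha / x)

# random.paretovariate(alpha) has threshold 1; multiply by beta to rescale.
rng = random.Random(7)
xs = [beta * rng.paretovariate(alpha) for _ in range(200_000)]
sample_mean = sum(xs) / len(xs)
print(sample_mean, alpha * beta / (alpha - 1))
```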

# 8.7 (exclude) the Weibull distribution

If the failure rate isn't constant with time, the Weibull distribution
might be useful.  It has a failure rate

$$\begin{aligned}
\lambda(x) &= \alpha \beta x^{\alpha - 1},
\end{aligned}$$

which comes from a probability density
$$\begin{aligned}
f(x) &= \alpha\beta x^{\alpha-1} e^{-\beta x^\alpha},
& \text{for } x &\geq 0,\ \alpha > 0,\ \beta > 0
\end{aligned}$$

For $\alpha=1$ we recover the exponential distribution;
for $\alpha > 1$ we have a failure rate that increases with time.
We also have

$$\begin{aligned}
\text{cumulative distribution:} &&
    F(x) &= 1 - e^{-\beta x^\alpha}
\\\\
\text{mean:} &&
    E(X) &= \frac{\Gamma\left( 1 + \frac{1}{\alpha}\right)}{\beta^{1/\alpha}}
\\\\
\text{variance:} &&
    V(X) &= \frac{1}{\beta^{2/\alpha}} \left(
    \Gamma\left(1 + \frac2\alpha\right)
    -
    \Gamma\left(1 + \frac1\alpha\right)^2
    \right)
\end{aligned}$$
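
Gamma-function formulas are easy to get wrong, so here's a numerical cross-check. It's a sketch with my own parameter choices ($\alpha = 2$, $\beta = 0.5$) and a plain midpoint rule; it integrates the density directly and compares with the gamma-function expressions, using $V(X) = E(X^2) - E(X)^2$.

```python
import math

alpha, beta = 2.0, 0.5   # illustrative shape and rate-like parameters

def weibull_pdf(x):
    """Density alpha*beta*x**(alpha-1)*exp(-beta*x**alpha) for x >= 0."""
    return alpha * beta * x ** (alpha - 1) * math.exp(-beta * x ** alpha)

def moment(k, upper=20.0, steps=200_000):
    """Midpoint-rule estimate of E(X**k); the tail beyond `upper` is negligible here."""
    h = upper / steps
    return h * sum(((i + 0.5) * h) ** k * weibull_pdf((i + 0.5) * h)
                   for i in range(steps))

numeric_mean = moment(1)
numeric_var = moment(2) - numeric_mean ** 2

g1 = math.gamma(1 + 1 / alpha)
g2 = math.gamma(1 + 2 / alpha)
formula_mean = g1 / beta ** (1 / alpha)
formula_var = (g2 - g1 ** 2) / beta ** (2 / alpha)   # E(X^2) minus E(X)^2

print(numeric_mean, formula_mean)
print(numeric_var, formula_var)
```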

# 8.8 the Beta distribution

The beta distribution is defined on $[0,1]$, and is useful for
modeling things that can be written as percentages.  It has two
positive parameters and the probability density

$$\begin{aligned}
f(x) &=
    \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}
    \cdot
    x ^ {\alpha - 1}
    (1-x)^{\beta - 1}
\end{aligned}$$

Here's a figure showing $(\alpha,\beta) = (4,3)$:

![A beta density function](assets/beta-density.png)

The $\Gamma$ garbage is just a normalization.
But it does imply that integrals of the form

$$\begin{aligned}
\int_0^1 x^m (1-x)^n \mathrm dx
    &=
    \frac{\Gamma(m+1)\ \Gamma(n+1)}{\Gamma(m+n+2)}
\\\\
    &= \frac{n!\ m!}{(n+m+1)!}
\end{aligned}$$

result in ratios of factorials.

We have

$$\begin{aligned}
\text{mean:} &&
    E(X) &= \frac{\alpha}{\alpha + \beta}
\\\\
\text{variance:} &&
    V(X) &= \frac{\alpha\beta}{ (\alpha+\beta)^2 (\alpha+\beta+1)}
\end{aligned}$$
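
A sketch that checks the factorial integral and the moments for the $(\alpha,\beta) = (4,3)$ case in the figure. The function name and sample size are mine; `random.betavariate` is the standard-library sampler. Matching the density's exponents means $m = \alpha - 1$ and $n = \beta - 1$, so the normalization integral is $3!\,2!/6! = 1/60$.

```python
import math
import random

def beta_integral(m, n):
    """Integral of x**m * (1-x)**n over [0, 1], via the gamma-function ratio."""
    return math.gamma(m + 1) * math.gamma(n + 1) / math.gamma(m + n + 2)

# For integers this is m! n! / (m + n + 1)!; with m = 3, n = 2 that is 1/60.
print(beta_integral(3, 2), 1 / 60)

alpha, beta = 4.0, 3.0   # the parameters used in the figure
rng = random.Random(6)
xs = [rng.betavariate(alpha, beta) for _ in range(100_000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)

exact_mean = alpha / (alpha + beta)                                    # 4/7
exact_var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))  # 3/98
print(mean, exact_mean)
print(var, exact_var)
```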

# 8.9 Fitting theoretical distributions to real problems


> The reader may wonder how to decide which distribution is a good
> fit.
> Well, keep wondering, cupcake.
> We're just going to tell you what to choose.