<!--
.. title: Hassett, chapter 5: Commonly-used discrete distributions.
.. date: 2025-06-26 16:45 UTC-05:00
.. description: notes on Hassett and Stewart, chapter 5
.. type: text
.. has_math: true
-->

{{% hassett_navigation %}}

[TOC]

# 5.1 The binomial distribution (113)

A binomial random variable $X$ is the number of successes in an
experiment where
1. there are $n$ in dependent trials,
2. each of which has the same probability $P(\text{success})=p$ and
   $P(\text{failure})=q=1-p$, and
3. each trial is independent of the others.

Coin flip, guess on an exam, etc.

The probability of $s$ successes should be
$$
P(X=s) = {n\choose s}p^s q^{n-s}
$$

This gives expected mean and variance
$$
\begin{aligned}
E(X) &= np \\\\
V(X) &= npq = np(1-p) \\\\
\end{aligned}
$$

I'd like to briefly prove these, and to see what functions are
available on my calculator and in `scipy.stats`.

## Expectation value and variance

The expectation value seems intuitive, but writing out the definition
makes it seem less obvious:'$$
\begin{aligned}
E(X) = \sum k\ P(k) = \sum_{k=0}^{n} k {n\choose k} p^k q^{n-k}
\end{aligned}
$$

This seems like perhaps one of those differentiate-the-sum problems.
To reduce clutter, and keeping in mind that $q=1-p$, let's find
$$
\begin{aligned}
\frac{\mathrm d}{\mathrm dp}\left( p^k q^{n-k} \right)
    &= k\,p^{k-1}q^{n-k} - (n-k) p^k q^{n-k-1}
\\\\&= \left(
k\left( \frac1q + \frac1p\right) - \frac nq
\right) p^k q^{n-k}
\\\\
pq \frac{\mathrm d}{\mathrm dp}\left( p^k q^{n-k} \right)
    &= \left( k\left( p + q \right) - np \right) p^k q^{n-k}
    = (k-np) p^k q^{n-k}
\end{aligned}
$$
Since all of the ${n\choose k}p^k q^{n-k} = P(k)$ add up to unity,
whose derivative is zero, we can include the coefficients and the sum
to get
$$
\begin{aligned}
0 &= pq \frac{\mathrm d}{\mathrm dp} \left( \sum P(k) \right) \\\\
    &= \sum k P(k) - np \sum P(k)
\\\\
np &= E(X)
\end{aligned}
$$

To get the variance, let's take the derivative again:
$$
\begin{aligned}
pq \frac{\mathrm d}{\mathrm dp} \big( E(X) \big)
    &= \sum \left(k^2 - knp\right) P(k)
\\\\
pq \cdot n &= E(X^2) - np \cdot E(X)
\\\\
(np)^2 + pq\cdot n &= E(X^2)
\end{aligned}
$$
This gives variance $V(X) = E(X^2) - (E(X))^2 = npq,$, which is what
we wanted to demonstrate.

## Features of `np.random`

Here's one method to draw from the distribution

```python
>>> np.random.binomial(100,1/6,10)
array([18, 18, 13, 19, 19, 16, 21, 25, 15, 10])
```
However, the documentation for `np.random.binomial` says

```rst
.. note::
    New code should use the `~numpy.random.Generator.binomial`
    method of a `~numpy.random.Generator` instance instead;
    please see the :ref:`random-quick-start`.
```

There the minimal working example seems to be

```python
>>> from numpy.random import Generator, PCG64
>>> g = Generator(PCG64())
>>> g.binomial(100,1/6,10)
array([24, 16, 17, 15, 17, 19, 24, 16, 22, 14])
```

The `PCG64` object is an `np.random.BitGenerator`, which seems to be a
way to distinguish between real-number distributions versus the pure
randomness that you need to undergird it.

## Features of `scipy.stats.binom`

> As an instance of the `rv_discrete` class, `binom` object inherits from it
> a collection of generic methods (see below for the full list),
> and completes them with details specific for this particular distribution.
```rst
Methods
-------
rvs(n, p, loc=0, size=1, random_state=None)
    Random variates.
pmf(k, n, p, loc=0)
    Probability mass function.
logpmf(k, n, p, loc=0)
    Log of the probability mass function.
cdf(k, n, p, loc=0)
    Cumulative distribution function.
logcdf(k, n, p, loc=0)
    Log of the cumulative distribution function.
sf(k, n, p, loc=0)
    Survival function  (also defined as ``1 - cdf``, but `sf` is sometimes more accurate).
logsf(k, n, p, loc=0)
    Log of the survival function.
ppf(q, n, p, loc=0)
    Percent point function (inverse of ``cdf`` --- percentiles).
isf(q, n, p, loc=0)
    Inverse survival function (inverse of ``sf``).
stats(n, p, loc=0, moments='mv')
    Mean('m'), variance('v'), skew('s'), and/or kurtosis('k').
entropy(n, p, loc=0)
    (Differential) entropy of the RV.
expect(func, args=(n, p), loc=0, lb=None, ub=None, conditional=False)
    Expected value of a function (of one argument) with respect to the distribution.
median(n, p, loc=0)
    Median of the distribution.
mean(n, p, loc=0)
    Mean of the distribution.
var(n, p, loc=0)
    Variance of the distribution.
std(n, p, loc=0)
    Standard deviation of the distribution.
interval(confidence, n, p, loc=0)
    Confidence interval with equal areas around the median.
```

I think I know what all of these do.  Testing them would be a fun
little project for a graduate student with infinite time.

# 5.2 The hypergeometric distribution (122)

This is for choices without replacement.  Example:  choose five people
randomly from twenty men plus thirty women.  What is the probability you get
two men and three women?  It's combinatorics:
$$
\begin{aligned}
P(\text{two mean, three women})
    &= \frac{{20\choose 2}{30\choose 3}}{50\choose 5}
\end{aligned}
$$

In general, if we are
* choosing $n$ from $N$ items, and
* we want $k$ items from a subgroup with population $r$,
we have probability
$$
\begin{aligned}
P(k) &= \frac{
    {r\choose k}{N-r \choose n-k}
}{ N \choose n }.
\end{aligned}
$$

There's a footnote about relaxing the restriction that $r \geq n$:
that is, we're choosing fewer items than are in the population of
interest.  In fact this relation holds only for
$$
\begin{aligned}
\max(0, r-(N-n)) \leq k \leq \min(r,n)
\end{aligned}
$$

The lower bound is from the pigeonhole principle; the upper bound
prevents us from choosing more than all of our subset.
If we have twenty
men and thirty women, and we choose forty people, then at least ten
of them *must* be men, but no more than twenty of them *can* be men.
The binomial
coefficients do strange things in these limits, anyway.

## Expectation and variance

The expectation value is pretty sensible
$$
\begin{aligned}
E(X) &= n\cdot \frac rN
\end{aligned}
$$
The variance is kind of a big mess, though:
$$
\begin{aligned}
V(X) &= n \cdot \frac rN \cdot
\left(1-\frac rN\right) \cdot \frac{N-n}{N-1}
\end{aligned}
$$

What's the right way to think of this?  In the limit where you're
taking a small sample of a large population, $n\ll N$, the final
fraction looks like $\frac{1-n/N}{1-1/N} \to 1$, and we recover
the result $V(X) = np(1-p)$ for independent samples.  The factor
$\frac{N-n}{N-1}$ is apparently called the "finite population
correction factor," and a secondhand reference suggests neglecting it
for $\frac nN\lesssim \frac1{10}$.

To actually compute these moments of the distribution, we'd need to do
$$
\begin{aligned}
E(X) = \sum k P(k)
&= \frac{1}{N\choose n} \sum k{r\choose k}{N-r\choose n-k}
\end{aligned}
$$

The method probably involves some dumb identities with the binomial
coefficients.  The one that lets you recursively build Pascal's
triangle is
$$
\begin{aligned}
{n\choose k} &= {n-1\choose k-1} + {n-1\choose k},
\end{aligned}
$$
which is a special case of
[an identity named for Vandermonde](https://en.wikipedia.org/wiki/Vandermonde%27s_identity),
$$
\begin{aligned}
{m+n\choose r} &= \sum_{k=0}^r {m\choose k}{n \choose r-k}
\end{aligned}
$$

## Computational stuff

From `scipy.stats.hypergeometric`:

> The hypergeometric distribution models drawing objects from a bin.
> `M` is the total number of objects, `n` is total number of Type I
> objects.  The random variate represents the number of Type I objects
> in `N` drawn without replacement from the total population.
>
> As an instance of the `rv_discrete` class, `hypergeom` object
> inherits from it a collection of generic methods (see below for the
> full list), and completes them with details specific for this
> particular distribution.

```rst
Methods
-------
rvs(M, n, N, loc=0, size=1, random_state=None)
    Random variates.
pmf(k, M, n, N, loc=0)
    Probability mass function.
```

Let's watch the probabilities of the hypergeometric distribution
converge to the binomial distribution as we make the population
larger.

```python
import scipy.stats as stats
import matplotlib.pyplot as plt
import numpy as np

n=5
plot.plot(stats.binom.pmf(np.arange(n+1), n, 3/5), ...)
M = 0;
M += 5; plt.plot(stats.hypergeom.pmf(np.arange(n+1), M, n, M*3//5), ...)
```
![plot of binomial and hypergeometric distributions](hypergeometric.svg)

# 5.3 The Poisson distribution (126)

Poisson is also used to model numbers of events, but these events are
independent of one another.  For example, "this intersection has an
average $\lambda=2$ accidents per month."

The distribution is
$$
\begin{aligned}
P(X=k) &= \frac{e^{-\lambda} \lambda^k}{k!}
\\\\
E(X) &= \lambda \\\\
V(X) &= \lambda
\end{aligned}
$$

I feel like Poisson should arise in some limit of binomial.
Let's match the means $\lambda = np$, and consider the case where
there are zillions of trials and only a few successes, $k\ll n$.
Then the binomial probability

$$
P_\text{binomial}(k) = {n\choose k} p^k q^{n-k}
$$

has three terms we want to approximate.  The coefficient goes like

$$
{n\choose k} \approx \frac{n^k}{k!}.
$$

For example, ${100\choose 3} = \frac{100\times 99\times 98}{3!}
\approx \frac{100^3}{3!}$.  Looking at $q=(1-p)$, we have

$$
\begin{aligned}
(1-p)^{n-k}
     &\approx (1-p)^n
\\\\ &= 1 - np + {n\choose 2}p^2 - {n\choose 3}{p^3} + \cdots
\\\\ &\approx 1 - \frac{np}{1!} + \frac{(np)^2}{2!} + \cdots
\\\\ &= e^{-np}
\end{aligned}
$$

So the relation overall is
$$
\begin{aligned}
P(k) &= {n\choose k} \cdot p^k \cdot q^{n-k}
\\\\ &\approx \frac{n^k}{k!} \cdot p^k \cdot e^{-np}
\\\\ &= \frac{(np)^k e^{-np}}{k!} = P_\text{poisson}(k)
\end{aligned}
$$

Oh, look!  That's the next section in the book:

> If $n$ is large and $p = \lambda/n$ is small, then $P(X=k)$ can be
> calculated approximately using either the Poisson or the binomial
> distributions:
>
> $$ \frac{e^{-\lambda}\lambda^k}{k!}
> \approx {n\choose k} \left(\frac{\lambda}{n}\right)^k
> \left(1-\frac{\lambda}{n}\right)^{n-k}.
> $$

# 5.4 The geometric distribution (132)

The geometric distribution is for waiting-time problems, like "how
many dice rolls until a six"?  There are (at least) two definitions,
depending on whether you count the success or not — that is, whether a
six on the first die roll counts as $k=0$ or as $k=1$.
With $X$ meaning "failures before success" and $Y = X+1$ meaning
"attempts including success," we have
$$
\begin{aligned}
\\\\
P(X=k) &= q^k p     \qquad           & P(Y=k) &= q^{k-1} p
\\\\
E(X) &= \frac{q}{p}                & E(Y) &= 1 + \frac qp = \frac 1p
\\\\
V(X) &= \frac{q}{p^2}                & V(Y) &= V(X+1) = V(X) = \frac{q}{p^2}
\end{aligned}
$$

# 5.5 The negative binomial distribution (136)

This extends the geometric series.  Suppose I want to know how many
dice rolls before the *third* time I roll a six?  One such sequence is
$\set{1,4,3,5,6,3,4,5,2,6,3,4,6}$.  goes back to combinatorics, which
is why the binomial expansion gets involved.  The probability that I
get my third six on the twelfth roll is the probability that the
twelfth roll itself is a six, plus the fraction of eleven-roll
sequences which contain exactly two sixes.

> A series of independent trials has probability $P(S)=p$ on each trial.
> Let $X$ be the number of failures before the $r$-th success.
>
> $$
\begin{aligned}
P(k) = {r+k-1 \choose r-1} q^k p^r
\end{aligned}
$$

For $r=1$, we get the geometric back:
$$
\begin{aligned}
P(k; r=1) &= {1+k-1\choose r-1} q^k p = {k\choose 0} q^k p
\end{aligned}
$$

Delightfully, the mean and variance are
$$
\begin{aligned}
E(X) &= \frac{rq}{p} & \qquad V(X) &= \frac{rq}{p^2}
\end{aligned}
$$

# 5.6 The discrete uniform distribution (141)

This one's simple:
$$
\begin{aligned}
P(k) &= \frac 1n, &&k\in\set{1\cdots n}
\\\\
E(X) &= \frac{n+1}{2}
\\\\
V(X) &= \frac{n^2-1}{12}
\end{aligned}
$$

# Problems

The problems are split up as follows:

* 1–10         Binomial
* 11–15        Hypergeometric
* 16–22        Poisson
* 23–26        Geometric
* 27–31        Negative Binomial
* 33–35        Discrete Uniform
* 36–41        Sample actuarial problems
