<!--
.. title: Hassett, chapter 2: Counting for probability
.. date: 2025-06-19 Thu 11:00 UTC-05:00
.. description: notes on Hassett and Stewart, chapter 2
.. type: text
.. has_math: true
-->

{{% hassett_navigation %}}

[TOC]

# 2.1 What is probability?

Counting approach for equally likely outcomes, versus relative
frequency estimates for unequally likely outcomes.

# 2.2 Sets, sample spaces, and events

Finite sets: $S = \\{1,2,3,4,5,6\\}$

Infinite sets are described with set-builder notation:
$$
\mathbb R^{+}
= \left\\{
x | x \text{ is a real number and } x>0
\right\\}
$$

Sample space
:   The set of all possible outcomes of the experiment.

![If you die, your insurance company might face financial
strain](sec-22-death.png)

Event
:   A subset of the sample space for an experiment.

Examples:

  * Coin toss comes up heads.  Sample space $S=\\{H,T\\}$.  Event
    $E=\\{H\\}$.
  * At most five of one hundred life policies have death benefit
    claims in the next year.

    Sample space: $S = \\{0,1,2,\cdots,100\\}$.  
    Event: $E = \\{0,1,2,3,4,5\\}$.
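A sample space and an event map directly onto Python's built-in `set`. Here is a minimal sketch of the life-policy example; the names `sample_space` and `event` are my own, not the book's.

```python
# Number of death-benefit claims among one hundred policies.
sample_space = set(range(101))                 # 0, 1, ..., 100
event = {k for k in sample_space if k <= 5}    # "at most five claims"

# An event is a subset of the sample space.
print(event <= sample_space)
print(sorted(event))
```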

# 2.3 Compound events: set notation
<!-- [2025-06-19 Thu 12:44] -->
<!-- [2025-06-20 Fri 12:46] -->

Negation: $\not E$  (in LaTeX, `{\sim}`, to pretend it's not a binary
operator)  
Or (union): `\cup` $A \or B$  
And (intersection): `\cap` $A \and B$

# 2.4 Set identities

Unions and intersections are distributive:

$$
\begin{aligned}
A \and (B\or C) &= (A\and B) \or (A \and C) \\\\
A \or (B\and C) &= (A\or B) \and (A \or C)
\end{aligned}
$$

De Morgan's Laws:

$$
\begin{aligned}
    \not(A\or B) &= (\not A) \and (\not B) \\\\
    \not(A\and B) &= (\not A) \or (\not B)
\end{aligned}
$$
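These identities can be spot-checked with Python sets, where `|` is union, `&` is intersection, and the complement is taken relative to a universe. The universe `S`, the three subsets, and the helper `complement` are arbitrary choices of mine; passing on one example is a sanity check, not a proof.

```python
S = set(range(1, 13))               # a small universe, chosen arbitrarily
A = {n for n in S if n % 2 == 0}    # even numbers
B = {n for n in S if n % 3 == 0}    # multiples of three
C = {n for n in S if n > 8}


def complement(X):
    """Complement of X relative to the universe S."""
    return S - X


# Distributive laws
assert A & (B | C) == (A & B) | (A & C)
assert A | (B & C) == (A | B) & (A | C)

# De Morgan's laws
assert complement(A | B) == complement(A) & complement(B)
assert complement(A & B) == complement(A) | complement(B)
```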

# 2.5 Counting

Partition a set $S$ into $A$ and $\not A$.  Let $n(\cdot)$ be the
number of elements in some $(\cdot)$.  Then

$$
n(\not A) = n(S) - n(A)
$$

To find the number of elements in a union, we have to subtract the
double-counted elements in the intersection:

$$
n(A\or B) = n(A) + n(B) - n(A\and B)
$$
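Both counting rules are easy to confirm with `len` on small sets. This is only a sketch, with a universe and subsets that I made up for the purpose.

```python
S = set(range(1, 21))
A = {n for n in S if n % 2 == 0}    # ten even numbers
B = {n for n in S if n % 5 == 0}    # 5, 10, 15, 20

n_not_A = len(S - A)
n_union = len(A | B)

# Complement rule, then the union rule with its double-count correction.
assert n_not_A == len(S) - len(A)
assert n_union == len(A) + len(B) - len(A & B)
print(n_not_A, n_union)   # 10 12
```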

Empty sets: $\emptyset$, $\varnothing$; "mutually exclusive events"
means $A\and B = \emptyset$.

Multiplication principle; permutations.

Note that if I have $n$ objects and select only $r$ of them, the
number of permutations is $P(n,r) = n!/(n-r)!$.  I recover the
familiar $P(n) = n!$ when I take all of the objects,
$r=n$.

Combinations and binomial coefficients.  The binomial coefficient
("number of combinations") ${n\choose r} = \frac{n!}{r!\ (n-r)!}$ is
sometimes written $C(n,r)$.

Note: the permutation button on a calculator is usually labeled "nPr";
the combination or "n choose r" button is labeled "nCr".
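Python's standard library has the same two buttons: `math.perm` and `math.comb`, both available from Python 3.8. The values of `n` and `r` below are just an example of mine.

```python
import math

n, r = 5, 2
p = math.perm(n, r)     # n!/(n-r)! = 5*4 = 20
p_all = math.perm(n)    # r defaults to n, giving n! = 120
c = math.comb(n, r)     # n!/(r! (n-r)!) = 10

# Each combination can be ordered in r! ways.
assert p == c * math.factorial(r)
print(p, p_all, c)      # 20 120 10
```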

For partitions, multinomial coefficients.  Suppose twenty people are
split into groups of ten, six, and four.  The number of ways to do
this is

$$
{20\choose 4}{16\choose 6}{10\choose 10}
= \frac{20!}{4!\ 16!}\cdot \frac{16!}{6!\ 10!}\cdot \frac{10!}{10!\ 0!}
= \frac{20!}{4!\ 6!\ 10!}
$$
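The cancellation is easy to confirm numerically. The helper `multinomial` below is a name of my own, not a standard-library function.

```python
import math


def multinomial(*group_sizes):
    """Ways to split sum(group_sizes) distinct objects into labeled groups."""
    result = math.factorial(sum(group_sizes))
    for size in group_sizes:
        result //= math.factorial(size)
    return result


# Choose the group of four, then six of the remaining sixteen, then the rest.
sequential = math.comb(20, 4) * math.comb(16, 6) * math.comb(10, 10)
assert sequential == multinomial(4, 6, 10)
print(sequential)   # 38798760
```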

<!-- [2025-06-20 Fri 14:47] -->
# Exercises

question 2-27:
:   Given the six letters ABCDEF, how many four-letter words can be
    made with and without repetitions?

With repetitions, $6^4 = 1296$;
without, $P(6,4) = 6\cdot5\cdot4\cdot3 = 360$.
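Both counts are small enough to confirm by brute force. This sketch enumerates every word with `itertools` and counts them.

```python
import itertools

letters = "ABCDEF"

# Every four-letter word, letters allowed to repeat.
with_repetition = sum(1 for _ in itertools.product(letters, repeat=4))

# Every ordered selection of four distinct letters.
without_repetition = sum(1 for _ in itertools.permutations(letters, 4))

print(with_repetition, without_repetition)
```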

question 2-39:
:   How many ways are there to arrange the letters in the word MISSISSIPPI?

Eleven total letters: one M, four I, four S, two P.

There are eleven places the M can go.  That leaves ten places for the
four I, $10 \choose 4$ options, and so on.  So it's the multinomial:
$$
{11\choose 1,4,4,2} = \frac{11!}{1!\ 4!\ 4!\ 2!} = 34\,650.
$$
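The same count can be reached two ways in Python: place the letters one kind at a time, as in the argument above, or divide the factorial of the word length by the factorial of each letter's multiplicity. This is a sketch with variable names of my own.

```python
import math
from collections import Counter

word = "MISSISSIPPI"
counts = Counter(word)    # letter -> multiplicity

# Multinomial: 11! divided by the factorial of each multiplicity.
arrangements = math.factorial(len(word))
for multiplicity in counts.values():
    arrangements //= math.factorial(multiplicity)

# Place the M, then the four I, then the four S, then the two P.
step_by_step = (math.comb(11, 1) * math.comb(10, 4)
                * math.comb(6, 4) * math.comb(2, 2))

assert arrangements == step_by_step
print(arrangements)   # 34650
```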

## Sample actuarial exam problem

question 2-47:
:   An auto insurance company has 10,000 policyholders.  Each is
    classified as

    1. young or old
    2. male or female
    3. married or single

    Of these, 3000 are young, 4600 are male, 7000 are married.
    There are 1320 young males, 3010 married males, and 1400 young
    married persons.
    Finally, 600 policyholders are young married males.

    How many are young, female, and single?

| set                  | symbol          |  count | complement | count |
|----------------------|:---------------:|-------:|:----------:|------:|
| all                  | $S$             | 10 000 |            |       |
| young                | $Y$             |  3 000 | $\not Y$   | 7 000 |
| male                 | $M$             |  4 600 | $\not M$   | 5 400 |
| married (wed)        | $W$             |  7 000 | $\not W$   | 3 000 |
| young and male       | $Y\and M$       |   1320 |            |       |
| married and male     | $W\and M$       |   3010 |            |       |
| young and married    | $Y\and W$       |   1400 |            |       |
| young, married, male | $Y\and W\and M$ |    600 |            |       |

Boy, the Venn diagram really is the way to approach this.  Start from
the center and work my way out.

| set                   | arithmetic                | count |
|:---------------------:|:-------------------------:|------:|
| $Y\and W \and M$      | 600                       |   600 |
| $Y\and M\and(\not W)$ | 1320 - 600                |   720 |
| $W\and M\and(\not Y)$ | 3010 - 600                |  2410 |
| $Y\and W\and(\not M)$ | 1400 - 600                |   800 |
| $Y\and \not(M\or W)$  | 3000 - (600 + 720 + 800)  |   880 |
| $M\and \not(Y\or W)$  | 4600 - (600 + 720 + 2410) |   870 |
| $W\and \not(Y\or M)$  | 7000 - (600 + 800 + 2410) |  3190 |
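The three-set version of the union formula from section 2.5 gives an independent check on this table: add the single sets, subtract the pairwise intersections, and add back the triple intersection. This is a sketch with variable names of my own choosing.

```python
n_Y, n_M, n_W = 3000, 4600, 7000
n_YM, n_WM, n_YW = 1320, 3010, 1400
n_YWM = 600

# Inclusion-exclusion for three sets.
n_union = n_Y + n_M + n_W - n_YM - n_WM - n_YW + n_YWM

# The seven disjoint regions from the table above.
regions = [600, 720, 2410, 800, 880, 870, 3190]

assert n_union == sum(regions)
print(n_union)   # 9470
```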

That's annoying:  those seven regions add up to only 9,470 people, not
10,000.  What have I missed?

### Some hours later ...

That problem only appeared to fail my sanity check.  The seven regions
in the table are the pieces of $Y\or M\or W$, which holds 9,470
people.  The other 530 policyholders are in none of the three sets:
they are old, single, and female, the eighth region of the Venn
diagram.  Convincing myself that nothing was inconsistent (and that I
wasn't repeatedly making the same arithmetic mistake) just ate up a
great big chunk of time.  So let's write down what I did.

I wanted to generate sets that had these properties, so I wrote a
little Python class:

```python
import itertools


class FillableSet(set):
    """Extend the builtin ``set`` to populate itself up to a fixed size.

    Without an explicit ``source``, it is populated with positive integers.
    With ``exclude``, skip elements whether or not they appear in ``source``.
    """

    def __init__(self, size, *, source=None, exclude=()):
        super().__init__()
        self.size = size
        self.fill(source=source, exclude=exclude)

    def fill(self, *, source=None, exclude=()):
        """Add elements from ``source`` until the set holds ``size`` members."""
        if source is None:
            source = itertools.count(1)
        for s in source:
            if len(self) >= self.size:
                break
            if s in exclude:
                continue
            self.add(s)
        # Report progress so each step can be checked by eye.
        print("Length:", len(self))
        if self:
            print("Largest element:", max(self))
```

The `FillableSet` will populate itself with members of the `source`
(or with consecutive integers).  So now we generate the sets.  Without
loss of generality, we'll make the young people first:

```python
>>> y = FillableSet(3000)
Length: 3000
Largest element: 3000
```

Now let's generate the set of male people.  That'll be the union of
young male people and non-young male people,
$$M = (M\and Y) \or (M\and\not Y).$$
So first we'll generate some young male people.

```python
>>> y_m = FillableSet(1320, source=y)
Length: 1320
Largest element: 1320
>>> m = FillableSet(4600, source=y_m); m.fill(exclude=y)
Length: 1320
Largest element: 1320
Length: 4600
Largest element: 6280
```

Now we need to generate married people.  Some of them are young and
male:
```python
>>> y_m_w = FillableSet( 600, source=y_m)
Length: 600
Largest element: 600
```

Some of them are not young, and some of them are not male, so let's
generate those.  The expressions are
$$ M\and W = M\and W \and (Y\or\not Y),$$
```python
>>> m_w   = FillableSet(3010, source=y_m_w) ; m_w.fill(exclude=y)
Length: 600
Largest element: 600
Length: 3010
Largest element: 5410
```

and
$$ W\and Y = (M \or \not M)\and W \and Y.$$

```python
>>> w_y   = FillableSet(1400, source=y_m_w) ; w_y.fill(exclude=m)
Length: 600
Largest element: 600
Length: 1400
Largest element: 2120
```

Now we just have to fill out $W$, the married people.  We have three
subsets of $W$, and we add new people until we have the size we want.

```python
>>> w = FillableSet(7000, source=[])
Length: 0
>>> for source in y_m_w, w_y, m_w:
...     w.fill(source=source)
...
Length: 600
Largest element: 600
Length: 1400
Largest element: 2120
Length: 3810
Largest element: 5410
>>> w.fill(exclude=set.union(y,m))
Length: 7000
Largest element: 9470
```

That reproduces our list of intersections:

| set                 | size |
|:-------------------:|-----:|
| y                   | 3000 |
| m                   | 4600 |
| w                   | 7000 |
| y $\cap$ m          | 1320 |
| m $\cap$ w          | 3010 |
| y $\cap$ w          | 1400 |
| y $\cap$ m $\cap$ w |  600 |
| y $\cup$ m $\cup$ w | 9470 |

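So the 10,000 policyholders are not all in the union:
$10\,000 - 9470 = 530$ of them are old, single, and female, outside
all three sets.  The young, single females the question asks about are
the 880 people in $Y\and\not(M\or W)$.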