<!--
.. title: Hassett, chapter 11: Applying multivariate distributions
.. date: 2025-07-08 Tue 16:06 UTC-05:00
.. description: notes on Hassett and Stewart, chapter 11
.. type: text
.. has_math: true
-->

{{% hassett_navigation %}}

[TOC]

# 11.1 Distributions of functions of two random variables

Consider these functions of two random variables:

$$
X+Y \quad X-Y \quad \min(X,Y) \quad \max(X,Y)
$$

For sums, we add up the probabilities of every pair $(x,y)$ with $x+y=s$:

$$\begin{aligned}
p_S(s) &= \sum_x p(x, s-x)
\\\\   &= \sum_x p_X(x) \cdot p(s-x | x)
\\\\   &= \sum_x p_X(x) \cdot p_Y(s-x) \text{(if independent)}
\end{aligned}$$
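
To make the independent case concrete, here is a small sketch that convolves two p.m.f.s stored as dictionaries. The two fair dice and the helper name `pmf_of_sum` are my own illustration, not something from the book.

```python
from collections import defaultdict
from fractions import Fraction


def pmf_of_sum(p_x, p_y):
    """Convolve two independent p.m.f.s given as {value: probability} dicts."""
    p_s = defaultdict(Fraction)
    for x, px in p_x.items():
        for y, py in p_y.items():
            # Every pair (x, y) contributes to the total s = x + y.
            p_s[x + y] += px * py
    return dict(p_s)


# Assumed example: two independent fair six-sided dice.
die = {face: Fraction(1, 6) for face in range(1, 7)}
two_dice = pmf_of_sum(die, die)
print(two_dice[7])
```

Using `Fraction` keeps the arithmetic exact, so the probabilities sum to exactly one.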

<!-- ## (excluding 11.1.4) -->

For independent continuous variables, likewise

$$\begin{aligned}
f_S(s)
    &= \int_{-\infty}^{\infty} f_X(x)\cdot f_Y(s-x) \mathrm dx
\end{aligned}$$

## Sums of exponential variables are gamma-distributed

Let's look at the waiting times between accidents in two towns.  The
joint p.d.f. and the marginal density functions are

$$\begin{aligned}
    f(x,y) &= e^{-x-y} \\\\
    f_X(x) &= e^{-x} \\\\
    f_Y(y) &= e^{-y} \\\\
\end{aligned}$$

Let's show (as was done in the previous chapter, but I skipped it)
that $X$ and $Y$ are independent.  We have the marginal probability

$$\begin{aligned}
f_Y(y)  &= \int_0^\infty \mathrm dx\ e^{-x-y}
\\\\    &= e^{-y} \cdot \left(1-0\right) \text{, as assumed above}
\end{aligned}$$

and the conditional density

$$\begin{aligned}
f(x|y)
        &= \frac{f(x,y)}{f_Y(y)}
\\\\    &= \frac{e^{-x-y}}{e^{-y}} = e^{-x} = f_X(x)
\end{aligned}$$

and vice-versa.  So they're independent.  Now we want to find the
density function for $X+Y$.

$$\begin{aligned}
f_S(s)
    &= \int_0^\infty \mathrm dx\ f_X(x) \cdot f_Y(s-x)
\\\\&= \int_0^\infty \mathrm dx\ e^{-x} \cdot e^{-(s-x)}
\\\\&= \int_0^\infty \mathrm dx\ e^{-s} \qquad ???
\end{aligned}$$

Oh, this is subtle: because the probability density for a *negative*
waiting time is *zero*, I should have changed the limits of the
integral.  Let's write that again, more clearly.

$$\begin{aligned}
f(x,y)
    &= \begin{cases}
        e^{-x-y} & (x,y) \geq 0 \\\\
        0 & x<0 \text{ or } y < 0 \\\\
    \end{cases}
\\\\
f_X(x)
    &= \begin{cases}
        e^{-x} & x > 0 \\\\
        0 & x < 0
    \end{cases}
\\\\
f_Y(y)
    &= \begin{cases}
        e^{-y} & y > 0 \\\\
        0 & y < 0
    \end{cases}
\end{aligned}$$

So now we have the density of the sum $S = X+Y$:

$$\begin{aligned}
f_S(s)
    &= \int_0^\infty \mathrm dx\ f_X(x) \cdot f_Y(s-x)
\\\\&= \int_0^s \mathrm dx\ e^{-x} \cdot e^{-(s-x)}
        + \int_s^\infty \mathrm dx\ e^{-x} \cdot 0
\\\\&= e^{-s} \int_0^s \mathrm dx\ 1
\\\\&= s e^{-s}
\end{aligned}$$
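
As a sanity check on the limits, the convolution integral can be evaluated numerically. This is only a sketch: the midpoint rule, the cutoff `upper`, and the step count are choices I made for illustration.

```python
import math


def f_exp(t):
    """Unit-rate exponential density; zero for negative waiting times."""
    return math.exp(-t) if t >= 0 else 0.0


def f_sum(s, upper=20.0, steps=20_000):
    """Midpoint-rule estimate of the integral of f_X(x) * f_Y(s - x) over [0, upper]."""
    dx = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        # f_exp(s - x) vanishes once x > s, which is what truncates the integral.
        total += f_exp(x) * f_exp(s - x)
    return total * dx


for s in (0.5, 1.0, 3.0):
    print(f"s={s}: numeric={f_sum(s):.4f}, s*exp(-s)={s * math.exp(-s):.4f}")
```

Because the density itself returns zero for negative arguments, the code never needs to know the upper limit $s$ explicitly.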

Because $X$ and $Y$ were exponentially distributed with $\beta=1$,
their sum $X+Y$ should be gamma-distributed with $(\alpha,\beta) =
(2,1)$.  Remember that the gamma distribution has density

$$\begin{aligned}
f_\Gamma(x)
    &=  \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}.
\end{aligned}$$

With $\alpha=2$, $\beta=1$ and $\Gamma(2)=1$, this reduces to $x e^{-x}$, which matches the density $s e^{-s}$ found for the sum.

I can sort of see how to get this relation in general from induction,
but I shouldn't spend the time on it.

## Minimum of two random variables

Now let's look at $\min(X,Y)$ for exponential distributions.  Remember
the cumulative and survival functions

$$\begin{aligned}
F(t) &= P(X \leq t) = 1-e^{-\beta t} \\\\
S(t) &= P(X > t) = e^{-\beta t}
\end{aligned}$$

Suppose $X$ and $Y$ are independent and exponentially distributed with
parameters $\beta,\lambda$.  The probability that $\min(X,Y)$ exceeds
some time $t$ is the probability that *both* $X$ and $Y$ have survived
through $t$.  That is

$$\begin{aligned}
S(t)
        &= P(\min(X,Y) > t)
\\\\    &= P(X > t \text{ and } Y > t)
\\\\    &= P(X > t) \cdot P(Y > t), \text{ by independence}
\\\\    &= S_X(t) \cdot S_Y(t)
\\\\    &= e^{-\beta t}e^{-\lambda t} = e^{-(\beta+\lambda)t}
\end{aligned}$$

This survival function means that the minimum time is exponentially
distributed with parameter $\beta+\lambda$.
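
A quick simulation makes this believable. The rates `beta = 1.5` and `lam = 0.5`, the sample size, and the seed are arbitrary values I picked; `random.expovariate` takes the rate parameter, matching the convention used here.

```python
import math
import random


def simulate_min(beta, lam, n=100_000, seed=0):
    """Draw n samples of min(X, Y) for independent exponentials with rates beta, lam."""
    rng = random.Random(seed)
    return [min(rng.expovariate(beta), rng.expovariate(lam)) for _ in range(n)]


beta, lam = 1.5, 0.5
samples = simulate_min(beta, lam)

# An exponential with rate beta + lam has mean 1 / (beta + lam).
sample_mean = sum(samples) / len(samples)
print(sample_mean, 1 / (beta + lam))

# Empirical survival at t = 1 versus exp(-(beta + lam) * t).
survival_at_1 = sum(v > 1.0 for v in samples) / len(samples)
print(survival_at_1, math.exp(-(beta + lam)))
```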

This procedure works for the minimum of any two (independent) random
variables.  The survival obeys

$$\begin{aligned}
S_\text{min}(t)
        & = S_X(t) \cdot S_Y(t)
\end{aligned}$$

because both independent variables must survive.
The survival function for the minimum might correspond to a known distribution.
Likewise, the distribution for the maximum of two independent variables
can be found by multiplying the c.d.f.s, because the maximum is at most
$t$ only when both variables are:

$$\begin{aligned}
F_\text{max}(t)
        &=  F_X(t) \cdot F_Y(t)
\end{aligned}$$
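
For a discrete illustration of the c.d.f. product, here is a sketch for the maximum of two independent fair dice. The dice are my own example; the p.m.f. comes from differencing $F_\text{max}$ at consecutive integers.

```python
from fractions import Fraction


def cdf_die(t):
    """P(face <= t) for a fair six-sided die, with t an integer."""
    return Fraction(min(max(t, 0), 6), 6)


def pmf_max(t):
    """P(max = t) = F_max(t) - F_max(t - 1), where F_max = F_X * F_Y."""
    return cdf_die(t) ** 2 - cdf_die(t - 1) ** 2


for t in range(1, 7):
    print(t, pmf_max(t))
```

The largest face is the most likely maximum, since 11 of the 36 equally likely rolls contain at least one six.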

# 11.2 Expected values of functions of random variables

We don't need to know distributions to find expectation values,
because we can do

$$\begin{aligned}
E[ g(X,Y) ]
    &= \sum_{x,y} g(x,y) \cdot p(x,y)
\end{aligned}$$

or its continuous equivalent.

Because expectation values are linear,

$$\begin{aligned}
E(X+Y) &= E(X) + E(Y)
\end{aligned}$$

Products don't work like that in general, but if $X$ and $Y$ are
independent then $E(XY) = E(X)\cdot E(Y)$.

## Covariance

The covariance is

$$\begin{aligned}
\text{Cov} (X,Y)
    &= E[ (X-\mu_X) \cdot (Y-\mu_Y) ]
\end{aligned}$$

with positive and negative associations/correlations having the usual
meaning.

An alternative definition is

$$\begin{aligned}
\text{Cov}(X,Y)
    &= E(XY) - E(X)\cdot E(Y)
\end{aligned}$$

This formulation makes it clear (based on the statements above) that,
if $X$ and $Y$ are independent, their covariance will be zero.

The variance of the sum depends on the covariance:

$$\begin{aligned}
V(X+Y)
    &= V(X) + V(Y) + 2\cdot\text{Cov}(X,Y)
\end{aligned}$$
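
The variance formula can be checked on a tiny joint p.m.f. The table below is made up purely for illustration, and `expect` is a hypothetical helper that evaluates $E[g(X,Y)]$ by summing over the joint p.m.f.

```python
from fractions import Fraction

# Hypothetical joint p.m.f. on {0, 1} x {0, 1}, keyed by (x, y).
joint = {
    (0, 0): Fraction(4, 10),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(2, 10),
    (1, 1): Fraction(3, 10),
}


def expect(g):
    """E[g(X, Y)] as a sum of g(x, y) * p(x, y) over the joint p.m.f."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mean_x) ** 2)
var_y = expect(lambda x, y: (y - mean_y) ** 2)

# Covariance via the alternative definition E(XY) - E(X) E(Y).
cov_xy = expect(lambda x, y: x * y) - mean_x * mean_y

# Variance of the sum computed directly from the distribution of X + Y.
var_sum = expect(lambda x, y: (x + y - mean_x - mean_y) ** 2)

print(var_sum, var_x + var_y + 2 * cov_xy)
```

In this table the covariance is positive, so the variance of the sum is larger than the sum of the variances.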

Some properties:

* The covariance is symmetric: $\text{Cov}(X,Y) = \text{Cov}(Y,X)$.
* The covariance of a random variable with itself is its variance.
* The covariance of a random variable with a constant is zero.
* Scaling either random variable scales the covariance:
  $$ \text{Cov}(aX,bY) = ab\cdot\text{Cov}(X,Y).$$
* Covariances are distributive:
  $$\text{Cov}(X,Y+Z) = \text{Cov}(X,Y) + \text{Cov}(X,Z).$$

## Correlation coefficients

The correlation coefficient is

$$\begin{aligned}
\rho_{XY}
    &= \frac{\text{Cov}(X,Y)}{\sigma_X\sigma_Y}
\\\\&= \frac{\text{Cov}(X,Y)}{\sqrt{V(X)\cdot V(Y)}}
\end{aligned}$$

Variables which are linearly related, $Y = aX+b$ with $a \neq 0$, have a
correlation coefficient of $\pm 1$:

$$\begin{aligned}
\rho_{XY}
    &= \frac{\text{Cov}(X, aX+b)}{\sigma_X\sigma_{aX+b}}
\\\\&= \frac{\text{Cov}(X,aX) + \text{Cov}(X,b)}{\sigma_X(|a|\sigma_X)}
\\\\&= \frac{a\cdot V(X) + 0}{|a| \sigma_X^2}
\\\\&= \pm 1, \text{ depending on the sign of } a.
\end{aligned}$$
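
The same thing shows up numerically. In this sketch the data values and the helper `correlation` are my own, and the data are treated as a full population (dividing by $n$).

```python
import math


def correlation(xs, ys):
    """Population correlation coefficient of two equal-length data lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)


xs = [1.0, 2.0, 4.0, 7.0]
rho_up = correlation(xs, [3 * x + 2 for x in xs])       # a > 0
rho_down = correlation(xs, [-0.5 * x + 2 for x in xs])  # a < 0
print(rho_up, rho_down)
```

The two results differ from $+1$ and $-1$ only by floating-point rounding.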

## (exclude) Bivariate normal distribution

Correlated normal variables can have the joint density

$$\begin{aligned}
f(x,y)
    &=  \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}
    e^{
    \frac{-1}{2(1-\rho^2)}
    \left[
        \left( \frac{x-\mu_x}{\sigma_x}\right)^2
        - 2\rho
            \left( \frac{x-\mu_x}{\sigma_x}\right)
            \left( \frac{y-\mu_y}{\sigma_y}\right)
        + \left( \frac{y-\mu_y}{\sigma_y}\right)^2
    \right]
    }
\end{aligned}$$

# 11.3 (exclude) Moment generating functions for sums of independent random variables

I don't care about moment-generating functions.

# 11.4 The sum of more than two random variables
## Sums of different distributions
### Poisson $\to$ Poisson

If $X_1,X_2,\cdots,X_n$ are independent Poisson random variables
with parameters $\lambda_1,\lambda_2,\cdots,\lambda_n$, then their
sum $\sum_i X_i$ is Poisson distributed with parameter $\sum_i\lambda_i$.
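
For two variables this can be checked by convolving the p.m.f.s directly. The parameters `1.2` and `2.3` are arbitrary, and the function names are mine.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with parameter lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def pmf_sum(k, lam1, lam2):
    """P(X1 + X2 = k) by summing over every split k = j + (k - j)."""
    return sum(poisson_pmf(j, lam1) * poisson_pmf(k - j, lam2) for j in range(k + 1))


lam1, lam2 = 1.2, 2.3
for k in range(5):
    print(k, pmf_sum(k, lam1, lam2), poisson_pmf(k, lam1 + lam2))
```

The two columns agree because the convolution sum is a binomial expansion of $(\lambda_1+\lambda_2)^k$.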

### Geometric $\to$ negative binomial

The sum of $n$ i.i.d. geometric random variables with success
probability $p$ is a negative binomial random variable with the same
$p$ and $r=n$.

### Normal $\to$ normal

If the $X_i$ are independent normal variables with means $\mu_i$ and
variances $\sigma_i^2$, then the sum $\sum_i X_i$ is normal with mean
$\sum_i\mu_i$ and variance $\sum_i \sigma_i ^2$.

### Exponential $\to$ gamma

If the $X_i$ are i.i.d. exponential random variables with parameter
$\beta$, their sum $\sum_i X_i$ is a gamma random variable with
$\alpha=n$ and the same $\beta$.
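
A rough simulation check, assuming the rate parameterization of the gamma density used in these notes (mean $\alpha/\beta$, variance $\alpha/\beta^2$). The values `n = 4`, `beta = 2.0`, the trial count, and the seed are arbitrary choices of mine.

```python
import random
import statistics


def simulate_sum(n, beta, trials=50_000, seed=0):
    """Draw `trials` sums of n i.i.d. exponential variables with rate beta."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(beta) for _ in range(n)) for _ in range(trials)]


n, beta = 4, 2.0
sums = simulate_sum(n, beta)
sample_mean = statistics.fmean(sums)
sample_var = statistics.pvariance(sums)

print(sample_mean, n / beta)       # compare with alpha / beta
print(sample_var, n / beta**2)     # compare with alpha / beta^2
```

Matching two moments doesn't prove the distribution is gamma, but it is a cheap consistency check.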


## Mean and variance of multiple sums

In a triple sum, each pairwise covariance enters the variance twice.

$$\begin{aligned}
E(X+Y+Z)
    &= E(X) + E(Y) + E(Z)
\\\\
V(X+Y+Z)
    &= V(X) + V(Y) + V(Z)
\\\\& \quad + 2\times\left(
        \text{Cov}(X,Y) + \text{Cov}(X,Z) + \text{Cov}(Y,Z)
    \right)
\end{aligned}$$

In fact, that's true no matter how many terms are in the sum.

$$\begin{aligned}
E\left(
    \sum_i X_i
\right)
        &= \sum_i E(X_i)
\\\\
V\left(
    \sum_i X_i
\right)
        &= \sum_i V(X_i) + 2\sum_{i<j} \text{Cov}(X_i,X_j)
\end{aligned}$$
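
The general formula is an algebraic identity, so it also holds exactly for the empirical distribution of any data set. Here is a sketch with three made-up data columns, treating the four rows as equally likely outcomes.

```python
from fractions import Fraction
from itertools import combinations


def cov(us, vs):
    """Population covariance of two equal-length columns; cov(u, u) is the variance."""
    n = len(us)
    mu, mv = Fraction(sum(us), n), Fraction(sum(vs), n)
    return sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / n


# Made-up data: each inner list holds the observed values of one variable.
columns = [[1, 2, 4, 7], [2, 1, 5, 3], [0, 3, 3, 1]]

# Row-wise totals, i.e. the observed values of X + Y + Z.
totals = [sum(row) for row in zip(*columns)]

lhs = cov(totals, totals)
rhs = sum(cov(c, c) for c in columns) + 2 * sum(
    cov(a, b) for a, b in combinations(columns, 2)
)
print(lhs, rhs)
```

`combinations(columns, 2)` visits each pair with $i<j$ exactly once, which is why the factor of two appears.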

## Central limit theorem

Sums of *many* independent, identically distributed random variables
with finite variance are approximately normally distributed.
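
A small simulation sketch of this: sum 30 uniform variables, standardize, and compare the fraction within one standard deviation against the normal value of about 0.683. The number of terms, the trial count, and the seed are arbitrary choices of mine.

```python
import math
import random


def standardized_sums(n_terms=30, trials=20_000, seed=0):
    """Standardized sums of n_terms i.i.d. Uniform(0, 1) variables."""
    rng = random.Random(seed)
    mean = n_terms * 0.5              # each uniform has mean 1/2
    sd = math.sqrt(n_terms / 12)      # each uniform has variance 1/12
    return [
        (sum(rng.random() for _ in range(n_terms)) - mean) / sd
        for _ in range(trials)
    ]


z = standardized_sums()
within_one_sd = sum(abs(v) < 1 for v in z) / len(z)
print(within_one_sd)
```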

# 11.5 Double expectation theorem

The expectation value of a *conditional* expectation
is just the expectation value of the variable back again:

$$\begin{aligned}
E[ E(X|Y) ]
    &= E(X)
\end{aligned}$$

The conditional variance is

$$\begin{aligned}
V(X|Y=y)
    &= E(X^2 | Y=y) - \left( E(X|Y=y) \right)^2
\end{aligned}$$

Eventually this cute thing happens:

$$\begin{aligned}
V(X) &= E[ V(X|Y) ] + V[ E(X|Y) ] \\\\
V(Y) &= E[ V(Y|X) ] + V[ E(Y|X) ] \\\\
\end{aligned}$$
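
Both the double expectation theorem and the variance decomposition can be verified exactly on a small joint p.m.f. The table and helper names below are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from fractions import Fraction

# Hypothetical joint p.m.f. on {0, 1} x {0, 1}, keyed by (x, y).
joint = {
    (0, 0): Fraction(4, 10),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(2, 10),
    (1, 1): Fraction(3, 10),
}

ys = sorted({y for _, y in joint})
p_y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in ys}


def cond_moment(y, k):
    """E(X^k | Y = y), from the conditional p.m.f. p(x, y) / p_Y(y)."""
    return sum(x**k * p for (x, yy), p in joint.items() if yy == y) / p_y[y]


cond_mean = {y: cond_moment(y, 1) for y in ys}
cond_var = {y: cond_moment(y, 2) - cond_mean[y] ** 2 for y in ys}

mean_x = sum(x * p for (x, _), p in joint.items())
var_x = sum(x**2 * p for (x, _), p in joint.items()) - mean_x**2

# Outer expectations are taken over the marginal distribution of Y.
mean_of_cond_mean = sum(cond_mean[y] * p_y[y] for y in ys)
mean_of_cond_var = sum(cond_var[y] * p_y[y] for y in ys)
var_of_cond_mean = sum(cond_mean[y] ** 2 * p_y[y] for y in ys) - mean_of_cond_mean**2

print(mean_of_cond_mean, mean_x)
print(mean_of_cond_var + var_of_cond_mean, var_x)
```

For this table $E[V(X|Y)] = 5/24$ and $V[E(X|Y)] = 1/24$, which add up to $V(X) = 1/4$.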

# 11.6 Applying the double expectation theorem
# Problems
