<!--
.. title: division and conquest
.. slug: division-and-conquest
.. date: 2025-07-20 18:32:50 UTC-05:00
.. tags: probability
.. category: math
.. description: In which I perhaps overcomplicate things.
.. type: text
.. has_math: true
-->

Suppose I have some random process described by a binomial
distribution, with "success" probability $p$.  For $n$ trials, the
probability of getting exactly $k$ "successes" is

$$\begin{aligned}
P(k) &= {n\choose k} p^k (1-p)^{n-k}
\end{aligned}$$

Now suppose I do a bunch of different sets of trials, such as
[practice exams](/pages/study/p-exam) of varying lengths.  I want to
model each practice exam as a draw from a binomial distribution with
its own number of trials $n$ but the same success probability $p$.
What's the right way to combine them?

<!-- TEASER_END -->

# Warmup: Poisson combination

Suppose first I have a Poisson-distributed count with mean $n$, so that
$\mu\pm\sigma = n\pm\sqrt n$.
I add a second, independent count with mean $m$ (with $n+m=N$).
The means add and the variances add, so I get

$$\begin{aligned}
\left\langle N\right\rangle \pm \sigma_N
    &= (n+m) \pm \sqrt{\sigma_n^2 + \sigma_m^2}
\\\\&= (n+m) \pm \sqrt{n+m}
\\\\&= N \pm \sqrt N
\end{aligned}$$

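So the sum of two Poisson-distributed variables still has its variance
equal to its mean, just like a Poisson distribution with mean $N$.
(The sum of independent Poisson variables really is Poisson-distributed,
though matching two moments doesn't prove that by itself.)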
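
Matching the mean and the variance is not quite the same as matching the whole distribution, so here is a minimal numerical sketch that compares the full probability mass functions.  The helper names and the means `3.0` and `4.5` are made up for illustration; nothing here comes from real data.

```python
import math

def poisson_pmf(k, mu):
    """Probability of observing k counts from a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)

def summed_pmf(K, n, m):
    """Probability that two independent Poisson counts (means n and m) add up to K."""
    return sum(poisson_pmf(k, n) * poisson_pmf(K - k, m) for k in range(K + 1))

n, m = 3.0, 4.5  # illustrative means only

# Largest disagreement between the convolution and a single Poisson with mean n+m
max_diff = max(
    abs(summed_pmf(K, n, m) - poisson_pmf(K, n + m))
    for K in range(30)
)
print(max_diff)  # should be at the level of floating-point roundoff
```

If the two distributions agree, `max_diff` comes out at roundoff level rather than at the level of the probabilities themselves.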

# The binomial distribution

With the probability mass function (writing $q = 1-p$)

$$\begin{aligned}
P(k)
    &= {n\choose k} p^k q^{n-k}
\end{aligned}$$

we have expectation value $E(k) = np$ and variance $V(k) = npq$.
Suppose we do this twice, with total sample size $N=n+m$
and total number of successes $K=k_n+k_m$.
I guess the probability distribution for $K$ is a sum over all the ways
to split $K$ between the two sets:

$$\begin{aligned}
P_N(K)
    &= \sum_{k_n} P_n(k_n) \ P_m(K-k_n)
\\\\&= \sum_{k_n}
        {n\choose k_n} p^{k_n} q^{n-k_n}
        \ %
        {m\choose K-k_n} p^{K-k_n} q^{m-(K-k_n)}
\\\\&= \sum_{k_n}
        {n \choose k_n} {N-n \choose K-k_n}
        p^{K} q^{N-K}
\end{aligned}$$
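
Before hunting for a closed form, a quick numerical sketch can show whether this sum behaves like a single binomial distribution with $N=n+m$ trials.  The function names and the values `n=10`, `m=25`, `p=0.7` are illustrative assumptions, not numbers from the practice exams.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def combined_pmf(K, n, m, p):
    """Sum over every way to split K successes between sets of n and m trials."""
    lo = max(0, K - m)   # the second set can hold at most m successes
    hi = min(n, K)       # the first set can hold at most n successes
    return sum(
        binom_pmf(k_n, n, p) * binom_pmf(K - k_n, m, p)
        for k_n in range(lo, hi + 1)
    )

n, m, p = 10, 25, 0.7  # illustrative values only

# Largest disagreement with one binomial distribution over all N = n + m trials
max_diff = max(
    abs(combined_pmf(K, n, m, p) - binom_pmf(K, n + m, p))
    for K in range(n + m + 1)
)
print(max_diff)  # roundoff-sized if the two distributions agree
```

A roundoff-sized `max_diff` is only evidence for these particular values, not a proof, which is where the algebra comes in.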

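If the combined trials are themselves binomial, with
$P_N(K) = {N\choose K} p^K q^{N-K}$, then the leftover sum has to be a
single binomial coefficient.  This looks suspiciously like an identity,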

$$\begin{aligned}
{N\choose K}
    &\overset{?}{=} \sum_{k=0}^{\min(n,K)} {n \choose k} {N-n \choose K-k}
\end{aligned}$$

For $n=0$ we have trivial agreement.
For $n=1$, we have a special case of the identity

$$\begin{aligned}
{n+1\choose k+1}
    &= {n\choose k} + {n \choose k+1}.
\end{aligned}$$

In Pascal's triangle, this corresponds to adding adjacent
entries on one row to get the entry between them on the next row.
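
To spell out the $n=1$ case: the sum has only the terms $k=0$ and $k=1$, so for $K\ge1$ the conjectured identity reduces to

$$\begin{aligned}
{N\choose K}
    &= {1\choose 0}{N-1\choose K} + {1\choose 1}{N-1\choose K-1}
\\\\&= {N-1\choose K} + {N-1\choose K-1},
\end{aligned}$$

which is the row-to-row identity above with $n\to N-1$ and $k\to K-1$.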

In fact, the "identity" I've got looks an awful lot like recursion
going $n$ rows up the triangle.  For example, to find ${7\choose 3} = 35$,
we can go up three rows to take

$$\begin{aligned}
{7\choose 3}
    &= 1\times1 + 3\times4 + 3\times 6 + 1\times4
\\\\&= {3\choose3}{4\choose0} + {3\choose2}{4\choose1}
        + {3\choose1}{4\choose2} + {3\choose0}{4\choose3}
\end{aligned}$$

as shown here:

![Pascal's triangle](pascal.svg)
{style="text-align: center"}
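
The conjectured identity is usually called Vandermonde's identity.  As a sanity check rather than a proof, here is a short sketch that tests it by brute force for small $N$; the function name `vandermonde_sum` and the cutoff of 12 rows are my own choices for illustration.

```python
from math import comb

def vandermonde_sum(N, K, n):
    """Right-hand side of the identity: sum over k of C(n, k) * C(N - n, K - k)."""
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so terms with K - k > N - n drop out automatically.
    return sum(comb(n, k) * comb(N - n, K - k) for k in range(min(n, K) + 1))

# The worked example: going up three rows from C(7, 3)
print(vandermonde_sum(7, 3, 3))  # 1*4 + 3*6 + 3*4 + 1*1 = 35

# Brute-force check over the first several rows of the triangle
all_ok = all(
    vandermonde_sum(N, K, n) == comb(N, K)
    for N in range(13)
    for K in range(N + 1)
    for n in range(N + 1)
)
print(all_ok)
```

Granting the identity, the sum over $k_n$ collapses and the combined distribution is $P_N(K) = {N\choose K} p^K q^{N-K}$: the two sets of trials behave like one binomial experiment with $N=n+m$ trials and the same $p$.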

# Next question

So in [my model](/pages/study/p-exam), the value for $p$ which
minimizes the $\chi^2$ for all the individual practice sessions is a
little different from the value for $p$ which I get from just dividing
all the correct questions by all the attempted questions.  Why is
that?  I have a couple of ideas, which I'll get into later on.