the james-stein estimator

I was reminded today, by a video from "Mathemaniac", of the James-Stein estimator.

Suppose I'm trying to estimate $n$ independent parameters simultaneously, by taking one sample from each of $n$ one-dimensional normal distributions with unknown means $\mu_i$ and unit standard deviations $\sigma_i = 1$. The naïve estimator is to use each sample $x_i$ as the estimate $\hat\mu_i$ of its mean. However, whenever $n \ge 3$, the shrinkage estimator (which is biased toward zero)

$$ \left(\begin{array}{c} \hat \mu_1 \\ \vdots \\ \hat \mu_n \end{array}\right) = \left( 1-\frac{n-2}{x_1^2 + \cdots + x_n^2} \right) \left(\begin{array}{c} x_1 \\ \vdots \\ x_n \end{array}\right) $$

actually produces a smaller mean-squared error on the ensemble as a whole.
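Here's a quick Monte Carlo check of that claim. This is a sketch, not anyone's canonical implementation; the choice of $n$, the true means, and the trial count are all mine:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10                   # number of parameters (n >= 3 for the effect)
mu = rng.normal(size=n)  # arbitrary hypothetical true means
trials = 100_000

# One observation x_i ~ N(mu_i, 1) per parameter, per trial.
x = rng.normal(loc=mu, size=(trials, n))

# Naive estimator: just use the observations themselves.
mse_naive = np.mean(np.sum((x - mu) ** 2, axis=1))

# James-Stein estimator: shrink each observation vector toward zero.
shrink = 1 - (n - 2) / np.sum(x ** 2, axis=1, keepdims=True)
mse_js = np.mean(np.sum((shrink * x - mu) ** 2, axis=1))

print(f"naive MSE:       {mse_naive:.3f}")  # should be close to n = 10
print(f"James-Stein MSE: {mse_js:.3f}")     # strictly smaller
```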

What's happening is that the vector of means $\vec\mu = (\mu_1, \ldots, \mu_n)$ corresponds to some point in the $n$-dimensional parameter space. Since all of the standard deviations $\sigma_i$ are equal, the observation vector $\vec x = (x_1, \ldots, x_n)$ is drawn from a spherically symmetric distribution centered on $\vec\mu$. Unless $\vec x$ happens to fall inside the ball of radius $|\vec\mu|/2$ centered on $\vec\mu/2$ (precisely the region where $\vec x \cdot (\vec x - \vec\mu) < 0$, so that moving toward the origin means moving away from $\vec\mu$), the James-Stein correction pulls the estimate $\vec{\hat\mu}$ closer to the true $\vec\mu$ than the raw observation vector $\vec x$.
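One can sanity-check that geometric picture by counting how often a draw actually lands in the wrong-way ball. Again a sketch, with a hypothetical $n$ and $\vec\mu$ of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)

n = 10
mu = np.full(n, 0.5)  # hypothetical true mean vector
x = rng.normal(loc=mu, size=(200_000, n))

# The "wrong-way" region: the ball of radius |mu|/2 centered at mu/2,
# i.e. the points where shrinking toward the origin moves away from mu.
inside = np.sum((x - mu / 2) ** 2, axis=1) < np.sum((mu / 2) ** 2)
print(f"fraction of draws where shrinkage points the wrong way: {inside.mean():.4f}")
```

With these numbers the fraction is tiny: the ball has radius $|\vec\mu|/2 \approx 0.79$, while a typical draw sits at distance $\approx \sqrt{n}$ from $\vec\mu$.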

The effect is biggest if all of the $\mu_i/\sigma_i$ are small, because then the (hyper)sphere of wrong-way bias is small compared with the typical spread of the observations.
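To see this, sweep the common value of $\mu_i$ and watch the MSE ratio climb back toward 1 as the means grow; once more, the parameter choices here are mine:

```python
import numpy as np

rng = np.random.default_rng(2)

n, trials = 10, 200_000
for scale in (0.1, 1.0, 3.0, 10.0):
    mu = np.full(n, scale)  # all mu_i / sigma_i equal to `scale`
    x = rng.normal(loc=mu, size=(trials, n))
    shrink = 1 - (n - 2) / np.sum(x ** 2, axis=1, keepdims=True)
    mse_naive = np.mean(np.sum((x - mu) ** 2, axis=1))
    mse_js = np.mean(np.sum((shrink * x - mu) ** 2, axis=1))
    print(f"|mu_i| = {scale:5.1f}: JS/naive MSE ratio = {mse_js / mse_naive:.3f}")
```

The ratio is far below 1 when the means are near zero and approaches 1 as they grow, matching the geometric story above.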

Weird. Worth remembering.