SidneyZhang/myWiki

Sidney Zhang dd8345a6ea

20260420:first commit

2026-04-20 11:42:41 +08:00

Cramér-Rao Lower Bound (CRLB)

Definition

The Cramér-Rao Lower Bound (CRLB) states that, under standard regularity conditions, the variance of any unbiased estimator \hat{\theta} of a parameter \theta is at least the reciprocal of the Fisher Information I(\theta):

\text{Var}(\hat{\theta}) \geq \frac{1}{I(\theta)}

It represents a fundamental limit in statistical estimation: no unbiased estimator, however clever, can have a variance below this bound.

Key Concepts

1. The Score Function

The score g(\theta; \mathbf{x}) is the derivative of the log-likelihood with respect to the parameter:

g(\theta; \mathbf{x}) = \frac{\partial}{\partial \theta} \log f(\mathbf{x} \mid \theta)

It measures the "force" the data exerts on the parameter estimate.
Crucial property: \mathbb{E}[g(\theta; \mathbf{x})] = 0 (under regularity conditions).
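The zero-mean property can be checked exactly for a discrete model. The sketch below assumes a Binomial(n, \pi) likelihood with illustrative values n = 10 and \pi = 0.3 (my choices, not from the text), and sums the score against the probability mass function.

```python
from math import comb

def binomial_pmf(k, n, pi):
    """P(K = k) for K ~ Binomial(n, pi)."""
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

def score(k, n, pi):
    """Derivative of the binomial log-likelihood with respect to pi."""
    return k / pi - (n - k) / (1 - pi)

n, pi_true = 10, 0.3  # illustrative values, not from the text

# Expectation of the score, taken under the true parameter
# and evaluated at the true parameter.
expected_score = sum(
    binomial_pmf(k, n, pi_true) * score(k, n, pi_true) for k in range(n + 1)
)

# Evaluating the score at a wrong parameter value (0.5) while the data
# still follow pi_true gives a nonzero mean: the data "push" toward pi_true.
expected_score_shifted = sum(
    binomial_pmf(k, n, pi_true) * score(k, n, 0.5) for k in range(n + 1)
)

print(expected_score, expected_score_shifted)
```

The first value is zero up to floating-point error. The second is negative, which matches the "force" intuition: the score at 0.5 pulls the estimate down toward 0.3.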

2. Fisher Information

Fisher Information I(\theta) is the variance of the score function:

I(\theta) = \text{Var}(g(\theta; \mathbf{x})) = \mathbb{E}\left[ \left( \frac{\partial}{\partial \theta} \log f(\mathbf{x} \mid \theta) \right)^2 \right]

Alternative expression (via curvature):

I(\theta) = -\mathbb{E}\left[ \frac{\partial^2}{\partial \theta^2} \log f(\mathbf{x} \mid \theta) \right]

This connects information directly to the curvature of the log-likelihood function. A sharper peak (higher curvature) means higher information and a tighter bound.

Properties:

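For n independent, identically distributed observations, information is additive, so it grows linearly with sample size: I_n(\theta) = n \cdot I_1(\theta).
Noisier data carries less information per observation; for a normal mean, for example, I_1 = 1/\sigma^2.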
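As a numerical sanity check, the two expressions for I(\theta) and the scaling I_n = n \cdot I_1 can be compared exactly for a binomial model. This is a minimal sketch; the model and the value \pi = 0.3 are assumptions chosen for illustration.

```python
from math import comb

def pmf(k, n, pi):
    """Binomial probability mass function."""
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

def score(k, n, pi):
    """First derivative of the log-likelihood with respect to pi."""
    return k / pi - (n - k) / (1 - pi)

def score_derivative(k, n, pi):
    """Second derivative of the log-likelihood with respect to pi."""
    return -k / pi**2 - (n - k) / (1 - pi)**2

def information_from_variance(n, pi):
    """E[score^2], which equals Var(score) because E[score] = 0."""
    return sum(pmf(k, n, pi) * score(k, n, pi)**2 for k in range(n + 1))

def information_from_curvature(n, pi):
    """Negative expected second derivative of the log-likelihood."""
    return -sum(pmf(k, n, pi) * score_derivative(k, n, pi) for k in range(n + 1))

pi = 0.3  # illustrative value
info_1 = information_from_variance(1, pi)
info_20_var = information_from_variance(20, pi)
info_20_curv = information_from_curvature(20, pi)

print(info_1, info_20_var, info_20_curv)
```

Both routes give the same number for n = 20, and that number is 20 times the single-observation information.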

3. Observed vs. Expected Information

Expected Information: Uses the true parameter and expectation over all possible data. Formula-based.
Observed Information: Uses the actual observed data and the estimated parameter \hat{\theta}. Computed as the negative Hessian of the log-likelihood at \hat{\theta}.
In practice (especially in MLE), standard errors are calculated using the observed information.
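A rough sketch of how observed information might be computed numerically, assuming binomial data with hypothetical counts (k = 37 successes in n = 100 trials) and a central finite difference in place of an analytic Hessian. The function names and step size h are illustrative choices, not a reference implementation.

```python
from math import log, sqrt

def log_likelihood(pi, k, n):
    """Binomial log-likelihood, dropping the constant log C(n, k)."""
    return k * log(pi) + (n - k) * log(1 - pi)

def observed_information(k, n, pi_hat, h=1e-4):
    """Negative second derivative at pi_hat via a central finite difference."""
    second_derivative = (
        log_likelihood(pi_hat + h, k, n)
        - 2 * log_likelihood(pi_hat, k, n)
        + log_likelihood(pi_hat - h, k, n)
    ) / h**2
    return -second_derivative

k, n = 37, 100          # hypothetical data
pi_hat = k / n          # maximum likelihood estimate
obs_info = observed_information(k, n, pi_hat)

# Expected information formula, with the estimate plugged in for pi.
expected_info_at_hat = n / (pi_hat * (1 - pi_hat))

standard_error = 1 / sqrt(obs_info)
print(obs_info, expected_info_at_hat, standard_error)
```

For this particular model the observed information at \hat{\pi} coincides with the expected information evaluated at \hat{\pi}; in general the two can differ.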

Classic Examples

Normal Distribution (Mean Estimation)

Parameter: \mu (with \sigma^2 treated as known)
Score: g(\mu) = \frac{n}{\sigma^2}(\bar{x} - \mu)
Fisher Information: I = \frac{n}{\sigma^2}
CRLB: \frac{\sigma^2}{n}
Conclusion: The sample mean \bar{x} is an efficient estimator: it is unbiased and its variance, \sigma^2/n, exactly attains the bound.
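A small simulation can make this concrete. The sketch below compares the sample mean with the sample median (also unbiased for \mu by symmetry) on normal data; the values of \mu, \sigma, n, the replication count, and the seed are arbitrary illustrative choices.

```python
import random
import statistics

rng = random.Random(0)                   # fixed seed for reproducibility
mu, sigma, n, reps = 2.0, 3.0, 25, 5000  # illustrative settings

means, medians = [], []
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

crlb = sigma**2 / n                      # 1 / I(mu) with I(mu) = n / sigma^2
var_mean = statistics.pvariance(means)
var_median = statistics.pvariance(medians)

print(f"CRLB           : {crlb:.4f}")
print(f"Var(mean)      : {var_mean:.4f}")
print(f"Var(median)    : {var_median:.4f}")
```

Up to simulation noise, the variance of the mean sits at the bound, while the median stays clearly above it.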

Binomial Distribution (Proportion Estimation)

Parameter: \pi (success probability, with k successes observed in n trials)
Score: g(\pi) = \frac{k}{\pi} - \frac{n-k}{1-\pi}
Fisher Information: I = \frac{n}{\pi(1-\pi)}
CRLB: \frac{\pi(1-\pi)}{n}
Conclusion: The sample proportion \hat{\pi} = k/n is unbiased and its variance, \pi(1-\pi)/n, attains the bound, so it is the optimal unbiased estimator.
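For reference, the steps behind the binomial quantities above can be written out as follows, using the curvature form of the Fisher Information and \mathbb{E}[k] = n\pi, \text{Var}(k) = n\pi(1-\pi):

```latex
\begin{aligned}
\log f(k \mid \pi) &= \log\binom{n}{k} + k \log \pi + (n-k)\log(1-\pi) \\
g(\pi) &= \frac{k}{\pi} - \frac{n-k}{1-\pi} \\
\frac{\partial g}{\partial \pi} &= -\frac{k}{\pi^2} - \frac{n-k}{(1-\pi)^2} \\
I &= -\mathbb{E}\left[\frac{\partial g}{\partial \pi}\right]
   = \frac{n\pi}{\pi^2} + \frac{n(1-\pi)}{(1-\pi)^2}
   = \frac{n}{\pi} + \frac{n}{1-\pi}
   = \frac{n}{\pi(1-\pi)} \\
\text{Var}(\hat{\pi}) &= \frac{\text{Var}(k)}{n^2}
   = \frac{n\pi(1-\pi)}{n^2}
   = \frac{\pi(1-\pi)}{n}
   = \frac{1}{I}
\end{aligned}
```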

Connection to Maximum Likelihood Estimation (MLE)

Under regularity conditions, the MLE is consistent and asymptotically efficient.
As sample size n \to \infty, the variance of the MLE approaches the CRLB: \text{Var}(\hat{\theta}_{\text{MLE}}) \approx 1/I(\theta).
This is why standard errors reported by MLE software are calculated as 1/\sqrt{I_{\text{observed}}}.
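To illustrate the asymptotic statement, the sketch below uses an exponential model with rate \lambda, which is my choice of example and does not appear in the text above. Its MLE is \hat{\lambda} = 1/\bar{x} and its Fisher Information is I_n(\lambda) = n/\lambda^2. The MLE is biased in small samples, so the ratio of its variance to 1/I_n is only expected to approach 1 as n grows.

```python
import random
import statistics

def mle_variance_ratio(n, lam=2.0, reps=4000, seed=0):
    """Simulated Var(lambda_hat) divided by the CRLB lam^2 / n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.expovariate(lam) for _ in range(n)]
        estimates.append(1 / statistics.fmean(sample))  # MLE of the rate
    crlb = lam**2 / n  # 1 / I_n(lambda), with I_n(lambda) = n / lambda^2
    return statistics.pvariance(estimates) / crlb

ratio_small = mle_variance_ratio(10)    # small sample: noticeably above 1
ratio_large = mle_variance_ratio(200)   # larger sample: close to 1

print(ratio_small, ratio_large)
```

The rate, sample sizes, replication count, and seed are all arbitrary; the point is only the trend of the ratio toward 1.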

Role in Computerized Adaptive Testing (CAT)

In CAT, the CRLB dictates the theoretical limit of measurement precision.

Each question contributes a certain amount of Fisher Information I_i(\theta).
The test continues until the accumulated information I(\theta) = \sum I_i(\theta) is large enough that 1/I(\theta) (the minimum possible variance) is below a predefined threshold.
Item selection strategy: Choosing the item with the maximum I_i(\theta) at the current ability estimate \hat{\theta} is a greedy way of driving the CRLB down as fast as possible.
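The selection and stopping rules above can be sketched as follows. The text does not name an item response model, so this example assumes a two-parameter logistic (2PL) model, where an item with discrimination a and difficulty b has information a^2 P (1 - P). The item bank, the function names, and the threshold are hypothetical. For brevity the ability estimate is held fixed; a real CAT would re-estimate \hat{\theta} after every response.

```python
from math import exp

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = 1 / (1 + exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

def run_cat(bank, theta_hat, variance_threshold):
    """Greedy maximum-information selection with a CRLB-based stopping rule."""
    remaining = set(range(len(bank)))
    administered, total_info = [], 0.0
    # Continue while items remain and the bound 1/I is not yet below threshold.
    while remaining and (total_info == 0 or 1 / total_info >= variance_threshold):
        best = max(remaining, key=lambda i: item_information(theta_hat, *bank[i]))
        remaining.remove(best)
        administered.append(best)
        total_info += item_information(theta_hat, *bank[best])
        # A real CAT would record the response here and update theta_hat.
    return administered, total_info

# Hypothetical item bank: (discrimination a, difficulty b)
bank = [(1.0, -1.0), (1.5, 0.0), (0.8, 0.5), (2.0, 0.2), (1.2, 1.0), (1.8, -0.3)]
administered, total_info = run_cat(bank, theta_hat=0.0, variance_threshold=0.5)

print("items administered:", administered)
print("minimum possible variance:", 1 / total_info)
```

Highly discriminating items with difficulty near \hat{\theta} are picked first, which is why the test can stop before exhausting the bank.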

Multidimensional Extension (Information Matrix)

For a vector of parameters \boldsymbol{\theta}, the Fisher Information becomes a matrix \mathbf{I}(\boldsymbol{\theta}). The CRLB states that the covariance matrix of any unbiased estimator satisfies:

\text{Cov}(\hat{\boldsymbol{\theta}}) \succeq \mathbf{I}(\boldsymbol{\theta})^{-1}

(where \mathbf{A} \succeq \mathbf{B} means that \mathbf{A} - \mathbf{B} is positive semi-definite).
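A two-parameter example may help. The sketch below assumes a normal model with both \mu and \sigma^2 unknown, for which the information matrix is diagonal with entries n/\sigma^2 and n/(2\sigma^4), and compares its inverse with the covariance of the unbiased pair (\bar{x}, s^2). The values of n and \sigma^2 and the helper names are illustrative.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def min_eigenvalue_sym_2x2(m):
    """Smallest eigenvalue of a symmetric 2x2 matrix."""
    (a, b), (_, d) = m
    mean = (a + d) / 2
    radius = (((a - d) / 2) ** 2 + b ** 2) ** 0.5
    return mean - radius

n, sigma2 = 20, 4.0  # illustrative values

# Fisher information matrix for (mu, sigma^2) in a normal model.
fisher = [[n / sigma2, 0.0], [0.0, n / (2 * sigma2**2)]]
crlb = inverse_2x2(fisher)

# Covariance of (sample mean, unbiased sample variance s^2):
# Var(x_bar) = sigma^2 / n, Var(s^2) = 2 sigma^4 / (n - 1), covariance 0.
cov = [[sigma2 / n, 0.0], [0.0, 2 * sigma2**2 / (n - 1)]]

# The bound holds when cov - crlb is positive semi-definite.
gap = [[cov[i][j] - crlb[i][j] for j in range(2)] for i in range(2)]
is_psd = min_eigenvalue_sym_2x2(gap) >= -1e-12  # tolerance for rounding

print(crlb, gap, is_psd)
```

The gap is zero in the \mu direction, where \bar{x} attains the bound, and strictly positive in the \sigma^2 direction, where s^2 does not.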
