---
title: Cramér-Rao Lower Bound (CRLB)
created: 2026-04-17
updated: 2026-04-17
type: concept
tags: [machine-learning, benchmark]
sources: [raw/papers/hbs-cramerrao-bound-notes.md]
---
# Cramér-Rao Lower Bound (CRLB)
## Definition
The Cramér-Rao Lower Bound (CRLB) states that for **any unbiased estimator** $\hat{\theta}$ of a parameter $\theta$, the variance can be no smaller than the reciprocal of the Fisher Information $I(\theta)$:
$$\text{Var}(\hat{\theta}) \geq \frac{1}{I(\theta)}$$
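It represents a fundamental limit in statistical estimation: under the usual regularity conditions, no unbiased estimator, however clever, can have a variance below this bound. Biased estimators are not covered by this form of the bound.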
## Key Concepts
### 1. The Score Function
The score $g(\theta; \mathbf{x})$ is the derivative of the log-likelihood with respect to the parameter:
$$g(\theta; \mathbf{x}) = \frac{\partial}{\partial \theta} \log f(\mathbf{x} \mid \theta)$$
- It measures the "force" the data exerts on the parameter estimate.
- **Crucial property:** $\mathbb{E}[g(\theta; \mathbf{x})] = 0$ (under regularity conditions).
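
The zero-mean property can be sketched in one line, assuming a density $f$ for which differentiation with respect to $\theta$ and integration over $\mathbf{x}$ can be interchanged (this is one of the regularity conditions):
$$\mathbb{E}[g(\theta; \mathbf{x})] = \int \frac{\frac{\partial}{\partial \theta} f(\mathbf{x} \mid \theta)}{f(\mathbf{x} \mid \theta)} \, f(\mathbf{x} \mid \theta) \, d\mathbf{x} = \frac{\partial}{\partial \theta} \int f(\mathbf{x} \mid \theta) \, d\mathbf{x} = \frac{\partial}{\partial \theta} 1 = 0$$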
### 2. Fisher Information
Fisher Information $I(\theta)$ is the variance of the score function. Because the score has mean zero, this variance equals the expected squared score:
$$I(\theta) = \text{Var}(g(\theta; \mathbf{x})) = \mathbb{E}\left[ \left( \frac{\partial}{\partial \theta} \log f(\mathbf{x} \mid \theta) \right)^2 \right]$$
**Alternative expression (via curvature):**
$$I(\theta) = -\mathbb{E}\left[ \frac{\partial^2}{\partial \theta^2} \log f(\mathbf{x} \mid \theta) \right]$$
This connects information directly to the curvature of the log-likelihood function. A sharper peak (higher curvature) means higher information and a tighter bound.
**Properties:**
- For $n$ independent, identically distributed observations, information adds up: $I_n(\theta) = n \cdot I_1(\theta)$.
- Noisier data carry less information per observation; for the normal mean, for example, $I_1 = 1/\sigma^2$.
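
As a small numerical check of the identities above, the following sketch evaluates the mean of the score, the variance of the score, and the negative expected curvature for a single Bernoulli($p$) observation. The Bernoulli model and all function names are illustrative assumptions, not part of the source notes; the expectations are exact sums over the two outcomes, so no simulation is involved.

```python
# Exact check of the score and Fisher information identities for one Bernoulli(p) draw.
# Illustrative sketch only; the model choice and function names are assumptions.

def score(x, p):
    """d/dp of log f(x | p), where log f = x*log(p) + (1 - x)*log(1 - p)."""
    return x / p - (1 - x) / (1 - p)

def second_derivative(x, p):
    """d^2/dp^2 of log f(x | p)."""
    return -x / p**2 - (1 - x) / (1 - p) ** 2

def bernoulli_information_checks(p):
    # Expectations are exact sums over the two outcomes x = 0 and x = 1.
    pmf = {0: 1 - p, 1: p}
    mean_score = sum(w * score(x, p) for x, w in pmf.items())
    var_score = sum(w * (score(x, p) - mean_score) ** 2 for x, w in pmf.items())
    neg_curvature = -sum(w * second_derivative(x, p) for x, w in pmf.items())
    closed_form = 1 / (p * (1 - p))  # known per-observation information
    return mean_score, var_score, neg_curvature, closed_form

mean_score, var_score, neg_curvature, closed_form = bernoulli_information_checks(0.3)
print(mean_score, var_score, neg_curvature, closed_form)
```

The last three values should agree up to floating-point rounding, and the first should be zero up to rounding.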
### 3. Observed vs. Expected Information
- **Expected Information:** The expectation over all possible data, evaluated at the true parameter. It is obtained from a formula rather than from the sample.
- **Observed Information:** Uses the actual observed data and the estimated parameter $\hat{\theta}$. Computed as the negative Hessian (second derivative) of the log-likelihood at $\hat{\theta}$.
- In practice (especially in MLE), standard errors are calculated using the observed information.
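
A minimal sketch of the observed information, assuming a binomial log-likelihood and hypothetical data (`k = 37` successes in `n = 100` trials). The second derivative is approximated with a central finite difference, which is one simple choice; the helper names and the step size `h` are assumptions made for illustration.

```python
import math

def binomial_log_likelihood(p, k, n):
    """Log-likelihood of k successes in n trials, dropping the constant binomial coefficient."""
    return k * math.log(p) + (n - k) * math.log(1 - p)

def observed_information(log_lik, theta_hat, h=1e-4):
    """Negative second derivative of log_lik at theta_hat via a central finite difference."""
    second = (log_lik(theta_hat + h) - 2 * log_lik(theta_hat) + log_lik(theta_hat - h)) / h**2
    return -second

k, n = 37, 100                  # hypothetical data
p_hat = k / n                   # MLE of the success probability
obs_info = observed_information(lambda p: binomial_log_likelihood(p, k, n), p_hat)
expected_info_at_mle = n / (p_hat * (1 - p_hat))  # expected-information formula, evaluated at the MLE
std_error = 1 / math.sqrt(obs_info)
print(obs_info, expected_info_at_mle, std_error)
```

For the binomial model the two quantities coincide at the MLE; in other models they generally differ in finite samples.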
## Classic Examples
### Normal Distribution (Mean Estimation)
- **Parameter:** $\mu$ (with $\sigma^2$ known)
- **Score:** $g(\mu) = \frac{n}{\sigma^2}(\bar{x} - \mu)$
- **Fisher Information:** $I = \frac{n}{\sigma^2}$
- **CRLB:** $\frac{\sigma^2}{n}$
- **Conclusion:** The sample mean $\bar{x}$ is an efficient estimator: it is unbiased and its variance $\sigma^2/n$ exactly equals the bound, so no unbiased estimator of $\mu$ has lower variance.
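
The conclusion above can be illustrated with a small Monte Carlo sketch that compares the sample mean with the sample median, another unbiased estimator of $\mu$ for normal data. The parameter values, repetition count, and seed are arbitrary assumptions chosen for illustration.

```python
import random
import statistics

def simulate_estimator_variances(mu=0.0, sigma=2.0, n=25, reps=20000, seed=0):
    """Simulated variances of the sample mean and sample median, plus the CRLB sigma^2 / n."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    crlb = sigma**2 / n
    return statistics.variance(means), statistics.variance(medians), crlb

var_mean, var_median, crlb = simulate_estimator_variances()
print(var_mean, var_median, crlb)
```

The variance of the mean should land close to the bound, while the variance of the median should sit clearly above it.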
### Binomial Distribution (Proportion Estimation)
- **Parameter:** $\pi$ (success probability, with $k$ successes observed in $n$ trials)
- **Score:** $g(\pi) = \frac{k}{\pi} - \frac{n-k}{1-\pi}$
- **Fisher Information:** $I = \frac{n}{\pi(1-\pi)}$
- **CRLB:** $\frac{\pi(1-\pi)}{n}$
- **Conclusion:** The sample proportion $\hat{\pi} = k/n$ is the optimal unbiased estimator: its variance $\pi(1-\pi)/n$ equals the bound.
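
The step from the score to the Fisher information can be worked out as follows, using only $\text{Var}(k) = n\pi(1-\pi)$ for a binomial count:
$$\begin{aligned}
g(\pi) &= \frac{k}{\pi} - \frac{n-k}{1-\pi} = \frac{k - n\pi}{\pi(1-\pi)} \\
I &= \text{Var}(g(\pi)) = \frac{\text{Var}(k)}{\pi^2(1-\pi)^2} = \frac{n\pi(1-\pi)}{\pi^2(1-\pi)^2} = \frac{n}{\pi(1-\pi)} \\
\text{Var}(\hat{\pi}) &= \frac{\text{Var}(k)}{n^2} = \frac{\pi(1-\pi)}{n} = \frac{1}{I}
\end{aligned}$$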
## Connection to Maximum Likelihood Estimation (MLE)
- Under regularity conditions, the MLE is **consistent** and **asymptotically efficient**.
- As sample size $n \to \infty$, the variance of the MLE approaches the CRLB: $\text{Var}(\hat{\theta}_{\text{MLE}}) \approx 1/I(\theta)$.
- This is why standard errors reported by MLE software are calculated as $1/\sqrt{I_{\text{observed}}}$.
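
A rough simulation sketch of asymptotic efficiency, assuming an exponential model with rate $\lambda$, for which the MLE is $\hat{\lambda} = 1/\bar{x}$ and the per-observation information is $1/\lambda^2$. The model, sample sizes, repetition count, and seed are assumptions chosen for illustration. This MLE is biased in finite samples, so the ratio below compares its variance with the unbiased-estimator bound only as a reference point.

```python
import random
import statistics

def mle_variance_ratio(rate=2.0, n=50, reps=5000, seed=1):
    """Ratio of the simulated variance of the exponential-rate MLE to the CRLB rate**2 / n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        total = sum(rng.expovariate(rate) for _ in range(n))
        estimates.append(n / total)   # MLE of the rate: 1 / sample mean
    crlb = rate**2 / n                # per-observation information is 1 / rate**2
    return statistics.variance(estimates) / crlb

# Ratio of simulated variance to the bound for increasing sample sizes.
ratios = {n: mle_variance_ratio(n=n) for n in (10, 50, 200)}
for n, ratio in ratios.items():
    print(n, ratio)
```

The ratio should move toward 1 as $n$ grows, which is what "the variance of the MLE approaches the CRLB" means in practice.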
## Role in Computerized Adaptive Testing (CAT)
In CAT, the CRLB dictates the theoretical limit of measurement precision.
- Each question contributes a certain amount of Fisher Information $I_i(\theta)$.
- The test continues until the accumulated information $I(\theta) = \sum I_i(\theta)$ is large enough that $1/I(\theta)$ (the minimum possible variance) is below a predefined threshold.
- **Item selection strategy:** Choosing the item with the maximum $I_i(\theta)$ at the current ability estimate $\hat{\theta}$ is equivalent to driving the CRLB down as fast as possible.
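
The stopping rule and maximum-information selection described above can be sketched as follows. The source does not name an item response model, so this example assumes a two-parameter logistic (2PL) model, for which the item information is $a^2 P(1-P)$; the item bank, threshold, and function names are hypothetical. For simplicity the ability estimate is held fixed, whereas a real CAT would re-estimate $\hat{\theta}$ after every response.

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item (discrimination a, difficulty b) at ability theta."""
    p = 1 / (1 + math.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

def select_items(theta_hat, item_bank, variance_threshold):
    """Greedily pick max-information items until 1 / accumulated information is below the threshold."""
    remaining = dict(item_bank)
    chosen, total_info = [], 0.0
    while remaining and (total_info == 0.0 or 1 / total_info > variance_threshold):
        best = max(remaining, key=lambda name: item_information(theta_hat, *remaining[name]))
        total_info += item_information(theta_hat, *remaining.pop(best))
        chosen.append(best)
    return chosen, total_info

# Hypothetical item bank: name -> (discrimination a, difficulty b)
item_bank = {
    "q1": (1.0, -1.0),
    "q2": (1.8, 0.1),
    "q3": (0.7, 0.0),
    "q4": (1.5, 0.5),
    "q5": (2.0, 2.0),
}
chosen, total_info = select_items(theta_hat=0.0, item_bank=item_bank, variance_threshold=0.8)
print(chosen, total_info, 1 / total_info)
```

Highly discriminating items whose difficulty is close to the current ability estimate are picked first, because they contribute the most information at that point.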
## Multidimensional Extension (Information Matrix)
For a vector of parameters $\boldsymbol{\theta}$, the Fisher Information becomes a matrix $\mathbf{I}(\boldsymbol{\theta})$. The CRLB states that the covariance matrix of any unbiased estimator satisfies:
$$\text{Cov}(\hat{\boldsymbol{\theta}}) \succeq \mathbf{I}(\boldsymbol{\theta})^{-1}$$
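(where $\succeq$ means that $\text{Cov}(\hat{\boldsymbol{\theta}}) - \mathbf{I}(\boldsymbol{\theta})^{-1}$ is positive semi-definite). In particular, each diagonal entry bounds the variance of a single parameter estimate: $\text{Var}(\hat{\theta}_j) \geq [\mathbf{I}(\boldsymbol{\theta})^{-1}]_{jj}$.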
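
As a worked example (an illustration added here, not taken from the source notes), consider $n$ i.i.d. normal observations with both $\mu$ and $\sigma^2$ unknown. The information matrix is diagonal, so inverting it gives one bound per parameter:
$$\mathbf{I}(\mu, \sigma^2) = \begin{pmatrix} \frac{n}{\sigma^2} & 0 \\ 0 & \frac{n}{2\sigma^4} \end{pmatrix}, \qquad \mathbf{I}(\mu, \sigma^2)^{-1} = \begin{pmatrix} \frac{\sigma^2}{n} & 0 \\ 0 & \frac{2\sigma^4}{n} \end{pmatrix}$$
The sample mean attains the first bound. The unbiased sample variance has variance $2\sigma^4/(n-1)$, slightly above $2\sigma^4/n$, which shows that the bound is not always attained.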
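## Related Concepts
- [[computerized-adaptive-testing]]: The core goal of CAT is to minimize the variance of the ability estimate. The CRLB provides the theoretical lower bound, and item selection strategies essentially maximize Fisher information in order to approach that bound quickly.
- [[eml-operator]]: Gradient-based optimization of EML trees relies on curvature estimates of the parameter space, which is mathematically akin to the role of Fisher information in the CRLB as the curvature of the log-likelihood.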