---
title: Cramér-Rao Lower Bound (CRLB)
created: 2026-04-17
updated: 2026-04-17
type: concept
tags: [machine-learning, benchmark]
sources: [raw/papers/hbs-cramerrao-bound-notes.md]
---

# Cramér-Rao Lower Bound (CRLB)

## Definition

The Cramér-Rao Lower Bound (CRLB) states that, under standard regularity conditions, the variance of **any unbiased estimator** $\hat{\theta}$ of a parameter $\theta$ is at least the reciprocal of the Fisher Information $I(\theta)$:

$$\text{Var}(\hat{\theta}) \geq \frac{1}{I(\theta)}$$

It is a fundamental limit in statistical estimation: no unbiased estimator, however clever, can have variance below this bound. Biased estimators are not covered by this form of the bound and can have smaller variance.

## Key Concepts

### 1. The Score Function

The score $g(\theta; \mathbf{x})$ is the derivative of the log-likelihood with respect to the parameter:

$$g(\theta; \mathbf{x}) = \frac{\partial}{\partial \theta} \log f(\mathbf{x} \mid \theta)$$

- It measures the "force" the data exert on the parameter estimate: a large positive or negative score at $\theta$ means the log-likelihood is still rising or falling steeply there.
- **Crucial property:** $\mathbb{E}[g(\theta; \mathbf{x})] = 0$ when the expectation is taken at the true $\theta$ (under regularity conditions).

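
The zero-mean property can be checked exactly for a small discrete model. The sketch below assumes a Binomial($n$, $\pi$) count (the same model as in the Classic Examples section); the helper names `binom_pmf`, `binom_score`, and `expected_score` are illustrative, not taken from the source notes.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_score(k: int, n: int, p: float) -> float:
    """Derivative of the Binomial log-likelihood with respect to p."""
    return k / p - (n - k) / (1 - p)


def expected_score(n: int, p_true: float, p_eval: float) -> float:
    """E[score evaluated at p_eval] when the data follow Binomial(n, p_true)."""
    return sum(
        binom_pmf(k, n, p_true) * binom_score(k, n, p_eval) for k in range(n + 1)
    )


# Evaluated at the true parameter, the score averages to zero.
mean_at_truth = expected_score(20, 0.3, 0.3)
# Evaluated away from the truth, the average score points back toward it.
mean_off_truth = expected_score(20, 0.3, 0.5)

print(f"E[score] at the true p:  {mean_at_truth:.2e}")
print(f"E[score] at p_eval=0.5: {mean_off_truth:.3f}")
```

The second value is negative because the evaluation point 0.5 lies above the true value 0.3, so on average the data push the estimate downward.
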
### 2. Fisher Information

Fisher Information $I(\theta)$ is the variance of the score function. Because the score has mean zero, this equals its expected square:

$$I(\theta) = \text{Var}(g(\theta; \mathbf{x})) = \mathbb{E}\left[ \left( \frac{\partial}{\partial \theta} \log f(\mathbf{x} \mid \theta) \right)^2 \right]$$

**Alternative expression (via curvature):**

$$I(\theta) = -\mathbb{E}\left[ \frac{\partial^2}{\partial \theta^2} \log f(\mathbf{x} \mid \theta) \right]$$

This connects information directly to the curvature of the log-likelihood function. A sharper peak (higher curvature) means higher information and a smaller lower bound on the variance.

**Properties:**

- For $n$ independent, identically distributed observations, $I(\theta)$ grows linearly with the sample size: $I_n(\theta) = n \cdot I_1(\theta)$.
- In many common models, noisier data carry less information per observation; for the Normal mean, $I_1 = 1/\sigma^2$.

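
As a sanity check, the variance form, the curvature form, and the $I_n = n \cdot I_1$ scaling can all be evaluated exactly for a Binomial model. This is a minimal sketch under that assumed model; the function names are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def score(k: int, n: int, p: float) -> float:
    """First derivative of the Binomial log-likelihood in p."""
    return k / p - (n - k) / (1 - p)


def second_derivative(k: int, n: int, p: float) -> float:
    """Second derivative of the Binomial log-likelihood in p."""
    return -k / p**2 - (n - k) / (1 - p) ** 2


def fisher_info_variance(n: int, p: float) -> float:
    """E[score^2], which equals Var(score) because E[score] = 0."""
    return sum(binom_pmf(k, n, p) * score(k, n, p) ** 2 for k in range(n + 1))


def fisher_info_curvature(n: int, p: float) -> float:
    """Negative expected second derivative of the log-likelihood."""
    return -sum(
        binom_pmf(k, n, p) * second_derivative(k, n, p) for k in range(n + 1)
    )


n, p = 20, 0.3
info_var = fisher_info_variance(n, p)
info_curv = fisher_info_curvature(n, p)
info_closed = n / (p * (1 - p))          # closed form n / (p (1 - p))
info_single = fisher_info_variance(1, p)  # information in one trial

print(info_var, info_curv, info_closed, n * info_single)
```

All four printed numbers should agree up to floating-point rounding.
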
### 3. Observed vs. Expected Information

- **Expected Information:** Evaluated at the true parameter, averaging over all possible data sets. Usually available as a closed-form formula.
- **Observed Information:** Uses the actual observed data and the estimated parameter $\hat{\theta}$. Computed as the negative second derivative (negative Hessian) of the log-likelihood at $\hat{\theta}$.
- In practice (especially in MLE), standard errors are usually calculated from the observed information.

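
A rough illustration of the observed information: the sketch below approximates the second derivative of a Binomial log-likelihood at the MLE with a central finite difference. The data values (`k = 37`, `n = 100`), the step size `h`, and the function names are assumptions chosen for the example; real software typically uses analytic or automatic derivatives.

```python
from math import log, sqrt


def binom_loglik(p: float, k: int, n: int) -> float:
    """Binomial log-likelihood in p, dropping the constant log C(n, k)."""
    return k * log(p) + (n - k) * log(1 - p)


def observed_information(loglik, theta_hat: float, h: float = 1e-4) -> float:
    """Negative second derivative of loglik at theta_hat (central difference)."""
    second = (
        loglik(theta_hat + h) - 2.0 * loglik(theta_hat) + loglik(theta_hat - h)
    ) / h**2
    return -second


k, n = 37, 100          # hypothetical data: 37 successes in 100 trials
p_hat = k / n           # MLE of the success probability
obs_info = observed_information(lambda p: binom_loglik(p, k, n), p_hat)
expected_info_at_mle = n / (p_hat * (1 - p_hat))
std_error = 1.0 / sqrt(obs_info)

print(f"observed information:        {obs_info:.3f}")
print(f"expected information at MLE: {expected_info_at_mle:.3f}")
print(f"standard error of p_hat:     {std_error:.4f}")
```

For this particular model the observed information at $\hat{\pi}$ coincides with the expected information evaluated at $\hat{\pi}$, so the two printed values differ only by the finite-difference error. In other models the two can differ.
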
## Classic Examples

### Normal Distribution (Mean Estimation)

- **Parameter:** $\mu$ (with $\sigma^2$ known)
- **Score:** $g(\mu) = \frac{n}{\sigma^2}(\bar{x} - \mu)$
- **Fisher Information:** $I(\mu) = \frac{n}{\sigma^2}$
- **CRLB:** $\frac{\sigma^2}{n}$
- **Conclusion:** The sample mean $\bar{x}$ is unbiased and has variance exactly $\sigma^2/n$, so it attains the bound and is the minimum-variance ("best") unbiased estimator.

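
The claim that the sample mean attains the bound can be checked by simulation. This is a small Monte Carlo sketch; the parameter values, repetition count, and seed are arbitrary choices for illustration.

```python
import random
from statistics import fmean, pvariance


def simulate_sample_means(
    mu: float, sigma: float, n: int, reps: int, seed: int = 0
) -> list[float]:
    """Draw `reps` samples of size n from Normal(mu, sigma) and return their means."""
    rng = random.Random(seed)
    return [fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]


mu, sigma, n = 5.0, 2.0, 25
means = simulate_sample_means(mu, sigma, n, reps=20_000)

empirical_var = pvariance(means)  # spread of the estimator across repetitions
crlb = sigma**2 / n               # 1 / I(mu) = sigma^2 / n

print(f"empirical Var(x_bar): {empirical_var:.4f}")
print(f"CRLB sigma^2 / n:     {crlb:.4f}")
```

The two numbers should agree to within simulation noise, which is on the order of one percent for this many repetitions.
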
### Binomial Distribution (Proportion Estimation)

- **Parameter:** $\pi$ (success probability; $k$ successes in $n$ trials)
- **Score:** $g(\pi) = \frac{k}{\pi} - \frac{n-k}{1-\pi}$
- **Fisher Information:** $I(\pi) = \frac{n}{\pi(1-\pi)}$
- **CRLB:** $\frac{\pi(1-\pi)}{n}$
- **Conclusion:** The sample proportion $\hat{\pi} = k/n$ is unbiased with variance $\pi(1-\pi)/n$, so it attains the bound and is the optimal unbiased estimator.

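
The step from the score to the Fisher Information is short enough to write out. Putting the score over a common denominator gives

$$g(\pi) = \frac{k}{\pi} - \frac{n-k}{1-\pi} = \frac{k(1-\pi) - (n-k)\pi}{\pi(1-\pi)} = \frac{k - n\pi}{\pi(1-\pi)}$$

Since $\mathbb{E}[k] = n\pi$, the score has mean zero, and with $\text{Var}(k) = n\pi(1-\pi)$:

$$I(\pi) = \text{Var}(g(\pi)) = \frac{\text{Var}(k)}{\pi^2(1-\pi)^2} = \frac{n\pi(1-\pi)}{\pi^2(1-\pi)^2} = \frac{n}{\pi(1-\pi)}$$

The sample proportion then meets the bound exactly:

$$\text{Var}(\hat{\pi}) = \frac{\text{Var}(k)}{n^2} = \frac{\pi(1-\pi)}{n} = \frac{1}{I(\pi)}$$
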
## Connection to Maximum Likelihood Estimation (MLE)

- Under regularity conditions, the MLE is **consistent** and **asymptotically efficient**.
- As sample size $n \to \infty$, the variance of the MLE approaches the CRLB: $\text{Var}(\hat{\theta}_{\text{MLE}}) \approx 1/I(\theta)$.
- This is why the standard errors reported by MLE software are typically calculated as $1/\sqrt{I_{\text{observed}}}$ (for a single parameter).

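
To make the asymptotic statement concrete, the sketch below uses a model that is not in the source notes: $n$ i.i.d. Exponential observations with rate $\lambda$. There the MLE is $\hat{\lambda} = n / \sum_i x_i$, its exact variance is $n^2\lambda^2 / ((n-1)^2(n-2))$ for $n > 2$, and $I_n(\lambda) = n/\lambda^2$. Note that this MLE is biased in finite samples, so the example illustrates convergence of its variance to $1/I_n(\lambda)$ rather than an application of the unbiased bound.

```python
def mle_variance_exponential(rate: float, n: int) -> float:
    """Exact variance of rate_hat = n / sum(x) for n i.i.d. Exponential(rate) draws."""
    if n <= 2:
        raise ValueError("the variance of the MLE is finite only for n > 2")
    return (n**2 * rate**2) / ((n - 1) ** 2 * (n - 2))


def crlb_exponential(rate: float, n: int) -> float:
    """1 / I_n(rate), using I_n(rate) = n / rate**2."""
    return rate**2 / n


rate = 2.0  # hypothetical true rate
ratios = {
    n: mle_variance_exponential(rate, n) / crlb_exponential(rate, n)
    for n in (10, 100, 1000)
}

for n, ratio in ratios.items():
    print(f"n = {n:>4}: Var(MLE) / (1 / I_n) = {ratio:.4f}")
```

The ratio shrinks toward 1 as $n$ grows, which is what asymptotic efficiency predicts.
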
## Role in Computerized Adaptive Testing (CAT)

In CAT, the CRLB sets the theoretical limit of measurement precision for the ability estimate $\hat{\theta}$.

- Each question (item) contributes a certain amount of Fisher Information $I_i(\theta)$.
- The test continues until the accumulated information $I(\theta) = \sum_i I_i(\theta)$ is large enough that $1/I(\theta)$ (the minimum possible variance) falls below a predefined threshold.
- **Item Selection:** Choosing the item with the maximum $I_i(\theta)$ at the current ability estimate $\hat{\theta}$ lowers the CRLB by the largest amount available at each step.

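
The selection loop described above can be sketched as follows. The notes do not name an item response model, so this example assumes the two-parameter logistic (2PL) model, where an item with discrimination $a$ and difficulty $b$ has information $a^2 P (1 - P)$ with $P = 1/(1 + e^{-a(\theta - b)})$. The item bank, the variance threshold, and the function names are all hypothetical. For simplicity the ability estimate is held fixed, whereas a real CAT would re-estimate $\hat{\theta}$ after every response.

```python
from math import exp


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item (assumed model) at ability theta."""
    p = 1.0 / (1.0 + exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def run_cat(theta_hat: float, item_bank: dict, target_variance: float):
    """Greedy max-information selection at a fixed ability estimate.

    Stops once 1 / (accumulated information) is at or below target_variance,
    or when the bank is exhausted. Returns (administered ids, total information).
    """
    remaining = dict(item_bank)
    administered = []
    total_info = 0.0
    while remaining and (total_info == 0.0 or 1.0 / total_info > target_variance):
        best = max(
            remaining, key=lambda i: item_information(theta_hat, *remaining[i])
        )
        total_info += item_information(theta_hat, *remaining.pop(best))
        administered.append(best)
    return administered, total_info


# Hypothetical bank: item id -> (discrimination a, difficulty b).
item_bank = {
    "easy": (1.0, -2.0),
    "medium": (1.5, 0.0),
    "hard": (1.2, 2.0),
    "sharp": (2.0, 0.1),
}

administered, total_info = run_cat(0.0, item_bank, target_variance=0.7)
print("administered:", administered)
print(f"variance bound 1/I: {1.0 / total_info:.3f}")
```

Items whose difficulty is close to the current estimate and whose discrimination is high are picked first, because they contribute the most information at $\hat{\theta}$.
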
## Multidimensional Extension (Information Matrix)

For a vector of parameters $\boldsymbol{\theta}$, the Fisher Information becomes a matrix $\mathbf{I}(\boldsymbol{\theta})$. The CRLB states that the covariance matrix of any unbiased estimator satisfies:

$$\text{Cov}(\hat{\boldsymbol{\theta}}) \succeq \mathbf{I}(\boldsymbol{\theta})^{-1}$$

(where $\succeq$ means that the difference $\text{Cov}(\hat{\boldsymbol{\theta}}) - \mathbf{I}(\boldsymbol{\theta})^{-1}$ is positive semi-definite).

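
A small two-parameter example may help. The sketch below assumes $n$ i.i.d. Normal observations with both $\mu$ and $\sigma^2$ unknown, for which the information matrix is diagonal with entries $n/\sigma^2$ and $n/(2\sigma^4)$. The numeric values and function names are illustrative only.

```python
def normal_information_matrix(sigma2: float, n: int) -> list[list[float]]:
    """Fisher information for (mu, sigma^2) from n i.i.d. Normal observations."""
    return [[n / sigma2, 0.0], [0.0, n / (2.0 * sigma2**2)]]


def invert_2x2(m: list[list[float]]) -> list[list[float]]:
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


sigma2, n = 4.0, 50
crlb_matrix = invert_2x2(normal_information_matrix(sigma2, n))

bound_mean = crlb_matrix[0][0]       # sigma^2 / n
bound_variance = crlb_matrix[1][1]   # 2 sigma^4 / n

# Variance of the unbiased sample variance under Normal data: 2 sigma^4 / (n - 1).
var_of_sample_variance = 2.0 * sigma2**2 / (n - 1)

print(f"bound for mu:       {bound_mean:.4f}")
print(f"bound for sigma^2:  {bound_variance:.4f}")
print(f"Var(S^2), unbiased: {var_of_sample_variance:.4f}")
```

The diagonal of the inverse matrix gives the per-parameter bounds. Here the unbiased sample variance sits slightly above its bound, so unlike the sample mean it does not attain the CRLB in finite samples.
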
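## Related Concepts

- [[computerized-adaptive-testing]]: The core goal of CAT is to minimize the variance of the ability estimate. The CRLB provides the theoretical lower bound, and item selection strategies essentially maximize Fisher information to approach that bound quickly.
- [[eml-universal-operator]]: Gradient-based optimization of EML trees relies on estimating the curvature of the parameter space, which shares its mathematical essence with Fisher information as the curvature of the log-likelihood in the CRLB.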