myWiki/concepts/tpp-training-methods.md

---
title: "TPP Training Methods"
created: 2026-06-16
updated: 2026-06-16
type: concept
tags: [temporal-point-process, training, estimation, optimization]
sources: [raw/papers/advances-temporal-point-processes-2026.md]
---

# Training Methods for TPP

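Parameter estimation for TPP models faces a distinctive challenge: the continuous-time likelihood usually involves an integral of the intensity function. The training objectives below trade computational cost against statistical efficiency in fundamentally different ways.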

## Four Training Objectives

### 1. Maximum Likelihood Estimation (MLE)

```
theta_hat = arg max [ sum_n log lambda*(t_n) - ∫_0^T lambda*(tau) dtau ]
```

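- **Divergence type**: KL divergence
- **Requires likelihood**: Yes
- **Requires numerical integration**: Yes (the integral of the intensity)
- **Statistical efficiency**: Asymptotically optimal (minimum variance)
- **Representative work**: Ozaki (1979), Paninski (2004)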
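
To make the two terms of the MLE objective concrete, here is a minimal sketch for a Hawkes process with an exponential kernel. This model is chosen purely for illustration and is an assumption, not something the source prescribes. The names `hawkes_intensity` and `hawkes_loglik` and the parameters `mu`, `alpha`, `beta` are hypothetical. This particular kernel happens to admit a closed-form integral, which lets the sketch compare it against the trapezoidal approximation that a general intensity would need.

```python
import numpy as np

def hawkes_intensity(t, events, mu, alpha, beta):
    """lambda*(t) = mu + alpha * sum over events t_i < t of exp(-beta * (t - t_i))."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    events = np.asarray(events, dtype=float)
    lag = t[:, None] - events[None, :]      # shape (len(t), len(events))
    past = lag > 0                          # only strictly earlier events contribute
    decay = np.exp(-beta * np.clip(lag, 0.0, None)) * past
    return mu + alpha * decay.sum(axis=1)

def hawkes_loglik(events, T, mu, alpha, beta, n_grid=None):
    """Log-likelihood on [0, T]: sum of log-intensities minus the intensity integral."""
    events = np.asarray(events, dtype=float)
    log_term = np.log(hawkes_intensity(events, events, mu, alpha, beta)).sum()
    if n_grid is None:
        # Closed-form integral, available for this specific kernel only.
        integral = mu * T + (alpha / beta) * (1.0 - np.exp(-beta * (T - events))).sum()
    else:
        # Generic fallback: trapezoidal rule on a uniform grid.
        grid = np.linspace(0.0, T, n_grid)
        vals = hawkes_intensity(grid, events, mu, alpha, beta)
        integral = np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(grid))
    return log_term - integral

events = np.array([1.0, 2.5, 3.0, 7.2])     # illustrative event times
exact = hawkes_loglik(events, T=10.0, mu=0.5, alpha=0.8, beta=1.5)
approx = hawkes_loglik(events, T=10.0, mu=0.5, alpha=0.8, beta=1.5, n_grid=20001)
print(exact, approx)
```

The grid-based path is the one that becomes expensive in practice: each likelihood evaluation needs the intensity at every grid point, and the grid has to be fine enough to resolve the jumps at event times.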

### 2. Wasserstein Distance Training

```
theta_hat = arg min W(f_data, f_theta)
```

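- **Divergence type**: Wasserstein distance
- **Requires likelihood**: No
- **Requires numerical integration**: No
- **Statistical efficiency**: Consistent, but with higher variance
- **Representative work**: Xiao et al. (2017a, 2018)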
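
As a rough illustration of why this objective needs samples but no likelihood, the sketch below computes the 1-Wasserstein distance between two equal-size sets of inter-event times. Comparing inter-event times as one-dimensional samples is a simplifying assumption made here, not the sequence-level distance used in the cited papers. The function name `wasserstein_1d` and the exponential "model" are hypothetical.

```python
import numpy as np

def wasserstein_1d(x, y):
    """W1 between two equal-size empirical distributions on the real line.

    In one dimension the optimal coupling matches sorted samples,
    so the distance is the mean absolute difference of order statistics.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return np.mean(np.abs(x - y))

rng = np.random.default_rng(0)
data_gaps = rng.exponential(scale=1.0, size=5000)   # stand-in for observed inter-event times

# Evaluate the objective for a few candidate model parameters (the exponential scale).
dists = {}
for scale in (0.5, 1.0, 2.0):
    model_gaps = rng.exponential(scale=scale, size=5000)  # samples from the candidate model
    dists[scale] = wasserstein_1d(data_gaps, model_gaps)
print(dists)
```

Only the ability to sample from the model is used. For unequal sample sizes, `scipy.stats.wasserstein_distance` is one existing alternative.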

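### 3. Noise-Contrastive Estimation (NCE)

NCE recasts parameter learning as a binary classification problem: distinguishing real sequences from noise sequences.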

```
Loss = -E_{T~real}[log sigma(s_theta(T))] - E_{T~noise}[log(1 - sigma(s_theta(T)))]
```

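- **Divergence type**: Classification-based
- **Requires likelihood**: No
- **Requires numerical integration**: No
- **Statistical efficiency**: Consistent, but with higher variance
- **Representative work**: Guo et al. (2018), Mei et al. (2020)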
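
The classification view can be sketched as a binary cross-entropy over scores, written here as a quantity to minimize. The function `nce_loss` is a hypothetical helper, and the scores `s_theta(T)` are assumed to be given as plain arrays. How the score is built from the model and the noise distribution is left out of this sketch.

```python
import numpy as np

def log_sigmoid(z):
    """Numerically stable log(sigma(z)) = -log(1 + exp(-z))."""
    return -np.logaddexp(0.0, -np.asarray(z, dtype=float))

def nce_loss(scores_real, scores_noise):
    """Binary cross-entropy: real sequences are labeled 1, noise sequences 0.

    Uses the identity log(1 - sigma(z)) = log(sigma(-z)).
    """
    real_term = log_sigmoid(scores_real).mean()
    noise_term = log_sigmoid(-np.asarray(scores_noise, dtype=float)).mean()
    return -(real_term + noise_term)

# An uninformative scorer (all zeros) versus one that separates the two classes.
chance = nce_loss(np.zeros(4), np.zeros(4))
separated = nce_loss(np.full(4, 5.0), np.full(4, -5.0))
print(chance, separated)
```

No integral of the intensity appears anywhere: the cost is dominated by scoring the real and noise sequences.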

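### 4. Score Matching / Fisher Divergence

Score matching measures the discrepancy between distributions through their score functions, i.e. the gradients of the log-densities: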

```
theta_hat = arg min E_f[||∇log f(T) - ∇log f_theta(T)||^2]
```

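- **Divergence type**: Fisher divergence
- **Requires likelihood**: No (only the score function)
- **Requires numerical integration**: No
- **Statistical efficiency**: Consistent, but with higher variance
- **Representative work**: Sahani et al. (2016), Li et al. (2023), Cao et al. (2024)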
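
The following toy sketch shows the mechanics on real-valued data rather than event sequences. That setting is an assumption made for simplicity, and adapting the idea to TPPs is outside the scope of this example. It uses the standard integration-by-parts form of the objective, which replaces the unknown data score with the derivative of the model score, so neither the data density nor the model's normalizing constant is needed. The Gaussian model and the name `score_matching_objective` are hypothetical.

```python
import numpy as np

def score_matching_objective(x, mu, var):
    """Sample estimate of E[0.5 * psi(x)^2 + psi'(x)] for a Gaussian model.

    psi(x) = d/dx log f_theta(x) = -(x - mu) / var is the model score.
    Up to a constant that does not depend on (mu, var), this equals
    half the Fisher divergence between the data and the model.
    """
    psi = -(x - mu) / var
    dpsi = -1.0 / var
    return np.mean(0.5 * psi**2 + dpsi)

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=10_000)   # synthetic data

# For this model the minimizer is available in closed form:
# the sample mean and the (biased) sample variance.
mu_hat, var_hat = x.mean(), x.var()
best = score_matching_objective(x, mu_hat, var_hat)
print(mu_hat, var_hat, best)
```

Perturbing either parameter away from `mu_hat` or `var_hat` increases the objective, which is a quick sanity check that the estimator is doing what is claimed.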

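## Key Trade-offs

- MLE is statistically optimal, but the integral is expensive to evaluate
- The alternative objectives avoid the integral at the cost of higher asymptotic variance
- The choice depends on how the application weighs scale against accuracy
- [[intensity-free-modeling|Intensity-free parameterization]] is another way to bypass the integral, and it retains the efficiency advantage of MLE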

## References

- [[temporal-point-process|Temporal point process]]
- [[conditional-intensity-function|Conditional intensity function]]
- [[neural-temporal-point-process|Neural TPP]]
- [[intensity-free-modeling|Intensity-free modeling]]
- [[advances-temporal-point-processes-2026|TPP survey]]