---
title: "The Personalization Trap (Fang et al., Amazon, 2025)"
created: 2026-06-24
updated: 2026-06-24
type: paper
tags: ["personalization", "memory", "emotional-intelligence", "bias", "social-capital", "dpo"]
sources:
  - "https://arxiv.org/abs/2510.09905"
code: "https://github.com/personalization-trap"
---

# The Personalization Trap

> Fang et al., Amazon | arXiv:2510.09905v2 | cs.AI / cs.CL | Oct 2025 (updated Jun 2026)

## Problem

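Personalized AI systems incorporate long-term [[user-memory-bias|user memory]], but how does that memory affect emotional reasoning? The same scenario paired with different user personas yields systematically divergent emotional interpretations.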

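Theoretical framework: Bourdieu's [[social-capital-framework|social capital theory]]. A person's social position along economic, cultural, and social dimensions shapes how others interpret their behavior and emotions. When an AI system takes user background information into account, it may reproduce these social biases.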

## Method

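### User personas
- **Explicit personas**: 30 base personas from PersonaHub × 2 versions (advantaged/disadvantaged), varied along four social-capital dimensions (demographics, family background, social relationships, personal assets)
- **[[intersectional-persona-evaluation|Intersectional personas]]**: 81 personas derived from the PRISM dataset, crossing gender × age × religion × race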
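
A minimal sketch of how the two persona pools could be enumerated. The explicit pool is simply 30 base personas × 2 variants = 60 profiles. For the intersectional pool the note gives only the four crossed attributes and the total (81); assuming three levels per attribute reproduces that count (3^4 = 81). The level names are hypothetical placeholders, not the paper's categories, and `explicit_personas` / `intersectional_personas` are illustrative helpers.

```python
from itertools import product

# Hypothetical placeholder levels: three per attribute is an assumption
# chosen because 3 ** 4 == 81 matches the reported persona count.
INTERSECTIONAL_LEVELS = {
    "gender": ["gender_a", "gender_b", "gender_c"],
    "age": ["age_a", "age_b", "age_c"],
    "religion": ["religion_a", "religion_b", "religion_c"],
    "race": ["race_a", "race_b", "race_c"],
}


def explicit_personas(base_ids):
    """Pair every base persona with an advantaged and a disadvantaged variant."""
    return [(pid, variant) for pid in base_ids for variant in ("advantaged", "disadvantaged")]


def intersectional_personas(levels):
    """Full Cartesian product of attribute levels, one dict per persona."""
    keys = list(levels)
    return [dict(zip(keys, combo)) for combo in product(*(levels[k] for k in keys))]


explicit = explicit_personas(range(30))
intersectional = intersectional_personas(INTERSECTIONAL_LEVELS)
print(len(explicit), len(intersectional))  # 60 81
```

A full Cartesian product is what makes the design "intersectional": every combination of attributes appears, so effects of, say, religion can be estimated while the other attributes vary.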

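### Evaluation instruments
- **[[situational-test-emotional-understanding|STEU]]**: 42 emotional-understanding scenarios with reference answers
- **Adapted STEM**: 44 first-person scenarios asking for emotion-management advice
- Human annotation: 93% persona realism (vs. PersonaHub); 9 annotators removed persona-sensitive items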
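
The core design (same item, different remembered user) can be sketched as a loop that optionally prepends a persona "memory" to each multiple-choice item and scores exact-match accuracy against the reference answer. The prompt wording, the `[User memory]` block, the item fields, and the `ask_model` callable are all hypothetical; the paper's actual template and memory format may differ. The toy items and stub model exist only to make the sketch runnable.

```python
from typing import Callable, Optional


def build_prompt(item: dict, memory: Optional[str]) -> str:
    """Prepend an optional user-memory block to a multiple-choice item (hypothetical format)."""
    parts = []
    if memory is not None:
        parts.append(f"[User memory]\n{memory}\n")
    parts.append(item["scenario"])
    # Label options A, B, C, ... (at most five here).
    for label, option in zip("ABCDE", item["options"]):
        parts.append(f"{label}. {option}")
    parts.append("Answer with a single letter.")
    return "\n".join(parts)


def accuracy(items: list, memory: Optional[str], ask_model: Callable[[str], str]) -> float:
    """Percentage of items whose returned letter matches the reference answer."""
    correct = sum(
        ask_model(build_prompt(item, memory)).strip().upper() == item["answer"]
        for item in items
    )
    return 100.0 * correct / len(items)


# Toy usage with a stub "model" that always answers B.
toy_items = [
    {"scenario": "A friend cancels plans at the last minute.", "options": ["joy", "disappointment"], "answer": "B"},
    {"scenario": "Someone receives an unexpected gift.", "options": ["surprise", "guilt"], "answer": "A"},
]
always_b = lambda prompt: "B"
print(accuracy(toy_items, None, always_b))                          # 50.0
print(accuracy(toy_items, "User is a retired teacher.", always_b))  # 50.0
```

Running the same items with `memory=None`, an advantaged profile, and a disadvantaged profile gives the three columns of a table like the one in Finding 1.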

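### Mixed-effects model
Fixed effects (demographic variables) + random effects (item-level variation), with White / Christian / male / age 34-65 as the baseline.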
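
Written out, one plausible form of this model is a random-intercept regression with treatment-coded demographics. This is a sketch under assumptions: the note does not say whether the paper uses a linear or a logistic link, or whether it adds random slopes or a persona-level random effect.

$$
y_{pi} = \beta_0 + \sum_{k=1}^{K} \beta_k\, x_{pk} + u_i + \varepsilon_{pi},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),\quad \varepsilon_{pi} \sim \mathcal{N}(0, \sigma^2)
$$

Here $y_{pi}$ is the score of persona $p$ on item $i$, $x_{pk} \in \{0, 1\}$ flags a non-baseline level (e.g. Muslim, non-binary, 65+), and $u_i$ absorbs item difficulty. The baseline persona has every $x_{pk} = 0$, so each $\beta_k$ is the expected score shift relative to a White / Christian / male / 34-65 persona with the other attributes held fixed.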

## Key results

### Finding 1: [[personalization-trap|User memory systematically shapes emotional understanding]]

| Model | No memory | Advantaged persona | Disadvantaged persona |
|------|--------|---------|---------|
| Claude 3.7 Sonnet | 90.91 | 80.10*† | 77.37* |
| DeepSeek-R1 | 84.85 | 81.62*† | 76.57* |
| Llama 3.2 90B | 84.85 | 64.91*† | 62.24* |

*†: advantaged vs. disadvantaged gap is significant (p < 0.05)
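
The table mixes two effects: the cost of adding any persona memory, and the gap between advantaged and disadvantaged personas. A short calculation on the reported scores (in the table's own units, presumably accuracy in percentage points) separates them; `summarize` is just an illustrative helper.

```python
# Scores copied from the Finding 1 table: (no memory, advantaged, disadvantaged).
SCORES = {
    "Claude 3.7 Sonnet": (90.91, 80.10, 77.37),
    "DeepSeek-R1": (84.85, 81.62, 76.57),
    "Llama 3.2 90B": (84.85, 64.91, 62.24),
}


def summarize(no_memory, advantaged, disadvantaged):
    """Drops relative to the no-memory condition and the advantaged-disadvantaged gap."""
    return {
        "drop_adv": round(no_memory - advantaged, 2),
        "drop_dis": round(no_memory - disadvantaged, 2),
        "gap": round(advantaged - disadvantaged, 2),
    }


for model, row in SCORES.items():
    print(model, summarize(*row))
```

For Claude 3.7 Sonnet and Llama 3.2 90B, adding any persona costs far more (10.81 and 19.94 points) than the advantaged/disadvantaged gap (2.73 and 2.67). For DeepSeek-R1 the gap (5.05) exceeds the drop for the advantaged persona (3.23).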

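### Finding 2: [[emotional-reasoning-bias|Demographic bias]]

- **Religion**: Muslim personas score systematically lower (Mistral: β=-0.061, p<0.001)
- **Gender**: the effect of a non-binary persona varies by model, even in sign (Claude 3.7 no-think: β=+0.018; Qwen3-4B think: β=-0.030)
- **Age**: 65+ personas score significantly lower in some models
- **Race**: effects are weaker but present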
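
To make the β values concrete: under the baseline coding of the mixed-effects model, a coefficient is the predicted score change when a single attribute departs from the baseline persona. In the sketch below only Mistral's β = -0.061 comes from the note; `predicted_shift` is a hypothetical helper, and its additive, interaction-free form is an assumption. Coefficients from different models should not be mixed in one dictionary.

```python
# Baseline persona from the mixed-effects model: every indicator is 0 here.
BASELINE = {"race": "White", "religion": "Christian", "gender": "male", "age": "34-65"}


def predicted_shift(persona, coefs):
    """Sum the coefficients for every attribute that departs from the baseline.

    Assumes an additive model without interaction terms; levels missing from
    `coefs` contribute 0 in this sketch (i.e. they are treated as unknown).
    """
    shift = 0.0
    for attr, base_level in BASELINE.items():
        level = persona.get(attr, base_level)
        if level != base_level:
            shift += coefs.get((attr, level), 0.0)
    return shift


mistral_coefs = {("religion", "Muslim"): -0.061}  # from Finding 2
muslim_persona = dict(BASELINE, religion="Muslim")
print(predicted_shift(muslim_persona, mistral_coefs))  # -0.061
print(predicted_shift(BASELINE, mistral_coefs))        # 0.0
```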

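### Finding 3: Bias persists in emotional advice

Claude 3.7 gives significantly lower-quality advice to female and non-binary personas than to male personas (β=-0.102, p<0.001).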

### [[dpo-bias-mitigation|DPO bias mitigation]]

| 模型 | STEU Before | STEU After | Bias ∆ Before | Bias ∆ After |
|------|-----------|-----------|-------------|-------------|
| Gemma-2-2B | 59.50% | 63.70% | 5.50% | -2.30% |
| Qwen-3-1.7B | 60.90% | 60.30% | 1.70% | 0.40% |

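Just 500 training samples are enough to reduce bias effectively. MMLU improves at the same time, but instruction following declines, pointing to a trade-off between bias resistance and instruction adherence.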
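
For reference, the standard DPO objective on one preference pair rewards the policy for raising the chosen response's log-probability, relative to a frozen reference model, more than the rejected one's. The note does not say how the 500 pairs were built; one natural construction (an assumption) is chosen = persona-invariant answer, rejected = persona-shifted answer. The sketch computes the per-pair loss from summed log-probabilities. Its `beta` is DPO's KL-strength hyperparameter, unrelated to the regression β values above, and the default 0.1 and the log-prob numbers are illustrative.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    Each argument is the summed token log-probability log pi(y | x) of a full
    response under the trainable policy or the frozen reference model.
    """
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    z = beta * (chosen_logratio - rejected_logratio)
    return softplus(-z)  # -log(sigmoid(z)) == log(1 + exp(-z))


# Policy identical to the reference: loss is log 2 (about 0.693).
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
# Policy raised the chosen answer and lowered the rejected one: loss falls below log 2.
print(dpo_loss(-10.0, -16.0, -12.0, -15.0))
```

In practice these log-probabilities come from batched forward passes and the loss is averaged over pairs; the per-pair form is shown only to make the objective explicit.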

## Core insights

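1. **The personalization trap**: personalization introduced to enhance empathy can amplify social inequality. Advantaged and disadvantaged personas receive systematically different emotional interpretations of the same scenario.
2. **The ideal of [[persona-invariant-reasoning|persona-invariant reasoning]]**: on tasks that do not depend on the user, a model should reason consistently, yet user memory leaks inappropriately into general reasoning.
3. **A protective effect from thinking models**: reasoning capability appears to provide partial resistance to bias.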
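
The persona-invariance ideal suggests a simple diagnostic: ask the same item under every persona and count how often the answer departs from the no-memory answer. The functions below are a hypothetical flip-rate sketch, not a metric defined in the paper, and the answers are toy data. A fully persona-invariant model scores 0.

```python
def flip_rate(answers_by_persona: dict, reference: str) -> float:
    """Share of personas whose answer differs from the no-memory answer to the same item."""
    if not answers_by_persona:
        raise ValueError("need at least one persona answer")
    flips = sum(answer != reference for answer in answers_by_persona.values())
    return flips / len(answers_by_persona)


def mean_flip_rate(items: list) -> float:
    """Average flip rate over items given as (no_memory_answer, {persona: answer}) pairs."""
    return sum(flip_rate(per_persona, ref) for ref, per_persona in items) / len(items)


# Toy data: item 1 is persona-invariant, item 2 flips for one of two personas.
toy_items = [
    ("B", {"advantaged": "B", "disadvantaged": "B"}),
    ("C", {"advantaged": "C", "disadvantaged": "A"}),
]
print(mean_flip_rate(toy_items))  # 0.25
```

Invariance is not the same as correctness: a model that is consistently wrong across personas still scores 0, so a flip rate complements accuracy rather than replacing it.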

## Sources

[Raw archive](raw/papers/personalization-trap-2025.md) | [arXiv](https://arxiv.org/abs/2510.09905) | [GitHub](https://github.com/personalization-trap)