20260625: lots of new content

2026-06-25 14:08:47 +08:00
parent 91fac5b6fc
commit 6021dea160
375 changed files with 19263 additions and 251 deletions
--- a/papers/personalization-trap-2025.md
+++ b/papers/personalization-trap-2025.md
@@ -0,0 +1,76 @@
+---
+title: "The Personalization Trap (Fang et al., Amazon, 2025)"
+created: 2026-06-24
+updated: 2026-06-24
+type: paper
+tags: ["personalization", "memory", "emotional-intelligence", "bias", "social-capital", "dpo"]
+sources:
+  - "https://arxiv.org/abs/2510.09905"
+code: "https://github.com/personalization-trap"
+---
+
+# The Personalization Trap
+
+> Fang et al., Amazon | arXiv:2510.09905v2 | cs.AI / cs.CL | Oct 2025 (updated Jun 2026)
+
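+## Problem
+
+Personalized AI systems incorporate long-term [[user-memory-bias|user memory]], but how does that memory affect emotional reasoning? The same scenario paired with different user personas yields systematically divergent emotional interpretations.
+
+Theoretical framework: Bourdieu's [[social-capital-framework|social capital theory]]. A person's social position along economic, cultural, and social dimensions shapes how others interpret their behavior and emotions. When an AI system brings in user background information, it may reproduce these social biases.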
+
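+## Method
+
+### User personas
+- **Explicit personas**: 30 base personas from PersonaHub × 2 versions (advantaged/disadvantaged), built on four dimensions of social capital (demographics / family background / social relationships / personal assets)
+- **[[intersectional-persona-evaluation|Intersectional personas]]**: PRISM dataset → 81 personas (gender × age × religion × race intersections)
+
+### Evaluation instruments
+- **[[situational-test-emotional-understanding|STEU]]**: 42 emotional-understanding scenarios with standard answers
+- **Modified STEM**: 44 first-person scenarios asking for emotion-management advice
+- Human annotation: 93% persona realism (vs. PersonaHub); persona-sensitive items were removed after review by 9 annotators
+
+### Mixed-effects model
+Fixed effects (demographic variables) + random effects (item-level variation), with White / Christian / male / 34-65 as the baseline.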
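+
+A minimal sketch of what such a specification could look like, assuming a linear model with one random intercept per item. The note does not record the paper's exact outcome variable or link function, so the symbols below are illustrative assumptions:
+
+$$
+y_{ij} = \beta_0 + \sum_{k} \beta_k x_{ik} + u_j + \varepsilon_{ij}, \qquad u_j \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
+$$
+
+Here $y_{ij}$ is the score for persona $i$ on item $j$, the $x_{ik}$ are dummy-coded demographic indicators with the baseline categories coded as 0, and $u_j$ absorbs item-level variation. Under this reading, each $\beta_k$ is a shift relative to the baseline persona.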
+
+## Key results
+
+### Finding 1: [[personalization-trap|User memory systematically affects emotional understanding]]
+
+| Model | No memory | Advantaged persona | Disadvantaged persona |
+|------|--------|---------|---------|
+| Claude 3.7 Sonnet | 90.91 | 80.10*† | 77.37* |
+| DeepSeek-R1 | 84.85 | 81.62*† | 76.57* |
+| Llama 3.2 90B | 84.85 | 64.91*† | 62.24* |
+
+*†: the advantaged-disadvantaged gap is significant (p<0.05)
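+
+Read as score differences in points, the table implies the gaps below. This is plain arithmetic on the reported values, not an additional result from the paper. The left column is advantaged minus disadvantaged, and the right column is no memory minus disadvantaged:
+
+$$
+\begin{aligned}
+\text{Claude 3.7 Sonnet:} \quad & 80.10 - 77.37 = 2.73, \qquad & 90.91 - 77.37 = 13.54 \\
+\text{DeepSeek-R1:} \quad & 81.62 - 76.57 = 5.05, \qquad & 84.85 - 76.57 = 8.28 \\
+\text{Llama 3.2 90B:} \quad & 64.91 - 62.24 = 2.67, \qquad & 84.85 - 62.24 = 22.61
+\end{aligned}
+$$
+
+On these numbers, the drop from adding any persona is larger than the advantaged-disadvantaged gap for all three models.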
+
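+### Finding 2: [[emotional-reasoning-bias|Demographic bias]]
+
+- **Religion**: Muslim personas score systematically lower (Mistral: β=-0.061, p<0.001)
+- **Gender**: the effect for non-binary personas varies by model (Claude 3.7 no-think: β=+0.018; Qwen3-4B think: β=-0.030)
+- **Age**: 65+ personas score significantly lower in some models
+- **Race**: effects are weaker but present
+
+### Finding 3: Bias persists in emotional advice
+
+Claude 3.7 gives significantly lower-quality advice to female / non-binary personas than to male personas (β=-0.102, p<0.001).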
+
+### [[dpo-bias-mitigation|DPO bias mitigation]]
+
+| 模型 | STEU Before | STEU After | Bias ∆ Before | Bias ∆ After |
+|------|-----------|-----------|-------------|-------------|
+| Gemma-2-2B | 59.50% | 63.70% | 5.50% | -2.30% |
+| Qwen-3-1.7B | 60.90% | 60.30% | 1.70% | 0.40% |
+
+Just 500 training samples are enough to reduce bias. MMLU also improves, but instruction following declines, which points to a trade-off between bias resistance and instruction adherence.
+
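+## Key insights
+
+1. **The personalization trap**: personalization introduced to strengthen empathy may amplify social inequality. Advantaged and disadvantaged personas receive systematically different emotional interpretations of the same scenario
+2. **The ideal of [[persona-invariant-reasoning|persona-invariant reasoning]]**: on tasks that do not depend on the user, the model should reason consistently, yet user memory leaks inappropriately into general reasoning
+3. **Protective effect of thinking models**: reasoning ability appears to provide partial resistance to bias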
+
+## Sources
+
+[Original archive](raw/papers/personalization-trap-2025.md) | [arXiv](https://arxiv.org/abs/2510.09905) | [GitHub](https://github.com/personalization-trap)