myWiki/raw/papers/zhang-reconciling-sft-interaction-2026.md

---
title: "Reconciling Contradictory Views on the Effectiveness of SFT in LLMs: An Interaction Perspective"
created: 2026-06-03
updated: 2026-06-03
type: raw-paper
arxiv_id: "2605.17967"
authors:
  - "Junpeng Zhang"
  - "Lei Cheng"
  - "Guoxi Zhang"
  - "Hua Cai"
  - "Qing Xu"
  - "Quanshi Zhang"
affiliations:
  - "Shanghai Jiao Tong University"
  - "Beijing Institute for General Artificial Intelligence"
  - "UniDT"
published: "2026-05-18"
venue: "arXiv preprint"
primary_category: "cs.AI"
source: "https://arxiv.org/abs/2605.17967"
code: null
tags: [SFT, interactions, LLM, fine-tuning, interpretability, overfitting, early-stopping]
---

# Reconciling Contradictory Views on the Effectiveness of SFT in LLMs: An Interaction Perspective

**Authors**: Junpeng Zhang, Lei Cheng, Guoxi Zhang, Hua Cai, Qing Xu, Quanshi Zhang (Shanghai Jiao Tong University, BIGAI, UniDT)

**arXiv**: 2605.17967 | **Published**: 2026-05-18 | **Category**: cs.AI

## Abstract

This paper explores a scientific question in supervised fine-tuning (SFT): why SFT is broadly effective for small-scale deep neural networks, yet can produce inconsistent or even detrimental effects when applied to large language models (LLMs). Recent advances in interaction-based explanations suggest that interactions between words/tokens provide a faithful metric for quantifying the inference patterns encoded by LLMs. The authors find that the evolution of interactions during SFT can effectively explain the inconsistent effectiveness of SFT for LLMs. Specifically: (1) SFT primarily removes noise-like interactions, while rarely acquiring reliable new interactions. (2) This denoising stage is extremely brief, after which continued fine-tuning tends to introduce overfitted interactions. These findings are validated across multiple LLMs and datasets.

## Key Concepts

- **Interaction-based explanation**: Decomposing LLM inference patterns into AND-OR interactions between input tokens
- **Three interaction types**: Removed (eliminated during SFT), Preserved (retained throughout), Newly emerged (acquired during SFT)
- **Two-stage SFT dynamics**: Brief denoising stage (~1000 steps) → prolonged overfitting stage
- **Interaction quality metrics**: Generalizability (γ) and uncancelled-effect ratio (ρ)
- **Preserved interactions as inference backbone**: A small set of low-order, generalizable interactions supports the majority of token prediction
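
The note does not state the paper's exact interaction definition. As a minimal sketch, assume the AND part takes the standard Harsanyi-dividend form, $I(S) = \sum_{T \subseteq S} (-1)^{|S|-|T|} v(T)$, where $v(T)$ is the model output when only the tokens in $T$ are left unmasked. The OR part is omitted here. The names `and_interactions` and `toy_v` are hypothetical, and `toy_v` is a hand-built stand-in for a real model:

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset, including the empty set."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def and_interactions(v, n):
    """Compute I(S) = sum over T subset of S of (-1)^(|S|-|T|) * v(T) for every S.

    `v` maps a frozenset of unmasked token indices to a scalar output.
    The cost is exponential in n, so this only suits a handful of tokens.
    """
    return {
        S: sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))
        for S in subsets(range(n))
    }


def toy_v(T):
    """Hypothetical model output built from three known AND patterns."""
    out = 0.0
    if 0 in T:
        out += 2.0   # single-token effect of token 0
    if {0, 1} <= T:
        out += 3.0   # fires only when tokens 0 and 1 are both present
    if {0, 1, 2} <= T:
        out -= 1.0   # third-order pattern with a negative effect
    return out


interactions = and_interactions(toy_v, 3)
for S, value in sorted(interactions.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    if value != 0:
        print(sorted(S), value)
```

On this toy function the decomposition recovers exactly the three patterns that were built in, and the interactions sum to the output on the full input. That summation property is the sense in which such interactions are said to account for the model's output.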

## Experimental Setup

- **Models**: Qwen2.5-3B-Instruct, Qwen2.5-7B-Instruct, Llama-2-7B-Chat, Llama-3-8B-Instruct, Gemma-3-4B-it
- **Datasets**: GoEmotions, Unilaw-R1-Data, Databricks-Dolly-15k
- **Method**: LoRA fine-tuning, interaction extraction via AND-OR decomposition
- **GPUs**: 8× NVIDIA Tesla V100-PCIE-32GB

## Five Core Findings

1. LLMs learn only a few newly emerged interactions in the first (denoising) stage, but many in the second (overfitting) stage
2. Early-emerged interactions are more generalizable; later-emerged interactions behave like noise
3. Interaction removal occurs primarily within the very short first stage
4. Removed interactions are predominantly noise: high-order, non-generalizable, mutually canceling
5. Preserved interactions (small set, low-order) exhibit high generalizability and weak cancellation, and they form the backbone of LLM inference
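
The three categories these findings refer to can be sketched as a set comparison between the salient interactions before and after SFT. This is an illustration under stated assumptions, not the paper's procedure: the salience threshold `tau`, the function names, and all interaction strengths below are made up for the example.

```python
def classify_interactions(before, after, tau=0.1):
    """Split interactions into removed / preserved / newly emerged.

    `before` and `after` map a frozenset of token indices to an interaction
    strength. An interaction is treated as salient when |strength| > tau
    (tau is a hypothetical threshold, not a value from the paper).
    """
    salient_before = {S for S, val in before.items() if abs(val) > tau}
    salient_after = {S for S, val in after.items() if abs(val) > tau}
    return {
        "removed": salient_before - salient_after,
        "preserved": salient_before & salient_after,
        "newly_emerged": salient_after - salient_before,
    }


def mean_order(interaction_sets):
    """Average number of tokens per interaction; 0.0 for an empty collection."""
    sets = list(interaction_sets)
    return sum(len(S) for S in sets) / len(sets) if sets else 0.0


# Illustrative strengths only.
before = {
    frozenset({0}): 1.2,
    frozenset({0, 1}): 0.8,
    frozenset({0, 1, 2, 3}): 0.5,
    frozenset({1, 2, 3, 4}): -0.5,
}
after = {
    frozenset({0}): 1.1,
    frozenset({0, 1}): 0.9,
    frozenset({0, 1, 2, 3}): 0.02,
    frozenset({2, 4}): 0.3,
}

groups = classify_interactions(before, after)
for name, members in groups.items():
    print(name, len(members), mean_order(members))
```

In this toy data the removed interactions are the two fourth-order ones and the preserved interactions are low-order, mirroring the pattern described in findings 4 and 5.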

## Practical Implications

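- SFT is effective, but its useful regime is short: the denoising stage lasts roughly 1000 steps, after which continued fine-tuning mainly adds overfitted interactions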
- Interactions can serve as diagnostic signals for monitoring SFT progress
- Interaction dynamics offer a principled criterion for early stopping in end-to-end SFT
- Challenges the belief that fine-tuning on massive datasets is necessarily beneficial
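
One way the early-stopping idea might look in practice is sketched below. The rule is a hypothetical heuristic and not the criterion proposed in the paper: it assumes that the number of newly emerged interactions has already been measured at each saved checkpoint, and it keeps the last checkpoint before that count exceeds a budget `max_new`. The function name, the budget, and the checkpoint data are all illustrative.

```python
def pick_stop_checkpoint(checkpoints, max_new=5):
    """Return the last step whose newly-emerged count stays within `max_new`.

    `checkpoints` is a list of (step, n_newly_emerged) pairs sorted by step.
    Returns None when no checkpoint qualifies.
    """
    chosen = None
    for step, n_new in checkpoints:
        if n_new > max_new:
            break  # interpreted here as the onset of the overfitting stage
        chosen = step
    return chosen


# Made-up monitoring log: few new interactions early, many later.
log = [(250, 1), (500, 2), (1000, 4), (2000, 15), (4000, 40)]
stop_step = pick_stop_checkpoint(log, max_new=5)
print(stop_step)
```

A fixed budget is the simplest possible choice. A relative rule, such as comparing the count against the number of removed interactions at the same checkpoint, would be an equally plausible variant.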