Reconciling Contradictory Views on the Effectiveness of SFT in LLMs: An Interaction Perspective

Authors: Junpeng Zhang, Lei Cheng, Guoxi Zhang, Hua Cai, Qing Xu, Quanshi Zhang (Shanghai Jiao Tong University, BIGAI, UniDT)

arXiv: 2605.17967 | Published: 2026-05-18 | Category: cs.AI

Abstract

This paper explores a scientific question in supervised fine-tuning (SFT): why SFT is broadly effective for small-scale deep neural networks, yet can produce inconsistent or even detrimental effects when applied to large language models (LLMs). Recent advances in interaction-based explanations suggest that interactions between words/tokens provide a faithful metric for quantifying the inference patterns encoded by LLMs. The authors find that the evolution of interactions during SFT can effectively explain the inconsistent effectiveness of SFT for LLMs. Specifically: (1) SFT primarily removes noise-like interactions, while rarely acquiring reliable new interactions. (2) This denoising stage is extremely brief, after which continued fine-tuning tends to introduce overfitted interactions. These findings are validated across multiple LLMs and datasets.

Key Concepts

Interaction-based explanation: Decomposing LLM inference patterns into AND-OR interactions between input tokens
Three interaction types: Removed (eliminated during SFT), Preserved (retained throughout), Newly emerged (acquired during SFT)
Two-stage SFT dynamics: Brief denoising stage (~1000 steps) → prolonged overfitting stage
Interaction quality metrics: Generalizability (γ) and uncancelled-effect ratio (ρ)
Preserved interactions as inference backbone: A small set of low-order, generalizable interactions supports the majority of token prediction
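As a rough illustration of what extracting an interaction means, the sketch below computes AND interactions by brute force using the Harsanyi dividend, I(S) = Σ_{T⊆S} (-1)^{|S|-|T|} v(T). This page does not state the paper's exact formula, so treat that choice as an assumption; the OR half of the AND-OR decomposition is omitted. The function `toy_v` is a hypothetical stand-in for a model's output score when only the tokens in `present` are left unmasked.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a tuple, from the empty set to the full set."""
    items = tuple(items)
    for k in range(len(items) + 1):
        yield from combinations(items, k)


def and_interactions(v, n):
    """Brute-force Harsanyi dividends for all subsets of n tokens.

    v maps a frozenset of unmasked token indices to a scalar score.
    Cost is exponential in n, so this is only usable for tiny inputs.
    """
    result = {}
    for S in subsets(range(n)):
        result[S] = sum(
            (-1) ** (len(S) - len(T)) * v(frozenset(T)) for T in subsets(S)
        )
    return result


def toy_v(present):
    """Hypothetical score: tokens 0 and 1 act jointly, token 2 acts alone."""
    score = 0.0
    if 0 in present and 1 in present:
        score += 2.0
    if 2 in present:
        score += 0.5
    return score


effects = and_interactions(toy_v, 3)
# Keep only the non-zero patterns: the joint pattern (0, 1) and the single token (2,).
salient = {S: e for S, e in effects.items() if abs(e) > 1e-9}
```

In this toy case only two interactions survive, and the effects of all subsets sum back to the score on the full input, which is the property that makes such a decomposition a faithful account of the output.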

Experimental Setup

Models: Qwen2.5-3B-Instruct, Qwen2.5-7B-Instruct, Llama-2-7B-Chat, Llama-3-8B-Instruct, Gemma-3-4B-it
Datasets: GoEmotions, Unilaw-R1-Data, Databricks-Dolly-15k
Method: LoRA fine-tuning, interaction extraction via AND-OR decomposition
GPUs: 8× NVIDIA Tesla V100-PCIE-32GB

Five Core Findings

LLMs learn only a few newly emerged interactions in the first (denoising) stage, but many in the second (overfitting) stage
Early-emerged interactions are more generalizable; later-emerged interactions behave like noise
Interaction removal occurs primarily within the very short first stage
Removed interactions are predominantly noise: high-order, non-generalizable, mutually canceling
Preserved interactions (small set, low-order) exhibit high generalizability and weak cancellation — they form the backbone of LLM inference
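The removed / preserved / newly emerged split in these findings can be sketched by comparing salient interactions at two checkpoints. Everything specific here is an assumption for illustration: the salience threshold `tau`, the toy effect values, and the form of the cancellation measure, which is taken to be |Σ I| / Σ |I| because this page does not give the paper's definition of ρ.

```python
def classify(before, after, tau=0.1):
    """Split interaction patterns by salience before and after fine-tuning.

    before/after map a token-index tuple to its interaction effect.
    tau is a hypothetical salience threshold, not a value from the paper.
    """
    salient_before = {S for S, e in before.items() if abs(e) >= tau}
    salient_after = {S for S, e in after.items() if abs(e) >= tau}
    return {
        "removed": salient_before - salient_after,
        "preserved": salient_before & salient_after,
        "new": salient_after - salient_before,
    }


def uncancelled_ratio(effects):
    """Assumed form of rho: 1.0 means no cancellation, 0.0 means full cancellation."""
    effects = list(effects)
    total_abs = sum(abs(e) for e in effects)
    return abs(sum(effects)) / total_abs if total_abs else 0.0


# Made-up effects: two low-order patterns plus two high-order ones that cancel.
before = {(0,): 1.0, (0, 1): 0.8, (0, 1, 2, 3): 0.6, (1, 2, 3, 4): -0.6}
after = {(0,): 1.1, (0, 1): 0.9, (0, 1, 2, 3): 0.02, (1, 2, 3, 4): -0.01, (2, 4): 0.3}

groups = classify(before, after)
rho_removed = uncancelled_ratio(before[S] for S in groups["removed"])
rho_preserved = uncancelled_ratio(before[S] for S in groups["preserved"])
```

In this toy data the removed patterns are high-order and cancel each other completely, while the preserved ones are low-order and do not cancel, mirroring the qualitative picture described above.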

Practical Implications

SFT is effective, but its useful regime is short: the denoising stage lasts only about 1000 steps
Interactions can serve as diagnostic signals for monitoring SFT progress
The two-stage dynamics provide a principled criterion for early stopping in end-to-end SFT
The findings challenge the belief that fine-tuning on massive datasets is necessarily beneficial
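One way the early-stopping idea could be turned into a monitor is sketched below. The stopping rule (flag the point where newly emerged interactions outnumber removed ones for `patience` consecutive checkpoints) is a hypothetical criterion chosen for illustration, not the paper's procedure, and the `history` numbers are made up rather than taken from its experiments.

```python
def suggest_stop(history, patience=2):
    """Return the step where the denoising stage appears to have ended.

    history is a list of (step, n_removed, n_emerged) tuples, one per
    checkpoint, counting interactions removed and newly emerged since the
    previous checkpoint. Returns the first step of the first run of
    `patience` consecutive checkpoints where emergence exceeds removal,
    or None if no such run exists.
    """
    streak = 0
    for i, (_step, removed, emerged) in enumerate(history):
        if emerged > removed:
            streak += 1
            if streak == patience:
                return history[i - patience + 1][0]
        else:
            streak = 0
    return None


# Made-up counts: removal dominates early, emergence dominates later.
history = [
    (250, 40, 2),
    (500, 35, 3),
    (750, 20, 4),
    (1000, 6, 5),
    (1250, 2, 9),
    (1500, 1, 14),
    (1750, 0, 20),
]

stop_step = suggest_stop(history)
```

Requiring several consecutive checkpoints guards against stopping on a single noisy measurement; a real monitor would also need a budget for how often interactions are extracted, since extraction itself is expensive.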
