20260601
concepts/interleaved-gui-tool-trajectory-scaling.md
---
title: "Interleaved GUI-Tool Trajectory Scaling Pipeline"
created: 2026-05-31
type: concept
tags: [data-synthesis, gui-tool, trajectory-scaling, tool-synthesis]
---

# Interleaved GUI-Tool Trajectory Scaling Pipeline

**Interleaved GUI-Tool Trajectory Scaling Pipeline** is the data-scaling pipeline proposed in the [[toolcua-optimal-gui-tool-orchestration|ToolCUA]] paper. It synthesizes **large-scale interleaved GUI-Tool trajectory data** from **existing GUI-only trajectory corpora**.

## Core idea

> Repurpose existing GUI-only trajectories + an MLLM-synthesized tool library → generate interleaved GUI-Tool data at scale with no human annotation

## Four-step process

### 1. Trajectory Filtering & Balancing
- Select trajectories from open datasets by execution quality, task length, and application coverage
- Balance the distribution across domains to provide a stable source for synthesis
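
The filtering and balancing step can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `Trajectory` fields, the thresholds, and the per-application cap are all assumptions chosen for the example.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trajectory:
    task_id: str
    app: str          # application domain, e.g. "chrome"
    num_steps: int    # task length in GUI actions
    quality: float    # hypothetical execution-quality score in [0, 1]


def filter_and_balance(trajs, min_quality=0.8, min_steps=3, max_steps=30,
                       per_app_cap=2, seed=0):
    """Drop low-quality or out-of-range trajectories, then cap each app."""
    rng = random.Random(seed)
    by_app = defaultdict(list)
    for t in trajs:
        if t.quality >= min_quality and min_steps <= t.num_steps <= max_steps:
            by_app[t.app].append(t)
    balanced = []
    for app in sorted(by_app):
        group = by_app[app]
        rng.shuffle(group)                    # avoid always keeping the first ones
        balanced.extend(group[:per_app_cap])  # cap over-represented apps
    return balanced


# Toy corpus (all values are made up for illustration).
corpus = [
    Trajectory("t1", "chrome", 8, 0.95),
    Trajectory("t2", "chrome", 12, 0.90),
    Trajectory("t3", "chrome", 5, 0.85),
    Trajectory("t4", "vscode", 10, 0.92),
    Trajectory("t5", "vscode", 4, 0.40),   # fails the quality check
    Trajectory("t6", "gimp", 60, 0.99),    # fails the length check
]
selected = filter_and_balance(corpus)
```

A hard per-application cap is only one way to balance; reweighting or stratified sampling would serve the same purpose.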
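
### 2. Trajectory-Aware Synthetic Tool Library Construction
- An **MLLM** (e.g. Kimi-K2.5, Claude-4.5-Sonnet) analyzes each GUI trajectory
- It **abstracts callable high-level operations** from the observed GUI procedure
- Each tool consists of a function signature, a natural-language description, and parameter semantics
- Multiple granularities are supported, from single-step wrappers (`chrome_open_settings`) to multi-step compositions (`chrome_open_language_settings`)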
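
One possible way to represent an entry in such a tool library is sketched below. The `SyntheticTool` class, its field names, and the step spans are assumptions made for illustration; only the two tool names come from the note above.

```python
from dataclasses import dataclass, field


@dataclass
class SyntheticTool:
    name: str
    description: str                  # natural-language description
    gui_step_span: tuple              # (start, end) GUI action indices, inclusive
    parameters: dict = field(default_factory=dict)  # name -> semantic meaning

    @property
    def signature(self):
        """Render a simple function signature from the parameter names."""
        return f"{self.name}({', '.join(self.parameters)})"

    @property
    def granularity(self):
        start, end = self.gui_step_span
        return "single-step" if start == end else "multi-step"


# The spans below are hypothetical; a real pipeline would take them
# from the GUI trajectory the MLLM analyzed.
library = [
    SyntheticTool(
        name="chrome_open_settings",
        description="Open the Chrome settings page.",
        gui_step_span=(0, 0),
    ),
    SyntheticTool(
        name="chrome_open_language_settings",
        description="Open the language section of Chrome settings.",
        gui_step_span=(0, 2),
    ),
]
```

Keeping the source step span on each tool makes it straightforward to map a tool call back to the GUI actions it abstracts, which the later steps rely on.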
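
### 3. Tool Trajectory Generation with Next-State Grounding
- Use the synthesized tool library to generate an equivalent "tool-only" trajectory
- **[[next-state-grounding|Next-State Grounding]]**: anchor each tool step to the corresponding screenshot in the original GUI trajectory and verify that the execution effect is consistent
- **Bottom-up Merging**: progressively merge adjacent fine-grained steps into higher-level composite tool calls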
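
The grounding and merging ideas can be illustrated with a small sketch. It assumes that `screenshots[i]` is the observation before GUI action `i` (so there is one more screenshot than actions) and that composite tools are looked up by the pair of tools they combine. Both conventions, and the tool name `chrome_open_languages_page`, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolStep:
    tool: str
    start: int   # first GUI action covered (inclusive)
    end: int     # last GUI action covered (inclusive)


def ground_next_state(step, screenshots):
    """Return the screenshot observed right after the step's last GUI action."""
    return screenshots[step.end + 1]


def merge_bottom_up(steps, composites):
    """Repeatedly merge contiguous adjacent steps that have a known composite."""
    steps = list(steps)
    merged = True
    while merged:
        merged = False
        for i in range(len(steps) - 1):
            a, b = steps[i], steps[i + 1]
            name = composites.get((a.tool, b.tool))
            if name is not None and a.end + 1 == b.start:
                steps[i:i + 2] = [ToolStep(name, a.start, b.end)]
                merged = True
                break   # restart the scan on the shortened list
    return steps


screenshots = ["s0", "s1", "s2", "s3"]   # placeholders for real images
fine_steps = [
    ToolStep("chrome_open_settings", 0, 0),
    ToolStep("chrome_open_languages_page", 1, 2),   # hypothetical tool
]
composites = {
    ("chrome_open_settings", "chrome_open_languages_page"):
        "chrome_open_language_settings",
}
coarse_steps = merge_bottom_up(fine_steps, composites)
```

In this sketch a merged step is grounded to the same screenshot as its last constituent step, which is the consistency that grounding is meant to check. Actual verification of the execution effect would need a model or an environment and is outside what the code shows.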
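
### 4. Interleaved GUI-Tool Trajectory Generation
- Randomly select some tool calls and replace them with the original GUI action sequences
- Remove the replaced tools from the tool library at the same time → this builds a "partial tool availability" context
- Generate diverse interleaved variants ($\mathcal{D}_{\text{all}}$)
- Each replacement naturally exposes two kinds of switching boundaries ($\mathcal{D}_{\text{critical}}$): GUI→Tool and Tool→GUI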
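
A rough sketch of the replacement step is given below. The function names, the tuple layout of a tool step, and every tool name other than `chrome_open_settings` are assumptions for illustration. The sketch replaces tools by name, so a tool removed from the library is never called elsewhere in the same variant.

```python
import random


def interleave(tool_steps, gui_actions, library, removed=None, seed=0):
    """Swap selected tool calls for their GUI actions; shrink the library.

    tool_steps: list of (tool_name, start, end) with inclusive GUI indices.
    removed:    tool names to replace; sampled at random when None.
    """
    names = sorted({name for name, _, _ in tool_steps})
    if removed is None:
        rng = random.Random(seed)
        k = min(len(names), max(1, len(names) // 2))
        removed = set(rng.sample(names, k))
    trajectory = []
    for name, start, end in tool_steps:
        if name in removed:
            trajectory.extend(("gui", a) for a in gui_actions[start:end + 1])
        else:
            trajectory.append(("tool", name))
    available = [t for t in library if t not in removed]
    return trajectory, available


def switching_boundaries(trajectory):
    """Return (index, direction) wherever consecutive steps change modality."""
    kinds = [kind for kind, _ in trajectory]
    return [(i, f"{a}->{b}")
            for i, (a, b) in enumerate(zip(kinds, kinds[1:])) if a != b]


gui_actions = ["click('Settings')", "click('Advanced')",
               "click('Languages')", "click('Add languages')"]
tool_steps = [
    ("chrome_open_settings", 0, 0),
    ("chrome_open_languages_page", 1, 2),   # hypothetical tool
    ("chrome_click_add_languages", 3, 3),   # hypothetical tool
]
library = [name for name, _, _ in tool_steps]

trajectory, available = interleave(
    tool_steps, gui_actions, library,
    removed={"chrome_open_languages_page"},
)
boundaries = switching_boundaries(trajectory)
```

Here one replacement yields both a Tool→GUI and a GUI→Tool boundary. Calling `interleave` with different `seed` values (and `removed=None`) gives different variants of the same source trajectory, and the steps at the reported boundaries are the ones that would feed a switching-focused subset.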
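
## Three scaling dimensions

| Dimension | What is scaled |
|-----------|----------------|
| **Tool Functionality** | Tool functionality coverage across application domains |
| **Tool Granularity** | From atomic tools to composite skills |
| **Switching Context** | Covers scenarios where tools are more beneficial and scenarios where they are less beneficial |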