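# Bookworm v6.0 Strategic Evolution Design Document

> Based on the v5.8 full-dimension audit and a computer-philosophy review, this document designs three core evolution directions.

**Created**: 2026-03-04

**Status**: Design draft (pending implementation)

**Scope**: routing engine / skill organization / quality feedback loop / visual testing / LLM gateway
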
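---

## S1: Probabilistic Routing Engine

### Problem Statement

The current 38 disambiguation rules are **hand-coded channel coding**. Each rule manually separates skills whose keywords collide, and because collisions are pairwise, maintenance cost grows as O(N²) in the number of skills N.

Once there are more than 100 skills, the disambiguation rules will become a maintenance bottleneck: extrapolating quadratically from 38 rules at 68 skills gives 38 × (100/68)² ≈ 82, i.e. roughly 80+ rules.

### Design: Three-Phase Incremental Evolution

#### Phase 1: Strengthen the Implicit Feedback Loop (v5.9)

**Existing foundation**:

- `implicit-feedback.js` already infers route confirmations/corrections within a 5-minute window
- `route-ab-test.js` already implements Thompson Sampling with convergence detection
- `route-weights.json` already supports weight deltas in [-0.5, +0.5]

**Enhancements**:

```
1. Trigger implicit-feedback automatically from route-auditor.js (Stop hook)
   Now:      run manually / not integrated
   Enhanced: at the end of every session, automatically infer whether each route in it was correct

2. Feed implicit-feedback results back into route-weights.json automatically
   Now:      results are only written to route-feedback.jsonl and never fed back
   Enhanced: add an applyImplicitWeights() function
             confirmed → weight += 0.05 (clamped)
             corrected → weight -= 0.1, correct skill += 0.1

3. Widen the trigger scope of A/B experiments
   Now:      triggered only when the top-2 gap is < 15%
   Enhanced: add forced experiments whenever a disambiguation rule fires
             when a rule overrides the BM25 result → keep the original BM25 choice with 50% probability
             collect comparison data → verify whether the rule actually beats BM25
```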
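
The forced experiment in enhancement 3 can be sketched as follows. This is a minimal illustration, not the `route-ab-test.js` API: the names `chooseWithRuleExperiment` and `compareArms`, and the feedback record shape `{ arm, wasCorrect }`, are hypothetical. `rng` is injectable only so the 50% split can be tested deterministically.

```javascript
// Hypothetical sketch: forced A/B experiment when a disambiguation rule overrides BM25.
// Names and record shapes are illustrative, not the actual route-ab-test.js API.
function chooseWithRuleExperiment(bm25Top, ruleAction, rng = Math.random) {
  // No override (rule absent or agrees with BM25): nothing to test
  if (!ruleAction || ruleAction === bm25Top) {
    return { skill: bm25Top, arm: 'no-conflict' };
  }
  // Override: keep the original BM25 choice with 50% probability (control arm)
  return rng() < 0.5
    ? { skill: bm25Top, arm: 'bm25' }
    : { skill: ruleAction, arm: 'rule' };
}

// Success rate per arm, from feedback records like { arm, wasCorrect }
function compareArms(records) {
  const stats = { bm25: { n: 0, ok: 0 }, rule: { n: 0, ok: 0 } };
  for (const r of records) {
    if (!stats[r.arm]) continue; // ignore 'no-conflict' and unknown arms
    stats[r.arm].n++;
    if (r.wasCorrect) stats[r.arm].ok++;
  }
  const rate = s => (s.n ? s.ok / s.n : null);
  return { bm25: rate(stats.bm25), rule: rate(stats.rule) };
}
```

Once both arms have enough samples, a rule whose success rate is not higher than the BM25 arm is a candidate for retirement.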

**Implementation path**:

```javascript
// route-auditor.js enhancement (Stop hook)
const implicit = require('../scripts/implicit-feedback.js');

const feedback = implicit.inferFeedback({ maxDays: 1 });
implicit.applyImplicitWeights(feedback); // new function (enhancement 2)
```
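
A minimal sketch of the weight update from enhancement 2, assuming feedback records of the hypothetical shape `{ routedSkill, verdict, correctSkill }` and weights held as a `{ skill: delta }` map. Reading and writing `route-weights.json` is left out because its exact layout is not specified here.

```javascript
// Hypothetical sketch of applyImplicitWeights(): a pure update of a { skill: delta } map.
// Assumed feedback shape: { routedSkill, verdict: 'confirmed' | 'corrected', correctSkill }
const WEIGHT_MIN = -0.5;
const WEIGHT_MAX = 0.5;

// Keep every delta inside the range route-weights.json supports
const clamp = x => Math.min(WEIGHT_MAX, Math.max(WEIGHT_MIN, x));

function applyImplicitWeights(feedback, weights = {}) {
  const next = { ...weights }; // do not mutate the caller's map
  const bump = (skill, delta) => {
    next[skill] = clamp((next[skill] ?? 0) + delta);
  };
  for (const f of feedback) {
    if (f.verdict === 'confirmed') {
      bump(f.routedSkill, +0.05);
    } else if (f.verdict === 'corrected') {
      bump(f.routedSkill, -0.1);
      if (f.correctSkill) bump(f.correctSkill, +0.1);
    }
  }
  return next;
}
```

Keeping the function pure (map in, new map out) lets the Stop hook load the weights, apply the update and write them back in one step, and makes the clamping easy to test in isolation.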

**Expected outcome**: for roughly 30% of the disambiguation rules, the collected data should show whether the rule is actually needed; rules that prove unnecessary can be retired.

#### Phase 2: Learned Disambiguation (v6.0)

**Core design**: turn hard-coded rules into learnable soft rules.

```
Current disambiguation rule:
{ "pattern": "React+Bug", "action": "debugger" }   // hard-coded

Learned disambiguation rule:
{
  "pattern": "React+Bug",
  "baseAction": "debugger",
  "learnedWeight": 0.85,        // learned from feedback data
  "alternatives": {
    "frontend-expert": 0.10,
    "reviewer-expert": 0.05
  },
  "confidence": 0.92,           // based on sample size
  "sampleCount": 47
}
```
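
One way to produce these soft-rule fields from the per-rule Dirichlet parameters described in this phase is shown below. `learnedWeight` and `alternatives` are posterior means normalized over the named skills (the `_other` bucket is dropped). The `confidence` formula `n / (n + k)` with `k = 4` is a hypothetical choice; the document only says confidence is based on sample size. With the illustrative parameters `{ debugger: 17, frontend-expert: 2, reviewer-expert: 1 }`, the normalized shares are 0.85 / 0.10 / 0.05, matching the example.

```javascript
// Hypothetical sketch: derive soft-rule fields from a rule's Dirichlet parameters.
// learnedWeight / alternatives = posterior means normalized over named skills (no "_other");
// confidence = n / (n + k) is an assumed saturating function of the sample count.
function toSoftRule(pattern, baseAction, alphas, sampleCount, k = 4) {
  const named = Object.entries(alphas).filter(([skill]) => skill !== '_other');
  const total = named.reduce((sum, [, a]) => sum + a, 0);
  const alternatives = {};
  for (const [skill, a] of named) {
    if (skill !== baseAction) alternatives[skill] = a / total;
  }
  return {
    pattern,
    baseAction,
    learnedWeight: (alphas[baseAction] ?? 0) / total,
    alternatives,
    confidence: sampleCount / (sampleCount + k),
    sampleCount,
  };
}
```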
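
**Implementation**: new file `scripts/adaptive-disambiguator.js`

```javascript
// Core algorithm: Bayesian rule learning (Dirichlet posterior + Thompson Sampling)
const MIN_SAMPLES = 30;  // below this, keep the hard-coded action and only collect data
const CONVERGENCE = 0.8; // posterior share required for a deterministic choice
const MIN_ALPHA = 0.1;   // Dirichlet parameters must stay strictly positive

class AdaptiveDisambiguator {
  constructor(rules, feedbackHistory = []) {
    this.rules = rules;
    this.priors = this.initPriors(rules);
    // Replay stored feedback ({ ruleIdx, selectedSkill, wasCorrect }) to rebuild the posterior
    for (const f of feedbackHistory) {
      this.recordFeedback(f.ruleIdx, f.selectedSkill, f.wasCorrect);
    }
  }

  // Each rule keeps a Dirichlet distribution over skills
  // α_i = prior + number of confirmed hits for skill i
  initPriors(rules) {
    return rules.map(r => ({
      pattern: r.pattern,
      baseAction: r.action,
      // Prior: the hard-coded action gets a strong α=10; all other skills share _other (α=1)
      alphas: { [r.action]: 10, _other: 1 },
      totalSamples: 0,
    }));
  }

  // Choose an action from the posterior
  selectAction(matchedRuleIdx, candidates) {
    const prior = this.priors[matchedRuleIdx];

    // Not enough samples → behave exactly like the hard-coded rule (data collection only)
    if (prior.totalSamples < MIN_SAMPLES) return prior.baseAction;

    // Enough samples and converged → deterministic choice
    const alphaSum = Object.values(prior.alphas).reduce((s, a) => s + a, 0);
    const [winner, maxAlpha] = Object.entries(prior.alphas)
      .filter(([skill]) => skill !== '_other')
      .reduce((best, e) => (e[1] > best[1] ? e : best));
    if (maxAlpha / alphaSum > CONVERGENCE) return winner;

    // Not converged → Thompson Sampling
    return this.thompsonSample(prior.alphas, candidates);
  }

  // One Gamma(α_i, 1) draw per candidate; the argmax equals the argmax of a Dirichlet sample
  thompsonSample(alphas, candidates) {
    let best = null;
    let bestDraw = -Infinity;
    for (const skill of candidates) {
      const draw = sampleGamma(alphas[skill] ?? alphas._other);
      if (draw > bestDraw) { best = skill; bestDraw = draw; }
    }
    return best;
  }

  // Record feedback
  recordFeedback(ruleIdx, selectedSkill, wasCorrect) {
    const prior = this.priors[ruleIdx];
    if (wasCorrect) {
      prior.alphas[selectedSkill] = (prior.alphas[selectedSkill] ?? 0) + 1;
    } else {
      // Heuristic penalty for negative evidence, floored so α stays positive
      prior.alphas[selectedSkill] = Math.max(MIN_ALPHA, (prior.alphas[selectedSkill] ?? 1) - 0.5);
    }
    prior.totalSamples++;
  }
}

// Marsaglia-Tsang sampler for Gamma(α, 1); α < 1 uses the Gamma(α + 1) boost
function sampleGamma(alpha) {
  if (alpha < 1) return sampleGamma(alpha + 1) * Math.pow(Math.random(), 1 / alpha);
  const d = alpha - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    let x, v;
    do { x = sampleNormal(); v = 1 + c * x; } while (v <= 0);
    v = v * v * v;
    if (Math.log(Math.random()) < 0.5 * x * x + d - d * v + d * Math.log(v)) return d * v;
  }
}

// Standard normal via Box-Muller (1 - random() keeps the log argument in (0, 1])
function sampleNormal() {
  return Math.sqrt(-2 * Math.log(1 - Math.random())) * Math.cos(2 * Math.PI * Math.random());
}

module.exports = { AdaptiveDisambiguator };
```

**Safety mechanisms**:

- The hard-coded rule acts as a strong prior (α=10); overturning it requires a large amount of contrary evidence
- Behavior does not change during the first 30 samples; data is only collected
- When any rule's learned weight deviates from its prior by more than 50%, an entry is written to evolution-log and the rule is flagged for review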
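
The document does not define how deviation from the prior is measured. The sketch below assumes it means the relative change of the base action's posterior share versus its prior share (10 / 11 under the α=10 / `_other`=1 prior). `checkDeviation`, `posteriorShare` and the log-entry fields are hypothetical, and appending the entry to evolution-log is left to the caller.

```javascript
// Hypothetical sketch of the ">50% deviation from prior" review trigger.
// Assumption: deviation = relative change of the base action's posterior share
// versus its prior share (10 / 11 with the α=10 / _other=1 prior).
function posteriorShare(alphas, skill) {
  const total = Object.values(alphas).reduce((s, a) => s + a, 0);
  return (alphas[skill] ?? 0) / total;
}

function checkDeviation(rule, prior, threshold = 0.5) {
  const priorShare = 10 / 11;
  const share = posteriorShare(prior.alphas, rule.action);
  const deviation = Math.abs(share - priorShare) / priorShare;
  if (deviation <= threshold) return null; // within tolerance: nothing to log
  // Entry for evolution-log; field names are illustrative
  return {
    type: 'disambiguation-drift',
    pattern: rule.pattern,
    priorShare,
    posteriorShare: share,
    deviation,
    needsReview: true,
  };
}
```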

#### Phase 3: Vector Embedding Supplement (v6.x)

**Approach**: call an embedding model through MCP to produce a vector for each query, then compute cosine similarity against precomputed skill vectors.

```
User input "优化首屏加载速度" ("optimize first-screen load speed")
      ↓
BM25 routing: frontend-expert (0.72), performance-expert (0.68)
      ↓ (gap < 15%, trigger the vector supplement)
Embedding: text-embedding-3-small
  query_vec  = embed("优化首屏加载速度")
  skill_vecs = precomputed (embedding of each skill's description)
  cos_sim: performance-expert (0.89), frontend-expert (0.76)
      ↓
Fusion: BM25(0.5) + Embedding(0.3) + Context(0.2)
  performance-expert: 0.5*0.68 + 0.3*0.89 + 0.2*ctx = 0.607 + 0.2*ctx
  frontend-expert:    0.5*0.72 + 0.3*0.76 + 0.2*ctx = 0.588 + 0.2*ctx
```
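
The trigger and fusion steps above can be sketched as follows. Assumptions: the "< 15%" gap is measured relative to the top BM25 score, the BM25 list passed in is already sorted in descending order, and `ctx` is a caller-supplied context score (treated as 0 when absent). The function names are illustrative.

```javascript
// Sketch of the Phase 3 trigger + fusion, using the weights from the example above.
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na && nb ? dot / (Math.sqrt(na) * Math.sqrt(nb)) : 0; // 0 for zero vectors
}

// Only call the embedding model when the top-2 BM25 gap is small (assumed relative gap)
function needsEmbedding(bm25Ranked, maxGap = 0.15) {
  if (bm25Ranked.length < 2) return false;
  const [top1, top2] = bm25Ranked;
  return (top1.score - top2.score) / top1.score < maxGap;
}

// candidates: [{ skill, bm25, cos, ctx? }] → sorted by fused score, best first
function fuseScores(candidates, w = { bm25: 0.5, embed: 0.3, ctx: 0.2 }) {
  return candidates
    .map(c => ({ skill: c.skill, score: w.bm25 * c.bm25 + w.embed * c.cos + w.ctx * (c.ctx ?? 0) }))
    .sort((x, y) => y.score - x.score);
}
```

With ctx = 0, the example's numbers give performance-expert 0.607 and frontend-expert 0.588, so the embedding signal flips the BM25 ranking exactly as in the walkthrough.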

**Prerequisites**:

- Integrate a LiteLLM/OpenRouter MCP, or use local ollama embeddings
- Precompute vectors for the 68 skills → `skills-embeddings.json`
- Trigger only when the BM25 top-2 gap is < 15% (saves API calls)

**Latency budget**: +200-500 ms (only for low-confidence queries)

---

## S2: Hierarchical Skill Organization

### Problem Statement

All 68 skills sit in one flat level, so BM25 scoring costs O(N*K*T) and grows linearly with N.

Disambiguation conflicts grow as O(N²). There are already 61 keyword conflicts today, which is high.

### Design: Two-Level Routing

```
Layer 1: Domain Router
├── dev          (22 skills: frontend, backend, mobile, ...)
├── architecture (11 skills: architect, database, cloud-native, ...)
├── devops       (6 skills: devops, devsecops, git, sre, ...)
├── quality      (3 skills: tester, reviewer, project-audit)
├── product      (4 skills: product-manager, designer, ux, coordinator)
├── business     (9 skills: business-plan, finance, sales, ...)
├── content      (5 skills: tech-writer, copywriter, email, ...)
├── ai-data      (3 skills: ai-ml, data-analyst, data-engineer)
├── security     (1 skill: security-expert)
└── meta         (4 skills: genesis-engine, prompt-optimizer, ...)

Layer 2: Skill Router
  Precise BM25 matching within the selected domain
```

### Data Structure

Add a `domain` field to `skills-index.json` (keyword values are kept verbatim, since queries are matched in Chinese):

```json
{
  "domains": {
    "dev": {
      "keywords": ["代码", "开发", "实现", "写", "组件", "API", "接口", ...],
      "skills": ["frontend-expert", "backend-builder", "mobile-expert", ...]
    },
    "architecture": {
      "keywords": ["架构", "设计", "选型", "DDD", "微服务", ...],
      "skills": ["architect-expert", "database-tuning-expert", ...]
    }
  }
}
```
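
A hypothetical helper for the v5.9 step of deriving each skill's `domain` field from a `domains` map like the one above. It assumes every skill should belong to exactly one domain, as in the Layer 1 tree, and reports skills listed under two domains or missing from all of them; the function name and return shape are illustrative.

```javascript
// Hypothetical helper: skill → domain map plus consistency checks.
function buildSkillDomainMap(domains, allSkills = []) {
  const owner = new Map(); // skill -> first domain that lists it
  const conflicts = [];
  for (const [domain, { skills = [] }] of Object.entries(domains)) {
    for (const skill of skills) {
      if (owner.has(skill) && owner.get(skill) !== domain) {
        conflicts.push({ skill, domains: [owner.get(skill), domain] });
      } else {
        owner.set(skill, domain);
      }
    }
  }
  // Skills known to the index but not placed in any domain
  const unassigned = allSkills.filter(s => !owner.has(s));
  return { skillToDomain: Object.fromEntries(owner), conflicts, unassigned };
}
```

Running this as part of index generation would catch mislabeled skills before the domain router relies on them.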

### Routing Flow

```
User input "React 组件性能优化" ("React component performance optimization")
      ↓
Layer 1: domain classifier
  dev:          0.7 (React, 组件/component)
  architecture: 0.3 (性能/performance, 优化/optimization)
  → choose dev (top-1), keep architecture as an auxiliary domain

Layer 2: in-domain BM25 (only 22 skills)
  frontend-expert:    0.85
  performance-expert: 0.72 (cross-domain auxiliary)
  → final: frontend-expert

Disambiguation rules only need to cover in-domain conflicts (among 22 skills instead of 68)
In-domain conflicts: ~15 rules (vs. 38 globally)
```
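
A minimal sketch of the two-level flow, under stated assumptions: Layer 1 scores domains by counting keyword hits, as a stand-in for the BM25 engine the design actually reuses; a second domain is kept as auxiliary when it holds at least 25% of all keyword hits (the threshold is an assumption); and `scoreSkill(query, skill)` is a caller-supplied in-domain scorer such as the existing BM25 function. All names are hypothetical.

```javascript
// Layer 1: rank domains by keyword hits; keep top-1 plus any domain with >= auxShare of hits
function classifyDomains(query, domains, auxShare = 0.25) {
  const scored = Object.entries(domains)
    .map(([name, d]) => ({ name, hits: (d.keywords ?? []).filter(k => query.includes(k)).length }))
    .filter(d => d.hits > 0)
    .sort((a, b) => b.hits - a.hits);
  const total = scored.reduce((s, d) => s + d.hits, 0);
  if (total === 0) return [];
  return scored.filter((d, i) => i === 0 || d.hits / total >= auxShare).map(d => d.name);
}

// Layer 2: score only the skills of the selected domains
function routeHierarchical(query, domains, scoreSkill) {
  const selected = classifyDomains(query, domains);
  if (selected.length === 0) return null; // no domain matched: caller can fall back to flat routing
  const candidates = selected.flatMap(name => domains[name].skills);
  let best = null;
  for (const skill of candidates) {
    const score = scoreSkill(query, skill);
    if (!best || score > best.score) best = { skill, score };
  }
  return best;
}
```

Returning `null` when no domain matches keeps the flat router as a safety net, which fits the plan to run both strategies side by side.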

### Complexity Analysis

| Metric | Current (flat) | Hierarchical | Change |
|--------|----------------|--------------|--------|
| BM25 scoring passes | 68 | ~10 (domains) + ~22 (in-domain) = 32 | -53% |
| Disambiguation rules to maintain | O(N²) ≈ 38 | O(K) + O((N/K)²) ≈ 15 | -60% |
| Rules at N=100 | ~80 | ~25 | -69% |
| Rules at N=200 | ~160 | ~45 | -72% |

Here N is the number of skills and K the number of domains (10 in the layout above).

### Compatibility Design

- Hierarchical routing runs **in parallel** with the existing flat routing (A/B test)
- The route-ab-test framework can be reused directly
- The domain classifier reuses the existing BM25 engine; only the index dimension changes
- Disambiguation rules are grouped by domain, so no global rewrite is needed

### Implementation Path

```
v5.9: add a domain field to every skill in skills-index.json (manual labeling)
v6.0: implement domain-router.js (Layer 1 router)
v6.1: A/B test flat vs. hierarchical routing in route-interceptor
v6.2: switch the default routing strategy after convergence
```

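---

## P3-13: Visual Regression Testing

### Design

Integrate Playwright screenshot comparison into the `zero-defect-guardian` skill:

```
Screenshot before change → modify code → screenshot after change → pixel diff → diff report
```

**Implementation**:

```javascript
// scripts/visual-regression.js
// Dependencies: npm i playwright pixelmatch pngjs
const fs = require('fs');
const { chromium } = require('playwright');
const { PNG } = require('pngjs');

// Screenshot the whole page, or a single element when a selector is given
async function captureBaseline(url, selector, outputPath) {
  const browser = await chromium.launch();
  try {
    const page = await browser.newPage({ viewport: { width: 1280, height: 720 } });
    await page.goto(url, { waitUntil: 'networkidle' }); // let late-loading assets settle
    const target = selector ? page.locator(selector) : page;
    await target.screenshot({ path: outputPath });
  } finally {
    await browser.close();
  }
}

// Pixel diff of two same-sized PNGs; writes a diff image to diffPath
async function compareSnapshots(baselinePath, currentPath, diffPath) {
  // Dynamic import works whether the installed pixelmatch is CommonJS or ESM
  const { default: pixelmatch } = await import('pixelmatch');
  const baseline = PNG.sync.read(fs.readFileSync(baselinePath));
  const current = PNG.sync.read(fs.readFileSync(currentPath));
  const { width, height } = baseline;
  if (current.width !== width || current.height !== height) {
    throw new Error(`Size mismatch: ${width}x${height} vs ${current.width}x${current.height}`);
  }
  const diff = new PNG({ width, height });
  const diffPixels = pixelmatch(baseline.data, current.data, diff.data, width, height, { threshold: 0.1 });
  fs.writeFileSync(diffPath, PNG.sync.write(diff));
  const mismatchPercentage = (diffPixels / (width * height)) * 100;
  return { mismatchPercentage, diffPixels };
}

module.exports = { captureBaseline, compareSnapshots };
```

**Integration points**:

- Add a visual regression section to the `zero-defect-guardian` SKILL.md
- Add a screenshot-comparison step to the tester-expert E2E test template
- Add a visual regression check to the D4 performance dimension of quality-gate
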
---

## P3-14: LLM Gateway MCP

### Design

Create a unified gateway for LLM calls. New entry for `mcp-templates.md`:

```json
{
  "llm-gateway": {
    "args": ["/c", "npx", "-y", "mcp-server-litellm", "--config", "./litellm.yaml"],
    "command": "cmd",
    "type": "stdio"
  }
}
```

**litellm.yaml template** (API keys are read from environment variables with LiteLLM's `os.environ/` syntax):

```yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: qwen-plus
    litellm_params:
      model: dashscope/qwen-plus
      api_key: os.environ/DASHSCOPE_API_KEY
  - model_name: embedding
    litellm_params:
      model: openai/text-embedding-3-small
      api_key: os.environ/OPENAI_API_KEY
```

**Use case**: RAG/agent development in ai-ml-expert can call multiple models directly through MCP.

---

## P3-15: Multimodal Closed Loop

### Design

```
Figma design                → get_design_context (Figma MCP)
      ↓
AI-generated code           → frontend-expert / canvas-ui-designer
      ↓
Start local dev server      → Bash: npm run dev
      ↓
Playwright screenshot       → mcp__playwright__browser_take_screenshot
      ↓
Compare with Figma original → visual-regression.js
      ↓
Diff > 5%  → automatic fix loop (at most 3 rounds)
Diff ≤ 5%  → PASS
```
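
The loop above can be expressed with each step injected as an async function, which keeps the control flow independent of whether the Figma and Playwright MCP calls are made by the agent or by a script. `designToCodeLoop` and its `generate` / `capture` / `compare` / `fix` parameters are hypothetical; `compare` is assumed to return a mismatch percentage such as `mismatchPercentage` from `compareSnapshots()`.

```javascript
// Hypothetical sketch of the P3-15 closed loop: generate → capture → compare → fix (≤ maxRounds).
async function designToCodeLoop({ generate, capture, compare, fix }, { threshold = 5, maxRounds = 3 } = {}) {
  await generate();
  let diff = await compare(await capture());
  let round = 0;
  // Diff > threshold → fix and re-check, up to maxRounds times
  while (diff > threshold && round < maxRounds) {
    round++;
    await fix(diff, round);
    diff = await compare(await capture());
  }
  return { pass: diff <= threshold, diff, rounds: round };
}
```

Returning the final diff and round count, rather than only a boolean, gives the quality gate enough context to report why a design did not converge.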

**Key constraint**: the Figma MCP and the Playwright MCP must both be available at the same time.

---

## P3-16: Skill Hot Reload

### Design

Adding a new skill currently requires:

1. Create `skills/new-skill/SKILL.md`
2. Run `node scripts/generate-skill-index.js` to rebuild the index
3. Restart the Claude Code session

**Hot-reload approach**:

- Detect writes to `skills/*/SKILL.md` in `post-edit-dispatcher.js`
- Automatically trigger `generate-skill-index.js` to rebuild the index
- The routing engine loads the new index on its next call (the existing `loadIndex()` reads the file on every call)

```javascript
// post-edit-dispatcher.js enhancement
const path = require('path');
const { spawn } = require('child_process');

// Matches skills/<name>/SKILL.md with either / or \ separators (Windows paths)
const SKILL_FILE_RE = /skills[\\/][^\\/]+[\\/]SKILL\.md$/;

function checkSkillHotReload(filePath) {
  if (!SKILL_FILE_RE.test(filePath)) return null;

  // Rebuild the index in a detached child so the current hook is not blocked;
  // ROOT is the project root, assumed to be defined elsewhere in the dispatcher
  const child = spawn(process.execPath,
    [path.join(ROOT, 'scripts/generate-skill-index.js')],
    { detached: true, stdio: 'ignore', windowsHide: true });
  child.unref(); // don't keep the hook process alive waiting for the rebuild
  return '[skill-hot-reload] Skill file change detected; index rebuild started';
}
```

**No session restart needed**: `loadIndex()` reads `skills-index.json` from disk on every routing call, so the first route after the rebuild finishes sees the new skill.

---

## Implementation Roadmap

```
v5.9 (within 2 weeks):
├── S1-Phase1: automate implicit-feedback + A/B validation of disambiguation rules
├── S2-Step1: add a domain field to skills-index.json
└── P3-16: skill hot reload (dispatcher enhancement)

v6.0 (within 1 month):
├── S1-Phase2: Bayesian learned disambiguator
├── S2-Step2: implement domain-router.js
├── P3-13: implement visual-regression.js
└── P3-14: LLM Gateway MCP template

v6.1 (within 2 months):
├── S2-Step3: flat vs. hierarchical A/B test
├── P3-15: multimodal closed loop (Figma→Code→Screenshot→Diff)
└── S1-Phase3: vector embedding supplement (depends on LLM Gateway)

v6.2 (within the quarter):
└── Convergence and cleanup: retire ineffective disambiguation rules, switch the default routing strategy
```

---

*Design completed: 2026-03-04 | Review status: pending self-auditor verification*