557 lines
15 KiB
Markdown
557 lines
15 KiB
Markdown
|
|
# 系统化调试方法论与工具链 (Debugging Playbook)
|
|||
|
|
|
|||
|
|
> 本文档为 debugger-expert 技能的核心参考,涵盖调试方法论、心智模型和主流语言工具链。
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 一、科学调试方法论
|
|||
|
|
|
|||
|
|
### 1.1 假设-验证循环 (Scientific Debugging)
|
|||
|
|
|
|||
|
|
调试的本质是科学实验:观察现象 -> 形成假设 -> 设计实验 -> 验证结果。
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
┌─────────────┐
|
|||
|
|
│ 观察现象 │ 收集错误信息、日志、堆栈
|
|||
|
|
└──────┬──────┘
|
|||
|
|
▼
|
|||
|
|
┌─────────────┐
|
|||
|
|
│ 形成假设 │ 基于经验和证据提出可能原因
|
|||
|
|
└──────┬──────┘
|
|||
|
|
▼
|
|||
|
|
┌─────────────┐
|
|||
|
|
│ 设计实验 │ 构造能验证或推翻假设的测试
|
|||
|
|
└──────┬──────┘
|
|||
|
|
▼
|
|||
|
|
┌─────────────┐
|
|||
|
|
│ 验证结果 │ 假设成立 → 修复;假设失败 → 新假设
|
|||
|
|
└──────┬──────┘
|
|||
|
|
▼
|
|||
|
|
┌─────────────┐
|
|||
|
|
│ 根因分析 │ 找到根本原因,而非表面症状
|
|||
|
|
└─────────────┘
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**关键原则**:
|
|||
|
|
- 每次只改一个变量,否则无法确定哪个改动有效
|
|||
|
|
- 记录每次尝试和结果,避免重复无用操作
|
|||
|
|
- 不要猜测,用证据说话
|
|||
|
|
|
|||
|
|
### 1.2 二分法定位 (Binary Search Debugging)
|
|||
|
|
|
|||
|
|
适用于"某处出错但不知道在哪"的场景:
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
# Git bisect 自动定位引入 Bug 的 commit
|
|||
|
|
git bisect start
|
|||
|
|
git bisect bad # 当前版本有问题
|
|||
|
|
git bisect good v1.2.0 # 这个版本没问题
|
|||
|
|
# Git 会自动 checkout 中间的 commit,你只需测试并标记
|
|||
|
|
git bisect good # 或 git bisect bad
|
|||
|
|
# 最终定位到引入 Bug 的精确 commit
|
|||
|
|
git bisect reset # 结束 bisect
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**代码中的二分法**:
|
|||
|
|
- 在代码中间插入日志,确认上半段还是下半段出问题
|
|||
|
|
- 注释掉一半代码,逐步缩小范围
|
|||
|
|
- 对于数据问题,检查中间步骤的数据是否正确
|
|||
|
|
|
|||
|
|
### 1.3 最小复现用例 (Minimal Reproduction)
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
完整应用 → 剥离无关模块 → 剥离无关依赖 → 最小可复现代码
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**构建步骤**:
|
|||
|
|
1. 在全新环境中尝试复现
|
|||
|
|
2. 逐步去除无关代码,直到去掉任何一行都无法复现
|
|||
|
|
3. 记录精确的复现步骤(环境、输入、操作序列)
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 二、调试心智模型
|
|||
|
|
|
|||
|
|
### 2.1 自上而下 (Top-Down)
|
|||
|
|
|
|||
|
|
从用户可见的症状出发,沿调用链向下追踪:
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
用户看到的错误
|
|||
|
|
→ 前端组件
|
|||
|
|
→ API 调用
|
|||
|
|
→ 后端 Handler
|
|||
|
|
→ Service 层
|
|||
|
|
→ 数据库查询
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**适用场景**:错误信息明确,能清晰追踪调用链。
|
|||
|
|
|
|||
|
|
### 2.2 自下而上 (Bottom-Up)
|
|||
|
|
|
|||
|
|
从底层日志或异常出发,向上追溯调用者:
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
数据库报错 connection refused
|
|||
|
|
→ ORM 连接池状态
|
|||
|
|
→ 服务配置
|
|||
|
|
→ 环境变量
|
|||
|
|
→ Docker Compose 配置
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**适用场景**:底层有明确的报错日志,需要理解为什么被触发。
|
|||
|
|
|
|||
|
|
### 2.3 差异对比法 (Differential Debugging)
|
|||
|
|
|
|||
|
|
比较"正常"和"异常"两种情况的差异:
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
# 对比两个环境的配置差异
|
|||
|
|
diff <(ssh prod "env | sort") <(ssh staging "env | sort")
|
|||
|
|
|
|||
|
|
# 对比两个请求的差异
|
|||
|
|
diff response_good.json response_bad.json
|
|||
|
|
|
|||
|
|
# 对比两个 commit 的代码差异
|
|||
|
|
git diff abc123 def456 -- src/
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 2.4 回退法 (Rollback Debugging)
|
|||
|
|
|
|||
|
|
当问题突然出现时,回退到已知正常状态:
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
# 回退到上一个正常的 commit
|
|||
|
|
git stash # 保存当前改动
|
|||
|
|
git checkout <good-commit>
|
|||
|
|
# 测试是否正常
|
|||
|
|
# 然后逐个引入改动,找到引入问题的变更
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 三、JavaScript/TypeScript 调试工具链
|
|||
|
|
|
|||
|
|
### 3.1 Chrome DevTools
|
|||
|
|
|
|||
|
|
#### Sources 面板
|
|||
|
|
```javascript
|
|||
|
|
// 代码中插入断点
|
|||
|
|
debugger; // 浏览器会在此暂停
|
|||
|
|
|
|||
|
|
// 条件断点(在 DevTools 中右键行号设置)
|
|||
|
|
// 条件示例:item.id === 42
|
|||
|
|
|
|||
|
|
// Logpoints(不暂停,只输出日志)
|
|||
|
|
// 右键行号 → Add logpoint → 输入: "value is", myVar
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
#### Network 面板
|
|||
|
|
```
|
|||
|
|
关键检查项:
|
|||
|
|
- Status: 检查 HTTP 状态码
|
|||
|
|
- Timing: 查看各阶段耗时(DNS, TCP, TTFB, Content Download)
|
|||
|
|
- Headers: 验证请求头/响应头(Content-Type, Authorization, CORS headers)
|
|||
|
|
- Preview/Response: 检查实际响应数据
|
|||
|
|
- 右键 → Copy as cURL: 在终端中重现请求
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
#### Performance 面板
|
|||
|
|
```
|
|||
|
|
录制步骤:
|
|||
|
|
1. 点击录制按钮
|
|||
|
|
2. 执行要分析的操作
|
|||
|
|
3. 停止录制
|
|||
|
|
4. 检查 Main 线程火焰图
|
|||
|
|
5. 关注长任务(超过 50ms 的红色标记)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
#### Memory 面板
|
|||
|
|
```
|
|||
|
|
内存泄漏排查:
|
|||
|
|
1. 拍摄 Heap Snapshot(快照1)
|
|||
|
|
2. 执行可疑操作
|
|||
|
|
3. 拍摄 Heap Snapshot(快照2)
|
|||
|
|
4. 选择 "Comparison" 视图,对比两个快照
|
|||
|
|
5. 按 "Size Delta" 排序,查看增长最多的对象
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 3.2 VS Code 调试配置
|
|||
|
|
|
|||
|
|
```jsonc
|
|||
|
|
// .vscode/launch.json
|
|||
|
|
{
|
|||
|
|
"version": "0.2.0",
|
|||
|
|
"configurations": [
|
|||
|
|
// Node.js 应用调试
|
|||
|
|
{
|
|||
|
|
"type": "node",
|
|||
|
|
"request": "launch",
|
|||
|
|
"name": "Debug Node App",
|
|||
|
|
"program": "${workspaceFolder}/src/index.ts",
|
|||
|
|
"preLaunchTask": "tsc: build",
|
|||
|
|
"outFiles": ["${workspaceFolder}/dist/**/*.js"],
|
|||
|
|
"console": "integratedTerminal"
|
|||
|
|
},
|
|||
|
|
// Next.js 全栈调试
|
|||
|
|
{
|
|||
|
|
"type": "node",
|
|||
|
|
"request": "launch",
|
|||
|
|
"name": "Debug Next.js",
|
|||
|
|
"runtimeExecutable": "pnpm",
|
|||
|
|
"runtimeArgs": ["dev"],
|
|||
|
|
"port": 9230,
|
|||
|
|
"console": "integratedTerminal",
|
|||
|
|
"serverReadyAction": {
|
|||
|
|
"pattern": "- Local:.+(https?://.+)",
|
|||
|
|
"uriFormat": "%s",
|
|||
|
|
"action": "debugWithChrome"
|
|||
|
|
}
|
|||
|
|
},
|
|||
|
|
// 附加到已运行的进程
|
|||
|
|
{
|
|||
|
|
"type": "node",
|
|||
|
|
"request": "attach",
|
|||
|
|
"name": "Attach to Process",
|
|||
|
|
"port": 9229,
|
|||
|
|
"restart": true
|
|||
|
|
}
|
|||
|
|
]
|
|||
|
|
}
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
#### 条件断点与 Logpoints
|
|||
|
|
```
|
|||
|
|
VS Code 中:
|
|||
|
|
- 条件断点:右键行号 → Add Conditional Breakpoint → 输入条件表达式
|
|||
|
|
- Hit Count 断点:右键 → Add Conditional Breakpoint → Hit Count → 输入次数
|
|||
|
|
- Logpoints:右键行号 → Add Logpoint → 使用 {} 插入表达式
|
|||
|
|
示例: "User {user.name} logged in, role: {user.role}"
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 3.3 Node.js 调试
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
# 启动调试模式
|
|||
|
|
node --inspect src/server.js # 默认 9229 端口
|
|||
|
|
node --inspect-brk src/server.js # 在第一行暂停
|
|||
|
|
|
|||
|
|
# 使用 ndb(更好的 Node 调试器)
|
|||
|
|
npx ndb node src/server.js
|
|||
|
|
|
|||
|
|
# 内存快照对比
|
|||
|
|
node --expose-gc -e "
|
|||
|
|
global.gc();
|
|||
|
|
const before = process.memoryUsage();
|
|||
|
|
// ... 执行操作 ...
|
|||
|
|
global.gc();
|
|||
|
|
const after = process.memoryUsage();
|
|||
|
|
console.log('Heap used delta:', after.heapUsed - before.heapUsed);
|
|||
|
|
"
|
|||
|
|
|
|||
|
|
# 生成 Heap Snapshot
|
|||
|
|
node -e "
|
|||
|
|
const v8 = require('v8');
|
|||
|
|
const fs = require('fs');
|
|||
|
|
const snapshot = v8.writeHeapSnapshot();
|
|||
|
|
console.log('Snapshot written to:', snapshot);
|
|||
|
|
"
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 四、Python 调试工具链
|
|||
|
|
|
|||
|
|
### 4.1 pdb/ipdb 常用命令速查
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
# 在代码中设置断点
|
|||
|
|
import pdb; pdb.set_trace() # 标准 pdb
|
|||
|
|
import ipdb; ipdb.set_trace() # 增强版(支持语法高亮、Tab 补全)
|
|||
|
|
breakpoint() # Python 3.7+ 推荐写法
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
常用命令:
|
|||
|
|
n (next) - 执行下一行(不进入函数)
|
|||
|
|
s (step) - 单步执行(进入函数)
|
|||
|
|
c (continue) - 继续执行到下一个断点
|
|||
|
|
r (return) - 执行到当前函数返回
|
|||
|
|
l (list) - 显示当前代码上下文
|
|||
|
|
ll (longlist) - 显示整个函数代码
|
|||
|
|
p expr - 打印表达式值
|
|||
|
|
pp expr - 美观打印表达式值
|
|||
|
|
w (where) - 显示调用堆栈
|
|||
|
|
u (up) - 向上移动堆栈帧
|
|||
|
|
d (down) - 向下移动堆栈帧
|
|||
|
|
b 42 - 在第 42 行设置断点
|
|||
|
|
b func_name - 在函数入口设置断点
|
|||
|
|
b 42, x > 10 - 条件断点:仅当 x > 10 时触发
|
|||
|
|
cl (clear) - 清除所有断点
|
|||
|
|
q (quit) - 退出调试器
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 4.2 debugpy (VS Code 远程调试)
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
# 在代码中嵌入调试服务器
|
|||
|
|
import debugpy
|
|||
|
|
debugpy.listen(("0.0.0.0", 5678))
|
|||
|
|
print("等待调试器连接...")
|
|||
|
|
debugpy.wait_for_client() # 阻塞直到 VS Code 连接
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
```jsonc
|
|||
|
|
// .vscode/launch.json - 远程附加
|
|||
|
|
{
|
|||
|
|
"type": "debugpy",
|
|||
|
|
"request": "attach",
|
|||
|
|
"name": "Attach to Remote Python",
|
|||
|
|
"connect": { "host": "localhost", "port": 5678 },
|
|||
|
|
"pathMappings": [
|
|||
|
|
{
|
|||
|
|
"localRoot": "${workspaceFolder}",
|
|||
|
|
"remoteRoot": "/app" // Docker 容器中的路径
|
|||
|
|
}
|
|||
|
|
]
|
|||
|
|
}
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 4.3 内存分析
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
# tracemalloc - 内置内存跟踪
|
|||
|
|
import tracemalloc
|
|||
|
|
tracemalloc.start()
|
|||
|
|
|
|||
|
|
# ... 执行可疑代码 ...
|
|||
|
|
|
|||
|
|
snapshot = tracemalloc.take_snapshot()
|
|||
|
|
top_stats = snapshot.statistics('lineno')
|
|||
|
|
print("[ 内存分配 Top 10 ]")
|
|||
|
|
for stat in top_stats[:10]:
|
|||
|
|
print(stat)
|
|||
|
|
|
|||
|
|
# objgraph - 对象引用图
|
|||
|
|
import objgraph
|
|||
|
|
objgraph.show_most_common_types(limit=20) # 最多的对象类型
|
|||
|
|
objgraph.show_growth(limit=10) # 对象增长情况
|
|||
|
|
objgraph.show_backrefs(obj, max_depth=5, # 谁引用了这个对象
|
|||
|
|
filename='refs.png')
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 五、Go 调试工具链
|
|||
|
|
|
|||
|
|
### 5.1 Delve (dlv) 常用命令
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
# 启动调试
|
|||
|
|
dlv debug ./cmd/server # 编译并调试
|
|||
|
|
dlv debug ./cmd/server -- --port 8080 # 带参数
|
|||
|
|
dlv attach <pid> # 附加到运行中的进程
|
|||
|
|
dlv test ./pkg/service # 调试测试
|
|||
|
|
|
|||
|
|
# 远程调试(服务器端)
|
|||
|
|
dlv debug --headless --listen=:2345 --api-version=2 ./cmd/server
|
|||
|
|
# 本地连接
|
|||
|
|
dlv connect localhost:2345
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
常用命令:
|
|||
|
|
break (b) main.go:42 - 设置断点
|
|||
|
|
break funcName - 在函数入口设置断点
|
|||
|
|
condition <id> i == 5 - 条件断点
|
|||
|
|
continue (c) - 继续执行
|
|||
|
|
next (n) - 下一行
|
|||
|
|
step (s) - 单步进入
|
|||
|
|
stepout (so) - 跳出当前函数
|
|||
|
|
print (p) variable - 打印变量
|
|||
|
|
locals - 显示所有局部变量
|
|||
|
|
goroutines (grs) - 列出所有 goroutine
|
|||
|
|
goroutine <id> - 切换到指定 goroutine
|
|||
|
|
stack (bt) - 显示调用堆栈
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
```jsonc
|
|||
|
|
// .vscode/launch.json - Go 调试
|
|||
|
|
{
|
|||
|
|
"type": "go",
|
|||
|
|
"request": "launch",
|
|||
|
|
"name": "Debug Go Server",
|
|||
|
|
"program": "${workspaceFolder}/cmd/server",
|
|||
|
|
"args": ["--config", "config.dev.yaml"],
|
|||
|
|
"env": { "GO_ENV": "development" }
|
|||
|
|
}
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 5.2 pprof 性能分析
|
|||
|
|
|
|||
|
|
```go
|
|||
|
|
// 在应用中启用 pprof
|
|||
|
|
import _ "net/http/pprof"
|
|||
|
|
// 确保有 HTTP 服务在运行,pprof 会注册到 DefaultServeMux
|
|||
|
|
|
|||
|
|
// 或手动注册到自定义 mux
|
|||
|
|
import "net/http/pprof"
|
|||
|
|
mux.HandleFunc("/debug/pprof/", pprof.Index)
|
|||
|
|
mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
|
|||
|
|
mux.HandleFunc("/debug/pprof/heap", pprof.Handler("heap").ServeHTTP)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
# CPU 分析(采集 30 秒)
|
|||
|
|
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
|
|||
|
|
|
|||
|
|
# 内存分析
|
|||
|
|
go tool pprof http://localhost:6060/debug/pprof/heap
|
|||
|
|
|
|||
|
|
# Goroutine 分析(排查泄漏)
|
|||
|
|
go tool pprof http://localhost:6060/debug/pprof/goroutine
|
|||
|
|
|
|||
|
|
# 交互式命令
|
|||
|
|
(pprof) top 20 # 热点函数 Top 20
|
|||
|
|
(pprof) list funcName # 查看函数逐行消耗
|
|||
|
|
(pprof) web # 生成调用图并在浏览器打开
|
|||
|
|
(pprof) flame # 生成火焰图
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 5.3 Race Detector
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
# 编译和运行时检测数据竞争
|
|||
|
|
go run -race ./cmd/server
|
|||
|
|
go test -race ./...
|
|||
|
|
go build -race -o server ./cmd/server
|
|||
|
|
|
|||
|
|
# 输出示例:
|
|||
|
|
# WARNING: DATA RACE
|
|||
|
|
# Goroutine 7 (running) at:
|
|||
|
|
# main.go:42 +0x1a8
|
|||
|
|
# Previous write at:
|
|||
|
|
# main.go:38 +0x130
|
|||
|
|
# Goroutine 6 (running) at:
|
|||
|
|
# main.go:38 +0x130
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**常见修复方式**:
|
|||
|
|
- 使用 `sync.Mutex` 或 `sync.RWMutex` 保护共享数据
|
|||
|
|
- 使用 `channel` 代替共享内存
|
|||
|
|
- 使用 `sync/atomic` 处理简单的计数器
|
|||
|
|
- 使用 `sync.Map` 代替普通 map 的并发读写
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 六、日志分析技巧
|
|||
|
|
|
|||
|
|
### 6.1 结构化日志
|
|||
|
|
|
|||
|
|
```typescript
|
|||
|
|
// Node.js - 使用 pino 结构化日志
|
|||
|
|
import pino from 'pino';
|
|||
|
|
const logger = pino({
|
|||
|
|
level: process.env.LOG_LEVEL || 'info',
|
|||
|
|
formatters: {
|
|||
|
|
level: (label) => ({ level: label }),
|
|||
|
|
},
|
|||
|
|
});
|
|||
|
|
|
|||
|
|
logger.info({ userId: 123, action: 'login', ip: '10.0.0.1' }, '用户登录成功');
|
|||
|
|
// 输出: {"level":"info","userId":123,"action":"login","ip":"10.0.0.1","msg":"用户登录成功"}
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
# Python - 使用 structlog 结构化日志
|
|||
|
|
import structlog
|
|||
|
|
logger = structlog.get_logger()
|
|||
|
|
logger.info("用户登录", user_id=123, action="login", ip="10.0.0.1")
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 6.2 日志级别策略
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
ERROR - 需要立即处理的错误(数据库挂了、支付失败)
|
|||
|
|
WARN - 可恢复但异常的情况(重试成功、降级处理)
|
|||
|
|
INFO - 重要业务事件(用户注册、订单创建)
|
|||
|
|
DEBUG - 开发调试信息(函数入参、中间状态)
|
|||
|
|
TRACE - 极细粒度追踪(循环内每步状态)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 6.3 关联 ID 追踪 (Correlation ID)
|
|||
|
|
|
|||
|
|
```typescript
|
|||
|
|
// Express 中间件 - 为每个请求生成关联 ID
|
|||
|
|
import { randomUUID } from 'crypto';
|
|||
|
|
|
|||
|
|
app.use((req, res, next) => {
|
|||
|
|
req.correlationId = req.headers['x-correlation-id'] || randomUUID();
|
|||
|
|
res.setHeader('x-correlation-id', req.correlationId);
|
|||
|
|
// 注入到日志上下文
|
|||
|
|
req.logger = logger.child({ correlationId: req.correlationId });
|
|||
|
|
next();
|
|||
|
|
});
|
|||
|
|
|
|||
|
|
// 在所有后续日志中自动携带 correlationId
|
|||
|
|
req.logger.info({ userId: user.id }, '处理用户请求');
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 七、生产环境调试
|
|||
|
|
|
|||
|
|
### 7.1 只读调试原则
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
生产环境调试铁律:
|
|||
|
|
1. 绝不修改生产数据
|
|||
|
|
2. 绝不在生产环境执行写操作
|
|||
|
|
3. 使用只读副本进行数据查询
|
|||
|
|
4. 优先分析日志和监控数据
|
|||
|
|
5. 必要时使用 feature flag 控制变更
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 7.2 Feature Flag 回退
|
|||
|
|
|
|||
|
|
```typescript
|
|||
|
|
// 使用 feature flag 安全回退
|
|||
|
|
if (featureFlags.isEnabled('new-payment-flow')) {
|
|||
|
|
return newPaymentHandler(req);
|
|||
|
|
} else {
|
|||
|
|
return legacyPaymentHandler(req); // 随时可回退
|
|||
|
|
}
|
|||
|
|
|
|||
|
|
// 关闭 flag 不需要部署,立即生效
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 7.3 蓝绿切换排查
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
# 检查当前活跃环境
|
|||
|
|
kubectl get service myapp -o jsonpath='{.spec.selector.version}'
|
|||
|
|
|
|||
|
|
# 流量切换到旧版本
|
|||
|
|
kubectl patch service myapp -p '{"spec":{"selector":{"version":"blue"}}}'
|
|||
|
|
|
|||
|
|
# 确认切换成功
|
|||
|
|
kubectl get endpoints myapp
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 7.4 生产日志快速过滤
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
# 按错误级别过滤
|
|||
|
|
kubectl logs deploy/myapp --since=1h | jq 'select(.level == "error")'
|
|||
|
|
|
|||
|
|
# 按关联 ID 追踪一个请求的完整链路
|
|||
|
|
kubectl logs deploy/myapp --since=1h | jq 'select(.correlationId == "abc-123")'
|
|||
|
|
|
|||
|
|
# 按用户 ID 过滤
|
|||
|
|
kubectl logs deploy/myapp --since=1h | jq 'select(.userId == 42)'
|
|||
|
|
|
|||
|
|
# Docker Compose 环境
|
|||
|
|
docker compose logs --since=1h app | grep "ERROR"
|
|||
|
|
docker compose logs -f app 2>&1 | jq 'select(.level == "error")'
|
|||
|
|
```
|