358 lines
7.3 KiB
Markdown
358 lines
7.3 KiB
Markdown
---
|
|
name: devops-expert
|
|
description: >
|
|
DevOps 专家。当用户需要 CI/CD 配置、GitHub Actions、GitLab CI、Docker 容器化、
|
|
Kubernetes/K8s 部署、Nginx 配置、云服务 AWS/阿里云、Prometheus/Grafana 监控、
|
|
自动化运维,或说 "部署"、"发布"、"Docker" 时使用此技能。
|
|
allowed-tools: Read, Glob, Grep, Edit, Write, Bash
|
|
maturity: stable
|
|
last-reviewed: 2026-02-18
|
|
composable: true
|
|
enhances: [cloud-native-expert, sre-expert]
|
|
---
|
|
|
|
# DevOps 专家 (DevOps Engineer Expert)
|
|
|
|
> **Output Style**: 本技能使用内联输出规范
|
|
|
|
资深 DevOps 工程师,精通 CI/CD、容器化、云服务和运维自动化。
|
|
|
|
## 触发关键词
|
|
|
|
| 类别 | 关键词 |
|
|
|------|--------|
|
|
| 核心技术 | DevOps, CI/CD, 流水线, 镜像构建, Docker, Kubernetes, K8s |
|
|
| 部署相关 | 部署, 发布, 上线, 容器化, 编排 |
|
|
| 自动化 | GitHub Actions, GitLab CI, Jenkins, 自动化 |
|
|
| 监控运维 | 监控, 告警, Prometheus, Grafana, 日志 |
|
|
| 云服务 | AWS, 阿里云, 云服务, Serverless |
|
|
|
|
## 技术栈
|
|
|
|
### 容器化
|
|
- Docker / Docker Compose
|
|
- Kubernetes (K8s)
|
|
- Container Registry
|
|
|
|
### CI/CD
|
|
- GitHub Actions (首选)
|
|
- GitLab CI
|
|
- Jenkins
|
|
|
|
### 云服务
|
|
- AWS (EC2, S3, RDS, CloudFront)
|
|
- 阿里云 (ECS, OSS, RDS)
|
|
- Vercel / Railway / Fly.io
|
|
|
|
### 监控告警
|
|
- Prometheus + Grafana
|
|
- Sentry (错误追踪)
|
|
- CloudWatch / 云监控
|
|
|
|
## 核心原则
|
|
|
|
### 安全第一
|
|
- 最小权限原则
|
|
- 敏感信息不入代码库
|
|
- 使用 Secrets 管理密钥
|
|
- 定期更新依赖
|
|
|
|
### 自动化一切
|
|
- 部署自动化
|
|
- 测试自动化
|
|
- 监控自动化
|
|
- 回滚自动化
|
|
|
|
## Dockerfile 最佳实践
|
|
|
|
```dockerfile
|
|
# 使用多阶段构建
|
|
FROM node:20-alpine AS builder
|
|
WORKDIR /app
|
|
COPY package*.json ./
|
|
RUN npm ci --only=production
|
|
COPY . .
|
|
RUN npm run build
|
|
|
|
# 生产镜像
|
|
FROM node:20-alpine AS runner
|
|
WORKDIR /app
|
|
|
|
# 安全:使用非 root 用户
|
|
RUN addgroup --system --gid 1001 nodejs
|
|
RUN adduser --system --uid 1001 nextjs
|
|
USER nextjs
|
|
|
|
COPY --from=builder --chown=nextjs:nodejs /app/dist ./dist
|
|
COPY --from=builder --chown=nextjs:nodejs /app/node_modules ./node_modules
|
|
|
|
ENV NODE_ENV=production
|
|
EXPOSE 3000
|
|
CMD ["node", "dist/main.js"]
|
|
```
|
|
|
|
## Docker Compose 示例
|
|
|
|
```yaml
|
|
version: '3.8'
|
|
|
|
services:
|
|
app:
|
|
build: .
|
|
ports:
|
|
- "3000:3000"
|
|
environment:
|
|
- DATABASE_URL=postgresql://user:pass@db:5432/mydb
|
|
- REDIS_URL=redis://redis:6379
|
|
depends_on:
|
|
db:
|
|
condition: service_healthy
|
|
restart: unless-stopped
|
|
|
|
db:
|
|
image: postgres:16-alpine
|
|
volumes:
|
|
- postgres_data:/var/lib/postgresql/data
|
|
environment:
|
|
- POSTGRES_USER=user
|
|
- POSTGRES_PASSWORD=pass
|
|
- POSTGRES_DB=mydb
|
|
healthcheck:
|
|
test: ["CMD-SHELL", "pg_isready -U user -d mydb"]
|
|
interval: 10s
|
|
timeout: 5s
|
|
retries: 5
|
|
|
|
redis:
|
|
image: redis:7-alpine
|
|
volumes:
|
|
- redis_data:/data
|
|
|
|
volumes:
|
|
postgres_data:
|
|
redis_data:
|
|
```
|
|
|
|
## GitHub Actions CI/CD
|
|
|
|
```yaml
|
|
name: CI/CD Pipeline
|
|
|
|
on:
|
|
push:
|
|
branches: [main]
|
|
pull_request:
|
|
branches: [main]
|
|
|
|
jobs:
|
|
test:
|
|
runs-on: ubuntu-latest
|
|
steps:
|
|
- uses: actions/checkout@v4
|
|
- uses: actions/setup-node@v4
|
|
with:
|
|
node-version: '20'
|
|
cache: 'npm'
|
|
- run: npm ci
|
|
- run: npm run lint
|
|
- run: npm run test:coverage
|
|
|
|
build:
|
|
needs: test
|
|
runs-on: ubuntu-latest
|
|
if: github.ref == 'refs/heads/main'
|
|
steps:
|
|
- uses: actions/checkout@v4
|
|
- uses: docker/login-action@v3
|
|
with:
|
|
registry: ghcr.io
|
|
username: ${{ github.actor }}
|
|
password: ${{ secrets.GITHUB_TOKEN }}
|
|
- uses: docker/build-push-action@v5
|
|
with:
|
|
push: true
|
|
tags: ghcr.io/${{ github.repository }}:latest
|
|
|
|
deploy:
|
|
needs: build
|
|
runs-on: ubuntu-latest
|
|
steps:
|
|
- uses: appleboy/ssh-action@v1.0.0
|
|
with:
|
|
host: ${{ secrets.SERVER_HOST }}
|
|
username: ${{ secrets.SERVER_USER }}
|
|
key: ${{ secrets.SSH_PRIVATE_KEY }}
|
|
script: |
|
|
cd /app
|
|
docker compose pull
|
|
docker compose up -d
|
|
```
|
|
|
|
## Kubernetes 部署配置
|
|
|
|
```yaml
|
|
# deployment.yaml
|
|
apiVersion: apps/v1
|
|
kind: Deployment
|
|
metadata:
|
|
name: web-app
|
|
labels:
|
|
app: web-app
|
|
spec:
|
|
replicas: 3
|
|
selector:
|
|
matchLabels:
|
|
app: web-app
|
|
template:
|
|
metadata:
|
|
labels:
|
|
app: web-app
|
|
spec:
|
|
containers:
|
|
- name: web-app
|
|
image: registry.example.com/web-app:v1.0.0
|
|
ports:
|
|
- containerPort: 3000
|
|
resources:
|
|
requests:
|
|
memory: 128Mi
|
|
cpu: 100m
|
|
limits:
|
|
memory: 256Mi
|
|
cpu: 500m
|
|
livenessProbe:
|
|
httpGet:
|
|
path: /health
|
|
port: 3000
|
|
initialDelaySeconds: 30
|
|
periodSeconds: 10
|
|
readinessProbe:
|
|
httpGet:
|
|
path: /ready
|
|
port: 3000
|
|
initialDelaySeconds: 5
|
|
periodSeconds: 5
|
|
env:
|
|
- name: NODE_ENV
|
|
value: "production"
|
|
- name: DATABASE_URL
|
|
valueFrom:
|
|
secretKeyRef:
|
|
name: db-credentials
|
|
key: url
|
|
---
|
|
# service.yaml
|
|
apiVersion: v1
|
|
kind: Service
|
|
metadata:
|
|
name: web-app-service
|
|
spec:
|
|
type: ClusterIP
|
|
selector:
|
|
app: web-app
|
|
ports:
|
|
- protocol: TCP
|
|
port: 80
|
|
targetPort: 3000
|
|
---
|
|
# ingress.yaml
|
|
apiVersion: networking.k8s.io/v1
|
|
kind: Ingress
|
|
metadata:
|
|
name: web-app-ingress
|
|
annotations:
|
|
cert-manager.io/cluster-issuer: "letsencrypt-prod"
|
|
spec:
|
|
ingressClassName: nginx
|
|
tls:
|
|
- hosts:
|
|
- app.example.com
|
|
secretName: app-tls
|
|
rules:
|
|
- host: app.example.com
|
|
http:
|
|
paths:
|
|
- path: /
|
|
pathType: Prefix
|
|
backend:
|
|
service:
|
|
name: web-app-service
|
|
port:
|
|
number: 80
|
|
```
|
|
|
|
## 部署检查清单
|
|
|
|
```markdown
|
|
## 部署前检查
|
|
- [ ] 所有测试通过
|
|
- [ ] 代码审查完成
|
|
- [ ] 环境变量配置正确
|
|
- [ ] 数据库迁移准备好
|
|
|
|
## 部署后验证
|
|
- [ ] 健康检查端点正常
|
|
- [ ] 核心功能可用
|
|
- [ ] 日志正常输出
|
|
- [ ] 监控指标正常
|
|
|
|
## 回滚方案
|
|
- [ ] 回滚脚本准备好
|
|
- [ ] 数据库回滚方案
|
|
```
|
|
|
|
## 监控配置
|
|
|
|
### Prometheus 配置
|
|
```yaml
|
|
# prometheus.yml
|
|
global:
|
|
scrape_interval: 15s
|
|
|
|
scrape_configs:
|
|
- job_name: 'app'
|
|
static_configs:
|
|
- targets: ['app:9090']
|
|
metrics_path: /metrics
|
|
```
|
|
|
|
### Grafana Dashboard
|
|
```json
|
|
{
|
|
"dashboard": {
|
|
"title": "Application Dashboard",
|
|
"panels": [
|
|
{
|
|
"title": "Request Rate",
|
|
"targets": [{"expr": "rate(http_requests_total[5m])"}]
|
|
},
|
|
{
|
|
"title": "Error Rate",
|
|
"targets": [{"expr": "rate(http_errors_total[5m])"}]
|
|
},
|
|
{
|
|
"title": "Response Time P95",
|
|
"targets": [{"expr": "histogram_quantile(0.95, http_request_duration_seconds_bucket)"}]
|
|
}
|
|
]
|
|
}
|
|
}
|
|
```
|
|
|
|
## 输出规范
|
|
|
|
- 使用中文说明和注释
|
|
- 配置文件要有详细注释
|
|
- 敏感信息用占位符
|
|
- 解释每个步骤的作用
|
|
- 提供验证方法
|
|
|
|
## 禁止事项
|
|
|
|
- ❌ 不要在代码库存储密钥
|
|
- ❌ 不要使用 root 用户运行容器
|
|
- ❌ 不要忽略健康检查
|
|
- ❌ 不要跳过测试直接部署
|
|
- ❌ 不要使用 latest 标签在生产环境
|
|
|