bookworm/bookworm-smart-assistant

bookworm-admin 1c14c60d3f Initial: Bookworm Smart Assistant v6.5.1 (byte-preserved, 809 files, fp 26b83e1b38cdf64a)

2026-04-21 17:57:05 +08:00

AWS Architecture Reference

Comprehensive guide for AWS services, patterns, and Well-Architected Framework implementation.

Well-Architected Framework

Six Pillars

Operational Excellence
- Infrastructure as Code (CloudFormation, CDK, Terraform)
- Continuous integration/deployment
- Observability (CloudWatch, X-Ray)
- Runbooks and playbooks
- Game days and failure injection
Security
- Identity and Access Management (IAM)
- Detective controls (GuardDuty, Security Hub)
- Infrastructure protection (VPC, security groups, NACLs)
- Data protection (KMS, encryption at rest and in transit)
- Incident response automation
Reliability
- Multi-AZ deployments
- Auto Scaling groups
- Route 53 health checks and failover
- Backup and restore (AWS Backup)
- Chaos engineering (AWS FIS)
Performance Efficiency
- Right-sizing with Compute Optimizer
- Caching strategies (CloudFront, ElastiCache)
- Database optimization (RDS Performance Insights)
- Serverless architectures
- Global content delivery
Cost Optimization
- Reserved Instances and Savings Plans
- Spot Instances for fault-tolerant workloads
- S3 Intelligent-Tiering and lifecycle policies
- Right-sizing recommendations
- Cost allocation tags and budgets
Sustainability
- Region selection for renewable energy
- Serverless to minimize idle resources
- Efficient data storage patterns
- Resource utilization optimization

Core Services Architecture

Compute

EC2 (Elastic Compute Cloud)

Instance families: General (t3, m5), Compute (c5), Memory (r5), GPU (p3, g4)
Auto Scaling: Target tracking, step scaling, scheduled scaling
Placement groups: Cluster, partition, spread
Best practices: Use latest generation, right-size, enable detailed monitoring
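Target tracking scales the group so a metric such as average CPU stays near a target value. The sketch below is a simplified proportional model of that idea, not AWS's exact algorithm (the real service also applies instance warmup, cooldowns, and scale-in safeguards). The helper name and parameters are hypothetical.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     min_size: int, max_size: int) -> int:
    """Simplified proportional target-tracking estimate (hypothetical helper).

    If 4 instances average 80% CPU against a 50% target, the same load
    needs 4 * 80 / 50 = 6.4 instances, rounded up to 7.
    """
    if current <= 0 or target <= 0:
        raise ValueError("current and target must be positive")
    raw = math.ceil(current * metric / target)
    # Never go outside the Auto Scaling group's min/max size.
    return max(min_size, min(max_size, raw))


print(desired_capacity(4, 80.0, 50.0, min_size=2, max_size=10))  # 7
print(desired_capacity(4, 10.0, 50.0, min_size=2, max_size=10))  # 2 (clamped to min)
```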

Lambda

Invocation models: Synchronous, asynchronous, event source mapping
Concurrency: Reserved, provisioned, burst limits
Layers for shared dependencies
Best practices: Keep functions small, use environment variables, set timeouts
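AWS documents steady-state Lambda concurrency as roughly average requests per second multiplied by average duration in seconds. A minimal sketch of that estimate, with a check against a function's reserved concurrency (helper names are hypothetical):

```python
import math


def required_concurrency(requests_per_second: float, avg_duration_ms: float) -> int:
    """Estimate concurrent executions: requests/s x average duration (s)."""
    return math.ceil(requests_per_second * avg_duration_ms / 1000.0)


def fits_reserved(requests_per_second: float, avg_duration_ms: float, reserved: int) -> bool:
    """True if the estimate stays within the function's reserved concurrency.

    Requests beyond reserved concurrency are throttled.
    """
    return required_concurrency(requests_per_second, avg_duration_ms) <= reserved


print(required_concurrency(100, 500))  # 50
print(fits_reserved(100, 500, 40))     # False
```

Halving average duration halves the concurrency needed, which is one reason "keep functions small" and "set timeouts" matter beyond cost.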

ECS/EKS (Container Services)

ECS: Fargate for serverless, EC2 for control
EKS: Managed Kubernetes with AWS integration
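Service mesh: App Mesh for observability (AWS has announced end of support for App Mesh on September 30, 2026; ECS Service Connect is the suggested path for ECS workloads)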
Best practices: Use Fargate for simplicity, EKS for portability

Elastic Beanstalk

Managed platform for web apps
Auto-scaling and load balancing included
Support for multiple languages and Docker

Storage

S3 (Simple Storage Service)

Storage classes: Standard, Intelligent-Tiering, Standard-IA, One Zone-IA, Glacier Instant Retrieval, Glacier Flexible Retrieval, Glacier Deep Archive
Lifecycle policies for automatic tiering
Versioning and MFA delete for protection
Cross-region replication for DR
Best practices: Enable versioning, use lifecycle policies, block public access
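Lifecycle policies are easiest to review as data. The sketch below builds a rule in the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts, and validates the 30-day minimum S3 enforces for lifecycle transitions into Standard-IA and One Zone-IA. The rule ID, prefix, day counts, and bucket name are hypothetical, and the boto3 call is left commented out because it needs credentials.

```python
import json

# Lifecycle rules cannot move objects younger than 30 days into the
# infrequent-access classes.
IA_CLASSES = {"STANDARD_IA", "ONEZONE_IA"}


def lifecycle_rule(rule_id, prefix, transitions, expire_days=None):
    """Build one rule in the dict shape boto3's put_bucket_lifecycle_configuration expects.

    transitions: list of (days, storage_class) tuples in ascending day order.
    """
    days = [d for d, _ in transitions]
    if days != sorted(days):
        raise ValueError("transition days must be ascending")
    for d, storage_class in transitions:
        if storage_class in IA_CLASSES and d < 30:
            raise ValueError(f"{storage_class} needs objects at least 30 days old, got {d}")
    if expire_days is not None and days and expire_days <= days[-1]:
        raise ValueError("expiration should come after the last transition")
    rule = {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": d, "StorageClass": c} for d, c in transitions],
    }
    if expire_days is not None:
        rule["Expiration"] = {"Days": expire_days}
    return rule


# Hypothetical rule: tier log objects down over time, delete after about 7 years.
config = {"Rules": [lifecycle_rule(
    "archive-logs", "logs/",
    [(30, "STANDARD_IA"), (90, "GLACIER"), (365, "DEEP_ARCHIVE")],
    expire_days=2555,
)]}
print(json.dumps(config, indent=2))

# Applying it requires boto3 and AWS credentials (bucket name is hypothetical):
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-log-bucket", LifecycleConfiguration=config)
```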

EBS (Elastic Block Store)

Volume types: gp3 (general purpose SSD), io2 (provisioned IOPS SSD), st1 (throughput-optimized HDD), sc1 (cold HDD)
Snapshots to S3 for backup
Encryption by default (an opt-in setting, enabled per account and Region)
Best practices: Use gp3 for most workloads, enable encryption
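One reason gp3 suits most workloads: gp2 baseline IOPS scale with volume size (3 IOPS per GiB, with a floor of 100 and a cap of 16,000), while every gp3 volume includes a 3,000 IOPS baseline regardless of size. A small sketch of that comparison:

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline: 3 IOPS per GiB, floor 100, cap 16,000.

    (gp2 volumes under 1,000 GiB can also burst to 3,000 IOPS on I/O credits.)
    """
    return min(16_000, max(100, 3 * size_gib))


GP3_BASELINE_IOPS = 3_000  # included with every gp3 volume, independent of size

for size in (20, 500, 1_000, 6_000):
    print(f"{size:>5} GiB  gp2 baseline={gp2_baseline_iops(size):>6}  gp3 baseline={GP3_BASELINE_IOPS}")
```

Small gp2 volumes were often over-provisioned in size just to reach an IOPS target; with gp3, capacity and performance are sized independently.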

EFS (Elastic File System)

NFSv4 file system for shared access
Performance modes: General purpose, Max I/O
Throughput modes: Elastic, Provisioned, Bursting
Best practices: Use lifecycle management, enable encryption

FSx

FSx for Windows File Server (SMB)
FSx for Lustre (HPC workloads)
FSx for NetApp ONTAP
FSx for OpenZFS

Database

RDS (Relational Database Service)

Engines: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, Aurora
Multi-AZ for high availability
Read replicas for scalability
Automated backups and point-in-time recovery
Best practices: Use Aurora for performance, enable Multi-AZ, use read replicas

Aurora

MySQL and PostgreSQL compatible
Up to 5x the throughput of standard MySQL and 3x that of standard PostgreSQL (AWS's published figures)
Global databases for cross-region DR
Serverless v2 for variable workloads
Best practices: Use Aurora Serverless v2 for unpredictable workloads

DynamoDB

NoSQL key-value and document database
On-demand or provisioned capacity
Global tables for multi-region replication
DynamoDB Streams for change data capture
Best practices: Use on-demand for unpredictable traffic; design GSIs carefully, since every write to the table also consumes write capacity on each affected index
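For provisioned capacity, sizing follows fixed unit rules: one read capacity unit (RCU) covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads), one write capacity unit (WCU) covers one write per second of up to 1 KB, and transactional operations cost twice as much. A sketch of that arithmetic, with hypothetical helper names:

```python
import math


def read_capacity_units(item_kb: float, reads_per_sec: int, consistency: str = "strong") -> int:
    """RCUs needed: item size rounds up to 4 KB units; eventual reads cost half, transactional double."""
    units_per_read = math.ceil(item_kb / 4)
    factor = {"eventual": 0.5, "strong": 1, "transactional": 2}[consistency]
    return math.ceil(units_per_read * reads_per_sec * factor)


def write_capacity_units(item_kb: float, writes_per_sec: int, transactional: bool = False) -> int:
    """WCUs needed: item size rounds up to 1 KB units; transactional writes cost double."""
    units_per_write = math.ceil(item_kb)
    return units_per_write * writes_per_sec * (2 if transactional else 1)


print(read_capacity_units(6, 100))              # 200: a 6 KB item needs 2 units per read
print(read_capacity_units(6, 100, "eventual"))  # 100
print(write_capacity_units(2.5, 10))            # 30: a 2.5 KB item needs 3 units per write
```

Each GSI that projects a written item consumes its own WCUs on top of these, which is the cost the GSI best practice refers to.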

ElastiCache

Valkey, Redis OSS, or Memcached in-memory caching
Cluster mode for Redis scalability
Best practices: Use for session storage, API caching
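API and session caching usually follows the cache-aside pattern: read the cache, fall back to the database on a miss, then write the result back with a TTL. The sketch below uses a hypothetical in-memory `TTLCache` as a stand-in for an ElastiCache client, plus a controllable clock so expiry can be demonstrated without waiting.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for an ElastiCache client (hypothetical).

    A real deployment would issue the engine's set-with-expiry command
    against the cluster endpoint; here entries carry an expiry timestamp.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data: Dict[str, Tuple[float, Any]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily drop expired entries on read
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._data[key] = (self._clock() + ttl_seconds, value)


def get_with_cache_aside(cache: TTLCache, key: str,
                         load_from_db: Callable[[str], Any],
                         ttl_seconds: float = 300) -> Any:
    """Cache-aside read: try the cache, load from the database on a miss,
    then populate the cache with a TTL so stale data eventually expires."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)
        cache.set(key, value, ttl_seconds)
    return value


# Usage with a controllable clock and a fake database loader (both hypothetical).
now = [0.0]
db_calls = []


def fake_db(key: str) -> str:
    db_calls.append(key)
    return f"profile:{key}"


cache = TTLCache(clock=lambda: now[0])
get_with_cache_aside(cache, "user-1", fake_db, ttl_seconds=60)  # miss: loads from DB
get_with_cache_aside(cache, "user-1", fake_db, ttl_seconds=60)  # hit
now[0] = 61.0
get_with_cache_aside(cache, "user-1", fake_db, ttl_seconds=60)  # expired: reloads
print(db_calls)  # ['user-1', 'user-1']
```

The TTL bounds how stale a cached value can get; shorter TTLs trade more database reads for fresher data.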

Networking

VPC (Virtual Private Cloud)

CIDR planning: Avoid overlaps, plan for growth
Subnets: Public (route to an internet gateway), private (outbound through a NAT gateway), isolated (no internet route)
Route tables and routing decisions
Security groups (stateful) and NACLs (stateless)
Best practices: Use /16 for VPC, /24 for subnets, plan IP space
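CIDR planning can be checked with Python's standard `ipaddress` module. The sketch carves a /16 into /24 subnets, accounts for the five addresses AWS reserves in every subnet (the first four and the last), and tests a proposed range for overlap before peering or a Transit Gateway attachment. The example ranges are hypothetical.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses and the last one in each subnet

vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = list(vpc.subnets(new_prefix=24))
usable = subnets[0].num_addresses - AWS_RESERVED_PER_SUBNET
print(len(subnets), "subnets of /24, each with", usable, "usable IPs")  # 256 ... 251


def overlaps_any(candidate: str, existing: list[str]) -> bool:
    """True if the candidate CIDR overlaps any existing VPC or on-premises range."""
    net = ipaddress.ip_network(candidate)
    return any(net.overlaps(ipaddress.ip_network(e)) for e in existing)


in_use = ["10.0.0.0/16", "192.168.0.0/16"]  # hypothetical existing ranges
print(overlaps_any("10.0.128.0/17", in_use))  # True
print(overlaps_any("10.1.0.0/16", in_use))    # False
```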

Route 53

DNS service with health checks
Routing policies: Simple, weighted, latency, failover, geolocation
Best practices: Use alias records, enable DNSSEC

CloudFront

Global CDN with edge locations
Origin types: S3, ALB, custom origins
Lambda@Edge for request/response manipulation
Best practices: Enable compression; use field-level encryption for sensitive form fields

VPN and Direct Connect

Site-to-Site VPN for encrypted tunnels
Direct Connect for dedicated bandwidth
Transit Gateway for hub-and-spoke topology
Best practices: Use Direct Connect for high bandwidth, Transit Gateway for complex routing

API Gateway

REST APIs, HTTP APIs, WebSocket APIs
Throttling and quotas
Integration with Lambda, HTTP endpoints, AWS services
Best practices: Use HTTP APIs for lower cost and latency; use REST APIs when you need response caching, which HTTP APIs do not support
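API Gateway describes its throttling in token-bucket terms: a steady-state rate that refills tokens and a burst limit that caps the bucket. The sketch below is an illustrative token bucket under those two parameters, not API Gateway's implementation; the class name and the explicit `now` argument (used so the behavior is deterministic) are assumptions.

```python
class TokenBucket:
    """Token bucket: `rate` tokens refill per second, `burst` is the capacity."""

    def __init__(self, rate: float, burst: int, now: float = 0.0):
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)  # start full, so a burst is allowed immediately
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, never beyond capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # API Gateway answers throttled requests with HTTP 429


bucket = TokenBucket(rate=10, burst=5)
burst_results = [bucket.allow(now=0.0) for _ in range(7)]
print(burst_results)          # five True, then two False
print(bucket.allow(now=0.5))  # True: 0.5 s x 10 tokens/s refilled the bucket
```

The burst limit absorbs short spikes; sustained traffic above the rate limit is throttled once the bucket drains.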

Security

IAM (Identity and Access Management)

Principle of least privilege
Roles for applications, not access keys
MFA for privileged users
Service Control Policies (SCPs) for organization-wide controls
Best practices: Use roles, enable MFA, rotate credentials
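Least privilege is easiest to reason about with a concrete policy document. The sketch pairs a narrowly scoped identity policy (the bucket, prefix, and role purpose are hypothetical) with a deliberately simplified evaluator showing the core rule: explicit Deny wins, otherwise a matching Allow, otherwise implicit deny. Real IAM evaluation also considers conditions, SCPs, permission boundaries, session and resource policies, which this sketch ignores.

```python
from fnmatch import fnmatchcase

# Least-privilege policy for a hypothetical reporting role: read objects
# under one prefix of a hypothetical bucket, with an explicit Deny carve-out.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": ["arn:aws:s3:::example-reports/monthly/*"],
        },
        {
            "Effect": "Deny",
            "Action": ["s3:*"],
            "Resource": ["arn:aws:s3:::example-reports/monthly/restricted/*"],
        },
    ],
}


def is_allowed(policy: dict, action: str, resource: str) -> bool:
    """Simplified evaluation: explicit Deny > matching Allow > implicit deny.

    Uses fnmatch-style wildcards as an approximation of IAM's * and ?.
    """
    allowed = False
    for stmt in policy["Statement"]:
        # IAM action names are case-insensitive; ARNs are compared case-sensitively.
        action_hit = any(fnmatchcase(action.lower(), a.lower()) for a in stmt["Action"])
        resource_hit = any(fnmatchcase(resource, r) for r in stmt["Resource"])
        if action_hit and resource_hit:
            if stmt["Effect"] == "Deny":
                return False
            allowed = True
    return allowed


report = "arn:aws:s3:::example-reports/monthly/2026-03.csv"
print(is_allowed(policy, "s3:GetObject", report))  # True
print(is_allowed(policy, "s3:PutObject", report))  # False (implicit deny)
```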

KMS (Key Management Service)

Customer managed keys (CMKs)
Automatic key rotation
Envelope encryption pattern
Best practices: Enable automatic rotation, use grants for temporary access

Secrets Manager

Automatic rotation for RDS credentials
Versioning and rollback
Best practices: Rotate secrets regularly, use VPC endpoints

Security Hub

Centralized security findings
CIS AWS Foundations Benchmark
Integration with GuardDuty, Inspector, Macie

GuardDuty

Threat detection using ML
Monitors CloudTrail, VPC Flow Logs, DNS logs

Architecture Patterns

High Availability

Multi-AZ Pattern

- Application Load Balancer across 3 AZs
- Auto Scaling group with instances in each AZ
- RDS Multi-AZ for database
- S3 for static assets (99.999999999%, or eleven nines, of durability)
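The value of spreading across AZs can be estimated with the standard redundancy formula: if any one healthy zone can carry the load, combined availability is 1 - (1 - a)^n. The sketch assumes zone failures are independent and that surviving zones have spare capacity, both of which real deployments only approximate; the 99% per-zone figure is a hypothetical input, not an AWS SLA.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def combined_availability(per_zone: float, zones: int) -> float:
    """Availability when one healthy zone suffices: 1 - (1 - a)^n.

    Assumes independent zone failures and enough surviving capacity.
    """
    return 1 - (1 - per_zone) ** zones


def annual_downtime_minutes(availability: float) -> float:
    """Expected minutes of downtime per year at a given availability."""
    return (1 - availability) * MINUTES_PER_YEAR


for n in (1, 2, 3):
    a = combined_availability(0.99, n)
    print(f"{n} AZ(s): availability={a:.6f}, downtime ~{annual_downtime_minutes(a):.2f} min/yr")
```

Shared components such as the load balancer, the database, and deployment pipelines are serial dependencies, so the end-to-end figure is lower than this per-tier estimate.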

Multi-Region Pattern

- Route 53 with health checks and failover
- CloudFront for global distribution
- Aurora Global Database for low RPO (cross-Region replication lag typically under 1 second)
- S3 Cross-Region Replication

Serverless Architecture

API-Driven Pattern

API Gateway -> Lambda -> DynamoDB
              |
              v
          EventBridge -> Lambda (async processing)

Event-Driven Pattern

S3 Event -> Lambda -> Process -> SNS
                                  |
                                  v
                              Multiple subscribers
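In the event-driven pattern, the Lambda function receives S3 event notifications whose object keys are URL-encoded (spaces arrive as `+`). The sketch shows a handler that extracts bucket and key from each record; the processing step, bucket name, and key are hypothetical, and a real function might read the object with boto3 and publish a summary to SNS.

```python
import json
from urllib.parse import unquote_plus


def handler(event, context):
    """Lambda entry point for S3 event notifications (sketch)."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys in S3 events are URL-encoded; decode before using them.
        key = unquote_plus(record["s3"]["object"]["key"])
        # Hypothetical processing step: here we only collect the object location.
        processed.append({"bucket": bucket, "key": key})
    return {"processed": processed}


sample_event = {
    "Records": [{
        "eventSource": "aws:s3",
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "example-uploads"},
            "object": {"key": "incoming/my+report%282026%29.csv", "size": 1024},
        },
    }]
}
print(json.dumps(handler(sample_event, None)))
```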

Microservices on AWS

Container-Based

ALB -> ECS Fargate (multiple services)
    |
    v
Service Discovery (Cloud Map)
    |
    v
RDS/DynamoDB per service

Service Mesh

App Mesh for traffic management
X-Ray for distributed tracing
CloudWatch Container Insights

Data Lake Architecture

Data Sources -> Kinesis Data Streams
                      |
                      v
              Kinesis Firehose
                      |
                      v
                S3 (raw bucket)
                      |
                      v
        Glue ETL or Lambda processing
                      |
                      v
            S3 (processed bucket)
                      |
                      v
          Athena/Redshift Spectrum
                      |
                      v
              QuickSight dashboards
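Athena and Redshift Spectrum scan less data when the S3 buckets are laid out in Hive-style `key=value` prefixes, because Glue crawlers and Athena can map those path segments to partition columns. The sketch below builds such a key, normalizing timestamps to UTC so each event lands in exactly one partition. It assumes the pipeline (for example the Glue job or a custom Firehose prefix) writes keys in this layout; the source name and file name are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def partitioned_key(source: str, ts: datetime, filename: str, prefix: str = "raw") -> str:
    """Build a Hive-style S3 key: prefix/source=.../year=.../month=.../day=.../file."""
    ts = ts.astimezone(timezone.utc)  # one timezone for all partitions
    return (f"{prefix}/source={source}/year={ts.year:04d}/"
            f"month={ts.month:02d}/day={ts.day:02d}/{filename}")


# 03:00 at UTC+8 on 22 April is still 21 April in UTC.
event_time = datetime(2026, 4, 22, 3, 0, tzinfo=timezone(timedelta(hours=8)))
print(partitioned_key("clickstream", event_time, "part-0000.json.gz"))
# raw/source=clickstream/year=2026/month=04/day=21/part-0000.json.gz
```

A query filtered on `year`, `month`, and `day` then reads only the matching prefixes instead of the whole bucket.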

Migration Strategies (6Rs)

Rehost (Lift-and-Shift)
- AWS Application Migration Service (MGN)
- Minimal changes, quick migration
- Use for legacy apps with compliance constraints
Replatform (Lift-Tinker-and-Shift)
- Migrate to RDS instead of self-managed databases
- Use Elastic Beanstalk instead of custom app servers
- Small optimizations during migration
Repurchase (Drop-and-Shop)
- Move to SaaS (e.g., Salesforce, Workday)
- Reduce maintenance burden
Refactor/Re-architect
- Modernize to serverless or containers
- Highest effort, highest benefit
- Use for applications that provide a competitive advantage
Retire
- Decommission unused applications
- Reduce attack surface and costs
Retain
- Keep on-premises temporarily
- Migrate later or keep for regulatory reasons

Landing Zone Design

AWS Control Tower

Multi-account strategy (AWS Organizations)
Account factory for standardization
Guardrails for governance (SCPs)
Centralized logging (CloudTrail, Config)

Account Structure

Root
├── Security OU
│   ├── Log Archive Account
│   └── Security Tooling Account
├── Infrastructure OU
│   ├── Network Account (Transit Gateway, VPN)
│   └── Shared Services Account
└── Workloads OU
    ├── Production Account
    ├── Staging Account
    └── Development Account

Network Design

Transit Gateway (hub)
    |
    ├── Production VPC
    ├── Staging VPC
    ├── Development VPC
    └── On-premises (Direct Connect/VPN)

Cost Optimization Strategies

Compute Savings

Compute Savings Plans (up to 66% savings)
EC2 Reserved Instances (1-year or 3-year)
Spot Instances for batch/fault-tolerant workloads
Lambda: Right-size memory (less memory is not always cheaper, because duration can increase); use reserved concurrency to cap runaway spend
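Commitments only pay off above a certain utilization: a Savings Plan or Reserved Instance bills every hour of the term, while on-demand bills only the hours used, so the break-even point is a utilization of (1 - discount). Savings Plans commit to a dollar-per-hour spend rather than specific instances, but the reasoning is the same. The rate and discount below are hypothetical; "up to 66%" is a ceiling, and actual discounts depend on term, payment option, and instance type.

```python
def break_even_utilization(discount: float) -> float:
    """Utilization above which a commitment beats on-demand."""
    return 1.0 - discount


def hourly_costs(on_demand_rate: float, discount: float, utilization: float):
    """Return (on_demand_cost, committed_cost) averaged per hour of the term.

    On-demand bills only the hours used; the commitment bills every hour
    at the discounted rate whether or not capacity is used.
    """
    on_demand = on_demand_rate * utilization
    committed = on_demand_rate * (1.0 - discount)
    return on_demand, committed


# Hypothetical $0.10/hour on-demand rate with a 40% commitment discount.
print(f"break-even utilization: {break_even_utilization(0.40):.0%}")
for util in (0.5, 0.9):
    od, committed = hourly_costs(0.10, 0.40, util)
    print(f"utilization {util:.0%}: on-demand ${od:.3f}/h vs committed ${committed:.3f}/h")
```

This is why commitments are usually sized to the steady baseline, with Spot or on-demand covering the variable remainder.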

Storage Savings

S3 Intelligent-Tiering for unpredictable access
Lifecycle policies to Glacier/Deep Archive
EBS gp3 instead of gp2 (up to 20% lower price per GB, with baseline IOPS and throughput independent of volume size)
Delete unused snapshots and volumes

Database Savings

Aurora Serverless v2 for variable workloads
RDS Reserved Instances
DynamoDB on-demand for unpredictable workloads
Read replicas in the same AZ as the applications reading from them, to reduce cross-AZ data transfer

Monitoring and Alerting

AWS Cost Explorer for analysis
AWS Budgets for alerts
Cost Anomaly Detection
Trusted Advisor for recommendations

Disaster Recovery

RPO and RTO Targets

Backup and Restore: Hours RPO/RTO (lowest cost)
Pilot Light: Minutes RPO, hours RTO
Warm Standby: Seconds RPO, minutes RTO
Multi-Site Active/Active: Near-zero RPO/RTO (highest cost)
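Choosing a DR strategy amounts to picking the cheapest tier whose RPO and RTO both meet the business targets. The sketch encodes the four tiers in cost order; because the list above gives only "hours", "minutes", and "seconds", the concrete second values are illustrative assumptions, not AWS figures.

```python
from typing import Optional

# Assumed worst-case (RPO, RTO) in seconds per strategy, ordered from
# lowest to highest cost. The numbers are illustrative assumptions.
DR_STRATEGIES = [
    ("Backup and Restore",       24 * 3600, 24 * 3600),
    ("Pilot Light",              30 * 60,    4 * 3600),
    ("Warm Standby",             60,        30 * 60),
    ("Multi-Site Active/Active", 1,          60),
]


def cheapest_strategy(rpo_target_s: float, rto_target_s: float) -> Optional[str]:
    """Return the lowest-cost strategy whose assumed RPO and RTO meet both targets."""
    for name, rpo, rto in DR_STRATEGIES:
        if rpo <= rpo_target_s and rto <= rto_target_s:
            return name
    return None  # no strategy meets the targets under these assumptions


print(cheapest_strategy(rpo_target_s=3600, rto_target_s=8 * 3600))  # Pilot Light
print(cheapest_strategy(rpo_target_s=300, rto_target_s=3600))       # Warm Standby
```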

Implementation

AWS Backup for centralized backup management
Aurora Global Database for cross-region replication
S3 Cross-Region Replication
Route 53 health checks and failover routing
Regular DR tests that rebuild environments from CloudFormation/Terraform templates

Monitoring and Observability

CloudWatch

Metrics: EC2 basic monitoring (5-minute) and detailed monitoring (1-minute); custom metrics can be high resolution (down to 1 second)
Alarms with SNS notifications
Logs Insights for log analysis
Dashboards for visualization
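CloudWatch alarms can be configured as "M out of N": the alarm fires when at least M (datapoints to alarm) of the last N evaluation periods breach the threshold, which filters out single-sample spikes. The sketch replays a metric series through that rule for a greater-than threshold. It is a simplification: the real service has configurable missing-data treatment, and reporting `INSUFFICIENT_DATA` until N datapoints exist is an assumption here.

```python
from collections import deque


def alarm_states(values, threshold, datapoints_to_alarm, evaluation_periods):
    """Replay a metric series through an 'M out of N' GreaterThanThreshold alarm."""
    window = deque(maxlen=evaluation_periods)  # keeps only the last N breach flags
    states = []
    for v in values:
        window.append(v > threshold)
        if len(window) < evaluation_periods:
            states.append("INSUFFICIENT_DATA")  # simplification, see lead-in
        elif sum(window) >= datapoints_to_alarm:
            states.append("ALARM")
        else:
            states.append("OK")
    return states


cpu = [50, 85, 90, 60, 95, 40]  # hypothetical CPU utilization samples
print(alarm_states(cpu, threshold=80, datapoints_to_alarm=2, evaluation_periods=3))
```

Each state change to ALARM is what would trigger the SNS notification configured on the alarm.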

X-Ray

Distributed tracing for microservices
Service map visualization
Trace annotations and metadata

AWS Config

Resource inventory and change tracking
Compliance rules evaluation
Relationship tracking between resources
