AWS Architecture Reference
Comprehensive guide for AWS services, patterns, and Well-Architected Framework implementation.
Well-Architected Framework
Six Pillars
Operational Excellence
- Infrastructure as Code (CloudFormation, CDK, Terraform)
- Continuous integration/deployment
- Observability (CloudWatch, X-Ray)
- Runbooks and playbooks
- Game days and failure injection
Security
- Identity and Access Management (IAM)
- Detective controls (GuardDuty, Security Hub)
- Infrastructure protection (VPC, security groups, NACLs)
- Data protection (KMS, encryption at rest/transit)
- Incident response automation
Reliability
- Multi-AZ deployments
- Auto Scaling groups
- Route 53 health checks and failover
- Backup and restore (AWS Backup)
- Chaos engineering (AWS FIS)
Performance Efficiency
- Right-sizing with Compute Optimizer
- Caching strategies (CloudFront, ElastiCache)
- Database optimization (RDS Performance Insights)
- Serverless architectures
- Global content delivery
Cost Optimization
- Reserved Instances and Savings Plans
- Spot Instances for fault-tolerant workloads
- S3 Intelligent-Tiering and lifecycle policies
- Right-sizing recommendations
- Cost allocation tags and budgets
Sustainability
- Region selection for renewable energy
- Serverless to minimize idle resources
- Efficient data storage patterns
- Resource utilization optimization
Core Services Architecture
Compute
EC2 (Elastic Compute Cloud)
- Instance families: General purpose (t3, m5), compute optimized (c5), memory optimized (r5), GPU (p3, g4dn)
- Auto Scaling: Target tracking, step scaling, scheduled scaling
- Placement groups: Cluster, partition, spread
- Best practices: Use latest generation, right-size, enable detailed monitoring
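As an illustration of target tracking scaling, the sketch below builds the keyword arguments in the shape that boto3's Auto Scaling `put_scaling_policy` call accepts. The group name `web-asg`, the policy naming scheme, and the 50% CPU target are assumptions made for this example; nothing here calls AWS.

```python
import json


def target_tracking_policy(asg_name: str, target_cpu_percent: float) -> dict:
    """Build arguments for a CPU-based target tracking scaling policy.

    The returned dict mirrors the parameters of boto3's
    autoscaling.put_scaling_policy; it is only constructed here, not sent.
    """
    if not 0 < target_cpu_percent <= 100:
        raise ValueError("target_cpu_percent must be in (0, 100]")
    return {
        "AutoScalingGroupName": asg_name,
        # Hypothetical naming convention, chosen for this sketch.
        "PolicyName": f"{asg_name}-cpu-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": float(target_cpu_percent),
        },
    }


policy = target_tracking_policy("web-asg", 50)
print(json.dumps(policy, indent=2))
```

With boto3 installed and credentials configured, the dict could be passed as `client.put_scaling_policy(**policy)`.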
Lambda
- Invocation models: Synchronous, asynchronous, event source mapping
- Concurrency: Reserved, provisioned, burst limits
- Layers for shared dependencies
- Best practices: Keep functions small, use environment variables, set timeouts
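A minimal sketch of a small Lambda function that takes its configuration from an environment variable. The variable name `TABLE_NAME`, its fallback value, and the response shape are assumptions for illustration; the timeout itself is set in the function configuration rather than in code.

```python
import json
import os

# TABLE_NAME is a hypothetical setting; reading it at import time means it is
# resolved once per execution environment rather than on every invocation.
TABLE_NAME = os.environ.get("TABLE_NAME", "example-table")


def handler(event, context):
    """Lambda entry point: a single, narrowly scoped responsibility."""
    name = (event or {}).get("name", "world")
    body = {"message": f"hello, {name}", "table": TABLE_NAME}
    return {"statusCode": 200, "body": json.dumps(body)}
```

Invoking `handler({"name": "aws"}, None)` locally exercises the same code path that Lambda would run.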
ECS/EKS (Container Services)
- ECS: Fargate for serverless, EC2 for control
- EKS: Managed Kubernetes with AWS integration
- Service mesh: App Mesh for observability
- Best practices: Use Fargate for simplicity, EKS for portability
Elastic Beanstalk
- Managed platform for web apps
- Auto-scaling and load balancing included
- Support for multiple languages and Docker
Storage
S3 (Simple Storage Service)
- Storage classes: Standard, Intelligent-Tiering, Standard-IA, One Zone-IA, Glacier Instant Retrieval, Glacier Flexible Retrieval, Glacier Deep Archive
- Lifecycle policies for automatic tiering
- Versioning and MFA delete for protection
- Cross-region replication for DR
- Best practices: Enable versioning, use lifecycle policies, block public access
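The lifecycle tiering described above can be sketched as a configuration document in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The prefix, day counts, and rule ID scheme are assumptions for this example. The 30-day minimum before a Standard-IA transition reflects an S3 constraint; the ordering check is only a sanity check of this sketch.

```python
def lifecycle_configuration(prefix: str, ia_days: int = 30,
                            glacier_days: int = 90,
                            noncurrent_expire_days: int = 365) -> dict:
    """Build an S3 lifecycle configuration that tiers objects under `prefix`."""
    if ia_days < 30:
        # S3 requires objects to be stored at least 30 days before they
        # can transition to Standard-IA.
        raise ValueError("ia_days must be at least 30")
    if glacier_days <= ia_days:
        raise ValueError("glacier_days must be greater than ia_days")
    return {
        "Rules": [
            {
                # Hypothetical rule naming scheme.
                "ID": f"tier-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
                # Only meaningful when bucket versioning is enabled.
                "NoncurrentVersionExpiration": {
                    "NoncurrentDays": noncurrent_expire_days,
                },
            }
        ]
    }


config = lifecycle_configuration("logs/")
```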
EBS (Elastic Block Store)
- Volume types: gp3 (general purpose SSD), io2 (provisioned IOPS SSD), st1 (throughput optimized HDD), sc1 (cold HDD)
- Snapshots to S3 for backup
- Encryption by default (an opt-in account setting, enabled per Region)
- Best practices: Use gp3 for most workloads, enable encryption
EFS (Elastic File System)
- NFSv4 file system for shared access
- Performance modes: General purpose, Max I/O
- Throughput modes: Elastic, bursting, provisioned
- Best practices: Use lifecycle management, enable encryption
FSx
- FSx for Windows File Server (SMB)
- FSx for Lustre (HPC workloads)
- FSx for NetApp ONTAP
- FSx for OpenZFS
Database
RDS (Relational Database Service)
- Engines: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, Aurora
- Multi-AZ for high availability
- Read replicas for scalability
- Automated backups and point-in-time recovery
- Best practices: Use Aurora for performance, enable Multi-AZ, use read replicas
Aurora
- MySQL and PostgreSQL compatible
- Up to 5x the throughput of standard MySQL and up to 3x that of standard PostgreSQL (per AWS)
- Global databases for cross-region DR
- Serverless v2 for variable workloads
- Best practices: Use Aurora Serverless for unpredictable workloads
DynamoDB
- NoSQL key-value and document database
- On-demand or provisioned capacity
- Global tables for multi-region replication
- DynamoDB Streams for change data capture
- Best practices: Use on-demand for unpredictable traffic, design global secondary indexes (GSIs) carefully
ElastiCache
- Redis or Memcached in-memory caching
- Cluster mode for Redis scalability
- Best practices: Use for session storage, API caching
Networking
VPC (Virtual Private Cloud)
- CIDR planning: Avoid overlaps, plan for growth
- Subnets: Public (IGW), private (NAT), isolated (no internet)
- Route tables and routing decisions
- Security groups (stateful) and NACLs (stateless)
- Best practices: Use /16 for VPC, /24 for subnets, plan IP space
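A small sketch of the CIDR planning above, using only Python's standard `ipaddress` module. The `10.0.0.0/16` range and the count of six subnets are assumptions for illustration. AWS reserves five addresses in every subnet, so a /24 yields 251 usable addresses rather than 256.

```python
import ipaddress
from itertools import islice

# AWS reserves the first four addresses and the last address of each subnet.
AWS_RESERVED_PER_SUBNET = 5


def plan_subnets(vpc_cidr: str, new_prefix: int, count: int) -> list:
    """Carve the first `count` subnets of size /new_prefix out of a VPC CIDR."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = list(islice(vpc.subnets(new_prefix=new_prefix), count))
    if len(subnets) < count:
        raise ValueError(f"{vpc_cidr} cannot hold {count} /{new_prefix} subnets")
    return subnets


def usable_addresses(subnet) -> int:
    """Addresses left for workloads after the AWS-reserved ones."""
    return subnet.num_addresses - AWS_RESERVED_PER_SUBNET


def overlaps(cidr_a: str, cidr_b: str) -> bool:
    """True if two CIDR blocks share any address (blocks peering or TGW routing)."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


subnets = plan_subnets("10.0.0.0/16", 24, 6)
for subnet in subnets:
    print(subnet, usable_addresses(subnet))
```

Running `overlaps` across every pair of VPC CIDRs before creating a new VPC is one way to catch conflicts early.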
Route 53
- DNS service with health checks
- Routing policies: Simple, weighted, latency, failover, geolocation
- Best practices: Use alias records, enable DNSSEC
CloudFront
- Global CDN with edge locations
- Origin types: S3, ALB, custom origins
- Lambda@Edge for request/response manipulation
- Best practices: Enable compression, use field-level encryption
VPN and Direct Connect
- Site-to-Site VPN for encrypted tunnels
- Direct Connect for dedicated bandwidth
- Transit Gateway for hub-and-spoke topology
- Best practices: Use Direct Connect for high bandwidth, Transit Gateway for complex routing
API Gateway
- REST APIs, HTTP APIs, WebSocket APIs
- Throttling and quotas
- Integration with Lambda, HTTP endpoints, AWS services
- Best practices: Use HTTP APIs for lower cost; enable stage caching on REST APIs (HTTP APIs have no built-in cache)
Security
IAM (Identity and Access Management)
- Principle of least privilege
- IAM roles for applications instead of long-lived access keys
- MFA for privileged users
- Service Control Policies (SCPs) for organization-wide controls
- Best practices: Use roles, enable MFA, rotate credentials
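To make least privilege concrete, here is a sketch that builds a read-only S3 policy scoped to one bucket and prefix, plus a simple check that flags wildcard actions. The bucket name, prefix, and helper names are hypothetical; the policy grammar (`Version`, `Statement`, `Effect`, `Action`, `Resource`) is the standard IAM JSON policy format.

```python
import json


def read_only_bucket_policy(bucket: str, prefix: str = "") -> dict:
    """IAM policy allowing list and read access to one bucket/prefix only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


def has_wildcard_action(policy: dict) -> bool:
    """Flag Allow statements whose actions are '*' or 'service:*'."""
    for statement in policy["Statement"]:
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if statement.get("Effect") == "Allow" and any(
            action == "*" or action.endswith(":*") for action in actions
        ):
            return True
    return False


policy = read_only_bucket_policy("example-reports", "2024/")
print(json.dumps(policy, indent=2))
```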
KMS (Key Management Service)
- Customer managed KMS keys (AWS formerly used the term CMK, for customer master key)
- Automatic key rotation
- Envelope encryption pattern
- Best practices: Enable automatic rotation, use grants for temporary access
Secrets Manager
- Automatic rotation for RDS credentials
- Versioning and rollback
- Best practices: Rotate secrets regularly, use VPC endpoints
Security Hub
- Centralized security findings
- CIS AWS Foundations Benchmark
- Integration with GuardDuty, Inspector, Macie
GuardDuty
- Threat detection using ML
- Monitors CloudTrail, VPC Flow Logs, DNS logs
Architecture Patterns
High Availability
Multi-AZ Pattern
- Application Load Balancer across 3 AZs
- Auto Scaling group with instances in each AZ
- RDS Multi-AZ for database
- S3 for static assets (designed for 99.999999999%, or eleven 9s, durability)
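A quick worked calculation of what eleven 9s of annual durability implies. The object count of 10 million is an illustrative assumption, and the model treats the design target as an independent per-object annual loss probability, which is a simplification.

```python
from fractions import Fraction

# Eleven 9s of durability: annual loss probability of 1 in 10^11 per object.
ANNUAL_LOSS_PROBABILITY = Fraction(1, 10**11)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year under the design target."""
    return object_count * ANNUAL_LOSS_PROBABILITY


objects = 10_000_000  # illustrative assumption
losses_per_year = expected_annual_losses(objects)
years_per_expected_loss = 1 / losses_per_year

print(f"Expected losses per year: {float(losses_per_year)}")
print(f"Roughly one object lost every {int(years_per_expected_loss)} years")
```

Exact fractions are used instead of floats so that the tiny probability does not pick up rounding error.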
Multi-Region Pattern
- Route 53 with health checks and failover
- CloudFront for global distribution
- Aurora Global Database for cross-Region replication (typical lag under 1 second, so an RPO of about 1 second)
- S3 Cross-Region Replication
Serverless Architecture
API-Driven Pattern
API Gateway -> Lambda -> DynamoDB
|
v
EventBridge -> Lambda (async processing)
Event-Driven Pattern
S3 Event -> Lambda -> Process -> SNS
|
v
Multiple subscribers
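The first hop of the event-driven pattern can be sketched as a Lambda handler that parses an S3 event notification. The processing and SNS publish steps are left as a comment because they depend on the workload; the return shape and the sample event values are assumptions for illustration. Object keys arrive URL-encoded in S3 events, so they are decoded before use.

```python
from urllib.parse import unquote_plus


def extract_objects(event: dict) -> list:
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3")
        if not s3:
            continue  # skip records that are not S3 notifications
        bucket = s3["bucket"]["name"]
        key = unquote_plus(s3["object"]["key"])  # keys are URL-encoded
        objects.append((bucket, key))
    return objects


def handler(event, context):
    """Lambda entry point for the S3 Event -> Lambda step."""
    objects = extract_objects(event)
    # Processing and publishing to SNS would happen here (for example via
    # boto3's SNS client); omitted so the sketch stays self-contained.
    return {"processed": [f"s3://{bucket}/{key}" for bucket, key in objects]}


sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-uploads"},
                "object": {"key": "reports/2024/q1+summary.csv"}}}
    ]
}
result = handler(sample_event, None)
```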
Microservices on AWS
Container-Based
ALB -> ECS Fargate (multiple services)
|
v
Service Discovery (Cloud Map)
|
v
RDS/DynamoDB per service
Service Mesh
App Mesh for traffic management
X-Ray for distributed tracing
CloudWatch Container Insights
Data Lake Architecture
Data Sources -> Kinesis Data Streams
|
v
Kinesis Firehose
|
v
S3 (raw bucket)
|
v
Glue ETL or Lambda processing
|
v
S3 (processed bucket)
|
v
Athena/Redshift Spectrum
|
v
QuickSight dashboards
Migration Strategies (6Rs)
Rehost (Lift-and-Shift)
- AWS Application Migration Service (MGN)
- Minimal changes, quick migration
- Use for legacy apps with compliance constraints
Replatform (Lift-Tinker-and-Shift)
- Migrate to RDS instead of self-managed databases
- Use Elastic Beanstalk instead of custom app servers
- Small optimizations during migration
Repurchase (Drop-and-Shop)
- Move to SaaS (e.g., Salesforce, Workday)
- Reduce maintenance burden
Refactor/Re-architect
- Modernize to serverless or containers
- Highest effort, highest benefit
- Use for competitive advantage applications
Retire
- Decommission unused applications
- Reduce attack surface and costs
Retain
- Keep on-premises temporarily
- Migrate later or keep for regulatory reasons
Landing Zone Design
AWS Control Tower
- Multi-account strategy (AWS Organizations)
- Account factory for standardization
- Guardrails (controls) for governance: preventive via SCPs, detective via AWS Config rules
- Centralized logging (CloudTrail, Config)
Account Structure
Root
├── Security OU
│   ├── Log Archive Account
│   └── Security Tooling Account
├── Infrastructure OU
│   ├── Network Account (Transit Gateway, VPN)
│   └── Shared Services Account
└── Workloads OU
    ├── Production Account
    ├── Staging Account
    └── Development Account
Network Design
Transit Gateway (hub)
|
├── Production VPC
├── Staging VPC
├── Development VPC
└── On-premises (Direct Connect/VPN)
Cost Optimization Strategies
Compute Savings
- Compute Savings Plans (up to 66% savings)
- EC2 Reserved Instances (1-year or 3-year)
- Spot Instances for batch/fault-tolerant workloads
- Lambda: Right-size memory (lower memory can lengthen duration, so measure before reducing); use reserved concurrency to cap concurrent executions and spend
Storage Savings
- S3 Intelligent-Tiering for unpredictable access
- Lifecycle policies to Glacier/Deep Archive
- EBS gp3 instead of gp2 (up to 20% lower price per GB, with a 3,000 IOPS and 125 MiB/s baseline independent of volume size)
- Delete unused snapshots and volumes
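A rough comparison of gp2 and gp3 for a single volume. The per-GB prices below are illustrative assumptions (pricing varies by Region and changes over time), and the 500 GB size is arbitrary. The gp2 baseline of 3 IOPS per GiB (minimum 100, maximum 16,000) and the gp3 baseline of 3,000 IOPS are the published volume characteristics.

```python
from decimal import Decimal

# Illustrative per-GB-month prices; check current pricing for your Region.
GP2_PRICE_PER_GB = Decimal("0.10")
GP3_PRICE_PER_GB = Decimal("0.08")
GP3_BASELINE_IOPS = 3000


def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline scales with size: 3 IOPS per GiB, clamped to 100..16000."""
    return max(100, min(16000, 3 * size_gib))


def monthly_cost(size_gb: int, price_per_gb: Decimal) -> Decimal:
    """Storage-only monthly cost, ignoring extra provisioned IOPS/throughput."""
    return size_gb * price_per_gb


size = 500
gp2_cost = monthly_cost(size, GP2_PRICE_PER_GB)
gp3_cost = monthly_cost(size, GP3_PRICE_PER_GB)
savings_percent = (gp2_cost - gp3_cost) / gp2_cost * 100

print(f"gp2: ${gp2_cost}/month at {gp2_baseline_iops(size)} baseline IOPS")
print(f"gp3: ${gp3_cost}/month at {GP3_BASELINE_IOPS} baseline IOPS")
print(f"Savings: {savings_percent}%")
```

Under these assumed prices, the smaller the volume, the larger the performance gap in gp3's favor, since a gp2 volume needs 1,000 GiB to reach the same 3,000 IOPS baseline.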
Database Savings
- Aurora Serverless v2 for variable workloads
- RDS Reserved Instances
- DynamoDB on-demand for unpredictable workloads
- Keep read replicas in the same Region as the primary to avoid cross-Region transfer charges; place readers in the replica's AZ to limit cross-AZ traffic
Monitoring and Alerting
- AWS Cost Explorer for analysis
- AWS Budgets for alerts
- Cost Anomaly Detection
- Trusted Advisor for recommendations
Disaster Recovery
RPO and RTO Targets
- Backup and Restore: Hours RPO/RTO (lowest cost)
- Pilot Light: Minutes RPO, hours RTO
- Warm Standby: Seconds RPO, minutes RTO
- Multi-Site Active/Active: Near-zero RPO/RTO (highest cost)
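The tiers above can be turned into a simple selection helper: given the worst RPO and RTO a workload tolerates, pick the lowest-cost strategy that satisfies both. The ordinal `Tier` scale and the assumption that cost rises in list order are modeling choices of this sketch, not precise figures.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Coarse recovery tiers; a lower value means a tighter objective."""
    NEAR_ZERO = 0
    SECONDS = 1
    MINUTES = 2
    HOURS = 3


# (name, RPO tier, RTO tier), ordered from lowest to highest cost,
# following the list above.
STRATEGIES = [
    ("Backup and Restore", Tier.HOURS, Tier.HOURS),
    ("Pilot Light", Tier.MINUTES, Tier.HOURS),
    ("Warm Standby", Tier.SECONDS, Tier.MINUTES),
    ("Multi-Site Active/Active", Tier.NEAR_ZERO, Tier.NEAR_ZERO),
]


def cheapest_strategy(max_rpo: Tier, max_rto: Tier) -> str:
    """Return the first (cheapest) strategy meeting both objectives."""
    for name, rpo, rto in STRATEGIES:
        if rpo <= max_rpo and rto <= max_rto:
            return name
    # Not reachable with the table above, since the last entry is NEAR_ZERO.
    raise ValueError("no strategy satisfies the requested objectives")


print(cheapest_strategy(Tier.MINUTES, Tier.MINUTES))
```

Note that a workload tolerating minutes of data loss but only minutes of downtime lands on Warm Standby, because Pilot Light's RTO is too slow.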
Implementation
- AWS Backup for centralized backup management
- Aurora Global Database for cross-region replication
- S3 Cross-Region Replication
- Route 53 health checks and failover routing
- Regular DR testing with CloudFormation/Terraform
Monitoring and Observability
CloudWatch
- Metrics: EC2 basic monitoring (5-minute periods) and detailed monitoring (1-minute periods)
- Alarms with SNS notifications
- Logs Insights for log analysis
- Dashboards for visualization
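As a sketch of an alarm with an SNS notification, the function below builds arguments in the shape accepted by boto3's CloudWatch `put_metric_alarm`. The instance ID, topic ARN, threshold, and alarm naming scheme are hypothetical values for illustration; no AWS call is made.

```python
def cpu_alarm(instance_id: str, topic_arn: str,
              threshold_percent: float = 80.0,
              period_seconds: int = 300,
              evaluation_periods: int = 2) -> dict:
    """Build arguments for a high-CPU alarm on one EC2 instance."""
    if period_seconds < 60 or period_seconds % 60 != 0:
        # Standard-resolution alarms use periods that are multiples of 60s.
        raise ValueError("period_seconds must be a positive multiple of 60")
    return {
        "AlarmName": f"high-cpu-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],  # SNS topic notified when in ALARM
    }


alarm = cpu_alarm(
    "i-0123456789abcdef0",
    "arn:aws:sns:us-east-1:123456789012:ops-alerts",
)
# Sustained breach time needed before the alarm fires, in seconds.
seconds_to_alarm = alarm["Period"] * alarm["EvaluationPeriods"]
```

With the defaults, CPU must stay above 80% for two consecutive 5-minute periods before the alarm fires; pairing a 60-second period with detailed monitoring shortens that window.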
X-Ray
- Distributed tracing for microservices
- Service map visualization
- Trace annotations and metadata
AWS Config
- Resource inventory and change tracking
- Compliance rules evaluation
- Relationship tracking between resources