# AWS Architecture Reference

Comprehensive guide for AWS services, patterns, and Well-Architected Framework implementation.

## Well-Architected Framework

### Six Pillars

1. **Operational Excellence**
   - Infrastructure as Code (CloudFormation, CDK, Terraform)
   - Continuous integration/deployment
   - Observability (CloudWatch, X-Ray)
   - Runbooks and playbooks
   - Game days and failure injection

2. **Security**
   - Identity and Access Management (IAM)
   - Detective controls (GuardDuty, Security Hub)
   - Infrastructure protection (VPC, security groups, NACLs)
   - Data protection (KMS, encryption at rest/transit)
   - Incident response automation

3. **Reliability**
   - Multi-AZ deployments
   - Auto Scaling groups
   - Route 53 health checks and failover
   - Backup and restore (AWS Backup)
   - Chaos engineering (AWS FIS)

4. **Performance Efficiency**
   - Right-sizing with Compute Optimizer
   - Caching strategies (CloudFront, ElastiCache)
   - Database optimization (RDS Performance Insights)
   - Serverless architectures
   - Global content delivery

5. **Cost Optimization**
   - Reserved Instances and Savings Plans
   - Spot Instances for fault-tolerant workloads
   - S3 Intelligent-Tiering and lifecycle policies
   - Right-sizing recommendations
   - Cost allocation tags and budgets

6. **Sustainability**
   - Region selection for renewable energy
   - Serverless to minimize idle resources
   - Efficient data storage patterns
   - Resource utilization optimization

## Core Services Architecture

### Compute

**EC2 (Elastic Compute Cloud)**
- Instance families: General (t3, m5), Compute (c5), Memory (r5), GPU (p3, g4)
- Auto Scaling: Target tracking, step scaling, scheduled scaling
- Placement groups: Cluster, partition, spread
- Best practices: Use latest generation, right-size, enable detailed monitoring

**Lambda**
- Invocation models: Synchronous, asynchronous, event source mapping
- Concurrency: Reserved, provisioned, burst limits
- Layers for shared dependencies
- Best practices: Keep functions small, use environment variables, set timeouts

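The concurrency settings in the Lambda list above are easier to size with a rough estimate. A common approximation (Little's law) is concurrency ≈ requests per second × average duration in seconds. The sketch below applies it; the function name, the traffic figures, and the `headroom` factor are illustrative assumptions, not AWS defaults.

```python
import math


def estimate_concurrency(requests_per_second: float,
                         avg_duration_ms: float,
                         headroom: float = 1.2) -> int:
    """Estimate concurrent executions as rate x duration, padded by headroom."""
    if requests_per_second < 0 or avg_duration_ms < 0 or headroom < 1:
        raise ValueError("rate and duration must be >= 0, headroom must be >= 1")
    steady_state = requests_per_second * (avg_duration_ms / 1000.0)
    return math.ceil(steady_state * headroom)


# Hypothetical workload: 200 requests/s at 250 ms average duration.
needed = estimate_concurrency(200, 250)
print(f"Plan for roughly {needed} concurrent executions")
```

An estimate like this can inform a reserved or provisioned concurrency value, but measured CloudWatch concurrency metrics should take precedence once the function is running.
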
**ECS/EKS (Container Services)**
- ECS: Fargate for serverless, EC2 for control
- EKS: Managed Kubernetes with AWS integration
- Service mesh: App Mesh for observability
- Best practices: Use Fargate for simplicity, EKS for portability

**Elastic Beanstalk**
- Managed platform for web apps
- Auto-scaling and load balancing included
- Support for multiple languages and Docker

### Storage

**S3 (Simple Storage Service)**
- Storage classes: Standard, Intelligent-Tiering, Standard-IA, One Zone-IA, Glacier (Instant Retrieval, Flexible Retrieval), Glacier Deep Archive
- Lifecycle policies for automatic tiering
- Versioning and MFA Delete for protection
- Cross-Region Replication (CRR) for DR
- Best practices: Enable versioning, use lifecycle policies, block public access

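As an illustration of the lifecycle policies mentioned above, the following sketch builds a lifecycle configuration as a plain Python dictionary. The prefix, the day thresholds, and the rule ID are assumptions chosen for the example; the dictionary layout follows the shape accepted by boto3's `put_bucket_lifecycle_configuration`, which is not called here so the snippet runs without AWS credentials.

```python
import json


def build_lifecycle_rules(prefix: str) -> dict:
    """Return a lifecycle configuration that tiers objects down over time."""
    return {
        "Rules": [
            {
                "ID": f"tier-down-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Thresholds below are illustrative, not recommendations.
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
                # With versioning enabled, expire old versions as well.
                "NoncurrentVersionExpiration": {"NoncurrentDays": 90},
            }
        ]
    }


config = build_lifecycle_rules("logs/")
print(json.dumps(config, indent=2))
```

With boto3 installed, this dictionary could be passed as `LifecycleConfiguration=config` alongside a `Bucket` name.
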
**EBS (Elastic Block Store)**
- Volume types: gp3 (general purpose SSD), io2 (provisioned IOPS SSD), st1 (throughput-optimized HDD), sc1 (cold HDD)
- Snapshots (stored in S3) for backup
- Encryption by default (an opt-in account setting, enabled per Region)
- Best practices: Use gp3 for most workloads, enable encryption

**EFS (Elastic File System)**
- NFSv4 file system for shared access
- Performance modes: General purpose, Max I/O
- Throughput modes: Elastic, bursting, provisioned
- Best practices: Use lifecycle management, enable encryption

**FSx**
- FSx for Windows File Server (SMB)
- FSx for Lustre (HPC workloads)
- FSx for NetApp ONTAP
- FSx for OpenZFS

### Database

**RDS (Relational Database Service)**
- Engines: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, Aurora
- Multi-AZ for high availability
- Read replicas for scalability
- Automated backups and point-in-time recovery
- Best practices: Use Aurora for performance, enable Multi-AZ, use read replicas

**Aurora**
- MySQL and PostgreSQL compatible
- Up to 5x the throughput of standard MySQL and up to 3x that of standard PostgreSQL (AWS's stated figures)
- Global databases for cross-region DR
- Serverless v2 for variable workloads
- Best practices: Use Aurora Serverless for unpredictable workloads

**DynamoDB**
- NoSQL key-value and document database
- On-demand or provisioned capacity
- Global tables for multi-region replication
- DynamoDB Streams for change data capture
- Best practices: Use on-demand for unpredictable traffic, implement GSI carefully

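When weighing on-demand against provisioned capacity, it helps to estimate the capacity units a workload would need. The sketch below uses the standard unit definitions (one read capacity unit covers one strongly consistent read per second of an item up to 4 KB, an eventually consistent read costs half, and one write capacity unit covers one write per second of an item up to 1 KB). The function names and the traffic figures are assumptions for illustration.

```python
import math


def read_capacity_units(reads_per_second: int, item_size_kb: float,
                        strongly_consistent: bool = True) -> int:
    """RCUs needed: item size rounds up to the next 4 KB block."""
    units_per_read = math.ceil(item_size_kb / 4)
    total = reads_per_second * units_per_read
    # Eventually consistent reads consume half as much capacity.
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(writes_per_second: int, item_size_kb: float) -> int:
    """WCUs needed: item size rounds up to the next 1 KB block."""
    return writes_per_second * math.ceil(item_size_kb)


# Hypothetical workload: 100 reads/s of 6 KB items, 50 writes/s of 2.5 KB items.
print(read_capacity_units(100, 6))                             # 200
print(read_capacity_units(100, 6, strongly_consistent=False))  # 100
print(write_capacity_units(50, 2.5))                           # 150
```
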
**ElastiCache**
- Redis or Memcached in-memory caching
- Cluster mode for Redis scalability
- Best practices: Use for session storage, API caching

### Networking

**VPC (Virtual Private Cloud)**
- CIDR planning: Avoid overlaps, plan for growth
- Subnets: Public (IGW), private (NAT), isolated (no internet)
- Route tables and routing decisions
- Security groups (stateful) and NACLs (stateless)
- Best practices: Use /16 for VPC, /24 for subnets, plan IP space

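The CIDR guidance above (a /16 VPC carved into /24 subnets, with no overlaps) can be sketched with Python's standard `ipaddress` module. The AZ labels, tier names, and address ranges are assumptions for the example; the helper names are hypothetical.

```python
import ipaddress
from itertools import islice


def plan_subnets(vpc_cidr: str, azs: list, tiers: list,
                 subnet_prefix: int = 24) -> dict:
    """Assign one subnet block per (tier, AZ) pair from the VPC range."""
    vpc = ipaddress.ip_network(vpc_cidr)
    needed = len(azs) * len(tiers)
    blocks = list(islice(vpc.subnets(new_prefix=subnet_prefix), needed))
    if len(blocks) < needed:
        raise ValueError("VPC CIDR is too small for the requested subnets")
    keys = [(tier, az) for tier in tiers for az in azs]
    return dict(zip(keys, blocks))


def overlaps(cidr_a: str, cidr_b: str) -> bool:
    """True if two CIDR ranges share any address (e.g. before peering VPCs)."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


plan = plan_subnets("10.0.0.0/16", ["a", "b", "c"],
                    ["public", "private", "isolated"])
for (tier, az), block in plan.items():
    # AWS reserves 5 addresses in every subnet.
    print(f"{tier:<9} {az}  {block}  usable={block.num_addresses - 5}")

print(overlaps("10.0.0.0/16", "10.1.0.0/16"))  # False
```

Nine /24 blocks use only a small part of the /16, which leaves room for the growth the list above recommends planning for.
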
**Route 53**
- DNS service with health checks
- Routing policies: Simple, weighted, latency, failover, geolocation, geoproximity, multivalue answer
- Best practices: Use alias records for AWS resources, enable DNSSEC signing

**CloudFront**
- Global CDN with edge locations
- Origin types: S3, ALB, custom origins
- Lambda@Edge for request/response manipulation
- Best practices: Enable compression, use field-level encryption for sensitive fields

**VPN and Direct Connect**
- Site-to-Site VPN for encrypted tunnels
- Direct Connect for dedicated bandwidth
- Transit Gateway for hub-and-spoke topology
- Best practices: Use Direct Connect for high bandwidth, Transit Gateway for complex routing

**API Gateway**
- REST APIs, HTTP APIs, WebSocket APIs
- Throttling and quotas
- Integration with Lambda, HTTP endpoints, AWS services
- Best practices: Use HTTP APIs for lower cost, enable response caching where needed (available on REST APIs)

### Security

**IAM (Identity and Access Management)**
- Principle of least privilege
- Roles for applications, not access keys
- MFA for privileged users
- Service Control Policies (SCPs) for organization-wide controls
- Best practices: Use roles, enable MFA, rotate credentials

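To make the least-privilege principle above concrete, here is a sketch of an IAM policy document that grants read-only access to a single prefix in one bucket. The bucket name `example-app-assets`, the prefix, and the helper name are hypothetical; the policy is built as a dictionary and printed as JSON rather than attached to anything.

```python
import json


def read_only_prefix_policy(bucket: str, prefix: str) -> dict:
    """Allow listing and reading objects under one prefix, nothing else."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListOnlyThisPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                "Sid": "ReadObjectsUnderPrefix",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


policy = read_only_prefix_policy("example-app-assets", "reports/")
print(json.dumps(policy, indent=2))
```

Attaching a policy like this to a role that the application assumes, rather than to a user with long-lived access keys, matches the "roles, not access keys" guidance in the list.
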
**KMS (Key Management Service)**
- Customer managed KMS keys (formerly called CMKs)
- Automatic key rotation
- Envelope encryption pattern
- Best practices: Enable automatic rotation, use grants for temporary access

**Secrets Manager**
- Automatic rotation for RDS credentials
- Versioning and rollback
- Best practices: Rotate secrets regularly, use VPC endpoints

**Security Hub**
- Centralized security findings
- CIS AWS Foundations Benchmark
- Integration with GuardDuty, Inspector, Macie

**GuardDuty**
- Threat detection using ML
- Monitors CloudTrail, VPC Flow Logs, DNS logs

## Architecture Patterns

### High Availability

**Multi-AZ Pattern**
```
- Application Load Balancer across 3 AZs
- Auto Scaling group with instances in each AZ
- RDS Multi-AZ for database
- S3 for static assets (99.999999999% durability, eleven 9s)
```

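The benefit of spreading a tier across AZs can be estimated with basic availability arithmetic: redundant copies combine as 1 − (1 − a)^n, and tiers that all must work multiply. The sketch below assumes independent failures, which real AZs only approximate, and all availability figures are hypothetical inputs rather than AWS SLA values.

```python
def parallel_availability(single: float, copies: int) -> float:
    """Availability of N redundant copies, assuming independent failures."""
    return 1 - (1 - single) ** copies


def serial_availability(*components: float) -> float:
    """Availability of a chain where every component must be up."""
    result = 1.0
    for availability in components:
        result *= availability
    return result


# Hypothetical: each AZ's app tier is 99% available on its own.
app_tier = parallel_availability(0.99, 3)
# Hypothetical database tier at 99.95%, in series with the app tier.
end_to_end = serial_availability(app_tier, 0.9995)
print(f"app tier:   {app_tier:.6f}")
print(f"end to end: {end_to_end:.6f}")
```

The result shows why the least available tier in the chain tends to dominate: adding a fourth AZ to the app tier changes the end-to-end figure far less than improving the database tier would.
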
**Multi-Region Pattern**
```
- Route 53 with health checks and failover
- CloudFront for global distribution
- Aurora Global Database for <1s RPO
- S3 Cross-Region Replication
```

### Serverless Architecture

**API-Driven Pattern**
```
API Gateway -> Lambda -> DynamoDB
                  |
                  v
             EventBridge -> Lambda (async processing)
```

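A minimal sketch of the Lambda step in the API-driven pattern is shown below, assuming an API Gateway Lambda proxy integration and a `GET /items/{id}` route. The route, the item data, and the in-memory `_FAKE_TABLE` (a stand-in for a DynamoDB table so the example runs locally) are all assumptions.

```python
import json

# In-memory stand-in for a DynamoDB table so the sketch runs locally.
_FAKE_TABLE = {"42": {"id": "42", "name": "example item"}}


def _response(status: int, body: dict) -> dict:
    """Build the response shape a Lambda proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),  # body must be a string
    }


def handler(event: dict, context: object = None) -> dict:
    """Handle a hypothetical GET /items/{id} request."""
    # pathParameters is null (None) when the route has no path variables.
    item_id = (event.get("pathParameters") or {}).get("id")
    item = _FAKE_TABLE.get(item_id)
    if item is None:
        return _response(404, {"message": "item not found"})
    return _response(200, item)


print(handler({"pathParameters": {"id": "42"}}))
print(handler({"pathParameters": None}))
```

In a deployed function the dictionary lookup would be replaced by a DynamoDB read, and any follow-up work could be handed to EventBridge for the asynchronous branch of the diagram.
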
**Event-Driven Pattern**
```
S3 Event -> Lambda -> Process -> SNS
                                  |
                                  v
                        Multiple subscribers
```

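The Lambda step of the event-driven pattern could look roughly like the sketch below. It parses the records of an S3 event notification and hands one message per object to a `publish` callable, which is a hypothetical stand-in for an SNS publish call so the example runs without AWS access. The bucket and key names are made up.

```python
import json
from urllib.parse import unquote_plus


def extract_objects(event: dict) -> list:
    """Pull bucket, key, and size out of each S3 event record."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        objects.append({
            "bucket": s3["bucket"]["name"],
            # Object keys arrive URL-encoded in S3 event notifications.
            "key": unquote_plus(s3["object"]["key"]),
            "size": s3["object"].get("size", 0),
        })
    return objects


def handler(event: dict, context: object = None, publish=print) -> dict:
    """Process the event and fan out one message per object."""
    messages = [json.dumps(obj) for obj in extract_objects(event)]
    for message in messages:
        publish(message)  # stand-in for publishing to an SNS topic
    return {"published": len(messages)}


sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "uploads"},
                "object": {"key": "reports/2024+Q1.csv", "size": 1024}}}
    ]
}
print(handler(sample_event))
```

Keeping the publish step injectable also makes the handler easy to unit test, since a test can pass `list.append` and inspect the collected messages.
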
### Microservices on AWS

**Container-Based**
```
ALB -> ECS Fargate (multiple services)
                      |
                      v
        Service Discovery (Cloud Map)
                      |
                      v
          RDS/DynamoDB per service
```

**Service Mesh**
```
App Mesh for traffic management
X-Ray for distributed tracing
CloudWatch Container Insights
```

### Data Lake Architecture

```
Data Sources -> Kinesis Data Streams
                          |
                          v
                  Kinesis Firehose
                          |
                          v
                   S3 (raw bucket)
                          |
                          v
            Glue ETL or Lambda processing
                          |
                          v
                S3 (processed bucket)
                          |
                          v
              Athena/Redshift Spectrum
                          |
                          v
                QuickSight dashboards
```

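One detail that the pipeline above depends on is how objects are laid out in the processed bucket. A common convention is Hive-style partitioned keys (`year=/month=/day=`), which Glue and Athena can use to prune partitions at query time. The sketch below generates such keys; the dataset name, file name, and helper name are assumptions.

```python
from datetime import datetime, timezone


def partitioned_key(dataset: str, event_time: datetime, filename: str) -> str:
    """Build a Hive-style partitioned S3 key from a timezone-aware timestamp."""
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    t = event_time.astimezone(timezone.utc)  # partition on UTC dates
    return (f"{dataset}/year={t.year:04d}/month={t.month:02d}/"
            f"day={t.day:02d}/{filename}")


key = partitioned_key(
    "clickstream",
    datetime(2024, 3, 5, 12, 0, tzinfo=timezone.utc),
    "part-0001.json.gz",
)
print(key)  # clickstream/year=2024/month=03/day=05/part-0001.json.gz
```

Partitioning by date suits time-bounded queries; a dataset queried mostly by another dimension (for example, tenant) might partition on that instead.
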
## Migration Strategies (6Rs)

1. **Rehost (Lift-and-Shift)**
   - AWS Application Migration Service (MGN)
   - Minimal changes, quick migration
   - Use for legacy apps with compliance constraints

2. **Replatform (Lift-Tinker-and-Shift)**
   - Migrate to RDS instead of self-managed databases
   - Use Elastic Beanstalk instead of custom app servers
   - Small optimizations during migration

3. **Repurchase (Drop-and-Shop)**
   - Move to SaaS (e.g., Salesforce, Workday)
   - Reduce maintenance burden

4. **Refactor/Re-architect**
   - Modernize to serverless or containers
   - Highest effort, highest benefit
   - Use for competitive advantage applications

5. **Retire**
   - Decommission unused applications
   - Reduce attack surface and costs

6. **Retain**
   - Keep on-premises temporarily
   - Migrate later or keep for regulatory reasons

## Landing Zone Design

**AWS Control Tower**
- Multi-account strategy (AWS Organizations)
- Account factory for standardization
- Guardrails (controls) for governance: preventive (SCPs) and detective (AWS Config rules)
- Centralized logging (CloudTrail, Config)

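As an example of a preventive guardrail, the sketch below builds a Service Control Policy that denies requests outside a set of approved Regions while exempting global services. The Region list and the exempted actions are assumptions to adapt per organization; the document is only printed here, not attached to an OU.

```python
import json


def region_restriction_scp(allowed_regions: list, exempt_actions: list) -> dict:
    """Deny everything outside the allowed Regions, except exempt actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideApprovedRegions",
                "Effect": "Deny",
                # Global services are exempted via NotAction.
                "NotAction": sorted(exempt_actions),
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {
                        "aws:RequestedRegion": sorted(allowed_regions)
                    }
                },
            }
        ],
    }


scp = region_restriction_scp(
    allowed_regions=["eu-west-1", "eu-central-1"],  # hypothetical choice
    exempt_actions=["iam:*", "organizations:*", "route53:*",
                    "cloudfront:*", "sts:*", "support:*"],
)
print(json.dumps(scp, indent=2))
```

An SCP like this only sets the outer boundary: it grants nothing by itself, so IAM policies inside each account still decide what is actually allowed.
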
**Account Structure**
```
Root
├── Security OU
│   ├── Log Archive Account
│   └── Security Tooling Account
├── Infrastructure OU
│   ├── Network Account (Transit Gateway, VPN)
│   └── Shared Services Account
└── Workloads OU
    ├── Production Account
    ├── Staging Account
    └── Development Account
```

**Network Design**
```
Transit Gateway (hub)
    |
    ├── Production VPC
    ├── Staging VPC
    ├── Development VPC
    └── On-premises (Direct Connect/VPN)
```

## Cost Optimization Strategies

**Compute Savings**
- Compute Savings Plans (up to 66% savings)
- EC2 Reserved Instances (1-year or 3-year)
- Spot Instances for batch/fault-tolerant workloads
- Lambda: Right-size memory (too little memory can lengthen billed duration), use reserved concurrency to cap runaway invocations

**Storage Savings**
- S3 Intelligent-Tiering for unpredictable access
- Lifecycle policies to Glacier/Deep Archive
- EBS gp3 instead of gp2 (up to 20% lower price per GB, with a 3,000 IOPS and 125 MiB/s baseline independent of volume size)
- Delete unused snapshots and volumes

**Database Savings**
- Aurora Serverless v2 for variable workloads
- RDS Reserved Instances
- DynamoDB on-demand for unpredictable workloads
- Read replicas in the same Region (and, where practical, the same AZ) as their readers to reduce data transfer charges

**Monitoring and Alerting**
- AWS Cost Explorer for analysis
- AWS Budgets for alerts
- Cost Anomaly Detection
- Trusted Advisor for recommendations

## Disaster Recovery

**RPO and RTO Targets**
- Backup and Restore: Hours RPO/RTO (lowest cost)
- Pilot Light: Minutes RPO, hours RTO
- Warm Standby: Seconds RPO, minutes RTO
- Multi-Site Active/Active: Near-zero RPO/RTO (highest cost)

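The four tiers above trade cost against recovery targets, so a simple way to use them is to pick the cheapest tier that still meets a workload's RPO and RTO. The sketch below does that. The numeric ceilings assigned to "hours", "minutes", "seconds", and "near-zero" are illustrative assumptions, not AWS figures, and should be replaced with values measured in DR tests.

```python
from typing import NamedTuple, Optional


class Strategy(NamedTuple):
    name: str
    rpo_seconds: int  # worst-case data loss this tier is assumed to achieve
    rto_seconds: int  # worst-case recovery time this tier is assumed to achieve


# Ordered from lowest to highest cost, as in the list above.
# All numbers are assumptions for illustration only.
STRATEGIES = [
    Strategy("Backup and Restore", 24 * 3600, 24 * 3600),
    Strategy("Pilot Light", 15 * 60, 4 * 3600),
    Strategy("Warm Standby", 30, 15 * 60),
    Strategy("Multi-Site Active/Active", 1, 60),
]


def pick_strategy(required_rpo_s: int, required_rto_s: int) -> Optional[Strategy]:
    """Return the cheapest strategy meeting both targets, or None."""
    for strategy in STRATEGIES:
        if (strategy.rpo_seconds <= required_rpo_s
                and strategy.rto_seconds <= required_rto_s):
            return strategy
    return None


# Hypothetical requirement: lose at most 1 hour of data, recover within 8 hours.
choice = pick_strategy(required_rpo_s=3600, required_rto_s=8 * 3600)
print(choice.name if choice else "no listed strategy meets these targets")
```
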
**Implementation**
- AWS Backup for centralized backup management
- Aurora Global Database for cross-region replication
- S3 Cross-Region Replication
- Route 53 health checks and failover routing
- Regular DR testing with CloudFormation/Terraform

## Monitoring and Observability

**CloudWatch**
- Metrics: Basic (5 min) and detailed (1 min) monitoring intervals for EC2
- Alarms with SNS notifications
- Logs Insights for log analysis
- Dashboards for visualization

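CloudWatch alarms can be configured to fire only when M of the last N datapoints breach a threshold, which reduces noise from single spikes. The sketch below is a simplified local model of that "M out of N" idea for a greater-than threshold; it ignores missing-data handling and is not the service's actual evaluation logic. The metric values are made up.

```python
def alarm_state(datapoints: list, threshold: float,
                evaluation_periods: int, datapoints_to_alarm: int) -> str:
    """Evaluate a simplified M-out-of-N alarm over the most recent datapoints."""
    if not 1 <= datapoints_to_alarm <= evaluation_periods:
        raise ValueError("need 1 <= datapoints_to_alarm <= evaluation_periods")
    window = datapoints[-evaluation_periods:]
    if len(window) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    breaching = sum(1 for value in window if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# Hypothetical CPU utilization samples, one per period.
cpu = [40, 85, 90, 70, 95]
print(alarm_state(cpu, threshold=80, evaluation_periods=5, datapoints_to_alarm=3))  # ALARM
print(alarm_state(cpu, threshold=80, evaluation_periods=3, datapoints_to_alarm=3))  # OK
```

Requiring 3 of 5 breaching datapoints rather than 1 of 1 trades a slower alert for fewer false alarms, which is usually the right trade for metrics that feed SNS paging.
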
**X-Ray**
- Distributed tracing for microservices
- Service map visualization
- Trace annotations and metadata

**AWS Config**
- Resource inventory and change tracking
- Compliance rules evaluation
- Relationship tracking between resources