# Azure Architecture Reference
Comprehensive guide for Azure services, patterns, and Cloud Adoption Framework implementation.
## Cloud Adoption Framework
### Framework Phases
1. **Strategy**
- Define business justification
- Expected business outcomes
- Business case development
- First project prioritization
2. **Plan**
- Digital estate assessment
- Initial organization alignment
- Skills readiness plan
- Cloud adoption plan
3. **Ready**
- Azure landing zone setup
- Azure setup guide
- Migration readiness
- Best practices validation
4. **Adopt (Migrate + Innovate)**
- Migration: Assess, migrate, optimize
- Innovate: Build cloud-native solutions
- Best practices and patterns
5. **Govern**
- Methodology for governance
- Governance benchmark
- Initial governance foundation
- Mature governance evolution
6. **Manage**
- Business commitments
- Operations baseline
- Platform and workload specialization
## Azure Well-Architected Framework
### Five Pillars
1. **Cost Optimization**
- Azure Cost Management and Billing
- Reserved instances and Savings Plans
- Azure Hybrid Benefit
- Auto-scaling and right-sizing
2. **Operational Excellence**
- Infrastructure as Code (ARM, Bicep, Terraform)
- Azure DevOps and GitHub Actions
- Azure Monitor and Application Insights
- Deployment slots and blue-green deployments
3. **Performance Efficiency**
- Azure CDN and Front Door
- Auto-scaling (VMSS, App Service)
- Caching (Redis, CDN)
- Performance diagnostics
4. **Reliability**
- Availability Zones and regions
- Azure Site Recovery
- Load Balancer and Traffic Manager
- Backup and disaster recovery
5. **Security**
- Azure AD (Entra ID)
- Network Security Groups and Firewalls
- Azure Key Vault
- Microsoft Defender for Cloud
## Core Services Architecture
### Compute
**Virtual Machines**
- VM sizes: General (D-series), Compute (F-series), Memory (E-series), GPU (N-series)
- Availability Sets (99.95% SLA)
- Availability Zones (99.99% SLA)
- VM Scale Sets for auto-scaling
- Best practices: Use managed disks, enable accelerated networking, use proximity placement groups
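
To put the two SLA figures above in concrete terms, the sketch below converts an availability percentage into the downtime it permits. The 30-day month and the helper name `allowed_downtime_minutes` are assumptions made for illustration, not part of any Azure tooling.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Return the minutes of downtime an SLA permits over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


if __name__ == "__main__":
    for label, sla in [("Availability Sets", 99.95), ("Availability Zones", 99.99)]:
        print(f"{label}: {allowed_downtime_minutes(sla):.2f} min per 30-day month")
```

Under these assumptions, 99.95% allows roughly 21.6 minutes of downtime per month and 99.99% roughly 4.3 minutes.
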
**App Service**
- Web Apps, API Apps, Mobile Apps
- Deployment slots for staging
- Auto-scaling based on metrics or schedule
- Supports .NET, Java, Node.js, Python, PHP, Ruby
- Best practices: Use deployment slots, enable auto-scaling, use App Service Plan efficiently
**Azure Functions**
- Consumption Plan (serverless)
- Premium Plan (VNet integration, no cold start)
- Dedicated Plan (App Service Plan)
- Durable Functions for orchestration
- Best practices: Keep functions small, use Premium for production, implement retry policies
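
The "implement retry policies" advice can be illustrated with a generic exponential backoff helper. This is a minimal sketch in plain Python, not the Azure Functions runtime's built-in retry feature; the name `retry_with_backoff` and all default values are assumptions chosen for the example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on any exception with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # The delay ceiling doubles on each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: sleep a random time between 0 and the ceiling.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable: the loop always returns or raises")
```

In real code it is usually better to catch only exceptions known to be transient (timeouts, throttling) rather than every `Exception`, and to make the retried operation idempotent.
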
**Azure Kubernetes Service (AKS)**
- Managed Kubernetes control plane
- Azure CNI or kubenet networking
- Azure AD integration
- Virtual nodes (Azure Container Instances)
- Best practices: Use system node pools, enable autoscaling, implement network policies
**Container Instances**
- Serverless containers
- Fast startup without infrastructure management
- Best for batch jobs and burstable workloads
**Azure Batch**
- Large-scale parallel and HPC workloads
- Auto-scaling compute nodes
- Task scheduling and dependencies
### Storage
**Blob Storage**
- Access tiers: Hot, Cool, Cold, Archive
- Performance tiers: Standard, Premium
- Lifecycle management policies
- Immutable storage for compliance
- Best practices: Use lifecycle policies, enable soft delete, implement versioning
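
As an illustration of a lifecycle policy, the sketch below builds a rule document in the JSON shape used by Azure Storage lifecycle management, as best I understand it; check it against the current documentation before applying it. The rule name, the prefix, and the 30/90/365-day thresholds are hypothetical values, and `build_lifecycle_policy` is a helper invented for this example.

```python
import json


def build_lifecycle_policy(prefix: str, cool_after: int = 30,
                           archive_after: int = 90, delete_after: int = 365) -> dict:
    """Build a lifecycle policy that tiers block blobs down and eventually deletes them."""
    if not cool_after < archive_after < delete_after:
        raise ValueError("thresholds must increase: cool < archive < delete")
    return {
        "rules": [{
            "enabled": True,
            "name": "tier-and-expire",  # hypothetical rule name
            "type": "Lifecycle",
            "definition": {
                # Only block blobs under the given prefix are affected.
                "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
                "actions": {"baseBlob": {
                    "tierToCool": {"daysAfterModificationGreaterThan": cool_after},
                    "tierToArchive": {"daysAfterModificationGreaterThan": archive_after},
                    "delete": {"daysAfterModificationGreaterThan": delete_after},
                }},
            },
        }]
    }


if __name__ == "__main__":
    print(json.dumps(build_lifecycle_policy("logs/"), indent=2))
```

The printed JSON can be saved to a file and reviewed before it is attached to a storage account.
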
**Azure Files**
- SMB and NFS file shares
- Integration with Azure File Sync
- Premium tier for high performance
- Best practices: Use Premium for databases, implement snapshots
**Disk Storage**
- Managed Disks: Premium SSD, Standard SSD, Standard HDD, Ultra Disk
- Disk encryption with Azure Disk Encryption
- Snapshots and incremental backups
- Best practices: Use Premium SSD for production, enable encryption
**Data Lake Storage Gen2**
- Hierarchical namespace for big data
- Built on Blob Storage
- Integration with Azure Synapse and Databricks
- Best practices: Enable hierarchical namespace, use lifecycle policies
**Azure NetApp Files**
- Enterprise-grade NFS and SMB shares
- High performance and low latency
- Snapshots and data protection
### Database
**Azure SQL Database**
- Serverless and provisioned compute
- Hyperscale for up to 100TB
- Elastic pools for multiple databases
- Auto-tuning and intelligent insights
- Best practices: Use serverless for dev/test, enable geo-replication
**Azure SQL Managed Instance**
- Near 100% compatibility with SQL Server
- Deployed natively inside a VNet for network isolation
- Best practices: Use for lift-and-shift migrations
**Cosmos DB**
- Multi-model NoSQL database
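- Global distribution with multi-region writes (formerly multi-master)
- Consistency levels: Strong, Bounded staleness, Session, Consistent prefix, Eventual
- APIs: NoSQL (formerly SQL), MongoDB, Cassandra, Gremlin, Table
- Best practices: Choose the weakest consistency level the workload tolerates, and design the partition key carefully (it determines how data and throughput are distributed)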
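
To show why partition key design matters, the sketch below measures how unevenly a candidate key spreads a sample of items. It only counts logical key values in sample data; it does not model how Cosmos DB hashes keys onto physical partitions. The helper `hottest_key_share` and the sample keys are hypothetical.

```python
from collections import Counter
from typing import Iterable


def hottest_key_share(partition_keys: Iterable[str]) -> float:
    """Return the fraction of items that share the most common partition key value."""
    counts = Counter(partition_keys)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no items supplied")
    return max(counts.values()) / total


if __name__ == "__main__":
    # One dominant tenant makes a tenant-based key "hot".
    by_tenant = ["tenant-a"] * 8 + ["tenant-b", "tenant-c"]
    # A high-cardinality key spreads the same ten items evenly.
    by_order = [f"order-{i}" for i in range(10)]
    print("tenant key:", hottest_key_share(by_tenant))
    print("order key: ", hottest_key_share(by_order))
```

A share close to 1.0 suggests most reads and writes would hit a single logical partition; a lower share suggests a more even spread, though query patterns also need to be considered.
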
**Azure Database for PostgreSQL/MySQL/MariaDB**
- Flexible Server (current) vs Single Server (legacy); Flexible Server is offered for PostgreSQL and MySQL, not MariaDB
- High availability with zone redundancy
- Read replicas for scaling
- Best practices: Use Flexible Server, enable HA, implement connection pooling
**Azure Cache for Redis**
- In-memory caching
- Clustering for scalability
- Geo-replication for disaster recovery
- Best practices: Use Premium tier for production, enable persistence
### Networking
**Virtual Network (VNet)**
- CIDR planning (avoid overlaps)
- Subnets with Network Security Groups
- Service endpoints and Private Link
- VNet peering for connectivity
- Best practices: Plan IP address space, use NSGs, implement Private Link
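
CIDR planning can be checked before anything is deployed. The sketch below uses Python's standard `ipaddress` module to detect overlapping address spaces and to carve subnets out of a VNet range. The VNet names and address ranges are hypothetical, as are the helper names `find_overlaps` and `carve_subnets`.

```python
import ipaddress
from itertools import combinations, islice


def find_overlaps(cidrs: dict[str, str]) -> list[tuple[str, str]]:
    """Return pairs of named address spaces whose CIDR ranges overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(networks, 2)
        if networks[a].overlaps(networks[b])
    ]


def carve_subnets(vnet_cidr: str, new_prefix: int, count: int) -> list[str]:
    """Return the first `count` subnets of size /new_prefix inside a VNet range."""
    vnet = ipaddress.ip_network(vnet_cidr)
    subnets = [str(s) for s in islice(vnet.subnets(new_prefix=new_prefix), count)]
    if len(subnets) < count:
        raise ValueError(f"{vnet_cidr} cannot hold {count} /{new_prefix} subnets")
    return subnets


if __name__ == "__main__":
    plan = {
        "hub": "10.0.0.0/16",
        "spoke-prod": "10.1.0.0/16",
        "spoke-dev": "10.1.128.0/17",  # deliberately collides with spoke-prod
    }
    print(find_overlaps(plan))
    print(carve_subnets("10.0.0.0/16", 24, 3))
```

Overlapping ranges matter because VNets with overlapping address spaces cannot be peered, so a check like this is worth running whenever a new spoke is proposed.
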
**Azure Load Balancer**
- Layer 4 load balancing
- Standard SKU (zone-redundant, SLA)
- Health probes and distribution algorithms
- Best practices: Use Standard SKU, configure health probes
**Application Gateway**
- Layer 7 load balancing
- WAF (Web Application Firewall)
- URL-based routing and SSL termination
- Best practices: Enable WAF, use autoscaling
**Azure Front Door**
- Global load balancing and CDN
- WAF at edge
- Anycast for low latency
- Best practices: Use for global applications, enable caching
**VPN Gateway and ExpressRoute**
- Site-to-Site VPN for encrypted connectivity
- ExpressRoute for private, dedicated connection
- Virtual WAN for global transit network
- Best practices: Use ExpressRoute for production, implement redundancy
**Azure Firewall**
- Managed firewall service
- Application and network rules
- Threat intelligence
- Best practices: Use in hub-spoke topology, enable DNS proxy
**Azure Private Link**
- Private connectivity to Azure services
- No public internet exposure
- Available for PaaS services
- Best practices: Use for all PaaS services in production
### Security and Identity
**Azure Active Directory (Microsoft Entra ID)**
- Identity and access management
- Conditional Access policies
- Multi-factor authentication
- B2B and B2C scenarios
- Best practices: Enable MFA, use Conditional Access, implement PIM
**Azure Key Vault**
- Secrets, keys, and certificates management
- Hardware Security Module (HSM) backed
- Soft delete and purge protection
- Best practices: Enable soft delete, use RBAC, implement Private Link
**Microsoft Defender for Cloud**
- Security posture management
- Threat protection for hybrid workloads
- Regulatory compliance dashboard
- Just-in-time VM access
- Best practices: Enable enhanced security, implement recommendations
**Azure Policy**
- Governance and compliance at scale
- Built-in and custom policies
- Deny, audit, append effects
- Best practices: Assign at management group level, test before enforce
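
A custom policy is a JSON document with an `if` condition and a `then` effect. The sketch below generates a definition that flags resources missing a given tag; it follows the policy definition structure as I understand it and should be validated before use. The tag name and the helper `require_tag_policy` are hypothetical. The default effect is `audit`, in line with the "test before enforce" advice.

```python
import json


def require_tag_policy(tag_name: str, effect: str = "audit") -> dict:
    """Build a policy definition that audits or denies resources lacking `tag_name`."""
    if effect not in {"audit", "deny"}:
        raise ValueError("effect must be 'audit' or 'deny'")
    return {
        "properties": {
            "displayName": f"Require the {tag_name} tag on resources",
            "mode": "Indexed",  # evaluate only resource types that support tags
            "policyRule": {
                # The condition matches when the tag is absent.
                "if": {"field": f"tags['{tag_name}']", "exists": "false"},
                "then": {"effect": effect},
            },
        }
    }


if __name__ == "__main__":
    # Start in audit mode, then switch to deny once the findings look right.
    print(json.dumps(require_tag_policy("costCenter"), indent=2))
```
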
**Microsoft Sentinel (formerly Azure Sentinel)**
- Cloud-native SIEM and SOAR
- AI-powered threat detection
- Integration with Microsoft 365, third-party tools
- Best practices: Enable data connectors, create custom analytics rules
## Architecture Patterns
### High Availability
**Zone-Redundant Pattern**
```
Azure Front Door (global)
|
v
Application Gateway (zone-redundant)
|
v
VM Scale Set (across availability zones)
|
v
Azure SQL Database (zone-redundant)
```
**Multi-Region Pattern**
```
Azure Traffic Manager (DNS-based routing)
|
├── Region 1: App Service + SQL Database (primary)
└── Region 2: App Service + SQL Database (geo-replica)
```
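
The primary/geo-replica layout above corresponds to priority-based DNS routing: traffic goes to the healthy endpoint with the lowest priority number. The sketch below models only that selection logic; it does not call Traffic Manager, and the `Endpoint` type, `pick_endpoint` helper, and endpoint names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Endpoint:
    name: str
    priority: int  # lower number means more preferred
    healthy: bool


def pick_endpoint(endpoints: Iterable[Endpoint]) -> Optional[Endpoint]:
    """Return the healthy endpoint with the lowest priority number, or None if all are down."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda e: e.priority)


if __name__ == "__main__":
    regions = [
        Endpoint("region-1-primary", priority=1, healthy=False),  # simulated outage
        Endpoint("region-2-replica", priority=2, healthy=True),
    ]
    chosen = pick_endpoint(regions)
    print(chosen.name if chosen else "no healthy endpoint")
```

Because the routing is DNS-based, real failover time also depends on health probe intervals and the DNS TTL, which this sketch does not model.
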
### Hub-Spoke Topology
```
Hub VNet
├── Azure Firewall
├── VPN Gateway
└── Shared Services
|
├── Spoke VNet 1 (Production)
├── Spoke VNet 2 (Development)
└── Spoke VNet 3 (DMZ)
```
### Serverless Architecture
**Event-Driven Pattern**
```
Event Grid -> Azure Functions -> Cosmos DB
|
v
Service Bus -> Functions (processing)
```
**API-First Pattern**
```
API Management
|
├── Function App 1 (auth)
├── Function App 2 (business logic)
└── Function App 3 (data access)
```
### Microservices on Azure
**AKS-Based**
```
Azure Front Door
|
v
Application Gateway + WAF
|
v
AKS (multiple microservices)
|
├── Cosmos DB (microservice A)
├── SQL Database (microservice B)
└── Service Bus (async communication)
```
**Container Apps Pattern**
```
Azure Container Apps
├── Dapr for state management
├── KEDA for event-driven scaling
└── Azure Monitor for observability
```
### Data Platform
```
Data Sources
|
v
Event Hubs / IoT Hub
|
v
Stream Analytics (real-time processing)
|
v
Data Lake Storage Gen2
|
v
Azure Synapse Analytics
|
v
Power BI (visualization)
```
## Landing Zone Design
### Enterprise-Scale Landing Zone
**Management Group Hierarchy**
```
Tenant Root Group
├── Platform
│   ├── Management (monitoring, automation)
│   ├── Connectivity (hub networks, VPN)
│   └── Identity (domain controllers)
└── Landing Zones
    ├── Corp (internal workloads)
    └── Online (internet-facing workloads)
```
**Network Topology**
```
Hub VNet (Connectivity subscription)
├── Azure Firewall
├── VPN Gateway
├── ExpressRoute Gateway
└── Bastion
Spoke VNets (Workload subscriptions)
├── Production VNet
├── Staging VNet
└── Development VNet
```
**Governance**
- Azure Policy for compliance
- Management groups for hierarchy
- RBAC assignments at appropriate scope
- Resource tags for cost allocation
- Azure Blueprints for repeatable deployments (slated for deprecation; consider Template Specs and Deployment Stacks for new work)
## Migration Strategies
### Azure Migrate
1. **Assess**
- Discovery with Azure Migrate appliance
- Dependency analysis
- Performance-based sizing
- Cost estimation
2. **Migrate**
- Azure Migrate: Server Migration (agentless)
- Database Migration Service
- App Service Migration Assistant
- Data Box for large data transfers
3. **Optimize**
- Right-sizing recommendations
- Reserved instances
- Azure Hybrid Benefit
### Migration Patterns
- **Rehost**: Azure Migrate for VMs
- **Replatform**: App Service, Azure SQL Database
- **Refactor**: Container Apps, AKS, Functions
- **Rebuild**: Azure-native services (Cosmos DB, Cognitive Services)
## Cost Optimization
### Compute Savings
- Azure Reserved Instances (1-year or 3-year, up to 72% savings)
- Azure Savings Plans for Compute (up to 65% savings)
- Spot VMs for fault-tolerant workloads (up to 90% savings)
- Azure Hybrid Benefit (use existing Windows Server/SQL licenses)
- Auto-shutdown for dev/test VMs
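
The savings percentages above are upper bounds ("up to"), so it helps to see what they mean for a single VM. The sketch below applies each maximum discount to a hypothetical pay-as-you-go rate of 0.20 per hour over a 730-hour month; the rate and the helper `discounted_monthly_cost` are assumptions for illustration, and real discounts vary by VM size, region, and term.

```python
def discounted_monthly_cost(payg_hourly: float, discount_pct: float,
                            hours: float = 730) -> float:
    """Return the monthly cost after applying a percentage discount to an hourly rate."""
    if not 0 <= discount_pct <= 100:
        raise ValueError("discount_pct must be between 0 and 100")
    return round(payg_hourly * hours * (1 - discount_pct / 100), 2)


if __name__ == "__main__":
    rate = 0.20  # hypothetical pay-as-you-go price per hour
    options = {
        "Pay-as-you-go": 0,
        "Savings Plan (max 65%)": 65,
        "Reserved Instance (max 72%)": 72,
        "Spot VM (max 90%)": 90,
    }
    for label, pct in options.items():
        print(f"{label}: {discounted_monthly_cost(rate, pct):.2f} per month")
```

Spot VMs can be evicted at any time, so the lowest figure only applies to workloads that tolerate interruption.
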
### Storage Savings
- Blob Storage lifecycle policies (Hot -> Cool -> Archive)
- Azure Files: Standard tier for general use
- Managed Disks: Standard SSD instead of Premium if possible
- Delete unused snapshots and disks
### Database Savings
- Serverless tier for Azure SQL Database
- Reserved capacity for Cosmos DB
- DTU model vs vCore (choose based on workload)
- Pause Azure Synapse when not in use
### Monitoring
- Azure Cost Management + Billing
- Cost alerts and budgets
- Azure Advisor recommendations
- Resource tagging for cost allocation
## Disaster Recovery
### Azure Site Recovery
**VM Replication**
- Azure to Azure replication
- On-premises to Azure (VMware, Hyper-V, physical)
- RPO: 30 seconds to a few minutes
- Automated failover and failback
**Recovery Plans**
- Multi-tier application recovery
- Customizable scripts and manual actions
- Integration with Azure Automation
### Backup Strategies
**Azure Backup**
- VM backups (application-consistent)
- SQL Server and SAP HANA in Azure VMs
- Azure Files backup
- Cross-region restore
**Database Backup**
- SQL Database: Automated backups (7-35 days)
- Cosmos DB: Continuous backup (30 days)
- Long-term retention policies
### High Availability
**RTO/RPO Targets**
- Active-Active: Multi-region with Traffic Manager (near-zero)
- Active-Passive: Geo-replication with failover (minutes)
- Backup and Restore: Azure Backup (hours)
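
One way to use the three tiers above is to pick the least expensive pattern that still meets a workload's recovery time objective. The sketch below does that with placeholder thresholds: the minute values in `PATTERNS` are hypothetical planning figures, not Azure guarantees, and should be replaced with numbers measured in your own failover tests.

```python
# Hypothetical planning figures (NOT Azure guarantees): the shortest RTO, in minutes,
# each pattern is assumed to achieve, ordered from cheapest to most expensive.
PATTERNS = [
    ("Backup and Restore", 240.0),
    ("Active-Passive", 15.0),
    ("Active-Active", 0.0),
]


def cheapest_pattern(rto_target_minutes: float) -> str:
    """Return the cheapest pattern whose assumed RTO fits within the target."""
    if rto_target_minutes < 0:
        raise ValueError("RTO target cannot be negative")
    for name, achievable_rto in PATTERNS:
        if achievable_rto <= rto_target_minutes:
            return name
    # Not reached: the last pattern has an assumed RTO of 0 minutes.
    raise AssertionError("no pattern matched")


if __name__ == "__main__":
    for target in (480, 60, 1):
        print(f"RTO target {target} min -> {cheapest_pattern(target)}")
```
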
## Monitoring and Observability
### Azure Monitor
**Components**
- Metrics: Time-series data (1-minute resolution)
- Logs: Log Analytics workspace for queries (KQL)
- Alerts: Metric, log, and activity log alerts
- Dashboards: Custom visualizations
**Application Insights**
- APM for web applications
- Distributed tracing
- Live Metrics Stream
- Smart detection and anomaly detection
- Best practices: Instrument all applications, set up availability tests
### Log Analytics
**KQL Queries**
```kusto
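// Performance analysis: average CPU per computer in 5-minute bins (Log Analytics Perf table)
Perf
| where CounterName == "% Processor Time"
| summarize avg(CounterValue) by bin(TimeGenerated, 5m), Computer
| render timechart

// Failed requests per result code and hour (Application Insights requests table).
// The blank line above separates the two queries; run them one at a time.
// In workspace-based resources the table is AppRequests (Success, ResultCode, TimeGenerated).
requests
| where success == false
| summarize count() by resultCode, bin(timestamp, 1h)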
```
**Workbooks**
- Interactive reports
- Parameterized queries
- Combining metrics and logs
## Identity and Access
### Azure AD Best Practices
- Enable MFA for all users
- Use Conditional Access policies
- Implement Privileged Identity Management (PIM)
- Regular access reviews
- Break-glass accounts
### RBAC Design
**Built-in Roles**
- Owner: Full access including RBAC
- Contributor: Full access except RBAC
- Reader: Read-only access
- Custom roles for specific needs
**Scope Hierarchy**
```
Management Group (highest)
|
Subscription
|
Resource Group
|
Resource (lowest)
```
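Best practices: Assign roles at the narrowest scope that meets the requirement (use a higher scope only when access is genuinely needed across everything beneath it), assign to groups rather than individual users, and apply least privilege.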
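
Role assignments are inherited down the scope hierarchy, which for subscriptions, resource groups, and resources is visible in the resource ID itself. The sketch below checks inheritance by ID prefix. It is a simplification: management group scopes use a different ID format and are not handled, and `scope_applies` plus the all-zero subscription ID are hypothetical.

```python
def scope_applies(assignment_scope: str, resource_id: str) -> bool:
    """Return True if an assignment at `assignment_scope` is inherited by `resource_id`."""
    # Resource IDs are compared case-insensitively, ignoring trailing slashes.
    scope = assignment_scope.rstrip("/").lower()
    target = resource_id.rstrip("/").lower()
    # Match the scope itself or anything nested beneath it on a path boundary.
    return target == scope or target.startswith(scope + "/")


if __name__ == "__main__":
    sub = "/subscriptions/00000000-0000-0000-0000-000000000000"  # placeholder ID
    rg = f"{sub}/resourceGroups/rg-app"
    vm = f"{rg}/providers/Microsoft.Compute/virtualMachines/vm1"
    print(scope_applies(sub, vm))  # subscription-level assignment reaches the VM
    print(scope_applies(vm, rg))   # resource-level assignment does not reach the group
```

The path-boundary check matters: a scope ending in `rg` must not match a resource group named `rg-app`.
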