563 lines
15 KiB
Markdown
563 lines
15 KiB
Markdown
# Azure Architecture Reference
|
|
|
|
Comprehensive guide for Azure services, patterns, and Cloud Adoption Framework implementation.
|
|
|
|
## Cloud Adoption Framework
|
|
|
|
### Framework Phases
|
|
|
|
1. **Strategy**
|
|
- Define business justification
|
|
- Expected business outcomes
|
|
- Business case development
|
|
- First project prioritization
|
|
|
|
2. **Plan**
|
|
- Digital estate assessment
|
|
- Initial organization alignment
|
|
- Skills readiness plan
|
|
- Cloud adoption plan
|
|
|
|
3. **Ready**
|
|
- Azure landing zone setup
|
|
- Azure setup guide
|
|
- Migration readiness
|
|
- Best practices validation
|
|
|
|
4. **Adopt (Migrate + Innovate)**
|
|
- Migration: Assess, migrate, optimize
|
|
- Innovate: Build cloud-native solutions
|
|
- Best practices and patterns
|
|
|
|
5. **Govern**
|
|
- Methodology for governance
|
|
- Governance benchmark
|
|
- Initial governance foundation
|
|
- Mature governance evolution
|
|
|
|
6. **Manage**
|
|
- Business commitments
|
|
- Operations baseline
|
|
- Platform and workload specialization
|
|
|
|
## Azure Well-Architected Framework
|
|
|
|
### Five Pillars
|
|
|
|
1. **Cost Optimization**
|
|
- Azure Cost Management and Billing
|
|
- Reserved instances and Savings Plans
|
|
- Azure Hybrid Benefit
|
|
- Auto-scaling and right-sizing
|
|
|
|
2. **Operational Excellence**
|
|
- Infrastructure as Code (ARM, Bicep, Terraform)
|
|
- Azure DevOps and GitHub Actions
|
|
- Azure Monitor and Application Insights
|
|
- Deployment slots and blue-green deployments
|
|
|
|
3. **Performance Efficiency**
|
|
- Azure CDN and Front Door
|
|
- Auto-scaling (VMSS, App Service)
|
|
- Caching (Redis, CDN)
|
|
- Performance diagnostics
|
|
|
|
4. **Reliability**
|
|
- Availability Zones and regions
|
|
- Azure Site Recovery
|
|
- Load Balancer and Traffic Manager
|
|
- Backup and disaster recovery
|
|
|
|
5. **Security**
|
|
- Azure AD (Entra ID)
|
|
- Network Security Groups and Firewalls
|
|
- Azure Key Vault
|
|
- Microsoft Defender for Cloud
|
|
|
|
## Core Services Architecture
|
|
|
|
### Compute
|
|
|
|
**Virtual Machines**
|
|
- VM sizes: General (D-series), Compute (F-series), Memory (E-series), GPU (N-series)
|
|
- Availability Sets (99.95% SLA)
|
|
- Availability Zones (99.99% SLA)
|
|
- VM Scale Sets for auto-scaling
|
|
- Best practices: Use managed disks, enable accelerated networking, use proximity placement groups
|
|
|
|
**App Service**
|
|
- Web Apps, API Apps, Mobile Apps
|
|
- Deployment slots for staging
|
|
- Auto-scaling based on metrics or schedule
|
|
- Supports .NET, Java, Node.js, Python, PHP, Ruby
|
|
- Best practices: Use deployment slots, enable auto-scaling, use App Service Plan efficiently
|
|
|
|
**Azure Functions**
|
|
- Consumption Plan (serverless)
|
|
- Premium Plan (VNet integration, no cold start)
|
|
- Dedicated Plan (App Service Plan)
|
|
- Durable Functions for orchestration
|
|
- Best practices: Keep functions small, use Premium for production, implement retry policies
|
|
|
|
**Azure Kubernetes Service (AKS)**
|
|
- Managed Kubernetes control plane
|
|
- Azure CNI or kubenet networking
|
|
- Azure AD integration
|
|
- Virtual nodes (Azure Container Instances)
|
|
- Best practices: Use system node pools, enable autoscaling, implement network policies
|
|
|
|
**Container Instances**
|
|
- Serverless containers
|
|
- Fast startup without infrastructure management
|
|
- Best for batch jobs and burstable workloads
|
|
|
|
**Azure Batch**
|
|
- Large-scale parallel and HPC workloads
|
|
- Auto-scaling compute nodes
|
|
- Task scheduling and dependencies
|
|
|
|
### Storage
|
|
|
|
**Blob Storage**
|
|
- Storage tiers: Hot, Cool, Archive
|
|
- Access tiers: Premium, Standard
|
|
- Lifecycle management policies
|
|
- Immutable storage for compliance
|
|
- Best practices: Use lifecycle policies, enable soft delete, implement versioning
|
|
|
|
**Azure Files**
|
|
- SMB and NFS file shares
|
|
- Integration with Azure File Sync
|
|
- Premium tier for high performance
|
|
- Best practices: Use Premium for databases, implement snapshots
|
|
|
|
**Disk Storage**
|
|
- Managed Disks: Premium SSD, Standard SSD, Standard HDD, Ultra Disk
|
|
- Disk encryption with Azure Disk Encryption
|
|
- Snapshots and incremental backups
|
|
- Best practices: Use Premium SSD for production, enable encryption
|
|
|
|
**Data Lake Storage Gen2**
|
|
- Hierarchical namespace for big data
|
|
- Built on Blob Storage
|
|
- Integration with Azure Synapse and Databricks
|
|
- Best practices: Enable hierarchical namespace, use lifecycle policies
|
|
|
|
**Azure NetApp Files**
|
|
- Enterprise-grade NFS and SMB shares
|
|
- High performance and low latency
|
|
- Snapshots and data protection
|
|
|
|
### Database
|
|
|
|
**Azure SQL Database**
|
|
- Serverless and provisioned compute
|
|
- Hyperscale for up to 100TB
|
|
- Elastic pools for multiple databases
|
|
- Auto-tuning and intelligent insights
|
|
- Best practices: Use serverless for dev/test, enable geo-replication
|
|
|
|
**Azure SQL Managed Instance**
|
|
- Near 100% compatibility with SQL Server
|
|
- VNet integration for isolation
|
|
- Native virtual network implementation
|
|
- Best practices: Use for lift-and-shift migrations
|
|
|
|
**Cosmos DB**
|
|
- Multi-model NoSQL database
|
|
- Global distribution with multi-master
|
|
- Consistency levels: Strong, Bounded staleness, Session, Consistent prefix, Eventual
|
|
- APIs: SQL, MongoDB, Cassandra, Gremlin, Table
|
|
- Best practices: Choose appropriate consistency, partition key design critical
|
|
|
|
**Azure Database for PostgreSQL/MySQL/MariaDB**
|
|
- Flexible Server (newer) vs Single Server (legacy)
|
|
- High availability with zone redundancy
|
|
- Read replicas for scaling
|
|
- Best practices: Use Flexible Server, enable HA, implement connection pooling
|
|
|
|
**Azure Cache for Redis**
|
|
- In-memory caching
|
|
- Clustering for scalability
|
|
- Geo-replication for disaster recovery
|
|
- Best practices: Use Premium tier for production, enable persistence
|
|
|
|
### Networking
|
|
|
|
**Virtual Network (VNet)**
|
|
- CIDR planning (avoid overlaps)
|
|
- Subnets with Network Security Groups
|
|
- Service endpoints and Private Link
|
|
- VNet peering for connectivity
|
|
- Best practices: Plan IP address space, use NSGs, implement Private Link
|
|
|
|
**Azure Load Balancer**
|
|
- Layer 4 load balancing
|
|
- Standard SKU (zone-redundant, SLA)
|
|
- Health probes and distribution algorithms
|
|
- Best practices: Use Standard SKU, configure health probes
|
|
|
|
**Application Gateway**
|
|
- Layer 7 load balancing
|
|
- WAF (Web Application Firewall)
|
|
- URL-based routing and SSL termination
|
|
- Best practices: Enable WAF, use autoscaling
|
|
|
|
**Azure Front Door**
|
|
- Global load balancing and CDN
|
|
- WAF at edge
|
|
- Anycast for low latency
|
|
- Best practices: Use for global applications, enable caching
|
|
|
|
**VPN Gateway and ExpressRoute**
|
|
- Site-to-Site VPN for encrypted connectivity
|
|
- ExpressRoute for private, dedicated connection
|
|
- Virtual WAN for global transit network
|
|
- Best practices: Use ExpressRoute for production, implement redundancy
|
|
|
|
**Azure Firewall**
|
|
- Managed firewall service
|
|
- Application and network rules
|
|
- Threat intelligence
|
|
- Best practices: Use in hub-spoke topology, enable DNS proxy
|
|
|
|
**Azure Private Link**
|
|
- Private connectivity to Azure services
|
|
- No public internet exposure
|
|
- Available for PaaS services
|
|
- Best practices: Use for all PaaS services in production
|
|
|
|
### Security and Identity
|
|
|
|
**Azure Active Directory (Microsoft Entra ID)**
|
|
- Identity and access management
|
|
- Conditional Access policies
|
|
- Multi-factor authentication
|
|
- B2B and B2C scenarios
|
|
- Best practices: Enable MFA, use Conditional Access, implement PIM
|
|
|
|
**Azure Key Vault**
|
|
- Secrets, keys, and certificates management
|
|
- Hardware Security Module (HSM) backed
|
|
- Soft delete and purge protection
|
|
- Best practices: Enable soft delete, use RBAC, implement Private Link
|
|
|
|
**Microsoft Defender for Cloud**
|
|
- Security posture management
|
|
- Threat protection for hybrid workloads
|
|
- Regulatory compliance dashboard
|
|
- Just-in-time VM access
|
|
- Best practices: Enable enhanced security, implement recommendations
|
|
|
|
**Azure Policy**
|
|
- Governance and compliance at scale
|
|
- Built-in and custom policies
|
|
- Deny, audit, append effects
|
|
- Best practices: Assign at management group level, test before enforce
|
|
|
|
**Azure Sentinel**
|
|
- Cloud-native SIEM and SOAR
|
|
- AI-powered threat detection
|
|
- Integration with Microsoft 365, third-party tools
|
|
- Best practices: Enable data connectors, create custom analytics rules
|
|
|
|
## Architecture Patterns
|
|
|
|
### High Availability
|
|
|
|
**Zone-Redundant Pattern**
|
|
```
|
|
Azure Front Door (global)
|
|
|
|
|
v
|
|
Application Gateway (zone-redundant)
|
|
|
|
|
v
|
|
VM Scale Set (across availability zones)
|
|
|
|
|
v
|
|
Azure SQL Database (zone-redundant)
|
|
```
|
|
|
|
**Multi-Region Pattern**
|
|
```
|
|
Azure Traffic Manager (DNS-based routing)
|
|
|
|
|
├── Region 1: App Service + SQL Database (primary)
|
|
└── Region 2: App Service + SQL Database (geo-replica)
|
|
```
|
|
|
|
### Hub-Spoke Topology
|
|
|
|
```
|
|
Hub VNet
|
|
├── Azure Firewall
|
|
├── VPN Gateway
|
|
└── Shared Services
|
|
|
|
|
├── Spoke VNet 1 (Production)
|
|
├── Spoke VNet 2 (Development)
|
|
└── Spoke VNet 3 (DMZ)
|
|
```
|
|
|
|
### Serverless Architecture
|
|
|
|
**Event-Driven Pattern**
|
|
```
|
|
Event Grid -> Azure Functions -> Cosmos DB
|
|
|
|
|
v
|
|
Service Bus -> Functions (processing)
|
|
```
|
|
|
|
**API-First Pattern**
|
|
```
|
|
API Management
|
|
|
|
|
├── Function App 1 (auth)
|
|
├── Function App 2 (business logic)
|
|
└── Function App 3 (data access)
|
|
```
|
|
|
|
### Microservices on Azure
|
|
|
|
**AKS-Based**
|
|
```
|
|
Azure Front Door
|
|
|
|
|
v
|
|
Application Gateway + WAF
|
|
|
|
|
v
|
|
AKS (multiple microservices)
|
|
|
|
|
├── Cosmos DB (microservice A)
|
|
├── SQL Database (microservice B)
|
|
└── Service Bus (async communication)
|
|
```
|
|
|
|
**Container Apps Pattern**
|
|
```
|
|
Azure Container Apps
|
|
├── Dapr for state management
|
|
├── KEDA for event-driven scaling
|
|
└── Azure Monitor for observability
|
|
```
|
|
|
|
### Data Platform
|
|
|
|
```
|
|
Data Sources
|
|
|
|
|
v
|
|
Event Hubs / IoT Hub
|
|
|
|
|
v
|
|
Stream Analytics (real-time processing)
|
|
|
|
|
v
|
|
Data Lake Storage Gen2
|
|
|
|
|
v
|
|
Azure Synapse Analytics
|
|
|
|
|
v
|
|
Power BI (visualization)
|
|
```
|
|
|
|
## Landing Zone Design
|
|
|
|
### Enterprise-Scale Landing Zone
|
|
|
|
**Management Group Hierarchy**
|
|
```
|
|
Tenant Root Group
|
|
├── Platform
|
|
│ ├── Management (monitoring, automation)
|
|
│ ├── Connectivity (hub networks, VPN)
|
|
│ └── Identity (domain controllers)
|
|
└── Landing Zones
|
|
├── Corp (internal workloads)
|
|
└── Online (internet-facing workloads)
|
|
```
|
|
|
|
**Network Topology**
|
|
```
|
|
Hub VNet (Connectivity subscription)
|
|
├── Azure Firewall
|
|
├── VPN Gateway
|
|
├── ExpressRoute Gateway
|
|
└── Bastion
|
|
|
|
Spoke VNets (Workload subscriptions)
|
|
├── Production VNet
|
|
├── Staging VNet
|
|
└── Development VNet
|
|
```
|
|
|
|
**Governance**
|
|
- Azure Policy for compliance
|
|
- Management groups for hierarchy
|
|
- RBAC assignments at appropriate scope
|
|
- Resource tags for cost allocation
|
|
- Azure Blueprints for repeatable deployments
|
|
|
|
## Migration Strategies
|
|
|
|
### Azure Migrate
|
|
|
|
1. **Assess**
|
|
- Discovery with Azure Migrate appliance
|
|
- Dependency analysis
|
|
- Performance-based sizing
|
|
- Cost estimation
|
|
|
|
2. **Migrate**
|
|
- Azure Migrate: Server Migration (agentless)
|
|
- Database Migration Service
|
|
- App Service Migration Assistant
|
|
- Data Box for large data transfers
|
|
|
|
3. **Optimize**
|
|
- Right-sizing recommendations
|
|
- Reserved instances
|
|
- Azure Hybrid Benefit
|
|
|
|
### Migration Patterns
|
|
|
|
**Rehost**: Azure Migrate for VMs
|
|
**Replatform**: App Service, Azure SQL Database
|
|
**Refactor**: Container Apps, AKS, Functions
|
|
**Rebuild**: Azure-native services (Cosmos DB, Cognitive Services)
|
|
|
|
## Cost Optimization
|
|
|
|
### Compute Savings
|
|
- Azure Reserved Instances (1-year or 3-year, up to 72% savings)
|
|
- Azure Savings Plans for Compute (up to 65% savings)
|
|
- Spot VMs for fault-tolerant workloads (up to 90% savings)
|
|
- Azure Hybrid Benefit (use existing Windows Server/SQL licenses)
|
|
- Auto-shutdown for dev/test VMs
|
|
|
|
### Storage Savings
|
|
- Blob Storage lifecycle policies (Hot -> Cool -> Archive)
|
|
- Azure Files: Standard tier for general use
|
|
- Managed Disks: Standard SSD instead of Premium if possible
|
|
- Delete unused snapshots and disks
|
|
|
|
### Database Savings
|
|
- Serverless tier for Azure SQL Database
|
|
- Reserved capacity for Cosmos DB
|
|
- DTU model vs vCore (choose based on workload)
|
|
- Pause Azure Synapse when not in use
|
|
|
|
### Monitoring
|
|
- Azure Cost Management + Billing
|
|
- Cost alerts and budgets
|
|
- Azure Advisor recommendations
|
|
- Resource tagging for cost allocation
|
|
|
|
## Disaster Recovery
|
|
|
|
### Azure Site Recovery
|
|
|
|
**VM Replication**
|
|
- Azure to Azure replication
|
|
- On-premises to Azure (VMware, Hyper-V, physical)
|
|
- RPO: 30 seconds to a few minutes
|
|
- Automated failover and failback
|
|
|
|
**Recovery Plans**
|
|
- Multi-tier application recovery
|
|
- Customizable scripts and manual actions
|
|
- Integration with Azure Automation
|
|
|
|
### Backup Strategies
|
|
|
|
**Azure Backup**
|
|
- VM backups (application-consistent)
|
|
- SQL Server and SAP HANA in Azure VMs
|
|
- Azure Files backup
|
|
- Cross-region restore
|
|
|
|
**Database Backup**
|
|
- SQL Database: Automated backups (7-35 days)
|
|
- Cosmos DB: Continuous backup (30 days)
|
|
- Long-term retention policies
|
|
|
|
### High Availability
|
|
|
|
**RTO/RPO Targets**
|
|
- Active-Active: Multi-region with Traffic Manager (near-zero)
|
|
- Active-Passive: Geo-replication with failover (minutes)
|
|
- Backup and Restore: Azure Backup (hours)
|
|
|
|
## Monitoring and Observability
|
|
|
|
### Azure Monitor
|
|
|
|
**Components**
|
|
- Metrics: Time-series data (1-minute resolution)
|
|
- Logs: Log Analytics workspace for queries (KQL)
|
|
- Alerts: Metric, log, and activity log alerts
|
|
- Dashboards: Custom visualizations
|
|
|
|
**Application Insights**
|
|
- APM for web applications
|
|
- Distributed tracing
|
|
- Live Metrics Stream
|
|
- Smart detection and anomaly detection
|
|
- Best practices: Instrument all applications, set up availability tests
|
|
|
|
### Log Analytics
|
|
|
|
**KQL Queries**
|
|
```kusto
|
|
// Performance analysis
|
|
Perf
|
|
| where CounterName == "% Processor Time"
|
|
| summarize avg(CounterValue) by bin(TimeGenerated, 5m), Computer
|
|
| render timechart
|
|
|
|
// Failed requests
|
|
requests
|
|
| where success == false
|
|
| summarize count() by resultCode, bin(timestamp, 1h)
|
|
```
|
|
|
|
**Workbooks**
|
|
- Interactive reports
|
|
- Parameterized queries
|
|
- Combining metrics and logs
|
|
|
|
## Identity and Access
|
|
|
|
### Azure AD Best Practices
|
|
|
|
- Enable MFA for all users
|
|
- Use Conditional Access policies
|
|
- Implement Privileged Identity Management (PIM)
|
|
- Regular access reviews
|
|
- Break-glass accounts
|
|
|
|
### RBAC Design
|
|
|
|
**Built-in Roles**
|
|
- Owner: Full access including RBAC
|
|
- Contributor: Full access except RBAC
|
|
- Reader: Read-only access
|
|
- Custom roles for specific needs
|
|
|
|
**Scope Hierarchy**
|
|
```
|
|
Management Group (highest)
|
|
|
|
|
Subscription
|
|
|
|
|
Resource Group
|
|
|
|
|
Resource (lowest)
|
|
```
|
|
|
|
Best practices: Assign at highest appropriate scope, use groups not individual users, apply least privilege
|