15 KiB
Azure Architecture Reference
Comprehensive guide for Azure services, patterns, and Cloud Adoption Framework implementation.
Cloud Adoption Framework
Framework Phases
-
Strategy
- Define business justification
- Expected business outcomes
- Business case development
- First project prioritization
-
Plan
- Digital estate assessment
- Initial organization alignment
- Skills readiness plan
- Cloud adoption plan
-
Ready
- Azure landing zone setup
- Azure setup guide
- Migration readiness
- Best practices validation
-
Adopt (Migrate + Innovate)
- Migration: Assess, migrate, optimize
- Innovate: Build cloud-native solutions
- Best practices and patterns
-
Govern
- Methodology for governance
- Governance benchmark
- Initial governance foundation
- Mature governance evolution
-
Manage
- Business commitments
- Operations baseline
- Platform and workload specialization
Azure Well-Architected Framework
Five Pillars
-
Cost Optimization
- Azure Cost Management and Billing
- Reserved instances and Savings Plans
- Azure Hybrid Benefit
- Auto-scaling and right-sizing
-
Operational Excellence
- Infrastructure as Code (ARM, Bicep, Terraform)
- Azure DevOps and GitHub Actions
- Azure Monitor and Application Insights
- Deployment slots and blue-green deployments
-
Performance Efficiency
- Azure CDN and Front Door
- Auto-scaling (VMSS, App Service)
- Caching (Redis, CDN)
- Performance diagnostics
-
Reliability
- Availability Zones and regions
- Azure Site Recovery
- Load Balancer and Traffic Manager
- Backup and disaster recovery
-
Security
- Azure AD (Entra ID)
- Network Security Groups and Firewalls
- Azure Key Vault
- Microsoft Defender for Cloud
Core Services Architecture
Compute
Virtual Machines
- VM sizes: General (D-series), Compute (F-series), Memory (E-series), GPU (N-series)
- Availability Sets (99.95% SLA)
- Availability Zones (99.99% SLA)
- VM Scale Sets for auto-scaling
- Best practices: Use managed disks, enable accelerated networking, use proximity placement groups
App Service
- Web Apps, API Apps, Mobile Apps
- Deployment slots for staging
- Auto-scaling based on metrics or schedule
- Supports .NET, Java, Node.js, Python, PHP, Ruby
- Best practices: Use deployment slots, enable auto-scaling, use App Service Plan efficiently
Azure Functions
- Consumption Plan (serverless)
- Premium Plan (VNet integration, no cold start)
- Dedicated Plan (App Service Plan)
- Durable Functions for orchestration
- Best practices: Keep functions small, use Premium for production, implement retry policies
Azure Kubernetes Service (AKS)
- Managed Kubernetes control plane
- Azure CNI or kubenet networking
- Azure AD integration
- Virtual nodes (Azure Container Instances)
- Best practices: Use system node pools, enable autoscaling, implement network policies
Container Instances
- Serverless containers
- Fast startup without infrastructure management
- Best for batch jobs and burstable workloads
Azure Batch
- Large-scale parallel and HPC workloads
- Auto-scaling compute nodes
- Task scheduling and dependencies
Storage
Blob Storage
- Storage tiers: Hot, Cool, Archive
- Access tiers: Premium, Standard
- Lifecycle management policies
- Immutable storage for compliance
- Best practices: Use lifecycle policies, enable soft delete, implement versioning
Azure Files
- SMB and NFS file shares
- Integration with Azure File Sync
- Premium tier for high performance
- Best practices: Use Premium for databases, implement snapshots
Disk Storage
- Managed Disks: Premium SSD, Standard SSD, Standard HDD, Ultra Disk
- Disk encryption with Azure Disk Encryption
- Snapshots and incremental backups
- Best practices: Use Premium SSD for production, enable encryption
Data Lake Storage Gen2
- Hierarchical namespace for big data
- Built on Blob Storage
- Integration with Azure Synapse and Databricks
- Best practices: Enable hierarchical namespace, use lifecycle policies
Azure NetApp Files
- Enterprise-grade NFS and SMB shares
- High performance and low latency
- Snapshots and data protection
Database
Azure SQL Database
- Serverless and provisioned compute
- Hyperscale for up to 100TB
- Elastic pools for multiple databases
- Auto-tuning and intelligent insights
- Best practices: Use serverless for dev/test, enable geo-replication
Azure SQL Managed Instance
- Near 100% compatibility with SQL Server
- VNet integration for isolation
- Native virtual network implementation
- Best practices: Use for lift-and-shift migrations
Cosmos DB
- Multi-model NoSQL database
- Global distribution with multi-master
- Consistency levels: Strong, Bounded staleness, Session, Consistent prefix, Eventual
- APIs: SQL, MongoDB, Cassandra, Gremlin, Table
- Best practices: Choose appropriate consistency, partition key design critical
Azure Database for PostgreSQL/MySQL/MariaDB
- Flexible Server (newer) vs Single Server (legacy)
- High availability with zone redundancy
- Read replicas for scaling
- Best practices: Use Flexible Server, enable HA, implement connection pooling
Azure Cache for Redis
- In-memory caching
- Clustering for scalability
- Geo-replication for disaster recovery
- Best practices: Use Premium tier for production, enable persistence
Networking
Virtual Network (VNet)
- CIDR planning (avoid overlaps)
- Subnets with Network Security Groups
- Service endpoints and Private Link
- VNet peering for connectivity
- Best practices: Plan IP address space, use NSGs, implement Private Link
Azure Load Balancer
- Layer 4 load balancing
- Standard SKU (zone-redundant, SLA)
- Health probes and distribution algorithms
- Best practices: Use Standard SKU, configure health probes
Application Gateway
- Layer 7 load balancing
- WAF (Web Application Firewall)
- URL-based routing and SSL termination
- Best practices: Enable WAF, use autoscaling
Azure Front Door
- Global load balancing and CDN
- WAF at edge
- Anycast for low latency
- Best practices: Use for global applications, enable caching
VPN Gateway and ExpressRoute
- Site-to-Site VPN for encrypted connectivity
- ExpressRoute for private, dedicated connection
- Virtual WAN for global transit network
- Best practices: Use ExpressRoute for production, implement redundancy
Azure Firewall
- Managed firewall service
- Application and network rules
- Threat intelligence
- Best practices: Use in hub-spoke topology, enable DNS proxy
Azure Private Link
- Private connectivity to Azure services
- No public internet exposure
- Available for PaaS services
- Best practices: Use for all PaaS services in production
Security and Identity
Azure Active Directory (Microsoft Entra ID)
- Identity and access management
- Conditional Access policies
- Multi-factor authentication
- B2B and B2C scenarios
- Best practices: Enable MFA, use Conditional Access, implement PIM
Azure Key Vault
- Secrets, keys, and certificates management
- Hardware Security Module (HSM) backed
- Soft delete and purge protection
- Best practices: Enable soft delete, use RBAC, implement Private Link
Microsoft Defender for Cloud
- Security posture management
- Threat protection for hybrid workloads
- Regulatory compliance dashboard
- Just-in-time VM access
- Best practices: Enable enhanced security, implement recommendations
Azure Policy
- Governance and compliance at scale
- Built-in and custom policies
- Deny, audit, append effects
- Best practices: Assign at management group level, test before enforce
Azure Sentinel
- Cloud-native SIEM and SOAR
- AI-powered threat detection
- Integration with Microsoft 365, third-party tools
- Best practices: Enable data connectors, create custom analytics rules
Architecture Patterns
High Availability
Zone-Redundant Pattern
Azure Front Door (global)
|
v
Application Gateway (zone-redundant)
|
v
VM Scale Set (across availability zones)
|
v
Azure SQL Database (zone-redundant)
Multi-Region Pattern
Azure Traffic Manager (DNS-based routing)
|
├── Region 1: App Service + SQL Database (primary)
└── Region 2: App Service + SQL Database (geo-replica)
Hub-Spoke Topology
Hub VNet
├── Azure Firewall
├── VPN Gateway
└── Shared Services
|
├── Spoke VNet 1 (Production)
├── Spoke VNet 2 (Development)
└── Spoke VNet 3 (DMZ)
Serverless Architecture
Event-Driven Pattern
Event Grid -> Azure Functions -> Cosmos DB
|
v
Service Bus -> Functions (processing)
API-First Pattern
API Management
|
├── Function App 1 (auth)
├── Function App 2 (business logic)
└── Function App 3 (data access)
Microservices on Azure
AKS-Based
Azure Front Door
|
v
Application Gateway + WAF
|
v
AKS (multiple microservices)
|
├── Cosmos DB (microservice A)
├── SQL Database (microservice B)
└── Service Bus (async communication)
Container Apps Pattern
Azure Container Apps
├── Dapr for state management
├── KEDA for event-driven scaling
└── Azure Monitor for observability
Data Platform
Data Sources
|
v
Event Hubs / IoT Hub
|
v
Stream Analytics (real-time processing)
|
v
Data Lake Storage Gen2
|
v
Azure Synapse Analytics
|
v
Power BI (visualization)
Landing Zone Design
Enterprise-Scale Landing Zone
Management Group Hierarchy
Tenant Root Group
├── Platform
│ ├── Management (monitoring, automation)
│ ├── Connectivity (hub networks, VPN)
│ └── Identity (domain controllers)
└── Landing Zones
├── Corp (internal workloads)
└── Online (internet-facing workloads)
Network Topology
Hub VNet (Connectivity subscription)
├── Azure Firewall
├── VPN Gateway
├── ExpressRoute Gateway
└── Bastion
Spoke VNets (Workload subscriptions)
├── Production VNet
├── Staging VNet
└── Development VNet
Governance
- Azure Policy for compliance
- Management groups for hierarchy
- RBAC assignments at appropriate scope
- Resource tags for cost allocation
- Azure Blueprints for repeatable deployments
Migration Strategies
Azure Migrate
-
Assess
- Discovery with Azure Migrate appliance
- Dependency analysis
- Performance-based sizing
- Cost estimation
-
Migrate
- Azure Migrate: Server Migration (agentless)
- Database Migration Service
- App Service Migration Assistant
- Data Box for large data transfers
-
Optimize
- Right-sizing recommendations
- Reserved instances
- Azure Hybrid Benefit
Migration Patterns
Rehost: Azure Migrate for VMs Replatform: App Service, Azure SQL Database Refactor: Container Apps, AKS, Functions Rebuild: Azure-native services (Cosmos DB, Cognitive Services)
Cost Optimization
Compute Savings
- Azure Reserved Instances (1-year or 3-year, up to 72% savings)
- Azure Savings Plans for Compute (up to 65% savings)
- Spot VMs for fault-tolerant workloads (up to 90% savings)
- Azure Hybrid Benefit (use existing Windows Server/SQL licenses)
- Auto-shutdown for dev/test VMs
Storage Savings
- Blob Storage lifecycle policies (Hot -> Cool -> Archive)
- Azure Files: Standard tier for general use
- Managed Disks: Standard SSD instead of Premium if possible
- Delete unused snapshots and disks
Database Savings
- Serverless tier for Azure SQL Database
- Reserved capacity for Cosmos DB
- DTU model vs vCore (choose based on workload)
- Pause Azure Synapse when not in use
Monitoring
- Azure Cost Management + Billing
- Cost alerts and budgets
- Azure Advisor recommendations
- Resource tagging for cost allocation
Disaster Recovery
Azure Site Recovery
VM Replication
- Azure to Azure replication
- On-premises to Azure (VMware, Hyper-V, physical)
- RPO: 30 seconds to a few minutes
- Automated failover and failback
Recovery Plans
- Multi-tier application recovery
- Customizable scripts and manual actions
- Integration with Azure Automation
Backup Strategies
Azure Backup
- VM backups (application-consistent)
- SQL Server and SAP HANA in Azure VMs
- Azure Files backup
- Cross-region restore
Database Backup
- SQL Database: Automated backups (7-35 days)
- Cosmos DB: Continuous backup (30 days)
- Long-term retention policies
High Availability
RTO/RPO Targets
- Active-Active: Multi-region with Traffic Manager (near-zero)
- Active-Passive: Geo-replication with failover (minutes)
- Backup and Restore: Azure Backup (hours)
Monitoring and Observability
Azure Monitor
Components
- Metrics: Time-series data (1-minute resolution)
- Logs: Log Analytics workspace for queries (KQL)
- Alerts: Metric, log, and activity log alerts
- Dashboards: Custom visualizations
Application Insights
- APM for web applications
- Distributed tracing
- Live Metrics Stream
- Smart detection and anomaly detection
- Best practices: Instrument all applications, set up availability tests
Log Analytics
KQL Queries
// Performance analysis
Perf
| where CounterName == "% Processor Time"
| summarize avg(CounterValue) by bin(TimeGenerated, 5m), Computer
| render timechart
// Failed requests
requests
| where success == false
| summarize count() by resultCode, bin(timestamp, 1h)
Workbooks
- Interactive reports
- Parameterized queries
- Combining metrics and logs
Identity and Access
Azure AD Best Practices
- Enable MFA for all users
- Use Conditional Access policies
- Implement Privileged Identity Management (PIM)
- Regular access reviews
- Break-glass accounts
RBAC Design
Built-in Roles
- Owner: Full access including RBAC
- Contributor: Full access except RBAC
- Reader: Read-only access
- Custom roles for specific needs
Scope Hierarchy
Management Group (highest)
|
Subscription
|
Resource Group
|
Resource (lowest)
Best practices: Assign at highest appropriate scope, use groups not individual users, apply least privilege