Cost Optimization Strategy¶
Overview¶
This document provides comprehensive cost optimization strategies for the metrics platform, targeting a monthly budget of <$400 while maintaining scalability and performance. The focus is on leveraging Azure's consumption-based pricing and implementing intelligent resource management.
Cost Targets
- Total Monthly Cost: <$400 (current scale: 1K events/day)
- Scalability Cost: <$1200 at 100K events/day
- Cost per Event: <$0.0004 at scale
- ROI Target: 10x value from analytics insights
Current Cost Breakdown¶
Detailed Monthly Costs (1K events/day)¶
pie title Monthly Cost Distribution ($375 total)
"Event Hubs" : 180
"Data Factory" : 150
"Storage (ADLS)" : 15
"Synapse Serverless" : 10
"Functions" : 5
"Monitoring" : 10
"Network" : 5
Component | Configuration | Monthly Cost | Cost Driver | Optimization Potential |
---|---|---|---|---|
Event Hubs | 7 namespaces, Standard tier | $180 | Fixed throughput units | High - Auto-scale, Basic tier |
Data Factory | 3 pipelines, hourly execution | $150 | Pipeline orchestration | Medium - Reduce frequency |
ADLS Gen2 | 2GB hot, 10GB cool, 100GB archive | $15 | Storage and transactions | Low - Already optimized |
Synapse Serverless | 50GB scanned/month | $10 | Query volume | Medium - Query optimization |
Azure Functions | 10K executions/month | $5 | Consumption plan | Low - Already optimized |
Monitoring | Log Analytics, alerts | $10 | Log retention | Low - Retention tuning |
Network | Data egress, VNet | $5 | Data transfer | Low - Regional placement |
Cost Scaling Projections¶
graph LR
subgraph "Cost Scaling (Monthly)"
C1K[1K events/day<br/>$375/month]
C10K[10K events/day<br/>$450/month]
C50K[50K events/day<br/>$750/month]
C100K[100K events/day<br/>$1,200/month]
end
subgraph "Per-Event Cost"
E1K[$0.012/event]
E10K[$0.0015/event]
E50K[$0.0005/event]
E100K[$0.0004/event]
end
C1K --> C10K
C10K --> C50K
C50K --> C100K
C1K --> E1K
C10K --> E10K
C50K --> E50K
C100K --> E100K
Event Hubs Optimization¶
Auto-Scaling Configuration¶
Current state uses fixed throughput units (TU). Implement auto-scaling to reduce costs during low-traffic periods.
resource "azurerm_eventhub_namespace" "optimized" {
name = "crm-events-namespace"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
sku = "Standard"
capacity = 1 # Start with 1 TU
# Enable auto-inflate to handle traffic spikes
auto_inflate_enabled = true
maximum_throughput_units = 5 # Scale up to 5 TU when needed
tags = {
Environment = "Production"
CostCenter = "Analytics"
}
}
Potential Savings: $50-80/month (30-45% reduction)
Traffic Pattern Analysis¶
// Analyze Event Hub usage patterns
EventHubMetrics
| where TimeGenerated >= ago(30d)
| where MetricName == "IncomingMessages"
| summarize
HourlyMessages = avg(Total),
PeakMessages = max(Total),
MinMessages = min(Total)
by bin(TimeGenerated, 1h)
| extend Hour = hourofday(TimeGenerated)
| summarize
AvgHourly = avg(HourlyMessages),
PeakHourly = max(HourlyMessages)
by Hour
| order by Hour
Basic Tier Evaluation¶
For non-critical event hubs, consider Basic tier:
Feature | Basic | Standard | Savings |
---|---|---|---|
Throughput Units | 1 TU | 1-20 TU | - |
Message Retention | 1 day | 7 days | - |
Consumer Groups | 1 | 20 | - |
Partitions | 2 | 32 | - |
Cost per TU | $11.16 | $25.95 | 57% less |
# Cost comparison for 7 namespaces
# Standard: 7 × $25.95 = $181.65/month
# Basic: 7 × $11.16 = $78.12/month
# Savings: $103.53/month (57%)
Data Factory Optimization¶
Execution Frequency Optimization¶
Current hourly execution may be excessive for business needs. Analyze data freshness requirements:
graph TD
subgraph "Current Schedule"
H1[00:00 - Process]
H2[01:00 - Process]
H3[02:00 - Process]
H24[23:00 - Process]
end
subgraph "Optimized Schedule"
B1[06:00 - Business Hours Start]
B2[12:00 - Midday Update]
B3[18:00 - End of Day]
B4[23:00 - Daily Rollup]
end
subgraph "Savings"
S1[Executions: 24 → 4]
S2[Cost Reduction: 83%]
S3[Monthly Savings: ~$125]
end
Pipeline Consolidation¶
{
"name": "ConsolidatedProcessing",
"properties": {
"activities": [
{
"name": "ProcessAllDomains",
"type": "ForEach",
"typeProperties": {
"items": {
"value": ["CRM", "Matching", "Profile", "Assessment"],
"type": "Expression"
},
"batchCount": 4,
"activities": [
{
"name": "ProcessDomain",
"type": "SqlServerStoredProcedure",
"typeProperties": {
"storedProcedureName": "ProcessDomainEvents",
"storedProcedureParameters": {
"DomainName": "@item()"
}
}
}
]
}
}
]
}
}
Potential Savings: $125/month (83% reduction in executions)
Storage Optimization¶
Lifecycle Policy Tuning¶
Fine-tune storage lifecycle policies for optimal cost/performance:
{
"rules": [
{
"name": "aggressive-archival",
"enabled": true,
"type": "Lifecycle",
"definition": {
"filters": {
"blobTypes": ["blockBlob"],
"prefixMatch": ["raw-events/"]
},
"actions": {
"baseBlob": {
"tierToCool": {
"daysAfterModificationGreaterThan": 7
},
"tierToArchive": {
"daysAfterModificationGreaterThan": 30
},
"delete": {
"daysAfterModificationGreaterThan": 2555
}
}
}
}
},
{
"name": "bronze-optimization",
"enabled": true,
"type": "Lifecycle",
"definition": {
"filters": {
"prefixMatch": ["bronze/"]
},
"actions": {
"baseBlob": {
"tierToCool": {
"daysAfterModificationGreaterThan": 14
},
"tierToArchive": {
"daysAfterModificationGreaterThan": 60
}
}
}
}
}
]
}
Storage Analytics¶
// Analyze storage cost distribution
StorageAccount
| where TimeGenerated >= ago(30d)
| where MetricName == "UsedCapacity"
| extend StorageTier = case(
BlobType == "BlockBlob" and AccessTier == "Hot", "Hot",
BlobType == "BlockBlob" and AccessTier == "Cool", "Cool",
BlobType == "BlockBlob" and AccessTier == "Archive", "Archive",
"Unknown"
)
| summarize
AvgSize_GB = avg(Average) / 1024 / 1024 / 1024,
MaxSize_GB = max(Average) / 1024 / 1024 / 1024
by StorageTier, bin(TimeGenerated, 1d)
| extend EstimatedCost = case(
StorageTier == "Hot", AvgSize_GB * 0.0208,
StorageTier == "Cool", AvgSize_GB * 0.01,
StorageTier == "Archive", AvgSize_GB * 0.00099,
0
)
Synapse Serverless Optimization¶
Query Optimization Strategies¶
1. Partition Pruning¶
-- Optimized query with partition elimination
SELECT
aggregate_type,
COUNT(*) as event_count,
SUM(CAST(JSON_VALUE(payload, '$.value') AS FLOAT)) as total_value
FROM silver.business_events
WHERE year = 2024
AND month = 1 -- Partition pruning reduces scan by 92%
AND day >= 15
AND aggregate_type IN ('Contact', 'Match') -- Predicate pushdown
GROUP BY aggregate_type;
-- Before optimization: Scans 12 months = 100GB
-- After optimization: Scans 16 days = 5GB
-- Cost savings: 95% reduction in scan costs
2. Column Pruning¶
-- Only select needed columns to reduce I/O
SELECT
event_id,
timestamp,
aggregate_type,
JSON_VALUE(payload, '$.status') as status -- Extract only needed JSON fields
FROM silver.business_events
WHERE year = 2024 AND month = 1
-- Don't select entire payload column (saves 60% scan volume)
3. Query Result Caching¶
Enable result caching for frequently executed queries:
-- Enable query result cache (automatic in Synapse)
-- Cached results valid for 24 hours
SELECT
aggregate_type,
COUNT(*) as daily_count
FROM silver.business_events
WHERE CAST(timestamp AS DATE) = CAST(GETDATE() AS DATE)
GROUP BY aggregate_type;
Query Cost Analysis:
graph LR
subgraph "Query Optimization Impact"
BEFORE[Before: $50/month<br/>500GB scanned]
AFTER[After: $10/month<br/>100GB scanned]
SAVINGS[Savings: $40/month<br/>80% reduction]
end
BEFORE --> AFTER
AFTER --> SAVINGS
Materialized Views for Common Queries¶
-- Create materialized view for daily aggregations
CREATE MATERIALIZED VIEW gold.daily_summary_mv
AS
SELECT
CAST(timestamp AS DATE) as date,
aggregate_type,
COUNT(*) as event_count,
COUNT(DISTINCT aggregate_id) as unique_entities,
COUNT(DISTINCT user_id) as unique_users
FROM silver.business_events
GROUP BY CAST(timestamp AS DATE), aggregate_type;
-- Auto-refresh materialized view
ALTER MATERIALIZED VIEW gold.daily_summary_mv SET (AUTO_REFRESH = ON);
Resource Scheduling¶
Off-Peak Processing¶
Schedule compute-intensive tasks during off-peak hours for discounted rates:
gantt
title Optimized Processing Schedule
dateFormat HH:mm
axisFormat %H:%M
section Business Hours (Peak Rate)
Light Monitoring :active, monitor, 08:00, 18:00
section Off-Peak Hours (Discounted)
Heavy Processing :process, 01:00, 04:00
Data Archival :archive, 04:00, 05:00
Maintenance :maint, 05:00, 06:00
section Weekend (Lowest Rate)
Full Reprocessing :weekend, 02:00, 08:00
Azure Functions Optimization¶
// Optimize function memory allocation
[FunctionName("ProcessEvents")]
[MemorySize(512)] // Reduced from default 1024MB
[Timeout("00:05:00")] // 5-minute timeout
public static async Task<IActionResult> ProcessEvents(
[HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req,
ILogger log)
{
// Use connection pooling
using var connection = connectionPool.Get();
// Batch processing for efficiency
var events = await ReadEventsBatch(req, batchSize: 100);
await ProcessEventsBatch(events);
return new OkObjectResult($"Processed {events.Count} events");
}
Monitoring and Alerting Optimization¶
Cost-Aware Alerting¶
// Alert on unexpected cost increases
AzureBilling
| where TimeGenerated >= ago(7d)
| where ServiceName in ("Event Hubs", "Data Factory", "Storage", "Synapse")
| summarize DailyCost = sum(Cost) by bin(TimeGenerated, 1d), ServiceName
| extend CostChange = DailyCost - prev(DailyCost, 1)
| where CostChange > 10 // Alert if daily cost increases by $10
| project TimeGenerated, ServiceName, DailyCost, CostChange
Optimized Log Retention¶
{
"workspaceCapping": {
"dailyQuotaGb": 1,
"quotaNextResetTime": "2024-01-01T00:00:00Z"
},
"retentionInDays": 30,
"sku": {
"name": "PerGB2018"
}
}
Log Analytics Optimization: - Reduce retention from 90 to 30 days: Save $15/month - Set daily quota to 1GB: Prevent overage charges - Use sampling for non-critical logs: Save $10/month
Development and Testing Cost Management¶
Separate Environments¶
graph TD
subgraph "Production (Optimized)"
PROD_EH[Event Hubs: Standard]
PROD_ADF[Data Factory: Full schedule]
PROD_SYNAPSE[Synapse: Optimized queries]
PROD_COST[$375/month]
end
subgraph "Development (Minimal)"
DEV_EH[Event Hubs: Basic]
DEV_ADF[Data Factory: Manual triggers]
DEV_SYNAPSE[Synapse: Sample data]
DEV_COST[$50/month]
end
subgraph "Testing (Auto-shutdown)"
TEST_EH[Event Hubs: Basic]
TEST_ADF[Data Factory: On-demand]
TEST_SCHEDULE[Auto-shutdown: 18:00-08:00]
TEST_COST[$25/month]
end
Auto-Shutdown for Dev/Test¶
#!/bin/bash
# Auto-shutdown script for development environments
# Stop Event Hub (scale to 0 TU during off-hours)
az eventhubs namespace update \
--name dev-events-namespace \
--resource-group rg-dev \
--capacity 0
# Pause Data Factory triggers
az datafactory trigger stop \
--factory-name dev-data-factory \
--resource-group rg-dev \
--name dev-processing-trigger
# Schedule restart at 8 AM
at 08:00 <<EOF
az eventhubs namespace update --name dev-events-namespace --resource-group rg-dev --capacity 1
az datafactory trigger start --factory-name dev-data-factory --resource-group rg-dev --name dev-processing-trigger
EOF
Reserved Capacity and Savings Plans¶
Azure Reserved Instances¶
For predictable workloads, consider 1-year reserved capacity:
Service | On-Demand | 1-Year Reserved | Savings |
---|---|---|---|
Event Hubs | $25.95/TU/month | $18.17/TU/month | 30% |
Storage | $0.0208/GB | $0.0166/GB | 20% |
Synapse | $5/TB scanned | $4/TB scanned | 20% |
Annual Commitment Analysis:
# Current annual cost: $375 × 12 = $4,500
# With reserved instances: $375 × 0.75 × 12 = $3,375
# Annual savings: $1,125 (25% reduction)
Azure Hybrid Benefit¶
If you have existing SQL Server licenses, apply Azure Hybrid Benefit to Synapse: - Savings: Up to 40% on compute costs - Applicable to: Synapse dedicated SQL pools (if used in future)
Cost Monitoring Dashboard¶
Real-Time Cost Tracking¶
graph TD
subgraph "Cost Monitoring Dashboard"
subgraph "Daily Metrics"
DAILY_COST[Daily Cost<br/>Target: <$12.50]
DAILY_TREND[Cost Trend<br/>7-day average]
BUDGET_ALERT[Budget Alert<br/>80% threshold]
end
subgraph "Service Breakdown"
EH_COST[Event Hubs<br/>$6/day]
ADF_COST[Data Factory<br/>$5/day]
STORAGE_COST[Storage<br/>$0.50/day]
SYNAPSE_COST[Synapse<br/>$0.33/day]
end
subgraph "Optimization Alerts"
UNUSED_RESOURCES[Unused Resources]
SCALE_DOWN[Scale Down Opportunities]
ARCHIVE_READY[Archive Ready Data]
end
end
Power BI Cost Dashboard¶
// Cost tracking measures
DailyCost =
CALCULATE(
SUM(CostTable[Cost]),
CostTable[Date] = TODAY()
)
MonthlyBudgetUtilization =
VAR MonthlyBudget = 400
VAR MonthToDateCost =
CALCULATE(
SUM(CostTable[Cost]),
DATESINPERIOD(CostTable[Date], TODAY(), -1, MONTH)
)
RETURN
DIVIDE(MonthToDateCost, MonthlyBudget, 0)
CostEfficiencyScore =
VAR EventsProcessed = SUM(MetricsTable[EventCount])
VAR TotalCost = [MonthToDateCost]
RETURN
DIVIDE(EventsProcessed, TotalCost, 0) // Events per dollar
// Alert if utilization > 80%
BudgetAlert =
IF(
[MonthlyBudgetUtilization] > 0.8,
"⚠️ Budget Alert: " & FORMAT([MonthlyBudgetUtilization], "0%") & " used",
"✅ Within Budget: " & FORMAT([MonthlyBudgetUtilization], "0%") & " used"
)
Cost Optimization Roadmap¶
Phase 1: Quick Wins (Week 1-2)¶
- [ ] Implement Event Hub auto-scaling
- [ ] Reduce Data Factory execution frequency
- [ ] Optimize storage lifecycle policies
- [ ] Set up cost monitoring alerts
Expected Savings: $100-150/month
Phase 2: Query Optimization (Week 3-4)¶
- [ ] Optimize Synapse queries for partition pruning
- [ ] Implement materialized views
- [ ] Enable query result caching
- [ ] Create cost monitoring dashboard
Expected Savings: $40-60/month
Phase 3: Architecture Refinement (Month 2)¶
- [ ] Evaluate Basic tier Event Hubs
- [ ] Implement reserved capacity
- [ ] Optimize development environments
- [ ] Review and consolidate resources
Expected Savings: $80-120/month
Phase 4: Advanced Optimization (Month 3)¶
- [ ] Implement auto-pause for dev/test
- [ ] Advanced query optimization
- [ ] Review data retention policies
- [ ] Capacity planning for growth
Expected Savings: $50-75/month
Success Metrics¶
Cost KPIs¶
Metric | Current | Target | Tracking |
---|---|---|---|
Monthly Cost | $375 | <$250 | Daily monitoring |
Cost per Event | $0.012 | <$0.008 | Weekly calculation |
Budget Utilization | 94% | <80% | Real-time dashboard |
Waste Ratio | 15% | <10% | Monthly audit |
ROI Measurement¶
graph LR
subgraph "Cost Investment"
INFRA[Infrastructure: $250/month]
OPS[Operations: $50/month]
DEV[Development: $100/month]
TOTAL[Total: $400/month]
end
subgraph "Business Value"
INSIGHTS[Data Insights: $2000/month]
EFFICIENCY[Operational Efficiency: $1500/month]
DECISIONS[Better Decisions: $1000/month]
VALUE[Total Value: $4500/month]
end
subgraph "ROI"
RATIO[ROI Ratio: 11.25x]
PAYBACK[Payback Period: 26 days]
end
TOTAL --> RATIO
VALUE --> RATIO
RATIO --> PAYBACK
Best Practices Summary¶
- Continuous Monitoring
- Set up real-time cost alerts
- Review costs weekly
- Track cost per business metric
-
Monitor unused resources
-
Right-Sizing Resources
- Use auto-scaling where available
- Match resource size to workload
- Regular capacity reviews
-
Eliminate over-provisioning
-
Intelligent Scheduling
- Off-peak processing for batch jobs
- Auto-shutdown for dev/test
- Optimize refresh frequencies
-
Load balance across time zones
-
Data Lifecycle Management
- Aggressive archival policies
- Regular cleanup procedures
- Retention policy optimization
- Delete unnecessary data
With these optimizations, the target monthly cost of <$250 is achievable while maintaining system performance and preparing for future scale.