
Event-Driven Metrics Platform

Overview

The Event-Driven Metrics Platform provides a unified, cost-effective solution for collecting, processing, and analyzing business metrics across the AgentHub distributed application ecosystem. This platform combines business events from microservices, frontend usage analytics from Matomo, and error tracking from Sentry into a cohesive analytics layer accessible through Power BI.

Key Benefits

  • Schema Independence: No direct database coupling, enabling schema evolution
  • Cost Optimized: ~$350/month for the complete solution using serverless technologies
  • Scalable: Handles 100x growth (1K to 100K events/day) without redesign
  • Unified Analytics: Combines multiple data sources in a single reporting layer

Architecture Principles

1. Contract-Based Integration

All data exchange happens through well-defined contracts (schemas) rather than direct database access. This ensures:

  • Services can evolve their internal schemas independently
  • Clear boundaries between domains
  • Versioning support for gradual migrations

2. Event-First Design

Business events are the primary source of truth for metrics:

  • Events capture business state changes
  • Immutable event log provides audit trail
  • Event replay enables historical analysis

3. Cost-Conscious Technology Selection

Every technology choice prioritizes cost-effectiveness:

  • Serverless compute (pay-per-use)
  • Tiered storage (hot/cool/archive)
  • Batch processing over real-time where appropriate

4. Separation of Concerns

The platform keeps a clear separation between:

  • Ingestion: Event Hub with automatic capture
  • Storage: Azure Data Lake Gen2 with Delta Lake
  • Processing: Synapse Serverless SQL
  • Serving: Power BI with Direct Lake mode

High-Level Architecture

graph TB
    subgraph "Event Sources"
        S1[Service 1<br/>Business Events]
        S2[Service 2<br/>Business Events]
        S3[Service N<br/>Business Events]
        FE[Angular Frontend<br/>User Actions]
        MA[Matomo Cloud<br/>Usage Analytics]
        SE[Sentry<br/>Error Tracking]
    end

    subgraph "Event Ingestion Layer"
        EH1[Event Hub 1<br/>Domain A]
        EH2[Event Hub 2<br/>Domain B]
        EHN[Event Hub N<br/>Domain N]
        EHC[Event Hub Capture<br/>Automatic Persistence]
    end

    subgraph "Storage Layer"
        ADLS[Azure Data Lake Gen2<br/>Raw Events<br/>~$23/TB/month]
        COLD[Cool/Archive Tier<br/>Historical Data<br/>~$10/TB/month]
    end

    subgraph "Processing Layer"
        ADF[Azure Data Factory<br/>Orchestration<br/>~$150/month]
        SYN[Synapse Serverless<br/>SQL Queries<br/>$5/TB scanned]
        FUNC[Azure Functions<br/>Event Processing<br/>Consumption Plan]
    end

    subgraph "Serving Layer"
        DL[Delta Lake Tables<br/>ACID Transactions]
        VIEWS[Materialized Views<br/>Pre-aggregated Metrics]
    end

    subgraph "Analytics Layer"
        PBI[Power BI<br/>Reports & Dashboards]
        API1[Matomo API<br/>Direct Integration]
        API2[Sentry API<br/>Direct Integration]
    end

    S1 --> EH1
    S2 --> EH2
    S3 --> EHN

    EH1 --> EHC
    EH2 --> EHC
    EHN --> EHC

    EHC --> ADLS

    ADLS --> ADF
    ADF --> SYN
    SYN --> DL
    DL --> VIEWS

    ADLS --> COLD

    FUNC -.-> ADLS

    VIEWS --> PBI
    MA --> API1
    SE --> API2
    API1 --> PBI
    API2 --> PBI
    FE --> MA

    style ADLS fill:#90EE90
    style ADF fill:#87CEEB
    style PBI fill:#FFD700
    style SYN fill:#DDA0DD

Data Flow Patterns

Event Flow

sequenceDiagram
    participant Service
    participant EventHub
    participant Capture
    participant ADLS
    participant ADF
    participant Synapse
    participant PowerBI

    Service->>EventHub: Publish Business Event
    EventHub->>Capture: Auto-capture (5min intervals)
    Capture->>ADLS: Write Avro/Parquet files

    Note over ADF: Daily/Hourly Schedule
    ADF->>ADLS: Read raw events
    ADF->>Synapse: Transform via SQL
    Synapse->>ADLS: Write aggregated metrics

    PowerBI->>Synapse: Query metrics
    PowerBI->>ADLS: Direct Lake access
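
Event Hub Capture lands each window as an Avro container file in the lake, which is also what makes event replay possible. A minimal sketch of a replay or backfill script, assuming the azure-storage-file-datalake and fastavro packages and placeholder account, container, and path names (the Body field holds the original event payload in capture's standard Avro layout):

import io
import json

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient
from fastavro import reader

# Placeholder names -- substitute the real storage account, container, and capture path.
ACCOUNT_URL = "https://<storage-account>.dfs.core.windows.net"
CONTAINER = "metrics"
CAPTURE_PREFIX = "raw/domain-a/2024/01/15/14"

service = DataLakeServiceClient(account_url=ACCOUNT_URL, credential=DefaultAzureCredential())
filesystem = service.get_file_system_client(CONTAINER)

events = []
for path in filesystem.get_paths(path=CAPTURE_PREFIX):
    if path.is_directory or not path.name.endswith(".avro"):
        continue
    blob_bytes = filesystem.get_file_client(path.name).download_file().readall()
    # Each capture file is an Avro container; the event JSON sits in the Body field.
    for record in reader(io.BytesIO(blob_bytes)):
        events.append(json.loads(record["Body"]))

print(f"Replayed {len(events)} events from {CAPTURE_PREFIX}")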

Data Lake Organization

graph LR
    subgraph "Data Lake Structure"
        RAW[/raw/<br/>Landing Zone]
        BRONZE[/bronze/<br/>Validated Events]
        SILVER[/silver/<br/>Cleansed & Enriched]
        GOLD[/gold/<br/>Business Metrics]
    end

    subgraph "File Organization"
        YEAR[/2024/]
        MONTH[/01/]
        DAY[/15/]
        HOUR[/14/]
        FILE[events.parquet]
    end

    RAW --> BRONZE
    BRONZE --> SILVER
    SILVER --> GOLD

    RAW --> YEAR
    YEAR --> MONTH
    MONTH --> DAY
    DAY --> HOUR
    HOUR --> FILE
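
Keeping zone and partition paths consistent across pipelines is easiest with a small shared helper. A sketch along the lines of the layout above (the domain segment and function name are illustrative additions, not a fixed platform convention):

from datetime import datetime, timezone

ZONES = ("raw", "bronze", "silver", "gold")

def partition_path(zone: str, domain: str, ts: datetime) -> str:
    """Build a lake-relative path such as 'bronze/orders/2024/01/15/14'."""
    if zone not in ZONES:
        raise ValueError(f"Unknown zone: {zone}")
    return f"{zone}/{domain}/{ts:%Y/%m/%d/%H}"

# Example: where the current hour's validated order events would land.
print(partition_path("bronze", "orders", datetime.now(timezone.utc)))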

Technology Stack

| Component | Technology | Purpose | Monthly Cost |
|---|---|---|---|
| Event Streaming | Azure Event Hub | Event ingestion and distribution | ~$150-200 |
| Data Capture | Event Hub Capture | Automatic event persistence | ~$3 |
| Object Storage | Azure Data Lake Gen2 | Scalable data storage | ~$11 |
| Orchestration | Azure Data Factory | Pipeline orchestration | ~$150 |
| Query Engine | Synapse Serverless | SQL analytics | ~$5 |
| Event Processing | Azure Functions | Real-time processing | ~$5 |
| Analytics | Power BI | Reporting and dashboards | Existing |
| Usage Analytics | Matomo Cloud | Frontend analytics | Existing |
| Error Tracking | Sentry | Application monitoring | Existing |

Event Schema Contract

All business events follow a standardized contract to ensure consistency:

{
  "eventId": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2024-01-15T14:30:00Z",
  "aggregateId": "order-123",
  "aggregateType": "Order",
  "eventType": "OrderCreated",
  "version": "1.0",
  "metadata": {
    "correlationId": "660e8400-e29b-41d4-a716-446655440000",
    "causationId": "770e8400-e29b-41d4-a716-446655440000",
    "userId": "user-456",
    "source": "OrderService",
    "partitionKey": "order-123"
  },
  "payload": {
    // Business-specific data
    "orderId": "order-123",
    "customerId": "customer-789",
    "totalAmount": 299.99,
    "currency": "USD",
    "items": [...]
  }
}
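
A producing service wraps its state change in this envelope and publishes it to its domain Event Hub. A hedged sketch using the azure-eventhub SDK, with placeholder connection string, hub name, and payload values; aggregateId doubles as the partition key so events for one aggregate stay ordered:

import json
import uuid
from datetime import datetime, timezone

from azure.eventhub import EventData, EventHubProducerClient

# Placeholder configuration -- supply the real namespace connection string and hub name.
CONNECTION_STR = "<event-hub-namespace-connection-string>"
EVENT_HUB_NAME = "domain-a-events"

def build_order_created(order_id: str, customer_id: str, total: float) -> dict:
    """Assemble an OrderCreated event matching the contract above."""
    return {
        "eventId": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "aggregateId": order_id,
        "aggregateType": "Order",
        "eventType": "OrderCreated",
        "version": "1.0",
        "metadata": {
            "correlationId": str(uuid.uuid4()),
            "causationId": str(uuid.uuid4()),
            "userId": "user-456",
            "source": "OrderService",
            "partitionKey": order_id,
        },
        "payload": {
            "orderId": order_id,
            "customerId": customer_id,
            "totalAmount": total,
            "currency": "USD",
            "items": [],
        },
    }

event = build_order_created("order-123", "customer-789", 299.99)

producer = EventHubProducerClient.from_connection_string(CONNECTION_STR, eventhub_name=EVENT_HUB_NAME)
with producer:
    # Partitioning by aggregateId keeps every event for one order on the same partition.
    batch = producer.create_batch(partition_key=event["metadata"]["partitionKey"])
    batch.add(EventData(json.dumps(event)))
    producer.send_batch(batch)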

Cost Optimization Strategy

Storage Tiering

  • Hot Tier: Last 30 days of events for active querying
  • Cool Tier: 30-90 days for occasional access
  • Archive Tier: >90 days for compliance/audit (a tiering sketch follows this list)
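
Tiering is normally applied through a storage account lifecycle management policy; as an illustration of the same thresholds in code, the sketch below demotes captured blobs by age with the azure-storage-blob SDK (account, container, and prefix are placeholders):

from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.storage.blob import ContainerClient

# Placeholder storage account and container; thresholds mirror the tiers listed above.
container = ContainerClient(
    account_url="https://<storage-account>.blob.core.windows.net",
    container_name="metrics",
    credential=DefaultAzureCredential(),
)

now = datetime.now(timezone.utc)
for blob in container.list_blobs(name_starts_with="raw/"):
    age = now - blob.last_modified
    if age > timedelta(days=90):
        tier = "Archive"   # compliance/audit data
    elif age > timedelta(days=30):
        tier = "Cool"      # occasional access
    else:
        continue           # still inside the 30-day hot window
    container.get_blob_client(blob.name).set_standard_blob_tier(tier)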

Compute Optimization

  • Serverless SQL: Pay only for queries executed (see the query sketch after this list)
  • Consumption Functions: Pay per execution
  • Scheduled Batches: Process during off-peak hours
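
With serverless SQL the cost driver is data scanned, so queries target narrow partitions of the lake. A rough sketch of such a pay-per-query call from Python over pyodbc (workspace, credentials, storage path, and column names are placeholders, and storage access still needs a credential or AAD pass-through configured in Synapse):

import pyodbc

# Placeholder connection details for a Synapse serverless SQL endpoint.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<workspace>-ondemand.sql.azuresynapse.net;"
    "Database=master;"
    "UID=<user>;PWD=<password>;Encrypt=yes;"
)

# Scanning a single day's partition keeps the $5/TB-scanned charge small.
QUERY = """
SELECT eventType, COUNT(*) AS event_count
FROM OPENROWSET(
    BULK 'https://<storage-account>.dfs.core.windows.net/metrics/gold/2024/01/15/*.parquet',
    FORMAT = 'PARQUET'
) AS events
GROUP BY eventType
"""

for event_type, event_count in conn.execute(QUERY).fetchall():
    print(event_type, event_count)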

Data Optimization

  • Partitioning: By date/hour to minimize scan costs
  • Compression: Parquet format with Snappy compression
  • Aggregation: Pre-compute common metrics (illustrated in the sketch after this list)
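
A minimal sketch of those three levers with pyarrow (table contents, paths, and partition columns are invented for the example; Snappy is pyarrow's default Parquet compression):

import pyarrow as pa
import pyarrow.parquet as pq

# Toy pre-aggregated metrics; real pipelines derive these from bronze/silver events.
table = pa.table({
    "year": [2024, 2024],
    "month": [1, 1],
    "day": [15, 15],
    "eventType": ["OrderCreated", "OrderCancelled"],
    "event_count": [1240, 37],
})

# Hive-style folders (year=2024/month=1/day=15) keep query scans narrow;
# files are written as Snappy-compressed Parquet by default.
pq.write_to_dataset(table, root_path="gold/order_metrics", partition_cols=["year", "month", "day"])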

Implementation Roadmap

Phase 1: Foundation (Weeks 1-2)

  • [ ] Set up Event Hub namespaces and capture
  • [ ] Configure Azure Data Lake Gen2
  • [ ] Implement basic event schema

Phase 2: Ingestion (Weeks 3-4)

  • [ ] Connect services to Event Hubs
  • [ ] Verify capture to Data Lake
  • [ ] Implement schema validation (see the sketch below)
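
A minimal sketch of that validation step with the jsonschema package; the schema fragment is derived from the Event Schema Contract section and is illustrative, not the platform's authoritative definition:

from jsonschema import ValidationError, validate

# Illustrative subset of the event contract; the real schema also constrains metadata/payload.
EVENT_SCHEMA = {
    "type": "object",
    "required": [
        "eventId", "timestamp", "aggregateId", "aggregateType",
        "eventType", "version", "metadata", "payload",
    ],
    "properties": {
        "eventId": {"type": "string"},
        "timestamp": {"type": "string", "format": "date-time"},
        "aggregateId": {"type": "string"},
        "aggregateType": {"type": "string"},
        "eventType": {"type": "string"},
        "version": {"type": "string"},
        "metadata": {"type": "object"},
        "payload": {"type": "object"},
    },
}

def is_valid_event(event: dict) -> bool:
    """Return True when an incoming event satisfies the contract subset above."""
    try:
        validate(instance=event, schema=EVENT_SCHEMA)
        return True
    except ValidationError:
        return False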

Phase 3: Processing (Weeks 5-6)

  • [ ] Create Synapse workspace
  • [ ] Develop SQL transformation queries
  • [ ] Set up Azure Data Factory pipelines

Phase 4: Analytics (Weeks 7-8)

  • [ ] Configure Power BI datasets
  • [ ] Integrate Matomo and Sentry APIs
  • [ ] Build initial dashboards

Key Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Storage | Azure Data Lake Gen2 | 10x cheaper than SQL Database, schema flexibility |
| Compute | Synapse Serverless | No idle costs, pay-per-query model |
| Ingestion | Event Hub Capture | Zero-code solution, guaranteed delivery |
| Format | Delta Lake | ACID transactions, time travel, Power BI optimization |
| Processing | Batch over Stream | Cost reduction, acceptable latency |
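
The Delta Lake decision can be exercised without a Spark cluster through the deltalake (delta-rs) package. A hedged sketch of an append plus a time-travel read; the local path and toy data are placeholders, and the platform's real tables would live in ADLS:

import pandas as pd
from deltalake import DeltaTable, write_deltalake

# Toy daily metrics; Delta appends are transactional, so readers never see partial writes.
metrics = pd.DataFrame({
    "metric_date": ["2024-01-15"],
    "eventType": ["OrderCreated"],
    "event_count": [1240],
})
write_deltalake("gold/order_metrics_delta", metrics, mode="append")

# Time travel: read the table as of an earlier version of its transaction log.
current = DeltaTable("gold/order_metrics_delta")
print("current version:", current.version())
print(DeltaTable("gold/order_metrics_delta", version=0).to_pandas())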

Success Metrics

  • Cost Efficiency: <$400/month total infrastructure cost
  • Data Freshness: <1 hour latency for batch processing
  • Scalability: Support 100K events/day without redesign
  • Reliability: 99.9% data capture success rate
  • Query Performance: <5 seconds for dashboard refresh

Next Steps

  1. Review Event Ingestion for detailed Event Hub configuration
  2. Explore Data Storage for Data Lake setup
  3. Understand Data Processing for transformation pipelines
  4. Configure Power BI Integration for reporting
  5. Optimize with Cost Optimization strategies
  6. Follow the Implementation Guide for deployment