Event-Driven Metrics Platform¶
Overview¶
The Event-Driven Metrics Platform provides a unified, cost-effective solution for collecting, processing, and analyzing business metrics across the AgentHub distributed application ecosystem. This platform combines business events from microservices, frontend usage analytics from Matomo, and error tracking from Sentry into a cohesive analytics layer accessible through Power BI.
Key Benefits
- Schema Independence: No direct database coupling, enabling schema evolution
- Cost Optimized: ~$350/month for the complete solution using serverless technologies
- Scalable: Handles 100x growth (1K to 100K events/day) without redesign
- Unified Analytics: Combines multiple data sources in a single reporting layer
Architecture Principles¶
1. Contract-Based Integration¶
All data exchange happens through well-defined contracts (schemas) rather than direct database access. This ensures:
- Services can evolve their internal schemas independently
- Clear boundaries between domains
- Versioning support for gradual migrations
2. Event-First Design¶
Business events are the primary source of truth for metrics:
- Events capture business state changes
- The immutable event log provides an audit trail
- Event replay enables historical analysis
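As a minimal sketch (assuming a Python-based service; the `BusinessEvent` class is illustrative, not part of the platform), a business event can be modelled as an immutable record whose fields mirror the event schema contract shown later on this page:

```python
import json
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class BusinessEvent:
    """Immutable business event; field names mirror the platform's event schema contract."""
    aggregate_id: str
    aggregate_type: str
    event_type: str
    payload: dict
    version: str = "1.0"
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    metadata: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # Serialize with the contract's camelCase keys for publishing.
        return json.dumps({
            "eventId": self.event_id,
            "timestamp": self.timestamp,
            "aggregateId": self.aggregate_id,
            "aggregateType": self.aggregate_type,
            "eventType": self.event_type,
            "version": self.version,
            "metadata": self.metadata,
            "payload": self.payload,
        })


# The state change itself is the record of truth; the event is never mutated after creation.
event = BusinessEvent(
    aggregate_id="order-123",
    aggregate_type="Order",
    event_type="OrderCreated",
    payload={"orderId": "order-123", "totalAmount": 299.99, "currency": "USD"},
)
```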
3. Cost-Conscious Technology Selection¶
Every technology choice prioritizes cost-effectiveness:
- Serverless compute (pay-per-use)
- Tiered storage (hot/cool/archive)
- Batch processing over real-time where appropriate
4. Separation of Concerns¶
Clear separation is maintained between:
- Ingestion: Event Hub with automatic capture
- Storage: Azure Data Lake Gen2 with Delta Lake
- Processing: Synapse Serverless SQL
- Serving: Power BI with Direct Lake mode
High-Level Architecture¶
```mermaid
graph TB
    subgraph "Event Sources"
        S1[Service 1<br/>Business Events]
        S2[Service 2<br/>Business Events]
        S3[Service N<br/>Business Events]
        FE[Angular Frontend<br/>User Actions]
        MA[Matomo Cloud<br/>Usage Analytics]
        SE[Sentry<br/>Error Tracking]
    end

    subgraph "Event Ingestion Layer"
        EH1[Event Hub 1<br/>Domain A]
        EH2[Event Hub 2<br/>Domain B]
        EHN[Event Hub N<br/>Domain N]
        EHC[Event Hub Capture<br/>Automatic Persistence]
    end

    subgraph "Storage Layer"
        ADLS[Azure Data Lake Gen2<br/>Raw Events<br/>~$23/TB/month]
        COLD[Cool/Archive Tier<br/>Historical Data<br/>~$10/TB/month]
    end

    subgraph "Processing Layer"
        ADF[Azure Data Factory<br/>Orchestration<br/>~$150/month]
        SYN[Synapse Serverless<br/>SQL Queries<br/>$5/TB scanned]
        FUNC[Azure Functions<br/>Event Processing<br/>Consumption Plan]
    end

    subgraph "Serving Layer"
        DL[Delta Lake Tables<br/>ACID Transactions]
        VIEWS[Materialized Views<br/>Pre-aggregated Metrics]
    end

    subgraph "Analytics Layer"
        PBI[Power BI<br/>Reports & Dashboards]
        API1[Matomo API<br/>Direct Integration]
        API2[Sentry API<br/>Direct Integration]
    end

    S1 --> EH1
    S2 --> EH2
    S3 --> EHN
    EH1 --> EHC
    EH2 --> EHC
    EHN --> EHC
    EHC --> ADLS
    ADLS --> ADF
    ADF --> SYN
    SYN --> DL
    DL --> VIEWS
    ADLS --> COLD
    FUNC -.-> ADLS
    VIEWS --> PBI
    MA --> API1
    SE --> API2
    API1 --> PBI
    API2 --> PBI
    FE --> MA

    style ADLS fill:#90EE90
    style ADF fill:#87CEEB
    style PBI fill:#FFD700
    style SYN fill:#DDA0DD
```
Data Flow Patterns¶
Event Flow¶
```mermaid
sequenceDiagram
    participant Service
    participant EventHub
    participant Capture
    participant ADLS
    participant ADF
    participant Synapse
    participant PowerBI

    Service->>EventHub: Publish Business Event
    EventHub->>Capture: Auto-capture (5min intervals)
    Capture->>ADLS: Write Avro/Parquet files

    Note over ADF: Daily/Hourly Schedule
    ADF->>ADLS: Read raw events
    ADF->>Synapse: Transform via SQL
    Synapse->>ADLS: Write aggregated metrics

    PowerBI->>Synapse: Query metrics
    PowerBI->>ADLS: Direct Lake access
```
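The publish step at the top of this sequence could look like the sketch below, assuming a Python service using the `azure-eventhub` SDK; the connection string and hub name are placeholders, and `publish_business_event` is an illustrative helper rather than an existing API:

```python
import json
from azure.eventhub import EventData, EventHubProducerClient

# Placeholder connection details -- supplied via configuration in a real service.
CONNECTION_STR = "<event-hub-namespace-connection-string>"
EVENT_HUB_NAME = "<domain-event-hub>"


def publish_business_event(event: dict) -> None:
    """Publish one business event; Event Hub Capture then persists it to the Data Lake."""
    producer = EventHubProducerClient.from_connection_string(
        conn_str=CONNECTION_STR, eventhub_name=EVENT_HUB_NAME
    )
    with producer:
        # Partition by aggregate id so events for one aggregate stay ordered.
        batch = producer.create_batch(partition_key=event["metadata"]["partitionKey"])
        batch.add(EventData(json.dumps(event)))
        producer.send_batch(batch)
```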
Data Lake Organization¶
```mermaid
graph LR
    subgraph "Data Lake Structure"
        RAW[/raw/<br/>Landing Zone]
        BRONZE[/bronze/<br/>Validated Events]
        SILVER[/silver/<br/>Cleansed & Enriched]
        GOLD[/gold/<br/>Business Metrics]
    end

    subgraph "File Organization"
        YEAR[/2024/]
        MONTH[/01/]
        DAY[/15/]
        HOUR[/14/]
        FILE[events.parquet]
    end

    RAW --> BRONZE
    BRONZE --> SILVER
    SILVER --> GOLD

    RAW --> YEAR
    YEAR --> MONTH
    MONTH --> DAY
    DAY --> HOUR
    HOUR --> FILE
```
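A small sketch of how the year/month/day/hour folders above can be derived from an event timestamp (the `lake_path` helper is illustrative; the zone names match the diagram):

```python
from datetime import datetime


def lake_path(zone: str, timestamp: str) -> str:
    """Build the partitioned path for an event batch, e.g. bronze/2024/01/15/14/events.parquet."""
    ts = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    return f"{zone}/{ts:%Y}/{ts:%m}/{ts:%d}/{ts:%H}/events.parquet"


print(lake_path("bronze", "2024-01-15T14:30:00Z"))
# -> bronze/2024/01/15/14/events.parquet
```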
Technology Stack¶
| Component | Technology | Purpose | Monthly Cost |
|---|---|---|---|
| Event Streaming | Azure Event Hub | Event ingestion and distribution | ~$150-200 |
| Data Capture | Event Hub Capture | Automatic event persistence | ~$3 |
| Object Storage | Azure Data Lake Gen2 | Scalable data storage | ~$11 |
| Orchestration | Azure Data Factory | Pipeline orchestration | ~$150 |
| Query Engine | Synapse Serverless | SQL analytics | ~$5 |
| Event Processing | Azure Functions | Real-time processing | ~$5 |
| Analytics | Power BI | Reporting and dashboards | Existing |
| Usage Analytics | Matomo Cloud | Frontend analytics | Existing |
| Error Tracking | Sentry | Application monitoring | Existing |
Event Schema Contract¶
All business events follow a standardized contract to ensure consistency:
```json
{
  "eventId": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2024-01-15T14:30:00Z",
  "aggregateId": "order-123",
  "aggregateType": "Order",
  "eventType": "OrderCreated",
  "version": "1.0",
  "metadata": {
    "correlationId": "660e8400-e29b-41d4-a716-446655440000",
    "causationId": "770e8400-e29b-41d4-a716-446655440000",
    "userId": "user-456",
    "source": "OrderService",
    "partitionKey": "order-123"
  },
  "payload": {
    // Business-specific data
    "orderId": "order-123",
    "customerId": "customer-789",
    "totalAmount": 299.99,
    "currency": "USD",
    "items": [...]
  }
}
```
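A lightweight validation sketch for this contract, assuming events arrive as Python dictionaries; the exact set of mandatory metadata fields is an assumption, and a JSON Schema definition could serve the same purpose:

```python
REQUIRED_FIELDS = {"eventId", "timestamp", "aggregateId", "aggregateType",
                   "eventType", "version", "metadata", "payload"}
REQUIRED_METADATA = {"correlationId", "userId", "source", "partitionKey"}


def validate_event(event: dict) -> list:
    """Return a list of contract violations; an empty list means the event conforms."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS - event.keys()]
    if isinstance(event.get("metadata"), dict):
        errors += [f"missing metadata field: {name}"
                   for name in REQUIRED_METADATA - event["metadata"].keys()]
    # Accepting only 1.x versions is an assumption for this sketch.
    if not str(event.get("version", "")).startswith("1."):
        errors.append(f"unsupported schema version: {event.get('version')}")
    return errors
```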
Cost Optimization Strategy¶
Storage Tiering¶
- Hot Tier: Last 30 days of events for active querying
- Cool Tier: 30-90 days for occasional access
- Archive Tier: >90 days for compliance/audit (see the tiering sketch after this list)
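Tier transitions like these are normally configured once as a storage account lifecycle management policy rather than in application code; purely as an illustration of the same rule, a sketch using the `azure-storage-blob` SDK (the connection string and container name are placeholders):

```python
from datetime import datetime, timedelta, timezone
from azure.storage.blob import BlobServiceClient

# Placeholder storage account details; the raw container holds captured events.
service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = service.get_container_client("raw")

now = datetime.now(timezone.utc)
for blob in container.list_blobs():
    age = now - blob.last_modified
    blob_client = container.get_blob_client(blob.name)
    if age > timedelta(days=90):
        blob_client.set_standard_blob_tier("Archive")  # compliance/audit only
    elif age > timedelta(days=30):
        blob_client.set_standard_blob_tier("Cool")     # occasional access
```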
Compute Optimization¶
- Serverless SQL: Pay only for queries executed (see the query sketch after this list)
- Consumption Functions: Pay per execution
- Scheduled Batches: Process during off-peak hours
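For example, the pay-per-query model above means cost is driven by bytes scanned, so queries should target only the partition folders they need. A sketch of running such a query against Synapse serverless from Python with `pyodbc` (the workspace endpoint, database, storage path, and column list are placeholders, not the platform's actual names):

```python
import pyodbc

# Placeholder Synapse serverless (on-demand) endpoint and database.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<workspace>-ondemand.sql.azuresynapse.net;"
    "Database=metrics;"
    "Authentication=ActiveDirectoryInteractive;"
)

# filepath(1)/filepath(2) map to the year/month wildcards in the BULK path,
# so only the requested month's folders are scanned -- and billed.
query = """
SELECT eventType, COUNT(*) AS event_count
FROM OPENROWSET(
    BULK 'https://<storage-account>.dfs.core.windows.net/<filesystem>/gold/*/*/*/*/*.parquet',
    FORMAT = 'PARQUET'
) WITH (eventType VARCHAR(100)) AS events
WHERE events.filepath(1) = '2024' AND events.filepath(2) = '01'
GROUP BY eventType;
"""

for event_type, event_count in conn.execute(query):
    print(event_type, event_count)
```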
Data Optimization¶
- Partitioning: By date/hour to minimize scan costs
- Compression: Parquet format with Snappy compression (see the write sketch after this list)
- Aggregation: Pre-compute common metrics
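A sketch of the write side under these assumptions (the column names and the tiny in-memory batch are illustrative; `pyarrow` produces the Snappy-compressed Parquet files and the date/hour folder layout):

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Hypothetical micro-batch of validated events (normally read from the raw zone).
events = pa.table({
    "eventType": ["OrderCreated", "OrderCreated"],
    "aggregateId": ["order-123", "order-124"],
    "totalAmount": [299.99, 120.00],
    "year": ["2024", "2024"],
    "month": ["01", "01"],
    "day": ["15", "15"],
    "hour": ["14", "14"],
})

# Snappy-compressed Parquet, partitioned into year/month/day/hour folders so
# downstream Synapse queries can prune partitions and scan less data.
pq.write_to_dataset(
    events,
    root_path="bronze",
    partition_cols=["year", "month", "day", "hour"],
    compression="snappy",
)
```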
Implementation Roadmap¶
Phase 1: Foundation (Weeks 1-2)¶
- [ ] Set up Event Hub namespaces and capture
- [ ] Configure Azure Data Lake Gen2
- [ ] Implement basic event schema
Phase 2: Ingestion (Weeks 3-4)¶
- [ ] Connect services to Event Hubs
- [ ] Verify capture to Data Lake
- [ ] Implement schema validation
Phase 3: Processing (Weeks 5-6)¶
- [ ] Create Synapse workspace
- [ ] Develop SQL transformation queries
- [ ] Set up Azure Data Factory pipelines
Phase 4: Analytics (Weeks 7-8)¶
- [ ] Configure Power BI datasets
- [ ] Integrate Matomo and Sentry APIs
- [ ] Build initial dashboards
Key Decisions¶
| Decision | Choice | Rationale |
|---|---|---|
| Storage | Azure Data Lake Gen2 | 10x cheaper than SQL Database, schema flexibility |
| Compute | Synapse Serverless | No idle costs, pay-per-query model |
| Ingestion | Event Hub Capture | Zero-code solution, guaranteed delivery |
| Format | Delta Lake | ACID transactions, time travel, Power BI optimization |
| Processing | Batch over Stream | Cost reduction, acceptable latency |
Success Metrics¶
- Cost Efficiency: <$400/month total infrastructure cost
- Data Freshness: <1 hour latency for batch processing
- Scalability: Support 100K events/day without redesign
- Reliability: 99.9% data capture success rate
- Query Performance: <5 seconds for dashboard refresh
Next Steps¶
- Review Event Ingestion for detailed Event Hub configuration
- Explore Data Storage for Data Lake setup
- Understand Data Processing for transformation pipelines
- Configure Power BI Integration for reporting
- Optimize with Cost Optimization strategies
- Follow the Implementation Guide for deployment