Annotations Service Metrics

This document describes the telemetry metrics emitted by the UpdateAnnotationsInRedisService hosted service. They show how often the synchronization of annotation data into the Redis cache succeeds, fails, or is skipped, how long each run takes, and when the last successful run finished.

The metrics below are published by the meter named CRMFacade.Annotations.

Service Goal

This service periodically fetches all annotation data from Dataverse and caches it in Redis, so that the annotations of related entities can be retrieved quickly.


Metrics

| Metric Name | Type | Description |
|---|---|---|
| annotations_sync_total | Counter | Number of sync runs that either succeeded or were skipped. Failed runs are not counted here; they increment annotations_sync_errors_total instead. |
| annotations_sync_errors_total | Counter | Number of failed sync runs. |
| annotations_sync_duration_seconds | Histogram | Duration, in seconds, of each sync run that acquired the lock. Skipped runs are not measured. |
| annotations_last_sync_timestamp | Observable Gauge | Unix epoch timestamp, in seconds, of the last successful sync. |

Dimensions

These dimensions (tags) can be used to filter and group metric data.

| Metric Name | Dimension Name | Possible Values | Description |
|---|---|---|---|
| annotations_sync_total | operation | success, skipped | The outcome of the sync operation. |
| annotations_sync_total | reason | lock_not_acquired | Why the operation was skipped. Present only when operation=skipped. |
| annotations_sync_errors_total | operation | error | Marks a failed operation. |
| annotations_sync_errors_total | error_type | Exception type name (e.g., RedisException) | The type of exception that caused the failure. |
| annotations_sync_duration_seconds | operation | success, error | The outcome of the operation whose duration was measured. |
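
To check exported telemetry against these dimension rules (for example in an integration test), the rules can be encoded as data. The following Python sketch is illustrative only: ALLOWED and tag_problems are hypothetical names, and each data point's tags are assumed to arrive as a plain dict of strings.

```python
from typing import Dict, List, Optional, Set

# Documented dimensions per metric: dimension name -> allowed values.
# None means "any value" (error_type carries an exception type name).
ALLOWED: Dict[str, Dict[str, Optional[Set[str]]]] = {
    "annotations_sync_total": {
        "operation": {"success", "skipped"},
        "reason": {"lock_not_acquired"},
    },
    "annotations_sync_errors_total": {"operation": {"error"}, "error_type": None},
    "annotations_sync_duration_seconds": {"operation": {"success", "error"}},
}


def tag_problems(metric: str, tags: Dict[str, str]) -> List[str]:
    """Return every way `tags` deviates from the documented dimensions."""
    if metric not in ALLOWED:
        return [f"unknown metric {metric!r}"]
    allowed = ALLOWED[metric]
    problems: List[str] = []
    for name, value in tags.items():
        if name not in allowed:
            problems.append(f"unexpected dimension {name!r}")
        elif allowed[name] is not None and value not in allowed[name]:
            problems.append(f"{name}={value!r} is not a documented value")
    if "operation" not in tags:
        problems.append("missing dimension 'operation'")
    # 'reason' is documented only for skipped runs.
    if "reason" in tags and tags.get("operation") != "skipped":
        problems.append("'reason' is only documented for operation=skipped")
    return problems
```

An empty list means the data point matches the tables; for example, a duration tagged operation=skipped is reported because skipped runs are never timed.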

Example KQL Queries

Here are some example queries you can use in Azure Application Insights to monitor the service.

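Hourly failure rate over the last 7 days. Because annotations_sync_total is not incremented for failed runs, the number of runs in each hour is the sum of both counters. Aggregating both metrics in one pass also keeps hours that contain only failures.

```kql
customMetrics
| where timestamp > ago(7d)
| where name in ('annotations_sync_total', 'annotations_sync_errors_total')
| summarize
    success_or_skipped = sumif(value, name == 'annotations_sync_total'),
    failure_count = sumif(value, name == 'annotations_sync_errors_total')
    by bin(timestamp, 1h)
// Failed runs are counted only in annotations_sync_errors_total, so add them back in.
| extend total_count = success_or_skipped + failure_count
| extend failure_rate = iff(total_count > 0, todouble(failure_count) / todouble(total_count) * 100, 0.0)
| project timestamp, total_count, failure_count, failure_rate
| render timechart
```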
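
The failure-rate arithmetic can be checked offline with a small Python sketch. It assumes the per-bin sums of both counters have already been exported as dicts keyed by bin start; hourly_failure_rates is a hypothetical helper, not part of the service. Failed runs are added to the denominator because annotations_sync_total does not count them, and bins that appear in only one input are kept.

```python
from typing import Dict


def hourly_failure_rates(
    sync_totals: Dict[str, float], sync_errors: Dict[str, float]
) -> Dict[str, float]:
    """Return the failure rate (%) per time bin.

    sync_totals: per-bin sums of annotations_sync_total (successful + skipped runs).
    sync_errors: per-bin sums of annotations_sync_errors_total (failed runs).
    A bin missing from either input counts as zero for that counter.
    """
    rates: Dict[str, float] = {}
    for bin_start in sorted(set(sync_totals) | set(sync_errors)):
        failures = sync_errors.get(bin_start, 0.0)
        # Failed runs are not in annotations_sync_total, so add them back in.
        runs = sync_totals.get(bin_start, 0.0) + failures
        rates[bin_start] = failures / runs * 100 if runs else 0.0
    return rates
```

For example, hourly_failure_rates({"10:00": 3, "11:00": 4}, {"10:00": 1, "12:00": 2}) returns {"10:00": 25.0, "11:00": 0.0, "12:00": 100.0}. Dividing by annotations_sync_total alone would report about 33.3% for the 10:00 bin, and a left outer join anchored on annotations_sync_total would drop the 12:00 bin, which contains only failures.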

Stale-data alert: the following query returns a row only when the last successful sync is more than 2 hours (7200 seconds) old, so an alert rule can fire whenever it returns any results. Change 7200 to use a different threshold.

```kql
customMetrics
| where name == "annotations_last_sync_timestamp"
| summarize LastSync = max(value)
| extend AgeInSeconds = datetime_diff('second', now(), unixtime_seconds_todatetime(LastSync))
| where AgeInSeconds > 7200 // 2 hours
| project readable_time_utc = unixtime_seconds_todatetime(LastSync), AgeInSeconds
```
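
The same staleness rule can be expressed as a Python sketch that takes the gauge value in Unix epoch seconds (is_stale and STALE_AFTER_SECONDS are hypothetical names). Like the query, it uses a strict comparison, so an age of exactly 7200 seconds is not yet stale. One deliberate difference, an assumption of this sketch: a missing value is treated as stale.

```python
import time
from typing import Optional

STALE_AFTER_SECONDS = 7200  # 2 hours, the threshold used in the query


def is_stale(
    last_sync_epoch_s: Optional[float],
    now_epoch_s: Optional[float] = None,
    threshold_s: float = STALE_AFTER_SECONDS,
) -> bool:
    """Return True if the last successful sync is older than threshold_s.

    last_sync_epoch_s is the annotations_last_sync_timestamp value in Unix epoch
    seconds, or None if no value was reported. now_epoch_s defaults to time.time().
    """
    if last_sync_epoch_s is None:
        # Assumption of this sketch: no data at all counts as stale.
        return True
    now = time.time() if now_epoch_s is None else now_epoch_s
    # Strict comparison, matching `AgeInSeconds > 7200` in the query.
    return now - last_sync_epoch_s > threshold_s
```

The difference matters because the query returns no rows when the table holds no annotations_last_sync_timestamp data at all, so an alert built on it alone stays silent in that case.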

Metric Flow Diagram

The following diagram illustrates the flow of the UpdateAnnotationsInRedisService job and when each metric is recorded.

```mermaid
graph TD
    A[Start Job] --> B{Try Acquire Lock};
    B -- Lock Acquired --> C[Call CacheAnnotations];
    B -- Lock Not Acquired --> D["Record: annotations_sync_total<br>(operation=skipped, reason=lock_not_acquired)"];

    C -- Success --> E["Record: annotations_sync_total (operation=success)<br>Record: annotations_sync_duration_seconds (operation=success)<br>Update: annotations_last_sync_timestamp"];
    C -- Failure --> F["Record: annotations_sync_errors_total (operation=error, error_type)<br>Record: annotations_sync_duration_seconds (operation=error)"];
```
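
This flow can be mirrored in a short Python simulation, which helps when reasoning about which series a given run produces. Everything here is hypothetical: RecordedMetrics stands in for the CRMFacade.Annotations meter, and try_acquire_lock and cache_annotations stand in for the service's lock and its CacheAnnotations call. Lock release does not appear in the diagram and is omitted, and the sketch swallows the exception after recording it, which the real service may or may not do.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Tags = Dict[str, str]


@dataclass
class RecordedMetrics:
    """Hypothetical in-memory stand-in for the CRMFacade.Annotations meter."""

    counters: List[Tuple[str, Tags]] = field(default_factory=list)
    durations: List[Tuple[float, Tags]] = field(default_factory=list)
    last_sync_timestamp: Optional[float] = None


def run_sync_job(
    try_acquire_lock: Callable[[], bool],
    cache_annotations: Callable[[], None],
    metrics: RecordedMetrics,
    clock: Callable[[], float] = time.time,
) -> None:
    """Run one job iteration, recording metrics as in the flow diagram."""
    if not try_acquire_lock():
        # Path D: skipped; no duration is recorded.
        metrics.counters.append(
            ("annotations_sync_total", {"operation": "skipped", "reason": "lock_not_acquired"})
        )
        return

    start = time.perf_counter()
    try:
        cache_annotations()
    except Exception as exc:
        # Path F: count the error (tagged with the exception type) and its duration.
        # The sketch stops here; the real service may also log or rethrow.
        metrics.counters.append(
            ("annotations_sync_errors_total",
             {"operation": "error", "error_type": type(exc).__name__})
        )
        metrics.durations.append((time.perf_counter() - start, {"operation": "error"}))
        return

    # Path E: count the success, record its duration, and update the gauge's value.
    metrics.counters.append(("annotations_sync_total", {"operation": "success"}))
    metrics.durations.append((time.perf_counter() - start, {"operation": "success"}))
    metrics.last_sync_timestamp = clock()
```

Passing a fixed clock (for example clock=lambda: 1700000000.0) makes the recorded last_sync_timestamp deterministic in tests.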