Incidents

When a monitor detects a problem, Beacon creates an incident. Incidents track the full timeline of an outage from detection to resolution.

Incident Lifecycle

Check Fails (Region 1)
        ↓
Check Fails (Region 2) ── Incident Created ── Alerts Sent
        ↓
  Investigating
        ↓
    Identified
        ↓
Check Passes ── Resolved ── Resolution Alert

Incident States

State          Description
investigating  Initial state, problem detected
identified     Root cause found
monitoring     Fix deployed, watching
resolved       Service restored
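
The states above read as an ordered progression. As a minimal sketch (not part of the Beacon API), the helper below checks whether a status change moves forward through that order. The `STATES` constant and `valid_transition?` are hypothetical names, and the forward-only rule is an assumption; Beacon may allow other transitions.

```ruby
# Hypothetical sketch: these names are illustrative, not Beacon's API.
# The order matches the state table: each state is "later" than the one before it.
STATES = %i[investigating identified monitoring resolved].freeze

# Returns true when `to` comes after `from` in STATES.
# Assumption: states only move forward, and skipping ahead
# (for example identified -> resolved) is allowed.
def valid_transition?(from, to)
  from_index = STATES.index(from)
  to_index = STATES.index(to)
  return false if from_index.nil? || to_index.nil?

  to_index > from_index
end

puts valid_transition?(:investigating, :identified) # => true
puts valid_transition?(:resolved, :investigating)   # => false
```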

Automatic Detection

Beacon automatically creates incidents when:
  1. Multiple regions fail - Avoids false positives from local network issues
  2. Consecutive failures - Requires 2-3 failed checks before alerting
  3. Threshold exceeded - Response time exceeds configured limits
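
To make the three rules concrete, here is a rough sketch of how they could combine. It assumes a check counts as failed when it errors or exceeds a response-time limit, and that an incident opens once enough regions each show a run of consecutive failures. `CheckResult`, `open_incident?`, `failed?`, and all default values are hypothetical; Beacon's actual detection logic is not documented here.

```ruby
# Hypothetical sketch: names and defaults are assumptions, not Beacon's API.
CheckResult = Struct.new(:region, :ok, :response_ms)

# A check fails when it errored or was slower than the limit.
# response_ms may be nil (for example on a timeout), so coerce it with to_i.
def failed?(check, max_response_ms)
  !check.ok || check.response_ms.to_i > max_response_ms
end

# history: hash of region name => array of CheckResult, oldest first.
# Returns true when at least `min_regions` regions have failed their
# last `consecutive` checks in a row.
def open_incident?(history, min_regions: 2, consecutive: 2, max_response_ms: 5_000)
  failing_regions = history.count do |_region, checks|
    recent = checks.last(consecutive)
    recent.size == consecutive && recent.all? { |c| failed?(c, max_response_ms) }
  end
  failing_regions >= min_regions
end

history = {
  "nyc"    => [CheckResult.new("nyc", false, nil), CheckResult.new("nyc", false, nil)],
  "london" => [CheckResult.new("london", true, 120), CheckResult.new("london", false, nil)]
}
puts open_incident?(history) # => false (only one region has two consecutive failures)
```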

Incident Updates

Add updates to communicate status to your team and status page subscribers:
incident = BrainzLab::Beacon.incident("inc_abc123")

incident.add_update(
  status: :identified,
  message: "Database connection pool exhausted. Scaling up."
)

Auto-Resolution

Incidents are automatically resolved when:
  • All regions report successful checks
  • Consecutive successful checks meet threshold (default: 3)
# Configure auto-resolution
BrainzLab::Beacon.configure_monitor("production-api",
  auto_resolve_after: 3, # consecutive successes
  recovery_notification: true
)
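
The resolution rule can be pictured as follows. This is a minimal sketch that assumes both conditions must hold together, so every region needs `auto_resolve_after` passing checks in a row. The `auto_resolve?` helper and its input shape are hypothetical and only illustrate the rule.

```ruby
# Hypothetical sketch: not Beacon's implementation.
# results_by_region: hash of region name => array of booleans
# (true = check passed), oldest first.
def auto_resolve?(results_by_region, auto_resolve_after: 3)
  return false if results_by_region.empty?

  results_by_region.values.all? do |results|
    recent = results.last(auto_resolve_after)
    # Require a full run of passing checks, not just a short history.
    recent.size == auto_resolve_after && recent.all?
  end
end

puts auto_resolve?({ "nyc" => [false, true, true, true], "london" => [true, true, true] })
# => true
puts auto_resolve?({ "nyc" => [true, true, true], "london" => [true, false, true] })
# => false
```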

Incident Timeline

Every incident maintains a full timeline:
incident = BrainzLab::Beacon.incident("inc_abc123")

incident.timeline.each do |event|
  puts "#{event.timestamp}: #{event.type} - #{event.message}"
end
# 2024-01-15 10:30:00: check_failed - NYC region timeout
# 2024-01-15 10:30:05: check_failed - London region 503
# 2024-01-15 10:30:10: incident_created - Incident opened
# 2024-01-15 10:35:00: status_update - Identified: DB issue
# 2024-01-15 10:45:00: check_passed - All regions healthy
# 2024-01-15 10:45:30: incident_resolved - Auto-resolved
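
A timeline like this can also be used to derive durations, such as time to detect or time to resolve. The sketch below uses a stand-in `TimelineEvent` struct with the same three fields shown above; the struct, the `seconds_between` helper, and the use of symbols for event types are assumptions rather than Beacon's actual objects.

```ruby
# Hypothetical stand-in for the events yielded by incident.timeline.
TimelineEvent = Struct.new(:timestamp, :type, :message)

# Seconds between the first event of `from_type` and the first event
# of `to_type`, or nil when either event is missing.
def seconds_between(events, from_type, to_type)
  from = events.find { |e| e.type == from_type }
  to = events.find { |e| e.type == to_type }
  return nil if from.nil? || to.nil?

  (to.timestamp - from.timestamp).to_i
end

events = [
  TimelineEvent.new(Time.utc(2024, 1, 15, 10, 30, 0),  :check_failed,      "NYC region timeout"),
  TimelineEvent.new(Time.utc(2024, 1, 15, 10, 30, 10), :incident_created,  "Incident opened"),
  TimelineEvent.new(Time.utc(2024, 1, 15, 10, 45, 30), :incident_resolved, "Auto-resolved")
]

puts seconds_between(events, :check_failed, :incident_created)      # => 10
puts seconds_between(events, :incident_created, :incident_resolved) # => 920
```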

Notification Channels

Incidents trigger alerts through the Signal integration:
  • Slack - Channel notifications with action buttons
  • PagerDuty - On-call escalation
  • Email - Summary emails
  • Webhooks - Custom integrations

Configure Alerts

Set up notification channels in Signal

Downtime Metrics

Track reliability over time:
stats = BrainzLab::Beacon.uptime_stats("production-api", period: :month)

puts stats.uptime_percentage # => 99.95
puts stats.total_incidents   # => 2
puts stats.total_downtime    # => 21 (minutes)
puts stats.mttr              # => 10.5 (mean time to resolve)
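
For intuition, the example figures above are consistent with simple arithmetic over a 30-day month. The sketch below is an assumption about how such numbers could be derived, not a description of how Beacon computes them; the helper names and the two incident durations (15 and 6 minutes) are hypothetical.

```ruby
# Hypothetical sketch: illustrates the arithmetic only.

# Share of the period spent up, as a percentage rounded to two decimals.
def uptime_percentage(downtime_minutes, period_minutes)
  ((1 - downtime_minutes.to_f / period_minutes) * 100).round(2)
end

# Mean time to resolve: average incident duration in minutes.
def mean_time_to_resolve(durations_minutes)
  return 0.0 if durations_minutes.empty?

  (durations_minutes.sum.to_f / durations_minutes.size).round(1)
end

durations = [15, 6]            # two made-up incidents totalling 21 minutes
period_minutes = 30 * 24 * 60  # 43,200 minutes in a 30-day month

puts uptime_percentage(durations.sum, period_minutes) # => 99.95
puts mean_time_to_resolve(durations)                  # => 10.5
```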