
📖 Understanding Incidents

When something goes wrong with a monitored service, Monitron creates an incident. This page explains how incidents work and how to manage them.


🔄 Incident Lifecycle

Monitor Check Fails
         │
         ▼
Fail Threshold Met? ──── No ──→ Wait for next check
         │
        Yes
         ▼
┌─────────────────┐
│  INVESTIGATING  │ ← Incident created, notifications sent
└────────┬────────┘
         ▼
┌─────────────────┐
│   IDENTIFIED    │ ← Team acknowledges and identifies cause
└────────┬────────┘
         ▼
┌─────────────────┐
│   MONITORING    │ ← Fix applied, watching for stability
└────────┬────────┘
         ▼
┌─────────────────┐
│    RESOLVED     │ ← Confirmed fixed, incident closed
└─────────────────┘
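The lifecycle can be read as a small state machine. The sketch below is only an illustration, not Monitron's implementation: `IncidentStatus`, `ALLOWED_TRANSITIONS`, and `advance` are hypothetical names, and it assumes statuses only move forward, either one step along the diagram or straight to Resolved (as happens when a monitor recovers on its own).

```python
from enum import Enum


class IncidentStatus(Enum):
    """Hypothetical model of the four statuses in the lifecycle diagram."""
    INVESTIGATING = "investigating"
    IDENTIFIED = "identified"
    MONITORING = "monitoring"
    RESOLVED = "resolved"


# Assumption: an incident moves to the next step in the diagram, or jumps
# straight to RESOLVED (e.g. automatic resolution when a monitor recovers).
ALLOWED_TRANSITIONS = {
    IncidentStatus.INVESTIGATING: {IncidentStatus.IDENTIFIED, IncidentStatus.RESOLVED},
    IncidentStatus.IDENTIFIED: {IncidentStatus.MONITORING, IncidentStatus.RESOLVED},
    IncidentStatus.MONITORING: {IncidentStatus.RESOLVED},
    IncidentStatus.RESOLVED: set(),  # terminal: the incident is closed
}


def advance(current: IncidentStatus, target: IncidentStatus) -> IncidentStatus:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


status = IncidentStatus.INVESTIGATING
status = advance(status, IncidentStatus.IDENTIFIED)
status = advance(status, IncidentStatus.MONITORING)
status = advance(status, IncidentStatus.RESOLVED)
print(status.value)  # resolved
```

Keeping the allowed moves in a single table makes it easy to reject mistakes such as reopening a resolved incident.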

📊 Incident Statuses

Status | Meaning | Color
🔴 Investigating | Just detected, team is looking into it | Red
🟡 Identified | Root cause found, working on a fix | Yellow
🔵 Monitoring | Fix applied, monitoring for stability | Blue
🟢 Resolved | Confirmed fixed, all clear | Green

⚡ Severity Levels

Severity | When to Use | Color
ℹ️ Info | Minor issues, degraded performance | Blue
⚠️ Warning | Partial outage, potential impact | Yellow
🔴 Critical | Full service outage | Red
🔥 Emergency | Multiple services affected, major impact | Red (flashing)
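To apply the severity table consistently, a team could encode its reading of it in a helper. This is a hypothetical sketch: `Severity`, `choose_severity`, and the `"degraded"`/`"partial"`/`"full"` impact categories are illustrative names, not part of Monitron, and it assumes "major impact" means a full outage affecting more than one service.

```python
from enum import Enum


class Severity(Enum):
    """The four severity levels from the table above."""
    INFO = "info"
    WARNING = "warning"
    CRITICAL = "critical"
    EMERGENCY = "emergency"


# Hypothetical impact categories used only for this sketch.
IMPACTS = {"degraded", "partial", "full"}


def choose_severity(services_affected: int, impact: str) -> Severity:
    """Map an outage description onto a severity level.

    Assumption: "major impact" in the table is read as a full outage
    affecting more than one service.
    """
    if impact not in IMPACTS:
        raise ValueError(f"unknown impact: {impact!r}")
    if services_affected < 1:
        raise ValueError("at least one service must be affected")
    if impact == "full":
        # One full outage is Critical; several at once is an Emergency.
        return Severity.EMERGENCY if services_affected > 1 else Severity.CRITICAL
    if impact == "partial":
        return Severity.WARNING
    return Severity.INFO  # degraded performance only


print(choose_severity(1, "full").value)  # critical
print(choose_severity(3, "full").value)  # emergency
```

Keeping the rule in one place makes it harder to escalate to Emergency by accident, which supports the advice on alert fatigue in the Tips section.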

🤖 Automatic Incidents

Monitron automatically opens and closes incidents in these cases:

  1. A monitor goes down: once the fail threshold (configurable) is met, Monitron creates an incident with:

    • Title: "{Monitor Name} is down"
    • Severity: Critical
    • Status: Investigating
    • The error message from the failed check
  2. A heartbeat is missed: if no ping is received within the heartbeat interval plus its grace period, Monitron creates an incident.

  3. A monitor recovers: Monitron automatically resolves the open incident.
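The automatic rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not Monitron's code: `Incident`, `MonitorState`, `record_check`, and `heartbeat_missed` are hypothetical names, the fail threshold is assumed to count consecutive failed checks, and a heartbeat is treated as missed only when the gap since the last ping is strictly longer than interval + grace period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    title: str
    severity: str
    status: str
    message: str


@dataclass
class MonitorState:
    """Hypothetical per-monitor state; not Monitron's actual data model."""
    name: str
    fail_threshold: int  # configurable in Monitron; passed in explicitly here
    consecutive_failures: int = 0
    open_incident: Optional[Incident] = None

    def record_check(self, ok: bool, error: str = "") -> Optional[Incident]:
        """Feed one check result and return the currently open incident, if any."""
        if ok:
            # Recovery: reset the counter and auto-resolve any open incident.
            self.consecutive_failures = 0
            if self.open_incident is not None:
                self.open_incident.status = "resolved"
                self.open_incident = None
            return None
        self.consecutive_failures += 1
        # Open a single incident once the threshold is reached.
        if self.open_incident is None and self.consecutive_failures >= self.fail_threshold:
            self.open_incident = Incident(
                title=f"{self.name} is down",
                severity="critical",
                status="investigating",
                message=error,
            )
        return self.open_incident


def heartbeat_missed(last_ping: datetime, interval: timedelta,
                     grace: timedelta, now: datetime) -> bool:
    """True when no ping arrived within the interval plus the grace period."""
    return now - last_ping > interval + grace


api = MonitorState(name="API", fail_threshold=2)
api.record_check(False, "connection refused")  # below threshold: no incident yet
incident = api.record_check(False, "connection refused")
print(incident.title)   # API is down
api.record_check(True)
print(incident.status)  # resolved
```

Tracking `open_incident` per monitor ensures that repeated failures update one incident instead of opening a new one on every check.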


✋ Manual Incidents

You can also create incidents manually from the Incidents page:

  1. Click "New Incident"
  2. Fill in the title, severity, and description
  3. Optionally link to a monitor
  4. The incident appears on your dashboard and status pages

👆 Managing Incidents

Acknowledge

Click the "Acknowledge" button to signal that someone is looking at the issue. This records who acknowledged it and when.

Resolve

Click "Resolve" to close the incident. This records the resolution time and calculates the total duration.
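The acknowledge and resolve actions can be pictured as setting a few timestamped fields on the incident. In this hypothetical sketch, `IncidentRecord` and its field and method names are illustrative assumptions rather than Monitron's schema; the total duration is simply the resolution time minus the creation time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class IncidentRecord:
    """Hypothetical incident record holding the fields the UI actions set."""
    title: str
    created_at: datetime
    acknowledged_by: Optional[str] = None
    acknowledged_at: Optional[datetime] = None
    resolved_at: Optional[datetime] = None

    def acknowledge(self, user: str, when: Optional[datetime] = None) -> None:
        # Record who acknowledged the incident and when (defaults to now, UTC).
        self.acknowledged_by = user
        self.acknowledged_at = when if when is not None else datetime.now(timezone.utc)

    def resolve(self, when: Optional[datetime] = None) -> None:
        # Record the resolution time (defaults to now, UTC).
        self.resolved_at = when if when is not None else datetime.now(timezone.utc)

    @property
    def total_duration(self) -> Optional[timedelta]:
        # Only defined once the incident has been resolved.
        if self.resolved_at is None:
            return None
        return self.resolved_at - self.created_at


start = datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc)
rec = IncidentRecord(title="API is down", created_at=start)
rec.acknowledge("alice", start + timedelta(minutes=4))
rec.resolve(start + timedelta(hours=2, minutes=15))
print(rec.total_duration)  # 2:15:00
```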

Duration

Monitron automatically calculates how long each incident lasted:

  • Start: When the incident was created
  • End: When it was resolved (or current time if still open)
  • Duration: Human-readable format (e.g., "2 hours 15 minutes")
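The duration rules above translate into a short calculation: subtract the start from the end (or from the current time if the incident is still open), then render the result in words. This is a sketch; `incident_duration` and `humanize` are hypothetical helpers, and Monitron's exact wording may differ (for example, how it handles days or durations under a minute, which here are assumptions).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def incident_duration(start: datetime, end: Optional[datetime] = None) -> timedelta:
    """Time from start to end, or to the current time if the incident is open."""
    if end is None:
        end = datetime.now(timezone.utc)
    return end - start


def humanize(delta: timedelta) -> str:
    """Render a duration such as "2 hours 15 minutes" (seconds are dropped)."""
    total_minutes = int(delta.total_seconds() // 60)
    days, rem = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rem, 60)
    parts = []
    # Include each non-zero unit, pluralized when needed.
    for value, unit in ((days, "day"), (hours, "hour"), (minutes, "minute")):
        if value:
            parts.append(f"{value} {unit}{'s' if value != 1 else ''}")
    if not parts:
        return "less than a minute"  # assumed wording for very short incidents
    return " ".join(parts)


start = datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc)
end = start + timedelta(hours=2, minutes=15)
print(humanize(incident_duration(start, end)))  # 2 hours 15 minutes
```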

🤖 AI-Powered Incident Management

If AI features are enabled, you get extra superpowers:

Feature | What It Does
AI Root Cause | Click to get an AI analysis of why the incident happened
AI Postmortem | Auto-generate a blameless postmortem report (for resolved incidents)
AI Status Draft | Get a public-facing status update drafted by AI

See the AI Features section for details.


💡 Tips

  • Keep incidents updated: add incident updates as you learn more. Your team and status page subscribers see these.
  • Use severity correctly: reserve "Emergency" for true emergencies. Alert fatigue is real!
  • Review resolved incidents: use AI Postmortem or a manual review to learn from incidents and prevent recurrence.