Changelog

  • Added:

    • New workflow for Enterprise Application incidents: the Incident Commander is now automatically assigned an action to provide an update when incident severity is downgraded, issue here
    • Updated lifecycle for Enterprise Application incidents to optionally ask for a Root Cause Analysis (RCA), issue here
  • Changed:

    • We’ve enabled the incident.io “Teams” feature using Slack groups. This may change your default filter to only show incidents for the team you’re assigned to. You can set your default filter in the incident.io app preferences. This change does not affect what incidents you can interact with.
  • Changed:

    • Triage incidents for GitLab.com will no longer auto-decline when the alert clears. This ensures we perform proper follow-up for auto-resolving alerts (read more).
  • Changed:

    • CMOCs will no longer be paged automatically (read more). To escalate, use the 1 Click Page CMOC button in the incident Slack channel, or run /inc escalate.
  • Changed:

    • DBO tier2 escalations are moving from PagerDuty to incident.io. The paging process remains the same: /inc escalate. Paging via /pd will remain available for one more week.
  • Changed:

    • Follow-up issues in GitLab are now automatically assigned to the “Owner”, or to the “creator” if the Owner isn’t set or doesn’t have a linked GitLab account. Formerly they were assigned only to the Owner, and to no one if the Owner was not set.
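
The fallback described in this entry can be pictured with a minimal Python sketch. It is illustrative only and is not the incident.io or Woodhouse implementation: the `Person` type, the `pick_assignee` function, and the choice to leave the issue unassigned when neither person has a linked account are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    """Hypothetical stand-in for an incident.io user."""
    name: str
    gitlab_username: Optional[str] = None  # None means no linked GitLab account


def pick_assignee(owner: Optional[Person], creator: Optional[Person]) -> Optional[str]:
    """Prefer the Owner; fall back to the creator if the Owner is unset or unlinked."""
    for candidate in (owner, creator):
        if candidate is not None and candidate.gitlab_username:
            return candidate.gitlab_username
    # Assumption: with no usable account on either side, the issue stays unassigned.
    return None
```

For example, `pick_assignee(None, Person("Sam", "sam"))` returns the creator's username, whereas under the previous behavior such an issue would have had no assignee.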
  • Changed:
    • The “incident review” steps have been removed from the S3/S4 GitLab.com post-incident workflow. If a post-incident review is desired for an S3/S4 incident, you can still create a post-incident review manually from the “Post-incident” tab on incident.io.
  • Changed:
    • We have switched incident.io to use UTC by default. This will apply everywhere except their on-call product. If you see any issues, please let us know in #g_networking_and_incident_management.
  • Added:
    • Introduced Contributing Factors field to incident.io for categorizing root causes of incidents. This includes predefined categories such as monitoring/alerting gaps, human factors, technical issues, and an “unidentified” option for unknown factors. The field is now required during the post-incident workflow. - MR
    • Automated linking of incident reviews to incident issues via incident.io workflows - Issue
    • Automated MR linking to incident issues through the Woodhouse integration, allowing merge requests to be automatically associated with incident issues - MR
    • Added a nudge reminding folks that sev3/sev4 incidents do not automatically page the EOC, with a 1-click button that pages the EOC if needed.
  • Changed:

    • Modified EOC (Engineer On Call) role description to clarify they serve as the initial incident responder with escalation paths when runbooks are insufficient - MR #13565
    • Refactored incident management documentation to separate roles (Incident Lead, Incident Responder, etc.) from response teams (EOC, IMOC, CMOC) for better clarity - MR #13565
    • Updated incident workflow documentation to reflect current incident.io processes - MR #13454
    • Enhanced Pingdom monitoring to create triage incidents in incident.io for better alert visibility - MR #8890
    • Modified triage-ops rules to add a 10-day delay before auto-closing incidents to prevent premature closure of active incidents - MR #540
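
The 10-day auto-close delay in the last entry above amounts to a simple age check. The sketch below is a hedged illustration in Python, not the actual triage-ops rule: `AUTO_CLOSE_DELAY`, `may_auto_close`, and the idea of measuring the delay from a single `reference_time` (for example, the issue's creation or last update) are assumptions, since the entry does not say which timestamp the rule uses.

```python
from datetime import datetime, timedelta

# Delay taken from the changelog entry; the name is hypothetical.
AUTO_CLOSE_DELAY = timedelta(days=10)


def may_auto_close(reference_time: datetime, now: datetime) -> bool:
    """Return True only once at least 10 days have passed since reference_time."""
    return now - reference_time >= AUTO_CLOSE_DELAY
```

Under this sketch, an incident issue whose reference time is nine days old is left open, which is the premature-closure case the change is meant to prevent.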
  • Added:

    • Introduced automated incident creation from alerts with triage channel (#incidents-dotcom-triage) for initial alert review - Issue #26541
    • Added automated linking between follow-up issues and incident issues via new Woodhouse integration - MR #621
    • Introduced workflow diagram for EOC responsibilities during incidents - MR #8910
    • Added emoji reaction feature that automatically posts incident channel messages to GitLab issues (excluding images) - Issue #26710
    • Documented incident lead responsibilities for managing follow-up issues within one business day - MR #14005
  • Fixed:

    • Corrected issue where incident issues were being closed while incidents remained active by adding delay to auto-close rules - MR #540
    • Updated broken links and references throughout incident management documentation - MR #13565
  • Removed:

    • Disabled automatic assignment of incident lead role in favor of deliberate assignment based on incident context - Issue #26678
    • Removed requirement for participants to add role information to their Zoom display names during incidents - MR #13565