Incident Center Management Guide for the Elven Platform

On the Elven Platform, you can manage incidents in two ways: on an application's page, which lists the incidents related to that application, or in the Incident Center, a dedicated hub for monitoring and resolving incidents. The Incident Center lets you search and filter incidents so you can quickly find the ones that matter. The platform detects incidents automatically through continuous application monitoring, and you can also create incidents manually so that any relevant situation gets your team's attention.

Accessing the Incident Center

  • Navigate to the main menu and click on Incident Management.

  • In the submenu, select the Incidents item.

Searching for a Specific Incident

In the Incident Center, the search bar locates incidents by name, and advanced filters narrow the results by status, severity, source, and date range. For example, searching for the incident “API GO” shows its details: High severity, Resolved status, start and end times, and the detailed cause, such as a DNS error when attempting to locate the domain. Having this information in one place helps your team act quickly and resolve issues efficiently.
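To make the combination of name search and filters concrete, here is a minimal Python sketch of the same logic applied to an in-memory list. It is not the Elven Platform's API: the `Incident` class, the `filter_incidents` function, the field names, and the sample data are all hypothetical and exist only to illustrate how the criteria narrow a result set.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Incident:
    # Hypothetical record; these field names are illustrative,
    # not the platform's actual schema.
    name: str
    status: str      # "Alarmed" or "Resolved"
    severity: str    # e.g. "High"
    source: str
    started_at: datetime


def filter_incidents(incidents, name=None, status=None, severity=None,
                     source=None, start=None, end=None):
    """Return the incidents that match every criterion that is not None."""
    results = []
    for inc in incidents:
        # Name search: case-insensitive substring match (an assumption).
        if name is not None and name.lower() not in inc.name.lower():
            continue
        if status is not None and inc.status != status:
            continue
        if severity is not None and inc.severity != severity:
            continue
        if source is not None and inc.source != source:
            continue
        # Date range: keep incidents that started within [start, end].
        if start is not None and inc.started_at < start:
            continue
        if end is not None and inc.started_at > end:
            continue
        results.append(inc)
    return results


# Made-up sample data for illustration only.
incidents = [
    Incident("API GO", "Resolved", "High", "api-go",
             datetime(2024, 3, 12, 13, 41)),
    Incident("Checkout latency", "Alarmed", "Medium", "checkout",
             datetime(2024, 3, 14, 9, 5)),
]

matches = filter_incidents(incidents, name="api go", status="Resolved")
print([inc.name for inc in matches])  # prints ['API GO']
```

Each criterion is optional, so leaving a filter unset simply skips that check, which mirrors how unused filters in a search form leave the results unchanged.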

Manually Opening an Incident

You can manually open an incident to notify the team immediately about an issue that was not automatically detected. Fill in the fields below to create a manual incident:

How to Manually Open an Incident

Incident Name

Choose a clear and objective name that describes the issue. This will help your team quickly understand what is happening. Example: “Connection Failure with Service X API”.

Incident Cause

Explain what caused the issue so that everyone understands the context. Be direct and include useful details. Example: “The API failed to access the database due to a network outage.”

Start Time

Provide the date and time when the issue began. This helps track the impact and timeline of the incident. Example: 03/12/2024, 01:41 PM.

Incident Status

Select the current state of the incident:

  • Alarmed: The incident has been identified but is still awaiting action.

  • Resolved: The issue has already been resolved. Note: If the incident is created as “Resolved,” the acknowledgment and resolution time metrics will be reset, and all involved parties will receive a notification.

Severity

Define the severity based on the impact of the issue:

  • Sev 1 – Critical: Severe and urgent impact.

  • Sev 2 – High: High priority, but not critical.

  • Sev 3 – Medium: Moderate impact.

  • Sev 4 – Low: Minor impact.

  • Not classified: Cannot be classified.

Linked Sources

Identify which service, application, or system is related to the incident.

Linked Alerts

Add the associated alerts to help the team better understand what happened.

Add Responders

Include the team members who will handle the incident. This ensures everyone knows who is responsible for addressing the issue.
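The fields described above can be summarized as a single record. The sketch below is a hypothetical Python model of the manual incident form, not an Elven Platform SDK or API: the class names, field names, and validation rules (for example, treating name and cause as required) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class Status(Enum):
    ALARMED = "Alarmed"
    RESOLVED = "Resolved"


class Severity(Enum):
    SEV1 = "Critical"
    SEV2 = "High"
    SEV3 = "Medium"
    SEV4 = "Low"
    NOT_CLASSIFIED = "Not classified"


@dataclass
class ManualIncident:
    # Hypothetical mirror of the form fields described in this guide.
    name: str
    cause: str
    start_time: datetime
    status: Status = Status.ALARMED
    severity: Severity = Severity.NOT_CLASSIFIED
    linked_sources: list = field(default_factory=list)
    linked_alerts: list = field(default_factory=list)
    responders: list = field(default_factory=list)

    def validate(self):
        """Return a list of problems; an empty list means nothing is missing."""
        problems = []
        # Which fields the platform actually requires is an assumption here.
        if not self.name.strip():
            problems.append("name is required")
        if not self.cause.strip():
            problems.append("cause is required")
        if not self.responders:
            problems.append("no responders assigned")
        return problems


incident = ManualIncident(
    name="Connection Failure with Service X API",
    cause="The API failed to access the database due to a network outage.",
    # The guide's example start time, read here as month/day/year (an assumption).
    start_time=datetime(2024, 3, 12, 13, 41),
    severity=Severity.SEV2,
    linked_sources=["service-x-api"],   # hypothetical source name
    responders=["on-call-engineer"],    # hypothetical responder
)
print(incident.validate())  # prints []
```

Modeling status and severity as enumerations keeps the record limited to the options listed in this guide, so a typo in a severity label fails early instead of producing an unclassifiable incident.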

Managing Incident Response

The incident details page in the Incident Center gives you a clear view of an ongoing issue and is built to support communication and quick resolution. From this page, you can monitor the incident's progress, log updates, and communicate through the Slack integration.

You can also Acknowledge the incident, which notifies the team that it has been recognized and is being investigated, or Resolve it, which closes the incident and notifies all involved parties. These actions keep the response transparent, fast, and organized.
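The Acknowledge and Resolve actions, together with the acknowledgment and resolution time metrics mentioned earlier, can be pictured as a small lifecycle. The following Python sketch is a hypothetical model, not the platform's implementation: the `IncidentLifecycle` class, its method names, and the way the metrics are computed (elapsed time since the incident's start) are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional


class IncidentLifecycle:
    """Hypothetical model of acknowledging and resolving an incident."""

    def __init__(self, started_at: datetime):
        self.started_at = started_at
        self.acknowledged_at: Optional[datetime] = None
        self.resolved_at: Optional[datetime] = None
        # Stand-in for the notifications the platform sends to the team.
        self.notifications: list = []

    def acknowledge(self, when: datetime) -> None:
        if self.resolved_at is not None:
            raise ValueError("incident is already resolved")
        # Only the first acknowledgment is recorded.
        if self.acknowledged_at is None:
            self.acknowledged_at = when
            self.notifications.append("acknowledged")

    def resolve(self, when: datetime) -> None:
        if self.resolved_at is not None:
            raise ValueError("incident is already resolved")
        self.resolved_at = when
        self.notifications.append("resolved")

    def time_to_acknowledge(self) -> Optional[timedelta]:
        if self.acknowledged_at is None:
            return None
        return self.acknowledged_at - self.started_at

    def time_to_resolve(self) -> Optional[timedelta]:
        if self.resolved_at is None:
            return None
        return self.resolved_at - self.started_at


start = datetime(2024, 3, 12, 13, 41)
inc = IncidentLifecycle(start)
inc.acknowledge(start + timedelta(minutes=5))
inc.resolve(start + timedelta(minutes=45))
print(inc.time_to_acknowledge())  # prints 0:05:00
print(inc.time_to_resolve())      # prints 0:45:00
```

Recording a timestamp for each action is what makes the time metrics possible, which is one reason to acknowledge an incident as soon as someone starts investigating it.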

Glossary of Technical Terms

Incident Center: Central hub for monitoring and managing incidents on the Elven Platform. It allows for viewing, searching, creating, and managing incidents, optimizing the team’s response and facilitating problem resolution.

Incidents: Events or issues identified in applications or systems that the team needs to monitor, investigate, and resolve.

Search Bar: A search tool in the platform that allows you to quickly locate incidents using the name or other search criteria.

Advanced Filters: Feature that allows you to refine the search for incidents using parameters such as:

  • Status: The incident's current state, such as “Alarmed” or “Resolved”.

  • Severity: Classification of the incident’s impact (SEV 1, SEV 2, etc.).

  • Source: The system or application where the incident originated.

  • Time Period: Time range to filter incidents based on when they occurred.

Manual Incident: An incident manually created by the team, usually to report an issue that was not automatically detected by the platform.

Incident Name: Field where you briefly describe the incident. It should be clear and objective to help quickly understand the issue.

Incident Cause: Description of what caused the incident. This field helps provide context and details for the team to understand the problem.

Start Time: The date and time when the incident began, helping to track the impact and resolution time.

Incident Status: Indicator showing the current state of the incident. Options include:

  • Alarmed: The incident has been identified but is still awaiting action.

  • Resolved: The incident has been resolved.

Severity: Classification of the incident’s impact, which helps prioritize resolution. Options include:

  • SEV 1 – Critical: Severe incident with significant impact.

  • SEV 2 – High: High priority, but not critical.

  • SEV 3 – Medium: Moderate impact.

  • SEV 4 – Low: Minor impact.

  • Not classified: The incident's impact has not been classified.

Linked Sources: Field where you can identify which service, application, or system is related to the incident, helping to understand its origin.

Linked Alerts: Alerts associated with the incident, providing more context about what triggered or contributed to the issue.

Responders: Team members responsible for resolving the incident. Including them ensures everyone knows who is handling the issue.

Incident Response Management: Actions taken to resolve or monitor the progress of the incident, ensuring efficient and transparent management. Key actions include:

  • Acknowledge: Notify the team that the incident has been identified and is being investigated.

  • Resolve: Close the incident, indicating the issue has been resolved and notifying all involved.

Slack Communication: Integration of the Incident Center with Slack, allowing the team to communicate directly about the incident, facilitating real-time management and resolution.

Resolution Notification: Notification sent to all involved parties when an incident is resolved, ensuring transparency and proper closure of the event.
