# Kafka 2.0 Queue Monitor Configuration Guide on the Elven Platform

The **Kafka Monitor** from the **Elven Platform** enables checks based on **key queries** in **Kafka**. This feature helps configure **continuous checks**, set **alerts**, and define **thresholds** for **automatic incident creation**, ensuring you are quickly informed about any irregularities in **connectivity** or **service performance**.

**Apache Kafka** is an **open-source distributed streaming platform** designed for **real-time data stream processing and integration**. It operates both as a **pub/sub messaging system** and a **distributed log storage system**, enabling the transport of large volumes of data with **high resilience** and **low latency**. Kafka organizes data into **topics**, which can be consumed by multiple clients simultaneously, ensuring **horizontal scalability**.

Its ability to handle **millions of messages per second** makes it ideal for applications such as **real-time event analysis**, **data pipelines**, **system integrations**, and **microservices-based solutions**.

## **Accessing Kafka 2.0 Monitoring**

* Navigate to the **main menu** and click on **Services Hub**.
* In **Queue**, select the item **Kafka 2.0**.

<figure><img src="https://content.gitbook.com/content/NbD6tAAcbxaY8pw1cchL/blobs/IR1iQx8pvIBoHoJ2XNUy/kfkm201.png" alt=""><figcaption></figcaption></figure>

## **Monitoring Configuration**

**Monitoring your service availability has never been easier.** Start by giving a clear name to the resource you want to monitor (**Resource Name**) to make it easier to identify. Then, adjust the **Interval** between checks and the **Timeout** for receiving a response.
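To illustrate how **Timeout** bounds a single check, here is a minimal sketch of a connectivity probe. It assumes a plain TCP connection attempt, which is only a stand-in: this guide does not describe how the platform's agent actually performs its checks, and the function name `tcp_check` is hypothetical.

```python
import socket


def tcp_check(host: str, port: int, timeout_s: float) -> bool:
    """Return True if a TCP connection opens within timeout_s seconds.

    Hypothetical helper for illustration; not the platform's agent code.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False
```

A scheduler would call such a function once per **Interval**. A check that gets no answer within the **Timeout** counts as a failure rather than blocking the next check.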

Select where the **monitoring agent** is located by choosing the appropriate **Checkpoint Cloud**. If no environment exists, you can create one by clicking **+ Checkpoint**. After this setup, in **Form Type**, you can choose between **User and Password** or **Healthcheck URL**.

If you select **User and Password**, enter the server address in **Host**, then fill in **Port**, **Topic** (the name of the Kafka topic you want to monitor), **Consumer Group** (the consumer group whose consumption will be analyzed), and **Lag Tolerance** (the maximum consumption delay, measured in unprocessed messages, tolerated before an alert is triggered).
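As a rough illustration of what **Lag Tolerance** compares against, the sketch below computes consumer lag as the gap between the latest offset in each partition and the offset the consumer group has committed. The function names, the sample offsets, and the choice to sum lag across partitions are assumptions made for this example; the platform's exact calculation is not documented here.

```python
from typing import Dict


def total_lag(end_offsets: Dict[int, int], committed_offsets: Dict[int, int]) -> int:
    """Sum the unprocessed messages across all partitions of a topic."""
    lag = 0
    for partition, end in end_offsets.items():
        # Assumption: a partition with no committed offset counts as fully unread.
        committed = committed_offsets.get(partition, 0)
        lag += max(end - committed, 0)
    return lag


def exceeds_tolerance(lag: int, lag_tolerance: int) -> bool:
    """Return True when the measured lag is above the configured tolerance."""
    return lag > lag_tolerance


# Hypothetical offsets for a three-partition topic.
end_offsets = {0: 1500, 1: 1480, 2: 1510}
committed_offsets = {0: 1490, 1: 1400, 2: 1510}

lag = total_lag(end_offsets, committed_offsets)  # 10 + 80 + 0
print(lag, exceeds_tolerance(lag, lag_tolerance=50))
```

With these sample numbers the group is 90 messages behind, so a tolerance of 50 would be exceeded while a tolerance of 100 would not.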

If you choose **Healthcheck URL**, simply enter the **URL** of the resource to be monitored. Remember that the **Host** and **Healthcheck URL** fields accept only **URLs**. If you need to use an **IP address**, it must be stored in a **Secret** to ensure **security** and **organization** of the information.
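Before filling in the form, you can quickly check whether a value is an IP address and therefore belongs in a **Secret**. The sketch below uses only the Python standard library; the helper name `host_is_ip_literal` and the sample addresses are hypothetical, and this is not how the platform itself validates the fields.

```python
import ipaddress
from urllib.parse import urlsplit


def host_is_ip_literal(value: str) -> bool:
    """Return True if the host part of `value` is an IPv4 or IPv6 literal."""
    # urlsplit only recognizes the host when a "//" separator is present.
    candidate = value if "://" in value else "//" + value
    host = urlsplit(candidate).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


print(host_is_ip_literal("10.0.0.5:9092"))            # IP address: store it in a Secret
print(host_is_ip_literal("kafka.example.com:9092"))   # hostname: can be entered directly
```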

If **authentication** is required in Kafka, check the **Authentication** option and fill in the fields according to the **security mechanism** implemented by the cluster. First, enter the **Username** and corresponding **Password**. These values are provided by the Kafka administrator and are used to authenticate the client.

Next, in the **Mechanism** field, select the **authentication mechanism** used by the Kafka broker. With **PLAIN**, the **username** and **password** are sent as-is, so this mechanism should be combined with **TLS**. **SCRAM** authenticates through a challenge-response exchange based on **salted, hashed passwords**, so the password itself is never sent over the connection. **OAUTHBEARER** authenticates with **OAuth 2.0 tokens**, allowing integration with identity providers. **GSSAPI** uses the **Kerberos protocol**, which is widely adopted in enterprise environments.

Finally, indicate whether the cluster uses **TLS** and **SASL**. Enable the **TLS** option if the cluster encrypts client connections. Check the **SASL** box if the brokers require SASL authentication; the mechanisms described above (PLAIN, SCRAM, OAUTHBEARER, and GSSAPI) are all SASL mechanisms.
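For readers used to configuring Kafka clients directly, the two checkboxes correspond to the four values of Kafka's standard `security.protocol` client setting. The sketch below shows that correspondence; the function name is hypothetical, and whether the platform maps the form fields exactly this way is an assumption.

```python
def security_protocol(tls: bool, sasl: bool) -> str:
    """Map the TLS and SASL choices to a Kafka `security.protocol` value."""
    if sasl:
        return "SASL_SSL" if tls else "SASL_PLAINTEXT"
    return "SSL" if tls else "PLAINTEXT"


# Example: a cluster that uses SCRAM over an encrypted connection.
client_settings = {
    "security.protocol": security_protocol(tls=True, sasl=True),
    "sasl.mechanism": "SCRAM-SHA-512",
}
print(client_settings)
```

If a Kafka client configuration for the same cluster already exists, its `security.protocol` value tells you which boxes to check.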

### **Configuration with User and Password**

<figure><img src="https://content.gitbook.com/content/NbD6tAAcbxaY8pw1cchL/blobs/PJAWZqxl6hfCqSzOU4I9/kfkm202.png" alt=""><figcaption></figcaption></figure>

### **Configuration with User and Password and Authentication**

<figure><img src="https://content.gitbook.com/content/NbD6tAAcbxaY8pw1cchL/blobs/yWREvY3myKcSDeKxNw5X/kfkm203.png" alt=""><figcaption></figcaption></figure>

### **Configuration with Healthcheck URL**

<figure><img src="https://content.gitbook.com/content/NbD6tAAcbxaY8pw1cchL/blobs/1p88tKfvI5uwngYrn7Rp/kfkm204.png" alt=""><figcaption></figcaption></figure>

## **Automatic Incident Opening**

You can configure **automatic incident opening** to ensure a quick response to critical issues. To begin, define the **incident severity**, allowing you to prioritize according to urgency. Next, adjust the **Check Interval**, specifying the check frequency in seconds to continuously monitor the resource. This helps ensure you're always one step ahead, detecting problems as soon as they arise.

Additionally, select the **team to be notified** whenever an incident occurs and enable the **"Enable to set up automatic incidents opening"** option to ensure the configuration is active. With this setup, the platform automates **incident management**, making the response process faster and more efficient, without the need for manual intervention. This ensures your team is always ready to resolve any issue with speed and precision.

<figure><img src="https://content.gitbook.com/content/NbD6tAAcbxaY8pw1cchL/blobs/thQcGEDJrAMHw1JKCyJD/kfkm205.png" alt=""><figcaption></figcaption></figure>

## **Maintenance Window**

The **Maintenance Window** is a feature for managing **planned maintenance periods** in your application. During the window, **checks** are paused, so no **monitoring**, **alerts**, or **notifications** are triggered while you perform updates or adjustments. Maintenance can proceed without generating false alarms, and the planned downtime does not distort your **performance reports**.

For example, imagine you need to update the **payment system** of an **e-commerce** platform, making backend adjustments such as installing new **security certificates**. To do this, you can configure the **Maintenance Window** for a specific time, such as **12/13/2024**, from **2:00 PM to 2:30 PM**. During this period, the **Elven Platform** suspends checks, preventing the monitoring system from registering temporary failures or triggering false alerts. This way, you can make the necessary changes calmly, knowing that the **monitoring system** will not be impacted during maintenance.

This approach ensures that the update is carried out in an organized manner, without affecting the **user experience** or generating unwanted **notifications**.
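The window from the example above can be expressed as a simple time-range test. This is a minimal sketch: the function name is hypothetical, time zones are left out for brevity, and treating the end time as exclusive is an assumption rather than documented platform behavior.

```python
from datetime import datetime


def in_maintenance_window(moment: datetime, start: datetime, end: datetime) -> bool:
    """Return True if `moment` falls inside the window (end is exclusive)."""
    return start <= moment < end


# The window from the example: 12/13/2024, 2:00 PM to 2:30 PM.
window_start = datetime(2024, 12, 13, 14, 0)
window_end = datetime(2024, 12, 13, 14, 30)

print(in_maintenance_window(datetime(2024, 12, 13, 14, 10), window_start, window_end))  # inside: check skipped
print(in_maintenance_window(datetime(2024, 12, 13, 14, 45), window_start, window_end))  # outside: check runs
```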

<figure><img src="https://content.gitbook.com/content/NbD6tAAcbxaY8pw1cchL/blobs/7sR09YZrYdxgKBFK3XVD/kfkm206.png" alt=""><figcaption></figcaption></figure>

## **Application Opening Hours**

You can also rely on the **Application Opening Hours** feature, which allows you to configure your application's **operating hours**. This functionality is essential for customizing **monitoring** based on the periods when your application is actually active, avoiding **alerts** and **notifications** outside of business hours. This way, monitoring becomes more aligned with your business’s real needs, ensuring more accurate **reports** and efficient **management**.

For example, imagine your application operates only from **Monday to Friday**, between **9:00 AM and 6:00 PM**. You can configure the **Application Opening Hours** to reflect this schedule by specifying the **days** and **operating periods**. With this setup, the **Elven Platform** automatically disables checks outside of these hours, preventing the logging of failures that don’t affect end users and avoiding unnecessary **alerts**.

This approach optimizes **performance analysis**, focusing only on relevant periods and providing a clearer view of your application's **health** during its **operating hours**.
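The Monday-to-Friday schedule from the example can be sketched as a weekday and time-of-day test. The constant and function names are hypothetical, time zones are ignored, and the exclusive closing time is an assumption made for this illustration.

```python
from datetime import datetime, time

OPEN_DAYS = {0, 1, 2, 3, 4}   # Monday=0 through Friday=4, per datetime.weekday()
OPEN_FROM = time(9, 0)        # 9:00 AM
OPEN_UNTIL = time(18, 0)      # 6:00 PM, treated as exclusive


def within_opening_hours(moment: datetime) -> bool:
    """Return True if checks should run at `moment` under this schedule."""
    return moment.weekday() in OPEN_DAYS and OPEN_FROM <= moment.time() < OPEN_UNTIL


print(within_opening_hours(datetime(2024, 12, 13, 10, 0)))  # a Friday morning
print(within_opening_hours(datetime(2024, 12, 14, 10, 0)))  # a Saturday morning
```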

<figure><img src="https://content.gitbook.com/content/NbD6tAAcbxaY8pw1cchL/blobs/JiTOugfUwwdbq7bfrJ9t/kfkm207.png" alt=""><figcaption></figcaption></figure>

## **Glossary of Technical Terms**

**Kafka**: An **open-source event streaming platform** designed to **process**, **store**, and **transmit** large volumes of data in **real time**. With support for **publish-subscribe patterns**, **distributed queues**, and **durable storage**, it offers high **scalability**, **low latency**, and **high availability**. Ideal for applications requiring **continuous data processing**, such as **system integration**, **real-time analytics**, and **monitoring**. Compatible with various **programming languages**, it is widely used in **microservices architectures** and **modern data pipelines**.

**Interval**: Time interval between **automatic checks** performed during monitoring.

**Timeout**: Maximum time allowed for monitoring to receive a response from the monitored resource before registering a **failure**.

**Checkpoint Cloud**: Location where the **monitoring agent** is hosted, which can be a **preexisting environment** or one **created by the user**.

**Host**: **URL address** of the monitored resource. If an **IP address** is needed, it must be stored in a **Secret** for enhanced **security**.

**Port**: **Network port** on which the **Kafka broker** accepts client connections.

**Consumer Group**: A group of consumers that **share the load** of processing messages from one or more **Kafka topics**. Ensures that each **partition** is processed by only one consumer within the group.

**Lag Tolerance**: Acceptable limit of **message consumption delay**, measured by the number of **unprocessed messages**. Used to identify **bottlenecks** and maintain **system performance**.

**Secret**: Resource used to store **sensitive information**, such as **IP addresses** or **credentials**, ensuring **security** and **organization**.

**Username**: Unique identifier configured in the **Kafka broker** to authenticate a client’s access to the **cluster**.

**Password**: Credential associated with the **username**, used to **authenticate** the client in **Kafka**.

**Mechanism**: **Authentication method** used by **Kafka**, selected in the form. Examples include:

* **PLAIN**: Simple method using **username** and **password** for authentication.
* **SCRAM (Salted Challenge Response Authentication Mechanism)**: More secure mechanism that authenticates through a challenge-response exchange based on **salted, hashed passwords**, so the password itself is never transmitted.
* **OAUTHBEARER**: Uses **OAuth 2.0 tokens** for authentication, allowing integration with **identity providers**.
* **GSSAPI (Generic Security Service Application Program Interface)**: **Kerberos-based authentication**, widely used in **enterprise environments**.

**TLS (Transport Layer Security)**: **Encryption protocol** that protects communication between the **Kafka client** and **server**, ensuring **data confidentiality** and **integrity**.

**SASL (Simple Authentication and Security Layer)**: **Authentication layer** supporting various mechanisms (such as **PLAIN**, **SCRAM**, **OAUTHBEARER**, and **GSSAPI**) for secure authentication in **Kafka**.

**Kafka Broker**: A component of a **Kafka cluster** responsible for **storing** and **distributing messages** between **producers** and **consumers**.

**Kafka Cluster**: A group of **Kafka brokers** working together to **manage** and **process messages** in a **distributed** and **scalable** manner.

**Enable to set up automatic incidents opening**: Option that, when enabled, activates **automatic incident creation** upon detection of **critical issues**.

**Severity**: Level of **criticality** assigned to an **incident**, allowing it to be **prioritized** according to **urgency**.

**Check Interval**: Time interval, in **seconds**, for performing **continuous checks** on the monitored resource.

**Maintenance Window**: Feature that **temporarily pauses monitoring**, **alerts**, and **notifications** during **planned maintenance periods**.

**Application Opening Hours**: Configuration that defines the **operating hours** of the application, aligning **monitoring** with **active periods** and avoiding **alerts** outside those hours.
