# Complete Guide to Instrumentation and Monitoring in Kubernetes

This guide helps you set up a **monitoring** and **instrumentation stack** on Kubernetes using **Prometheus**, **Promtail**, and **OpenTelemetry**.

You will use **Grafana** from **Elven Observability** to visualize your **metrics**, **logs**, and **dashboards** at the following address:

[https://grafana.elvenobservability.com](https://grafana.elvenobservability.com/)

## **Repository Content**

```
stack-observability/
├── opentelemetry-operator/
│   ├── instrumentation.yaml
│   ├── values.yaml
│   ├── README.md
├── otel-collector/
│   ├── collector-config.yaml
│   ├── collector-deploy.yaml
│   ├── collector-service.yaml
│   ├── kustomization.yaml
│   ├── secrets.env
│   ├── README.md
├── prometheus/
│   ├── values-prometheus.yaml
├── promtail/
│   ├── values-promtail.yaml
├── helmfile.yaml
└── README.md
```

## **Prerequisites**

### **Make sure you have the following configured in your environment:**

* **Kubernetes**: A functional, configured cluster.
* **kubectl**: To manage resources in **Kubernetes**.
* **Helm**: To install **charts**.
* **Helmfile**: To install multiple **charts** at once.

### **Install Helmfile:**

```
curl -sL https://github.com/helmfile/helmfile/releases/download/v1.0.0-rc.7/helmfile_1.0.0-rc.7_linux_amd64.tar.gz | sudo tar -xz -C /usr/local/bin
```
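Before continuing, it can help to confirm that every prerequisite is actually on your `PATH`. The following is a minimal sketch of such a check; the `check_tools` function is a hypothetical helper and is not part of the repository.

```shell
# Report, for each tool name given, whether it is available on PATH.
# Returns 0 when every tool is found and 1 when at least one is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_tools kubectl helm helmfile` prints one line per tool and returns a non-zero status if any of them still needs to be installed.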

## **Installation: Step by Step**

### **Clone the Repository**

#### **Via SSH**

```
git clone git@github.com:elven-observability/stack-observability-k8s.git
```

#### **Via HTTPS**

```
git clone https://github.com/elven-observability/stack-observability-k8s.git
```

### **Configure Namespace and Credentials**

#### **Create the monitoring namespace**

```
kubectl create ns monitoring
```

Next, create a `Secret` to store the **Elven Observability** credentials. These **credentials** are either provided directly by our team or generated automatically when you register through the website: <http://elvenobservability.com/>. If you haven't received them yet, please contact **support**.

```
kubectl create secret generic elven-observability-credentials \
  -n monitoring \
  --from-literal=tenantId="<YOUR_TENANT_ID>" \
  --from-literal=apiToken="<YOUR_API_TOKEN>"
```

**Important**: Replace **\<YOUR\_TENANT\_ID>** and **\<YOUR\_API\_TOKEN>** with the correct values received during **onboarding** or via **registration**.
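Since a missing or empty key in this `Secret` is a common reason for data not arriving, a quick check after creating it may save time later. The sketch below assumes the key names `tenantId` and `apiToken` used in the command above; `verify_secret_keys` is a hypothetical helper name.

```shell
# Check that each expected key exists and is non-empty in a Secret.
# Usage: verify_secret_keys <namespace> <secret-name> <key>...
verify_secret_keys() {
  ns="$1"
  name="$2"
  shift 2
  for key in "$@"; do
    # jsonpath prints the base64-encoded value, or nothing if the key is absent.
    value=$(kubectl get secret "$name" -n "$ns" -o "jsonpath={.data.$key}") || return 1
    if [ -n "$value" ]; then
      echo "ok: $key"
    else
      echo "empty or missing: $key"
      return 1
    fi
  done
}
```

For this guide the call would be `verify_secret_keys monitoring elven-observability-credentials tenantId apiToken`. The check only confirms that the keys are populated; it cannot tell whether the token itself is valid.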

### **Configure Prometheus**

#### Edit the file `prometheus/values-prometheus.yaml`:

```
remoteWrite:
  - url: https://mimir.elvenobservability.com/api/v1/push
    authorization:
      type: Bearer
      credentials:
        key: apiToken
        name: elven-observability-credentials
    headers:
      X-Scope-OrgID: <YOUR_TENANT_ID>
    relabelConfigs:
      - sourceLabels: [__name__]
        regex: "^(prometheus|go|promhttp|scrape).*"
        action: drop
```
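The `drop` rule above discards every metric whose name matches the regex, so it is worth checking which names it really covers before relying on it. The sketch below approximates the match locally with `grep`; Prometheus anchors relabel regexes to the whole value, which `grep -x` mimics. The metric names in the loop are illustrative, and `google_api_calls_total` in particular is a made-up name.

```shell
# The drop rule's regex, copied from values-prometheus.yaml.
DROP_REGEX='^(prometheus|go|promhttp|scrape).*'

# Succeeds when the metric name matches the drop rule.
# grep -x requires the whole line to match, like a fully anchored regex.
would_drop() {
  printf '%s\n' "$1" | grep -Eqx "$DROP_REGEX"
}

for metric in go_goroutines scrape_duration_seconds google_api_calls_total http_requests_total; do
  if would_drop "$metric"; then
    echo "drop  $metric"
  else
    echo "keep  $metric"
  fi
done
```

Note that the pattern matches on a bare prefix, so any name that merely starts with `go` (such as the hypothetical `google_api_calls_total`) is dropped as well. If that is not intended, a pattern such as `(prometheus|go|promhttp|scrape)_.*` would be one way to narrow it. It is also worth checking the chart's values reference for the exact field name: if the chart is based on the Prometheus Operator, relabeling of outgoing samples on `remoteWrite` entries is configured under `writeRelabelConfigs`.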

### **Promtail Configuration: Filters by Namespace or Annotation**

#### Edit `promtail/values-promtail.yaml`:

```
config:
  snippets:
    common:
      # Filter by annotation
      # - action: keep
      #   source_labels: [__meta_kubernetes_pod_annotation_promtail_logs]
      #   regex: "true"

      # Filter by namespace
      - action: keep
        source_labels: [__meta_kubernetes_namespace]
        regex: "^(default|monitoring|namespace1|namespace2)$"
```

**How to Use**:

* By **Namespace**: collects all logs from pods in the defined **namespaces**.
* By **Annotation**: collects only from pods with the annotation `promtail_logs: "true"`:

```
metadata:
  annotations:
    promtail_logs: "true"
```
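When you edit the namespace list, a typo in the regex silently excludes a namespace from log collection. As a rough local check, the sketch below tests a few namespace names against the same pattern with `grep`; the namespaces in the loop are examples only, and `namespace_kept` is a hypothetical helper.

```shell
# The keep rule's regex, copied from values-promtail.yaml.
NAMESPACE_REGEX='^(default|monitoring|namespace1|namespace2)$'

# Succeeds when logs from the given namespace would be kept.
namespace_kept() {
  printf '%s\n' "$1" | grep -Eq "$NAMESPACE_REGEX"
}

for ns in default monitoring kube-system namespace10; do
  if namespace_kept "$ns"; then
    echo "keep  $ns"
  else
    echo "drop  $ns"
  fi
done
```

Because the pattern is anchored at both ends, `namespace10` is not kept even though it begins with `namespace1`.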

### **Configure the OpenTelemetry Operator**

#### Edit `opentelemetry-operator/instrumentation.yaml`:

```
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: instrumentation
  namespace: default
spec:
  nodejs:
    env:
      - name: OTEL_NODE_DISABLED_INSTRUMENTATIONS
        value: fs
      - name: OTEL_NODE_RESOURCE_DETECTORS
        value: all
  exporter:
    endpoint: "http://opentelemetrycollector.monitoring.svc.cluster.local:4318"
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "1"
```

**Apply to the cluster:**

```
kubectl apply -f opentelemetry-operator/instrumentation.yaml
```

Refer to the `README.md` file in the `opentelemetry-operator` folder for more instrumentation examples.
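The `Instrumentation` resource only describes how to instrument; the operator injects the agent into pods that carry the matching annotation, which for Node.js is `instrumentation.opentelemetry.io/inject-nodejs`. The sketch below shows one way to add that annotation to an existing Deployment. The function name and the Deployment name `my-app` are hypothetical, and the Deployment is assumed to live in `default`, the same namespace as the `Instrumentation` resource above.

```shell
# Add the Node.js injection annotation to a Deployment's pod template.
# Changing the pod template triggers a rollout, so new pods start instrumented.
# Usage: enable_nodejs_injection <namespace> <deployment>
enable_nodejs_injection() {
  ns="$1"
  deployment="$2"
  kubectl patch deployment "$deployment" -n "$ns" --type merge -p \
    '{"spec":{"template":{"metadata":{"annotations":{"instrumentation.opentelemetry.io/inject-nodejs":"true"}}}}}'
}
```

For example, `enable_nodejs_injection default my-app`. In a GitOps setup you would more likely set the same annotation under `spec.template.metadata.annotations` in the Deployment manifest instead of patching the live object.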

### **Install the Components with Helmfile**

```
helmfile sync
```
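After the sync finishes, you may want to confirm that the pods in the `monitoring` namespace have come up. The sketch below is one possible check, not part of the repository; the `check_stack` name and the 180-second timeout are assumptions you can adjust.

```shell
# List the pods in the namespace, then block until all of them report Ready.
# Usage: check_stack [namespace]   (defaults to "monitoring")
check_stack() {
  ns="${1:-monitoring}"
  kubectl get pods -n "$ns" || return 1
  kubectl wait --for=condition=Ready pods --all -n "$ns" --timeout=180s
}
```

Run it as `check_stack`. Note that `kubectl wait` on all pods can time out if the namespace also contains completed Job pods, since those never become Ready.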

## **Access to Grafana**

**Access:** [https://grafana.elvenobservability.com](https://grafana.elvenobservability.com/)

Use the credentials provided via support or after registration on the website to view dashboards for metrics, logs, and traces.

## **Installed Resources**

* **Prometheus**: Collection of **metrics** and **alerts**
* **Promtail**: Collection of **logs**
* **OpenTelemetry Operator**: Automatic **instrumentation**
* **OpenTelemetry Collector**: Centralization of **metrics** and **traces**

## **Examples and Best Practices**

* Ready-made examples in the `opentelemetry-operator/` folder
* Whenever possible, filter **logs** by **annotation** instead of collecting them from an entire **namespace**
* Adjust **trace sampling** according to the **criticality** of the application
* Organize **dashboards** in **Grafana** by application or business domain

## **Tips and Optimizations**

* Use **relabelConfigs** to avoid excessive **data ingestion**
* Use custom **attributes** (`resource_attributes`) to enrich **trace** and **metric** data
* Explore public **dashboards** on **Grafana** for inspiration
* Add **alerts** based on **critical metrics** (e.g., 5xx errors, high latency, absence of events)

## **Troubleshooting**

| Problem                         | Common Cause                              | Solution                                       |
| ------------------------------- | ----------------------------------------- | ---------------------------------------------- |
| Data does not appear in Grafana | Misconfigured secret or invalid token     | Check the `Secret` and confirm the credentials. |
| Logs are not reaching Loki      | Filters in Promtail or missing annotation | Review the filtering rules and annotations.     |
| Missing traces                  | Incorrect collector endpoint              | Check the **URL** in `instrumentation.yaml`.    |
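
For the log and trace rows, a couple of `kubectl` commands usually narrow the problem down quickly. The sketch below wraps them in two hypothetical helpers. The label selector `app.kubernetes.io/name=promtail` is an assumption about how the Promtail chart labels its pods, and the Service name `opentelemetrycollector` is taken from the endpoint host in `instrumentation.yaml`.

```shell
# Tail recent Promtail logs to look for dropped targets or push errors.
# The label selector is an assumption; adjust it to match your pods.
promtail_logs() {
  kubectl logs -n monitoring -l app.kubernetes.io/name=promtail --tail=50
}

# Confirm that the Service named in the exporter endpoint exists
# and see which ports it exposes.
check_collector_service() {
  kubectl get service opentelemetrycollector -n monitoring
}
```

If `check_collector_service` reports that the Service is not found, compare the name and namespace with the host part of the `endpoint` value in `instrumentation.yaml`.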

## **Official Documentation**

* [OpenTelemetry](https://opentelemetry.io/docs/)
* [Prometheus](https://prometheus.io/docs/introduction/overview/)
* [Loki](https://grafana.com/docs/loki/)
* [Grafana](https://grafana.com/docs/grafana/latest/)

## **Support**

If you have any **questions**, **difficulties**, or **suggestions**:

* Open an **issue** in this repository.
* Contribute with a **Pull Request**.
* Talk to the **Elven team**.
