📈

Monitoring Stack

👨‍🍳 Chef · ⏱️ 45 minutes

📋 Suggested prerequisites

  • Docker
  • Docker Compose
  • Basic Node.js

What you will build

A complete monitoring stack: Prometheus collects metrics, Grafana visualizes them, and a Node.js application exposes custom metrics.

When you finish you will have:

  • A Node.js app with a /metrics endpoint built on prom-client
  • Prometheus scraping metrics every 15 seconds
  • Grafana with dashboards and alerts configured
  • Everything running in Docker Compose with a single command

Step 1: Create the project

mkdir monitoring-stack && cd monitoring-stack
mkdir -p app prometheus grafana/provisioning/datasources grafana/provisioning/dashboards

Step 2: Create the Node.js app with metrics

Create app/package.json:

{
  "name": "metrics-app",
  "version": "1.0.0",
  "main": "server.js",
  "dependencies": {
    "express": "^4.18.2",
    "prom-client": "^15.1.0"
  }
}

Create app/server.js:

const express = require('express');
const client = require('prom-client');

const app = express();
const PORT = 3000;

// Create the metrics registry
const register = new client.Registry();

// Add default metrics (memory, CPU, etc.)
client.collectDefaultMetrics({ register });

// Custom metric: request counter
const httpRequestsTotal = new client.Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'path', 'status'],
  registers: [register]
});

// Custom metric: latency histogram
const httpRequestDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'path'],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5],
  registers: [register]
});

// Custom metric: active users gauge
const activeUsers = new client.Gauge({
  name: 'active_users',
  help: 'Number of active users',
  registers: [register]
});

// Simulate a random number of active users
setInterval(() => {
  activeUsers.set(Math.floor(Math.random() * 100) + 10);
}, 5000);

// Middleware to measure requests
app.use((req, res, next) => {
  const end = httpRequestDuration.startTimer({ method: req.method, path: req.path });
  res.on('finish', () => {
    httpRequestsTotal.inc({ method: req.method, path: req.path, status: res.statusCode });
    end();
  });
  next();
});

// App endpoints
app.get('/', (req, res) => {
  res.json({ message: 'Hello! This app exposes metrics at /metrics' });
});

app.get('/api/users', (req, res) => {
  // Simulate variable latency
  const delay = Math.random() * 200;
  setTimeout(() => {
    res.json({ users: ['alice', 'bob', 'charlie'] });
  }, delay);
});

app.get('/api/slow', (req, res) => {
  // Slow endpoint for testing alerts
  setTimeout(() => {
    res.json({ message: 'Slow response' });
  }, 2000);
});

// Metrics endpoint for Prometheus
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

app.listen(PORT, () => {
  console.log(`App running at http://localhost:${PORT}`);
  console.log(`Metrics at http://localhost:${PORT}/metrics`);
});
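The middleware in server.js labels every sample with req.path, so each distinct URL becomes its own time series. That is fine for the three fixed routes here, but paths containing ids would grow the series count without bound. The following is a minimal sketch of one way to cap that; normalizePath is a hypothetical helper, not part of the tutorial's code, and the placeholder names :id and :uuid are assumptions.

```javascript
// Hypothetical helper (not in the original server.js): collapse variable
// path segments so the "path" label keeps a small, bounded set of values.
function normalizePath(path) {
  const uuidPattern = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
  return path
    .split('/')
    .map((segment) => {
      if (/^\d+$/.test(segment)) return ':id'; // purely numeric segment
      if (uuidPattern.test(segment)) return ':uuid'; // UUID-shaped segment
      return segment; // static segment, keep as-is
    })
    .join('/');
}
```

In the middleware, the label could then be written as path: normalizePath(req.path) in both the startTimer and inc calls.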

Create app/Dockerfile:

FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]

Step 3: Configure Prometheus

Create prometheus/prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: []

rule_files: []

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-app'
    static_configs:
      - targets: ['app:3000']
    metrics_path: /metrics
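Each scrape configured above is a plain HTTP GET against the target's /metrics path, and the response is line-oriented text. To make that concrete, here is a minimal sketch of a parser for that text. The names parseMetrics and scrapeOnce are hypothetical, and the parser assumes simple lines: no escaped quotes or commas inside label values, no trailing timestamps, and finite numeric values.

```javascript
// Sketch only: parse Prometheus text exposition lines into plain objects.
// Assumptions: label values contain no commas or escaped quotes, lines have
// no trailing timestamp, and sample values are finite numbers.
function parseMetrics(text) {
  const samples = [];
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue; // skip HELP/TYPE comments
    const match = line.match(/^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$/);
    if (!match) continue; // ignore anything this sketch does not understand
    const [, name, labelText, value] = match;
    const labels = {};
    if (labelText) {
      for (const pair of labelText.split(',')) {
        const m = pair.match(/^\s*([a-zA-Z_][a-zA-Z0-9_]*)="(.*)"\s*$/);
        if (m) labels[m[1]] = m[2];
      }
    }
    samples.push({ name, labels, value: Number(value) });
  }
  return samples;
}

// Perform one scrape by hand, using the global fetch available in Node 18+.
async function scrapeOnce(url) {
  const response = await fetch(url);
  return parseMetrics(await response.text());
}

// Usage (with the app running): scrapeOnce('http://localhost:3000/metrics').then(console.log);
```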

Step 4: Configure Grafana

Create grafana/provisioning/datasources/datasources.yml:

apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    # Fixed UID so the dashboard panels can reference this datasource
    uid: PBFA97CFB590B2093
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false

Create grafana/provisioning/dashboards/dashboards.yml:

apiVersion: 1

providers:
  - name: 'default'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    options:
      path: /etc/grafana/provisioning/dashboards

Create grafana/provisioning/dashboards/app-dashboard.json:

{
  "annotations": { "list": [] },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "id": null,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "datasource": { "type": "prometheus", "uid": "PBFA97CFB590B2093" },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "palette-classic" },
          "mappings": [],
          "thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }] }
        },
        "overrides": []
      },
      "gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 },
      "id": 1,
      "options": { "colorMode": "value", "graphMode": "area", "justifyMode": "auto", "orientation": "auto", "reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false }, "textMode": "auto" },
      "pluginVersion": "10.0.0",
      "targets": [{ "expr": "sum(rate(http_requests_total[5m]))", "refId": "A" }],
      "title": "Requests per second",
      "type": "stat"
    },
    {
      "datasource": { "type": "prometheus", "uid": "PBFA97CFB590B2093" },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "palette-classic" },
          "mappings": [],
          "thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }] }
        },
        "overrides": []
      },
      "gridPos": { "h": 8, "w": 12, "x": 12, "y": 0 },
      "id": 2,
      "options": { "colorMode": "value", "graphMode": "area", "justifyMode": "auto", "orientation": "auto", "reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false }, "textMode": "auto" },
      "pluginVersion": "10.0.0",
      "targets": [{ "expr": "active_users", "refId": "A" }],
      "title": "Active Users",
      "type": "stat"
    },
    {
      "datasource": { "type": "prometheus", "uid": "PBFA97CFB590B2093" },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "palette-classic" },
          "custom": { "axisCenteredZero": false, "axisColorMode": "text", "axisLabel": "", "axisPlacement": "auto", "barAlignment": 0, "drawStyle": "line", "fillOpacity": 10, "gradientMode": "none", "hideFrom": { "legend": false, "tooltip": false, "viz": false }, "lineInterpolation": "linear", "lineWidth": 1, "pointSize": 5, "scaleDistribution": { "type": "linear" }, "showPoints": "auto", "spanNulls": false, "stacking": { "group": "A", "mode": "none" }, "thresholdsStyle": { "mode": "off" } },
          "mappings": [],
          "thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }] },
          "unit": "s"
        },
        "overrides": []
      },
      "gridPos": { "h": 8, "w": 24, "x": 0, "y": 8 },
      "id": 3,
      "options": { "legend": { "calcs": [], "displayMode": "list", "placement": "bottom", "showLegend": true }, "tooltip": { "mode": "single", "sort": "none" } },
      "targets": [{ "expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))", "legendFormat": "p95", "refId": "A" }, { "expr": "histogram_quantile(0.50, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))", "legendFormat": "p50", "refId": "B" }],
      "title": "HTTP Latency (p50 and p95)",
      "type": "timeseries"
    },
    {
      "datasource": { "type": "prometheus", "uid": "PBFA97CFB590B2093" },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "palette-classic" },
          "custom": { "axisCenteredZero": false, "axisColorMode": "text", "axisLabel": "", "axisPlacement": "auto", "barAlignment": 0, "drawStyle": "line", "fillOpacity": 10, "gradientMode": "none", "hideFrom": { "legend": false, "tooltip": false, "viz": false }, "lineInterpolation": "linear", "lineWidth": 1, "pointSize": 5, "scaleDistribution": { "type": "linear" }, "showPoints": "auto", "spanNulls": false, "stacking": { "group": "A", "mode": "none" }, "thresholdsStyle": { "mode": "off" } },
          "mappings": [],
          "thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }] },
          "unit": "bytes"
        },
        "overrides": []
      },
      "gridPos": { "h": 8, "w": 24, "x": 0, "y": 16 },
      "id": 4,
      "options": { "legend": { "calcs": [], "displayMode": "list", "placement": "bottom", "showLegend": true }, "tooltip": { "mode": "single", "sort": "none" } },
      "targets": [{ "expr": "process_resident_memory_bytes{job=\"node-app\"}", "legendFormat": "RSS memory", "refId": "A" }],
      "title": "Memory Usage",
      "type": "timeseries"
    }
  ],
  "refresh": "5s",
  "schemaVersion": 38,
  "style": "dark",
  "tags": ["nodejs", "prometheus"],
  "templating": { "list": [] },
  "time": { "from": "now-15m", "to": "now" },
  "timepicker": {},
  "timezone": "",
  "title": "Node.js App Metrics",
  "uid": "nodejs-app-metrics",
  "version": 1,
  "weekStart": ""
}

Step 5: Create docker-compose.yml

In the project root, create docker-compose.yml:

version: '3.8'

services:
  app:
    build: ./app
    ports:
      - "3000:3000"
    networks:
      - monitoring

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.enable-lifecycle'
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin123
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    networks:
      - monitoring
    depends_on:
      - prometheus

networks:
  monitoring:
    driver: bridge

volumes:
  prometheus_data:
  grafana_data:

Step 6: Start everything

# Build and start
docker compose up -d --build

# Verify the containers are running
docker compose ps

Wait about 30 seconds, then check:

Service      URL                             Credentials
App          http://localhost:3000           -
Metrics      http://localhost:3000/metrics   -
Prometheus   http://localhost:9090           -
Grafana      http://localhost:3001           admin / admin123
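The checks in the table can also be scripted. Below is a minimal sketch that requests each service and reports whether it answered with a 2xx status. The function name checkStack and the injectable fetchImpl parameter are hypothetical; the /-/ready path for Prometheus and /api/health for Grafana are those services' standard health endpoints, used here instead of the UI URLs.

```javascript
// Sketch only: probe each service of the stack once.
const targets = [
  { name: 'App', url: 'http://localhost:3000/' },
  { name: 'Metrics', url: 'http://localhost:3000/metrics' },
  { name: 'Prometheus', url: 'http://localhost:9090/-/ready' },
  { name: 'Grafana', url: 'http://localhost:3001/api/health' },
];

// fetchImpl defaults to the global fetch (Node 18+); it is a parameter so the
// function can be exercised without a running stack.
async function checkStack(fetchImpl = fetch, list = targets) {
  const results = [];
  for (const { name, url } of list) {
    try {
      const response = await fetchImpl(url);
      results.push({ name, url, ok: response.ok, status: response.status });
    } catch (error) {
      // Connection refused, DNS failure, and similar network errors land here.
      results.push({ name, url, ok: false, status: null, error: error.message });
    }
  }
  return results;
}

// Usage (with the stack running): checkStack().then(console.table);
```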

Step 7: Explore Prometheus

  1. Open http://localhost:9090
  2. In "Expression", type http_requests_total and press Enter
  3. Try these queries:
    • rate(http_requests_total[5m]) - Requests per second
    • histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) - p95 latency per method and path
    • active_users - Active users
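histogram_quantile does not see individual request durations, only the cumulative bucket counters, so its result is an estimate interpolated inside one bucket. The sketch below reimplements that estimate in plain JavaScript to show the arithmetic. It is a simplified illustration and not Prometheus's actual code; the function name histogramQuantile and the sample counts are made up, and it assumes 0 < q <= 1 and a +Inf bucket at the end.

```javascript
// Sketch only: estimate a quantile from cumulative histogram buckets,
// using linear interpolation within the bucket that contains the rank.
// buckets: [{ le: upper bound (Infinity for +Inf), count: cumulative count }]
function histogramQuantile(q, buckets) {
  const sorted = [...buckets].sort((a, b) => a.le - b.le);
  const total = sorted[sorted.length - 1].count;
  if (total === 0) return NaN; // no observations yet
  const rank = q * total;
  const index = sorted.findIndex((bucket) => bucket.count >= rank);
  if (sorted[index].le === Infinity) {
    // The rank falls in the +Inf bucket: report the highest finite bound.
    return sorted[sorted.length - 2].le;
  }
  const lowerBound = index === 0 ? 0 : sorted[index - 1].le;
  const lowerCount = index === 0 ? 0 : sorted[index - 1].count;
  const upper = sorted[index];
  const fraction = (rank - lowerCount) / (upper.count - lowerCount);
  return lowerBound + (upper.le - lowerBound) * fraction;
}

// Made-up counts using the bucket bounds from server.js:
// 90 fast requests plus 10 requests that took slightly over 2 seconds.
const exampleBuckets = [
  { le: 0.01, count: 0 }, { le: 0.05, count: 10 }, { le: 0.1, count: 50 },
  { le: 0.5, count: 90 }, { le: 1, count: 90 }, { le: 2, count: 90 },
  { le: 5, count: 100 }, { le: Infinity, count: 100 },
];
console.log(histogramQuantile(0.95, exampleBuckets)); // 3.5
```

With these sample counts the slow requests land in the (2, 5] bucket, so the p95 estimate is 3.5 seconds even though every slow request took about 2 seconds. Bucket boundaries set the resolution of the estimate.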

Step 8: Configure alerts in Grafana

  1. Open Grafana (http://localhost:3001)
  2. Log in: admin / admin123
  3. Go to Alerting > Alert rules > New alert rule
  4. Configure:
    • Name: "High latency"
    • Query: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
    • Condition: IS ABOVE 1
  5. Save

To test the alert:

# Generate slow requests (each one takes about 2 seconds)
for i in {1..20}; do curl http://localhost:3000/api/slow; done
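While the slow requests run, it can help to read the p95 value straight from Prometheus rather than waiting for the alert to change state. The sketch below uses Prometheus's instant-query HTTP endpoint, /api/v1/query. The helper names buildQueryUrl, firstValue, and currentP95 are hypothetical, and the code assumes the query returns an instant vector.

```javascript
// Sketch only: read the current p95 latency through the Prometheus HTTP API.
const P95_EXPR =
  'histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))';

// Build the instant-query URL; URLSearchParams handles the URL encoding.
function buildQueryUrl(expr, baseUrl = 'http://localhost:9090') {
  const params = new URLSearchParams({ query: expr });
  return `${baseUrl}/api/v1/query?${params}`;
}

// Instant-vector results look like { metric: {...}, value: [timestamp, "number as string"] }.
function firstValue(apiResponse) {
  const [sample] = apiResponse.data.result;
  return sample ? Number(sample.value[1]) : null;
}

// fetchImpl defaults to the global fetch available in Node 18+.
async function currentP95(fetchImpl = fetch) {
  const response = await fetchImpl(buildQueryUrl(P95_EXPR));
  return firstValue(await response.json());
}

// Usage (with the stack running): currentP95().then((v) => console.log('p95 seconds:', v));
```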

Step 9: Generate test traffic

# Script to generate traffic
while true; do
  curl -s http://localhost:3000/ > /dev/null
  curl -s http://localhost:3000/api/users > /dev/null
  sleep 0.5
done

Let it run for 2-3 minutes and watch the dashboards.
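For readers who prefer to stay in Node.js, the same loop can be sketched in JavaScript. Unlike the shell version it stops after a fixed number of rounds. The function name generateTraffic and its options (rounds, pauseMs, fetchImpl) are hypothetical choices for this example.

```javascript
// Sketch only: a bounded Node.js version of the shell traffic loop.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function generateTraffic({
  baseUrl = 'http://localhost:3000',
  rounds = 10, // how many times to hit both endpoints
  pauseMs = 500, // pause between rounds, like "sleep 0.5"
  fetchImpl = fetch, // global fetch in Node 18+; injectable for testing
} = {}) {
  const paths = ['/', '/api/users'];
  let sent = 0;
  for (let round = 0; round < rounds; round += 1) {
    for (const path of paths) {
      try {
        await fetchImpl(baseUrl + path);
        sent += 1;
      } catch (error) {
        console.error(`Request to ${path} failed: ${error.message}`);
      }
    }
    await sleep(pauseMs);
  }
  return sent; // number of requests that completed
}

// Usage (about 3 minutes of traffic): generateTraffic({ rounds: 360 });
```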


Final structure

monitoring-stack/
  app/
    package.json
    server.js
    Dockerfile
  prometheus/
    prometheus.yml
  grafana/
    provisioning/
      datasources/
        datasources.yml
      dashboards/
        dashboards.yml
        app-dashboard.json
  docker-compose.yml

Useful commands

# View logs
docker compose logs -f app
docker compose logs -f prometheus

# Reload the Prometheus config without restarting (requires --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload

# Stop everything
docker compose down

# Stop and delete the data volumes
docker compose down -v

Troubleshooting

Problem                      Solution
Prometheus is not scraping   Verify that app:3000 is reachable on the Docker network
Grafana shows no data        Wait 30s, then check the Prometheus data source in Grafana's settings
Dashboard is empty           The datasource UID in the dashboard JSON must match the provisioned datasource
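When Prometheus is not scraping, its own view of each target usually names the cause. The sketch below reads that view from the /api/v1/targets endpoint of the Prometheus HTTP API and reduces it to the fields that matter for debugging. The function names summarizeTargets and fetchTargets are hypothetical.

```javascript
// Sketch only: summarize target health as reported by Prometheus.
function summarizeTargets(apiResponse) {
  return apiResponse.data.activeTargets.map((target) => ({
    job: target.labels.job,
    scrapeUrl: target.scrapeUrl,
    health: target.health, // "up", "down", or "unknown"
    lastError: target.lastError || null, // empty string when the scrape succeeded
  }));
}

// fetchImpl defaults to the global fetch available in Node 18+.
async function fetchTargets(baseUrl = 'http://localhost:9090', fetchImpl = fetch) {
  const response = await fetchImpl(`${baseUrl}/api/v1/targets`);
  return summarizeTargets(await response.json());
}

// Usage (with the stack running): fetchTargets().then(console.table);
```

A node-app row with health "down" and a non-empty lastError points at the network or the app container; the same information is shown in the Prometheus UI under Status > Targets.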

Next step

-> Testing Fullstack