Monitoring Stack

๐Ÿ‘จโ€๐Ÿณ Chefโฑ๏ธ 45 minutes

๐Ÿ“‹ Suggested prerequisites

  • โ€ขDocker
  • โ€ขDocker Compose
  • โ€ขBasic Node.js

What you'll build

A complete monitoring stack with Prometheus to collect metrics, Grafana to visualize them, and a Node.js application that exposes custom metrics.

When you finish, you'll have:

  • A Node.js app that exposes a /metrics endpoint using prom-client
  • Prometheus scraping those metrics every 15 seconds
  • Grafana with a provisioned dashboard and an alert rule
  • Everything running in Docker Compose with a single command

Step 1: Create the project

mkdir monitoring-stack && cd monitoring-stack
mkdir app prometheus grafana

Step 2: Create the Node.js app with metrics

Create app/package.json:

{
  "name": "metrics-app",
  "version": "1.0.0",
  "main": "server.js",
  "dependencies": {
    "express": "^4.18.2",
    "prom-client": "^15.1.0"
  }
}

Create app/server.js:

const express = require('express');
const client = require('prom-client');

const app = express();
const PORT = 3000;

// Create metrics registry
const register = new client.Registry();

// Add default metrics (memory, CPU, etc.)
client.collectDefaultMetrics({ register });

// Custom metric: request counter
const httpRequestsTotal = new client.Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'path', 'status'],
  registers: [register]
});

// Custom metric: latency histogram
const httpRequestDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'path'],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5],
  registers: [register]
});

// Custom metric: active users gauge
const activeUsers = new client.Gauge({
  name: 'active_users',
  help: 'Number of active users',
  registers: [register]
});

// Simulate random active users
setInterval(() => {
  activeUsers.set(Math.floor(Math.random() * 100) + 10);
}, 5000);

// Middleware to measure requests
app.use((req, res, next) => {
  const end = httpRequestDuration.startTimer({ method: req.method, path: req.path });
  res.on('finish', () => {
    httpRequestsTotal.inc({ method: req.method, path: req.path, status: res.statusCode });
    end();
  });
  next();
});

// App endpoints
app.get('/', (req, res) => {
  res.json({ message: 'Hello! This app has metrics at /metrics' });
});

app.get('/api/users', (req, res) => {
  // Simulate variable latency
  const delay = Math.random() * 200;
  setTimeout(() => {
    res.json({ users: ['alice', 'bob', 'charlie'] });
  }, delay);
});

app.get('/api/slow', (req, res) => {
  // Slow endpoint to test alerts
  setTimeout(() => {
    res.json({ message: 'Slow response' });
  }, 2000);
});

// Metrics endpoint for Prometheus
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

app.listen(PORT, () => {
  console.log(`App running at http://localhost:${PORT}`);
  console.log(`Metrics at http://localhost:${PORT}/metrics`);
});
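
The latency histogram above stores cumulative bucket counts rather than raw durations. The sketch below is plain JavaScript with no prom-client dependency and illustrates how observations map to those buckets, using the bucket bounds from server.js. The function name observeAll and the sample durations are hypothetical, chosen only for illustration.

```javascript
// Sketch: how a Prometheus-style histogram turns observations into
// cumulative bucket counts. Bucket bounds match server.js.
const BUCKETS = [0.01, 0.05, 0.1, 0.5, 1, 2, 5];

// Returns one cumulative count per upper bound plus '+Inf',
// along with the running sum and count of all observations.
function observeAll(durationsSeconds, buckets = BUCKETS) {
  const counts = new Map(buckets.map((le) => [String(le), 0]));
  counts.set('+Inf', 0);
  let sum = 0;

  for (const d of durationsSeconds) {
    // An observation is counted in every bucket whose bound is >= d.
    for (const le of buckets) {
      if (d <= le) counts.set(String(le), counts.get(String(le)) + 1);
    }
    counts.set('+Inf', counts.get('+Inf') + 1);
    sum += d;
  }
  return { counts, sum, count: durationsSeconds.length };
}

// Hypothetical durations: two fast requests, one /api/users, one /api/slow.
const result = observeAll([0.004, 0.03, 0.15, 2.003]);
for (const [le, n] of result.counts) {
  console.log(`http_request_duration_seconds_bucket{le="${le}"} ${n}`);
}
```

Because the counts are cumulative, the le="5" and le="+Inf" buckets both include every faster request as well as the slow one.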

Create app/Dockerfile:

FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]

Step 3: Configure Prometheus

Create prometheus/prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: []

rule_files: []

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-app'
    static_configs:
      - targets: ['app:3000']
    metrics_path: /metrics

Step 4: Configure Grafana

Create grafana/provisioning/datasources/datasources.yml:

apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    uid: PBFA97CFB590B2093  # must match the datasource uid used in app-dashboard.json
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false

Create grafana/provisioning/dashboards/dashboards.yml:

apiVersion: 1

providers:
  - name: 'default'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    options:
      path: /etc/grafana/provisioning/dashboards

Create grafana/provisioning/dashboards/app-dashboard.json:

{
  "annotations": { "list": [] },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "id": null,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "datasource": { "type": "prometheus", "uid": "PBFA97CFB590B2093" },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "palette-classic" },
          "mappings": [],
          "thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }] }
        },
        "overrides": []
      },
      "gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 },
      "id": 1,
      "options": { "colorMode": "value", "graphMode": "area", "justifyMode": "auto", "orientation": "auto", "reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false }, "textMode": "auto" },
      "pluginVersion": "10.0.0",
      "targets": [{ "expr": "sum(rate(http_requests_total[5m]))", "refId": "A" }],
      "title": "Requests per second",
      "type": "stat"
    },
    {
      "datasource": { "type": "prometheus", "uid": "PBFA97CFB590B2093" },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "palette-classic" },
          "mappings": [],
          "thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }] }
        },
        "overrides": []
      },
      "gridPos": { "h": 8, "w": 12, "x": 12, "y": 0 },
      "id": 2,
      "options": { "colorMode": "value", "graphMode": "area", "justifyMode": "auto", "orientation": "auto", "reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false }, "textMode": "auto" },
      "pluginVersion": "10.0.0",
      "targets": [{ "expr": "active_users", "refId": "A" }],
      "title": "Active Users",
      "type": "stat"
    },
    {
      "datasource": { "type": "prometheus", "uid": "PBFA97CFB590B2093" },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "palette-classic" },
          "custom": { "axisCenteredZero": false, "axisColorMode": "text", "axisLabel": "", "axisPlacement": "auto", "barAlignment": 0, "drawStyle": "line", "fillOpacity": 10, "gradientMode": "none", "hideFrom": { "legend": false, "tooltip": false, "viz": false }, "lineInterpolation": "linear", "lineWidth": 1, "pointSize": 5, "scaleDistribution": { "type": "linear" }, "showPoints": "auto", "spanNulls": false, "stacking": { "group": "A", "mode": "none" }, "thresholdsStyle": { "mode": "off" } },
          "mappings": [],
          "thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }] },
          "unit": "s"
        },
        "overrides": []
      },
      "gridPos": { "h": 8, "w": 24, "x": 0, "y": 8 },
      "id": 3,
      "options": { "legend": { "calcs": [], "displayMode": "list", "placement": "bottom", "showLegend": true }, "tooltip": { "mode": "single", "sort": "none" } },
      "targets": [{ "expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))", "legendFormat": "p95", "refId": "A" }, { "expr": "histogram_quantile(0.50, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))", "legendFormat": "p50", "refId": "B" }],
      "title": "HTTP Latency (p50 and p95)",
      "type": "timeseries"
    },
    {
      "datasource": { "type": "prometheus", "uid": "PBFA97CFB590B2093" },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "palette-classic" },
          "custom": { "axisCenteredZero": false, "axisColorMode": "text", "axisLabel": "", "axisPlacement": "auto", "barAlignment": 0, "drawStyle": "line", "fillOpacity": 10, "gradientMode": "none", "hideFrom": { "legend": false, "tooltip": false, "viz": false }, "lineInterpolation": "linear", "lineWidth": 1, "pointSize": 5, "scaleDistribution": { "type": "linear" }, "showPoints": "auto", "spanNulls": false, "stacking": { "group": "A", "mode": "none" }, "thresholdsStyle": { "mode": "off" } },
          "mappings": [],
          "thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }] },
          "unit": "bytes"
        },
        "overrides": []
      },
      "gridPos": { "h": 8, "w": 24, "x": 0, "y": 16 },
      "id": 4,
      "options": { "legend": { "calcs": [], "displayMode": "list", "placement": "bottom", "showLegend": true }, "tooltip": { "mode": "single", "sort": "none" } },
      "targets": [{ "expr": "process_resident_memory_bytes", "legendFormat": "Memory RSS", "refId": "A" }],
      "title": "Memory Usage",
      "type": "timeseries"
    }
  ],
  "refresh": "5s",
  "schemaVersion": 38,
  "style": "dark",
  "tags": ["nodejs", "prometheus"],
  "templating": { "list": [] },
  "time": { "from": "now-15m", "to": "now" },
  "timepicker": {},
  "timezone": "",
  "title": "Node.js App Metrics",
  "uid": "nodejs-app-metrics",
  "version": 1,
  "weekStart": ""
}

Step 5: Create docker-compose.yml

In the project root, create docker-compose.yml:

version: '3.8'

services:
  app:
    build: ./app
    ports:
      - "3000:3000"
    networks:
      - monitoring

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.enable-lifecycle'
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin123
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    networks:
      - monitoring
    depends_on:
      - prometheus

networks:
  monitoring:
    driver: bridge

volumes:
  prometheus_data:
  grafana_data:

Step 6: Start everything

# Build and start
docker compose up -d --build

# Verify they're running
docker compose ps

Wait 30 seconds and verify:

Service      URL                             Credentials
App          http://localhost:3000           -
Metrics      http://localhost:3000/metrics   -
Prometheus   http://localhost:9090           -
Grafana      http://localhost:3001           admin / admin123

Step 7: Explore Prometheus

  1. Open http://localhost:9090
  2. In "Expression", type http_requests_total and press Enter
  3. Try these queries:
    • rate(http_requests_total[5m]) - Requests per second for each method, path, and status
    • histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) - p95 latency for each method and path
    • active_users - Active users
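
To build intuition for what rate() returns, here is a simplified sketch of the calculation over a handful of counter samples. It handles counter resets but omits the extrapolation to the window boundaries that real PromQL performs, so treat it as an approximation. The function name simpleRate and the scrape values are hypothetical.

```javascript
// Sketch: a simplified version of what rate(counter[window]) computes.
// Real PromQL also extrapolates to the window boundaries; this omits that.
function simpleRate(points) {
  if (points.length < 2) return null; // rate needs at least two samples

  let increase = 0;
  for (let i = 1; i < points.length; i++) {
    const delta = points[i].value - points[i - 1].value;
    // A drop means the counter reset (for example, the app restarted),
    // so the whole new value counts as increase.
    increase += delta >= 0 ? delta : points[i].value;
  }
  const seconds = points[points.length - 1].t - points[0].t;
  return seconds > 0 ? increase / seconds : null;
}

// Hypothetical scrapes of http_requests_total, 15 seconds apart.
const scrapes = [
  { t: 0, value: 100 },
  { t: 15, value: 160 },
  { t: 30, value: 220 },
  { t: 45, value: 10 }, // app restarted; the counter started again from 0
  { t: 60, value: 70 },
];

const perSecond = simpleRate(scrapes);
console.log(perSecond.toFixed(2), 'requests per second');
```

The total increase here is 60 + 60 + 10 + 60 = 190 requests over 60 seconds, so the result is a little over 3 requests per second despite the restart.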

Step 8: Configure alerts in Grafana

  1. Open Grafana (http://localhost:3001)
  2. Login: admin / admin123
  3. Go to Alerting > Alert rules > New alert rule
  4. Configure:
    • Name: "High latency"
    • Query: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
    • Condition: IS ABOVE 1 (the threshold, in seconds, goes here rather than in the query)
  5. Save

To test the alert:

# Generate slow requests
for i in {1..20}; do curl http://localhost:3000/api/slow; done
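
To see why a burst of /api/slow requests pushes the p95 above the 1-second threshold, the sketch below reproduces the linear interpolation that histogram_quantile() applies to cumulative buckets. It is a simplified illustration and not Prometheus's actual implementation. The bucket counts are hypothetical, and it assumes each slow request takes slightly more than 2 seconds and therefore lands in the le="5" bucket.

```javascript
// Sketch of the linear interpolation histogram_quantile() applies to
// classic (cumulative) buckets. Simplified: assumes 0 < q <= 1 and
// sorted buckets that end with +Inf.
function histogramQuantile(q, buckets) {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;

  // First bucket whose cumulative count reaches the rank.
  const i = buckets.findIndex((b) => b.count >= rank);
  if (buckets[i].le === Infinity) {
    // The quantile falls above the highest finite bound.
    return buckets[buckets.length - 2].le;
  }

  const lower = i === 0 ? 0 : buckets[i - 1].le;
  const below = i === 0 ? 0 : buckets[i - 1].count;
  const inBucket = buckets[i].count - below;
  return lower + (buckets[i].le - lower) * ((rank - below) / inBucket);
}

// Hypothetical cumulative counts: 80 fast requests plus 20 slow ones.
const buckets = [
  { le: 0.01, count: 0 },
  { le: 0.05, count: 50 },
  { le: 0.1, count: 70 },
  { le: 0.5, count: 80 },
  { le: 1, count: 80 },
  { le: 2, count: 80 },
  { le: 5, count: 100 }, // the 20 slow requests land here
  { le: Infinity, count: 100 },
];

const p50 = histogramQuantile(0.5, buckets);
const p95 = histogramQuantile(0.95, buckets);
console.log({ p50, p95 });
```

With these counts the p95 estimate comes out at about 4.25 seconds even though the slow requests took roughly 2 seconds, because the estimate is interpolated across the whole bucket between 2 and 5. Adding a bucket bound just above 2 would tighten it.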

Step 9: Generate test traffic

# Script to generate traffic
while true; do
  curl -s http://localhost:3000/ > /dev/null
  curl -s http://localhost:3000/api/users > /dev/null
  sleep 0.5
done

Let it run for 2-3 minutes, then press Ctrl+C to stop it and check the dashboard in Grafana.

Final structure

monitoring-stack/
  app/
    package.json
    server.js
    Dockerfile
  prometheus/
    prometheus.yml
  grafana/
    provisioning/
      datasources/
        datasources.yml
      dashboards/
        dashboards.yml
        app-dashboard.json
  docker-compose.yml

Useful commands

# View logs
docker compose logs -f app
docker compose logs -f prometheus

# Reload the Prometheus config without restarting (enabled by --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload

# Stop everything
docker compose down

# Stop and delete data
docker compose down -v

Troubleshooting

Problem                      Solution
Prometheus is not scraping   Verify that app:3000 is reachable on the monitoring network
Grafana shows no data        Wait 30 seconds, then verify the data source in Settings
Dashboard is empty           The data source UID in the panels must match the provisioned data source
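
For the empty-dashboard case, a quick way to spot a mismatch is to list every panel whose data source UID differs from the one you expect. The sketch below does this on a trimmed, hypothetical dashboard object. In practice you would load app-dashboard.json with JSON.parse instead. The function name findUidMismatches and the value some-other-uid are made up for illustration.

```javascript
// Sketch: list panels whose data source UID differs from the expected one.
function findUidMismatches(dashboard, expectedUid) {
  return (dashboard.panels ?? [])
    .filter((panel) => panel.datasource?.uid !== expectedUid)
    .map((panel) => ({
      id: panel.id,
      title: panel.title,
      uid: panel.datasource?.uid,
    }));
}

// Trimmed, hypothetical dashboard with one deliberately wrong UID.
const dashboard = {
  panels: [
    {
      id: 1,
      title: 'Requests per second',
      datasource: { type: 'prometheus', uid: 'PBFA97CFB590B2093' },
    },
    {
      id: 2,
      title: 'Active Users',
      datasource: { type: 'prometheus', uid: 'some-other-uid' },
    },
  ],
};

const mismatches = findUidMismatches(dashboard, 'PBFA97CFB590B2093');
console.log(mismatches);
```

An empty result means every panel points at the expected data source. Any entries it prints are the panels to fix.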

Next step

-> Fullstack Testing