Integrating Prometheus, Grafana, Elasticsearch, Kibana, and Logstash into MERN Applications on AWS: A Comprehensive Guide

Elevating Monitoring and Analytics in Your MERN Stack with Leading Tools

Introduction

In the rapidly evolving digital world, the efficiency, reliability, and scalability of web applications are paramount. For developers leveraging the MERN stack (MongoDB, Express.js, React, Node.js) on AWS, ensuring optimal performance and user experience requires advanced monitoring and analytics solutions. Enter Prometheus, Grafana, Elasticsearch, Kibana, and Logstash—five powerhouse tools that, when integrated into a MERN application, provide unparalleled insights into your app's operations. This blog post delves deep into why and how to harness these tools effectively, paving the way for enhanced application performance and informed decision-making.

The integration of these tools with a MERN application on AWS embodies a strategic approach to application monitoring and analytics. Prometheus offers robust monitoring and alerting, Grafana enables dynamic visualization, and the Elasticsearch, Kibana, and Logstash trio (the ELK stack)—or AWS's managed, Elasticsearch-compatible OpenSearch service—provides powerful log ingestion, search, and visualization. Together, they form a comprehensive observability ecosystem that can significantly elevate the performance of MERN applications.

Architecture Overview on AWS

Before jumping into configuration, decide on your runtime and operational model:

  • Compute

    • EC2: Full control, you manage everything. Good for small/medium setups or custom needs.
    • ECS/Fargate: Container-native, lower ops burden. Integrates well with FireLens/Fluent Bit for logs.
    • EKS (Kubernetes): Best for larger teams and complex deployments; Helm charts accelerate observability setup.
  • Storage and managed services

    • OpenSearch Service (managed Elasticsearch-compatible) for logs and analytics.
    • Amazon Managed Grafana (optional) to offload Grafana hosting and auth.
    • S3 for snapshots/archives of indices and for ILM cold storage patterns.
  • Networking and security

    • VPC with private subnets for Prometheus, Logstash, and OpenSearch data nodes.
    • ALB/NLB or API Gateway for controlled ingress to Grafana/Kibana.
    • Security Groups and NACLs limiting access to required ports only.
    • IAM roles for service-to-service auth where applicable (e.g., OpenSearch SigV4).

High-level data flows:

  • Metrics: Node/Express + exporters → Prometheus → Alerts via Alertmanager → Grafana dashboards
  • Logs: App JSON logs → Filebeat/Fluent Bit → Logstash (parse/enrich) → OpenSearch/Elasticsearch → Kibana dashboards

Step 1 — Instrument the MERN App for Metrics

Focus on the Node/Express backend first; front-end metrics can be forwarded via a small API if desired.

  1. Install dependencies
npm i prom-client pino pino-http
  2. Express instrumentation (metrics + JSON logs)
// src/server.ts
import express from "express";
import client from "prom-client";
import pino from "pino";
import pinoHttp from "pino-http";

// Logging
const logger = pino({ level: process.env.LOG_LEVEL || "info" });

const app = express();
app.use(pinoHttp({ logger }));

// Prometheus default metrics and custom histograms
client.collectDefaultMetrics({ prefix: "node_app_" });

const httpRequestDuration = new client.Histogram({
  name: "http_request_duration_seconds",
  help: "HTTP request duration",
  labelNames: ["method", "route", "status_code"],
  buckets: [0.05, 0.1, 0.3, 0.5, 1, 2, 5]
});

app.use((req, res, next) => {
  // Start the timer with labels known up front; record route and status on finish.
  // Using the matched route pattern (req.route.path) instead of the raw URL keeps label cardinality bounded.
  const end = httpRequestDuration.startTimer({ method: req.method });
  res.on("finish", () => end({ route: req.route?.path ?? req.path, status_code: res.statusCode }));
  next();
});

// Health endpoint
app.get("/healthz", (_req, res) => res.json({ ok: true }));

// Metrics endpoint
app.get("/metrics", async (_req, res) => {
  res.set("Content-Type", client.register.contentType);
  res.end(await client.register.metrics());
});

const port = process.env.PORT || 3000;
app.listen(port, () => logger.info({ msg: "Server listening", port }));
  3. Optional: MongoDB metrics
  • Use Percona's mongodb_exporter or community exporters to expose MongoDB metrics (e.g., on :9216).
  • Prometheus will scrape it like any other target.
  4. Front-end (React) UX metrics
  • Use the web-vitals package in React, forward summary stats to your API, and expose them as Prometheus counters/histograms (sketch below).
  • Avoid pushing directly from browsers to Prometheus; aggregate server-side.
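
A minimal sketch of that pattern, assuming the web-vitals npm package and a hypothetical /api/vitals route added to the Express app above:

// React (browser) side: report Core Web Vitals to your own API
import { onCLS, onINP, onLCP } from "web-vitals";

function report(metric: { name: string; value: number }) {
  // keepalive lets the request finish even while the page is unloading
  fetch("/api/vitals", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ name: metric.name, value: metric.value }),
    keepalive: true
  });
}

onCLS(report);
onINP(report);
onLCP(report);

// Express (server) side, reusing `app` and `client` from the instrumentation above
const webVitals = new client.Histogram({
  name: "web_vitals_value",
  help: "Core Web Vitals reported by browsers",
  labelNames: ["name"],
  // LCP/INP arrive in milliseconds while CLS is unitless; per-metric buckets are better in practice
  buckets: [0.1, 0.25, 1, 100, 500, 1000, 2500, 4000]
});

app.post("/api/vitals", express.json(), (req, res) => {
  webVitals.observe({ name: req.body.name }, req.body.value);
  res.status(204).end();
});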

Step 2 — Deploy Prometheus and Alertmanager

For EC2, run Prometheus on a dedicated instance or in a container. For EKS, use Helm charts; for ECS, use task definitions.

Minimal prometheus.yml

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  # Scrape the Node/Express app
  - job_name: "node-api"
    metrics_path: /metrics
    static_configs:
      - targets: ["node-api.internal:3000"]  # replace with service DNS or IP

  # Scrape MongoDB exporter
  - job_name: "mongodb"
    static_configs:
      - targets: ["mongodb-exporter:9216"]

  # EC2 service discovery example (filter by tag)
  - job_name: "node-api-ec2-sd"
    metrics_path: /metrics
    ec2_sd_configs:
      - region: us-east-1
        port: 3000
    relabel_configs:
      - source_labels: [__meta_ec2_tag_Role]
        regex: node-api
        action: keep

Basic alert rules

# rules/alerts.yml
groups:
  - name: node-api
    rules:
      - alert: HighErrorRate
        expr: sum(rate(http_request_duration_seconds_count{status_code=~"5.."}[5m]))
              /
              sum(rate(http_request_duration_seconds_count[5m])) > 0.05
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High 5xx rate on node-api"
          description: "5xx error rate >5% for 10m."

Alertmanager config (Slack example)

# alertmanager.yml
route:
  receiver: "slack"
  group_by: ["job"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 2h

receivers:
  - name: "slack"
    slack_configs:
      - api_url: https://hooks.slack.com/services/XXX/YYY/ZZZ
        channel: "#alerts"
        text: "{{ .CommonAnnotations.summary }}\n{{ .CommonAnnotations.description }}"

On EKS:

  • Use kube-prometheus-stack Helm chart for Prometheus/Alertmanager/Grafana in one go.
  • Add ServiceMonitors for your app's Service and exporters (example below).
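
A minimal ServiceMonitor sketch, assuming your app's Service is labeled app: node-api, exposes a named port called http, and the chart was installed with the release name kube-prometheus-stack:

# servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: node-api
  labels:
    release: kube-prometheus-stack   # must match the chart's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: node-api                  # labels on your app's Service
  endpoints:
    - port: http                     # named port on the Service
      path: /metrics
      interval: 15s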

Step 3 — Visualize with Grafana

Options:

  • Self-managed Grafana on EC2/EKS/ECS.
  • Amazon Managed Grafana for reduced ops and built-in SSO.

Add Prometheus data source

# grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true

Quick-start dashboards

  • Node Exporter Full (popular, ID 1860) for host metrics.
  • Prometheus Stats dashboards for TSDB health.
  • Import MongoDB exporter community dashboards.
  • Build custom panels: p95 latency, RPS, 5xx rate, queue depths, MongoDB ops/s, connections, cache hit ratio (example queries below).
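
For example, the latency and error panels can be driven by PromQL against the histogram from Step 1 (adjust label filters to your job names):

# p95 latency per route, in seconds
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, route))

# requests per second
sum(rate(http_request_duration_seconds_count[5m]))

# 5xx error ratio
sum(rate(http_request_duration_seconds_count{status_code=~"5.."}[5m]))
  / sum(rate(http_request_duration_seconds_count[5m]))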

Grafana alerting

  • Create alert rules on panels (e.g., p95 latency > 300ms for 10m).
  • Route via contact points (Slack, PagerDuty, Email/SNS).

SSO and security

  • Enable OAuth/OIDC (Cognito, Okta, Google) for user auth.
  • Restrict Grafana ingress via ALB + WAF, put behind private subnets if internal.

Step 4 — Centralized Logging with ELK (or OpenSearch)

While “Elasticsearch” is the common term, on AWS you'll typically use Amazon OpenSearch Service (Elasticsearch-compatible). For self-managed, you can still run Elasticsearch on EC2/EKS.

Recommended logging shape

  • App logs: JSON, one line per event.
  • Include fields: timestamp, level, message, service, env, version, requestId, userId (if applicable).
  • Avoid high-cardinality fields (e.g., raw user input) as indexed fields; keep such values inside the message body if you must log them.

Node JSON logging using pino

import pino from "pino";
const logger = pino({
  level: process.env.LOG_LEVEL || "info",
  base: { service: "node-api", env: process.env.NODE_ENV || "dev" },
  timestamp: pino.stdTimeFunctions.isoTime
});
logger.info({ msg: "Startup", version: process.env.APP_VERSION });
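
To get the requestId correlation recommended above, you can extend the pino-http setup from Step 1 so each request either reuses an upstream X-Request-Id or mints one; a sketch using pino-http's genReqId and customProps options (available in recent versions), with app and logger as defined in Step 1:

import { randomUUID } from "crypto";
import pinoHttp from "pino-http";

app.use(
  pinoHttp({
    logger,
    // Reuse an upstream X-Request-Id when present, otherwise mint one, and echo it back to the client.
    genReqId: (req, res) => {
      const id = (req.headers["x-request-id"] as string | undefined) ?? randomUUID();
      res.setHeader("x-request-id", id);
      return id;
    },
    // Surface the id as a top-level field so Logstash/Kibana can filter on requestId directly.
    customProps: (req) => ({ requestId: req.id })
  })
);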

Log shippers

  • EC2/VM: Filebeat or Fluent Bit tails files and ships to Logstash.
  • ECS: Use FireLens (Fluent Bit) to route container logs.
  • EKS: Use Fluent Bit DaemonSet to capture container stdout/stderr.

Filebeat example (EC2)

# filebeat.yml
filebeat.inputs:
  - type: filestream
    id: node-api-logs
    paths: ["/var/log/node-api/*.log"]
    fields:
      service: node-api
      env: prod
    fields_under_root: true

output.logstash:
  hosts: ["logstash.internal:5044"]

Fluent Bit example (EKS)

# values override for fluent-bit helm chart (snippet)
[INPUT]
  Name              tail
  Path              /var/log/containers/*.log
  Parser            docker
  Tag               kube.*
[FILTER]
  Name              kubernetes
  Match             kube.*
  Merge_Log         On
  Keep_Log          Off
[OUTPUT]
  # Ship to a Logstash http input (e.g. http { port => 8080 codec => json_lines });
  # the beats input on 5044 expects the Beats protocol, not Fluent Bit's forward protocol.
  Name              http
  Match             *
  Host              logstash.internal
  Port              8080
  Format            json_lines

Logstash pipeline

# pipelines/logs.conf
input {
  beats { port => 5044 }
  # or tcp/udp/http if using Fluent Bit forward/http
}

filter {
  json { source => "message" target => "json" skip_on_invalid_json => true }
  if [json][level] { mutate { add_field => { "level" => "%{[json][level]}" } } }
  date {
    match => ["[json][timestamp]", "ISO8601"]
    target => "@timestamp"
    remove_field => ["[json][timestamp]"]
  }
  mutate {
    rename => { "json" => "event" }
    add_field => { "service" => "%{[event][service]}" }
  }
}

output {
  # Self-managed Elasticsearch
  # elasticsearch {
  #   hosts => ["https://elasticsearch.internal:9200"]
  #   index => "logs-%{[service]}-%{+YYYY.MM.dd}"
  #   ilm_enabled => true
  #   ilm_rollover_alias => "logs-node-api"   # note: the rollover alias does not support %{} substitution
  #   ilm_pattern => "000001"
  # }

  # Amazon OpenSearch Service (preferred on AWS), via the logstash-output-opensearch plugin.
  opensearch {
    hosts => ["https://search-my-domain.us-east-1.es.amazonaws.com"]
    index => "logs-%{[service]}-%{+YYYY.MM.dd}"
    ssl => true
    # For IAM (SigV4) auth, use the plugin's aws_iam auth_type; prefer instance
    # profiles (EC2) or IRSA (EKS) over static keys:
    # auth_type => {
    #   type => "aws_iam"
    #   region => "us-east-1"
    # }
    # For domains using fine-grained access control with internal users:
    # user => "${OPENSEARCH_USER}"
    # password => "${OPENSEARCH_PASSWORD}"
  }
}

Kibana (or OpenSearch Dashboards)

  • Create index patterns: logs-*
  • Build visualizations: error rate by service, latency buckets, top loggers, slow endpoints, MongoDB slow queries.
  • Add saved searches for 5xx responses and for correlation by requestId (examples below).
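
With the Logstash pipeline above, service and level are top-level fields and the rest of each app log line sits under event, so saved searches in KQL might look like:

service : "node-api" and level : "error"
event.requestId : "<request-id>"

The first surfaces error-level logs for one service; the second pulls every line for a single request, where <request-id> is whatever value was issued in the x-request-id header.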

Index Lifecycle Management (ILM)

  • Hot: index and search for 3-7 days.
  • Warm: less frequent querying.
  • Cold: cheaper storage (snapshot to S3).
  • Delete: enforce retention (e.g., 14-30 days) to control cost.
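
On self-managed Elasticsearch these phases map to an ILM policy along these lines (thresholds are illustrative; Amazon OpenSearch Service uses ISM policies with a different but analogous JSON shape, and S3 snapshots are configured separately):

PUT _ilm/policy/logs-retention
{
  "policy": {
    "phases": {
      "hot":    { "actions": { "rollover": { "max_age": "1d", "max_primary_shard_size": "50gb" } } },
      "warm":   { "min_age": "7d",  "actions": { "set_priority": { "priority": 50 } } },
      "delete": { "min_age": "30d", "actions": { "delete": {} } }
    }
  }
}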

Step 5 — Secure, Scale, and Automate

Security

  • TLS everywhere: Ingress (ALB), Grafana, Prometheus web, Logstash beats input, OpenSearch.
  • AuthN/AuthZ: SSO for Grafana/Kibana; Grafana folder/datasource permissions.
  • IAM: Use roles (EC2 instance profiles or IRSA on EKS) for OpenSearch access; avoid static keys.
  • Network: Private subnets for data plane; restrict Security Groups; consider AWS WAF on public endpoints.

Scalability and performance

  • Metrics cardinality: Avoid label explosion (e.g., userId as a label). Keep label sets small and bounded.
  • Scrape intervals: 15s is a good default; lower only when necessary.
  • Prometheus storage: Use larger volumes or remote_write (e.g., Cortex/Mimir/AMP) for long retention (see the snippet after this list).
  • Log volumes: Use ILM and sampling; avoid logging large payloads or PII.
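
A minimal remote_write sketch for Amazon Managed Service for Prometheus (the workspace URL is a placeholder; Prometheus's built-in sigv4 support signs requests with the instance profile or IRSA role):

# prometheus.yml (addition)
remote_write:
  - url: https://aps-workspaces.us-east-1.amazonaws.com/workspaces/<workspace-id>/api/v1/remote_write
    sigv4:
      region: us-east-1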

Backups and DR

  • OpenSearch snapshots to S3 on schedule.
  • Versioned IaC (Terraform/CDK/Helm) for reproducible environments.

Automation (IaC)

  • Terraform modules for VPC, ECS/EKS, OpenSearch domains, ALB, SGs.
  • Helm charts:
    • prometheus-community/kube-prometheus-stack
    • grafana/loki-stack (if you ever opt for Loki instead of ELK)
    • fluent/fluent-bit
    • opensearch-project/opensearch, opensearch-dashboards (self-managed)
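
For example, installing the metrics stack on EKS with Helm (release name and namespace are illustrative):

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace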

Optional — Local Dev Sandbox (docker-compose)

Great for learning and validating pipelines before AWS deployment.

# docker-compose.yml
version: "3.8"
services:
  node-api:
    build: .
    environment:
      - LOG_LEVEL=info
      - NODE_ENV=dev
    # Share a log directory with the filebeat service below; have the app (or its
    # entrypoint) write JSON logs to /var/log/node-api/app.log for Filebeat to tail.
    volumes:
      - ./logs:/var/log/node-api
    ports: ["3000:3000"]
  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports: ["9090:9090"]
  grafana:
    image: grafana/grafana
    ports: ["3001:3000"]
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    depends_on: [prometheus]
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.13.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
    ports: ["9200:9200"]
  logstash:
    image: docker.elastic.co/logstash/logstash:8.13.0
    volumes:
      - ./pipelines:/usr/share/logstash/pipeline
    ports: ["5044:5044"]
  kibana:
    image: docker.elastic.co/kibana/kibana:8.13.0
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports: ["5601:5601"]
  filebeat:
    image: docker.elastic.co/beats/filebeat:8.13.0
    user: root
    volumes:
      - ./filebeat.yml:/usr/share/filebeat/filebeat.yml
      - ./logs:/var/log/node-api
    depends_on: [logstash]

Cost and Capacity Planning

  • Metrics

    • Primary driver: time series count = unique combinations of metric name + label set.
    • Start with a 15s scrape interval and 7-14 days of retention; plan ~2-8 bytes per sample plus overhead (worked example after this list).
  • Logs

    • Drivers: daily ingest GBs, index count, replicas, query patterns.
    • Start with 1-2 data nodes (OpenSearch) for dev/staging; 3+ for HA production.
    • ILM to keep hot data small; snapshot older indices to S3.
  • Grafana/OpenSearch Dashboards

    • Minimize public exposure; use managed services for predictable ops cost.
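
As a rough worked example for metrics sizing: 50,000 active series scraped every 15s is about 3,300 samples per second, or roughly 4 billion samples over 14 days; at the 2-8 bytes per sample assumed above, that is on the order of 8-32 GB of Prometheus TSDB storage before index and WAL overhead.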

Common Pitfalls (and How to Avoid Them)

  • Unbounded labels in Prometheus leading to OOM: never use user/session IDs as labels.
  • Chatty logs: implement log levels; avoid logging large payloads or secrets.
  • Missing request correlation: add requestId to logs and propagate across services.
  • Alert fatigue: start with SLO-aligned alerts (e.g., latency, error rate) and tune thresholds.
  • One “monster” cluster: split dev/staging/prod; enforce per-env isolation and quotas.

Quick Start Checklists

Metrics

  • Add prom-client to Node/Express and expose /metrics.
  • Deploy Prometheus + Alertmanager; add scrape configs and alert rules.
  • Connect Grafana to Prometheus; import baseline dashboards.

Logs

  • Standardize JSON logging.
  • Ship logs with Filebeat/Fluent Bit to Logstash.
  • Parse/enrich in Logstash; store in OpenSearch/Elasticsearch.
  • Build Kibana dashboards and saved searches.
  • Apply ILM and snapshots.

Security/Operations

  • Private networking, TLS, SSO, IAM-based access.
  • IaC for all infrastructure.
  • SLOs defined; alerts mapped to user-impact.

Conclusion

Integrating Prometheus, Grafana, Elasticsearch (or OpenSearch), Kibana, and Logstash with your MERN application on AWS is a powerful strategy for enhancing monitoring, analytics, and performance management. By instrumenting your Node/Express services, centralizing logs, and building curated dashboards and alerts, you gain deep visibility into system health and user experience. Start small with essential metrics and logs, secure your stack, enforce retention and ILM to control costs, and iterate as your application grows.

With a thoughtful architecture and the practices outlined above, you'll move from reactive firefighting to proactive, data-driven operations—elevating reliability, performance, and ultimately your users' satisfaction.