Grafana notes for developers

Master Grafana with a curated set of 3 developer notes — core concepts, patterns, and interview prep. Maintained by the DevRecall team.

Grafana: Dashboards & Panels

Grafana is a widely used open-source observability platform for visualizing metrics, logs, and traces. It connects to dozens of data sources and organizes data into dashboards made of panels.

Core Concepts

  • Dashboard: collection of panels arranged in a grid — stored as a JSON model, so it can be version-controlled

  • Panel: a single visualization unit — time series, stat, table, heatmap, bar chart, etc.

  • Data source: connection to a backend (Prometheus, Loki, InfluxDB, PostgreSQL, CloudWatch, etc.)

  • Variable: template variable that makes dashboards dynamic — dropdown to filter by host, service, namespace

  • Annotation: mark a point in time on panels — e.g., "deployment happened here"

  • Time range: global dashboard time picker — "Last 1h", "Last 24h", or custom
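
Because a dashboard is stored as a JSON model, it can be generated, diffed, and committed like any other file. The sketch below builds a minimal model in Python that uses the concepts above: a panel, a variable, annotations, and a default time range. Only a small subset of Grafana's fields is shown. Grafana fills in the rest and may upgrade the schema on load. The data source uid `prometheus` and the metric `http_requests_total` are assumptions for illustration.

```python
import json

def build_dashboard(title: str) -> dict:
    """Return a minimal dashboard JSON model (a small subset of Grafana's fields)."""
    # Hypothetical Prometheus data source reference; use your data source's real uid.
    ds = {"type": "prometheus", "uid": "prometheus"}
    return {
        "title": title,
        # Default time picker range, shown as "Last 1h"
        "time": {"from": "now-1h", "to": "now"},
        # Dashboard variables live under templating.list
        "templating": {
            "list": [
                {
                    "name": "host",
                    "type": "query",
                    "datasource": ds,
                    "query": "label_values(instance)",
                }
            ]
        },
        # Annotation queries live under annotations.list (none defined here)
        "annotations": {"list": []},
        "panels": [
            {
                "id": 1,
                "type": "timeseries",
                "title": "Request rate",
                "datasource": ds,
                # gridPos uses a 24-column grid, so w=12 is half the dashboard width
                "gridPos": {"x": 0, "y": 0, "w": 12, "h": 8},
                "targets": [
                    {
                        "refId": "A",
                        "expr": 'rate(http_requests_total{instance="${host}"}[5m])',
                    }
                ],
            }
        ],
    }

if __name__ == "__main__":
    print(json.dumps(build_dashboard("API overview"), indent=2))
```

Saving that output as `dashboard.json` in a repository gives you a reviewable history of dashboard changes. You can then import or provision the file.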

Panel Types

  • Time series: line/bar chart over time — default for metrics (CPU, memory, request rate)

  • Stat: single large number, optionally with a sparkline — shows one value reduced from the series (last, mean, max, etc.), with optional threshold coloring

  • Gauge: circular progress indicator — fill % relative to min/max

  • Bar chart: compare values across categories at a point in time

  • Table: raw data in tabular format — supports pagination, sorting, column mapping

  • Heatmap: visualize how values are distributed over time (e.g., request latency histogram buckets)

  • Logs: log lines from Loki — supports regex filtering, log context

  • Node Graph: visualize service dependency graphs (traces/APM)

  • Geomap: world map with data points — for geographic metrics
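
A Stat panel first reduces each series to a single number using its calculation setting (for example last or mean). It then colors that number with ascending threshold steps. The active color comes from the highest step whose value is at or below the number, and a base step with no value applies below all the others. The Python model below is simplified. The green/orange/red steps and the small set of reducers are illustrative choices, not Grafana's code.

```python
from statistics import mean

# Same shape as a panel's "thresholds.steps" JSON, sorted ascending by value.
# The base step has value None (null in JSON) and applies below every other step.
STEPS = [
    {"color": "green", "value": None},
    {"color": "orange", "value": 70},
    {"color": "red", "value": 90},
]

REDUCERS = {
    "last": lambda xs: xs[-1],
    "mean": mean,
    "max": max,
    "min": min,
}

def reduce_series(values: list[float], calc: str = "last") -> float:
    """Collapse a series to the single number a Stat panel displays."""
    return REDUCERS[calc](values)

def threshold_color(value: float, steps: list[dict] = STEPS) -> str:
    """Return the color of the highest step whose value is <= `value`."""
    color = steps[0]["color"]  # base step
    for step in steps[1:]:
        if step["value"] is not None and value >= step["value"]:
            color = step["color"]
    return color
```

Consider the series `[42.0, 55.5, 91.2]`. With "last" it displays 91.2 in red. With "mean" it displays about 62.9 in green. This is why the calculation setting matters as much as the thresholds.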

Dashboard Variables

Variables make dashboards reusable across environments/services.

Types:
  Query     — values come from a data source query (e.g., all hostnames from Prometheus label)
  Custom    — comma-separated static values
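  Interval  — list of step options (e.g., 1m,10m,1h), optionally with "auto"; not the same as the built-in $__interval, which Grafana derives from the time range and panel width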
  Text box  — free text input
  Constant  — hidden value (e.g., data source name)
  Data source — select a data source dynamically
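
Grafana's built-in `$__interval` works roughly as follows. It divides the dashboard time range by the panel's maximum data points, which default to about the panel's width in pixels. It never goes below a minimum interval, such as the data source's scrape interval. The Python sketch below is a simplified approximation that assumes a 15s minimum. Grafana also rounds the result to a "nice" step (15s, 30s, 1m and so on), and that rounding is left out here.

```python
from datetime import timedelta

def auto_interval(time_range: timedelta, max_data_points: int,
                  min_interval: timedelta = timedelta(seconds=15)) -> timedelta:
    """Approximate $__interval: spread the time range across max data points,
    but never step finer than the minimum interval (assumed 15s scrape interval).
    Simplified: Grafana's rounding to a 'nice' step is omitted."""
    raw = time_range / max_data_points
    return max(raw, min_interval)
```

For example, "Last 24h" on a panel allowing 1440 points yields one-minute steps. "Last 1h" on a 1000-point panel would compute 3.6s, which is clamped to 15s because there is no finer data to show.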

Example — Query variable for host selection:
  Name: host
  Query type: Label values
  Label: instance
  Data source: Prometheus
  → Creates a dropdown: $host = "web-01" | "web-02" | "db-01"

Using in panels: rate(http_requests_total{instance="${host}"}[5m])
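
Before a query is sent, Grafana substitutes `$host` or `${host}` into the query text. For multi-value variables it joins the selected values. The `csv` and `pipe` format options (`${host:csv}`, `${host:pipe}`) control how. The Python model below is a rough sketch, not Grafana's implementation. It joins with `|` by default, whereas the real default format is data-source specific, and Grafana's Prometheus data source also regex-escapes the values. It supports only those two formats.

```python
import re

# Matches ${name:format}, ${name} and $name
VAR_PATTERN = re.compile(r"\$\{(\w+)(?::(\w+))?\}|\$(\w+)")

FORMATS = {
    "csv": ",".join,
    "pipe": "|".join,
}

def interpolate(query: str, variables: dict[str, list[str]]) -> str:
    """Replace variable references in `query` with their selected values."""
    def substitute(m: re.Match) -> str:
        name = m.group(1) or m.group(3)
        fmt = m.group(2) or "pipe"  # simplification: default multi-value join
        values = variables.get(name)
        if values is None:
            return m.group(0)  # unknown or built-in variable: leave it untouched
        return FORMATS[fmt](values)
    return VAR_PATTERN.sub(substitute, query)
```

When several hosts are selected, the joined value is a regex alternation, so the label matcher has to change from `=` to `=~`. An example is `rate(http_requests_total{instance=~"${host:pipe}"}[5m])`.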

Dashboard JSON & Import/Export

# Export dashboard JSON
# Dashboard → Share → Export → Save to file (dashboard.json)

# Import from JSON
# + → Import (Dashboards → New → Import in recent versions) → upload a JSON file, paste JSON, or enter a Grafana.com ID

# Grafana.com dashboard library (thousands of community dashboards)
# Popular IDs:
#   1860  — Node Exporter Full (Linux system metrics)
#   3662  — Prometheus 2.0 Overview
#  13659  — Loki & Promtail logs
#  15489  — PostgreSQL overview
#   6417  — Kubernetes cluster monitoring
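
The same export and import can be scripted against Grafana's HTTP API. `GET /api/dashboards/uid/<uid>` returns an object with `dashboard` and `meta` keys. `POST /api/dashboards/db` saves a dashboard model, and setting its `id` to null tells Grafana to create it rather than update by numeric id. The Python sketch below only builds the requests, so it runs without a server. The base URL and token are placeholders that assume a service account token. Files saved with "Export for sharing externally" contain `__inputs` placeholders, and this endpoint does not resolve them.

```python
import json
import urllib.request

def export_request(base_url: str, token: str, uid: str) -> urllib.request.Request:
    """GET a dashboard; the JSON response has "dashboard" and "meta" keys."""
    return urllib.request.Request(
        f"{base_url}/api/dashboards/uid/{uid}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )

def import_request(base_url: str, token: str, dashboard: dict,
                   overwrite: bool = False) -> urllib.request.Request:
    """POST a dashboard model, e.g. the "dashboard" key of an exported response."""
    # id=None (null) asks Grafana to create the dashboard instead of updating by numeric id
    model = dict(dashboard, id=None)
    body = json.dumps({"dashboard": model, "overwrite": overwrite}).encode()
    return urllib.request.Request(
        f"{base_url}/api/dashboards/db",
        data=body,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# Sending requires a running Grafana, e.g.:
# with urllib.request.urlopen(import_request("http://localhost:3000", TOKEN, model)) as resp:
#     print(json.load(resp))
```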

Grafana: Data Sources & PromQL

Connecting Data Sources

  • Connections → Data sources → Add data source (Configuration → Data Sources in Grafana 9 and earlier)

  • Prometheus: URL (e.g., http://prometheus:9090), usually no auth when reached over an internal network; enable exemplars for trace linking

  • Loki: URL (e.g., http://loki:3100) — log aggregation, pairs with Prometheus

  • InfluxDB: time-series database — queried with InfluxQL or Flux

  • PostgreSQL / MySQL: direct SQL queries on relational databases

  • CloudWatch: AWS metrics and logs — requires IAM role or access key

  • Elasticsearch / OpenSearch: logs, traces, full-text search

  • Jaeger / Tempo / Zipkin: distributed tracing backends

  • TestData: built-in fake data source — great for building dashboards without real data

PromQL Essentials

PromQL (Prometheus Query Language) is used in Grafana panels when the data source is Prometheus. Understanding it is essential for building useful dashboards.

# Instant vector — current value of a metric
http_requests_total

# Filter by label
http_requests_total{job="api", status="200"}

# Range vector — values over a time window
http_requests_total[5m]

# rate() — per-second rate from a counter (use with range vector)
rate(http_requests_total{status!="200"}[5m])

# irate() — instantaneous rate (last two samples) — more responsive but noisy
irate(http_requests_total[5m])

# increase() — total increase over window (rate * duration)
increase(http_requests_total[1h])

# sum() — aggregate across labels
sum(rate(http_requests_total[5m])) by (status)

# avg, min, max, count (aggregate a rate, not the raw counter)
avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance)

# Histogram quantiles (p50, p95, p99)
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))

# Arithmetic: CPU usage % per instance (100 minus idle %)
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100)

# Comparison — only return when condition is true
node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.1
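
The Python model below shows what `increase()`, `rate()`, `irate()` and `histogram_quantile()` compute from raw samples. It is a simplified sketch. Counter resets are handled the way Prometheus handles them: a drop in value means the counter restarted from zero. However, `rate()` and `increase()` in Prometheus also extrapolate to the edges of the range window. That step is deliberately omitted, so real results differ slightly. The quantile function follows the linear interpolation Prometheus documents. It assumes a lower bound of 0 for the first bucket and returns the second-highest bucket bound when the quantile lands in `+Inf`.

```python
import bisect

Sample = tuple[float, float]  # (unix timestamp in seconds, counter value)

def increase(samples: list[Sample]) -> float:
    """Counter increase across the samples; a drop in value is treated as a reset."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts from 0, so the whole new value counts
        total += cur - prev if cur >= prev else cur
    return total

def rate(samples: list[Sample]) -> float:
    """Per-second average between the first and last sample (no extrapolation)."""
    return increase(samples) / (samples[-1][0] - samples[0][0])

def irate(samples: list[Sample]) -> float:
    """Per-second rate from the last two samples only."""
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    delta = v2 - v1 if v2 >= v1 else v2
    return delta / (t2 - t1)

def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """buckets: cumulative (le upper bound, count) pairs sorted by le, ending with +Inf."""
    rank = q * buckets[-1][1]
    counts = [c for _, c in buckets[:-1]]
    b = bisect.bisect_left(counts, rank)  # first finite bucket with count >= rank
    if b == len(buckets) - 1:             # quantile falls in the +Inf bucket
        return buckets[-2][0]
    upper, count = buckets[b]
    lower, below = buckets[b - 1] if b > 0 else (0.0, 0.0)
    # Linear interpolation inside the bucket that contains the rank
    return lower + (upper - lower) * (rank - below) / (count - below)
```

Take the samples `[(0, 100), (15, 130), (30, 10), (45, 40)]`, which contain one reset. Here `increase` gives 70, `rate` gives 70/45 ≈ 1.56/s, and `irate` gives 2.0/s from the last step alone. This shows why `irate()` reacts faster but is noisier.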

LogQL — Querying Loki

# Stream selector (required)
{app="nginx", env="production"}

# Filter by log content
{app="api"} |= "ERROR"
{app="api"} != "health"
{app="api"} |~ "status=5[0-9][0-9]"   # regex match

# Parse JSON logs
{app="api"} | json | level="error"

# Extract fields and aggregate
sum(rate({app="api"} | json | status="500" [5m])) by (endpoint)

# Log volume over time by level (for bar chart or time series); level is extracted from JSON logs
sum(count_over_time({app="api"} | json [1m])) by (level)
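
The Python model below approximates, in plain code, what the LogQL stages above do to a batch of log lines. It covers line filters, `| json` field extraction, and a count grouped by a field over a single window. It is a simplified sketch, not Loki's engine. Loki keeps lines that fail `| json` and tags them with an `__error__` label, but this sketch skips them. Nested JSON objects are not flattened either.

```python
import json
import re
from collections import Counter

def line_filter(lines: list[str], op: str, arg: str) -> list[str]:
    """Model of the LogQL line filters |=, !=, |~ and !~."""
    ops = {
        "|=": lambda line: arg in line,
        "!=": lambda line: arg not in line,
        "|~": lambda line: re.search(arg, line) is not None,
        "!~": lambda line: re.search(arg, line) is None,
    }
    return [line for line in lines if ops[op](line)]

def json_stage(lines: list[str]) -> list[dict[str, str]]:
    """Like `| json`: turn each line into a dict of extracted string fields.
    Simplification: unparsable lines are skipped instead of tagged with __error__."""
    records = []
    for line in lines:
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            records.append({k: str(v) for k, v in obj.items()})
    return records

def count_by(records: list[dict[str, str]], label: str) -> Counter:
    """Like sum by (label) (count_over_time(...)) for one window."""
    return Counter(r.get(label, "") for r in records)
```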

Grafana Query Inspector

  • Panel menu → Inspect → Query: see the exact query sent to the data source

  • Panel menu → Inspect → Data: see raw response data in table format

  • Panel menu → Inspect → Stats: query execution time, number of data points

  • Use Query Inspector to debug why a panel shows "No data" or unexpected values
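
When the inspector shows something unexpected, you can send the same query straight to the data source and compare results. For Prometheus range queries, that means the `/api/v1/query_range` endpoint with `query`, `start`, `end` and `step` parameters. The Python helper below builds that URL. It reuses the Prometheus address from the data source example. The one-hour window and `15s` step are assumptions, so match them to what the inspector's Query tab reports.

```python
import json
import time
import urllib.parse
import urllib.request

def query_range_url(base_url: str, query: str, start: float, end: float, step: str) -> str:
    """Build a Prometheus HTTP API range-query URL."""
    params = urllib.parse.urlencode({"query": query, "start": start, "end": end, "step": step})
    return f"{base_url}/api/v1/query_range?{params}"

def run(url: str) -> dict:
    """Fetch and decode the response; series are under data.result."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    now = time.time()
    print(query_range_url("http://prometheus:9090",
                          "sum(rate(http_requests_total[5m])) by (status)",
                          now - 3600, now, "15s"))
```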

Grafana: Alerts, Provisioning & Cloud

Grafana Alerting

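Grafana Alerting (originally called Unified Alerting; introduced in v8 and the default since v9) centralizes alert rules, contact points, and notification policies. Rules are evaluated on a fixed interval and fire when their condition holds for the configured pending period.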

  • Alert rule: condition on a query (e.g., error rate > 5% for 5 minutes)

  • Contact point: where to send alerts — Slack, PagerDuty, email, webhook, Opsgenie

  • Notification policy: routing tree — which alerts go to which contact point based on labels

  • Silences: mute notifications for matching alerts during a fixed time range (e.g., a maintenance window); the rules keep evaluating

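  • Alert states: Normal → Pending (condition met, but not yet for the full pending period) → Alerting (shown as Firing) → back to Normal, when a resolved notification is sent; NoData and Error cover missing data and failed queries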

# Alert rule (Alerting → Alert rules → New alert rule)
# Or define via Terraform / provisioning YAML

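# Example conditions:
# A (query):      sum(rate(http_requests_total{status=~"5.."}[5m]))
# B (query):      sum(rate(http_requests_total[5m]))
# C (math):       $A / $B            (error ratio)
# D (threshold):  C is above 0.05    (error rate > 5%)
# Pending period: 5m (D must stay true for 5 minutes before the alert fires)
# With range queries, add a Reduce expression before the threshold.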

# Labels on the rule:
severity: critical
team: backend
env: production
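
The Normal → Pending → Alerting progression for a rule like this one can be modeled with a small state machine. The sketch below is simplified and hypothetical, written in Python rather than taken from Grafana's code. It assumes a 5% error-ratio threshold and a 300-second pending period. It omits the NoData and Error states. It also treats zero traffic as healthy, an assumption here, since real Grafana would see NaN or no data.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AlertInstance:
    """Simplified model of one alert instance; NoData and Error states are omitted."""
    threshold: float = 0.05        # error-ratio threshold (5%)
    pending_period_s: float = 300  # condition must hold for 5m before firing
    state: str = "Normal"
    breach_started_at: Optional[float] = None

    def evaluate(self, now: float, errors: float, total: float) -> str:
        """Run one evaluation; errors/total is the error ratio (5xx rate / total rate)."""
        # Assumption: no traffic counts as healthy
        ratio = errors / total if total else 0.0
        if ratio > self.threshold:
            if self.breach_started_at is None:
                self.breach_started_at = now
            held_for = now - self.breach_started_at
            self.state = "Alerting" if held_for >= self.pending_period_s else "Pending"
        else:
            # Leaving Alerting here is when Grafana would send a "resolved" notification
            self.breach_started_at = None
            self.state = "Normal"
        return self.state
```

With evaluations every 60s, a 10% error ratio starting at t=60 stays Pending until t=360, when it has held for the full 5 minutes. If the ratio drops during Pending, the instance returns to Normal without ever notifying.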

Provisioning (Infrastructure as Code)

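Grafana can provision data sources, dashboards, and alert rules from YAML files in its provisioning directory (/etc/grafana/provisioning in the official Docker image). Data sources are applied at startup. The file dashboard provider also rescans its folder periodically (updateIntervalSeconds, 10 seconds by default), so edited JSON files are picked up without a restart.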

# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    isDefault: true
    jsonData:
      timeInterval: 15s

# /etc/grafana/provisioning/dashboards/default.yaml
apiVersion: 1
providers:
  - name: Default
    type: file
    options:
      path: /var/lib/grafana/dashboards
      # Grafana reads all .json files from this directory
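
The file provider loads every `.json` file under `path`, including subfolders. A broken or duplicated file therefore tends to surface only in Grafana's logs. The pre-deploy check below is a hypothetical Python helper, not a Grafana tool. It flags invalid JSON, dashboards without a title, and uids used by more than one file, since Grafana expects each dashboard uid to be unique.

```python
import json
from collections import defaultdict
from pathlib import Path

def find_problems(files: dict[str, str]) -> list[str]:
    """files maps a file path to its raw text; returns human-readable problems."""
    problems: list[str] = []
    by_uid: dict[str, list[str]] = defaultdict(list)
    for path, text in files.items():
        try:
            model = json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append(f"{path}: invalid JSON ({exc})")
            continue
        if not isinstance(model, dict) or not model.get("title"):
            problems.append(f"{path}: missing dashboard title")
            continue
        if model.get("uid"):
            by_uid[model["uid"]].append(path)
    for uid, paths in by_uid.items():
        if len(paths) > 1:
            problems.append(f"duplicate uid {uid!r}: {', '.join(paths)}")
    return problems

def check_dashboards(root: str) -> list[str]:
    """Read every .json file under root (recursively) and check it."""
    files = {str(p): p.read_text(encoding="utf-8") for p in sorted(Path(root).rglob("*.json"))}
    return find_problems(files)
```

In CI, you could run `check_dashboards("./dashboards")` and fail the job when the returned list is not empty.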

Docker Compose Setup

# docker-compose.yml — Grafana + Prometheus + Node Exporter
services:
  grafana:
    image: grafana/grafana:latest
    ports: ["3000:3000"]
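    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin   # fine for a local demo; change it anywhere shared
    volumes:
      - grafana-data:/var/lib/grafana
      - ./provisioning:/etc/grafana/provisioning
      - ./dashboards:/var/lib/grafana/dashboards   # JSON files for the file dashboard provider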

  prometheus:
    image: prom/prometheus:latest
    ports: ["9090:9090"]
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  node-exporter:
    image: prom/node-exporter:latest
    ports: ["9100:9100"]

volumes:
  grafana-data:
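
Grafana serves a health endpoint at `/api/health`, which returns JSON with a `database` field that reads `"ok"` when healthy. The Python sketch below polls it after `docker compose up -d`, for example before a CI step imports dashboards. The 60-second timeout and 2-second interval are arbitrary choices. The `fetch` parameter is injectable so the loop can be tested without a running container.

```python
import json
import time
import urllib.request
from typing import Callable

def http_get_json(url: str) -> dict:
    """Default fetcher: GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)

def wait_for_grafana(base_url: str = "http://localhost:3000",
                     timeout_s: float = 60.0, interval_s: float = 2.0,
                     fetch: Callable[[str], dict] = http_get_json) -> dict:
    """Poll /api/health until the database reports "ok"; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            health = fetch(f"{base_url}/api/health")
            if health.get("database") == "ok":
                return health
        except (OSError, ValueError):
            pass  # container still starting: connection refused or partial response
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Grafana not healthy at {base_url} after {timeout_s}s")
        time.sleep(interval_s)
```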

Grafana Cloud

  • Hosted Grafana: free tier — 10k series Prometheus, 50GB Loki logs, 50GB Tempo traces, 14-day retention

  • Grafana Agent: lightweight collector that scrapes metrics and ships them to Grafana Cloud (deprecated in favor of Grafana Alloy)

  • Prometheus remote_write: push metrics from self-hosted Prometheus to Grafana Cloud

  • Grafana OnCall: on-call scheduling and escalation policies (included in Cloud)

  • k6: open-source load testing tool from Grafana Labs — results can be streamed to Prometheus or InfluxDB and charted in Grafana, or viewed in Grafana Cloud k6

  • Grafana Mimir: horizontally scalable Prometheus-compatible backend (open-source)

  • Grafana Alloy: successor to Grafana Agent — Grafana's distribution of the OpenTelemetry Collector, with native OTLP support
