Grafana: Dashboards & Panels

Grafana is a widely used open-source observability platform for visualizing metrics, logs, and traces. It connects to dozens of data sources and organizes their data into dashboards made of panels.

Core Concepts

Dashboard: collection of panels arranged in a grid; the whole dashboard is stored as a single JSON document, so it can be version-controlled
Panel: a single visualization unit — time series, stat, table, heatmap, bar chart, etc.
Data source: connection to a backend (Prometheus, Loki, InfluxDB, PostgreSQL, CloudWatch, etc.)
Variable: template variable that makes dashboards dynamic — dropdown to filter by host, service, namespace
Annotation: mark a point in time on panels — e.g., "deployment happened here"
Time range: global dashboard time picker — "Last 1h", "Last 24h", or custom
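
To make the "dashboard is JSON" idea concrete, here is a minimal sketch that builds a dashboard document in Python. The helper names (grid_positions, build_dashboard) and the example queries are made up for illustration, and a real export carries more fields (schema version, data source references, and so on), so treat this as a simplified shape rather than a complete model. It assumes the 24-column grid that Grafana uses for panel placement.

```python
import json

def grid_positions(count, per_row=2, height=8):
    """Lay panels out left to right on a 24-column grid (hypothetical helper)."""
    width = 24 // per_row
    return [
        {"x": (i % per_row) * width, "y": (i // per_row) * height, "w": width, "h": height}
        for i in range(count)
    ]

def build_dashboard(title, queries):
    """Build a minimal dashboard dict with one time series panel per query."""
    positions = grid_positions(len(queries))
    panels = [
        {
            "id": i + 1,
            "type": "timeseries",
            "title": name,
            "gridPos": positions[i],
            "targets": [{"refId": "A", "expr": expr}],
        }
        for i, (name, expr) in enumerate(queries.items())
    ]
    return {
        "title": title,
        "time": {"from": "now-1h", "to": "now"},  # the global time range
        "panels": panels,
    }

dashboard = build_dashboard(
    "API overview",
    {
        "Request rate": "sum(rate(http_requests_total[5m]))",
        "Error rate": 'sum(rate(http_requests_total{status=~"5.."}[5m]))',
        "CPU idle": 'avg(rate(node_cpu_seconds_total{mode="idle"}[5m]))',
    },
)
print(json.dumps(dashboard, indent=2))
```

Because the result is plain JSON, it can be committed to a repository and diffed like any other file.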

Panel Types

Time series: line/bar chart over time — default for metrics (CPU, memory, request rate)
Stat: single large number with sparkline — current value with optional threshold coloring
Gauge: circular progress indicator — fill % relative to min/max
Bar chart: compare values across categories at a point in time
Table: raw data in tabular format — supports pagination, sorting, column mapping
Heatmap: visualize distributions over time (e.g., request latency histogram buckets)
Logs: log lines from Loki — supports regex filtering, log context
Node Graph: visualize service dependency graphs (traces/APM)
Geomap: world map with data points — for geographic metrics

Dashboard Variables

Variables make dashboards reusable across environments/services.

Types:
  Query        - values come from a data source query (e.g., all values of a Prometheus label such as instance)
  Custom       - comma-separated static values
  Interval     - selectable time spans (e.g., 1m, 10m, 1h); distinct from the built-in $__interval, which resolves to an automatically calculated step
  Text box     - free text input
  Constant     - hidden value (e.g., data source name)
  Data source  - select a data source dynamically

Example — Query variable for host selection:
  Name: host
  Query type: Label values
  Label: instance
  Data source: Prometheus
  → Creates a dropdown: $host = "web-01" | "web-02" | "db-01"

Using in panels: rate(http_requests_total{instance="${host}"}[5m])
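
The substitution step can be sketched in a few lines of Python. This is an illustration of the idea only, not Grafana's implementation: the interpolate function is hypothetical, and the multi-value form shown, "(a|b)", is roughly what a Prometheus query receives when several values are selected, which is why multi-value variables need the =~ matcher instead of =.

```python
import re

# Matches ${name} or $name.
_VAR = re.compile(r"\$\{(\w+)\}|\$(\w+)")

def interpolate(query, variables):
    """Replace $name / ${name} with the selected value(s); unknown names are left as-is."""
    def repl(match):
        name = match.group(1) or match.group(2)
        if name not in variables:
            return match.group(0)
        value = variables[name]
        if isinstance(value, (list, tuple)):
            # Multi-value selection becomes a regex alternation.
            return "(" + "|".join(value) + ")"
        return str(value)
    return _VAR.sub(repl, query)

single = interpolate('rate(http_requests_total{instance="${host}"}[5m])', {"host": "web-01"})
multi = interpolate('rate(http_requests_total{instance=~"$host"}[5m])', {"host": ["web-01", "web-02"]})
print(single)
print(multi)
```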

Dashboard JSON & Import/Export

# Export dashboard JSON
# Dashboard → Share → Export → Save to file (dashboard.json)

# Import from JSON
# + → Import → upload a JSON file, paste JSON, or enter a Grafana.com dashboard ID

# Grafana.com dashboard library (thousands of community dashboards)
# Popular IDs:
#   1860  — Node Exporter Full (Linux system metrics)
#   3662  — Prometheus 2.0 Overview
#  13659  — Loki & Promtail logs
#  15489  — PostgreSQL overview
#   6417  — Kubernetes cluster monitoring
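
Imports can also be scripted. The sketch below prepares an exported dashboard for Grafana's HTTP API (POST /api/dashboards/db), which expects the dashboard wrapped in a payload object. The function name to_import_payload is made up for this example, and the network call itself is left out; sending the payload would additionally require the instance URL and an API token.

```python
import copy
import json

def to_import_payload(exported, overwrite=False):
    """Wrap an exported dashboard dict in the body expected by POST /api/dashboards/db."""
    dashboard = copy.deepcopy(exported)  # do not mutate the caller's copy
    # A null id asks the target instance to create the dashboard and assign
    # its own numeric id; the uid is kept so links stay stable across instances.
    dashboard["id"] = None
    return {"dashboard": dashboard, "overwrite": overwrite}

exported = {"id": 42, "uid": "api-overview", "title": "API overview", "panels": []}
payload = to_import_payload(exported, overwrite=True)
print(json.dumps(payload, indent=2))
```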

Grafana: Data Sources & PromQL

Connecting Data Sources

Configuration → Data Sources → Add data source (in recent Grafana versions: Connections → Data sources)
Prometheus: URL (e.g., http://prometheus:9090); authentication is often left off when the server is reachable only on an internal network; enable exemplars to link metrics to traces
Loki: URL (e.g., http://loki:3100) — log aggregation, pairs with Prometheus
InfluxDB: Flux or InfluxQL query language — for time-series databases
PostgreSQL / MySQL: direct SQL queries on relational databases
CloudWatch: AWS metrics and logs — requires IAM role or access key
Elasticsearch / OpenSearch: logs, traces, full-text search
Jaeger / Tempo / Zipkin: distributed tracing backends
TestData: built-in fake data source — great for building dashboards without real data

PromQL Essentials

PromQL (Prometheus Query Language) is used in Grafana panels when the data source is Prometheus. Understanding it is essential for building useful dashboards.

# Instant vector — current value of a metric
http_requests_total

# Filter by label
http_requests_total{job="api", status="200"}

# Range vector — values over a time window
http_requests_total[5m]

# rate() — per-second rate from a counter (use with range vector)
rate(http_requests_total{status!="200"}[5m])

# irate() — instantaneous rate (last two samples) — more responsive but noisy
irate(http_requests_total[5m])

# increase() — total increase over window (rate * duration)
increase(http_requests_total[1h])

# sum() — aggregate across labels
sum(rate(http_requests_total[5m])) by (status)

# avg, min, max, count (apply rate() first when the metric is a counter)
avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance)

# Histogram quantiles (p50, p95, p99)
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))

# Arithmetic
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100)

# Comparison — only return when condition is true
node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.1
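
The difference between increase(), rate(), and irate() is easier to see with numbers. The following Python sketch reimplements them in simplified form over (timestamp, value) samples. It is an approximation for intuition only: Prometheus also extrapolates to the edges of the window, so its results differ slightly, and the sample data here is invented.

```python
def increase(samples):
    """Total counter increase across samples, treating any drop as a reset to zero."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts from 0, so the new value is the increase.
        total += (cur - prev) if cur >= prev else cur
    return total

def rate(samples):
    """Average per-second increase between the first and last sample."""
    elapsed = samples[-1][0] - samples[0][0]
    return increase(samples) / elapsed

def irate(samples):
    """Per-second increase using only the last two samples."""
    return rate(samples[-2:])

# (timestamp in seconds, counter value), scraped every 15s
steady_then_spike = [(0, 100.0), (15, 115.0), (30, 130.0), (45, 175.0)]
with_reset = [(0, 100.0), (15, 110.0), (30, 5.0), (45, 15.0)]

print(increase(steady_then_spike))  # total growth over the window
print(rate(steady_then_spike))      # smoothed across the whole window
print(irate(steady_then_spike))     # reacts only to the final spike
print(increase(with_reset))         # the reset does not produce a negative value
```

The spike in the last interval pulls irate() well above rate(), which is the "more responsive but noisy" trade-off noted above.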

LogQL — Querying Loki

# Stream selector (required)
{app="nginx", env="production"}

# Filter by log content
{app="api"} |= "ERROR"
{app="api"} != "health"
{app="api"} |~ "status=5[0-9][0-9]"   # regex match

# Parse JSON logs
{app="api"} | json | level="error"

# Extract fields and aggregate
sum(rate({app="api"} | json | status="500" [5m])) by (endpoint)

# Log volume over time (for bar chart or time series)
# Assumes level is a stream label; otherwise parse it first, e.g. with | json or | logfmt
sum(count_over_time({app="api"}[1m])) by (level)
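
As a rough mental model, the line filters behave like chained substring and regex checks over the selected stream. The Python sketch below mimics |=, != and |~ on a list of log lines. The line_filter function and the sample lines are invented for illustration, and real Loki evaluates these filters server-side over stored chunks.

```python
import re

def line_filter(lines, contains=None, excludes=None, matches=None):
    """Keep lines that pass all given filters.

    contains -> like |= "text"   (substring must be present)
    excludes -> like != "text"   (substring must be absent)
    matches  -> like |~ "regex"  (unanchored regex must match)
    """
    kept = []
    for line in lines:
        if contains is not None and contains not in line:
            continue
        if excludes is not None and excludes in line:
            continue
        if matches is not None and re.search(matches, line) is None:
            continue
        kept.append(line)
    return kept

logs = [
    "GET /health status=200",
    "GET /orders status=502 ERROR upstream timeout",
    "POST /orders status=201",
    "GET /users status=500 ERROR db unavailable",
]

errors = line_filter(logs, contains="ERROR")
no_health = line_filter(logs, excludes="health")
server_errors = line_filter(logs, matches=r"status=5[0-9][0-9]")
print(errors, no_health, server_errors, sep="\n")
```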

Grafana Query Inspector

Panel menu → Inspect → Query: see the exact query sent to the data source
Panel menu → Inspect → Data: see raw response data in table format
Panel menu → Inspect → Stats: query execution time, number of data points
Use Query Inspector to debug why a panel shows "No data" or unexpected values

Grafana: Alerts, Provisioning & Cloud

Grafana Alerting

Grafana Unified Alerting (v8+) centralizes alert rules, contact points, and notification policies. Rules evaluate periodically and fire when conditions are met.

Alert rule: condition on a query (e.g., error rate > 5% for 5 minutes)
Contact point: where to send alerts — Slack, PagerDuty, email, webhook, OpsGenie
Notification policy: routing tree — which alerts go to which contact point based on labels
Silences: suppress alerts for a time range (maintenance windows)
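Alert states: Normal → Pending (condition met, but not yet for the full pending period) → Alerting (firing) → Normal again once the condition clears, at which point a resolved notification is sent; NoData and Error are additional states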

# Alert rule (Alerting → Alert rules → New alert rule)
# Or define via Terraform / provisioning YAML

# Example conditions:
# A (query):      sum(rate(http_requests_total{status=~"5.."}[5m]))
# B (query):      sum(rate(http_requests_total[5m]))
# C (expression): $A / $B   (error ratio; reduce A and B to single values first if they are range queries)
# Condition: C is above 0.05 (error rate > 5%), held for 5m

# Labels on the rule:
severity: critical
team: backend
env: production
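
The "above 0.05 for 5m" behavior can be pictured as a small state machine: the rule enters Pending on the first breaching evaluation and fires only once the breach has lasted the whole pending period. The Python sketch below models just that part. The AlertRule class is hypothetical, uses simplified state names, and ignores NoData/Error handling and notifications.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AlertRule:
    threshold: float        # fire when the value is above this
    pending_seconds: float  # how long the breach must last before firing
    breached_since: Optional[float] = None

    def evaluate(self, value: float, now: float) -> str:
        """Return the rule state after one evaluation at time `now` (seconds)."""
        if value <= self.threshold:
            self.breached_since = None  # condition cleared, reset the timer
            return "Normal"
        if self.breached_since is None:
            self.breached_since = now   # first breaching evaluation
        if now - self.breached_since >= self.pending_seconds:
            return "Firing"
        return "Pending"

rule = AlertRule(threshold=0.05, pending_seconds=300)
# (error ratio, evaluation time in seconds)
timeline = [(0.02, 0), (0.08, 60), (0.09, 300), (0.07, 360), (0.01, 420)]
states = [rule.evaluate(value, now) for value, now in timeline]
print(states)
```

Note that the evaluation at 300s is still Pending, because the breach began at 60s and has lasted only 240s at that point.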

Provisioning (Infrastructure as Code)

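Grafana supports provisioning dashboards, data sources, and alert rules via YAML files. Provisioning files are read at startup, and file-based dashboard providers also rescan their directory periodically (controlled by updateIntervalSeconds).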

# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    isDefault: true
    jsonData:
      timeInterval: 15s

# /etc/grafana/provisioning/dashboards/default.yaml
apiVersion: 1
providers:
  - name: Default
    type: file
    options:
      path: /var/lib/grafana/dashboards
      # Grafana reads all .json files from this directory
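
Since a provisioned directory is read without anyone reviewing it in the UI, it can help to validate the JSON files before deploying them. Below is a minimal sketch of such a check in Python. The check_dashboards function and its rules (parseable JSON, a non-empty title, no duplicate uid) are assumptions chosen for this example, not checks that Grafana itself documents.

```python
import json

def check_dashboards(files):
    """Validate dashboard JSON texts.

    files: mapping of file name -> JSON text.
    Returns a list of human-readable problems (empty if everything passes).
    """
    problems = []
    seen_uids = {}
    for name, text in sorted(files.items()):
        try:
            dash = json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append(f"{name}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(dash, dict) or not dash.get("title"):
            problems.append(f"{name}: missing title")
            continue
        uid = dash.get("uid")
        if uid in seen_uids:
            problems.append(f"{name}: uid {uid!r} already used by {seen_uids[uid]}")
        elif uid:
            seen_uids[uid] = name
    return problems

sample = {
    "api.json": '{"uid": "api", "title": "API overview", "panels": []}',
    "copy.json": '{"uid": "api", "title": "API overview (copy)", "panels": []}',
    "broken.json": '{"title": "Broken"',
    "untitled.json": '{"uid": "x", "panels": []}',
}
for problem in check_dashboards(sample):
    print(problem)
```

To run it against the directory from the provider config, the mapping could be built with pathlib, for example {p.name: p.read_text() for p in Path("/var/lib/grafana/dashboards").glob("*.json")}.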

Docker Compose Setup

# docker-compose.yml — Grafana + Prometheus + Node Exporter
services:
  grafana:
    image: grafana/grafana:latest
    ports: ["3000:3000"]
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin  # local testing only; change before exposing Grafana
    volumes:
      - grafana-data:/var/lib/grafana
      - ./provisioning:/etc/grafana/provisioning

  prometheus:
    image: prom/prometheus:latest
    ports: ["9090:9090"]
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  node-exporter:
    image: prom/node-exporter:latest
    ports: ["9100:9100"]

volumes:
  grafana-data:
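
After docker compose up, scripts that talk to Grafana usually need to wait until it is ready. The sketch below polls Grafana's /api/health endpoint, which reports "database": "ok" once the instance is up. The function names and the default URL http://localhost:3000 (matching the port mapping above) are assumptions for this example.

```python
import json
import time
import urllib.error
import urllib.request

def is_healthy(body):
    """Return True if a /api/health response body reports a working database."""
    try:
        return json.loads(body).get("database") == "ok"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False

def wait_for_grafana(base_url="http://localhost:3000", timeout=60.0, interval=2.0):
    """Poll the health endpoint until it reports ok or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/api/health", timeout=5) as resp:
                if is_healthy(resp.read()):
                    return True
        except (urllib.error.URLError, OSError):
            pass  # container still starting; retry after a short sleep
        time.sleep(interval)
    return False
```

Calling wait_for_grafana() before importing dashboards or creating data sources avoids racing the container startup.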

Grafana Cloud

Hosted Grafana: free tier — 10k series Prometheus, 50GB Loki logs, 50GB Tempo traces, 14-day retention
Grafana Agent: lightweight collector that scrapes metrics and ships to Grafana Cloud
Prometheus remote_write: push metrics from self-hosted Prometheus to Grafana Cloud
Grafana OnCall: on-call scheduling and escalation policies (included in Cloud)
k6: load testing tool — results visualized natively in Grafana
Grafana Mimir: horizontally scalable Prometheus-compatible backend (open-source)
Grafana Alloy: next-gen agent, replaces Grafana Agent — supports OTEL natively
