CloudWatch notes for developers

Master CloudWatch with a curated set of 2 developer notes — core concepts, patterns, and interview prep. Maintained by the DevRecall team.

Metrics, Alarms & Dashboards

Amazon CloudWatch is AWS's observability service — metrics, logs, alarms, and dashboards. Almost all AWS services publish metrics to CloudWatch automatically.

Metrics

# List available metrics
aws cloudwatch list-metrics --namespace AWS/EC2 --metric-name CPUUtilization

# Get metric statistics
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --start-time 2024-03-15T00:00:00Z \
  --end-time 2024-03-16T00:00:00Z \
  --period 3600 \
  --statistics Average Maximum

# Common namespaces and metrics:
# AWS/EC2:         CPUUtilization, NetworkIn/Out, DiskReadOps, StatusCheckFailed
# AWS/RDS:         CPUUtilization, DatabaseConnections, FreeStorageSpace, ReadIOPS
# AWS/ApplicationELB: RequestCount, TargetResponseTime, HTTPCode_Target_5XX_Count
# AWS/Lambda:      Invocations, Duration, Errors, Throttles, ConcurrentExecutions
# AWS/S3:          NumberOfObjects, BucketSizeBytes (daily)
# AWS/SQS:         NumberOfMessagesSent, ApproximateNumberOfMessagesVisible
# AWS/ECS:         CPUUtilization, MemoryUtilization
# AWS/DynamoDB:    ConsumedReadCapacityUnits, SystemErrors, SuccessfulRequestLatency
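
The get-metric-statistics call above can also be issued from Python. The following is a minimal sketch, assuming boto3 is installed and credentials are configured; the helper names `build_cpu_stats_request` and `fetch_cpu_stats` are hypothetical. It also guards against the 1,440-datapoint limit that applies to a single GetMetricStatistics call.

```python
MAX_DATAPOINTS = 1440  # per-call limit for GetMetricStatistics


def build_cpu_stats_request(instance_id, start, end, period=3600):
    """Build keyword arguments for cloudwatch.get_metric_statistics()."""
    expected_points = int((end - start).total_seconds() // period)
    if expected_points > MAX_DATAPOINTS:
        raise ValueError(
            f"{expected_points} datapoints requested; narrow the range or raise the period"
        )
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": start,
        "EndTime": end,
        "Period": period,
        "Statistics": ["Average", "Maximum"],
    }


def fetch_cpu_stats(cloudwatch, request):
    """Call the API and sort the datapoints, which are not returned in time order."""
    response = cloudwatch.get_metric_statistics(**request)
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])


# Usage (requires AWS access):
#   import boto3
#   from datetime import datetime, timedelta, timezone
#   end = datetime.now(timezone.utc)
#   request = build_cpu_stats_request("i-1234567890abcdef0", end - timedelta(days=1), end)
#   for point in fetch_cpu_stats(boto3.client("cloudwatch"), request):
#       print(point["Timestamp"], point["Average"], point["Maximum"])
```

Keeping the request builder separate from the API call makes the time-range check testable without touching AWS.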

Custom Metrics

# Publish custom metric via CLI
aws cloudwatch put-metric-data \
  --namespace "MyApp" \
  --metric-name "OrdersProcessed" \
  --value 42 \
  --unit Count \
  --dimensions Environment=prod,Service=checkout

# With high-resolution (1-second granularity)
aws cloudwatch put-metric-data \
  --namespace "MyApp" \
  --metric-name "ApiLatency" \
  --value 123.5 \
  --unit Milliseconds \
  --storage-resolution 1

# Publish from Python with boto3
import boto3

cloudwatch = boto3.client('cloudwatch')

# Publish metric
cloudwatch.put_metric_data(
    Namespace='MyApp',
    MetricData=[
        {
            'MetricName': 'OrdersProcessed',
            'Value': 42,
            'Unit': 'Count',
            'Dimensions': [
                {'Name': 'Environment', 'Value': 'prod'},
                {'Name': 'Service', 'Value': 'checkout'},
            ],
        },
        {
            'MetricName': 'ApiLatency',
            'Value': 123.5,
            'Unit': 'Milliseconds',
            'Dimensions': [{'Name': 'Endpoint', 'Value': '/api/orders'}],
        },
    ]
)

# Batch publish: a single put_metric_data call accepts up to 1000 metrics
# Use the CloudWatch agent for infrastructure metrics such as memory and disk usage, which EC2 does not publish by default
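
The batching note above can be sketched as follows. This is an illustrative helper, not part of boto3: `chunked` and `publish_in_batches` are hypothetical names, and `cloudwatch` is assumed to be a `boto3.client('cloudwatch')` (or anything exposing `put_metric_data`).

```python
from itertools import islice

MAX_METRICS_PER_CALL = 1000  # PutMetricData limit mentioned in the notes above


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def publish_in_batches(cloudwatch, namespace, metric_data, batch_size=MAX_METRICS_PER_CALL):
    """Send metric_data through put_metric_data in batches; return the number of calls made."""
    calls = 0
    for batch in chunked(metric_data, batch_size):
        cloudwatch.put_metric_data(Namespace=namespace, MetricData=batch)
        calls += 1
    return calls
```

Requests are also capped by payload size, so a smaller `batch_size` may be needed when each entry carries many dimensions or values.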

Alarms

# Create alarm
aws cloudwatch put-metric-alarm \
  --alarm-name "High-CPU-i-1234567890" \
  --alarm-description "CPU over 80% for 5 minutes" \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --statistic Average \
  --period 60 \
  --evaluation-periods 5 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:my-alerts \
  --ok-actions arn:aws:sns:us-east-1:123456789012:my-alerts \
  --treat-missing-data breaching

# Composite alarm (AND/OR of other alarms)
aws cloudwatch put-composite-alarm \
  --alarm-name "Service-Down" \
  --alarm-rule 'ALARM("High-CPU") AND ALARM("High-Latency")'

# List alarms
aws cloudwatch describe-alarms --alarm-names "High-CPU-i-1234567890"
aws cloudwatch describe-alarms --state-value ALARM

# Set alarm state (for testing)
aws cloudwatch set-alarm-state \
  --alarm-name "High-CPU-i-1234567890" \
  --state-value ALARM \
  --state-reason "Testing"

# Delete alarm
aws cloudwatch delete-alarms --alarm-names "High-CPU-i-1234567890"
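
To make the interplay of `--period`, `--evaluation-periods`, `--threshold`, and `--treat-missing-data` more concrete, here is a deliberately simplified model of how the alarm above reaches a state. It is a sketch, not CloudWatch's actual algorithm: it only covers GreaterThanThreshold with every period required to breach, and the function name `evaluate_alarm` is hypothetical.

```python
def evaluate_alarm(datapoints, threshold, evaluation_periods, treat_missing="breaching"):
    """Return 'ALARM' or 'OK' for the most recent evaluation window.

    datapoints: one statistic value per period, oldest first; None marks a missing period.
    Simplified: models GreaterThanThreshold with 'breaching' or 'notBreaching' only.
    """
    if treat_missing not in ("breaching", "notBreaching"):
        raise ValueError("this sketch only models 'breaching' and 'notBreaching'")

    window = list(datapoints[-evaluation_periods:])
    # Periods with no data yet are treated the same as missing datapoints.
    window = [None] * (evaluation_periods - len(window)) + window

    def breaches(value):
        if value is None:
            return treat_missing == "breaching"
        return value > threshold

    return "ALARM" if all(breaches(value) for value in window) else "OK"


# Five one-minute averages, matching --period 60 --evaluation-periods 5 --threshold 80
print(evaluate_alarm([85, 90, 82, 95, 88], threshold=80, evaluation_periods=5))
print(evaluate_alarm([85, None, 82, 95, 88], 80, 5, treat_missing="notBreaching"))
```

The real service handles missing data with more nuance than this, so treat the model as a mental aid and use set-alarm-state to test actual alarm actions.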

Dashboards

# Create dashboard (JSON body defines widgets)
aws cloudwatch put-dashboard \
  --dashboard-name "MyApp-Prod" \
  --dashboard-body file://dashboard.json

# dashboard.json widgets example:
# {
#   "widgets": [
#     {
#       "type": "metric",
#       "properties": {
#         "title": "CPU Utilization",
#         "metrics": [["AWS/EC2", "CPUUtilization", "InstanceId", "i-xxx"]],
#         "region": "us-east-1",
#         "period": 300,
#         "stat": "Average",
#         "view": "timeSeries"
#       }
#     },
#     {
#       "type": "alarm",
#       "properties": {
#         "title": "Active Alarms",
#         "alarms": ["arn:aws:cloudwatch:us-east-1:123:alarm:High-CPU"]
#       }
#     }
#   ]
# }

# List dashboards
aws cloudwatch list-dashboards
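
The dashboard body can also be generated in code rather than maintained by hand. The sketch below mirrors the two widgets from the JSON example; the helper names are hypothetical, and the `region` property with its `us-east-1` default is an assumption added for illustration.

```python
import json


def metric_widget(title, namespace, metric, dimension_name, dimension_value,
                  region="us-east-1", period=300, stat="Average"):
    """Build a time-series metric widget for a single metric."""
    return {
        "type": "metric",
        "properties": {
            "title": title,
            "metrics": [[namespace, metric, dimension_name, dimension_value]],
            "region": region,  # assumed default; set to the region that holds the metric
            "period": period,
            "stat": stat,
            "view": "timeSeries",
        },
    }


def alarm_widget(title, alarm_arns):
    """Build an alarm status widget for the given alarm ARNs."""
    return {"type": "alarm", "properties": {"title": title, "alarms": list(alarm_arns)}}


def build_dashboard_body(widgets):
    """Serialize widgets into the JSON string that put_dashboard expects."""
    return json.dumps({"widgets": list(widgets)})


body = build_dashboard_body([
    metric_widget("CPU Utilization", "AWS/EC2", "CPUUtilization", "InstanceId", "i-xxx"),
    alarm_widget("Active Alarms", ["arn:aws:cloudwatch:us-east-1:123:alarm:High-CPU"]),
])
# With boto3 (requires AWS access):
#   boto3.client("cloudwatch").put_dashboard(DashboardName="MyApp-Prod", DashboardBody=body)
```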

Logs, Insights & Container Monitoring

CloudWatch Logs

# Log groups and streams
aws logs create-log-group --log-group-name /myapp/prod
aws logs create-log-stream \
  --log-group-name /myapp/prod \
  --log-stream-name web-server-1

# Set retention. Valid values (days): 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1096, 1827, 2192, 2557, 2922, 3288, 3653
aws logs put-retention-policy \
  --log-group-name /myapp/prod \
  --retention-in-days 30

# List log groups
aws logs describe-log-groups
aws logs describe-log-groups --log-group-name-prefix /myapp

# Tail logs (like tail -f)
aws logs tail /myapp/prod --follow
aws logs tail /myapp/prod --since 1h
aws logs tail /myapp/prod --filter-pattern "ERROR"

# Get log events (date -d is GNU date syntax; timestamps are epoch milliseconds)
aws logs get-log-events \
  --log-group-name /myapp/prod \
  --log-stream-name web-server-1 \
  --start-time $(date -d "1 hour ago" +%s000)

# Filter log events (across streams in a log group)
aws logs filter-log-events \
  --log-group-name /myapp/prod \
  --filter-pattern '{ $.level = "ERROR" }' \
  --start-time $(date -d "1 hour ago" +%s000)
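
The same search can be run from Python, which avoids the GNU-specific `date -d` call and handles pagination. This is a minimal sketch, assuming `logs` is a `boto3.client('logs')`; `epoch_millis` and `iter_error_events` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone


def epoch_millis(moment):
    """Convert an aware datetime to the millisecond epoch timestamp CloudWatch Logs expects."""
    return int(moment.timestamp() * 1000)


def iter_error_events(logs, log_group, since, pattern='{ $.level = "ERROR" }'):
    """Yield matching events from every stream in log_group, following pagination."""
    paginator = logs.get_paginator("filter_log_events")
    pages = paginator.paginate(
        logGroupName=log_group,
        filterPattern=pattern,
        startTime=epoch_millis(since),
    )
    for page in pages:
        yield from page["events"]


# Usage (requires AWS access):
#   import boto3
#   one_hour_ago = datetime.now(timezone.utc) - timedelta(hours=1)
#   for event in iter_error_events(boto3.client("logs"), "/myapp/prod", one_hour_ago):
#       print(event["timestamp"], event["message"])
```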

CloudWatch Logs Insights

Logs Insights is an interactive query feature with its own pipe-based query language for analyzing log data. A single query can span multiple log groups and typically returns results in seconds.

# Count errors in last hour
fields @timestamp, @message
| filter @message like /ERROR/
| stats count() as error_count by bin(5m)
| sort @timestamp desc

# Lambda cold starts (@initDuration is only present on cold-start REPORT lines)
filter @type = "REPORT"
| stats count() as invocations, avg(@duration) as avg_duration,
        count(@initDuration) as cold_starts
| display invocations, avg_duration, cold_starts

# API latency percentiles
filter @message like /END/
| parse @message "Duration: * ms" as duration
| stats avg(duration), pct(duration, 50) as p50,
        pct(duration, 95) as p95, pct(duration, 99) as p99

# Top clients from ALB access logs (assumes the logs are available in CloudWatch Logs; client is address:port)
fields @timestamp, @message
| parse @message '* * * * * * * * * * * * "*" "*" * *' as type, time, elb, client, target,
    request_processing, target_processing, response_processing, elb_status,
    target_status, received_bytes, sent_bytes, request, user_agent, ssl_cipher, ssl_protocol
| stats count() as requests by client
| sort requests desc
| limit 20

# Find specific user's requests
fields @timestamp, @message
| filter @message like /user-123/
| sort @timestamp desc
| limit 100
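
Queries like the ones above can also be run programmatically. Logs Insights queries are asynchronous: you start a query, then poll for results. The sketch below assumes `logs` is a `boto3.client('logs')`; the function name `run_insights_query` and its polling defaults are hypothetical.

```python
import time


def run_insights_query(logs, log_group, query, start_epoch, end_epoch,
                       poll_seconds=1.0, timeout_seconds=60.0):
    """Start a Logs Insights query, poll until it finishes, and return rows as dicts."""
    query_id = logs.start_query(
        logGroupName=log_group,
        startTime=start_epoch,  # seconds since the epoch
        endTime=end_epoch,
        queryString=query,
    )["queryId"]

    deadline = time.monotonic() + timeout_seconds
    while True:
        response = logs.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            # Each row is a list of {"field": ..., "value": ...} pairs.
            return [{cell["field"]: cell["value"] for cell in row}
                    for row in response["results"]]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"query {query_id} ended with status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"query {query_id} still {status} after {timeout_seconds}s")
        time.sleep(poll_seconds)


# Usage (requires AWS access):
#   import boto3
#   now = int(time.time())
#   rows = run_insights_query(boto3.client("logs"), "/myapp/prod",
#                             "fields @timestamp, @message | limit 20", now - 3600, now)
```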

CloudWatch Agent (EC2 & On-Premises)

// /opt/aws/amazon-cloudwatch-agent/bin/config.json
{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "cwagent"
  },
  "metrics": {
    "namespace": "MyApp/EC2",
    "metrics_collected": {
      "cpu": {
        "measurement": ["cpu_usage_idle", "cpu_usage_user", "cpu_usage_system"],
        "metrics_collection_interval": 60
      },
      "mem": {
        "measurement": ["mem_used_percent"],
        "metrics_collection_interval": 60
      },
      "disk": {
        "measurement": ["used_percent"],
        "resources": ["/", "/data"],
        "metrics_collection_interval": 300
      }
    }
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/myapp/app.log",
            "log_group_name": "/myapp/prod",
            "log_stream_name": "{instance_id}"
          }
        ]
      }
    }
  }
}

# Load the config and start the agent (assumes the agent package is already installed)
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config \
  -m ec2 \
  -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json \
  -s

Container Insights

  • Container Insights: enhanced monitoring for ECS and EKS — pod/task-level CPU, memory, network metrics.

  • Enable on ECS cluster: aws ecs update-cluster-settings --cluster my-cluster --settings name=containerInsights,value=enabled

  • Enable on EKS: install the CloudWatch agent via Helm or the EKS add-on; use Fluent Bit for log forwarding.

  • EMF (Embedded Metric Format): log structured JSON with _aws.CloudWatchMetrics to publish metrics via logs — no SDK required.

  • Synthetics Canaries: run Node.js/Python scripts on a schedule to monitor endpoints — like website uptime checks.

  • RUM (Real User Monitoring): collect browser-side metrics (Core Web Vitals, errors) from actual users.

  • Application Signals: automatic application performance monitoring with traces + SLI/SLO tracking (services are instrumented through the CloudWatch agent and AWS Distro for OpenTelemetry).
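
To illustrate the EMF bullet above, here is a minimal sketch that builds one Embedded Metric Format log line by hand. The helper name `emf_record` is hypothetical, and the namespace, metric, and dimension values are reused from the custom metrics example. The line only becomes a metric once it reaches CloudWatch Logs, for example via Lambda's standard output or the CloudWatch agent.

```python
import json
import time


def emf_record(namespace, metric_name, value, unit, dimensions, timestamp_ms=None):
    """Build one Embedded Metric Format log line for a single metric."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    return json.dumps({
        "_aws": {
            "Timestamp": timestamp_ms,
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [list(dimensions)],  # one dimension set, listed by key name
                "Metrics": [{"Name": metric_name, "Unit": unit}],
            }],
        },
        **dimensions,        # dimension values sit at the top level of the record
        metric_name: value,  # and so does the metric value itself
    })


line = emf_record("MyApp", "ApiLatency", 123.5, "Milliseconds",
                  {"Endpoint": "/api/orders"}, timestamp_ms=1710460800000)
print(line)
```

Any extra top-level keys added to the record stay searchable in Logs Insights without becoming metrics or dimensions.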
