CloudWatch: Metrics, Alarms & Dashboards
Amazon CloudWatch is AWS's observability service — metrics, logs, alarms, and dashboards. Almost all AWS services publish metrics to CloudWatch automatically.
Metrics
# List available metrics
aws cloudwatch list-metrics --namespace AWS/EC2 --metric-name CPUUtilization
# Get metric statistics
aws cloudwatch get-metric-statistics --namespace AWS/EC2 --metric-name CPUUtilization --dimensions Name=InstanceId,Value=i-1234567890abcdef0 --start-time 2024-03-15T00:00:00Z --end-time 2024-03-16T00:00:00Z --period 3600 --statistics Average Maximum
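The same query can be issued from Python. This is a minimal sketch, assuming boto3 is installed and AWS credentials are configured; the helper names `build_stats_query`, `sort_datapoints`, and `fetch_cpu_stats` are hypothetical. The API does not promise that datapoints come back in chronological order, so the sketch sorts them by timestamp.

```python
from datetime import datetime, timezone


def build_stats_query(instance_id, start, end, period=3600):
    """Return keyword arguments for get_metric_statistics (hypothetical helper)."""
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": start,
        "EndTime": end,
        "Period": period,
        "Statistics": ["Average", "Maximum"],
    }


def sort_datapoints(datapoints):
    """Order datapoints by timestamp, oldest first."""
    return sorted(datapoints, key=lambda dp: dp["Timestamp"])


def fetch_cpu_stats(instance_id, start, end):
    """Call CloudWatch; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("cloudwatch")
    response = client.get_metric_statistics(
        **build_stats_query(instance_id, start, end)
    )
    return sort_datapoints(response["Datapoints"])


# Mirrors the CLI call above: one day of hourly Average and Maximum values.
query = build_stats_query(
    "i-1234567890abcdef0",
    datetime(2024, 3, 15, tzinfo=timezone.utc),
    datetime(2024, 3, 16, tzinfo=timezone.utc),
)
```

Splitting the request building from the network call keeps the parameters easy to inspect or unit-test without touching AWS.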
# Common namespaces and metrics:
# AWS/EC2: CPUUtilization, NetworkIn/Out, DiskReadOps, StatusCheckFailed
# AWS/RDS: CPUUtilization, DatabaseConnections, FreeStorageSpace, ReadIOPS
# AWS/ApplicationELB: RequestCount, TargetResponseTime, HTTPCode_Target_5XX_Count
# AWS/Lambda: Invocations, Duration, Errors, Throttles, ConcurrentExecutions
# AWS/S3: NumberOfObjects, BucketSizeBytes (daily)
# AWS/SQS: NumberOfMessagesSent, ApproximateNumberOfMessagesVisible
# AWS/ECS: CPUUtilization, MemoryUtilization
# AWS/DynamoDB: ConsumedReadCapacityUnits, SystemErrors, SuccessfulRequestLatency
Custom Metrics
# Publish custom metric via CLI
aws cloudwatch put-metric-data --namespace "MyApp" --metric-name "OrdersProcessed" --value 42 --unit Count --dimensions Environment=prod,Service=checkout
# With high-resolution (1-second granularity)
aws cloudwatch put-metric-data --namespace "MyApp" --metric-name "ApiLatency" --value 123.5 --unit Milliseconds --storage-resolution 1

import boto3
cloudwatch = boto3.client('cloudwatch')
# Publish metric
cloudwatch.put_metric_data(
    Namespace='MyApp',
    MetricData=[
        {
            'MetricName': 'OrdersProcessed',
            'Value': 42,
            'Unit': 'Count',
            'Dimensions': [
                {'Name': 'Environment', 'Value': 'prod'},
                {'Name': 'Service', 'Value': 'checkout'},
            ],
        },
        {
            'MetricName': 'ApiLatency',
            'Value': 123.5,
            'Unit': 'Milliseconds',
            'Dimensions': [{'Name': 'Endpoint', 'Value': '/api/orders'}],
        },
    ]
)
# Batch publish (max 1000 metrics per call)
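When more entries are queued than one call accepts, they have to be split up. A minimal sketch of that batching, assuming a boto3 CloudWatch client is passed in; `chunk_metric_data` and `publish_in_batches` are hypothetical helper names, and the default of 1000 is the per-call limit quoted above.

```python
def chunk_metric_data(metric_data, batch_size=1000):
    """Yield consecutive slices of at most batch_size entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(metric_data), batch_size):
        yield metric_data[start:start + batch_size]


def publish_in_batches(client, namespace, metric_data, batch_size=1000):
    """Send one put_metric_data call per batch and return the call count."""
    calls = 0
    for batch in chunk_metric_data(metric_data, batch_size):
        client.put_metric_data(Namespace=namespace, MetricData=batch)
        calls += 1
    return calls
```

Typical use would be `publish_in_batches(boto3.client('cloudwatch'), 'MyApp', data)`. Requests are also limited in payload size, so entries with many dimensions or values may need a smaller `batch_size`.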
# Use the CloudWatch agent for host metrics that EC2 does not publish itself (memory, disk usage)
Alarms
# Create alarm
aws cloudwatch put-metric-alarm --alarm-name "High-CPU-i-1234567890" --alarm-description "CPU over 80% for 5 minutes" --namespace AWS/EC2 --metric-name CPUUtilization --dimensions Name=InstanceId,Value=i-1234567890abcdef0 --statistic Average --period 60 --evaluation-periods 5 --threshold 80 --comparison-operator GreaterThanThreshold --alarm-actions arn:aws:sns:us-east-1:123456789012:my-alerts --ok-actions arn:aws:sns:us-east-1:123456789012:my-alerts --treat-missing-data breaching
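The alarm definition above maps directly onto boto3's `put_metric_alarm`. A minimal sketch, assuming boto3 and credentials are available; `build_cpu_alarm` and `create_cpu_alarm` are hypothetical helpers, and the SNS topic ARN is a placeholder.

```python
def build_cpu_alarm(instance_id, topic_arn, threshold=80.0, minutes=5):
    """Return keyword arguments for put_metric_alarm (hypothetical helper)."""
    return {
        "AlarmName": f"High-CPU-{instance_id}",
        "AlarmDescription": f"CPU over {threshold:g}% for {minutes} minutes",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 60,                  # seconds per datapoint
        "EvaluationPeriods": minutes,  # consecutive datapoints evaluated
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],
        "OKActions": [topic_arn],
        "TreatMissingData": "breaching",
    }


def create_cpu_alarm(instance_id, topic_arn):
    """Create the alarm; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so build_cpu_alarm works without it

    boto3.client("cloudwatch").put_metric_alarm(
        **build_cpu_alarm(instance_id, topic_arn)
    )


# Placeholder ARN; substitute a real SNS topic.
alarm = build_cpu_alarm(
    "i-1234567890abcdef0",
    "arn:aws:sns:us-east-1:123456789012:my-alerts",
)
```

The alarm window is `Period` times `EvaluationPeriods`, here 60 s x 5 = 5 minutes. EC2 publishes one-minute datapoints only when detailed monitoring is enabled; with basic monitoring, a 300-second period is the closer fit.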
# Composite alarm (AND/OR of other alarms)
aws cloudwatch put-composite-alarm --alarm-name "Service-Down" --alarm-rule 'ALARM("High-CPU") AND ALARM("High-Latency")'
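Because the rule expression contains its own double quotes, building it in code avoids shell-quoting mistakes. A minimal sketch, assuming the child alarms already exist and their names contain no double quotes; `alarm_rule` and `create_composite_alarm` are hypothetical helpers.

```python
def alarm_rule(names, operator="AND"):
    """Join alarm names into a composite rule such as ALARM("a") AND ALARM("b")."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    if not names:
        raise ValueError("at least one alarm name is required")
    if any('"' in name for name in names):
        raise ValueError("alarm names with double quotes are not handled here")
    return f" {operator} ".join(f'ALARM("{name}")' for name in names)


def create_composite_alarm(name, child_names, operator="AND"):
    """Create the composite alarm; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so alarm_rule works without it

    boto3.client("cloudwatch").put_composite_alarm(
        AlarmName=name,
        AlarmRule=alarm_rule(child_names, operator),
    )


rule = alarm_rule(["High-CPU", "High-Latency"])
```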
# List alarms
aws cloudwatch describe-alarms --alarm-names "High-CPU-i-1234567890"
aws cloudwatch describe-alarms --state-value ALARM
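Accounts with many alarms return results in pages. A minimal sketch of collecting every alarm name in a given state, assuming a boto3 CloudWatch client is passed in; `alarm_names_in_state` is a hypothetical helper. It reads only metric alarms, since composite alarms are listed only when requested through `AlarmTypes`.

```python
def alarm_names_in_state(client, state="ALARM"):
    """Return the names of all metric alarms currently in the given state."""
    paginator = client.get_paginator("describe_alarms")
    names = []
    for page in paginator.paginate(StateValue=state):
        names.extend(alarm["AlarmName"] for alarm in page.get("MetricAlarms", []))
    return names
```

Typical use would be `alarm_names_in_state(boto3.client('cloudwatch'))`.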
# Set alarm state (for testing)
aws cloudwatch set-alarm-state --alarm-name "High-CPU-i-1234567890" --state-value ALARM --state-reason "Testing"
# Delete alarm
aws cloudwatch delete-alarms --alarm-names "High-CPU-i-1234567890"
Dashboards
# Create dashboard (JSON body defines widgets)
aws cloudwatch put-dashboard --dashboard-name "MyApp-Prod" --dashboard-body file://dashboard.json
# dashboard.json widgets example:
# {
#   "widgets": [
#     {
#       "type": "metric",
#       "properties": {
#         "title": "CPU Utilization",
#         "metrics": [["AWS/EC2", "CPUUtilization", "InstanceId", "i-xxx"]],
#         "period": 300,
#         "stat": "Average",
#         "view": "timeSeries",
#         "region": "us-east-1"
#       }
#     },
#     {
#       "type": "alarm",
#       "properties": {
#         "title": "Active Alarms",
#         "alarms": ["arn:aws:cloudwatch:us-east-1:123456789012:alarm:High-CPU"]
#       }
#     }
#   ]
# }
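The dashboard body can also be generated in code instead of a hand-written file. A minimal sketch, assuming boto3 and credentials are available; `build_dashboard_body` and `put_dashboard` are hypothetical helpers, the alarm ARN is a placeholder, and the `region` property is included on the assumption that metric widgets need it.

```python
import json


def build_dashboard_body(instance_id, alarm_arn, region="us-east-1"):
    """Return the dashboard body as a JSON string with two widgets."""
    body = {
        "widgets": [
            {
                "type": "metric",
                "properties": {
                    "title": "CPU Utilization",
                    "metrics": [
                        ["AWS/EC2", "CPUUtilization", "InstanceId", instance_id]
                    ],
                    "period": 300,
                    "stat": "Average",
                    "view": "timeSeries",
                    "region": region,
                },
            },
            {
                "type": "alarm",
                "properties": {"title": "Active Alarms", "alarms": [alarm_arn]},
            },
        ]
    }
    return json.dumps(body)


def put_dashboard(name, body):
    """Create or overwrite the dashboard; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so build_dashboard_body works without it

    return boto3.client("cloudwatch").put_dashboard(
        DashboardName=name, DashboardBody=body
    )


dashboard_body = build_dashboard_body(
    "i-1234567890abcdef0",
    "arn:aws:cloudwatch:us-east-1:123456789012:alarm:High-CPU",  # placeholder
)
```

The `put_dashboard` response carries a `DashboardValidationMessages` list, which is worth checking after `put_dashboard("MyApp-Prod", dashboard_body)` to catch widget definitions the service flagged.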
# List dashboards
aws cloudwatch list-dashboards