System Design notes for developers

Master System Design with a curated set of 2 developer notes — core concepts, patterns, and interview prep. Maintained by the DevRecall team.

System Design

Scalability & Architecture

Scalability Patterns

  • Horizontal scaling — add more servers; requires stateless app (store sessions in Redis, not in-process)

  • Load balancing — distribute traffic (round-robin, least-connections, IP hash for sticky sessions)

  • Caching — CDN (static), Redis/Memcached (app), browser cache, HTTP cache headers

  • Database scaling — read replicas (read from replica, write to primary), sharding (horizontal partitioning), vertical scaling

  • Async processing — offload slow tasks (email, image processing) to queues (SQS, RabbitMQ, Bull)

  • CDN — serve static assets and cacheable content from edge locations near users
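The three load-balancing strategies named in the list above can be sketched in a few lines. This is a minimal in-process illustration, not how ALB or Nginx implement them; the class names and the `pick`/`release` methods are hypothetical, and backends are plain strings.

```python
import hashlib
import itertools


class RoundRobin:
    """Hand out backends in a fixed rotation."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self, client_ip=None):
        return next(self._cycle)


class LeastConnections:
    """Pick the backend with the fewest in-flight requests."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def pick(self, client_ip=None):
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1  # caller must release() when the request ends
        return backend

    def release(self, backend):
        self.active[backend] -= 1


class IpHash:
    """Map a client IP to a stable backend (sticky sessions)."""

    def __init__(self, backends):
        self.backends = list(backends)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.backends)
        return self.backends[index]
```

Note that the modulo in `IpHash` remaps most clients whenever the backend list changes size, which is one reason storing sessions in Redis is usually preferred over relying on stickiness.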

CAP Theorem

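The CAP theorem covers three properties: Consistency (every read gets the most recent write), Availability (every request gets a non-error response), and Partition Tolerance (the system keeps working despite network partitions). Because partitions do happen in real networks, the practical choice is what to give up while one is in progress: consistency or availability. Systems are therefore usually described as CP (e.g., MongoDB, or PostgreSQL with synchronous replication) or AP (e.g., Cassandra or DynamoDB when using eventually consistent reads). Several of these databases let you tune the trade-off per operation, so treat the labels as defaults rather than fixed properties.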

Common Components

Client → DNS → CDN (static/cached) → Load Balancer → App Servers → Cache → Database
                                                    ↓
                                             Message Queue → Workers

Key components:
┌──────────────────────────────────────────────────────────┐
│ DNS          — Route 53, Cloudflare (geo-routing, failover)│
│ CDN          — CloudFront, Fastly (static assets, caching) │
│ Load Balancer— ALB, Nginx (health checks, SSL termination) │
│ App Servers  — stateless, auto-scaling group               │
│ Cache Layer  — Redis (sessions, hot data, rate limiting)   │
│ Primary DB   — PostgreSQL, MySQL (writes)                  │
│ Read Replica — for read-heavy queries                      │
│ Object Store — S3 (uploads, large files, backups)          │
│ Message Queue— SQS, RabbitMQ (async jobs, decoupling)      │
│ Workers      — process jobs (emails, image resize, reports)│
│ Search       — Elasticsearch (full-text, faceted search)   │
│ Monitoring   — Datadog, CloudWatch (metrics, alerts)       │
│ Log Aggregator— Elasticsearch/Loki + Grafana               │
└──────────────────────────────────────────────────────────┘

Database Selection Guide

  • PostgreSQL/MySQL — structured data, complex queries, ACID, strong consistency

  • MongoDB — flexible schema, documents with nested data, rapid iteration

  • Redis — caching, sessions, rate limiting, leaderboards, pub/sub

  • Elasticsearch — full-text search, log analytics, faceted filtering

  • Cassandra/DynamoDB — write-heavy workloads at massive scale, time-series, multi-region

  • S3/Object Store — files, images, videos, backups, data lake

Reliability & Interview Framework

Reliability Patterns

  • Circuit Breaker — stop calling a failing service after N failures; half-open state to probe recovery

  • Retry with backoff — retry transient failures with exponential backoff + jitter

  • Timeout — always set timeouts on outbound calls; fail fast, don't let threads pile up

  • Bulkhead — isolate resources per service so one slow service doesn't exhaust all thread pools

  • Idempotency — make retries safe by ensuring duplicate requests produce the same result (e.g., by deduplicating on a client-supplied request key)

  • Health checks — /health endpoint for load balancer; /ready for readiness

  • Graceful shutdown — finish in-flight requests before shutting down; drain connections
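Two of the patterns above, the circuit breaker and retry with exponential backoff plus jitter, are easier to reason about in code. The following is a single-threaded sketch under stated assumptions: `CircuitBreaker`, `CircuitOpenError`, `backoff_delay`, and `retry` are hypothetical names, the thresholds are arbitrary defaults, and a production breaker would also need locking and a limit on concurrent half-open probes.

```python
import random
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds to stay open before probing
        self.clock = clock              # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open, failing fast")
            # Otherwise half-open: let this one call through as a probe.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed probe
            raise
        self.failures = 0
        self.opened_at = None  # a success closes the circuit
        return result


def backoff_delay(attempt, base=0.1, cap=10.0, rng=random.random):
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * 2 ** attempt)


def retry(fn, attempts=4, sleep=time.sleep, **backoff_kwargs):
    """Call fn, sleeping with backoff + jitter between failed attempts."""
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # do not keep hammering a breaker that is already open
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(backoff_delay(attempt, **backoff_kwargs))
```

A typical composition would be `retry(lambda: breaker.call(fetch_profile, user_id))`, where `fetch_profile` stands in for any outbound call that already has its own timeout.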

System Design Interview — RESHADED Framework

1. Requirements clarification (5 min)
   - Functional: what does the system DO?
   - Non-functional: scale, latency, availability, consistency
   - Out of scope: what are we NOT building?
   - "How many users? DAU? Writes/reads per second?"

2. Estimation (3 min)
   - Average RPS ≈ DAU × avg requests per user per day ÷ 86,400 (seconds/day)
   - Storage: object size × writes/day × retention

3. System Interface (2 min)
   - Key API endpoints / data model

4. High-Level Design (10 min)
   - Draw the major components: client, API gateway, services, DB, cache, queue
   - Data flow for the main use cases

5. Detailed Design (15 min)
   - Deep-dive the interesting/hard parts
   - Database schema, sharding strategy, cache invalidation

6. Bottlenecks & Trade-offs (5 min)
   - Single points of failure
   - What breaks at 10x scale?
   - Cost vs performance trade-offs

Common numbers to know:
- Read from memory: ~100ns
- Read from SSD: ~100µs (1000× slower)
- Network round trip: ~0.5ms within a datacenter, ~10ms between nearby regions, ~150ms intercontinental
- 1 million requests/day = ~12 RPS
- 1 billion requests/day = ~12,000 RPS
- Average web request: ~1KB
- 1 million users × 1KB each = ~1GB total
- 99.9% availability = ~8.8 hours downtime/year
- 99.99% availability = ~53 minutes downtime/year
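The estimation step and the numbers above reduce to a few lines of arithmetic. The sketch below works through them for a hypothetical service (10 million DAU, 20 requests per user per day, 1 KB objects, 5 million writes per day kept for a year); the function names and workload figures are illustrative assumptions, not taken from any real system.

```python
SECONDS_PER_DAY = 86_400


def average_rps(daily_active_users, requests_per_user_per_day):
    """Average requests per second over a full day."""
    return daily_active_users * requests_per_user_per_day / SECONDS_PER_DAY


def storage_bytes(object_size_bytes, writes_per_day, retention_days):
    """Raw storage needed, ignoring replication and indexes."""
    return object_size_bytes * writes_per_day * retention_days


def downtime_hours_per_year(availability_percent):
    """Allowed downtime per 365-day year for a given availability target."""
    return (1 - availability_percent / 100) * 365 * 24


print(round(average_rps(10_000_000, 20)))             # 2315
print(storage_bytes(1_000, 5_000_000, 365) / 1e12)    # 1.825 (TB)
print(round(downtime_hours_per_year(99.9), 2))        # 8.76
print(round(downtime_hours_per_year(99.99) * 60, 1))  # 52.6 (minutes)
```

These are averages; peak traffic is typically higher, so state the peak-to-average factor you assume when you present the estimate.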

Caching Strategy Reference

  • Cache-aside (lazy loading) — app checks cache, on miss fetches from DB and writes to cache. Simple, but initial miss is slow

  • Write-through — write to cache AND DB synchronously. Cache always up-to-date, but slower writes

  • Write-behind (write-back) — write to cache first, async write to DB. Fast writes, risk of data loss

  • Read-through — cache handles DB fetch on miss. Simpler app code, cache library manages it
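Cache-aside, the first strategy in the list above, is the one most often written by hand. Here is a minimal sketch that uses a dict as a stand-in for Redis or Memcached; `CacheAside`, `load_from_db`, and the TTL default are hypothetical, and the class is not thread-safe.

```python
import time


class CacheAside:
    """Cache-aside with a TTL, using a dict as a stand-in for Redis."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self.load_from_db = load_from_db  # hypothetical DB lookup: key -> value
        self.ttl = ttl_seconds
        self.clock = clock                # injectable so tests can fake time
        self._store = {}                  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]                     # cache hit
        value = self.load_from_db(key)          # miss or expired: go to the DB
        self._store[key] = (value, self.clock() + self.ttl)
        return value

    def invalidate(self, key):
        """Call after writing to the DB so the next read reloads the value."""
        self._store.pop(key, None)
```

The application owns both the read path and the invalidation here, which is the difference from read-through, where the cache layer performs the database fetch itself.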
