System Design Fundamentals
System design isn't about memorizing solutions—it's about understanding trade-offs. Every decision has costs and benefits. The best engineers can articulate those trade-offs clearly and make informed decisions based on actual requirements.
The Foundation: Requirements First
Before drawing any boxes and arrows, understand what you're building:
Functional Requirements
What does the system need to do?
- Users can upload files
- Users can share files with others
- Files can be organized in folders
Non-Functional Requirements
How well does it need to work?
- Scale: How many users? How much data?
- Latency: How fast should responses be?
- Availability: What uptime is required?
- Consistency: Can we tolerate stale data?
There's a huge difference between designing for 1,000 users and 1,000,000 users. Always quantify requirements. "Fast" isn't a requirement—"95th percentile latency under 200ms" is.
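Those quantified numbers feed straight into capacity estimates. As a quick back-of-envelope sketch for a hypothetical file-sharing service (every input below is an illustrative assumption, not a real measurement):

```javascript
// Back-of-envelope capacity estimate. All inputs are illustrative assumptions.
function estimateCapacity({ dailyActiveUsers, uploadsPerUserPerDay, avgFileSizeMB }) {
  const uploadsPerDay = dailyActiveUsers * uploadsPerUserPerDay;
  const uploadsPerSecond = uploadsPerDay / 86400; // seconds per day
  const storagePerDayGB = (uploadsPerDay * avgFileSizeMB) / 1024;
  return { uploadsPerSecond, storagePerDayGB };
}

const est = estimateCapacity({
  dailyActiveUsers: 1_000_000,
  uploadsPerUserPerDay: 2,
  avgFileSizeMB: 5,
});
// ~23 uploads/second, ~9,766 GB of new storage per day
```

Even rough numbers like these tell you whether a single database can cope or whether you need to think about sharding from day one.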
1. Scalability: Vertical vs. Horizontal
Vertical Scaling (Scale Up)
Add more power to existing machines: more CPU, more RAM, bigger disks.
- Pros: Simple, no code changes needed
- Cons: Physical limits, single point of failure, expensive
Horizontal Scaling (Scale Out)
Add more machines to distribute the load.
- Pros: Virtually unlimited scaling, better fault tolerance
- Cons: More complex, requires distributed system thinking
// When you scale horizontally, you need a load balancer
Client → Load Balancer → [Server 1]
                       → [Server 2]
                       → [Server 3]
// Load balancing strategies:
// - Round Robin: Simple rotation
// - Least Connections: Route to server with fewest active connections
// - IP Hash: Same client always hits same server (useful for sessions)
// - Weighted: Route more traffic to more powerful servers
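As a sketch, the first two strategies above might look like this (the server lists and connection counts are hypothetical):

```javascript
// Round Robin: rotate through servers in order.
function makeRoundRobin(servers) {
  let next = 0;
  return () => servers[next++ % servers.length];
}

// Least Connections: pick the server with the fewest active connections.
// Each server object tracks its own activeConnections count.
function leastConnections(servers) {
  return servers.reduce((best, s) =>
    s.activeConnections < best.activeConnections ? s : best
  );
}

const pick = makeRoundRobin(['s1', 's2', 's3']);
pick(); // 's1'
pick(); // 's2'

const busy = [
  { name: 's1', activeConnections: 12 },
  { name: 's2', activeConnections: 3 },
  { name: 's3', activeConnections: 7 },
];
leastConnections(busy).name; // 's2'
```

Real balancers layer health checks on top of this: a server that fails its checks is removed from the rotation entirely.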
2. The CAP Theorem: Pick Two (Sort Of)
In a distributed system, you can't have all three:
- Consistency: Every read gets the most recent write
- Availability: Every request gets a response
- Partition Tolerance: System works despite network failures
In practice, network partitions happen, so you're choosing between consistency and availability during failures.
// CP System (Consistency + Partition Tolerance)
// Example: Banking system
// During a network partition, some nodes become unavailable
// but data is always consistent
// AP System (Availability + Partition Tolerance)
// Example: Social media feed
// System always responds, but you might see stale data
// Eventually consistent
Most systems don't need to make this choice most of the time. CAP only applies during network partitions. Design for the common case, have a strategy for the edge cases.
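One common way Dynamo-style replicated stores tune this trade-off is quorum arithmetic: with N replicas, requiring W write acknowledgements and R read acknowledgements such that R + W > N forces every read quorum to overlap every write quorum, so reads see the latest acknowledged write. A minimal sketch of the rule:

```javascript
// Quorum rule for a replicated store with N copies of each key:
// if R + W > N, any R replicas you read from must include at least
// one replica that acknowledged the latest write.
function isStronglyConsistent(N, W, R) {
  return R + W > N;
}

isStronglyConsistent(3, 2, 2); // true  - quorums overlap
isStronglyConsistent(3, 1, 1); // false - fast, but eventually consistent
```

Lowering W or R buys latency and availability at the cost of possibly stale reads, which is exactly the consistency/availability dial described above.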
3. Databases: SQL vs. NoSQL
SQL Databases (PostgreSQL, MySQL)
Best for: Structured data, complex queries, ACID transactions
-- Strong consistency, complex joins
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id
HAVING COUNT(o.id) > 5;
NoSQL Databases (MongoDB, Cassandra, DynamoDB)
Best for: Flexible schemas, high write throughput, horizontal scaling
// Document store - flexible schema
{
  "user_id": "123",
  "name": "John",
  "orders": [
    { "id": "o1", "total": 99.99, "items": [...] },
    { "id": "o2", "total": 149.99, "items": [...] }
  ],
  "preferences": {
    "theme": "dark",
    "notifications": true
  }
}
When to Use What
- SQL: Financial data, user accounts, anything requiring transactions
- Document DB: Content management, user profiles, catalogs
- Key-Value: Caching, session storage, real-time data
- Wide-Column: Time-series data, analytics, high-write workloads
- Graph: Social networks, recommendation engines, fraud detection
4. Caching: The Performance Multiplier
Caching is the #1 way to improve performance. But it introduces complexity.
// Cache-Aside Pattern (most common)
function getUser(userId) {
  // Try cache first
  let user = cache.get(`user:${userId}`);
  if (user) {
    return user; // Cache hit
  }

  // Cache miss - fetch from database
  user = database.query('SELECT * FROM users WHERE id = ?', userId);

  // Store in cache for next time
  cache.set(`user:${userId}`, user, { ttl: 3600 });
  return user;
}
Cache Invalidation Strategies
- TTL (Time To Live): Cache expires after a set time. Simple but can serve stale data.
- Write-Through: Update cache when updating database. Consistent but slower writes.
- Write-Behind: Update cache immediately, sync to database later. Fast but complex.
- Event-Based: Invalidate cache when data changes. Requires event infrastructure.
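As a sketch, write-through keeps the cache and database in lockstep (the in-memory `cache` and `db` maps below are stand-ins for a real cache and a real database):

```javascript
// Write-through sketch: every write hits the database first, then the
// cache, so the two never diverge - at the cost of slower writes.
const cache = new Map();
const db = new Map();

function writeThrough(key, value) {
  db.set(key, value);    // source of truth first
  cache.set(key, value); // then keep the cache in sync
}

function read(key) {
  if (cache.has(key)) return cache.get(key);   // cache hit
  const value = db.get(key);                   // miss: fall back to the database
  if (value !== undefined) cache.set(key, value);
  return value;
}
```

Compare this with the cache-aside pattern above: cache-aside tolerates brief staleness between the database write and the next TTL expiry, while write-through pays for consistency on every write.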
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton. Take cache invalidation seriously.
5. Message Queues: Decoupling Services
When Service A doesn't need an immediate response from Service B, use a queue.
// Without queue - tight coupling
async function processOrder(order) {
  await saveOrder(order);
  await sendEmail(order);       // If email fails, order fails
  await updateInventory(order); // If inventory fails, order fails
  await notifyShipping(order);  // If shipping fails, order fails
}

// With queue - loose coupling
async function processOrder(order) {
  await saveOrder(order);
  // These happen asynchronously, can retry independently
  queue.publish('order.created', order);
}

// Separate consumers handle each concern
queue.subscribe('order.created', async (order) => {
  await sendEmail(order);
});

queue.subscribe('order.created', async (order) => {
  await updateInventory(order);
});
6. Database Replication and Sharding
Replication: Copies of Your Data
// Master-Slave Replication
// Writes go to master, reads can go to slaves
[Write] → [Master DB] → [Slave 1] ← [Read]
                      → [Slave 2] ← [Read]
                      → [Slave 3] ← [Read]
// Benefits: Read scaling, fault tolerance
// Drawbacks: Replication lag, write bottleneck
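A sketch of how application code might route queries under this topology (the connection names are hypothetical; note that because of replication lag, a read issued right after a write may not see that write on a replica):

```javascript
// Route writes to the master, spread reads across replicas.
function makeRouter(connections) {
  let next = 0;
  return function route(query) {
    const isRead = /^\s*SELECT/i.test(query);
    if (!isRead) return connections.master;
    // Rotate reads across replicas to spread load.
    return connections.replicas[next++ % connections.replicas.length];
  };
}

const route = makeRouter({ master: 'master', replicas: ['replica1', 'replica2'] });
route('SELECT * FROM users');   // 'replica1'
route('INSERT INTO users ...'); // 'master'
```

Read-your-own-writes flows (e.g. showing a user the profile they just edited) often pin those reads to the master for exactly this reason.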
Sharding: Splitting Your Data
// Horizontal sharding by user_id
// user_id % 3 == 0 → Shard 0
// user_id % 3 == 1 → Shard 1
// user_id % 3 == 2 → Shard 2
function getShard(userId) {
  const shardCount = 3;
  return userId % shardCount; // Simple modulo sharding
}
// Drawback: adding a shard changes almost every key's mapping
// Better: Consistent hashing for easier scaling
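A minimal consistent-hashing sketch (the hash function is a toy FNV-1a, not a production choice, and real rings also place multiple virtual nodes per shard to smooth the distribution):

```javascript
// Toy 32-bit FNV-1a hash - illustrative only.
function hash(str) {
  let h = 0x811c9dc5;
  for (const ch of str) {
    h ^= ch.charCodeAt(0);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// Keys and shards hash onto the same ring; a key belongs to the first
// shard at or after its position (wrapping around). Adding or removing
// a shard only remaps the keys between it and its neighbor, unlike
// modulo sharding, which reshuffles almost everything.
class HashRing {
  constructor(shards) {
    this.ring = shards
      .map((shard) => ({ shard, pos: hash(shard) }))
      .sort((a, b) => a.pos - b.pos);
  }

  getShard(key) {
    const pos = hash(String(key));
    const entry = this.ring.find((e) => e.pos >= pos) ?? this.ring[0];
    return entry.shard;
  }
}
```

With this scheme, bringing a fourth shard online only takes over the keys between its ring position and its predecessor's; every other key stays put.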
7. API Design for Scale
Rate Limiting
// Token bucket algorithm
class RateLimiter {
  constructor(tokensPerSecond, bucketSize) {
    this.tokens = bucketSize;
    this.lastRefill = Date.now();
    this.tokensPerSecond = tokensPerSecond;
    this.bucketSize = bucketSize;
  }

  tryConsume() {
    this.refill();
    // Require a whole token: refill() accrues fractional tokens,
    // and checking > 0 would let the count go negative.
    if (this.tokens >= 1) {
      this.tokens--;
      return true;
    }
    return false;
  }

  refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.bucketSize,
      this.tokens + elapsed * this.tokensPerSecond
    );
    this.lastRefill = now;
  }
}
Pagination
// Offset pagination - simple but slow at scale
GET /users?page=1000&limit=20
// Database: OFFSET 20000 LIMIT 20 (scans 20,000 rows!)
// Cursor pagination - efficient at any scale
GET /users?cursor=abc123&limit=20
// Database: WHERE id > cursor LIMIT 20 (uses index)
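A sketch of how a cursor might be produced and consumed, assuming `id` is an indexed, monotonically increasing column (the base64 wrapping just keeps the cursor opaque to clients; the query shape is illustrative):

```javascript
// The cursor is simply the last row's id, base64-encoded.
function encodeCursor(lastId) {
  return Buffer.from(String(lastId)).toString('base64');
}

function decodeCursor(cursor) {
  return Buffer.from(cursor, 'base64').toString('utf8');
}

// Build the keyset query for one page.
function pageQuery(cursor, limit) {
  const afterId = cursor ? decodeCursor(cursor) : '0';
  return {
    sql: 'SELECT * FROM users WHERE id > ? ORDER BY id LIMIT ?',
    params: [afterId, limit],
  };
}

const cursor = encodeCursor(123);
pageQuery(cursor, 20).params; // ['123', 20]
```

The server returns the last row's encoded id alongside each page; the client echoes it back to fetch the next page, and the database seeks directly via the index instead of scanning past skipped rows.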
8. Observability: Know Your System
The Three Pillars
- Logs: Detailed records of what happened
- Metrics: Numerical measurements over time
- Traces: Request flow across services
// Structured logging
logger.info('Order processed', {
  orderId: order.id,
  userId: order.userId,
  total: order.total,
  processingTimeMs: endTime - startTime,
  paymentMethod: order.paymentMethod
});
// Key metrics to track:
// - Request rate (requests/second)
// - Error rate (errors/requests)
// - Latency percentiles (p50, p95, p99)
// - Saturation (CPU, memory, connections)
Design Process: A Framework
1. Clarify requirements: Ask questions, quantify constraints
2. High-level design: Draw the major components
3. Deep dive: Detail critical paths and algorithms
4. Identify bottlenecks: Where will the system fail first?
5. Scale: How do we handle 10x, 100x traffic?
6. Trade-offs: What are we sacrificing for our choices?
Conclusion
System design is fundamentally about trade-offs. There's no perfect architecture—only the right architecture for your specific requirements, constraints, and team.
Start simple. Measure everything. Optimize the bottlenecks. Repeat.