Design In-Memory Rate Limiter

Ashish Pratap Singh

Difficulty: Hard

In this chapter, we will explore the low-level design of an in-memory rate limiter in detail.

Let's start by clarifying the requirements:

1. Clarifying Requirements

Before starting the design, it's important to ask thoughtful questions to uncover hidden assumptions, clarify ambiguities, and define the system's scope more precisely.

After discussing these questions with the interviewer, we can summarize the key system requirements.

1.1 Functional Requirements

  • Support rate limiting on a per-user basis.
  • Enforce a fixed number of allowed requests (e.g., 100) within a defined time window (e.g., 60 seconds).
  • Reject requests that exceed the allowed limit and return an appropriate response.
  • Provide a simple way to simulate requests in a demo or main method.

1.2 Non-Functional Requirements

  • Thread-Safety: The rate limiter must handle concurrent access from multiple threads without race conditions.
  • Modularity: The system should follow object-oriented design principles with clear separation of concerns.
  • Extensibility: The design should be flexible enough to support other rate limiting strategies like sliding window or token bucket.
  • Maintainability: The codebase should be clean, testable, and easy to extend or debug.
  • Performance: The implementation should efficiently support high-frequency request patterns using optimal data structures.

2. Identifying Core Entities

The core of our design challenge is that there isn't one "best" rate-limiting algorithm. Each has its own trade-offs. This is a perfect scenario for the Strategy Design Pattern.

We can define a common RateLimitingStrategy interface and create concrete implementations for each algorithm. This allows the main RateLimiterService class to be completely decoupled from the specific algorithm being used.

Popular Algorithms to Consider:

  1. Token Bucket: A simple and popular algorithm. A bucket has a fixed capacity of tokens, which are refilled at a constant rate. Each request consumes one token. If the bucket is empty, the request is rejected.
  2. Fixed Window Counter: The simplest approach. A time window is divided into fixed slots (e.g., 0-60 seconds). We count requests in the current window. At the start of a new window, the count resets. Weakness: A burst of traffic at the edge of a window can exceed the rate (e.g., 10 requests at 00:59 and 10 at 01:00).
  3. Sliding Window Log: The most accurate approach. We store timestamps of all requests within the window. When a new request arrives, we discard all timestamps older than the window and count the remaining ones. Weakness: Can be memory-intensive.
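
The sliding window log approach described above can be sketched in a few lines. This is a minimal, single-user, single-threaded illustration (the class and method names here are our own, not part of the design that follows); the timestamp is passed in explicitly so the behavior is easy to follow:

```python
from collections import deque

class SlidingWindowLog:
    """Allow at most max_requests within the trailing window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.timestamps = deque()  # request times for one user, oldest first

    def allow_request(self, now: float) -> bool:
        # Drop timestamps that have fallen out of the trailing window
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False
```

Because every allowed request's timestamp is stored, memory grows with the request rate, which is exactly the weakness noted above.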

Key Entities/Classes:

  • RateLimiterService (Singleton & Facade): The main entry point for clients. It holds the active rate-limiting strategy and delegates each request to it.
  • RateLimitingStrategy (Interface): Defines the allowRequest(userId) contract that all algorithms must implement.
  • FixedWindowStrategy (Concrete Strategy): An implementation that counts each user's requests within a fixed time window.
  • TokenBucketStrategy (Concrete Strategy): An implementation that maintains a per-user bucket of tokens that refills at a constant rate.
  • UserRequestInfo and TokenBucket: Simple data holders that track per-user state for their respective strategies.

3. Designing Classes and Relationships

This section breaks down the system's architecture into its fundamental classes, their responsibilities, and the relationships that connect them. We also explore the key design patterns that provide robustness and flexibility to the solution.

3.1 Class Definitions

The system is composed of several types of classes, each with a distinct role.

Enums

There are no enums used in this design.

Data Classes

  • UserRequestInfo: A data holder used by FixedWindowStrategy. It tracks the windowStart time and the requestCount for a specific user, guarded by a per-user lock for thread safety.
  • TokenBucket: A private inner class within TokenBucketStrategy. It models the token bucket for a single user, containing its capacity, current number of tokens, refillRatePerSecond, and the lastRefillTimestamp.

Core Classes

  • RateLimitingStrategy (Interface): This is the core of the Strategy Pattern. It defines a single contract, allowRequest(String userId), that all concrete rate-limiting algorithms must implement. This allows the system to switch between different limiting strategies seamlessly.
  • FixedWindowStrategy (Concrete Strategy): An implementation of RateLimitingStrategy that limits the number of requests within a fixed time window. It uses a map to store UserRequestInfo for each user to track their request counts.
  • TokenBucketStrategy (Concrete Strategy): Another implementation of RateLimitingStrategy that uses the token bucket algorithm. This strategy allows for bursts of traffic by maintaining a bucket of tokens for each user that refills at a constant rate.
  • RateLimiterService (Singleton & Facade): The primary entry point for any client. It holds a reference to the currently active RateLimitingStrategy. It simplifies the interaction for the client, which only needs to call a single handleRequest method, delegating the actual rate-limiting logic to the configured strategy.

3.2 Class Relationships

The relationships between classes define the system's structure and data flow.

Composition

  • RateLimiterService "has-a" RateLimitingStrategy. The service's behavior is defined by the strategy it holds.
  • FixedWindowStrategy "has-a" map of UserRequestInfo objects to maintain state for each user.
  • TokenBucketStrategy "has-a" map of TokenBucket objects.

Inheritance

  • FixedWindowStrategy and TokenBucketStrategy both implement the RateLimitingStrategy interface. This allows them to be used interchangeably by the RateLimiterService.

Dependency

  • The client (RateLimiterDemo) depends on the RateLimiterService to handle requests. It does not need to know about the concrete strategies.
  • The RateLimiterService depends on the RateLimitingStrategy interface, not the concrete implementations.

3.3 Key Design Patterns

Strategy Pattern

This is the primary design pattern used. It allows the rate-limiting algorithm to be selected and changed at runtime. The RateLimiterService (Context) delegates the decision-making process to a concrete RateLimitingStrategy object (FixedWindowStrategy or TokenBucketStrategy). This makes the system flexible and easy to extend with new limiting algorithms (e.g., Sliding Window Log) without modifying the service class.

Singleton Pattern

The RateLimiterService is implemented as a singleton. This ensures that there is only one instance of the rate limiter service controlling the policies for the entire application, providing a single, global point of access and preventing conflicting states.

Facade Pattern

The RateLimiterService also acts as a facade. It provides a simple, unified interface (handleRequest) to the client, hiding the underlying complexity of which strategy is being used and how it manages user-specific data like time windows or token buckets.

3.4 Full Class Diagram

Rate Limiter Class Diagram

4. Implementation

4.1 RateLimitingStrategy Interface

This interface defines the contract for all rate limiting strategies. Each strategy decides whether a user’s request should be allowed based on internal logic.

from abc import ABC, abstractmethod

class RateLimitingStrategy(ABC):
    @abstractmethod
    def allow_request(self, user_id: str) -> bool:
        pass

4.2 FixedWindowStrategy

This strategy limits requests within a fixed, discrete time window (e.g., 100 requests per minute).

import threading
import time
from typing import Dict

class UserRequestInfo:
    """Per-user state: window start time and request count, guarded by a lock."""

    def __init__(self, window_start: int):
        self.window_start = window_start
        self.request_count = 0
        self.lock = threading.Lock()

    def reset(self, window_start: int):
        self.window_start = window_start
        self.request_count = 0


class FixedWindowStrategy(RateLimitingStrategy):
    def __init__(self, max_requests: int, window_size_in_seconds: int):
        self.max_requests = max_requests
        self.window_size_in_millis = window_size_in_seconds * 1000
        self.user_request_map: Dict[str, UserRequestInfo] = {}
        self._lock = threading.Lock()

    def allow_request(self, user_id: str) -> bool:
        current_time = int(time.time() * 1000)

        # Hold the global lock only long enough to look up (or create) the user's entry
        with self._lock:
            if user_id not in self.user_request_map:
                self.user_request_map[user_id] = UserRequestInfo(current_time)
            request_info = self.user_request_map[user_id]

        # Per-user lock: requests from different users don't contend with each other
        with request_info.lock:
            if current_time - request_info.window_start >= self.window_size_in_millis:
                request_info.reset(current_time)

            if request_info.request_count < self.max_requests:
                request_info.request_count += 1
                return True
            return False

  • Algorithm: This class maintains a start time (windowStart) and a counter (requestCount) for each user. When a request arrives, it checks whether the current time falls outside the window. If it does, the window and counter are reset. Otherwise, the request is allowed only if the counter has not yet reached maxRequests.
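
To see the window-boundary weakness from Section 2 concretely, here is a tiny single-user sketch (the FixedWindowCounter name is our own) using slot-aligned windows and an injectable clock: a burst of 5 requests at the end of one window plus 5 at the start of the next all succeed, so 10 requests pass within about one second even though the limit is 5 per 10 seconds:

```python
class FixedWindowCounter:
    """Minimal single-user fixed-window counter with slot-aligned windows (times in ms)."""

    def __init__(self, max_requests: int, window_millis: int):
        self.max_requests = max_requests
        self.window_millis = window_millis
        self.window_start = None
        self.count = 0

    def allow_request(self, now_millis: int) -> bool:
        # Align the window to fixed slots: [0, 10s), [10s, 20s), ...
        window_start = (now_millis // self.window_millis) * self.window_millis
        if window_start != self.window_start:
            self.window_start = window_start
            self.count = 0
        if self.count < self.max_requests:
            self.count += 1
            return True
        return False


limiter = FixedWindowCounter(max_requests=5, window_millis=10_000)
# 5 requests at t=9.5s: all allowed, and the 0-10s window is now full
burst_one = [limiter.allow_request(9_500) for _ in range(5)]
# 5 more at t=10.5s: a new window has started, so these are all allowed too
burst_two = [limiter.allow_request(10_500) for _ in range(5)]
print(burst_one, burst_two)  # all 10 requests allowed within ~1 second
```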

4.3 TokenBucketStrategy

The Token Bucket strategy provides a more flexible rate limit, allowing for bursts of requests by using a "bucket" of tokens that refills over time.

class TokenBucket:
    """Per-user state: capacity, current tokens, refill rate, and last refill time."""

    def __init__(self, capacity: int, refill_rate_per_second: int, current_time_millis: int):
        self.capacity = capacity
        self.refill_rate_per_second = refill_rate_per_second
        self.tokens = capacity
        self.last_refill_timestamp = current_time_millis
        self.lock = threading.Lock()

    def refill(self, current_time: int):
        elapsed_time = current_time - self.last_refill_timestamp
        tokens_to_add = int((elapsed_time / 1000.0) * self.refill_rate_per_second)

        if tokens_to_add > 0:
            self.tokens = min(self.capacity, self.tokens + tokens_to_add)
            self.last_refill_timestamp = current_time


class TokenBucketStrategy(RateLimitingStrategy):
    def __init__(self, capacity: int, refill_rate_per_second: int):
        self.capacity = capacity
        self.refill_rate_per_second = refill_rate_per_second
        self.user_buckets: Dict[str, TokenBucket] = {}
        self._lock = threading.Lock()

    def allow_request(self, user_id: str) -> bool:
        current_time = int(time.time() * 1000)

        # Hold the global lock only long enough to look up (or create) the user's bucket
        with self._lock:
            if user_id not in self.user_buckets:
                self.user_buckets[user_id] = TokenBucket(
                    self.capacity, self.refill_rate_per_second, current_time)
            bucket = self.user_buckets[user_id]

        # Per-user lock: refill and consume atomically for this bucket
        with bucket.lock:
            bucket.refill(current_time)
            if bucket.tokens > 0:
                bucket.tokens -= 1
                return True
            return False

  • Algorithm: Each user has a TokenBucket with a fixed capacity. Tokens are added to the bucket at a constant refillRate. When a request arrives, it attempts to consume one token. If the bucket is empty, the request is denied. This allows for bursts of traffic up to the bucket's capacity.
  • Refill Logic: The refill method calculates how many tokens should have been generated since the last refill and adds them to the bucket, ensuring the total never exceeds the capacity.
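
As a quick sanity check, the refill arithmetic can be worked through in isolation. This standalone helper (our own, not part of the design above) applies the same formula as TokenBucket.refill:

```python
def tokens_after_refill(tokens: int, capacity: int,
                        refill_rate_per_second: int, elapsed_millis: int) -> int:
    """Same arithmetic as TokenBucket.refill: whole tokens only, capped at capacity."""
    tokens_to_add = int((elapsed_millis / 1000.0) * refill_rate_per_second)
    return min(capacity, tokens + tokens_to_add)

# 2.5 seconds at 2 tokens/sec generates int(5.0) = 5 tokens
print(tokens_after_refill(tokens=0, capacity=10, refill_rate_per_second=2, elapsed_millis=2_500))  # 5
# The bucket never exceeds its capacity: min(10, 8 + 5) = 10
print(tokens_after_refill(tokens=8, capacity=10, refill_rate_per_second=2, elapsed_millis=2_500))  # 10
```

Note that because tokens_to_add is truncated to an integer, short elapsed intervals (e.g., 400 ms at 2 tokens/sec) add zero tokens; those fractions are not lost in the real class, since last_refill_timestamp only advances when at least one token is added.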

4.4 RateLimiterService

This class acts as a central Singleton and Facade, providing a simplified API for clients to use the rate-limiting functionality.

class RateLimiterService:
    _instance = None
    _lock = threading.Lock()

    def __init__(self):
        if RateLimiterService._instance is not None:
            raise Exception("This class is a singleton!")
        self.rate_limiting_strategy = None

    @staticmethod
    def get_instance():
        # Double-checked locking: lock only on first construction
        if RateLimiterService._instance is None:
            with RateLimiterService._lock:
                if RateLimiterService._instance is None:
                    RateLimiterService._instance = RateLimiterService()
        return RateLimiterService._instance

    def set_rate_limiter(self, rate_limiting_strategy: RateLimitingStrategy):
        self.rate_limiting_strategy = rate_limiting_strategy

    def handle_request(self, user_id: str):
        if self.rate_limiting_strategy is None:
            raise RuntimeError("No rate limiting strategy configured")
        if self.rate_limiting_strategy.allow_request(user_id):
            print(f"Request from user {user_id} is allowed")
        else:
            print(f"Request from user {user_id} is rejected: Rate limit exceeded")

It delegates the request evaluation to the chosen strategy and acts as the facade to client code.

  • Singleton Pattern: The service is a Singleton to ensure there is a single, globally accessible instance controlling the rate-limiting policies for the entire application.
  • Facade Pattern: It provides a simple handleRequest method that hides the complexity of the underlying strategy. The client code doesn't need to know which algorithm is currently active; it just interacts with this clean API.

4.5 RateLimiterDemo

This demo simulates client requests under both rate limiting strategies using multiple threads. It demonstrates how the same service interface supports different strategies interchangeably.

from concurrent.futures import ThreadPoolExecutor

class RateLimiterDemo:
    @staticmethod
    def main():
        user_id = "user123"

        print("=== Fixed Window Demo ===")
        RateLimiterDemo.run_fixed_window_demo(user_id)

        print("\n=== Token Bucket Demo ===")
        RateLimiterDemo.run_token_bucket_demo(user_id)

    @staticmethod
    def run_fixed_window_demo(user_id: str):
        max_requests = 5
        window_seconds = 10

        rate_limiter = FixedWindowStrategy(max_requests, window_seconds)
        service = RateLimiterService.get_instance()
        service.set_rate_limiter(rate_limiter)

        with ThreadPoolExecutor(max_workers=3) as executor:
            for _ in range(10):
                executor.submit(service.handle_request, user_id)
                time.sleep(0.5)

    @staticmethod
    def run_token_bucket_demo(user_id: str):
        capacity = 5
        refill_rate = 1  # 1 token per second

        token_bucket_limiter = TokenBucketStrategy(capacity, refill_rate)
        service = RateLimiterService.get_instance()
        service.set_rate_limiter(token_bucket_limiter)

        with ThreadPoolExecutor(max_workers=2) as executor:
            # Simulate 10 rapid requests, faster than the refill rate
            for _ in range(10):
                executor.submit(service.handle_request, user_id)
                time.sleep(0.3)


if __name__ == "__main__":
    RateLimiterDemo.main()
