Design In-Memory Rate Limiter

Ashish Pratap Singh

Difficulty: Hard

In this chapter, we will explore the low-level design of an in-memory rate limiter in detail.

Let's start by clarifying the requirements:

1. Clarifying Requirements

Before starting the design, it's important to ask thoughtful questions to uncover hidden assumptions, clarify ambiguities, and define the system's scope more precisely.

After discussing these questions with the interviewer, we can summarize the key system requirements.

1.1 Functional Requirements

  • Support rate limiting on a per-user basis.
  • Enforce a fixed number of allowed requests (e.g., 100) within a defined time window (e.g., 60 seconds).
  • Reject requests that exceed the allowed limit and return an appropriate response.
  • Provide a simple way to simulate requests in a demo or main method.

1.2 Non-Functional Requirements

  • Thread-Safety: The rate limiter must handle concurrent access from multiple threads without race conditions.
  • Modularity: The system should follow object-oriented design principles with clear separation of concerns.
  • Extensibility: The design should be flexible enough to support other rate limiting strategies like sliding window or token bucket.
  • Maintainability: The codebase should be clean, testable, and easy to extend or debug.
  • Performance: The implementation should efficiently support high-frequency request patterns using optimal data structures.

2. Identifying Core Entities

The core of our design challenge is that there isn't one "best" rate-limiting algorithm. Each has its own trade-offs. This is a perfect scenario for the Strategy Design Pattern.

We can define a common RateLimitingStrategy interface and create concrete implementations for each algorithm. This allows the main RateLimiterService class to be completely decoupled from the specific algorithm being used.

Popular Algorithms to Consider:

  1. Token Bucket: A simple and popular algorithm. A bucket has a fixed capacity of tokens, which are refilled at a constant rate. Each request consumes one token. If the bucket is empty, the request is rejected.
  2. Fixed Window Counter: The simplest approach. A time window is divided into fixed slots (e.g., 0-60 seconds). We count requests in the current window. At the start of a new window, the count resets. Weakness: A burst of traffic at the edge of a window can exceed the rate (e.g., 10 requests at 00:59 and 10 at 01:00).
  3. Sliding Window Log: The most accurate approach. We store timestamps of all requests within the window. When a new request arrives, we discard all timestamps older than the window and count the remaining ones. Weakness: Can be memory-intensive.
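
The sliding window log approach described above can be sketched in a few lines. This is a minimal, single-user, single-threaded illustration (the class and method names here are our own, not part of the design that follows); the timestamp is passed in explicitly so the behavior is easy to follow:

```python
from collections import deque

class SlidingWindowLog:
    """Allow at most max_requests within the trailing window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.timestamps = deque()  # request times for one user, oldest first

    def allow_request(self, now: float) -> bool:
        # Drop timestamps that have fallen out of the trailing window
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False
```

Because every allowed request's timestamp is stored, memory grows with the request rate, which is exactly the weakness noted above.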

Key Entities/Classes:

  • RateLimiterService (Singleton & Facade): The main entry point for clients. It holds the active rate-limiting strategy and delegates each request to it.
  • RateLimitingStrategy (Interface): Defines the allowRequest(userId) contract that all algorithms must implement.
  • FixedWindowStrategy (Concrete Strategy): An implementation that counts each user's requests within a fixed time window.
  • TokenBucketStrategy (Concrete Strategy): An implementation that maintains a per-user bucket of tokens that refills at a constant rate.
  • UserRequestInfo and TokenBucket: Simple data holders that track per-user state for their respective strategies.

3. Designing Classes and Relationships

This section breaks down the system's architecture into its fundamental classes, their responsibilities, and the relationships that connect them. We also explore the key design patterns that provide robustness and flexibility to the solution.

3.1 Class Definitions

The system is composed of several types of classes, each with a distinct role.

Enums

There are no enums used in this design.

Data Classes

  • UserRequestInfo: A data holder used by FixedWindowStrategy. It tracks the windowStart time and the requestCount for a specific user, guarded by a per-user lock for thread safety.
  • TokenBucket: A private inner class within TokenBucketStrategy. It models the token bucket for a single user, containing its capacity, current number of tokens, refillRatePerSecond, and the lastRefillTimestamp.

Core Classes

  • RateLimitingStrategy (Interface): This is the core of the Strategy Pattern. It defines a single contract, allowRequest(String userId), that all concrete rate-limiting algorithms must implement. This allows the system to switch between different limiting strategies seamlessly.
  • FixedWindowStrategy (Concrete Strategy): An implementation of RateLimitingStrategy that limits the number of requests within a fixed time window. It uses a map to store UserRequestInfo for each user to track their request counts.
  • TokenBucketStrategy (Concrete Strategy): Another implementation of RateLimitingStrategy that uses the token bucket algorithm. This strategy allows for bursts of traffic by maintaining a bucket of tokens for each user that refills at a constant rate.
  • RateLimiterService (Singleton & Facade): The primary entry point for any client. It holds a reference to the currently active RateLimitingStrategy. It simplifies the interaction for the client, which only needs to call a single handleRequest method, delegating the actual rate-limiting logic to the configured strategy.

3.2 Class Relationships

The relationships between classes define the system's structure and data flow.

Composition

  • RateLimiterService "has-a" RateLimitingStrategy. The service's behavior is defined by the strategy it holds.
  • FixedWindowStrategy "has-a" map of UserRequestInfo objects to maintain state for each user.
  • TokenBucketStrategy "has-a" map of TokenBucket objects.

Inheritance

  • FixedWindowStrategy and TokenBucketStrategy both implement the RateLimitingStrategy interface. This allows them to be used interchangeably by the RateLimiterService.

Dependency

  • The client (RateLimiterDemo) depends on the RateLimiterService to handle requests. It does not need to know about the concrete strategies.
  • The RateLimiterService depends on the RateLimitingStrategy interface, not the concrete implementations.

3.3 Key Design Patterns

Strategy Pattern

This is the primary design pattern used. It allows the rate-limiting algorithm to be selected and changed at runtime. The RateLimiterService (Context) delegates the decision-making process to a concrete RateLimitingStrategy object (FixedWindowStrategy or TokenBucketStrategy). This makes the system flexible and easy to extend with new limiting algorithms (e.g., Sliding Window Log) without modifying the service class.

Singleton Pattern

The RateLimiterService is implemented as a singleton. This ensures that there is only one instance of the rate limiter service controlling the policies for the entire application, providing a single, global point of access and preventing conflicting states.

Facade Pattern

The RateLimiterService also acts as a facade. It provides a simple, unified interface (handleRequest) to the client, hiding the underlying complexity of which strategy is being used and how it manages user-specific data like time windows or token buckets.

3.4 Full Class Diagram

Rate Limiter Class Diagram

4. Implementation

4.1 RateLimitingStrategy Interface

This interface defines the contract for all rate limiting strategies. Each strategy decides whether a user’s request should be allowed based on internal logic.

from abc import ABC, abstractmethod

class RateLimitingStrategy(ABC):
    @abstractmethod
    def allow_request(self, user_id: str) -> bool:
        pass

4.2 FixedWindowStrategy

This strategy limits requests within a fixed, discrete time window (e.g., 100 requests per minute).

import threading
import time
from typing import Dict

class UserRequestInfo:
    """Per-user state: window start time and request count, guarded by a lock."""

    def __init__(self, window_start: int):
        self.window_start = window_start
        self.request_count = 0
        self.lock = threading.Lock()

    def reset(self, window_start: int):
        self.window_start = window_start
        self.request_count = 0


class FixedWindowStrategy(RateLimitingStrategy):
    def __init__(self, max_requests: int, window_size_in_seconds: int):
        self.max_requests = max_requests
        self.window_size_in_millis = window_size_in_seconds * 1000
        self.user_request_map: Dict[str, UserRequestInfo] = {}
        self._lock = threading.Lock()

    def allow_request(self, user_id: str) -> bool:
        current_time = int(time.time() * 1000)

        # Hold the global lock only long enough to look up (or create) the user's entry
        with self._lock:
            if user_id not in self.user_request_map:
                self.user_request_map[user_id] = UserRequestInfo(current_time)
            request_info = self.user_request_map[user_id]

        # Per-user lock: requests from different users don't contend with each other
        with request_info.lock:
            if current_time - request_info.window_start >= self.window_size_in_millis:
                request_info.reset(current_time)

            if request_info.request_count < self.max_requests:
                request_info.request_count += 1
                return True
            return False

  • Algorithm: This class maintains a start time (windowStart) and a counter (requestCount) for each user. When a request arrives, it checks whether the current time falls outside the window. If it does, the window and counter are reset. Otherwise, the request is allowed only if the counter has not yet reached maxRequests.
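
To see the window-boundary weakness from Section 2 concretely, here is a tiny single-user sketch (the FixedWindowCounter name is our own) using slot-aligned windows and an injectable clock: a burst of 5 requests at the end of one window plus 5 at the start of the next all succeed, so 10 requests pass within about one second even though the limit is 5 per 10 seconds:

```python
class FixedWindowCounter:
    """Minimal single-user fixed-window counter with slot-aligned windows (times in ms)."""

    def __init__(self, max_requests: int, window_millis: int):
        self.max_requests = max_requests
        self.window_millis = window_millis
        self.window_start = None
        self.count = 0

    def allow_request(self, now_millis: int) -> bool:
        # Align the window to fixed slots: [0, 10s), [10s, 20s), ...
        window_start = (now_millis // self.window_millis) * self.window_millis
        if window_start != self.window_start:
            self.window_start = window_start
            self.count = 0
        if self.count < self.max_requests:
            self.count += 1
            return True
        return False


limiter = FixedWindowCounter(max_requests=5, window_millis=10_000)
# 5 requests at t=9.5s: all allowed, and the 0-10s window is now full
burst_one = [limiter.allow_request(9_500) for _ in range(5)]
# 5 more at t=10.5s: a new window has started, so these are all allowed too
burst_two = [limiter.allow_request(10_500) for _ in range(5)]
print(burst_one, burst_two)  # all 10 requests allowed within ~1 second
```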

4.3 TokenBucketStrategy

The Token Bucket strategy provides a more flexible rate limit, allowing for bursts of requests by using a "bucket" of tokens that refills over time.

class TokenBucket:
    """Per-user state: capacity, current tokens, refill rate, and last refill time."""

    def __init__(self, capacity: int, refill_rate_per_second: int, current_time_millis: int):
        self.capacity = capacity
        self.refill_rate_per_second = refill_rate_per_second
        self.tokens = capacity
        self.last_refill_timestamp = current_time_millis
        self.lock = threading.Lock()

    def refill(self, current_time: int):
        elapsed_time = current_time - self.last_refill_timestamp
        tokens_to_add = int((elapsed_time / 1000.0) * self.refill_rate_per_second)

        if tokens_to_add > 0:
            self.tokens = min(self.capacity, self.tokens + tokens_to_add)
            self.last_refill_timestamp = current_time


class TokenBucketStrategy(RateLimitingStrategy):
    def __init__(self, capacity: int, refill_rate_per_second: int):
        self.capacity = capacity
        self.refill_rate_per_second = refill_rate_per_second
        self.user_buckets: Dict[str, TokenBucket] = {}
        self._lock = threading.Lock()

    def allow_request(self, user_id: str) -> bool:
        current_time = int(time.time() * 1000)

        # Hold the global lock only long enough to look up (or create) the user's bucket
        with self._lock:
            if user_id not in self.user_buckets:
                self.user_buckets[user_id] = TokenBucket(
                    self.capacity, self.refill_rate_per_second, current_time)
            bucket = self.user_buckets[user_id]

        # Per-user lock: refill and consume atomically for this bucket
        with bucket.lock:
            bucket.refill(current_time)
            if bucket.tokens > 0:
                bucket.tokens -= 1
                return True
            return False

  • Algorithm: Each user has a TokenBucket with a fixed capacity. Tokens are added to the bucket at a constant refillRate. When a request arrives, it attempts to consume one token. If the bucket is empty, the request is denied. This allows for bursts of traffic up to the bucket's capacity.
  • Refill Logic: The refill method calculates how many tokens should have been generated since the last refill and adds them to the bucket, ensuring the total never exceeds the capacity.
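
As a quick sanity check, the refill arithmetic can be worked through in isolation. This standalone helper (our own, not part of the design above) applies the same formula as TokenBucket.refill:

```python
def tokens_after_refill(tokens: int, capacity: int,
                        refill_rate_per_second: int, elapsed_millis: int) -> int:
    """Same arithmetic as TokenBucket.refill: whole tokens only, capped at capacity."""
    tokens_to_add = int((elapsed_millis / 1000.0) * refill_rate_per_second)
    return min(capacity, tokens + tokens_to_add)

# 2.5 seconds at 2 tokens/sec generates int(5.0) = 5 tokens
print(tokens_after_refill(tokens=0, capacity=10, refill_rate_per_second=2, elapsed_millis=2_500))  # 5
# The bucket never exceeds its capacity: min(10, 8 + 5) = 10
print(tokens_after_refill(tokens=8, capacity=10, refill_rate_per_second=2, elapsed_millis=2_500))  # 10
```

Note that because tokens_to_add is truncated to an integer, short elapsed intervals (e.g., 400 ms at 2 tokens/sec) add zero tokens; those fractions are not lost in the real class, since last_refill_timestamp only advances when at least one token is added.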

4.4 RateLimiterService

This class acts as a central Singleton and Facade, providing a simplified API for clients to use the rate-limiting functionality.

class RateLimiterService:
    _instance = None
    _lock = threading.Lock()

    def __init__(self):
        if RateLimiterService._instance is not None:
            raise Exception("This class is a singleton!")
        self.rate_limiting_strategy = None

    @staticmethod
    def get_instance():
        # Double-checked locking: lock only on first construction
        if RateLimiterService._instance is None:
            with RateLimiterService._lock:
                if RateLimiterService._instance is None:
                    RateLimiterService._instance = RateLimiterService()
        return RateLimiterService._instance

    def set_rate_limiter(self, rate_limiting_strategy: RateLimitingStrategy):
        self.rate_limiting_strategy = rate_limiting_strategy

    def handle_request(self, user_id: str):
        if self.rate_limiting_strategy is None:
            raise RuntimeError("No rate limiting strategy configured")
        if self.rate_limiting_strategy.allow_request(user_id):
            print(f"Request from user {user_id} is allowed")
        else:
            print(f"Request from user {user_id} is rejected: Rate limit exceeded")

It delegates the request evaluation to the chosen strategy and acts as the facade to client code.

  • Singleton Pattern: The service is a Singleton to ensure there is a single, globally accessible instance controlling the rate-limiting policies for the entire application.
  • Facade Pattern: It provides a simple handleRequest method that hides the complexity of the underlying strategy. The client code doesn't need to know which algorithm is currently active; it just interacts with this clean API.

4.5 RateLimiterDemo

This demo simulates client requests under both rate limiting strategies using multiple threads. It demonstrates how the same service interface supports different strategies interchangeably.

from concurrent.futures import ThreadPoolExecutor

class RateLimiterDemo:
    @staticmethod
    def main():
        user_id = "user123"

        print("=== Fixed Window Demo ===")
        RateLimiterDemo.run_fixed_window_demo(user_id)

        print("\n=== Token Bucket Demo ===")
        RateLimiterDemo.run_token_bucket_demo(user_id)

    @staticmethod
    def run_fixed_window_demo(user_id: str):
        max_requests = 5
        window_seconds = 10

        rate_limiter = FixedWindowStrategy(max_requests, window_seconds)
        service = RateLimiterService.get_instance()
        service.set_rate_limiter(rate_limiter)

        with ThreadPoolExecutor(max_workers=3) as executor:
            for _ in range(10):
                executor.submit(service.handle_request, user_id)
                time.sleep(0.5)

    @staticmethod
    def run_token_bucket_demo(user_id: str):
        capacity = 5
        refill_rate = 1  # 1 token per second

        token_bucket_limiter = TokenBucketStrategy(capacity, refill_rate)
        service = RateLimiterService.get_instance()
        service.set_rate_limiter(token_bucket_limiter)

        with ThreadPoolExecutor(max_workers=2) as executor:
            # Simulate 10 rapid requests, faster than the refill rate
            for _ in range(10):
                executor.submit(service.handle_request, user_id)
                time.sleep(0.3)


if __name__ == "__main__":
    RateLimiterDemo.main()
