A rate limiter is a system component or algorithm used to control the rate of operations performed by a user, client, or service over a given period. It helps prevent abuse, reduce load, and ensure fair usage of system resources.
For example, a fixed window limiter might allow 5 requests per 10-second window, with the counter resetting at the start of each window.
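As a quick, single-threaded illustration of that fixed-window behavior (the class and method names here are illustrative only; the full, thread-safe design comes later in the chapter):

```python
import time

class SimpleFixedWindow:
    """Toy fixed-window counter: at most max_requests per window_seconds."""

    def __init__(self, max_requests: int, window_seconds: int):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.window_start = time.monotonic()
        self.count = 0

    def allow(self) -> bool:
        now = time.monotonic()
        # Reset the counter once the current window has elapsed.
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.count = 0
        if self.count < self.max_requests:
            self.count += 1
            return True
        return False

limiter = SimpleFixedWindow(max_requests=5, window_seconds=10)
results = [limiter.allow() for _ in range(7)]
print(results)  # first 5 allowed, remaining 2 rejected within the same window
```

Seven back-to-back requests against a 5-per-10-seconds limiter land in the same window, so the first five are allowed and the rest are rejected until the window resets.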
Rate limiters are commonly applied to API endpoints, login and authentication flows, and other shared resources where a single client could otherwise overwhelm the system.
In this chapter, we will explore the low-level design of a rate limiter in detail.
Let's start by clarifying the requirements:
Before starting the design, it's important to ask thoughtful questions to uncover hidden assumptions, clarify ambiguities, and define the system's scope more precisely.
Here is an example of how a discussion between the candidate and the interviewer might unfold:
Candidate: Should the rate limiter work on a per-user basis, or should it be based on API key or IP address?
Interviewer: Let’s keep it simple and go with per-user rate limiting. You can assume users are uniquely identified by a user ID or token.
Candidate: Which rate limiting algorithm should we implement: fixed window, sliding window, or token bucket?
Interviewer: Implement the fixed window algorithm for this version, and the token bucket as well if possible. We can consider more advanced approaches later.
Candidate: Should all users have the same rate limit, or can it vary per user?
Interviewer: Assume the same rate limit for all users (e.g., 100 requests per 60 seconds).
Candidate: What should happen when a user exceeds the rate limit? Should we silently drop the request or notify the user?
Interviewer: The system should clearly inform the user that they’ve exceeded the limit.
Candidate: Do we need to handle concurrency in case of multiple threads trying to access or update rate limits for the same user?
Interviewer: Yes, the implementation should be thread-safe and handle concurrent access reliably.
After gathering the details, we can summarize the key system requirements.
The core of our design challenge is that there isn't one "best" rate-limiting algorithm. Each has its own trade-offs. This is a perfect scenario for the Strategy Design Pattern.
We can define a common RateLimitingStrategy interface and create concrete implementations for each algorithm. This allows the main RateLimiter class to be completely decoupled from the specific algorithm being used.
Popular algorithms to consider: Fixed Window, Sliding Window (including Sliding Window Log), and Token Bucket. This design implements Fixed Window and Token Bucket.
Key Entities/Classes:
This section breaks down the system's architecture into its fundamental classes, their responsibilities, and the relationships that connect them. We also explore the key design patterns that provide robustness and flexibility to the solution.
The system is composed of several types of classes, each with a distinct role.
There are no enums used in this design.
- UserRequestInfo: A private helper class inside FixedWindowStrategy. It is a data holder that tracks the window_start time and the request_count for a specific user, guarded by a per-user lock for thread safety.
- TokenBucket: A private helper class inside TokenBucketStrategy. It models the token bucket for a single user, containing its capacity, current number of tokens, refill_rate_per_second, and last_refill_timestamp.
- RateLimitingStrategy (Interface): The core of the Strategy Pattern. It defines a single contract, allow_request(user_id), that all concrete rate-limiting algorithms must implement. This allows the system to switch between limiting strategies seamlessly.
- FixedWindowStrategy (Concrete Strategy): An implementation of RateLimitingStrategy that limits the number of requests within a fixed time window. It uses a map of UserRequestInfo objects to track each user's request count.
- TokenBucketStrategy (Concrete Strategy): Another implementation of RateLimitingStrategy, using the token bucket algorithm. It allows bursts of traffic by maintaining a per-user bucket of tokens that refills at a constant rate.
- RateLimiterService (Singleton & Facade): The primary entry point for any client. It holds a reference to the currently active RateLimitingStrategy and exposes a single handle_request method, delegating the actual rate-limiting logic to the configured strategy.

The relationships between classes define the system's structure and data flow.
- RateLimiterService "has-a" RateLimitingStrategy; the service's behavior is defined by the strategy it holds.
- FixedWindowStrategy "has-a" map of UserRequestInfo objects to maintain state for each user.
- TokenBucketStrategy "has-a" map of TokenBucket objects.
- FixedWindowStrategy and TokenBucketStrategy both implement the RateLimitingStrategy interface, which allows them to be used interchangeably by the RateLimiterService.
- The client (RateLimiterDemo) depends on the RateLimiterService to handle requests; it does not need to know about the concrete strategies.
- RateLimiterService depends on the RateLimitingStrategy interface, not the concrete implementations.

The Strategy Pattern is the primary design pattern used. It allows the rate-limiting algorithm to be selected and changed at runtime. The RateLimiterService (Context) delegates the decision-making process to a concrete RateLimitingStrategy object (FixedWindowStrategy or TokenBucketStrategy). This makes the system flexible and easy to extend with new limiting algorithms (e.g., Sliding Window Log) without modifying the service class.
The RateLimiterService is implemented as a singleton. This ensures that there is only one instance of the rate limiter service controlling the policies for the entire application, providing a single, global point of access and preventing conflicting states.
The RateLimiterService also acts as a facade. It provides a simple, unified interface (handleRequest) to the client, hiding the underlying complexity of which strategy is being used and how it manages user-specific data like time windows or token buckets.
This interface defines the contract for all rate limiting strategies. Each strategy decides whether a user’s request should be allowed based on internal logic.
```python
from abc import ABC, abstractmethod

class RateLimitingStrategy(ABC):
    @abstractmethod
    def allow_request(self, user_id: str) -> bool:
        """Return True if the request from user_id should be allowed."""
        pass
```

This strategy limits requests within a fixed, discrete time window (e.g., 100 requests per minute).
```python
import threading
import time
from typing import Dict

class UserRequestInfo:
    """Per-user state: the start of the current window and the request count."""

    def __init__(self, window_start: int):
        self.window_start = window_start
        self.request_count = 0
        self.lock = threading.Lock()

    def reset(self, window_start: int):
        self.window_start = window_start
        self.request_count = 0

class FixedWindowStrategy(RateLimitingStrategy):
    def __init__(self, max_requests: int, window_size_in_seconds: int):
        self.max_requests = max_requests
        self.window_size_in_millis = window_size_in_seconds * 1000
        self.user_request_map: Dict[str, UserRequestInfo] = {}
        self._lock = threading.Lock()  # guards the map itself

    def allow_request(self, user_id: str) -> bool:
        current_time = int(time.time() * 1000)

        # Create the per-user record under the map lock.
        with self._lock:
            if user_id not in self.user_request_map:
                self.user_request_map[user_id] = UserRequestInfo(current_time)
            request_info = self.user_request_map[user_id]

        # Update the per-user state under its own lock.
        with request_info.lock:
            if current_time - request_info.window_start >= self.window_size_in_millis:
                request_info.reset(current_time)

            if request_info.request_count < self.max_requests:
                request_info.request_count += 1
                return True
            return False
```

The Token Bucket strategy provides a more flexible rate limit, allowing for bursts of requests by using a "bucket" of tokens that refills over time.
```python
class TokenBucket:
    """Per-user bucket: refills at a constant rate up to a fixed capacity."""

    def __init__(self, capacity: int, refill_rate_per_second: int, current_time_millis: int):
        self.capacity = capacity
        self.refill_rate_per_second = refill_rate_per_second
        self.tokens = capacity
        self.last_refill_timestamp = current_time_millis
        self.lock = threading.Lock()

    def refill(self, current_time: int):
        elapsed_time = current_time - self.last_refill_timestamp
        tokens_to_add = int((elapsed_time / 1000.0) * self.refill_rate_per_second)

        if tokens_to_add > 0:
            self.tokens = min(self.capacity, self.tokens + tokens_to_add)
            self.last_refill_timestamp = current_time

class TokenBucketStrategy(RateLimitingStrategy):
    def __init__(self, capacity: int, refill_rate_per_second: int):
        self.capacity = capacity
        self.refill_rate_per_second = refill_rate_per_second
        self.user_buckets: Dict[str, TokenBucket] = {}
        self._lock = threading.Lock()  # guards the map itself

    def allow_request(self, user_id: str) -> bool:
        current_time = int(time.time() * 1000)

        with self._lock:
            if user_id not in self.user_buckets:
                self.user_buckets[user_id] = TokenBucket(
                    self.capacity, self.refill_rate_per_second, current_time)
            bucket = self.user_buckets[user_id]

        with bucket.lock:
            bucket.refill(current_time)
            if bucket.tokens > 0:
                bucket.tokens -= 1
                return True
            return False
```

This class acts as a central Singleton and Facade, providing a simplified API for clients to use the rate-limiting functionality.
```python
class RateLimiterService:
    _instance = None
    _lock = threading.Lock()

    def __init__(self):
        if RateLimiterService._instance is not None:
            raise Exception("This class is a singleton!")
        self.rate_limiting_strategy = None

    @staticmethod
    def get_instance():
        # Double-checked locking: skip the lock once the instance exists.
        if RateLimiterService._instance is None:
            with RateLimiterService._lock:
                if RateLimiterService._instance is None:
                    RateLimiterService._instance = RateLimiterService()
        return RateLimiterService._instance

    def set_rate_limiter(self, rate_limiting_strategy: RateLimitingStrategy):
        self.rate_limiting_strategy = rate_limiting_strategy

    def handle_request(self, user_id: str):
        if self.rate_limiting_strategy.allow_request(user_id):
            print(f"Request from user {user_id} is allowed")
        else:
            print(f"Request from user {user_id} is rejected: Rate limit exceeded")
```

It delegates the request evaluation to the chosen strategy and acts as the facade to client code.
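To show how the Strategy Pattern accommodates a new algorithm, here is a sketch of the Sliding Window Log approach mentioned earlier. The class name and structure are illustrative assumptions, not part of the design above; it keeps a per-user log of request timestamps and counts only those inside the trailing window. The base interface is repeated so the sketch runs standalone.

```python
import threading
import time
from abc import ABC, abstractmethod
from collections import deque
from typing import Deque, Dict

class RateLimitingStrategy(ABC):  # repeated here so the sketch is self-contained
    @abstractmethod
    def allow_request(self, user_id: str) -> bool: ...

class SlidingWindowLogStrategy(RateLimitingStrategy):
    """Allow at most max_requests within any trailing window_size_in_seconds span."""

    def __init__(self, max_requests: int, window_size_in_seconds: int):
        self.max_requests = max_requests
        self.window_size_in_millis = window_size_in_seconds * 1000
        self.user_logs: Dict[str, Deque[int]] = {}
        self._lock = threading.Lock()  # a single lock keeps the sketch simple

    def allow_request(self, user_id: str) -> bool:
        current_time = int(time.time() * 1000)
        with self._lock:
            log = self.user_logs.setdefault(user_id, deque())
            # Evict timestamps that have fallen out of the trailing window.
            while log and current_time - log[0] >= self.window_size_in_millis:
                log.popleft()
            if len(log) < self.max_requests:
                log.append(current_time)
                return True
            return False
```

Because it implements the same allow_request contract, such a strategy could be plugged into RateLimiterService via set_rate_limiter without changing any other class.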
This demo simulates client requests under both rate limiting strategies using multiple threads. It demonstrates how the same service interface supports different strategies interchangeably.
```python
import time
from concurrent.futures import ThreadPoolExecutor

class RateLimiterDemo:
    @staticmethod
    def main():
        user_id = "user123"

        print("=== Fixed Window Demo ===")
        RateLimiterDemo.run_fixed_window_demo(user_id)

        print("\n=== Token Bucket Demo ===")
        RateLimiterDemo.run_token_bucket_demo(user_id)

    @staticmethod
    def run_fixed_window_demo(user_id: str):
        max_requests = 5
        window_seconds = 10

        rate_limiter = FixedWindowStrategy(max_requests, window_seconds)
        service = RateLimiterService.get_instance()
        service.set_rate_limiter(rate_limiter)

        with ThreadPoolExecutor(max_workers=3) as executor:
            for i in range(10):
                executor.submit(service.handle_request, user_id)
                try:
                    time.sleep(0.5)
                except KeyboardInterrupt:
                    break

    @staticmethod
    def run_token_bucket_demo(user_id: str):
        capacity = 5
        refill_rate = 1  # 1 token per second

        token_bucket_limiter = TokenBucketStrategy(capacity, refill_rate)
        service = RateLimiterService.get_instance()
        service.set_rate_limiter(token_bucket_limiter)

        with ThreadPoolExecutor(max_workers=2) as executor:
            # Simulate 10 rapid requests, faster than the refill rate.
            for i in range(10):
                executor.submit(service.handle_request, user_id)
                try:
                    time.sleep(0.3)
                except KeyboardInterrupt:
                    break

if __name__ == "__main__":
    RateLimiterDemo.main()
```
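As a sanity check on the token bucket's refill arithmetic: the bucket adds int(elapsed_ms / 1000 * refill_rate) tokens, capped at capacity. The helper below mirrors that formula as a pure function (the function name is illustrative, not part of the design):

```python
def tokens_after_refill(tokens: int, capacity: int,
                        refill_rate_per_second: int, elapsed_ms: int) -> int:
    """Mirror of the TokenBucket.refill arithmetic for a single elapsed span."""
    tokens_to_add = int((elapsed_ms / 1000.0) * refill_rate_per_second)
    return min(capacity, tokens + tokens_to_add)

print(tokens_after_refill(tokens=0, capacity=5, refill_rate_per_second=1, elapsed_ms=2500))  # 2
print(tokens_after_refill(tokens=4, capacity=5, refill_rate_per_second=1, elapsed_ms=9000))  # capped at 5
```

This is why the demo's 0.3-second request interval outpaces a 1-token-per-second refill: once the initial burst of 5 tokens is spent, only about one request in three finds a token available.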