A URL shortener is a service that converts a long URL into a shorter, more manageable version. These short links typically contain a short domain name and a unique identifier (e.g., bit.ly/abc123) that maps back to the original long URL.
URL Shortener
short.ly
https://github.com/user/repository/blob/main/src/componen...
Popular services like TinyURL, Bitly, and t.co (used by Twitter) rely on URL shorteners to:
In this chapter, we will explore the low-level design of an url shortener in detail.
Let's start by clarifying the requirements:
Before starting the design, it's important to ask thoughtful questions to uncover hidden assumptions, clarify ambiguities, and define the system's scope more precisely.
Here is an example of how a conversation between the candidate and the interviewer might unfold:
Candidate: Should the system automatically generate short URLs, or should users be able to specify custom aliases?
Interviewer: The system should support both. . By default, it should generate a unique short URL automatically, but users can also provide a custom alias if they prefer.
Candidate: Should the short URLs have an expiration policy, or should they remain valid indefinitely?
Interviewer: By default, short URLs should not expire. But we should allow users to specify an expiration date if needed.
Candidate: Do we need to support analytics, like tracking click counts and timestamps?
Interviewer: For now, basic click count tracking will suffice.
The heart of the problem lies in how we generate the short URL. A common and robust approach is to use a counter-based system.
The best way to do this conversion is using a base-62 encoding. A base-10 number system uses 10 digits (0-9). A base-62 system uses 62 characters: [0-9], [a-z], and [A-Z].
Example:
This approach guarantees that every ID will produce a unique string, and the strings will grow slowly in length.
These core entities define the key abstractions of the URL shortener and will guide the structure of our low-level design and class diagrams.
This section details the design of each class identified previously, including their specific attributes and methods. We will also illustrate how these classes relate to one another and highlight the key design patterns that underpin our solution.
We can categorize our classes into enums, data-holding classes, and core classes that encapsulate the system's primary logic.
A type-safe enumeration to represent the different events within the system that observers can subscribe to.
An immutable data class representing the core mapping between a long URL and its generated short key. It is constructed using an inner Builder class to ensure clean and flexible instantiation.
This is the central engine and Facade of the system. It orchestrates the entire URL shortening and resolution process, coordinating the repository and key generation strategy, and notifying observers of events.
The KeyGenerationStrategy interface and its concrete implementations are a clear application of the Strategy Pattern. This allows the algorithm for generating short keys to be selected and changed without altering the URLShortenerService, making the system highly flexible.
The Observer interface, AnalyticsService, and the URLShortenerService (as the "subject") form the Observer Pattern. This allows for a clean separation of concerns, enabling features like analytics, logging, or caching to be added by simply creating new observers, without modifying the core shortening logic.
The ShortenedURL.Builder provides a clean, fluent API for constructing immutable ShortenedURL objects. This is particularly useful for objects with multiple fields, improving code readability and maintainability.
The URLRepository interface abstracts the data access logic from the business logic within URLShortenerService. This separation of concerns means the underlying storage can be changed (e.g., from in-memory to a SQL or NoSQL database) with minimal impact on the core service.
The URLShortenerService acts as a Facade. It provides a simple, high-level interface (shorten, resolve) that hides the complex interactions between the repository, key generation strategies, and observer notifications.
The URLShortenerService uses this pattern to ensure a single, shared instance throughout the application. This is useful for managing a central resource and state, such as the list of observers and configuration.
1class EventType(Enum):
2 URL_CREATED = "URL_CREATED"
3 URL_ACCESSED = "URL_ACCESSED"1class ShortenedURL:
2 def __init__(self, builder):
3 self.long_url = builder.long_url
4 self.short_key = builder.short_key
5 self.creation_date = builder.creation_date
6
7 def get_long_url(self) -> str:
8 return self.long_url
9
10 def get_short_key(self) -> str:
11 return self.short_key
12
13 def get_creation_date(self) -> datetime:
14 return self.creation_date
15
16 class Builder:
17 def __init__(self, long_url: str, short_key: str):
18 self.long_url = long_url
19 self.short_key = short_key
20 self.creation_date = datetime.now()
21
22 def creation_date(self, creation_date: datetime):
23 self.creation_date = creation_date
24 return self
25
26 def build(self) -> 'ShortenedURL':
27 return ShortenedURL(self)1class URLRepository(ABC):
2 @abstractmethod
3 def save(self, url: ShortenedURL) -> None:
4 pass
5
6 @abstractmethod
7 def find_by_key(self, key: str) -> Optional[ShortenedURL]:
8 pass
9
10 @abstractmethod
11 def find_key_by_long_url(self, long_url: str) -> Optional[str]:
12 pass
13
14 @abstractmethod
15 def get_next_id(self) -> int:
16 pass
17
18 @abstractmethod
19 def exists_by_key(self, key: str) -> bool:
20 pass
21
22class InMemoryURLRepository(URLRepository):
23 def __init__(self):
24 self.key_to_url_map: Dict[str, ShortenedURL] = {}
25 self.long_url_to_key_map: Dict[str, str] = {}
26 self.id_counter = AtomicLong(1) # Start from 1
27 self._lock = Lock()
28
29 def save(self, url: ShortenedURL) -> None:
30 with self._lock:
31 self.key_to_url_map[url.get_short_key()] = url
32 self.long_url_to_key_map[url.get_long_url()] = url.get_short_key()
33
34 def find_by_key(self, key: str) -> Optional[ShortenedURL]:
35 with self._lock:
36 return self.key_to_url_map.get(key)
37
38 def find_key_by_long_url(self, long_url: str) -> Optional[str]:
39 with self._lock:
40 return self.long_url_to_key_map.get(long_url)
41
42 def get_next_id(self) -> int:
43 return self.id_counter.get_and_increment()
44
45 def exists_by_key(self, key: str) -> bool:
46 with self._lock:
47 return key in self.key_to_url_map1class Observer(ABC):
2 @abstractmethod
3 def update(self, event_type: EventType, url: ShortenedURL) -> None:
4 pass
5
6class AnalyticsService(Observer):
7 def __init__(self):
8 self.click_counts: Dict[str, AtomicLong] = {}
9 self._lock = Lock()
10
11 def update(self, event_type: EventType, url: ShortenedURL) -> None:
12 if event_type == EventType.URL_CREATED:
13 with self._lock:
14 self.click_counts[url.get_short_key()] = AtomicLong(0)
15 print(f"[Analytics] URL Created: Key={url.get_short_key()}, Original={url.get_long_url()}")
16 elif event_type == EventType.URL_ACCESSED:
17 with self._lock:
18 if url.get_short_key() not in self.click_counts:
19 self.click_counts[url.get_short_key()] = AtomicLong(0)
20 count = self.click_counts[url.get_short_key()]
21 count.increment_and_get()
22 print(f"[Analytics] URL Accessed: Key={url.get_short_key()}, Clicks={count.get()}")1class RandomStrategy(KeyGenerationStrategy):
2 CHARACTERS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
3 KEY_LENGTH = 6
4
5 def generate_key(self, id: int) -> str:
6 return ''.join(random.choice(self.CHARACTERS) for _ in range(self.KEY_LENGTH))
7
8class UUIDStrategy(KeyGenerationStrategy):
9 KEY_LENGTH = 6
10
11 def generate_key(self, id: int) -> str:
12 # Generate a new UUID, remove the hyphens, and take a substring.
13 uuid_str = str(uuid.uuid4()).replace("-", "")
14 # Return the first part of the UUID.
15 return uuid_str[:self.KEY_LENGTH]
16
17class Base62Strategy(KeyGenerationStrategy):
18 BASE62_CHARS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
19 BASE = 62
20
21 # This is the smallest number that will produce a 6-character Base62 string.
22 # It's calculated as 62^5.
23 MIN_6_CHAR_ID_OFFSET = 916_132_832
24
25 def generate_key(self, id: int) -> str:
26 if id == 0:
27 return self.BASE62_CHARS[0]
28
29 id_with_offset = id + self.MIN_6_CHAR_ID_OFFSET
30
31 result = []
32 while id_with_offset > 0:
33 result.append(self.BASE62_CHARS[id_with_offset % self.BASE])
34 id_with_offset //= self.BASE
35
36 return ''.join(reversed(result))1class URLShortenerService:
2 _instance = None
3 _lock = Lock()
4
5 def __new__(cls):
6 if cls._instance is None:
7 with cls._lock:
8 if cls._instance is None:
9 cls._instance = super().__new__(cls)
10 cls._instance._initialized = False
11 return cls._instance
12
13 def __init__(self):
14 if not hasattr(self, '_initialized') or not self._initialized:
15 self.url_repository = None
16 self.key_generation_strategy = None
17 self.domain = None
18 self.MAX_RETRIES = 10
19 self.observers: List[Observer] = []
20 self._initialized = True
21
22 @classmethod
23 def get_instance(cls):
24 return cls()
25
26 def configure(self, domain: str, repository: URLRepository, strategy: KeyGenerationStrategy) -> None:
27 self.domain = domain
28 self.url_repository = repository
29 self.key_generation_strategy = strategy
30
31 def shorten(self, long_url: str) -> str:
32 # Check if we've already shortened this URL
33 existing_key = self.url_repository.find_key_by_long_url(long_url)
34 if existing_key is not None:
35 return self.domain + existing_key
36
37 # Generate a new key, handling potential collisions
38 short_key = self._generate_unique_key()
39
40 shortened_url = ShortenedURL.Builder(long_url, short_key).build()
41 self.url_repository.save(shortened_url)
42
43 self._notify_observers(EventType.URL_CREATED, shortened_url)
44
45 return self.domain + short_key
46
47 def _generate_unique_key(self) -> str:
48 for _ in range(self.MAX_RETRIES):
49 # The ID is passed but may be ignored by some strategies (like random)
50 potential_key = self.key_generation_strategy.generate_key(self.url_repository.get_next_id())
51 if not self.url_repository.exists_by_key(potential_key):
52 return potential_key # Found a unique key
53
54 # If we reach here, we failed to generate a unique key after several attempts.
55 raise RuntimeError(f"Failed to generate a unique short key after {self.MAX_RETRIES} attempts.")
56
57 def resolve(self, short_url: str) -> Optional[str]:
58 if not short_url.startswith(self.domain):
59 return None
60
61 short_key = short_url.replace(self.domain, "")
62
63 if self.url_repository.exists_by_key(short_key):
64 shortened_url = self.url_repository.find_by_key(short_key)
65 self._notify_observers(EventType.URL_ACCESSED, shortened_url)
66 return short_key
67
68 return None
69
70 def add_observer(self, observer: Observer) -> None:
71 self.observers.append(observer)
72
73 def remove_observer(self, observer: Observer) -> None:
74 if observer in self.observers:
75 self.observers.remove(observer)
76
77 def _notify_observers(self, event_type: EventType, url: ShortenedURL) -> None:
78 for observer in self.observers:
79 observer.update(event_type, url)1def resolve_and_print(shortener: URLShortenerService, short_url: str) -> None:
2 resolved_url = shortener.resolve(short_url)
3 if resolved_url is not None:
4 print(f"Resolved {short_url} -> {resolved_url}")
5 else:
6 print(f"No original URL found for {short_url}")
7
8def main():
9 # --- 1. Setup Phase ---
10 # Get the Singleton instance of our service
11 shortener = URLShortenerService.get_instance()
12
13 # Configure the service with the chosen strategy and repository
14 shortener.configure("http://short.ly/", InMemoryURLRepository(), RandomStrategy())
15 shortener.add_observer(AnalyticsService())
16
17 print("--- URL Shortener Service Initialized ---\n")
18
19 # --- 2. Usage Phase ---
20 original_url1 = "https://www.verylongurl.com/with/lots/of/path/segments/and/query/params?id=123&user=test"
21 print("Shortening: " + original_url1)
22 short_url1 = shortener.shorten(original_url1)
23 print("Generated Short URL: " + short_url1)
24 print()
25
26 # Shorten the same URL again
27 print("Shortening the same URL again...")
28 short_url2 = shortener.shorten(original_url1)
29 print("Generated Short URL: " + short_url2)
30 if short_url1 == short_url2:
31 print("SUCCESS: The system correctly returned the existing short URL.\n")
32
33 # Shorten a different URL
34 original_url2 = "https://www.anotherdomain.com/page.html"
35 print("Shortening: " + original_url2)
36 short_url3 = shortener.shorten(original_url2)
37 print("Generated Short URL: " + short_url3)
38 print()
39
40 # --- 3. Resolution Phase ---
41 print("--- Resolving and Tracking Clicks ---")
42
43 # Resolve the first URL multiple times
44 resolve_and_print(shortener, short_url1)
45 resolve_and_print(shortener, short_url1)
46 resolve_and_print(shortener, short_url3)
47
48 # Try to resolve a non-existent URL
49 print("\nResolving a non-existent URL...")
50 resolve_and_print(shortener, "http://short.ly/nonexistent")
51
52
53if __name__ == "__main__":
54 main()What main purpose does base-62 encoding serve in a URL shortener design?
No comments yet. Be the first to comment!