Design Search Autocomplete System

Ashish

Ashish Pratap Singh

easy

In this chapter, we will explore the low-level design of a Search Autocomplete system in detail.

Let’s start by clarifying the requirements:

1. Clarifying Requirements

Before diving into the design, it’s important to clarify how the autocomplete system is expected to behave. Asking targeted questions helps refine assumptions, define the scope, and align on core expectations for the system.

After gathering the details, we can summarize the key system requirements.

1.1 Functional Requirements

  • The system supports inserting words into an internal dictionary.
  • It returns suggestions when a user types a prefix.
  • Suggestions are ranked based on a configurable strategy (alphabetical or frequency-based).
  • The number of suggestions returned is configurable.
  • Frequency count is incremented each time a word is added.
  • Words and prefixes are treated case-insensitively.
  • The system can handle dynamic insertion of words at runtime.

1.2 Non-Functional Requirements

  • The system should be fast and optimized for real-time suggestions.
  • It should be modular and follow object-oriented design principles.
  • The design should be easily extensible to add new ranking strategies or support new languages in the future.
  • The system can assume in-memory data storage and doesn’t need persistence.

After the requirements are clear, the next step is to identify the core entities that we will form the foundation of our design.

2. Identifying Core Entities

Core entities are the fundamental building blocks of our system. We identify them by analyzing the functional requirements and highlighting the key nouns and responsibilities that naturally map to object-oriented abstractions such as classes, enums, or interfaces.

Let’s walk through the functional requirements and extract the relevant entities:

1. The system must store a large dictionary of words and their usage frequencies efficiently.

This requirement is central to the system's performance. A standard list or hash map would be inefficient for prefix-based searches. This points to a specialized data structure, the Trie.

Each node in the Trie represents a character, forming a tree-like structure of prefixes. Therefore, we need a TrieNode entity to be the basic building block and a Trie class to manage the overall structure, including word insertions and prefix searches. The TrieNode will also need to store the frequency of each completed word.

2. Given a prefix, the system must return a list of potential full-word suggestions.

This involves two steps: finding all words that start with the given prefix and then packaging them for further processing. The Trie class is responsible for traversing from the prefix's end node to find all valid descendant words. To hold the results of this traversal, which includes both the word and its associated data (like frequency), we need a simple data-transfer object. This leads to the Suggestion entity, which encapsulates a word and its ranking weight.

These core entities define the key abstractions of the autocomplete system and will guide the structure of our low-level design and class diagrams.

3. Class Design

3.1 Class Definitions

Data Classes

These classes primarily act as data containers with minimal behavior.

TrieNode

Represents a single node in the Trie data structure. It's the fundamental unit for storing character-level information and word metadata.

TrieNode

Attributes:

  • children: A dictionary (Map or Dictionary) mapping a char to its corresponding child TrieNode.
  • isEndOfWord: A boolean flag indicating whether this node represents the end of a complete word.
  • frequency: An integer that counts how many times the word ending at this node has been inserted.

Suggestion

A simple data transfer object (DTO) that encapsulates a word and its associated ranking weight, making it easy to pass around between the Trie collection logic and the ranking strategy.

Suggestion

Attributes:

  • word: A string representing the suggested word.
  • weight: An int representing the value used for ranking (e.g., frequency).

Core Classes

Trie

Trie

Attributes:

  • root: The root TrieNode of the Trie.

Methods:

  • Insert(string word): Adds a word to the Trie, creating nodes for each character and updating the frequency count.
  • SearchPrefix(string prefix): Traverses the Trie to find the node corresponding to the end of a given prefix. Returns null if the prefix does not exist.
  • CollectSuggestions(TrieNode startNode, string prefix): Performs a depth-first search (DFS) from a given startNode to gather all complete words and returns them as a list of Suggestion objects.

AutoCompleteSystem

AutoCompleteSystem

Attributes:

  • trie: An instance of the Trie class to store all words.
  • rankingStrategy: An instance of IRankingStrategy to define how suggestions are ranked.
  • maxSuggestions: An integer specifying the maximum number of suggestions to return.

Methods:

  • AddWord(string word) / AddWords(List<string> words): Public methods to populate the Trie.
  • GetSuggestions(string prefix): The primary method for clients. It orchestrates the process of searching the Trie, collecting suggestions, ranking them using the rankingStrategy, and returning the top results.

3.2 Class Relationships

The relationships define how our classes interact.

Composition ("has-a")

AutocompleteSystem has a Trie and has a RankingStrategy. The AutocompleteSystem manages the lifecycle and coordinates the actions of these components.

Trie is composed of TrieNode objects. The Trie class is responsible for creating and linking TrieNodes to form the prefix tree structure.

Dependency ("uses-a")

AutocompleteSystem uses Suggestion objects as an intermediary data structure.

3.3 Key Design Patterns

Strategy Pattern

The RankingStrategy interface and its concrete implementations (FrequencyBasedRanking, AlphabeticalRanking) are a clear application of the Strategy Pattern. This decouples the core suggestion-finding logic within AutocompleteSystem from the ranking algorithm.

RankingStrategy

It allows us to change the ranking behavior at runtime or easily add new ranking methods (e.g., by recency, by location) without modifying the AutocompleteSystem class.

Builder Pattern

The AutocompleteSystemBuilder class implements the Builder Pattern. This pattern simplifies the creation of the AutocompleteSystem object, which has multiple configuration parameters.

Builder

It provides a fluent, readable API for constructing the object step-by-step and separates the complex construction logic from the final object representation.

Facade Pattern

The AutocompleteSystem class itself serves as a Facade. It provides a simple, high-level interface (AddWord, GetSuggestions) to a more complex underlying subsystem composed of the Trie, Suggestion objects, and the RankingStrategy. Clients interact with this simple facade, remaining unaware of the intricate logic of Trie traversal and suggestion ranking.

3.4 Full Class Diagram

Class

4. Implementation

4.1 TrieNode

1class TrieNode:
2    def __init__(self):
3        self.children: Dict[str, TrieNode] = {}
4        self.is_end_of_word: bool = False
5        self.frequency: int = 0
6
7    def get_children(self) -> Dict[str, TrieNode]:
8        return self.children
9
10    def is_end_of_word_check(self) -> bool:
11        return self.is_end_of_word
12
13    def set_end_of_word(self, end_of_word: bool):
14        self.is_end_of_word = end_of_word
15
16    def get_frequency(self) -> int:
17        return self.frequency
18
19    def increment_frequency(self):
20        self.frequency += 1

4.2 Suggestion

1class Suggestion:
2    def __init__(self, word: str, weight: int):
3        self.word = word
4        self.weight = weight
5
6    def get_word(self) -> str:
7        return self.word
8
9    def get_weight(self) -> int:
10        return self.weight

4.3 Trie

1class Trie:
2    def __init__(self):
3        self.root = TrieNode()
4
5    def insert(self, word: str):
6        current = self.root
7        for ch in word:
8            if ch not in current.get_children():
9                current.get_children()[ch] = TrieNode()
10            current = current.get_children()[ch]
11        current.set_end_of_word(True)
12        current.increment_frequency()
13
14    def search_prefix(self, prefix: str) -> Optional[TrieNode]:
15        current = self.root
16        for ch in prefix:
17            node = current.get_children().get(ch)
18            if node is None:
19                return None
20            current = node
21        return current
22
23    def collect_suggestions(self, start_node: TrieNode, prefix: str) -> List[Suggestion]:
24        suggestions = []
25        self._collect(start_node, prefix, suggestions)
26        return suggestions
27
28    def _collect(self, node: TrieNode, current_prefix: str, suggestions: List[Suggestion]):
29        if node.is_end_of_word_check():
30            suggestions.append(Suggestion(current_prefix, node.get_frequency()))
31
32        for ch in node.get_children().keys():
33            self._collect(node.get_children()[ch], current_prefix + ch, suggestions)

4.4 RankingStrategy

1class RankingStrategy(ABC):
2    @abstractmethod
3    def rank(self, suggestions: List[Suggestion]) -> List[Suggestion]:
4        pass
5
6class AlphabeticalRanking(RankingStrategy):
7    def rank(self, suggestions: List[Suggestion]) -> List[Suggestion]:
8        return sorted(suggestions, key=lambda s: s.get_word())
9
10class FrequencyBasedRanking(RankingStrategy):
11    def rank(self, suggestions: List[Suggestion]) -> List[Suggestion]:
12        return sorted(suggestions, key=lambda s: s.get_weight(), reverse=True)

4.5 AutocompleteSystem

1class AutocompleteSystem:
2    def __init__(self, ranking_strategy: RankingStrategy, max_suggestions: int):
3        self.trie = Trie()
4        self.ranking_strategy = ranking_strategy
5        self.max_suggestions = max_suggestions
6
7    def add_word(self, word: str):
8        self.trie.insert(word.lower())
9
10    def add_words(self, words: List[str]):
11        for word in words:
12            self.add_word(word)
13
14    def get_suggestions(self, prefix: str) -> List[str]:
15        prefix_node = self.trie.search_prefix(prefix.lower())
16        if prefix_node is None:
17            return []
18
19        raw_suggestions = self.trie.collect_suggestions(prefix_node, prefix.lower())
20        ranked_suggestions = self.ranking_strategy.rank(raw_suggestions)
21
22        return [s.get_word() for s in ranked_suggestions[:self.max_suggestions]]

4.6 AutocompleteSystemBuilder

1class AutocompleteSystemBuilder:
2    def __init__(self):
3        self.ranking_strategy = FrequencyBasedRanking()  # Default strategy
4        self.max_suggestions = 10  # Default limit
5
6    def with_ranking_strategy(self, strategy: RankingStrategy):
7        self.ranking_strategy = strategy
8        return self
9
10    def with_max_suggestions(self, max_suggestions: int):
11        self.max_suggestions = max_suggestions
12        return self
13
14    def build(self) -> AutocompleteSystem:
15        return AutocompleteSystem(self.ranking_strategy, self.max_suggestions)

4.7 AutocompleteDemo

1def main():
2    print("----------- SCENARIO 1: Frequency-based Ranking -----------")
3
4    # 1. Build the system with the default frequency-based strategy
5    system_by_frequency = (AutocompleteSystemBuilder()
6                          .with_max_suggestions(5)
7                          .with_ranking_strategy(FrequencyBasedRanking())
8                          .build())
9
10    # 2. Feed data into the system
11    # 'canada' is added most frequently, followed by 'car'
12    dictionary = [
13        "car", "cat", "cart", "cartoon", "canada", "candy",
14        "car", "canada", "canada", "car", "canada", "canopy", "captain"
15    ]
16    system_by_frequency.add_words(dictionary)
17
18    # 3. Get suggestions for a prefix
19    prefix1 = "ca"
20    suggestions1 = system_by_frequency.get_suggestions(prefix1)
21    print(f"Suggestions for '{prefix1}': {suggestions1}")
22
23    prefix2 = "car"
24    suggestions2 = system_by_frequency.get_suggestions(prefix2)
25    print(f"Suggestions for '{prefix2}': {suggestions2}")
26
27    print("\n----------- SCENARIO 2: Alphabetical Ranking -----------")
28
29    # 1. Build a new system with the alphabetical strategy
30    system_alphabetical = (AutocompleteSystemBuilder()
31                          .with_ranking_strategy(AlphabeticalRanking())
32                          .build())
33
34    # 2. Feed the same data
35    system_alphabetical.add_words(dictionary)
36
37    # 3. Get suggestions for the same prefix
38    suggestions3 = system_alphabetical.get_suggestions(prefix1)
39    print(f"Suggestions for '{prefix1}' (alphabetical): {suggestions3}")
40
41
42if __name__ == "__main__":
43    main()

5. Run and Test

Languages
Java
C#
Python
C++
Files9
builder
core
strategy
auto_complete_demo.py
main
auto_complete_system.py
auto_complete_demo.py
Output

6. Quiz

Design Search Autocomplete System Quiz

1 / 21
Multiple Choice

Which entity is most responsible for supporting efficient prefix-based search in a Search Autocomplete system?

How helpful was this article?

Comments (1)


0/2000
Sort by
Rishabh Raj15 days ago

in AutocompleteSystemBuilder there is no need for constructor, also the complete builder implementation does not seem to as per builder pattern, there is no nesting of classes like conventional builder.

Copilot extension content script