Creating an Efficient Trie Data Structure with Python

Trie Implementation

Trie data structure is very efficient when it comes to information retrieval. It is majorly used in the implementation of dictionaries and phonebooks. It is also useful for implementing auto-text suggestions you see while typing on a keyboard. In this tutorial, we will understand how to implement our own trie data structure using Python specifically tailored for prefix search operations.

Here, you will learn the following:

  • How to implement your own Trie data structure in Python for efficient prefix searches.
  • How to make insertions into a Trie data structure.
  • How to query for words in a Trie data structure.

Implementing the TrieNode class

Creating the Trie Data Structure Class

  1. A Character
  2. List of children
  3. A Boolean that tells whether a word ends at that node.

Let’s write the code for our TrieNode class:

class TrieNode:

    def __init__(self, char):

        self.char = char

        self.is_end = False

        self.children = {}

While initializing a TrieNode we need to provide a character.

.is_end marks whether a word ends at the current node or not. It is set to false by default.

Creating the Trie Data Structure Class

Now that we have the TrieNode class, we’ll build the trie data structure around it using Python. To initialize a trie, we need to initialize a trie node and provide methods for inserting and searching in the trie.

class Trie(object):

    def __init__(self):

        self.root = TrieNode("")

This part takes care of initializing an empty TrieNode.

Inserting Elements into the Trie Structure

Let’s look at how insertions happen in a Trie data structure.

To insert a word into the trie, we will traverse the word character by character, adding nodes as necessary.

Simultaneously we need to move down the Trie from the root and see if the list of children has that character. In case the character is not present then we need to make a new TrieNode with that character and add it to the list of children.

When we reach the end of the word we need to set is_end to true for the node corresponding to the last character of the word.

Here’s the implementation of the approach discussed above.

def insert(self, word):

        node = self.root

#traverse the word character by character 
        for char in word:
#check if the character is there in the list of children 

            if char in node.children:
                node = node.children[char]
            else:
# else make a new TrieNode corresponding to that character 

                new_node = TrieNode(char)

# add the new node to the list of children 

                node.children[char] = new_node
                node = new_node

#after traversig the word set .is_end to true for the last #char
        node.is_end = True

This will take care of all our insertions.

Consider a Trie with the following words :

  • Here
  • Hear
  • Her
  • He
  • Hello
  • How

The trie corresponding to these words will look like :

Trie
Trie

Here green nodes correspond to is_end being true for that node.

Searching for Words in the Trie in Python

Now let’s examine how search operations are performed in our Python-based trie data structure. We don’t want to perform an exact match for searching. Rather what we want is to get the list of words that start with the string we are searching for.

When implementing a trie, the search function takes a prefix tree as input and efficiently returns all the words starting with that common prefix.

For example if we search for “He”, we should get the following words.

  • He
  • Here
  • Hear
  • Her
  • Hello

These are the words that start with “He”. This aspect of a trie makes it useful for implementing auto-completion in keyboards.

While searching for words we search in a DFS fashion. Therefore we need to write a function for performing a DFS search in our trie.

 def dfs(self, node, pre):

        if node.is_end:
            self.output.append((pre + node.char))
        
        for child in node.children.values():
            self.dfs(child, pre + node.char)

When using our trie implementation to search, we pass a node and the searched prefix as parameters. Whenever the search reaches a node with is_end as true it appends the word to the output list.

Otherwise it keeps on searching among the children in a DFS fashion.

The search function is as follows:

def search(self, x):
       
        node = self.root
# traverse the search query and move down the trie        
        for char in x:
            if char in node.children:
                node = node.children[char]
            else:
              #if query doesn't match the nodes in trie
                return []
        
        self.output = []
#call DFS 
        self.dfs(node, x[:-1])

        return self.output

While searching we traverse through the search query and move down the trie simultaneously.

Then we call DFS on the node corresponding to the last character of the query.

The DFS function then moves down from this last character and adds all the complete words to our output list.

Python Implementation of Trie Data Structure

The complete code of this tutorial is given below :

class TrieNode:

    def __init__(self, char):

        self.char = char

        self.is_end = False

        self.children = {}

class Trie(object):

    def __init__(self):

        self.root = TrieNode("")
    
    def insert(self, word):

        node = self.root

        for char in word:
            if char in node.children:
                node = node.children[char]
            else:

                new_node = TrieNode(char)
                node.children[char] = new_node
                node = new_node
        
        node.is_end = True
        
    def dfs(self, node, pre):

        if node.is_end:
            self.output.append((pre + node.char))
        
        for child in node.children.values():
            self.dfs(child, pre + node.char)
        
    def search(self, x):
       
        node = self.root
        
        for char in x:
            if char in node.children:
                node = node.children[char]
            else:
              
                return []
        
        self.output = []
        self.dfs(node, x[:-1])

        return self.output

Testing the Trie Implementation

Let’s try adding some words to a trie and make a search query.

tr = Trie()
tr.insert("here")
tr.insert("hear")
tr.insert("he")
tr.insert("hello")
tr.insert("how ")
tr.insert("her")

Utilizing our Python trie implementation, we will insert words to the trie data structure. This will make a trie and add these five words to it.

Now we can make a query using the following line :

tr.search("he")

Output :

['he', 'her', 'here', 'hear', 'hello']

Let’s make another query :

tr.search("her")

Output :

['her', 'here']

Conclusion

We successfully implemented a Trie data structure with Python, allowing for efficient prefix-based search operations. In this tutorial, we explored the efficient Trie data structure in Python. We built a Trie class supporting insertion and word query operations. How can you utilize this implementation for specific applications, such as auto-complete features?

References