Implementing HashMaps in Python

Whenever we try to implement some kind of search logic through our data, we want it to be efficient. This may not be the case in small programs but with large applications with huge codebases, the searching, deletion, or modification time for running any kind of logic is supposed to be minimal.

Dictionaries in Python are used to store data as Key: Value pairs. They use a technique called Hashing to perform a time-based SEARCH Operation that is almost independent of the size of the dictionary.

Also read: Python Dictionary (Dict) Tutorial

What is Hashing?

Hashing is a technique used to sort and then index any kind of data. Hashing allows us to take huge amounts of data or inputs to be indexed, and store them in less output space using keys generally created using something called a Hash Function which is essentially some kind of mathematical logic applied to our input data resulting to a numerical value.

Let’s assume we have three strings and we want to store them in an efficient way using some mathematical logic. Here we have strings of John, Peter, and Carrie. So if you want to store them using hashing, the first step over here is we are changing these strings into a number using some hash function.

If we use a hash function to convert these strings to numbers, we are first converting John word to 10, then we are converting Peter. It’s converted to 5. Then we are continuing to convert Carrie, which will be 11. Thus, we have converted our strings into numbers.

Now you might be interested, in how to convert and what is the Hash function over here, and how it works. We will try to understand that further in the article.

The Hash Function in Python

A Hash function is responsible for converting these strings by using some formulas into numbers.

Next, we have to store these numbers in some data structures. If we have an array or python list, which has 12 cells in it as shown here, by using the numbers that are generated by this hash function, we will insert these strings in these arrays.

So for the first string, which is John, we have 10. We will insert John string to the index of 10 in this array. Then, the next string is Peter. So when we convert it to the number, it becomes 5.

We will insert Peter string to the index of 5. Now, the next word is Carrie. When we convert Carrie by using the hash function over here it becomes 11. So we will insert Carrie to the index of 11 to this array over here. We have completed the conversion of these strings to integers and we are inserting these strings into this array.

Reading or Accessing Data Using The Hash Function

Now, the question is, if we want to read data from this array over here, how can we access it?

Again, the logic is the same. For example, if we want to access John, the first thing that we are going to do is to use the same hash function and convert John to the number. Every time we convert the same string, the result will be the same. So we will get 10. Based on the index of 10, we can access this element over here.

Why do we need Hashing?

We know that accessing an element of the array or Python list based on their index takes O(1) time complexity. This is why hashing is very important in terms of searching for a given value in the data structure. By doing so, we can easily identify whether the string is in this array or not.

Let’s say we want to develop an application in which the search operation is used heavily and we want our app to perform as fast as possible. In this case, we can use hashing. Hashing also performs better than any other data structure in the terms of the search operation.

Hashing Terminologies

Hash Function: It is the logic or mathematical formula that can be used to map the arbitrary size of input data to an output of a fixed or a smaller size.
Hash Value: It is the value that is returned by the Hash Function.
Hash Table: It is the data structure that implements an associative array abstract data type and a structure that can map keys to values.
Collision: A collision occurs when two different keys to a hash function produce the same output.

Implementing a HashMap in Python

Let’s try to get an overview of a Hashmap by implementing one. We will create a custom Class containing basic functions. Following are the functions included in our custom class. The code contains itself has all the explanations that we discussed previously in the article.

_hash(self, key): This is our defined Hash Function or mathematical logic that we apply to the key to convert it into an integral value.
set(self, key, value): This function would allow us to add a key | value pair to our Hash Map.
get(self, key): To retrieve a value for the given key, this function is utilized.
__str__(self): To print out the complete Hash Map to our terminal.

Code

class HashMap(object):
    def __init__(self, size):
        """Initializing the list with the argument size for our HashMap"""
        self.data = [[]] * (size)

    def _hash(self, key):
        """Hash Function:
        marked as a private method in the Object"""

        # Initialize
        hash_value = 0

        # Iterating and generating hashed values using the logic below
        # Logic: Taking the initial value of 0 and adding to it the ASCII encoding for each character,
        # and multiplying by the index value. After that we are taking the modulus of the result using the
        # length of the array
        # This becomes our hashed value i.e. the index position of the input to be placed
        # in the output space
        for i in range(len(key)):
            hash_value = (hash_value + ord(key[i]) * i) % len(self.data)

        return hash_value

    def set(self, key, value):
        # Represents the index position where we want to store our data
        address = self._hash(key)

        if not self.data[address]:
            self.data[address] = []
        # If there is a collision of index value i.e. our hashed value
        # then simply add on to that array
        self.data[address].append([key, value])
        return self.data

    def get(self, key):
        # Getting the hashed value again, with the same hash function to locate the value
        # for the given key
        address = self._hash(key)
        # assigning the entire array to a variable in-order to operate later
        currentBucket = self.data[address]

        if currentBucket:
            # Iterating over the entire list to get the value
            for i in range(len(currentBucket)):
                # Check to retrieve the value
                if currentBucket[i][0] == key:
                    # If found, return the current bucket
                    return currentBucket[i][1]

        # If no value present
        return "Data Not Found"

    # In order to print the hashmap to see the structure
    def __str__(self):
        return "".join(str(item) for item in self.data)


# Instantiating from our Object Class with 50 buckets
myHash = HashMap(50)

# Setting the values to the hash table
myHash.set("john", 123)
myHash.set("peter", 567)
myHash.set("carrie", 898)

# Printing the entire hash table
print(myHash)

# Getting the values from the hash table using the keys
print(myHash.get("john"))  # Output: 123
print(myHash.get("guava"))  # Output: Data Not Found

Output

Conclusion

In this article, we went over the key terms and methodology to create a Hash Map using Python. As the name suggests, it’s like creating a map using data to provide an abstraction over an underlying logic for representation. They provide an efficient way to display and locate values based on keys and also to insert and delete values based on keys.