Python Hashable Objects: Learning the Key Concepts

Hashable Objects In Python

In Python, the term “hashable” refers to any object with a hash value that never changes during its lifetime. This hash value allows hashable objects to be used as dictionary keys or as members of sets, providing fast lookup and comparison operations.

But what exactly does it mean for an object to be hashable, and why does it matter? Let’s break it down step-by-step.

A hashable object is one with a consistent hash value throughout its lifetime, enabling its use as dictionary keys or set members. Hashable objects include immutable types like integers, strings, tuples, while lists, dictionaries, and sets are non-hashable due to their mutability. Understanding hashability is crucial for efficient data storage and retrieval in Python, as it ensures fast and reliable access in data structures like hash tables.

Exploring Python’s Hash Functions

hash function is a function that takes input data of arbitrary size, and converts it into a fixed size value. This value is called the hash value or hash. A good hash function has some key properties:

  • It generates very different hash values for similar input values – even a small change in the input results in a large change in the hash. This is called the avalanche effect.
  • It uniformly distributes hashes across the possible output range for the hash function.
  • It is deterministic – the same input value will always generate the same hash value.

Python has a built-in hash() function that can generate hash values:

>>> hash("Python") 
7282735233108043601

The actual algorithm used by hash() can vary between Python versions, but the properties stay the same.

Why Hashing is Useful

Generating hash values allows us to compare dictionary keys or set members quickly.

For example, let’s say we wanted to check if two lists contain the same elements. Without hashing, we would need to iterate through both lists and check each element, which takes linear time O(n).

But if the lists contain hashable items like strings or tuples, we can simply hash each list and compare the hashes in constant O(1) time:

def list_equals(list1, list2):
    return hash(tuple(list1)) == hash(tuple(list2))

By using the hash, we no longer care if the lists are ordered differently. All we care about is whether they hash to the same value.

This speedup via hashing applies to dictionaries and sets as well. Python actually stores set items and dictionary keys in a hash table internally to enable fast lookups.

Hashable Objects in Python

For an object to be hashable, two key requirements must be met:

1. It must have a hash method that returns an integer.

All Python objects inherit a __hash__ method, but for mutable objects like lists and dicts, __hash__ returns a TypeError indicating that the object is unhashable.

For immutable built-in objects like integers, strings and tuples, __hash__ returns a unique integer derived from the object’s value:

>>> hash("hello")
1859319527393798429

>>> hash( ("apple", "banana")) 
8424618783216606276

Custom classes are hashable by default if you don’t override __hash__, but best practice is to implement it explicitly if defining custom equality behavior with __eq__.

2. Its hash value cannot change while it is in use by a dict or set.

That’s why only immutable objects can be hashable – they guarantee their hash value will not change over their lifetime. A tuple object will always hash to the same value. But if a list was hashable, appending items would change its content while still expecting lookups based on the old hash value.

An object is hashable if:

  • It supports hash() via a hash() method
  • Its hash value is immutable and stays constant over the lifetime of the object

The Challenge of Hashing Mutable Objects

If we try to get the hash value of a mutable object like a list, set, or dict, we get a TypeError:

>>> hash([1,2,3])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

>>> hash({1: "one"})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict' 

We can even put a mutable list inside an immutable tuple, and it still won’t be hashable because the internal list could be modified:

>>> hash((1, [2,3]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

These mutable collections can’t be hashed directly:

  • List
  • Set
  • Dictionary
  • Bytearray
  • User-defined classes (unless hash is overridden)

While these immutable objects can be hashed:

  • Integer
  • Float
  • String
  • Tuple
  • Range
  • Frozenset
  • Bytes

When used as dictionary keys or set members, those hashable immutable objects allow for optimized hash table performance and comparisons via the hash.

Real-World Examples of Hashable Objects

Hashability comes up often in Python programming. Here are some common examples:

Dictionary Keys

Dictionaries can only use hashable objects like strings or tuples as keys. This Twitter API response contains dictionaries with account IDs as keys:

{
    "data": {
        "2244994945": {
            "id": "2244994945",
            "name": "Twitter Dev"
        },
        "783214": {  
            "id": "783214",
            "name": "Twitter API"
        } 
    }
}

Caching

Implementations of memoization or caching decorators frequently leverage hashability to cache return values by input parameters. For example:

cache = {}

def expensive_func(arg1, arg2):
    key = (arg1, arg2)
    if key not in cache:
        cache[key] = # expensive computation
    return cache[key]

Here arg1 and arg2 must be hashable so they can be used in the tuple key for cache lookups.

Data Science

When analyzing datasets, we often want to quickly identify or count unique values, which relies on those values being hashable. For example, extracting unique categories from this e-commerce data:

orders = [
    {"category": "Electronics", "revenue": 100},
    {"category": "Software", "revenue": 150},
    {"category": "Electronics", "revenue": 200} 
]
    
unique_categories = set(order["category"] for order in orders)
# {'Software', 'Electronics'}

Summary

When we talk about “hashable” objects in Python, we mean objects that have an unchanging identifier value associated with them – a hash value. This hash code is like a fingerprint that uniquely identifies a particular object.

Now, why does this matter? Because having this steady, reliable hash value is the key to making Python dictionaries and sets work their magic! Behind the scenes, these data structures store and quickly look up objects using the hash as an index. Hopefully, you have a better grasp on the concept of hashability!