In Python, the term “hashable” refers to any object with a hash value that never changes during its lifetime. This hash value allows hashable objects to be used as dictionary keys or as members of sets, providing fast lookup and comparison operations.
But what exactly does it mean for an object to be hashable, and why does it matter? Let’s break it down step-by-step.
A hashable object is one with a consistent hash value throughout its lifetime, enabling its use as dictionary keys or set members. Hashable objects include immutable types like integers, strings, tuples, while lists, dictionaries, and sets are non-hashable due to their mutability. Understanding hashability is crucial for efficient data storage and retrieval in Python, as it ensures fast and reliable access in data structures like hash tables.
Exploring Python’s Hash Functions
A hash function is a function that takes input data of arbitrary size, and converts it into a fixed size value. This value is called the hash value or hash. A good hash function has some key properties:
- It generates very different hash values for similar input values – even a small change in the input results in a large change in the hash. This is called the avalanche effect.
- It uniformly distributes hashes across the possible output range for the hash function.
- It is deterministic – the same input value will always generate the same hash value.
Python has a built-in hash()
function that can generate hash values:
>>> hash("Python")
7282735233108043601
The actual algorithm used by hash()
can vary between Python versions, but the properties stay the same.
Why Hashing is Useful
Generating hash values allows us to compare dictionary keys or set members quickly.
For example, let’s say we wanted to check if two lists contain the same elements. Without hashing, we would need to iterate through both lists and check each element, which takes linear time O(n).
But if the lists contain hashable items like strings or tuples, we can simply hash each list and compare the hashes in constant O(1) time:
def list_equals(list1, list2):
return hash(tuple(list1)) == hash(tuple(list2))
By using the hash, we no longer care if the lists are ordered differently. All we care about is whether they hash to the same value.
This speedup via hashing applies to dictionaries and sets as well. Python actually stores set items and dictionary keys in a hash table internally to enable fast lookups.
Hashable Objects in Python
For an object to be hashable, two key requirements must be met:
1. It must have a hash method that returns an integer.
All Python objects inherit a __hash__
method, but for mutable objects like lists and dicts, __hash__
returns a TypeError indicating that the object is unhashable.
For immutable built-in objects like integers, strings and tuples, __hash__
returns a unique integer derived from the object’s value:
>>> hash("hello")
1859319527393798429
>>> hash( ("apple", "banana"))
8424618783216606276
Custom classes are hashable by default if you don’t override __hash__
, but best practice is to implement it explicitly if defining custom equality behavior with __eq__
.
2. Its hash value cannot change while it is in use by a dict or set.
That’s why only immutable objects can be hashable – they guarantee their hash value will not change over their lifetime. A tuple object will always hash to the same value. But if a list was hashable, appending items would change its content while still expecting lookups based on the old hash value.
An object is hashable if:
- It supports hash() via a hash() method
- Its hash value is immutable and stays constant over the lifetime of the object
The Challenge of Hashing Mutable Objects
If we try to get the hash value of a mutable object like a list, set, or dict, we get a TypeError:
>>> hash([1,2,3])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>> hash({1: "one"})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'
We can even put a mutable list inside an immutable tuple, and it still won’t be hashable because the internal list could be modified:
>>> hash((1, [2,3]))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
These mutable collections can’t be hashed directly:
- List
- Set
- Dictionary
- Bytearray
- User-defined classes (unless hash is overridden)
While these immutable objects can be hashed:
- Integer
- Float
- String
- Tuple
- Range
- Frozenset
- Bytes
When used as dictionary keys or set members, those hashable immutable objects allow for optimized hash table performance and comparisons via the hash.
Real-World Examples of Hashable Objects
Hashability comes up often in Python programming. Here are some common examples:
Dictionary Keys
Dictionaries can only use hashable objects like strings or tuples as keys. This Twitter API response contains dictionaries with account IDs as keys:
{
"data": {
"2244994945": {
"id": "2244994945",
"name": "Twitter Dev"
},
"783214": {
"id": "783214",
"name": "Twitter API"
}
}
}
Caching
Implementations of memoization or caching decorators frequently leverage hashability to cache return values by input parameters. For example:
cache = {}
def expensive_func(arg1, arg2):
key = (arg1, arg2)
if key not in cache:
cache[key] = # expensive computation
return cache[key]
Here arg1
and arg2
must be hashable so they can be used in the tuple key for cache lookups.
Data Science
When analyzing datasets, we often want to quickly identify or count unique values, which relies on those values being hashable. For example, extracting unique categories from this e-commerce data:
orders = [
{"category": "Electronics", "revenue": 100},
{"category": "Software", "revenue": 150},
{"category": "Electronics", "revenue": 200}
]
unique_categories = set(order["category"] for order in orders)
# {'Software', 'Electronics'}
Summary
When we talk about “hashable” objects in Python, we mean objects that have an unchanging identifier value associated with them – a hash value. This hash code is like a fingerprint that uniquely identifies a particular object.
Now, why does this matter? Because having this steady, reliable hash value is the key to making Python dictionaries and sets work their magic! Behind the scenes, these data structures store and quickly look up objects using the hash as an index. Hopefully, you have a better grasp on the concept of hashability!