How Do I Return Hash Values of a 1D Array?

The hash values of a 1D array can be computed with pandas.util.hash_array, a utility function of the pandas library.

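The hash function assigns a deterministic integer to each element in the array: the same element always produces the same hash value. One main application of the hash function with respect to arrays is checking whether two arrays are equal, since arrays with the same elements in the same order produce the same hash values. This function expects an array-like object with a dtype, such as a NumPy array, rather than a plain Python list or tuple.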

The integers it generates look arbitrary, but they are not random: hashing the same element again yields the same integer, so the hash value acts as a fingerprint of the element.
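As a minimal sketch of the comparison idea, we can hash two arrays and compare the results. The arrays a, b, and c are made up for this illustration. Note that matching hashes are strong evidence of equal elements rather than a formal proof.

```python
import numpy as np
import pandas as pd

a = np.array([1, 2, 3, 4])
b = np.array([1, 2, 3, 4])   # same elements as a
c = np.array([1, 2, 3, 5])   # last element differs from a

hash_a = pd.util.hash_array(a)
hash_b = pd.util.hash_array(b)
hash_c = pd.util.hash_array(c)

# Compare the hash arrays element by element
same_ab = np.array_equal(hash_a, hash_b)
same_ac = np.array_equal(hash_a, hash_c)

print("a and b have the same hashes:", same_ab)
print("a and c have the same hashes:", same_ac)
```

For small arrays, np.array_equal(a, b) on the arrays themselves is simpler. Hashes become useful when we want to store or compare compact fingerprints instead of the full data.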

In this tutorial, we are going to use NumPy to create a 1D array and pandas to generate an array of hash values for this array.

Visit this post if you are unfamiliar with the Pandas library.

What Is an Array?

An array is a data structure that holds multiple elements of the same type at a time. We can say that an array is a collection of homogeneous elements.

Python does not have a built-in n-dimensional array type, so we use the NumPy library to create arrays.
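To see what "homogeneous" means in practice, here is a small sketch (the variable names are only for illustration). When the input list mixes integers and floats, NumPy picks one common type for every element.

```python
import numpy as np

ints = np.array([1, 2, 3])        # all integers
mixed = np.array([1, 2.5, 3])     # integers mixed with a float

# Every element of an array shares a single dtype
print("dtype of ints:", ints.dtype)
print("dtype of mixed:", mixed.dtype)
print("mixed array:", mixed)
```

In the second array, the integers are stored as floats so that all elements share the same type.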

Check out this post to learn more about NumPy arrays.

Example of an Array

Let us see how we can create an array using the NumPy library. We are going to see the creation of 1D and 2D arrays.

1D Array

A 1D array has only one dimension, so its shape is a tuple with a single entry.

#creation of a 1D array
import numpy as np
arr1=np.array([1,2,3,4,5,6])
print("The 1D array is:\n",arr1)
print("The shape of the array:",arr1.shape)

First, we import the numpy library as np to work with arrays.

The variable arr1 stores the array. Its elements are written as a Python list, which np.array then converts into an array.

The last two lines use the print function: the first prints the array we just created, and the second prints the shape of the array.

The shape of the array is (6,) because there are 6 elements along its single dimension.

2D Array

A 2D array stores the elements in two dimensions (rows and columns).

#creation of a 2D array
arr2=np.array([[1,2,4],[3,5,6]])
print("The 2D array is:\n",arr2)
print("The shape of the array:",arr2.shape)

We are initializing a variable called arr2 to store the elements of the array. This array is built from two lists of three elements each.

In the next two lines, we are printing the array and its shape.

The shape of the array is (2,3) because the array has 2 rows and 3 columns.
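Since the hash function in this tutorial takes a 1D array, one possible way to hash the elements of a 2D array is to flatten it first. The sketch below assumes that hashing element by element in row order is what we want; the names flat and hashes are only for illustration.

```python
import numpy as np
import pandas as pd

arr2 = np.array([[1, 2, 4], [3, 5, 6]])

# ravel() returns the elements as a 1D array, row by row
flat = arr2.ravel()
hashes = pd.util.hash_array(flat)

print("Shape of the flattened array:", flat.shape)
print("Shape of the hash array:", hashes.shape)
```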

Check out this post to understand how to create multidimensional arrays.

Syntax of pandas.util.hash_array Explored

Let us see the syntax of the function.

pandas.util.hash_array(vals, encoding='utf8', hash_key='0123456789123456', categorize=True)

vals: This is a required parameter and takes a 1D array as input.

encoding='utf8': This parameter is optional and is only used when the input contains strings, which are encoded into bytes before the hash value is computed. UTF-8 stands for Unicode Transformation Format, 8-bit.

hash_key='0123456789123456': This is the key used when the array we are trying to hash has strings in it. This field is also optional; its default value is '0123456789123456', which pandas stores internally as _default_hash_key.

categorize: This argument takes a boolean and matters when the array we are trying to hash has duplicate values. It controls whether the elements in vals are first converted to a categorical, so that only the unique values need to be hashed, which is more efficient. The default value is True.
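The parameters above can be tried out on a small string array. This is only a sketch: the array words and the key "abcdefghijklmnop" are made up for the example, and the key is deliberately 16 characters long to match the length of the default key.

```python
import numpy as np
import pandas as pd

# Object-dtype array of strings with one duplicate ("Hi")
words = np.array(["Hey", "Hi", "Hello", "Hi"], dtype=object)

h_default = pd.util.hash_array(words)
# Skip the categorical step; only the internal route changes
h_no_cat = pd.util.hash_array(words, categorize=False)
# Use a different 16-character key for the string hashing
h_other_key = pd.util.hash_array(words, hash_key="abcdefghijklmnop")

same_without_cat = np.array_equal(h_default, h_no_cat)
same_with_other_key = np.array_equal(h_default, h_other_key)

print("Same hashes with categorize=False:", same_without_cat)
print("Same hashes with another hash_key:", same_with_other_key)
```

The categorize flag is a performance option, so it should not change the result, while a different hash_key is expected to produce different hash values for strings.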

What does this function return?
This function returns a NumPy array of unsigned 64-bit integers (uint64) with the same length as the original array. Each entry is the hash value of the corresponding element.
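We can quickly inspect the returned object to confirm this. The array arr below is just an illustrative input.

```python
import numpy as np
import pandas as pd

arr = np.array([10, 20, 30])
hashes = pd.util.hash_array(arr)

# Inspect the type, dtype, and length of the result
print("Type:", type(hashes))
print("dtype:", hashes.dtype)
print("Same length as input:", len(hashes) == len(arr))
```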

Let us look at a few examples of this function.

Hash Values of a 1D Array With Unique Positive Elements

Let us create a 1D array with positive elements and try to find the hash values.

#creation of a 1D array
import numpy as np
import pandas as pd
arr1=np.array([1,2,3,4,5,6])
print("The 1D array is:\n",arr1)
print("-"*25)
#hash values of the array
harr1=pd.util.hash_array(arr1)
print("The hash values of the array are:\n",harr1)

In the second and third lines, we are importing the numpy and pandas libraries.

In the next line, we are creating a variable called arr1 which contains six elements.

The print function is used to output the array to the screen.

The next print function is used to separate the two outputs.

In the following line, we are creating a variable called harr1 to store the hash values returned by pd.util.hash_array for the array we pass to it.

Next, we are printing the hash values.

Hash Values of a 1D Array With Unique Negative Elements

Let us take the same array as the previous example but negate every element and check if we get the same hash values.

#creation of a 1D array
import numpy as np
import pandas as pd
arr2=np.array([-1,-2,-3,-4,-5,-6])
print("The 1D array is:\n",arr2)
print("-"*25)
#hash values of the array
harr2=pd.util.hash_array(arr2)
print("The hash values of the array are:\n",harr2)

This is the same array as in the previous example, with every element negated.

Comparing the two outputs, we can see that even though the elements of the two arrays have the same magnitudes and differ only in sign, the hash values of the two arrays are not the same. The hash depends on the full value of each element, including its sign.
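Rather than comparing the two outputs by eye, we can compare the hash arrays position by position. This is a small sketch that rebuilds both arrays; the names pos, neg, and matches are only for illustration.

```python
import numpy as np
import pandas as pd

pos = np.array([1, 2, 3, 4, 5, 6])
neg = -pos                      # same magnitudes, opposite sign

h_pos = pd.util.hash_array(pos)
h_neg = pd.util.hash_array(neg)

# Element-wise comparison of the two hash arrays
matches = h_pos == h_neg
print("Position-wise matches:", matches)
print("Any matching hash:", matches.any())
```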

Hash Values of a 1D Array With Duplicate Elements

Let us see how the hash function works on an array with duplicate elements.

import numpy as np
import pandas as pd
#creating array with duplicate elements
arr = np.array([1, 2,3,2,4])
h_arr = pd.util.hash_array(arr)
print("The hash values of the array are:\n",h_arr)

In the first two lines, we are importing the numpy and pandas libraries.

In the following line, we are creating a variable called arr to store the elements of the array.

Next, we are initializing another variable h_arr to store the hash values computed by the hash function.

In the last line, we are printing the hash array.


The array contains the value 2 twice, at indices 1 and 3 (counting from zero). Because the hash function is deterministic, the hash values at these two positions are identical.
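The same observation can be checked in code. This sketch reuses the array from the example and compares the two positions that hold the value 2.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3, 2, 4])
h_arr = pd.util.hash_array(arr)

# The value 2 sits at indices 1 and 3
duplicate_hashes_match = h_arr[1] == h_arr[3]
print("Hashes at index 1 and 3 match:", duplicate_hashes_match)

# Number of distinct elements versus number of distinct hash values
print("Unique elements:", len(np.unique(arr)))
print("Unique hash values:", len(np.unique(h_arr)))
```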

Hash Values of a 1D String Array

Let us check if we can compute the hash values for a string array.

#hash values of a string array
import numpy as np
import pandas as pd
arr2=np.array(["Hey","Hi","Hello","How are you"])
harr2=pd.util.hash_array(arr2)
print("The hash array is :\n",harr2)

Firstly, we are importing the two libraries NumPy and pandas.

Next, we are creating a variable called arr2 to store the array. This array has four string elements.

We are creating another variable called harr2 to store the hash values computed by the function pd.util.hash_array.

In the last line, we are printing the hash array.

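Quick Note: The hash values are deterministic. For the same input, encoding, and hash_key, they remain the same no matter how many times the code is run. They are not strictly guaranteed to be unique, though: two different strings could in principle produce the same hash, although with 64-bit hash values this is very unlikely.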
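As a small check of this note, we can hash the string array twice, once with the defaults and once with the default encoding and hash_key written out explicitly. The explicit values are taken from the function signature shown earlier; the object dtype is a choice made for this sketch.

```python
import numpy as np
import pandas as pd

words = np.array(["Hey", "Hi", "Hello", "How are you"], dtype=object)

first = pd.util.hash_array(words)
# Spell out the default encoding and hash_key from the signature
second = pd.util.hash_array(
    words, encoding="utf8", hash_key="0123456789123456"
)

repeatable = np.array_equal(first, second)
print("Both calls give the same hashes:", repeatable)
```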

Conclusion

To sum up, we have discussed what a hash function is and how the hash values of a 1D array can be computed with the help of pandas.util.hash_array.

Next, we have discussed what these hash values are and how they can be useful in comparing two arrays.

We have studied the definition of an array and also looked at the examples of 1D and 2D arrays.

Following that, we discussed the syntax and parameters of the function pandas.util.hash_array.

In the first example, we computed the hash values of a 1D array with unique positive elements.

In the next, we computed the hash values of the same array with every element negated. We observed that elements with the same magnitude get different hash values when their sign differs.

In the third example, we tried to compute the hash values for an array with duplicate elements.

Lastly, we tried to compute the hash values for a string array.