Python – An Introduction to NumPy Arrays

NumPy is the most widely used Python library for scientific computing. It provides a Pythonic interface while delegating the heavy computation to compiled C code under the hood. This keeps the high-level readability of Python while making the actual computation much faster than pure Python code could achieve.

Here, we look at the data structure on which NumPy does all of its work, the array, and at how we can transform it in ways similar to how we would manipulate other array-like data structures.

The NumPy Array object

To declare a NumPy array object, we first import the numpy library and then create the array using the np.array() function.

The snippet below declares a simple 1-dimensional NumPy array:

>>> import numpy as np
>>> a = np.array([1, 2, 3, 4])
>>> print(a)
[1 2 3 4]

Each array has the following attributes:

  • ndim (the number of dimensions)
  • shape (the size of each dimension)
  • size (the total number of elements in the array)
  • dtype (the data type of the array's elements)

All elements of a NumPy array share the same data type, unlike the elements of a Python list. As a result, a single NumPy array cannot hold values of several different types; if we pass mixed numeric values, NumPy converts them to one common type.

Declaring a higher-dimensional array works much as it does in other languages: we pass nested lists that represent the entire array, with one level of nesting per dimension.
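As a small illustration of the single-dtype rule, the sketch below compares a Python list with a NumPy array built from the same values. The names mixed, py_list, and ints are made up for this example.

```python
import numpy as np

# A Python list keeps each element's own type
py_list = [1, 2.5, 3]
print([type(x).__name__ for x in py_list])  # ['int', 'float', 'int']

# The same values in a NumPy array are all stored as floats
mixed = np.array(py_list)
print(mixed.dtype)  # float64

# Assigning a float into an integer array truncates it to fit the dtype
ints = np.array([1, 2, 3])
ints[0] = 7.9
print(ints[0])  # 7
```

The last assignment raises no error, so it is worth checking dtype before storing fractional values in an array.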

# Declare a 2-Dimensional numpy array
b = np.array([[1, 2, 3], [4, 5, 6]])
print("b -> ndim:", b.ndim)
print("b -> shape:", b.shape)
print("b -> size:", b.size)
print("b -> dtype:", b.dtype)

Output:

b -> ndim: 2
b -> shape: (2, 3)
b -> size: 6
b -> dtype: int64
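The default integer dtype can differ between platforms and NumPy versions. If we need a specific type, we can pass the dtype argument to np.array(). The sketch below does this for a 3-dimensional array; the name g is only illustrative.

```python
import numpy as np

# A 3-dimensional array built from nested lists, with an explicit dtype
g = np.array([[[1, 2], [3, 4]],
              [[5, 6], [7, 8]]], dtype=np.float32)

print("g -> ndim:", g.ndim)    # g -> ndim: 3
print("g -> shape:", g.shape)  # g -> shape: (2, 2, 2)
print("g -> size:", g.size)    # g -> size: 8
print("g -> dtype:", g.dtype)  # g -> dtype: float32
```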

Accessing NumPy Array Elements

NumPy arrays are indexed the same way as Python lists: indices start at 0, and negative indices count from the end.

To access individual elements in multidimensional arrays, we use comma-separated indices for each dimension.

>>> b[0]
array([1, 2, 3])
>>> b[1]
array([4, 5, 6])
>>> b[-1]
array([4, 5, 6])
>>> b[1, 1]
5
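The same indexing rules can be tried in a short script. This sketch reuses the array b from above and also shows that an index can be used to assign a new value.

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])

# Comma-separated indices and chained indices pick the same element
print(b[1, 1], b[1][1])  # 5 5

# Negative indices work in every dimension
print(b[-1, -1])  # 6

# An index can also be the target of an assignment
b[0, 0] = 10
print(b[0])  # [10  2  3]
```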

NumPy Array slicing

As with Python lists, NumPy arrays support the slice operation, which returns the corresponding subarray.

>>> b[:]
array([[1, 2, 3],
       [4, 5, 6]])
>>> b[:1]
array([[1, 2, 3]])
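Slices can be combined with indices and applied to each dimension separately. The sketch below shows a few common patterns on b; the variable names are chosen only for this example.

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])

# An index drops a dimension, a slice keeps it
first_row = b[0]       # 1-dimensional
first_row_2d = b[:1]   # still 2-dimensional
print(first_row.shape, first_row_2d.shape)  # (3,) (1, 3)

# All rows, columns from index 1 onwards
last_two_cols = b[:, 1:]
print(last_two_cols.tolist())  # [[2, 3], [5, 6]]

# Every second element of the first row
every_other = b[0, ::2]
print(every_other.tolist())  # [1, 3]
```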

Slicing and the other built-in NumPy operations are the recommended way to work with NumPy arrays because they are highly optimized. Native Python loops and list comprehensions are much slower in comparison, so we should prefer NumPy methods for manipulating arrays wherever one is available.
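To make the difference concrete, here is a minimal sketch that computes the same result with a list comprehension and with a NumPy expression. The names values, squared_loop, and squared_vec are hypothetical, and no timings are measured here.

```python
import numpy as np

values = np.arange(5)

# Pure Python style: visit every element in an explicit loop
squared_loop = np.array([v * v for v in values])

# NumPy style: one expression applied to the whole array at once
squared_vec = values * values

print(squared_vec)                               # [ 0  1  4  9 16]
print(np.array_equal(squared_loop, squared_vec)) # True
print(values.sum())                              # 10
```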


Other ways to generate numpy arrays

We can use NumPy's built-in arange(n) function to construct a 1-dimensional array consisting of the integers 0 to n-1.

>>> c = np.arange(12)
>>> print(c)
[ 0  1  2  3  4  5  6  7  8  9 10 11]
>>> c.shape
(12,)
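Like Python's range(), arange() also accepts a start, a stop, and a step. A short sketch:

```python
import numpy as np

# Start at 2, stop before 10
print(np.arange(2, 10))     # [2 3 4 5 6 7 8 9]

# Start at 0, stop before 12, in steps of 3
print(np.arange(0, 12, 3))  # [0 3 6 9]

# A negative step counts downwards
print(np.arange(5, 0, -1))  # [5 4 3 2 1]
```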

np.random.randint(limit, size=N) generates an array of N random integers, each between 0 (inclusive) and limit (exclusive). The size keyword argument can also be a tuple, which produces a multidimensional array.

>>> d = np.random.randint(10, size=6)
>>> d
array([7, 7, 8, 8, 3, 3])
>>> e = np.random.randint(10, size=(3,4))
>>> e
array([[2, 2, 0, 5],
       [8, 9, 7, 3],
       [5, 7, 7, 0]])
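Because the values are random, the output above will differ from run to run. The sketch below fixes the seed so that repeated runs give the same numbers, and checks the bounds instead of relying on particular values. The seed 0 and the names d, d_again, and h are arbitrary choices for this example.

```python
import numpy as np

# Seeding the generator makes the "random" values repeatable
np.random.seed(0)
d = np.random.randint(10, size=6)

np.random.seed(0)
d_again = np.random.randint(10, size=6)
print(np.array_equal(d, d_again))  # True

# Every value lies in the range 0 <= value < 10
print(d.min() >= 0 and d.max() < 10)  # True

# With two positional arguments, the range is low <= value < high
h = np.random.randint(5, 10, size=(2, 3))
print(h.shape)  # (2, 3)
```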

Manipulating NumPy Arrays

NumPy provides a reshape() method, which returns the array's data arranged in a new shape. It does not modify the original array in place, so the result must be assigned to a name if we want to keep it. Here, we use reshape() to arrange the elements of c in the shape (4, 3):

>>> c
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
>>> c.shape
(12,)
>>> c.reshape(4, 3)
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
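The following sketch confirms that c keeps its shape after the call, and shows two related details: passing -1 lets NumPy infer one dimension, and a shape whose total size does not match raises a ValueError. The name m is made up for this example.

```python
import numpy as np

c = np.arange(12)

# reshape() returns a new array object; c itself keeps its shape
m = c.reshape(4, 3)
print(c.shape)  # (12,)
print(m.shape)  # (4, 3)

# -1 asks NumPy to work out that dimension from the total size
print(c.reshape(-1, 6).shape)  # (2, 6)

# The new shape must hold exactly the same number of elements
try:
    c.reshape(5, 3)
except ValueError as err:
    print("reshape failed:", err)
```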

Since NumPy operations are designed to be highly optimized, a subarray created by slicing an array still refers to the original array's data instead of copying it. This means that if the subarray is modified in place, the original array is also modified.

>>> f = e[:3, :2]
>>> f
array([[2, 2],
       [8, 9],
       [5, 7]])
>>> f[0,0] *= 3
>>> f
array([[6, 2],
       [8, 9],
       [5, 7]])
>>> e
array([[6, 2, 0, 5],
       [8, 9, 7, 3],
       [5, 7, 7, 0]])

Here, the original array e is also modified by the change to the slice f. This is because NumPy slices return only a view of the original array, not a copy.

To ensure that changes to the slice do not affect the original array, we use the copy() method to create a copy of the slice and modify that copy, instead of working with a view of the original array.

The snippet below shows how copy() solves this issue.

>>> e
array([[6, 2, 0, 5],
       [8, 9, 7, 3],
       [5, 7, 7, 0]])
>>> f = e[:3, :2].copy()
>>> f
array([[6, 2],
       [8, 9],
       [5, 7]])
>>> f[0,0] = 100
>>> f
array([[100,   2],
       [  8,   9],
       [  5,   7]])
>>> e  # No change is reflected in the original array. We are safe!
array([[6, 2, 0, 5],
       [8, 9, 7, 3],
       [5, 7, 7, 0]])
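If we are unsure whether an array is a view or a copy, one way to check is np.shares_memory(). The sketch below uses a deterministic array in place of the random e so that the results are repeatable; the names view and clone are illustrative.

```python
import numpy as np

e = np.arange(12).reshape(3, 4)

view = e[:3, :2]          # a slice: shares data with e
clone = e[:3, :2].copy()  # an independent copy

print(np.shares_memory(e, view))   # True
print(np.shares_memory(e, clone))  # False

# Writing through the view changes e; writing to the copy does not
view[0, 0] = 99
clone[1, 1] = -1
print(e[0, 0])  # 99
print(e[1, 1])  # 5
```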

Conclusion

In this article, we learned about numpy arrays and some elementary operations and manipulations involving them, including their attributes, array slicing, reshaping, and copying.
