Geospatial analysis is a field of technology that deals with latitude, longitude, locations, directions, and, of course, visualization. It is one of the most immersive fields to work in.
Geospatial machine learning is also a trending field; it involves building and training machine learning models for more complex tasks involving the Earth.
In this post, we will calculate the distance and bearing between two GPS points (latitude and longitude coordinates) using the Haversine Formula.
The Haversine Formula is used to calculate the great-circle distance between two points on Earth given their latitude and longitude. This method assumes the Earth is a perfect sphere, which can result in slight inaccuracies. Python libraries, such as the haversine module, and mathematical implementations help perform these calculations efficiently.
Understanding the Core of the Haversine Formula
The Haversine Formula, derived from spherical trigonometry, gives the great-circle distance between two points from their latitudes and longitudes.
The formula is exact on a sphere. It assumes that the Earth is a perfect sphere, but the Earth is slightly flattened at the poles, so the results can differ a little from measured distances.
The Haversine Formula can be deduced as follows:

With this formula, we can compute the distance between two points as shown below.

Here ϕ represents the latitudes of the two points and λ represents their longitudes.
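Written out in LaTeX, the standard haversine relations for a sphere of radius $R$ are sketched below. The shorthand $\Delta\phi = \phi_2 - \phi_1$ and $\Delta\lambda = \lambda_2 - \lambda_1$ is introduced here for compactness, and all angles are assumed to be in radians.

```latex
\begin{aligned}
\operatorname{hav}(\theta) &= \sin^2\!\left(\frac{\theta}{2}\right) \\
a &= \sin^2\!\left(\frac{\Delta\phi}{2}\right)
     + \cos\phi_1 \,\cos\phi_2 \,\sin^2\!\left(\frac{\Delta\lambda}{2}\right) \\
c &= 2\,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1 - a}\right) \\
d &= R\,c
\end{aligned}
```

Here $a$ is the haversine of the central angle between the two points, $c$ is that central angle, and $d$ is the great-circle distance in the same unit as $R$.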
Let us see the mathematical approach to calculating the Haversine distance between two points followed by a Python library for doing the same.
Calculating Distances: Haversine Formula in Python
Let us go by the mathematical deduction and implement the Haversine Formula in Python. Imagine you are moving from Manhattan to London and want to know the distance between them.
You take your phone and check the shortest distance between these two places on Google Maps. How would you know the distance if you were in a pre-tech era? Use the Haversine Formula of course!
import math

R = 6371.0  # mean radius of the Earth in kilometers

def haver_dist(lati1, long1, lati2, long2):
    """Return the great-circle distance in km between two points given in degrees."""
    # Convert degrees to radians
    lati1 = math.radians(lati1)
    long1 = math.radians(long1)
    lati2 = math.radians(lati2)
    long2 = math.radians(long2)
    dlong = long2 - long1
    dlati = lati2 - lati1
    # Haversine of the central angle between the two points
    a = (
        math.sin(dlati / 2) ** 2
        + math.cos(lati1) * math.cos(lati2) * math.sin(dlong / 2) ** 2
    )
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    distance = R * c
    return distance

def bearing(lati1, long1, lati2, long2):
    """Return the initial bearing in degrees (0-360, clockwise from north)."""
    lati1 = math.radians(lati1)
    long1 = math.radians(long1)
    lati2 = math.radians(lati2)
    long2 = math.radians(long2)
    y = math.sin(long2 - long1) * math.cos(lati2)
    x = math.cos(lati1) * math.sin(lati2) - math.sin(lati1) * math.cos(
        lati2
    ) * math.cos(long2 - long1)
    intbearing = math.atan2(y, x)
    intbearing = math.degrees(intbearing)
    # Normalize from (-180, 180] to [0, 360)
    intbearing = (intbearing + 360) % 360
    return intbearing

# Manhattan and London; longitudes west of Greenwich are negative
lati1 = 40.7772
long1 = -73.9661
lati2 = 51.4847
long2 = -0.1279
distance = haver_dist(lati1, long1, lati2, long2)
intbearing = bearing(lati1, long1, lati2, long2)
print(f"Distance: {distance:.2f} km")
print(f"Initial Bearing: {intbearing:.2f} degrees")
We import the math library in the first line to carry out the calculations. The mean radius of the Earth, in kilometers, is declared in the second line and stored in R. Next, we define a function that calculates the distance using the Haversine Formula; it takes the latitudes and longitudes of the two points, in degrees, and converts them to radians before use. It then applies the formula discussed in the previous section. A second function, bearing, computes the initial bearing angle from the first point to the second.
Finally, we call both functions with the coordinates of Manhattan and London and print the calculated distance and bearing.
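The bearing function in the code above follows the standard initial-bearing (forward azimuth) relation on a sphere. In LaTeX, with $\Delta\lambda = \lambda_2 - \lambda_1$ as an assumed shorthand and angles in radians, it can be sketched as:

```latex
\begin{aligned}
y &= \sin\Delta\lambda \,\cos\phi_2 \\
x &= \cos\phi_1 \,\sin\phi_2 - \sin\phi_1 \,\cos\phi_2 \,\cos\Delta\lambda \\
\theta &= \operatorname{atan2}(y,\ x) \\
\text{bearing} &= \left(\theta \cdot \frac{180}{\pi} + 360\right) \bmod 360
\end{aligned}
```

The last step maps the result of atan2, which lies between -180 and 180 degrees, onto the compass range from 0 to 360 degrees measured clockwise from north.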

The calculated distance closely matches the shortest distance reported by Google Maps (5560 km).
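Part of any remaining gap comes from the choice of Earth radius. The sketch below is a rough illustration, not a geodesic calculation: it recomputes the Manhattan to London distance with three commonly quoted radii. The function and dictionary names are made up for this example, and the radius values are approximate reference figures assumed here rather than taken from the code above.

```python
import math

# Approximate Earth radii in kilometers (assumed reference values for this sketch)
RADII_KM = {
    "polar": 6356.752,
    "mean": 6371.009,
    "equatorial": 6378.137,
}


def haversine_km(lat1, lon1, lat2, lon2, radius_km):
    """Great-circle distance on a sphere of the given radius (inputs in degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    # min() guards against rounding pushing sqrt(a) slightly above 1
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


def distance_by_radius(lat1, lon1, lat2, lon2):
    """Return the distance computed with each radius in RADII_KM."""
    return {
        name: haversine_km(lat1, lon1, lat2, lon2, radius)
        for name, radius in RADII_KM.items()
    }


# Manhattan -> London (west longitudes are negative)
results = distance_by_radius(40.7772, -73.9661, 51.4847, -0.1279)
for name, dist in results.items():
    print(f"{name:>10}: {dist:.1f} km")
```

Because the distance scales linearly with the radius, the spread between the polar and equatorial results is about 0.3 percent, which gives a feel for the size of error the spherical assumption can introduce.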
Now let us learn how to use the Haversine module to calculate the distance.
Simplifying Distance Calculation with Python’s Haversine Module
Fortunately, Python has a module that does all this work for us: it calculates the distance between two points and returns the result in the unit we choose (kilometers, miles, nautical miles, and so on).
The haversine module can be installed using the command:
pip install haversine
Now that we have the module ready, let us start using it!
from haversine import haversine, Unit

# (latitude, longitude) in degrees; longitudes west of Greenwich are negative
manhattan = (40.7772, -73.9661)
london = (51.4847, -0.1279)
dist = haversine(manhattan, london, unit=Unit.KILOMETERS)
print(f"The Haversine distance between Manhattan and London is {dist:.2f} kilometers.")
The haversine function and the Unit enumeration are imported from the haversine library at the beginning of the code snippet. Next, we define the coordinates of the two places as (latitude, longitude) tuples. The dist variable holds the result computed by the haversine function, with the unit set to kilometers. The result is displayed in the last line.
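To see what choosing a unit amounts to, here is a minimal standard-library sketch, assuming a mean Earth radius of 6371 km. The names great_circle and KM_PER_UNIT are hypothetical and are not part of the haversine package; the example only illustrates converting a kilometer result into other units.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius

# Kilometers per unit: 1 mile = 1.609344 km and 1 nautical mile = 1.852 km by definition
KM_PER_UNIT = {"km": 1.0, "m": 0.001, "mi": 1.609344, "nmi": 1.852}


def great_circle(p1, p2, unit="km"):
    """Haversine distance between two (lat, lon) tuples in degrees.

    Raises KeyError if `unit` is not a key of KM_PER_UNIT.
    """
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    distance_km = 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
    return distance_km / KM_PER_UNIT[unit]


manhattan = (40.7772, -73.9661)
london = (51.4847, -0.1279)
for unit in ("km", "mi", "nmi"):
    print(f"{great_circle(manhattan, london, unit):.2f} {unit}")
```

Computing once in kilometers and dividing by a conversion factor keeps the trigonometry in one place, so adding a unit only means adding one dictionary entry.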

Let us take real-world data and compute the distance between two cities or locations.
Applying the Haversine Formula to Real-world Datasets
Let us talk about the dataset first. It is a collection of cities in India with their literacy rates, the total graduates in each city, and the number of male and female graduates. What we are interested in, though, is the location column of this dataset.
Let us take a look at the dataset.
import pandas as pd
df = pd.read_csv('/content/cities_r2.csv')
df
The pandas library is needed to load the dataset into our environment. The dataset is read into a variable called df (short for data frame), and the data frame is displayed in the next line.

The dataset has many columns, but we are interested in only two columns – the name of the city and the location. So, let us make a new data frame with just these two columns.
newdf = df[['name_of_city','location']]
newdf
We selected the columns we needed from the data frame (df) and stored them in another data frame called newdf. This data frame is displayed in the next line.

Let us now calculate the distance between the cities.
import ast

import pandas as pd
from haversine import haversine, Unit

cities = 5  # number of cities to compare
distances = []
# Visit every unordered pair (i, j) among the first `cities` rows
for i in range(cities):
    for j in range(i + 1, cities):
        loc1 = newdf.loc[i, "location"]
        loc2 = newdf.loc[j, "location"]
        # Parse the "lat,lon" string safely instead of calling eval on file contents
        lat1, lon1 = ast.literal_eval(loc1)
        lat2, lon2 = ast.literal_eval(loc2)
        distance = haversine((lat1, lon1), (lat2, lon2), unit=Unit.KILOMETERS)
        city1 = newdf.loc[i, "name_of_city"]
        city2 = newdf.loc[j, "name_of_city"]
        distances.append((city1, city2, distance))
dist_df = pd.DataFrame(distances, columns=["City1", "City2", "Distance (km)"])
dist_df
We want the distance between every pair of the first five cities, so we initialize a variable called cities with a value of 5. An empty list is created to store the results. Two nested for loops walk over each pair of rows; for every pair, we read the location strings, parse the latitude and longitude from them, and look up the city names. Inside the inner loop, we also calculate the distance between the two cities. The city names and the distances are then stored in a new data frame called dist_df.
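The same pairwise pattern can be sketched without pandas or the haversine package. The rows below are illustrative only: they are not taken from cities_r2.csv, the coordinates are approximate, and the "lat,lon" string format is an assumption about how the location column is stored. The helper names are hypothetical.

```python
import math
from itertools import combinations


def haversine_km(p1, p2, radius_km=6371.0):
    """Haversine distance in km between two (lat, lon) tuples given in degrees."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


def parse_location(text):
    """Turn a 'lat,lon' string into a (lat, lon) tuple of floats."""
    lat, lon = (float(part) for part in text.split(","))
    return lat, lon


def pairwise_distances(rows):
    """Return (city1, city2, distance_km) for every unordered pair of rows."""
    result = []
    for (name1, loc1), (name2, loc2) in combinations(rows, 2):
        dist = haversine_km(parse_location(loc1), parse_location(loc2))
        result.append((name1, name2, dist))
    return result


# Illustrative rows (NOT from the dataset); approximate city-center coordinates
rows = [
    ("Delhi", "28.6139,77.2090"),
    ("Mumbai", "19.0760,72.8777"),
    ("Chennai", "13.0827,80.2707"),
]

pairs = pairwise_distances(rows)
for city1, city2, dist in pairs:
    print(f"{city1} - {city2}: {dist:.1f} km")
```

itertools.combinations yields each unordered pair exactly once, which plays the role of the nested loops with j starting at i + 1; n cities produce n(n - 1)/2 pairs, so five cities give ten rows.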

Summary
You’re now equipped to calculate great-circle distances using both Python libraries and raw mathematical formulas. Isn’t it fascinating how mathematical formulas like the Haversine can be so applicable in our digital age?