Haversine Formula for Calculating GPS Distances

CALCULATING THE DISTANCE BETWEEN TWO GPS POINTS

Geospatial analysis is a field of technology that deals with latitude, longitude, locations, directions, and, of course, visualization. It is one of the most immersive fields to work in.

Geospatial Machine Learning is also a trending field that involves building and training machine learning models for more complex tasks involving the Earth.

In this post, we calculate the distance and the initial bearing between two GPS points (latitude and longitude coordinates) using the Haversine Formula.

The Haversine Formula is used to calculate the great-circle distance between two points on Earth given their latitude and longitude. This method assumes the Earth is a perfect sphere, which can result in slight inaccuracies. Python libraries, such as the haversine module, and direct mathematical implementations both perform these calculations efficiently.

Understanding the Core of the Haversine Formula

The Haversine Formula, derived from trigonometric identities, is used to calculate the great-circle distance between two points given their latitudes and longitudes.

The formula is exact on a sphere. It therefore assumes that the Earth is a perfect sphere, and since the Earth is actually slightly flattened at the poles, the results can differ a little from measured distances.

The Haversine Formula can be deduced as follows:

Figure: The Haversine Formula

With this formula, we can compute the distance between two points as shown below.

Figure: Distance between two points

Here ϕ denotes the latitudes of the two points and λ denotes their longitudes.
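In symbols, the computation implemented in the next section can be written as follows. The names a, c, d and R (the Earth's radius) are notation chosen here to match that code, and Δϕ and Δλ are the differences in latitude and longitude, all in radians:

```latex
\begin{aligned}
\operatorname{hav}(\theta) &= \sin^2\!\left(\frac{\theta}{2}\right) \\
a &= \sin^2\!\left(\frac{\Delta\phi}{2}\right)
     + \cos\phi_1 \,\cos\phi_2 \,\sin^2\!\left(\frac{\Delta\lambda}{2}\right) \\
c &= 2\,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1-a}\right) \\
d &= R\,c
\end{aligned}
```

Here a is the haversine of the central angle between the two points, c is that angle in radians, and d is the great-circle distance.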

Let us see the mathematical approach to calculating the Haversine distance between two points followed by a Python library for doing the same.

Check out other approaches to calculating the distance between two points.

Calculating Distances: Haversine Formula in Python

Let us follow the mathematical derivation and implement the Haversine Formula in Python. Imagine you are moving from Manhattan to London and want to know the distance between them.

You take your phone and check the shortest distance between these two places on Google Maps. How would you know the distance if you were in a pre-tech era? Use the Haversine Formula of course!

import math

R = 6371.0  # mean radius of the Earth in kilometers


def haver_dist(lati1, long1, lati2, long2):
    lati1 = math.radians(lati1)
    long1 = math.radians(long1)
    lati2 = math.radians(lati2)
    long2 = math.radians(long2)
    dlong = long2 - long1
    dlati = lati2 - lati1
    a = (
        math.sin(dlati / 2) ** 2
        + math.cos(lati1) * math.cos(lati2) * math.sin(dlong / 2) ** 2
    )
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    distance = R * c
    return distance


def bearing(lati1, long1, lati2, long2):
    lati1 = math.radians(lati1)
    long1 = math.radians(long1)
    lati2 = math.radians(lati2)
    long2 = math.radians(long2)
    y = math.sin(long2 - long1) * math.cos(lati2)
    x = math.cos(lati1) * math.sin(lati2) - math.sin(lati1) * math.cos(
        lati2
    ) * math.cos(long2 - long1)
    intbearing = math.atan2(y, x)
    intbearing = math.degrees(intbearing)
    intbearing = (intbearing + 360) % 360
    return intbearing


lati1 = 40.7772
long1 = -73.9661  # Manhattan lies west of Greenwich, so its longitude is negative
lati2 = 51.4847
long2 = 0.1279

distance = haver_dist(lati1, long1, lati2, long2)
intbearing = bearing(lati1, long1, lati2, long2)

print(f"Distance: {distance:.2f} km")
print(f"Initial Bearing: {intbearing} degrees")

We import the math library in the first line and store the mean radius of the Earth, in kilometers, in R. The haver_dist function takes the latitudes and longitudes of the two points in degrees, converts them to radians, and applies the formula from the previous section to obtain the distance. The bearing function converts its inputs in the same way and returns the initial bearing from the first point to the second, normalized to the range 0 to 360 degrees.

Finally, we call both functions with the coordinates of Manhattan and London and print the results.
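Before trusting the numbers for real coordinates, it can help to check an implementation against cases whose answers are known in advance. The following is a minimal sketch of such checks, not part of the original code: the function names haversine_km and initial_bearing_deg are illustrative, and the functions restate the same formulas in compact form so that the block runs on its own.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, same value as R above


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing in degrees clockwise from north, in the range [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    y = math.sin(dlam) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlam)
    return (math.degrees(math.atan2(y, x)) + 360) % 360


# A point is at zero distance from itself.
zero = haversine_km(10.0, 20.0, 10.0, 20.0)

# A quarter of the way around the equator is a quarter of the circumference.
quarter = haversine_km(0.0, 0.0, 0.0, 90.0)
expected_quarter = math.pi * EARTH_RADIUS_KM / 2

# Travelling along a meridian heads due north; along the equator, due east.
north = initial_bearing_deg(0.0, 0.0, 10.0, 0.0)
east = initial_bearing_deg(0.0, 0.0, 0.0, 10.0)

print(f"same point: {zero:.3f} km")
print(f"quarter equator: {quarter:.3f} km (expected {expected_quarter:.3f} km)")
print(f"bearing north: {north:.1f}, bearing east: {east:.1f}")
```

If any of these checks fails, the usual suspects are a missing degrees-to-radians conversion or a swapped latitude and longitude.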

Figure: Distance between Manhattan and London

The calculated distance closely matches the shortest distance reported by Google Maps (5560 km).
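Part of the remaining difference comes from the spherical assumption, which enters the calculation through the single radius R. One rough way to gauge its effect is to recompute the distance with the WGS84 polar and equatorial radii instead of the mean radius. The sketch below does that; the helper name central_angle is illustrative, and the comparison only shows how sensitive the result is to the choice of radius, not a true ellipsoidal distance.

```python
import math

# Earth radii in kilometers: WGS84 polar and equatorial values,
# plus the commonly used mean radius (the value of R above).
RADII_KM = {"polar": 6356.752, "mean": 6371.0, "equatorial": 6378.137}


def central_angle(lat1, lon1, lat2, lon2):
    """Central angle in radians between two points given in degrees (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))


manhattan = (40.7772, -73.9661)  # west longitude is negative
london = (51.4847, 0.1279)

angle = central_angle(*manhattan, *london)

# The distance scales linearly with the radius, so each radius gives its own estimate.
distances = {name: radius * angle for name, radius in RADII_KM.items()}
spread_km = distances["equatorial"] - distances["polar"]

for name, km in distances.items():
    print(f"{name:>10} radius: {km:.1f} km")
print(f"spread between extremes: {spread_km:.1f} km")
```

The spread between the two extreme radii is about 0.3 percent of the distance, which gives a feel for the size of error the spherical model can introduce.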

Now let us learn how to use the Haversine module to calculate the distance.

Simplifying Distance Calculation with Python’s Haversine Module

Fortunately, Python has a module that does this work for us: it calculates the distance between two points and returns the result in the unit we ask for (kilometers, miles, nautical miles, and so on).

The haversine module can be installed using the command:

pip install haversine

Now that we have the module ready, let us start using it!

from haversine import haversine, Unit

manhattan = (40.7772, -73.9661)  # (latitude, longitude); west longitude is negative
london = (51.4847, 0.1279)
dist = haversine(manhattan, london, unit=Unit.KILOMETERS)
print(f"The Haversine distance between Manhattan and London is {dist:.2f} kilometers.")

The haversine function and the Unit enumeration are imported from the haversine package at the beginning of the code snippet. Next, we define each place as a (latitude, longitude) tuple in decimal degrees. The dist variable holds the result returned by the haversine function, with the unit set to kilometers. The result is printed in the last line.
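The unit argument only changes the scale of the result; the module's Unit enumeration includes members such as Unit.MILES and Unit.NAUTICAL_MILES. As a library-free sketch of what such a conversion amounts to, the snippet below converts a distance in kilometers by hand using the exact definitions of the mile and the nautical mile. The function name convert_km and the 1000 km input are illustrative and not taken from the module.

```python
# Exact definitions of each unit, expressed in kilometers.
KM_PER_UNIT = {"km": 1.0, "mi": 1.609344, "nmi": 1.852, "m": 0.001}


def convert_km(distance_km, unit):
    """Convert a distance in kilometers to the requested unit."""
    try:
        return distance_km / KM_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit!r}") from None


# Show the same illustrative distance in every supported unit.
for unit in KM_PER_UNIT:
    print(f"1000 km = {convert_km(1000.0, unit):.2f} {unit}")
```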

Figure: Distance calculated by haversine module

Let us take real-world data and compute the distance between two cities or locations.

Applying the Haversine Formula to Real-world Datasets

Let us talk about the dataset first. It is a collection of cities in India, with each city's literacy rate, its total number of graduates, and the number of male and female graduates. What we are interested in, however, is the location column of this dataset.

Let us take a look at the dataset.

import pandas as pd
df = pd.read_csv('/content/cities_r2.csv')
df

The pandas library is used to load the dataset into our environment. The dataset is read into a variable called df (short for data frame), and the data frame is displayed in the next line.

Figure: Geospatial Analysis Dataset

The dataset has many columns, but we are interested in only two columns – the name of the city and the location. So, let us make a new data frame with just these two columns.

newdf = df[['name_of_city','location']]
newdf

We select the columns we need from the data frame (df) and store them in another data frame called newdf. This data frame is displayed in the next line.

Figure: New data frame

Let us now calculate the distance between the cities.

import ast

import pandas as pd
from haversine import haversine, Unit

cities = 5  # number of cities to compare, taken from the top of the data frame
distances = []
for i in range(cities):
    for j in range(i + 1, cities):  # visit each unordered pair once
        # The location column is assumed to hold "latitude,longitude" as text;
        # ast.literal_eval parses it without executing arbitrary code.
        lat1, lon1 = ast.literal_eval(newdf.loc[i, "location"])
        lat2, lon2 = ast.literal_eval(newdf.loc[j, "location"])
        distance = haversine((lat1, lon1), (lat2, lon2), unit=Unit.KILOMETERS)
        city1 = newdf.loc[i, "name_of_city"]
        city2 = newdf.loc[j, "name_of_city"]
        distances.append((city1, city2, distance))
dist_df = pd.DataFrame(distances, columns=["City1", "City2", "Distance (km)"])
dist_df

We calculate the distance between every pair of the first five cities, so we initialize a variable called cities with a value of 5 and create an empty list to store the results. The nested for loop visits each pair of cities once: for each pair, we parse the latitude and longitude out of the two location strings, look up the city names, and compute the distance between the two cities. The city names and distances are then collected into a new data frame called dist_df.
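The same pairwise pattern can be sketched without pandas or the haversine module, which makes the logic easy to test in isolation. In the sketch below, the helper names parse_location and haversine_km are illustrative, the "latitude,longitude" text format is an assumption about the location column, and the three rows use approximate city-centre coordinates rather than values from the dataset.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius in kilometers


def parse_location(text):
    """Parse a 'latitude,longitude' string into a (lat, lon) tuple of floats."""
    lat_str, lon_str = text.split(",")
    return float(lat_str), float(lon_str)


def haversine_km(p1, p2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(p1[0]), math.radians(p2[0])
    dphi = phi2 - phi1
    dlam = math.radians(p2[1] - p1[1])
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.atan2(math.sqrt(a), math.sqrt(1 - a))


# Illustrative rows: approximate coordinates, NOT taken from the dataset.
rows = [
    ("Delhi", "28.6139,77.2090"),
    ("Mumbai", "19.0760,72.8777"),
    ("Chennai", "13.0827,80.2707"),
]

# combinations(rows, 2) yields each unordered pair exactly once,
# which is what the nested i/j loop above does by hand.
pairs = [
    (name1, name2, haversine_km(parse_location(loc1), parse_location(loc2)))
    for (name1, loc1), (name2, loc2) in combinations(rows, 2)
]

for city1, city2, km in pairs:
    print(f"{city1} - {city2}: {km:.0f} km")
```

For n cities this produces n(n-1)/2 pairs, so the approach is fine for a handful of cities but grows quadratically for larger selections.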

Figure: Distance between two Cities

Summary

You’re now equipped to calculate great-circle distances using both Python libraries and raw mathematical formulas. Isn’t it fascinating how mathematical formulas like the Haversine can be so applicable in our digital age?

Dataset

You can find the dataset here

References

Haversine module documentation