In this article, we'll learn to implement K-Nearest Neighbors from scratch in Python. KNN is a supervised algorithm that can be used for both classification and regression tasks.
KNN is very simple to implement. In this article, we will implement the KNN algorithm from scratch to perform a classification task.
The intuition behind the K-Nearest Neighbors Algorithm
K-Nearest Neighbors requires no explicit training phase: the model simply stores the entire training dataset and classifies a new data point based on the points most similar to it. It makes predictions from the training data alone.

Consider the figure above. There are two classes of data points (red and green), and we are given a new data point (black) and asked which class it belongs to.
KNN relies on the notion that similar items tend to cluster together, so it is quite evident that the new data point is closest to the red group; the algorithm will therefore classify this point as red. You can read more about the algorithm on its Wikipedia page.
Ways to calculate the distance in KNN include:
- Manhattan distance
- Euclidean distance
- Minkowski distance
- Mahalanobis distance
In this article, we will be using the Euclidean distance to calculate the proximity of a new data point to each point in our training dataset.
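For two points, the Euclidean distance is the square root of the sum of squared coordinate differences. Here is a minimal NumPy sketch (the two points below are arbitrary examples chosen just for illustration):

import numpy as np

p = np.array([1.0, 2.0])
q = np.array([4.0, 6.0])

#Euclidean distance: sqrt((1-4)^2 + (2-6)^2) = sqrt(25) = 5.0
print(np.sqrt(np.sum((p - q) ** 2)))

#Manhattan distance, for comparison: |1-4| + |2-6| = 7.0
print(np.sum(np.abs(p - q)))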
Implementing K-Nearest Neighbors from Scratch in Python
First, let's outline the steps involved in implementing K-Nearest Neighbors from scratch.
Step 1. Choose an appropriate distance metric to measure the distance between data points.
Step 2. Store the distances in an array and sort them in ascending order while preserving the indices (e.g. using NumPy's argsort method).
Step 3. Select the first K elements of the sorted list.
Step 4. Perform majority voting: the class with the maximum number of occurrences among the K neighbors is assigned to the data point being classified (see the sketch after this list).
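For instance, here is a toy illustration of steps 2 to 4, with distances and labels invented purely for the example:

import numpy as np
from scipy.stats import mode

#Hypothetical distances from a query point to six training points
distances = np.array([2.5, 0.4, 1.1, 3.8, 0.9, 1.6])
#Class labels of those six training points
labels = np.array([0, 1, 1, 0, 1, 0])

k = 3
#Indices of the k smallest distances, in ascending order
nearest = np.argsort(distances)[:k]
print(nearest)          #[1 4 2]
print(labels[nearest])  #[1 1 1]

#Majority vote among the k nearest labels
predicted = np.ravel(mode(labels[nearest]).mode)[0]
print(predicted)        #1

Because argsort returns indices rather than sorted values, we never lose track of which training example each distance came from.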
Complete Python code for K-Nearest Neighbors
Now let's convert the steps mentioned above into code to implement K-Nearest Neighbors from scratch.
#Importing the required modules
import numpy as np
from scipy.stats import mode

#Euclidean distance between two points
def euclidean(p1, p2):
    dist = np.sqrt(np.sum((p1 - p2) ** 2))
    return dist

#Function to perform KNN classification
def predict(x_train, y, x_input, k):
    op_labels = []
    #Loop through the data points to be classified
    for item in x_input:
        #Array to store distances
        point_dist = []
        #Loop through each training data point
        for j in range(len(x_train)):
            #Calculating the distance
            distances = euclidean(np.array(x_train[j, :]), item)
            point_dist.append(distances)
        point_dist = np.array(point_dist)
        #Sorting the array while preserving the index,
        #keeping only the first K data points
        dist = np.argsort(point_dist)[:k]
        #Labels of the K data points from above
        labels = y[dist]
        #Majority voting; mode() returns an array in older SciPy
        #versions and a scalar in newer ones, so flatten before indexing
        lab = np.ravel(mode(labels).mode)[0]
        op_labels.append(lab)
    return op_labels
Our predict function requires a training dataset, the true labels, the data points to classify, and the number of nearest neighbors (K) as input arguments.
K-Nearest Neighbors from Scratch with the iris dataset
Now it’s time to test our implementation on some data.
#Importing the required modules
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris
from numpy.random import randint

#Loading the data
iris = load_iris()

#Store the feature matrix in X
X = iris.data
#Store the target vector in y
y = iris.target

#Creating the training data (100 random indices;
#note that randint samples with replacement)
train_idx = randint(0, 150, 100)
X_train = X[train_idx]
y_train = y[train_idx]

#Creating the testing data (50 random indices)
test_idx = randint(0, 150, 50)
X_test = X[test_idx]
y_test = y[test_idx]

#Applying our function with K = 7
y_pred = predict(X_train, y_train, X_test, 7)

#Checking the accuracy
accuracy_score(y_test, y_pred)
Output:
0.98
With K equal to 7, our implemented model performs very well on the given data. (Since the training and testing indices are sampled randomly, the exact accuracy will vary from run to run.)
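The choice of K affects the result. As a quick sketch, you could compare a few values of K on the same split, reusing the variables defined above:

#Comparing several values of K on the same train/test split
for k in (1, 3, 5, 7, 9):
    y_pred = predict(X_train, y_train, X_test, k)
    print(k, accuracy_score(y_test, y_pred))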
Conclusion
In this article, we implemented our very own K-Nearest Neighbors from Scratch and applied it to a classification problem.
We examined the inner workings of the KNN algorithm and walked through the steps involved in building it. Despite its simplicity, KNN is a very powerful and useful algorithm in machine learning.
If you're interested in some related from-scratch implementations, take a look at these articles:
- Logistic Regression From Scratch
- K-Means Clustering Algorithm From Scratch in Python
- Creating Bag of Words Model from Scratch in Python
- Creating TF-IDF Model from Scratch in Python
- Linear Regression from Scratch
Till we meet next time.
Happy Learning!