Crime Prediction in Python – A Complete Guide

In this tutorial, we cover how to perform crime prediction in Python. As crime rates rise and law enforcement resources shrink, machine learning models can be used to help predict whether a person is a criminal or not.

Implementing Crime Prediction in Python

In this article, we will develop a model to predict whether or not a person is a criminal based on some of their characteristics.

The dataset is taken from techgig. You can get a Python notebook, data dictionary, and dataset here.

Step 1 : Import all needed libraries

Before we get into the main part of crime prediction, let’s import the necessary libraries.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Step 2 : Load the dataset

The next step is to load the data file into our program using the read_csv function of the pandas module.

df = pd.read_csv('train.csv')

Step 3 : Data Cleaning

The next step is to check whether the data contains any missing values. For the sake of this tutorial, we have removed all rows with missing values.
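The article does not show the cleaning code, so here is a minimal sketch of one common way to check for and drop missing values with pandas. The `toy` frame and its column names are made up for illustration; with the real data you would apply the same calls to `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for train.csv; column names are invented.
toy = pd.DataFrame({
    "feature_a": [1.0, np.nan, 3.0, 4.0],
    "feature_b": [0.5, 0.7, np.nan, 0.9],
    "label": [0, 1, 0, 1],
})

# Count the missing values in each column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Drop every row that contains at least one missing value.
toy_clean = toy.dropna().reset_index(drop=True)
print(toy_clean.shape)
```

Dropping rows is the simplest option; if many rows are affected, imputing values may be a better choice.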


Step 4 : Train-Test Split

In this step, the data is split into training and testing datasets using the 80-20 rule and sklearn library functions.

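from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from imblearn.over_sampling import SMOTE
try:  # plot_roc_curve was removed in scikit-learn 1.2
    from sklearn.metrics import plot_roc_curve
except ImportError:
    from sklearn.metrics import RocCurveDisplay
    plot_roc_curve = RocCurveDisplay.from_estimator
smote = SMOTE()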

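# stratify keeps the class proportions the same in the train and test sets
# (the first column is not used as a feature; the last column is the target)
x_train, x_test, y_train, y_test = train_test_split(
    df.iloc[:, 1:-1], df.iloc[:, -1],
    stratify=df.iloc[:, -1], test_size=0.2, random_state=42,
)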

X_re, y_re = smote.fit_resample(x_train, y_train)
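To see what the stratify argument in the split does, here is a small sketch on synthetic labels (the arrays are invented for illustration and are not the article's dataset). With stratification, the train and test sets keep the same share of each class as the full data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 80 samples of class 0 and 20 of class 1.
y = np.array([0] * 80 + [1] * 20)
X = np.arange(100).reshape(-1, 1)  # placeholder feature column

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Share of class 1 in each split; stratification keeps both near the original 20%.
print(y_tr.mean(), y_te.mean())
```

Without stratification, a random 20% test set could end up with noticeably more or fewer minority samples than the training set.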

To address the imbalance between the criminal and non-criminal classes, we employ SMOTE (Synthetic Minority Oversampling Technique), a dataset-balancing technique. We balance only the training data and not the test data, so the test set still reflects the original class distribution.

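In summary, SMOTE oversamples the minority class by creating synthetic instances: it picks a minority sample, finds its nearest minority-class neighbours, and interpolates a new point between the sample and one of those neighbours.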
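The interpolation idea behind SMOTE can be sketched in a few lines of NumPy. This is a simplified illustration, not the imblearn implementation; the function name `smote_like_sample` and the toy points are hypothetical.

```python
import numpy as np

def smote_like_sample(minority, k=3, rng=None):
    """Create one synthetic point by interpolating toward a near neighbour."""
    rng = np.random.default_rng() if rng is None else rng
    base = minority[rng.integers(len(minority))]
    # Euclidean distances from the chosen point to all minority points.
    dists = np.linalg.norm(minority - base, axis=1)
    # The k nearest neighbours, skipping the first entry (the point itself, distance 0).
    neighbours = np.argsort(dists)[1:k + 1]
    neighbour = minority[rng.choice(neighbours)]
    # Random position on the line segment between the point and its neighbour.
    gap = rng.random()
    return base + gap * (neighbour - base)

# Toy minority-class points in two dimensions (invented for illustration).
minority = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]])
rng = np.random.default_rng(0)
synthetic = np.array([smote_like_sample(minority, rng=rng) for _ in range(5)])
print(synthetic)
```

Each synthetic point lies on a line segment between two real minority samples, which is why the new points stay inside the region the minority class already occupies.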

Step 5 : Creating a Tree Based Classifier

Tree-based models work well when the data contains many categorical features, so we use ExtraTreesClassifier here.

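clf = ExtraTreesClassifier()
clf.fit(X_re, y_re)        # train on the SMOTE-balanced training data
clf.score(x_test, y_test)  # mean accuracy on the untouched test set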

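The output displayed a score (mean accuracy) of 0.94335, which is pretty good. Keep in mind that with imbalanced classes, accuracy alone can hide poor performance on the minority class.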
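Since a single accuracy number can be misleading on imbalanced data, a confusion matrix is a useful companion check. The sketch below uses made-up labels, not the article's results; with the real model you would pass y_test and clf.predict(x_test) instead.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration: 8 negatives and 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)      # share of actual positives that were found
precision = tp / (tp + fp)   # share of predicted positives that were correct
print(accuracy, recall, precision)
```

In this toy case the accuracy is 0.8 even though only half of the positive cases are detected, which is the kind of gap the confusion matrix makes visible.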

Step 6 : Display the ROC Curve

Finally, let’s plot the ROC curve for our model using the code mentioned below.

plot_roc_curve( clf,x_test,y_test)
Figure: ROC curve for the crime prediction model
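The ROC curve can also be summarised as a single number, the area under the curve (AUC). Here is a minimal sketch on toy scores; the labels and scores are invented, and for the article's model the scores would be something like clf.predict_proba(x_test)[:, 1], which is an assumption about how you would wire it up.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Toy true labels and predicted scores for the positive class.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# False positive rate and true positive rate at each decision threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Area under the ROC curve: 0.5 is chance level, 1.0 is a perfect ranking.
auc = roc_auc_score(y_true, y_score)
print(auc)
```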


Congratulations! You just learned how to build a crime predictor using the Python programming language and Machine Learning. Hope you enjoyed it! 😇

Liked the tutorial? In any case, I recommend taking a look at the tutorials mentioned below:

  1. Stock Price Prediction using Python
  2. Crypto Price Prediction with Python
  3. Box Office Revenue Prediction in Python – An Easy Implementation

Thank you for taking your time out! Hope you learned something new!! 😄