Curve fitting in Python: A Complete Guide

In this article, we'll learn how to do curve fitting in Python for a given dataset, using two different methods. But before we begin, let's understand what the purpose of curve fitting is.

The purpose of curve fitting is to take a dataset and a chosen function, and find the parameter values for which that function best reproduces the data. To do so, we are going to use the curve_fit() function from the scipy.optimize module.

Before getting started with our code snippets, let's import the modules we need.

#importing modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

What is curve fitting in Python?

Suppose we are given datasets x = {x1, x2, x3, …} and y = {y1, y2, y3, …} and a function f that depends on an unknown parameter z. We need to find the value of z for which the function y = f(x, z) best describes the given datasets. This process is known as curve fitting.

There are two common methods for finding this optimal value:

  • Least squares method
  • Maximum likelihood estimation

Least squares method

In this method, we minimize the function $\sum_i (f(x_i, z) - y_i)^2$ by adjusting the values in z.

The optimized value of z is the one at which this sum reaches its minimum.
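To make the quantity being minimized concrete, here is a minimal sketch of the sum of squared residuals. The quadratic model, the helper name sum_squared_residuals, and the small noise-free demo arrays are assumptions chosen for illustration only; they are not the article's sample data.

```python
import numpy as np

def model_f(x, a, b, c):
    # Illustrative quadratic model (an assumption for this sketch)
    return a * (x - b) ** 2 + c

def sum_squared_residuals(params, x, y):
    # Sum over i of (f(x_i, z) - y_i)^2, with z = (a, b, c)
    a, b, c = params
    residuals = model_f(x, a, b, c) - y
    return np.sum(residuals ** 2)

# Noise-free demo data generated from known parameters (hypothetical values)
x_demo = np.linspace(0.0, 3.0, 7)
y_demo = model_f(x_demo, 4.0, 2.0, -16.0)

s_true = sum_squared_residuals((4.0, 2.0, -16.0), x_demo, y_demo)
s_off = sum_squared_residuals((3.0, 2.0, -16.0), x_demo, y_demo)

print("sum of squares at the generating parameters:", s_true)
print("sum of squares with a changed from 4 to 3:  ", s_off)
```

The sum is smallest at the parameters that generated the data and grows as soon as one of them is moved away, which is the behavior a least squares optimizer exploits.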

Maximum likelihood estimation

We can use this method when the data points carry a known measurement error σ.

If the errors are independent and Gaussian, maximizing the likelihood is equivalent to minimizing the function $\sum_i (f(x_i, z) - y_i)^2 / \sigma_i^2$, where $\sigma_i$ is the error on $y_i$. The optimum value of z is the one at which this weighted sum reaches its minimum.
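In SciPy, this weighted sum can be minimized by passing the per-point errors to curve_fit() through its sigma argument. The sketch below is only an illustration: the quadratic model, the synthetic data, the seed, and the error level of 0.5 are all assumptions, not values from the article.

```python
import numpy as np
from scipy.optimize import curve_fit

def model_f(x, a, b, c):
    # Illustrative quadratic model (an assumption for this sketch)
    return a * (x - b) ** 2 + c

# Synthetic data: known parameters plus Gaussian noise of known size
rng = np.random.default_rng(seed=0)
x_demo = np.linspace(0.0, 3.5, 40)
sigma_demo = np.full(x_demo.shape, 0.5)  # assumed error on each y value
y_demo = model_f(x_demo, 4.0, 2.0, -16.0) + rng.normal(0.0, sigma_demo)

# sigma weights each residual by 1/sigma_i; absolute_sigma=True tells
# curve_fit that sigma is in the same units as y, not just relative weights
popt_w, pcov_w = curve_fit(
    model_f,
    x_demo,
    y_demo,
    p0=[3, 2, -16],
    sigma=sigma_demo,
    absolute_sigma=True,
)

print("fitted parameters:", popt_w)
```

When every point has the same error, the weighted fit returns the same parameters as the plain least squares fit; the weights only change the result when some points are more reliable than others.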

Let’s have a look at our sample datasets as follows.

x_data = np.array([ 0.23547456, 0.15789474, 0.31578947, 0.47368421, 0.63157895, 
                   0.78947368, 0.94736842, 1.10526316, 1.26315789, 1.42105263, 
                   1.57894737, 1.73684211, 1.89473684, 2.05263158, 2.21052632, 
                   2.36842105, 2.52631579, 2.68421053, 2.84210526, 3.45454545 ])
y_data = np.array([ 2.95258285, 2.49719803, -2.1984975, -4.88744346, -7.41326345, 
                   -8.44574157, -10.01878504, -13.83743553, -12.91548145, -15.41149046, 
                   -14.93516299, -13.42514157, -14.12110495, -17.6412464 , -16.1275509 , 
                   -16.11533771, -15.66076021, -13.48938865, -11.33918701, -11.70467566])

plt.scatter(x_data , y_data)
plt.show()

The above code snippet produces a scatter plot of our sample dataset.

Plotting For Sample Datasets

Curve Fitting Example 1

To describe the unknown parameter z, we use three variables named a, b, and c in our model, i.e. z = (a, b, c). Finding the optimal value of z therefore means finding the optimal values of a, b, and c. The function is y = f(x, z) = f(x, a, b, c) = $a(x - b)^2 + c$. Let's move step by step.

Step 1: Defining the model function

def model_f(x, a, b, c):
    # Quadratic model: a sets the curvature, (b, c) is the vertex
    return a * (x - b) ** 2 + c

Step 2: Using the curve_fit() function

popt, pcov = curve_fit(model_f, x_data, y_data, p0=[3,2,-16])

In the above call, we provide the initial guesses for a, b, and c as p0=[3,2,-16].

The function returns two values, popt and pcov.

popt
array([  4.34571181,   2.16288856, -16.22482919])

pcov
array([[ 0.19937578, -0.02405734, -0.1215353 ],
       [-0.02405734,  0.00517302,  0.00226607],
       [-0.1215353 ,  0.00226607,  0.29163784]])
  • popt: the optimized values of a, b, and c
  • pcov: the estimated covariance matrix of popt, which describes the errors on the fitted parameters
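A common way to read pcov is to take the square root of its diagonal, which gives a one-standard-deviation error for each parameter. The sketch below reuses the popt and pcov values printed above; the name perr is just an illustrative choice.

```python
import numpy as np

# Values copied from the popt and pcov output shown in the article
popt = np.array([4.34571181, 2.16288856, -16.22482919])
pcov = np.array([
    [0.19937578, -0.02405734, -0.1215353],
    [-0.02405734, 0.00517302, 0.00226607],
    [-0.1215353, 0.00226607, 0.29163784],
])

# The diagonal of pcov holds the variance of each fitted parameter,
# so its square root is the standard error
perr = np.sqrt(np.diag(pcov))

for name, value, err in zip("abc", popt, perr):
    print(f"{name} = {value:.3f} +/- {err:.3f}")
```

The off-diagonal entries are not errors on single parameters; they describe how the estimates of two parameters vary together.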

Now let us plot the model function using the optimized values of a, b, and c. In this step we use only the popt value (the least squares result). In the code snippet after that, we will interpret the pcov value, i.e. the error estimate.

a_opt, b_opt, c_opt = popt
# Evaluate the fitted model on 100 points across the observed x range
x_model = np.linspace(min(x_data), max(x_data), 100)
y_model = model_f(x_model, a_opt, b_opt, c_opt)

plt.scatter(x_data, y_data)
plt.plot(x_model, y_model, color='r')
plt.show()

The above code snippet plots the fitted curve in red on top of the scattered data points.

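Now let's interpret the pcov value. The covariance matrix does not change the fit; it tells us how uncertain the fitted parameters are and how they vary together, with the variance of each parameter on the diagonal. Let's have a quick look at the code snippet below, which displays the matrix on a logarithmic scale.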

plt.imshow(np.log(np.abs(pcov)))
plt.colorbar()
plt.show()

The above code snippet displays the covariance matrix as a heat map, where each cell shows the logarithm of the absolute value of one entry.
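The raw covariance entries mix different scales, so it can help to normalize them into a correlation matrix, where every entry lies between -1 and 1. This is a minimal sketch using the pcov values printed for Example 1; the names perr and corr are illustrative choices.

```python
import numpy as np

# pcov values copied from the Example 1 output in the article
pcov = np.array([
    [0.19937578, -0.02405734, -0.1215353],
    [-0.02405734, 0.00517302, 0.00226607],
    [-0.1215353, 0.00226607, 0.29163784],
])

# Standard errors of a, b, c
perr = np.sqrt(np.diag(pcov))

# Divide each covariance by the product of the two standard errors
corr = pcov / np.outer(perr, perr)

print(np.round(corr, 3))
```

An entry close to 1 or -1 means the two parameters can trade off against each other without changing the fit much; an entry close to 0 means they are estimated almost independently.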

Curve Fitting Example 2

Let us look at another example, fitting a different function to the same datasets and again using both methods. In this example, we describe the unknown parameter z with four variables named a, b, c, and d, so finding the optimal z means finding the optimal values of a, b, c, and d. Let's have a quick look at the following code snippets.

Step 1: Defining the model function

# Defining our function: the quadratic model plus a scaled cosine term
def fit_f(x, a, b, c, d):
    return a * (x - b) ** 2 + c + d * 0.0001 * np.cos(x)

Step 2: Using the curve_fit() function

# Using the curve_fit() function
popt, pcov = curve_fit(fit_f, x_data, y_data, p0=[1, 2, -16, 1])

In the above call, we provide the initial guesses for a, b, c, and d as p0=[1,2,-16,1].

We can see the values of popt and pcov by printing them.

popt

array([ 5.00494014e+00,  2.75689923e+00, -2.21559741e+01, -8.97724662e+04])

pcov

array([[ 1.71072218e-01,  4.21450805e-03, -4.30580853e-01,
        -5.74603933e+03],
       [ 4.21450805e-03,  3.33701247e-02, -3.97891468e-01,
        -4.49561407e+03],
       [-4.30580853e-01, -3.97891468e-01,  5.68973874e+00,
         6.50631130e+04],
       [-5.74603933e+03, -4.49561407e+03,  6.50631130e+04,
         7.82484767e+08]])
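The entries for d look enormous, but d is multiplied by 0.0001 in fit_f, so it helps to compare each error with the size of its parameter. The sketch below uses the popt values and the pcov diagonal printed above; the names perr, rel_err, and cos_amplitude are illustrative choices.

```python
import numpy as np

# Values copied from the Example 2 output in the article
popt = np.array([5.00494014e+00, 2.75689923e+00, -2.21559741e+01, -8.97724662e+04])
pcov_diag = np.array([1.71072218e-01, 3.33701247e-02, 5.68973874e+00, 7.82484767e+08])

# Standard error of each parameter, and its size relative to the parameter
perr = np.sqrt(pcov_diag)
rel_err = perr / np.abs(popt)

# Effective amplitude of the cosine term, since fit_f uses d * 0.0001
cos_amplitude = popt[3] * 0.0001

for name, value, err, rel in zip("abcd", popt, perr, rel_err):
    print(f"{name} = {value:.4g} +/- {err:.3g} (relative error {rel:.1%})")
print(f"effective cosine amplitude: {cos_amplitude:.3f}")
```

Seen this way, d has the largest relative error of the four parameters, which suggests the cosine term is the least well determined part of this model.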

Now let us plot the model function using the optimized values of a, b, c, and d. In this step we use only the popt value (the least squares result). In the code snippet after that, we will interpret the pcov value, i.e. the error estimate.

a_opt, b_opt, c_opt, d_opt = popt
# Evaluate the fitted model on 100 points across the observed x range
x_model = np.linspace(min(x_data), max(x_data), 100)
y_model = fit_f(x_model, a_opt, b_opt, c_opt, d_opt)

plt.scatter(x_data, y_data)
plt.plot(x_model, y_model, color='r')
plt.show()

The above code snippet plots the fitted curve in red on top of the scattered data points.

Now let's interpret the pcov value (the covariance matrix of the errors). As before, it does not change the fit; it tells us how uncertain the fitted parameters are and how they vary together. Let's have a quick look at the code snippet below.

plt.imshow(np.log(np.abs(pcov)))
plt.colorbar()
plt.show()

Summary

Today, we've learned about curve fitting in Python. We've seen how to fit a given function to a dataset with the curve_fit() function and how to read its two return values, popt and pcov. You can try the above code snippets on datasets other than our example.

You can also load different datasets from CSV files and extract the optimized values for them.
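As a starting point, here is a minimal sketch of that CSV workflow using pandas. To keep it self-contained, the CSV is an in-memory string holding the article's sample data; the column names x and y are assumptions, and with a real file you would pass its path to pd.read_csv() instead.

```python
import io

import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

def model_f(x, a, b, c):
    return a * (x - b) ** 2 + c

# Stand-in for a CSV file on disk; column names "x" and "y" are assumed
csv_text = """x,y
0.23547456,2.95258285
0.15789474,2.49719803
0.31578947,-2.1984975
0.47368421,-4.88744346
0.63157895,-7.41326345
0.78947368,-8.44574157
0.94736842,-10.01878504
1.10526316,-13.83743553
1.26315789,-12.91548145
1.42105263,-15.41149046
1.57894737,-14.93516299
1.73684211,-13.42514157
1.89473684,-14.12110495
2.05263158,-17.6412464
2.21052632,-16.1275509
2.36842105,-16.11533771
2.52631579,-15.66076021
2.68421053,-13.48938865
2.84210526,-11.33918701
3.45454545,-11.70467566
"""

# Read the table and pull each column out as a NumPy array
df = pd.read_csv(io.StringIO(csv_text))
x_csv = df["x"].to_numpy()
y_csv = df["y"].to_numpy()

# Fit the quadratic model with the same initial guesses as Example 1
popt_csv, pcov_csv = curve_fit(model_f, x_csv, y_csv, p0=[3, 2, -16])
print("fitted a, b, c:", popt_csv)
```

For your own data, only the file path, the column names, the model function, and the initial guesses in p0 should need to change.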

I hope you found this article helpful. We'll be back with more exciting topics.