In today’s world, products are recommended to us almost everywhere we go online. From Amazon to Flipkart to Myntra, we keep buying the best-rated products recommended to us. Have you ever wondered how this happens?
All this is made possible by machine learning. Machine learning models are algorithms that learn patterns from historical data and use them to make predictions.
In this article, we will learn about the most commonly used machine learning models: linear regression, logistic regression, decision trees, random forests, and Support Vector Machines (SVM).
What is machine learning?
Machine learning essentially allows computers to predict the outcome of a specific situation based on historical data. It is commonly divided into two main categories: supervised and unsupervised learning.
In supervised learning, the algorithm is trained on labeled data. Well-known models such as linear regression, logistic regression, Support Vector Machines (SVM), random forests, and decision trees are all supervised learning methods.
Unlike supervised learning, unsupervised learning algorithms are not trained on labeled data; they learn the patterns and relationships in the data on their own. We mainly focus on supervised learning in this article.
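To make the difference concrete, here is a minimal sketch that assumes scikit-learn is installed. The synthetic two-group dataset and the choice of `LogisticRegression` and `KMeans` are illustrative assumptions, not something prescribed by this article. The supervised model is given the labels, while the unsupervised one only receives the features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Two well-separated groups of 2-D points (synthetic, for illustration only)
group_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
group_b = rng.normal(loc=5.0, scale=0.5, size=(50, 2))
X = np.vstack([group_a, group_b])
y = np.array([0] * 50 + [1] * 50)  # Labels, used only in the supervised case

# Supervised: the model sees both the features X and the labels y
clf = LogisticRegression().fit(X, y)
print("Supervised accuracy on the training data:", clf.score(X, y))

# Unsupervised: the model sees only X and has to find the groups itself
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster sizes found without labels:", np.bincount(km.labels_))
```

Note that the cluster numbers assigned by `KMeans` are arbitrary, so cluster 0 need not correspond to label 0.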
Linear Regression
Linear regression is a type of supervised learning that captures the overall trend in the data. It finds the best-fit line through the points, which gives us a linear equation. Given below is the code for linear regression in Python.
import numpy as np
import matplotlib.pyplot as plt
# Generate random data
np.random.seed(10) # Set a seed for reproducibility
x = np.random.rand(20)
y = 3 * x + 2 + np.random.randn(20) # Add some noise
# Perform linear regression using numpy.polyfit
m, b = np.polyfit(x, y, 1)
# Calculate predicted y values
y_pred = m * x + b
# Plot the data and regression line
plt.scatter(x, y, color='blue', label='Data')
plt.plot(x, y_pred, color='red', label='Regression Line')
# Add labels and title
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Linear Regression with Scatter Plot')
# Add legend
plt.legend()
# Show the plot
plt.grid(True)
plt.show()
Let us look at the output of the above code.
We can observe the best-fit line of the randomly generated data points.
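For readers curious about what `np.polyfit` computes with degree 1, the sketch below derives the slope and intercept from the standard closed-form least-squares formulas and compares them with the `np.polyfit` result. The names `m_manual` and `b_manual` are illustrative and not part of the code above.

```python
import numpy as np

# Same synthetic data as in the example above
np.random.seed(10)
x = np.random.rand(20)
y = 3 * x + 2 + np.random.randn(20)

# Closed-form least-squares estimates for a straight line y = m * x + b:
#   m = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
#   b = mean(y) - m * mean(x)
x_mean, y_mean = x.mean(), y.mean()
m_manual = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b_manual = y_mean - m_manual * x_mean

# np.polyfit with degree 1 solves the same least-squares problem
m_fit, b_fit = np.polyfit(x, y, 1)

print("Manual: ", m_manual, b_manual)
print("polyfit:", m_fit, b_fit)
```

The two pairs of values should agree up to floating-point rounding. Because the noise is large relative to only 20 points, the fitted slope and intercept can differ noticeably from the true values of 3 and 2.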
Random Forest
Random forest is a supervised machine learning method that combines the predictions of multiple decision trees.
It is like a team of experts who each give you a suggestion, with the final answer based on their combined opinion. Let us look at the code for a random forest below.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
import matplotlib.pyplot as plt
# Generate random weather data
np.random.seed(42)
data = np.random.rand(100, 5) # 100 data points, 5 features
# Define feature names
features = ["Temperature", "Humidity", "Wind Speed", "Pressure", "Cloud Cover"]
# Create labels (sunny, rainy, cloudy)
labels = np.random.choice(["sunny", "rainy", "cloudy"], size=100)
# Split data into features and target variable
X = data
y = labels
# Create a Random Forest classifier
model = RandomForestClassifier(n_estimators=100, random_state=42)
# Train the model
model.fit(X, y)
# Generate new data for prediction
new_data = np.random.rand(1, 5)
# Predict the weather for the new data
prediction = model.predict(new_data)[0]
# Print the features and predicted weather
print("Features:", new_data[0])
print("Predicted weather:", prediction)
# Calculate feature importances
result = permutation_importance(model, X, y, n_repeats=10)
importances = result.importances_mean
# Plot feature importances
plt.bar(features, importances)
plt.xlabel("Features")
plt.ylabel("Importance")
plt.title("Feature importances in Random Forest")
plt.show()
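Below is the output of our Random Forest model. The prediction is cloudy. Because both the features and the labels here are randomly generated, the prediction illustrates the workflow rather than a real weather pattern.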
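The "team of experts" idea can be made visible by combining the individual trees by hand. The sketch below assumes scikit-learn's `RandomForestClassifier`, which exposes its fitted trees in `estimators_` and predicts the class with the highest averaged tree probability. The data is random, as in the example above, and the names `tree_probs` and `manual_prediction` are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Same kind of synthetic data as above: random features and random labels
rng = np.random.default_rng(42)
X = rng.random((100, 5))
y = rng.choice(["sunny", "rainy", "cloudy"], size=100)

model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)
new_data = rng.random((3, 5))

# Ask every tree for its class probabilities, then average them
tree_probs = np.mean(
    [tree.predict_proba(new_data) for tree in model.estimators_], axis=0
)

# The class with the highest average probability is the forest's answer
manual_prediction = model.classes_[np.argmax(tree_probs, axis=1)]

print("Averaged tree probabilities:\n", tree_probs)
print("Manual prediction:", manual_prediction)
print("model.predict:    ", model.predict(new_data))
```

The manually combined prediction should match `model.predict`, which shows that the forest is an average over its trees rather than a single model.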
Support Vector Machine
A Support Vector Machine (SVM) creates an optimal decision boundary to separate different classes of data. The shape of the boundary is determined by a kernel function, which can be linear, polynomial, or a radial basis function (RBF).
Let us look at the code to understand it further.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
# Generate sample data
X = np.array([[1, 1], [2, 1], [3, 2], [1, 4], [2, 4], [3, 5]])
y = np.array([0, 0, 0, 1, 1, 1])
# Split data into training and testing sets (with six points, two are held out for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0) # Fixed seed for reproducibility
# Create and train the SVM model
clf = SVC(kernel='linear') # Use linear kernel for this example
clf.fit(X_train, y_train)
# Make predictions on the test set
y_pred = clf.predict(X_test)
# Print the predicted labels and actual labels
print("Predicted Labels:", y_pred)
print("Actual Labels:", y_test)
# Additional information about the model (optional)
print("Support vectors:", clf.support_vectors_)
print("Support vector indices:", clf.support_)
print("Number of support vectors for each class:", clf.n_support_)
Let us look at the output of the code above.
The above code identifies our labels correctly with the help of support vectors.
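With a linear kernel, the decision boundary is a straight line that can be read directly from the fitted model. The sketch below is a minimal illustration, assuming scikit-learn's `SVC`, which exposes `coef_` and `intercept_` for linear kernels. It fits on all six points (no train/test split, to keep it short) and recomputes the score of each point by hand.

```python
import numpy as np
from sklearn.svm import SVC

# Same six points as in the example above
X = np.array([[1, 1], [2, 1], [3, 2], [1, 4], [2, 4], [3, 5]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear').fit(X, y)

# For a linear kernel the boundary is the line w . x + b = 0
w = clf.coef_[0]
b = clf.intercept_[0]
scores = X @ w + b  # Signed score of every point relative to the boundary

print("w =", w, "b =", b)
print("Scores:", scores)
print("Predicted from the sign of the score:", (scores > 0).astype(int))
```

Points with a positive score fall on the class 1 side of the line and points with a negative score fall on the class 0 side. For the `'poly'` and `'rbf'` kernels, `coef_` is not available, because the boundary is no longer a straight line in the original feature space.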
Logistic Regression
Unlike linear regression, logistic regression is used for classification. We model the odds of an event through its log-odds, which is a linear function of the inputs. Then, with the help of a sigmoid function, we convert the log-odds into a probability and use it to categorise different points. Let us understand it further with Python code.
import numpy as np
import matplotlib.pyplot as plt
# Generate 20 random data points
np.random.seed(10) # Set a seed for reproducibility
x = np.random.rand(20) * 10 # Random inputs between 0 and 10
y = (np.random.rand(20) > 0.5).astype(int) # Random binary labels (0 or 1)
# Define the sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
# Add a bias term (x0 = 1) for convenience
x = np.c_[np.ones(20), x] # Add a column of ones to the input data
# Initialize weights with random values
w = np.random.rand(2)
# Learning rate
learning_rate = 0.01
# Perform logistic regression training using gradient descent
for _ in range(1000):
    # Calculate predicted probabilities
    y_pred = sigmoid(np.dot(x, w))
    # Calculate the error
    error = y - y_pred
    # Update weights using gradient descent
    w += learning_rate * np.dot(x.T, error)
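# Plot the data and the fitted probability curve
plt.scatter(x[:, 1], y, color='blue', label='Data')
# The model predicts P(y = 1 | x) = sigmoid(w[0] + w[1] * x)
x_line = np.linspace(0, 10, 100)
y_line = sigmoid(w[0] + w[1] * x_line)
plt.plot(x_line, y_line, color='red', label='Predicted Probability')
# The decision boundary is where this curve crosses 0.5
plt.axhline(0.5, color='gray', linestyle='--', label='Decision Threshold (0.5)')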
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.title('Logistic Regression with Random Data')
plt.show()
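Let us look at the output of the code above. The labels here are generated independently of the inputs, so the example illustrates the training procedure rather than a meaningful classifier.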
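The relationship between probability, odds, log-odds, and the sigmoid function can be checked with a few numbers. The sketch below uses three example probabilities chosen purely for illustration. It converts them to odds and log-odds, then applies the sigmoid to get the probabilities back.

```python
import numpy as np

def sigmoid(z):
    """Map a log-odds value z to a probability between 0 and 1."""
    return 1 / (1 + np.exp(-z))

p = np.array([0.1, 0.5, 0.8])  # Example probabilities (illustrative values)
odds = p / (1 - p)             # Odds of the event
log_odds = np.log(odds)        # Log-odds, also called the logit

print("Odds:               ", odds)
print("Log-odds:           ", log_odds)
print("Sigmoid of log-odds:", sigmoid(log_odds))  # Recovers the probabilities

# Classification rule: predict class 1 when the probability is at least 0.5,
# which is the same as the log-odds being at least 0
predicted = (sigmoid(log_odds) >= 0.5).astype(int)
print("Predicted classes:  ", predicted)
```

In logistic regression the log-odds are not given but computed as a weighted sum of the inputs, which is what `np.dot(x, w)` does in the training loop above.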
Decision Tree
A decision tree is a flowchart-like structure that branches on a series of conditions until it reaches a final decision. Given below is an example.
Let us understand it further with code.
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_blobs
# Create a dataset with 3 features and 2 classes
X, y = make_blobs(n_samples=1000, centers=2, n_features=3, random_state=0)
# Create a decision tree classifier
clf = DecisionTreeClassifier()
# Train the classifier on the data
clf.fit(X, y)
# Make predictions on a new data point
new_data = [[5, 3, 1]] # Example data point
prediction = clf.predict(new_data)
# Print the predicted class
print("Predicted class:", prediction[0])
Let us look at the output of the decision tree.
We can observe that our model predicts the class of our input as Class 0.
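To see the flowchart that the tree has learned, the rules can be printed as text. The sketch below assumes scikit-learn's `export_text` helper. The `max_depth=2` limit and the feature names are illustrative choices made to keep the output short, and they are not part of the example above.

```python
from sklearn.datasets import make_blobs
from sklearn.tree import DecisionTreeClassifier, export_text

# Same synthetic dataset as above
X, y = make_blobs(n_samples=1000, centers=2, n_features=3, random_state=0)

# Limit the depth so the printed tree stays short (an illustrative choice)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Placeholder feature names, since the synthetic data has none
rules = export_text(clf, feature_names=["feature_0", "feature_1", "feature_2"])
print(rules)
```

Each indented line is a condition on one feature, and each line starting with `class:` is a leaf where the tree makes its final decision.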
Conclusion
There you go! Now you know a lot more about machine learning. In this article, we discussed linear and logistic regression, SVM, decision trees, and random forests. We have not discussed unsupervised machine learning, which is a whole other world.
Hope you enjoyed reading it!