Hello there! Today we are going to learn how to predict stock prices of various categories using the Python programming language.
Stock market prediction is the act of trying to determine the future value of company stock or other financial instruments traded on an exchange.
The successful prediction of a stock’s future price could yield a significant profit. In this application, we use an LSTM network to predict the closing price of a stock from its closing prices over the previous 60 days.
For the application, we used the machine learning technique called Long Short Term Memory (LSTM). LSTM is an artificial recurrent neural network (RNN) architecture used in the field of deep learning.
Unlike standard feed-forward neural networks, LSTM has feedback connections. It can not only process single data points (such as images), but also entire sequences of data (such as speech or video).
LSTM is widely used for sequence prediction problems and has been very effective.
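To make the idea of feedback connections concrete, here is a minimal NumPy sketch of a single LSTM time step applied across a short sequence. It follows the standard textbook LSTM equations; the helper names (sigmoid, lstm_step), the random weights, and the toy sequence are assumptions made for illustration and are not taken from Keras internals.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step (hypothetical helper for illustration).

    W: (4*units, input_dim), U: (4*units, units), b: (4*units,)
    """
    units = h_prev.shape[0]
    # All four gate pre-activations are computed in one matrix product.
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:units])               # input gate
    f = sigmoid(z[units:2 * units])       # forget gate
    g = np.tanh(z[2 * units:3 * units])   # candidate cell values
    o = sigmoid(z[3 * units:4 * units])   # output gate
    c_t = f * c_prev + i * g              # new cell state (the "memory")
    h_t = o * np.tanh(c_t)                # new hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
input_dim, units = 1, 4
W = rng.normal(scale=0.1, size=(4 * units, input_dim))
U = rng.normal(scale=0.1, size=(4 * units, units))
b = np.zeros(4 * units)

# The hidden and cell states are fed back into the next step:
# this is the feedback connection a feed-forward network lacks.
h = np.zeros(units)
c = np.zeros(units)
sequence = np.array([[0.1], [0.2], [0.3]])  # toy sequence of 3 time steps
for x_t in sequence:
    h, c = lstm_step(x_t, h, c, W, U, b)

print(h.shape)
```

The final hidden state h summarizes the whole sequence, which is what the later layers of a model would consume.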
Implementation of Stock Price Prediction in Python
1. Importing Modules
The first step is to import all the necessary modules for the project.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense, LSTM
import math
from sklearn.preprocessing import MinMaxScaler
We also need the math module for basic calculations and the preprocessing module of sklearn to scale the data in a simpler way.
2. Loading and Preparation of Data
For this project we will use the all_stocks_5yrs csv file, which contains five years of stock data and has the seven columns listed below.
- Date – Format of date is: “yy-mm-dd”
- Open – Price of the stock at open market
- High – Highest price reached in the day
- Low – Lowest price reached in the day
- Close – Price of the stock at the close market
- Volume – Number of shares traded
- Name – The name of the stock ticker
The head function displays the first five rows of the dataset.
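As a rough sketch of the loading step, the snippet below reads a CSV into a DataFrame named data and calls head on it. To stay runnable without the real file, it reads a tiny made-up sample from memory; the tickers AAA and BBB and all numbers are invented for illustration. The lowercase column names and the file path in the comment are assumptions based on how the columns are referenced in the code that follows.

```python
import io
import pandas as pd

# Made-up sample in the assumed column layout (values are illustrative only).
sample_csv = io.StringIO(
    "date,open,high,low,close,volume,Name\n"
    "2013-02-08,10.0,10.5,9.8,10.2,1000,AAA\n"
    "2013-02-11,10.2,10.6,10.0,10.4,1100,AAA\n"
    "2013-02-12,10.4,10.9,10.3,10.8,1200,AAA\n"
    "2013-02-08,20.0,20.4,19.7,20.1,2000,BBB\n"
    "2013-02-11,20.1,20.8,20.0,20.6,2100,BBB\n"
    "2013-02-12,20.6,21.0,20.2,20.3,2200,BBB\n"
)
data = pd.read_csv(sample_csv)

# With the real file, the call would look something like this
# (the path is an assumption):
# data = pd.read_csv("all_stocks_5yrs.csv")

# head() returns the first five rows by default.
first_rows = data.head()
print(first_rows)
```

Passing a number, as in data.head(10), returns that many rows instead of five.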
3. Understanding the Data
3.1 Getting Unique Stock Names
From the whole dataset, we will first extract all the unique stock ticker names with the help of the unique function. In the dataset, we have 444 different stock names.
all_stock_tick_names = data['Name'].unique()
print(all_stock_tick_names)
3.2 Extracting Data for a specific stock name
We will try to understand how the stock data works by taking an input of a stock name from the user and collecting all data of that particular stock name.
# 1. Getting a stock name
stock_name = input("Enter a Stock Price Name: ")
# 2. Building a boolean mask that is True for rows whose name matches the stock name entered
all_data = data['Name'] == stock_name
# 3. Putting all the rows of the specific stock in a variable
final_data = data[all_data]
# 4. Printing the first 5 rows of the data for the specific stock name
final_data.head()
3.3 Visualizing the stock data
To visualize the data, we first plot the date against the closing market price for the FITB stock across all data points.
To make the visualization easier to read, we then draw the same plot for only the first 60 data points.
# Plotting date vs the close market stock price
final_data.plot('date', 'close', color="red")
# Extract only the top 60 rows to make the plot a little clearer
new_data = final_data.head(60)
# Plotting date vs the close market stock price
new_data.plot('date', 'close', color="green")
plt.show()
4. Creating a new Dataframe and Training data
To make our study easier, we will consider only the closing market price and predict the closing market price using Python. The whole training data preparation is shown in the steps below. Comments are added for your reference.
# 1. Filter out the closing market price data
close_data = final_data.filter(['close'])
# 2. Convert the data into an array for easy evaluation
dataset = close_data.values
# 3. Scale/normalize the data so that all values lie between 0 and 1
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(dataset)
# 4. Create the training data: the first 70% of the rows
training_data_len = math.ceil(len(dataset) * .7)
train_data = scaled_data[0:training_data_len, :]
# 5. Separate the data into x (60-day windows) and y (the value that follows each window)
x_train_data = []
y_train_data = []
for i in range(60, len(train_data)):
    x_train_data.append(train_data[i-60:i, 0])
    y_train_data.append(train_data[i, 0])
# 6. Convert the training x and y values to numpy arrays
x_train_data1, y_train_data1 = np.array(x_train_data), np.array(y_train_data)
# 7. Reshape the training x data to (samples, time steps, features)
x_train_data2 = np.reshape(x_train_data1, (x_train_data1.shape[0], x_train_data1.shape[1], 1))
Here we build a training set in which each sample holds the closing prices of 60 consecutive days (60 data points), so that the model can predict the 61st closing price.
Each sample in x_train_data therefore contains 60 values: the first sample covers indices 0 to 59, the second covers indices 1 to 60, and so on.
y_train_data holds the matching targets: its first entry is the 61st value, located at index 60, its second entry is the 62nd value, located at index 61, and so on.
Both x_train_data and y_train_data, the independent and dependent training sets respectively, are then converted into NumPy arrays so that they can be used to train the LSTM model.
Finally, because the LSTM model expects 3-dimensional input, we use the reshape() function to reshape the data into the form (samples, time steps, features).
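The windowing described above is easier to see on a small example. The sketch below applies the same loop to a toy series of ten numbers with a window of 3 instead of 60; the series and the names series, window, x, y, and x3d are made up for illustration.

```python
import numpy as np

series = np.arange(10, dtype=float)  # toy stand-in for the scaled closing prices
window = 3                           # the tutorial uses 60

x_windows, y_targets = [], []
for i in range(window, len(series)):
    x_windows.append(series[i - window:i])  # the `window` values before index i
    y_targets.append(series[i])             # the value to predict

x = np.array(x_windows)  # shape (7, 3): 7 samples of 3 time steps
y = np.array(y_targets)  # shape (7,)

# Add a trailing feature axis: (samples, time steps, features)
x3d = x.reshape(x.shape[0], x.shape[1], 1)

print(x[0], y[0])   # first window [0. 1. 2.] and its target 3.0
print(x[1], y[1])   # second window [1. 2. 3.] and its target 4.0
print(x3d.shape)    # (7, 3, 1)
```

A series of length n yields n - window samples, which is why the training loop starts at index 60.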
5. Building LSTM Model
The LSTM model will have two LSTM layers with 50 neurons and two Dense layers, one with 25 neurons and the other with one neuron.
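model = Sequential()
# First LSTM layer: returns the full sequence so the next LSTM layer can consume it
model.add(LSTM(units=50, return_sequences=True, input_shape=(x_train_data2.shape[1], 1)))
# Second LSTM layer: returns only the last output
model.add(LSTM(units=50, return_sequences=False))
model.add(Dense(units=25))
model.add(Dense(units=1))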
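For a sense of the model's size, the trainable parameters of each layer can be counted by hand. The sketch below assumes the standard LSTM formulation with four gates and a bias term, and fully connected Dense layers with a bias; the helper names lstm_params and dense_params are hypothetical.

```python
def lstm_params(units, input_dim):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    # One weight per input-output pair plus one bias per output.
    return units * input_dim + units

layer_params = [
    lstm_params(50, 1),    # first LSTM layer, 1 feature per time step
    lstm_params(50, 50),   # second LSTM layer, fed by 50 units
    dense_params(25, 50),  # Dense(25)
    dense_params(1, 25),   # Dense(1)
]
total_params = sum(layer_params)

print(layer_params)   # [10400, 20200, 1275, 26]
print(total_params)   # 31901
```

Calling model.summary() on the built model prints Keras's own count, which can be compared against these numbers.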
6. Compiling the Model
The LSTM model is compiled using the mean squared error (MSE) loss function and the adam optimizer.
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x_train_data2, y_train_data1, batch_size=1, epochs=1)
The fit() function trains the model on the training data. Here, batch_size is the number of training examples in a single batch, and epochs is the number of times the entire data set is passed forward and backward through the neural network.
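The relationship between batch_size, epochs, and the number of weight updates can be worked out with a little arithmetic. The sample count of 1000 below is hypothetical and chosen only to keep the numbers readable; updates_per_epoch is an illustrative helper, not a Keras function.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    # The last batch may be smaller than batch_size, hence the ceiling.
    return math.ceil(n_samples / batch_size)

n_samples = 1000  # hypothetical number of training windows

# batch_size=1, as in the tutorial: one weight update per sample.
print(updates_per_epoch(n_samples, 1))    # 1000

# A larger batch means fewer, less noisy updates per epoch.
print(updates_per_epoch(n_samples, 32))   # 32

# Total updates over several epochs.
epochs = 5
print(updates_per_epoch(n_samples, 32) * epochs)  # 160
```

With batch_size=1 every epoch is slow because each sample triggers its own update, so a larger batch size is a common first thing to try when training takes too long.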
7. Testing the model on testing data
The code below takes all the rows from training_data_len onward from the closing price column, together with the 60 rows before them so that the first test window is complete. It then converts the x_test data set into a NumPy array so that it can be used to test the LSTM model.
As with the training data, the LSTM model expects 3-dimensional input, so we use the reshape() function to reshape the data set into 3 dimensions.
Using the predict() function, we get the predicted values from the model for the test data. The scaler.inverse_transform() function then undoes the scaling, so the predictions are back in price units.
# 1. Creating a dataset for testing
test_data = scaled_data[training_data_len - 60:, :]
x_test = []
y_test = dataset[training_data_len:, :]
for i in range(60, len(test_data)):
    x_test.append(test_data[i-60:i, 0])
# 2. Convert the values into arrays for easier computation
x_test = np.array(x_test)
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))
# 3. Making predictions on the testing data
predictions = model.predict(x_test)
predictions = scaler.inverse_transform(predictions)
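What the scaler does, and what inverse_transform undoes, can be reproduced by hand. The sketch below applies the min-max formula for a (0, 1) feature range to a made-up price column and then reverses it; the prices are illustrative only.

```python
import numpy as np

prices = np.array([[10.0], [12.0], [15.0], [20.0]])  # made-up closing prices

# Forward transform for feature_range=(0, 1): (x - min) / (max - min)
lo, hi = prices.min(), prices.max()
scaled = (prices - lo) / (hi - lo)

# Inverse transform: x_scaled * (max - min) + min
restored = scaled * (hi - lo) + lo

print(scaled.ravel())    # values 0, 0.2, 0.5 and 1
print(restored.ravel())  # the original prices again
```

Because the model was trained on scaled values, its raw outputs are also on the 0 to 1 scale, which is why the inverse step is needed before comparing predictions with y_test.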
8. Error Calculation
RMSE is the root mean squared error, which helps to measure the accuracy of the model.
rmse = np.sqrt(np.mean((predictions - y_test) ** 2))
print(rmse)
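The lower the value, the better the model performs. A value of 0 indicates that the model’s predicted values match the actual values from the test data set perfectly.
The RMSE value we obtained was 0.6505512245089267, which is decent enough. Since the scaling was undone before the calculation, this error is in the same units as the closing price.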
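A small worked example shows how the RMSE is computed, using made-up predictions and actual values. It also illustrates a shape pitfall worth checking for: if one array is 2-dimensional and the other is flat, NumPy broadcasting silently produces a matrix of differences instead of element-wise errors.

```python
import numpy as np

predictions = np.array([[2.0], [4.0]])  # made-up predicted prices, shape (2, 1)
y_test = np.array([[1.0], [6.0]])       # made-up actual prices, shape (2, 1)

# Errors are 1 and -2, squares are 1 and 4, mean is 2.5, root is about 1.58
rmse = np.sqrt(np.mean((predictions - y_test) ** 2))
print(rmse)

# Pitfall: a flat array of actual values broadcasts against the column vector.
y_flat = y_test.ravel()                 # shape (2,)
wrong_diff = predictions - y_flat       # shape (2, 2), not (2, 1)
print(wrong_diff.shape)
```

In the tutorial both arrays are 2-dimensional columns, so the subtraction is element-wise as intended.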
9. Make Predictions
The final step is to plot and visualize the data. We use basic matplotlib functions such as title, xlabel, ylabel, and plot to control how the graph looks.
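# Split the closing prices of the selected stock the same way as the training data
train = close_data[:training_data_len]
# Copy the slice so that adding a column does not modify a view of close_data
valid = close_data[training_data_len:].copy()
valid['Predictions'] = predictions
plt.title('Model')
plt.xlabel('Date')
plt.ylabel('Close')
plt.plot(train['close'])
plt.plot(valid[['close', 'Predictions']])
plt.legend(['Train', 'Val', 'Predictions'], loc='lower right')
plt.show()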
10. The Actual vs Predicted Values
Congratulations! Today we learned how to predict stock prices using an LSTM model! The actual (close) and predicted (Predictions) values are quite close to each other.
Thank you for reading!