Stock Price Prediction using Python

Hello there! Today we are going to learn how to predict the stock prices of various companies using the Python programming language.

Stock market prediction is the act of trying to determine the future value of company stock or other financial instruments traded on an exchange.

The successful prediction of a stock’s future price could yield a significant profit. In this application, we use an LSTM network to predict the closing stock price from the closing prices of the previous 60 days.

For the application, we use a machine learning technique called Long Short-Term Memory (LSTM). LSTM is an artificial recurrent neural network (RNN) architecture used in the field of deep learning.

Unlike standard feed-forward neural networks, LSTM has feedback connections. It can not only process single data points (such as images), but also entire sequences of data (such as speech or video).

LSTM is widely used for sequence prediction problems and has been very effective.
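To make the idea of feedback connections more concrete, here is a minimal NumPy sketch of a single LSTM time step applied to a short sequence. The function name `lstm_step`, the weight names `W`, `U`, and `b`, the gate ordering, and the random toy weights are all assumptions made for illustration; this is not how Keras lays out its weights internally.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step (hypothetical helper).

    W: (4*units, input_dim), U: (4*units, units), b: (4*units,)
    """
    units = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b             # h_prev is the feedback connection
    i = sigmoid(z[0:units])                  # input gate
    f = sigmoid(z[units:2 * units])          # forget gate
    g = np.tanh(z[2 * units:3 * units])      # candidate cell state
    o = sigmoid(z[3 * units:4 * units])      # output gate
    c_t = f * c_prev + i * g                 # new cell state (long-term memory)
    h_t = o * np.tanh(c_t)                   # new hidden state (output)
    return h_t, c_t

rng = np.random.default_rng(0)
input_dim, units = 1, 4                      # toy sizes, not the ones used later
W = rng.normal(scale=0.1, size=(4 * units, input_dim))
U = rng.normal(scale=0.1, size=(4 * units, units))
b = np.zeros(4 * units)

h = np.zeros(units)
c = np.zeros(units)
sequence = np.array([[0.1], [0.2], [0.3]])   # three time steps, one feature each
for x_t in sequence:
    h, c = lstm_step(x_t, h, c, W, U, b)     # state is carried from step to step

print(h.shape, c.shape)
```

The hidden state `h` and the cell state `c` are passed from one time step to the next, which is what lets the network take earlier values of the sequence into account.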

Implementation of Stock Price Prediction in Python

1. Importing Modules

The first step is to import all the necessary modules into the project.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense, LSTM
import math
from sklearn.preprocessing import MinMaxScaler
```

For the project, we will be using basic modules like numpy, pandas, and matplotlib. In addition to this, we will be using some submodules of `keras` to create and build our model properly.

We also need the `math` module for basic calculations and the `preprocessing` module of sklearn to scale the data.

2. Loading the Data

For the project, we will be using the `all_stocks_5yr.csv` file, which includes stock data for 5 years and has the seven columns listed below.

1. Date – Format of date is: “yy-mm-dd”
2. Open – Price of the stock at open market
3. High – Highest price reached in the day
4. Low – Lowest price reached in the day
5. Close – Price of the stock at the close market
6. Volume – Number of shares traded
7. Name – The name of the stock ticker

```python
data = pd.read_csv("all_stocks_5yr.csv")
data.head()
```

The `head` function displays the first five rows of the dataset.

3. Understanding the Data

3.1 Getting Unique Stock Names

From the whole dataset, we will first extract all the unique stock ticker names with the help of the `unique` function. In the dataset, we have 444 different stock names.

```python
all_stock_tick_names = data['Name'].unique()
print(all_stock_tick_names)
```

3.2 Extracting Data for a specific stock name

We will try to understand how the stock data works by taking an input of a stock name from the user and collecting all data of that particular stock name.

```python
# 1. Getting a stock name
stock_name = input("Enter a Stock Price Name: ")

# 2. Building a boolean mask of the rows whose name matches the stock name entered
all_data = data['Name'] == stock_name

# 3. Putting all the rows of the specific stock in a variable
final_data = data[all_data]

# 4. Printing the first 5 rows of the stock data of the specific stock name
print(final_data.head())
```
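The filtering step relies on boolean masking in pandas. The small sketch below shows the idea on a hypothetical four-row table; the tickers and prices are made up for illustration and are not taken from the real file.

```python
import pandas as pd

# Hypothetical miniature version of the dataset (values are made up)
toy = pd.DataFrame({
    "date": ["2013-02-08", "2013-02-08", "2013-02-11", "2013-02-11"],
    "close": [14.75, 67.85, 14.46, 68.56],
    "Name": ["AAL", "AAPL", "AAL", "AAPL"],
})

mask = toy["Name"] == "AAL"   # boolean Series: True where the ticker matches
aal_rows = toy[mask]          # keeps only the rows where the mask is True

print(mask.tolist())          # [True, False, True, False]
print(aal_rows)
```

Note that the filtered rows keep their original index labels (0 and 2 here), so the index of the result is not contiguous.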

3.3 Visualizing the stock data

To visualize the data, we will first plot the date vs. the close market price for the FITB stock for all the data points.

To make the visualization clearer, we will then draw the same plot for only the first 60 data points.

```python
# Plotting date vs the close market stock price for all data points
final_data.plot('date', 'close', color="red")

# Extract only the top 60 rows to make the plot a little clearer
new_data = final_data.head(60)

# Plotting date vs the close market stock price for the first 60 rows
new_data.plot('date', 'close', color="green")

plt.show()
```

4. Creating a new Dataframe and Training data

To make our study easier we will only consider the `closing market price` and predict the closing market price using Python. The whole train data preparation is shown in the steps below. Comments are added for your reference.

```python
# 1. Filter out the closing market price data
close_data = final_data.filter(['close'])

# 2. Convert the data into an array for easy evaluation
dataset = close_data.values

# 3. Scale/Normalize the data to make all values between 0 and 1
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(dataset)

# 4. Creating training data size : 70% of the data
training_data_len = math.ceil(len(dataset) * .7)
train_data = scaled_data[0:training_data_len, :]

# 5. Separating the data into x and y data:
#    each x sample is a window of 60 values, each y value is the one that follows it
x_train_data = []
y_train_data = []
for i in range(60, len(train_data)):
    x_train_data.append(train_data[i-60:i, 0])
    y_train_data.append(train_data[i, 0])

# 6. Converting the training x and y values to numpy arrays
x_train_data1, y_train_data1 = np.array(x_train_data), np.array(y_train_data)

# 7. Reshaping the training x data to (samples, time steps, features) for the LSTM
x_train_data2 = np.reshape(x_train_data1, (x_train_data1.shape[0], x_train_data1.shape[1], 1))
```

Here we build the training set from windows of 60 consecutive closing prices (60 data points), so that each window is used to predict the 61st closing price.

Each sample in the x_train data set contains 60 values: the first sample holds the values at indices 0 to 59, the second sample holds the values at indices 1 to 60, and so on.

The y_train data set holds the matching targets: its first entry is the 61st value (located at index 60), its second entry is the 62nd value (located at index 61), and so on.
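The windowing logic is easier to see on a tiny series. The sketch below uses a window of 3 instead of 60 and the numbers 0 to 9 in place of prices; the helper name `make_windows` is hypothetical and only used for illustration.

```python
import numpy as np

def make_windows(series, window):
    """Return (X, y) where X[k] = series[k:k+window] and y[k] = series[k+window]."""
    X, y = [], []
    for i in range(window, len(series)):
        X.append(series[i - window:i])   # the `window` values before position i
        y.append(series[i])              # the value to predict
    return np.array(X), np.array(y)

series = np.arange(10)                   # toy "prices": 0, 1, ..., 9
X, y = make_windows(series, window=3)

print(X.shape, y.shape)   # (7, 3) (7,)
print(X[0], y[0])         # [0 1 2] 3
print(X[1], y[1])         # [1 2 3] 4
```

With 10 values and a window of 3 we get 7 samples, in the same way that `len(train_data) - 60` samples are produced in the training code.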

We then convert both the independent and dependent training sets, x_train_data and y_train_data, into NumPy arrays so that they can be used to train the LSTM model.

Also, as the LSTM model expects 3-dimensional input of the form (samples, time steps, features), we use the reshape() function to reshape the data into three dimensions.
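As a small illustration of the reshape, the sketch below turns a toy 2-dimensional array of 4 samples with 3 time steps each into a 3-dimensional one. The sizes are made up; the trailing 1 stands for the single feature per time step (here, the closing price).

```python
import numpy as np

x = np.arange(12).reshape(4, 3)                      # 4 samples, 3 time steps each
x3d = np.reshape(x, (x.shape[0], x.shape[1], 1))     # add a feature axis of size 1

print(x.shape)         # (4, 3)
print(x3d.shape)       # (4, 3, 1)
print(x3d[0, :, 0])    # [0 1 2]
```

The values themselves are unchanged; only the shape gains a third axis.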

5. Building LSTM Model

The LSTM model will have two LSTM layers with 50 neurons each and two Dense layers, one with 25 neurons and the other with one neuron.

```python
model = Sequential([LSTM(50, return_sequences=True, input_shape=(60, 1)),
                    LSTM(50), Dense(25), Dense(1)])
```
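To get a feel for the size of this architecture, the trainable parameters can be counted by hand. The sketch below assumes standard LSTM and Dense layers with bias terms and a single input feature per time step; the helper names `lstm_params` and `dense_params` are hypothetical.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # one weight per input-output pair plus one bias per output
    return input_dim * units + units

layers = [
    ("LSTM(50), 1 input feature", lstm_params(1, 50)),
    ("LSTM(50), 50 inputs", lstm_params(50, 50)),
    ("Dense(25)", dense_params(50, 25)),
    ("Dense(1)", dense_params(25, 1)),
]

total = sum(count for _, count in layers)
for name, count in layers:
    print(name, count)
print("total", total)   # total 31901
```

Most of the parameters sit in the second LSTM layer, because its input is the 50-dimensional output of the first one.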

6. Compiling the Model

The LSTM model is compiled using the mean squared error (MSE) loss function and the adam optimizer.

```python
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x_train_data2, y_train_data1, batch_size=1, epochs=1)
```

The fit() function trains the model on the training data. Here, batch_size is the number of training examples in a single batch (the weights are updated once per batch), and epochs is the number of times the entire training set is passed forward and backward through the neural network.
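The relationship between the number of samples, batch_size, and the number of weight updates per epoch can be worked out with simple arithmetic. In the sketch below, the row count of 1259 is an assumed figure for a single stock, and `steps_per_epoch` is a hypothetical helper; the real numbers depend on the stock you select.

```python
import math

n_rows = 1259                                # assumed number of rows for one stock
window = 60                                  # days of history per sample
train_len = math.ceil(n_rows * 0.7)          # same 70% split as in the tutorial
n_samples = train_len - window               # one sample per position after the first window

def steps_per_epoch(n_samples, batch_size):
    # number of batches (and weight updates) needed to see every sample once
    return math.ceil(n_samples / batch_size)

print(train_len)                          # 882
print(n_samples)                          # 822
print(steps_per_epoch(n_samples, 1))      # 822
print(steps_per_epoch(n_samples, 32))     # 26
```

With batch_size=1 the model makes one update per training sample, which is slow but keeps the example simple; a larger batch size trades fewer updates per epoch for faster epochs.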

7. Testing the model on testing data

The code below takes all the rows from index training_data_len - 60 onward from the scaled closing prices, so that the first test window has a full 60 days of history. The x_test data set is then converted into a NumPy array so that it can be fed to the LSTM model.

As the LSTM model expects 3-dimensional input, we use the reshape() function to reshape the data set into three dimensions.

Using the predict() function, we get the predicted values from the model for the test data. The scaler.inverse_transform() function then undoes the scaling, so that the predictions are back in the original price units.

```python
# 1. Creating a dataset for testing
#    (start 60 rows before the split so the first test window is complete)
test_data = scaled_data[training_data_len - 60:, :]
x_test = []
y_test = dataset[training_data_len:, :]
for i in range(60, len(test_data)):
    x_test.append(test_data[i-60:i, 0])

# 2. Convert the values into arrays for easier computation
x_test = np.array(x_test)
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))

# 3. Making predictions on the testing data
predictions = model.predict(x_test)
predictions = scaler.inverse_transform(predictions)
```
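To see what scaling and its inverse do to the values, here is a minimal NumPy sketch of the min-max formula for the range 0 to 1, written by hand rather than with `MinMaxScaler`. The toy prices are made up for illustration.

```python
import numpy as np

prices = np.array([[10.0], [12.0], [15.0], [20.0]])   # toy closing prices

lo = prices.min(axis=0)                     # column minimum
hi = prices.max(axis=0)                     # column maximum

scaled = (prices - lo) / (hi - lo)          # forward transform into [0, 1]
restored = scaled * (hi - lo) + lo          # inverse transform back to prices

print(scaled.ravel().tolist())      # [0.0, 0.2, 0.5, 1.0]
print(restored.ravel().tolist())
```

The inverse step is why the predictions can be compared directly with the unscaled y_test values.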

8. Error Calculation

RMSE is the root mean squared error, which helps to measure the accuracy of the model.

```python
rmse = np.sqrt(np.mean((predictions - y_test) ** 2))
print(rmse)
```

The lower the value, the better the model performs. A value of 0 indicates that the model’s predicted values match the actual values from the test data set perfectly.

The RMSE value we received was 0.6505512245089267, which is in the same units as the closing price and is decent enough.
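As a quick check of the formula, here is a small worked example with made-up numbers. It also shows a shape pitfall worth keeping in mind: both arrays should have the same shape, otherwise NumPy broadcasting silently produces a matrix of differences.

```python
import numpy as np

predicted = np.array([[10.0], [12.0], [14.0]])   # shape (3, 1), like model output
actual = np.array([[11.0], [12.0], [16.0]])      # shape (3, 1), like y_test

errors = predicted - actual                      # -1, 0, -2
rmse = np.sqrt(np.mean(errors ** 2))             # sqrt((1 + 0 + 4) / 3)
print(round(float(rmse), 4))                     # 1.291

# Pitfall: subtracting a flat (3,) array from a (3, 1) array broadcasts to (3, 3)
wrong = predicted - actual.ravel()
print(wrong.shape)                               # (3, 3)
```

In the tutorial code both `predictions` and `y_test` are 2-dimensional with one column, so the subtraction is element by element as intended.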

9. Make Predictions

The final step is to plot and visualize the data. To do so, we use basic functions such as title, xlabel, ylabel, and plot, depending on how we want our graph to look.

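```python
# Split the closing prices of the selected stock at the same index used for training
train = close_data[:training_data_len]
valid = close_data[training_data_len:].copy()  # copy so a new column can be added safely

valid['Predictions'] = predictions

plt.title('Model')
plt.xlabel('Date')
plt.ylabel('Close')

plt.plot(train['close'])
plt.plot(valid[['close', 'Predictions']])

plt.legend(['Train', 'Val', 'Predictions'], loc='lower right')

plt.show()
```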

Conclusion

Congratulations! Today we learned how to predict stock prices using an LSTM model! The actual (close) and predicted (predictions) prices match quite closely.