How many times has it happened that we invest in a stock, hear some speculative news about it one fine day, and then scramble through our mobile phones, PCs and brokers to get a heads-up? We keep checking prices by typing scrip names over and over or by tapping through long stock lists on our phones. In the end, we get very little for the time we spend.
But there are easy ways to scrape stock prices from your favorite stock screening websites with just a few lines of Python code. In this article, we will cover how to scrape data from a page’s HTML using the Python library Beautifulsoup.
What is Beautifulsoup and why are we using it?
Beautifulsoup is a Python screen-scraping library, first released in 2004, that extracts data from websites by parsing their HTML or XML source code.
Though there are more powerful web scraping tools, such as Scrapy and Selenium, we are using Beautifulsoup in this article because it is user-friendly and easy to learn. Beautifulsoup only parses the HTML it is given and cannot run JavaScript, so it struggles with data that a page loads through JavaScript. For small-scale extraction of data that is present in the page’s HTML, however, it works well.
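To see what extracting data from HTML means in practice, and why JavaScript-rendered content is out of reach, here is a minimal sketch that parses a hard-coded HTML string with Python’s built-in "html.parser" backend. The HTML and its prices are made-up samples, not taken from any real site; the script tag stands in for page code that would fill in a price inside a browser. Beautifulsoup only sees the raw markup, so that element stays empty.

```python
from bs4 import BeautifulSoup

# A made-up page: one price is written directly in the HTML,
# the other would only be filled in by JavaScript in a browser.
SAMPLE_HTML = """
<html><body>
  <span class="price">101.25</span>
  <div id="live-price"></div>
  <script>document.getElementById("live-price").textContent = "102.00";</script>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Static text is available straight from the parsed tree.
static_price = soup.find("span", class_="price").text

# The JavaScript never runs, so this element is still empty.
live_price = soup.find("div", id="live-price").text

print("static:", repr(static_price))  # static: '101.25'
print("live:", repr(live_price))      # live: ''
```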
Working With BeautifulSoup in Python
There are many ways to install it, depending on your machine and operating system. To reach a wide audience, we will cover installation on Windows from the command prompt and in the PyCharm IDE, where installing packages and creating environments is simple.
Python and pip must be installed on your machine before we go further.
Open cmd and enter:
pip install beautifulsoup4
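The library will be installed automatically. Next, install a parser and an HTTP library. Beautifulsoup hands the actual parsing of HTML and XML to a parser; it can use Python’s built-in html.parser, but the examples below use lxml. The requests library is not a parser: we use it to download the web pages.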
pip install lxml
pip install requests
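Because the parser is chosen by the second argument to BeautifulSoup(), you can check which backends work on your machine. This small sketch parses a made-up HTML snippet with the built-in "html.parser", which needs no extra install, and with "lxml" only if that package is present. The helper name parse_with is hypothetical.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Made-up sample markup; the number is a placeholder.
SNIPPET = '<div class="quote"><span class="last">2,450.10</span></div>'

def parse_with(parser_name):
    """Return the <span class="last"> text using the given parser, or None if unavailable."""
    try:
        soup = BeautifulSoup(SNIPPET, parser_name)
    except FeatureNotFound:
        # Raised when the requested parser (e.g. lxml) is not installed.
        return None
    return soup.find("span", class_="last").text

results = {name: parse_with(name) for name in ("html.parser", "lxml")}
print(results)
```

If lxml is installed, both entries should hold the same text; a None for "lxml" means the pip install above did not reach this interpreter.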
Installing Beautifulsoup in PyCharm
Installing Python packages in PyCharm is easier and more hassle-free than in many other IDEs, so we will go ahead with it.
- Create a new project and add a Python file (with the .py extension) to it.
- Then head to File > Settings and, in the left pane, click the name of the project you just created.
- Select ‘Python Interpreter’. This page lists all of the packages installed in the project’s interpreter.
- Find the plus sign directly above the ‘Package’ column and click it.
- A new window pops up with a long list of available Python packages.
- Search for ‘beautifulsoup4’ and click Install Package at the bottom of the window. Repeat this for ‘lxml’ and ‘requests’.
Beautifulsoup4 will now be installed in your PyCharm project’s interpreter.
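The interpreter that pip installs into from cmd and the one PyCharm selects for a project can differ, so it is worth confirming from inside PyCharm that the packages are visible. This sketch uses only the standard library; the helper name installed_versions is hypothetical.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages=("beautifulsoup4", "lxml", "requests")):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

# Which Python is running this script, and is it a virtual environment?
print("Interpreter:", sys.executable)
print("Virtual environment:", sys.prefix != sys.base_prefix)
for pkg, ver in installed_versions().items():
    print(f"{pkg}: {ver or 'NOT INSTALLED'}")
```

If a package shows as NOT INSTALLED here but pip reported success in cmd, the project is using a different interpreter than the one cmd’s pip installed into.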
To extract stock prices from a page’s HTML, we first need two things:
- The website’s URL
- The HTML tag and attributes of the element that displays the stock price, found with the browser’s Inspect tool
In this article, we will take examples from two different websites to understand how to identify the right attributes to inspect.
Extracting Yahoo Finance Data
In the first example, we will fetch the latest quoted value of the NASDAQ Composite index from Yahoo Finance’s website. To do so, search Google for ‘Nasdaq yahoo finance’; the search result takes you directly to NASDAQ’s quote page. Copy that page’s URL.
Next, we need the attributes of the quoted price. To find them, select the quoted price on the page, right-click it and choose Inspect.
When the Inspect panel pops up, the HTML of the selected element is already highlighted. In the example below, the required HTML snippet is highlighted, and we only need to copy the essential part of it: the value inside the double quotes of its class attribute.
Note: When you move the cursor over a piece of HTML in the Inspect panel, the browser highlights the element it belongs to. In the image below, the quoted price is bordered by dotted lines because the cursor is over the HTML code linked to it.
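The %5E in the copied URL is simply the percent-encoded form of the ^ character in Yahoo’s index symbol ^IXIC. If you would rather build quote URLs for other symbols than copy them by hand, a minimal sketch with the standard library could look like this. It assumes Yahoo keeps the https://finance.yahoo.com/quote/<symbol>/ pattern seen in this article’s URL, and yahoo_quote_url is a hypothetical helper.

```python
from urllib.parse import quote

def yahoo_quote_url(symbol):
    """Build a Yahoo Finance quote-page URL, percent-encoding characters such as '^'."""
    # quote() turns '^' into '%5E'; letters, digits, '.', '-', '_' and '~' are left alone.
    return "https://finance.yahoo.com/quote/" + quote(symbol, safe="") + "/"

print(yahoo_quote_url("^IXIC"))  # https://finance.yahoo.com/quote/%5EIXIC/
```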
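The value copied from between the double quotes is the element’s class attribute, which here holds several space-separated class names. The sketch below runs against a made-up copy of that markup (the price is a placeholder) and shows two ways the class_ argument can match it: the whole copied string matches only when it is identical to the attribute value, while a single class name matches any element that carries it, so on a real page it may also pick up unrelated elements.

```python
from bs4 import BeautifulSoup

# Made-up markup modelled on the highlighted element; the number is a placeholder.
html = '<div><span class="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)">12,345.67</span></div>'
soup = BeautifulSoup(html, "html.parser")

# 1. Exact match on the full attribute value copied from the Inspect panel.
full = soup.find("span", class_="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)")

# 2. Match on one class name; any <span> carrying "Fw(b)" would qualify.
partial = soup.find("span", class_="Fw(b)")

print(full.text, partial.text)  # 12,345.67 12,345.67
```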
Extract Yahoo Finance Data Using Python BeautifulSoup
Let’s get into the code for extracting the stock data.
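from bs4 import BeautifulSoup
import requests

url_of_page = 'https://finance.yahoo.com/quote/%5EIXIC/'
# Some sites reject requests that lack a browser-like User-Agent header
headers = {'User-Agent': 'Mozilla/5.0'}

def computequoteprice():
    # Download the page's HTML
    url_requests = requests.get(url_of_page, headers=headers, timeout=10)
    url_requests.raise_for_status()
    # Parse the HTML with the lxml parser
    soup_ocreate = BeautifulSoup(url_requests.text, 'lxml')
    # Find the <span> whose class attribute matches the value copied from Inspect
    # (these class names change whenever the site's layout changes)
    quote_price = soup_ocreate.find('span', class_='Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)').text
    return quote_price

print("Quote price = " + computequoteprice())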
As you can see in the code above, the website’s URL is stored in the variable url_of_page, and the price is stored in quote_price. requests.get() downloads all the HTML from that page, and BeautifulSoup parses it. Then soup_ocreate.find() searches the parsed HTML for the first span tag whose class attribute matches the value we copied from the Inspect panel, and .text returns the text inside that tag, which is the quoted price.
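One practical gap in this approach: if the site changes its markup, find() returns None and the .text lookup fails with an AttributeError, and even on success the price arrives as a string such as "12,345.67". A defensive sketch could handle both. The helper names extract_price and parse_price are hypothetical, the HTML is a made-up sample, and parse_price assumes the comma is only a thousands separator.

```python
from bs4 import BeautifulSoup

def parse_price(text):
    """Convert a displayed price like '12,345.67' into a float (assumes ',' is a thousands separator)."""
    return float(text.strip().replace(",", ""))

def extract_price(html, tag, css_class):
    """Return the price in the first <tag class=css_class> as a float, or None if no such element exists."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(tag, class_=css_class)
    if element is None:
        # The selector no longer matches, e.g. after a site redesign.
        return None
    return parse_price(element.get_text())

sample = '<span class="price">12,345.67</span>'
print(extract_price(sample, "span", "price"))    # 12345.67
print(extract_price(sample, "span", "missing"))  # None
```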
Let’s take another example from a different website. Here, Reliance Industries’ stock price will be fetched from moneycontrol.com. The steps are the same; only the HTML tag and attributes differ. Yahoo Finance wraps the price in a ‘span’ tag, whereas Moneycontrol uses a ‘div’ tag.
Note: Identifying the right tag and class for the price element is important. Different websites use different tags and classes, but the overall process is the same.
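Because only the tag name and class change from site to site, one way to reuse a single function for both examples is to keep each site’s details in a small table. This is a hypothetical sketch: the SITES dictionary and quote_from_html are illustrative names, the entries simply collect the URLs, tags and class values used in this article (they stop working whenever those sites change their HTML), and the function works on HTML you have already downloaded.

```python
from bs4 import BeautifulSoup

# Per-site details taken from this article's two examples (they may be outdated).
SITES = {
    "nasdaq_yahoo": {
        "url": "https://finance.yahoo.com/quote/%5EIXIC/",
        "tag": "span",
        "class": "Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)",
    },
    "reliance_moneycontrol": {
        "url": "https://www.moneycontrol.com/india/stockpricequote/refineries/relianceindustries/RI",
        "tag": "div",
        "class": "inprice1 nsecp",
    },
}

def quote_from_html(site, html):
    """Look up the site's tag and class, then return the matching element's text, or None."""
    cfg = SITES[site]
    element = BeautifulSoup(html, "html.parser").find(cfg["tag"], class_=cfg["class"])
    return element.get_text(strip=True) if element is not None else None
```

With the live pages, you would download SITES[site]["url"] with requests.get(), as in the examples in this article, and pass the response’s .text as html. Adding a third site then only means adding one dictionary entry.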
Code To Extract Stock Prices From Moneycontrol Using Python BeautifulSoup
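from bs4 import BeautifulSoup
import requests

url_of_page = 'https://www.moneycontrol.com/india/stockpricequote/refineries/relianceindustries/RI'
# Some sites reject requests that lack a browser-like User-Agent header
headers = {'User-Agent': 'Mozilla/5.0'}

def computequoteprice():
    # Download the page's HTML
    url_requests = requests.get(url_of_page, headers=headers, timeout=10)
    url_requests.raise_for_status()
    # Parse the HTML with the lxml parser
    soup_ocreate = BeautifulSoup(url_requests.text, 'lxml')
    # Moneycontrol puts the price in a <div> rather than a <span>
    quote_price = soup_ocreate.find('div', class_='inprice1 nsecp').text
    return quote_price

print("Quote price = " + computequoteprice())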
In this article, we have learned how stock prices can be fetched easily from stock screening websites. We also learned what the Beautifulsoup library is, how to install it and how it works. To learn more about scraping stock prices, you can search Google for ‘AskPython stocks scrappy’.