Fetch Data From a Webpage Using Selenium [Complete Guide]

Fetch Webpage Data Using Python Selenium

In this tutorial, we will build a web scraper with Selenium to fetch data from a website. Selenium is an open-source project for automating browsers. It provides a wide range of tools and libraries for automation, and scripts can be written in several languages, e.g., Java, Python, C#, and Kotlin.

Implementing a Web Scraper to Fetch Data

In our example, we will demonstrate Python web scraping by getting the list of the most popular movies from IMDB.

Step 1. Import Modules

To begin our web scraper, we import Selenium's webdriver module. The Keys import from the original listing is dropped because nothing in this tutorial uses it.

from selenium import webdriver

Step 2. Initializing WebDriver

To automate the browser, we need the WebDriver for the browser we intend to use. In this tutorial we use Google Chrome, so we download ChromeDriver, the WebDriver for Chrome.

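Make sure the ChromeDriver version matches the installed Chrome version. In Selenium 4, the path to the driver executable is passed through a Service object rather than directly to the Chrome constructor (the older form, webdriver.Chrome('path'), worked in Selenium 3 and was later removed). Recent Selenium releases (4.6 and later) can also locate a matching driver on their own when no path is given.

from selenium.webdriver.chrome.service import Service
driver = webdriver.Chrome(service=Service('C://software/chromedriver.exe'))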
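The version requirement can also be checked in code rather than by eye. The following is a minimal sketch, not part of the original tutorial: the helper names major_version and same_major_version are hypothetical, the version strings are illustrative, and comparing only the leading (major) number is an assumption. A stricter check could compare more components.

```python
def major_version(version):
    """Return the leading numeric component of a dotted version string."""
    return int(version.strip().split(".")[0])


def same_major_version(browser_version, driver_version):
    """Report whether the browser and the driver share a major version."""
    return major_version(browser_version) == major_version(driver_version)


# Illustrative version strings, not taken from a real installation.
print(same_major_version("114.0.5735.199", "114.0.5735.90"))  # True
print(same_major_version("115.0.5790.102", "114.0.5735.90"))  # False
```

With a running Chrome session, these strings can usually be read from driver.capabilities (the "browserVersion" entry and the "chromedriverVersion" entry under "chrome").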

Step 3. Access Website Via Python

In order to access website data, we need to open the website URL which we are going to scrape.

To do that, we use the get method and pass the website URL as the method’s parameter. In our case, it is IMDB’s webpage for the most popular movies.

driver.get("https://www.imdb.com/chart/moviemeter/?ref_=nv_mv_mpm")

Running this code opens a Chrome window and loads the given URL.

Step 4. Find the Specific Information You’re Scraping

In our case, we’re looking for the names of the most popular movies on IMDB, so we need the XPath of the HTML element that holds them.

XPath is a path expression that identifies the location of one or more elements in an HTML document. We use it to find elements on a webpage.

To get the XPath of an element, open the browser’s Inspect tool, pick the element whose path you need with the selector tool, right-click its HTML code, and choose Copy XPath.

Inspect Element In Webpage

In our example, inspecting the movie names shows that each one sits in a td cell with the class titleColumn, so we can use that class in an XPath expression to reach the names.

<td class="titleColumn">
      <a href="" title="Chloé Zhao (dir.), Gemma Chan, Richard Madden">Eternals</a>        
</td>

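We’ll use find_elements() with By.XPATH to find every td element with the titleColumn class. (The older find_elements_by_xpath() method was removed in Selenium 4.)

from selenium.webdriver.common.by import By
movies = driver.find_elements(By.XPATH, '//td[@class="titleColumn"]')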

Note: The leading double slash (//) tells XPath to match td elements anywhere in the document, not only directly under the root: '//td[@class="titleColumn"]'
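To see what this expression selects without opening a browser, the same pattern can be tried on a small fragment with Python's standard library. This is only a sketch: ElementTree supports a limited XPath subset and expects a relative path (.//td rather than //td), the table and tr wrapper is added so the fragment parses as one document, and the second td cell is a made-up placeholder that shows what the predicate filters out.

```python
import xml.etree.ElementTree as ET

# Fragment modeled on the titleColumn markup shown above.
# The "otherColumn" cell is hypothetical and exists only to be skipped.
SAMPLE = """
<table>
  <tr>
    <td class="titleColumn">
      <a href="" title="Chloé Zhao (dir.), Gemma Chan, Richard Madden">Eternals</a>
    </td>
    <td class="otherColumn">not a title</td>
  </tr>
</table>
"""

root = ET.fromstring(SAMPLE.strip())

# ".//td" searches every depth below the root; the predicate keeps only
# cells whose class attribute is exactly "titleColumn".
cells = root.findall(".//td[@class='titleColumn']")

# The movie name is the text of the <a> element inside each matched cell.
titles = [cell.find("a").text for cell in cells]
print(titles)  # ['Eternals']
```

Selenium evaluates the full XPath language inside the browser, so the //td form used in this tutorial works there as written.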

Step 5. Storing the Data in a Python List 

Now that we can fetch the desired information, we need to store it in a variable or data structure so that later parts of the code can retrieve and process it. The scraped data can go into various data structures, such as an array, list, tuple, or dictionary.

Here, we store the scraped data (the most popular movie names) in a list. To do that, we loop over every element found and append its text to the list.

movies_list starts as an empty list and ends up holding all the text fetched from the website.

movies_list = []
for movie in movies:
    movies_list.append(movie.text)
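Once the names are in a list, they can be handed to other code or saved for later. As one possible follow-up, the sketch below writes a list of titles to CSV with the standard library. The helper name write_titles_csv and the column names are hypothetical, and sample_titles is a stand-in for movies_list: "Eternals" comes from the markup shown earlier, while the second entry is a made-up placeholder.

```python
import csv
import io


def write_titles_csv(titles, out_file):
    """Write one row per title, numbered from 1, to an open text file."""
    writer = csv.writer(out_file)
    writer.writerow(["position", "title"])
    for position, title in enumerate(titles, start=1):
        # Strip stray whitespace that element text can carry.
        writer.writerow([position, title.strip()])


# Stand-in for movies_list; the second entry is a placeholder.
sample_titles = ["Eternals", "  Placeholder Title  "]

# An in-memory buffer keeps the example free of file-system side effects.
buffer = io.StringIO()
write_titles_csv(sample_titles, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with open(path, "w", newline="", encoding="utf-8"), as the csv module documentation recommends.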

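The final Python code for scraping the website data is:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Start Chrome through the downloaded ChromeDriver executable.
driver = webdriver.Chrome(service=Service('C://software/chromedriver.exe'))
driver.get('https://www.imdb.com/chart/moviemeter/?ref_=nv_mv_mpm')

# Each movie name sits in a <td class="titleColumn"> cell.
movies = driver.find_elements(By.XPATH, '//td[@class="titleColumn"]')

# Collect the visible text of every matched cell.
movies_list = []
for movie in movies:
    movies_list.append(movie.text)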
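The script above leaves the browser open when it finishes. As a hedged variation, not the tutorial's own code, the steps can be wrapped in a function that always closes the browser. The name scrape_texts is hypothetical, and the function accepts any object that offers get, find_elements, and quit, which a Selenium driver does.

```python
def scrape_texts(driver, url, xpath):
    """Open url, return the text of every element matching xpath, then quit."""
    try:
        driver.get(url)
        # "xpath" is the string value behind Selenium's By.XPATH constant.
        elements = driver.find_elements("xpath", xpath)
        return [element.text for element in elements]
    finally:
        # Runs even if loading or locating fails, so no browser is left open.
        driver.quit()


# Example call (needs Selenium, Chrome, and a matching driver installed):
# from selenium import webdriver
# movies_list = scrape_texts(
#     webdriver.Chrome(),
#     "https://www.imdb.com/chart/moviemeter/?ref_=nv_mv_mpm",
#     '//td[@class="titleColumn"]',
# )
```

The text is copied into plain strings before quit() runs, so the returned list stays usable after the browser has closed.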

To print the contents of movies_list with one entry per line:

print(*movies_list, sep="\n")

We get output such as:

Scraped Web Data Output

Conclusion

This is how you can scrape data from a website using Selenium and Python. Once you find the right XPath and identify the pattern the website uses for the data you want, collecting that data becomes straightforward.

Go ahead and experiment with the same and let us know! I hope you enjoyed this tutorial. Follow AskPython.com for many more interesting tutorials.