Python urllib: A Complete Reference

Hello everybody and welcome to another Python 3 tutorial article. In this write-up, we’re discussing the Python urllib library, which is part of the Python standard library.

The idea of Python urllib is that it lets you do from code much of what you would otherwise do in a browser: open URLs, read the pages they return, and send data to a website.

So with that let’s go ahead and get started.

Importing Python urllib

The first thing you need to do is import urllib.

Now if you’re coming from Python 2.7, you’re used to a plain import urllib and that’s it.

import urllib

Whereas with Python 3 and onward, urllib is a package split into several modules, and you will have to import the request module from it.

import urllib.request

Access a website using Python urllib Module

So an example of visiting a website will be as follows.

1. GET Request to access a website

x = urllib.request.urlopen('https://www.google.com')

We define a variable x, call the urlopen function, and pass it the URL of the website.

Now, this sends an HTTP GET request to fetch data from the URL. We will use the read() method to get the data.

print(x.read())

The above code snippet returns the source code of the page google.com. It prints all the contents of the page, such as HTML tags and styling attributes, on the Python console. Note that read() returns a bytes object, so the output appears with a b prefix.
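Since read() hands back raw bytes, a small helper can turn them into ordinary text. The following is a minimal sketch and not the only way to do it: the function name fetch_text is made up for this example, and falling back to UTF-8 when the server declares no charset is an assumption.

```python
import urllib.request


def fetch_text(url):
    """Fetch a URL with a GET request and return the body as a str.

    fetch_text is a hypothetical helper name used only in this example.
    """
    # The with block closes the connection once the body has been read
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes, not str
        # Use the charset from the Content-Type header if one was sent;
        # falling back to UTF-8 is an assumption, not a guarantee
        charset = response.headers.get_content_charset() or 'utf-8'
    return raw.decode(charset, errors='replace')
```

Calling print(fetch_text('https://www.google.com')) would then print the page as a regular string rather than as a bytes literal.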

However, as a user, you may not be interested in the source code of the page and may require only the textual data.

As a normal user, you will go to the search bar on a website such as python.org, type what you want to search for, and click the submit button.

You will notice that the URL in the address bar changes to the one shown below. This URL contains a ? and an &, which mark out the query parameters.

https://www.python.org/search/?q=urllib&submit=

For your further understanding, the ? marks the start of the query string and the & separates the individual key=value pairs. The values you type in the search bar are appended to the URL in this form. You can also send the same parameters to this URL in a POST request to retrieve the content. But what if you have to post it from Python?
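To see how a query string like the one in the python.org search URL is put together, here is a small sketch using urllib.parse. The variable names base_url and params are chosen for this example only. It builds the URL from a dictionary and then reads the parameters back out again, without sending any request.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# base_url and params are illustrative names, not part of urllib
base_url = 'https://www.python.org/search/'
params = {'q': 'urllib', 'submit': ''}

# urlencode joins the key=value pairs with & and escapes special characters
query = urlencode(params)
search_url = base_url + '?' + query
print(search_url)  # https://www.python.org/search/?q=urllib&submit=

# Going the other way: split the URL and read the parameters back out
parts = urlsplit(search_url)
parsed = parse_qs(parts.query, keep_blank_values=True)
print(parsed)  # {'q': ['urllib'], 'submit': ['']}
```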

2. POST Request to access a website

Apart from the request module, we will also import the parse module, as this will help us encode the values we send with our request.

import urllib.request as rq
import urllib.parse as ps

To understand the POST request better, we will be using the python.org website. We will define a dictionary whose keys are the search parameters and whose values are the keywords.

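url = 'https://www.python.org/search/'
dictionary = {'q': 'urllib'}

# Turn the dictionary into a query string ('q=urllib'), then into bytes
data = ps.urlencode(dictionary)
data = data.encode('utf-8')

# Passing data makes urlopen send a POST request instead of a GET
req = rq.Request(url, data)
res = rq.urlopen(req)

print(res.read())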

After specifying the URL parameters, note that urlencode returns an ordinary string, whereas urlopen expects the data for a POST request as bytes. UTF-8 is the standard encoding on the World Wide Web, so we encode our data (not the URL itself) as UTF-8.

We then pass our URL and the encoded data into the Request object req and issue a urlopen request with it. Because data is supplied, the request is sent as a POST. The response from urlopen is stored in the res object.
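The encoding steps can be checked without contacting any server. The sketch below is only an illustration: the search phrase 'python urllib' and the names get_req and post_req are chosen for this example. It shows what urlencode and encode produce, and how supplying data changes the HTTP method of a Request.

```python
import urllib.parse as ps
import urllib.request as rq

url = 'https://www.python.org/search/'

# urlencode returns a str; spaces become + and pairs are joined with &
encoded = ps.urlencode({'q': 'python urllib'})
print(encoded)  # q=python+urllib

# encode turns that str into the bytes that Request expects as data
data = encoded.encode('utf-8')
print(data)  # b'q=python+urllib'

# Nothing is sent here; we only inspect the method each Request would use
get_req = rq.Request(url)
post_req = rq.Request(url, data)
print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
```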

We still get the entire web page, including all the HTML tags, in the Python console.

This is because the website we have posted to here does not give us access to its content without the use of APIs. We can use RESTful APIs or certain other headers to retrieve the data. We will not be discussing this in this article.
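If only the visible text is wanted, one rough option is to strip the tags with the standard library’s html.parser module. This is a naive sketch under stated assumptions, not a robust scraper: the names TextExtractor and extract_text are made up for this example, and the approach simply drops the contents of script and style tags and joins whatever text remains.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring script and style blocks.

    TextExtractor is a hypothetical class written for this example.
    """

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside script/style
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped block
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the text content of an HTML string, separated by spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return ' '.join(parser.chunks)


sample = '<p>Hello <b>world</b></p><script>var x = 1;</script>'
print(extract_text(sample))  # Hello world
```

For a real page, the string passed to extract_text would be the decoded result of res.read().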

Conclusion

Hope you have understood how to issue HTTP GET and HTTP POST requests to websites using Python. Do let us know your feedback in the comment section, and also mention any other topics you would like to read about.