Extracting Domain Name from a URL in Python

A URL, or uniform resource locator, is a web address. It is used to locate resources on computer networks, mainly the World Wide Web.

A web address is made up of several parts. Just as a postal address consists of a house number, a street name, and so on, a web address is composed of distinct portions.

The domain name is one such portion of a URL. It is essentially the name of the website, and it helps locate the content we look at on the internet. Every website on the web is reached through a domain name.

A URL is a link that you can click on webpages or in other software. You can also type it directly into your web browser to visit the resource it points to.

Components of a URL

Let’s dive deeper into the various components of a URL and understand their significance. A web address or a URL can be broken down into smaller chunks and they are as follows:

The protocol: This is the first part of a URL. Common protocols are HTTP (Hypertext Transfer Protocol), HTTPS (Hypertext Transfer Protocol Secure), which is the more secure version of HTTP, and FTP (File Transfer Protocol).
A colon followed by two forward slashes, which separates the protocol from the rest of the URL.
The domain: The domain can itself be subdivided into three parts:
- Subdomain: This identifies a section of a website or a separate host under the main domain. The most common subdomain is “www”.
- Domain Name: This is the main name of your website. In this article we will learn how to extract this domain name from a URL in Python.
- Suffix: This is the ending of the domain, which often hints at the type of website you have. For example, organizations may use “.org” and companies mostly use “.com”.
Subdirectory (optional): This points to a specific section or page within the website. If your website has specific sections, this part of the URL leads to the one that is named.

This is the basic structure of a URL on the World Wide Web. There may be additional elements, such as port numbers or paths, which ultimately take you to your desired location. The domain name can be extracted regardless of which of these elements a URL contains.
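The breakdown above can be sketched with Python's standard library. This is a minimal illustration using urllib.parse.urlsplit; the URL is a made-up example, not one from this article. Note that urlsplit returns the whole host as one piece and does not separate the subdomain, domain name, and suffix.

```python
from urllib.parse import urlsplit

# Hypothetical example URL, chosen only for illustration.
url = "https://www.example.com:8080/blog/post?id=7#top"

parts = urlsplit(url)

print("Protocol:", parts.scheme)    # the part before "://"
print("Host:", parts.hostname)      # subdomain + domain name + suffix, unsplit
print("Port:", parts.port)          # optional port number, None if absent
print("Path:", parts.path)          # the subdirectory portion
print("Query:", parts.query)        # optional query string
print("Fragment:", parts.fragment)  # optional fragment after "#"
```

Splitting the host itself into its three parts is the harder step, and it is the one the rest of this tutorial deals with.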

Do check out: URL Shortener in Python – A Beginner’s Guide.

Why is a domain name important?

A domain name not only makes your company or website stand out, it also adds credibility to your business. This is why domain names are rarely free and you usually have to pay to register one in your name.

A good domain name will attract traffic to your webpage and will add legitimacy to your work. An attractive domain name is easy to remember and in the long run might even become a brand in itself in e-commerce.

There are many paid services on the internet that can extract the domain name from a URL. But let’s see how we can do it for free using Python in the next section.

Extracting Domain Names from URL in Python

The tldextract library can be used to extract the domain name from a URL in Python. After installing the library with pip and importing it into your script, you can use its extract() function to find the domain name of any given URL.

Run the following in your command prompt. If the installation fails with a permissions error, running the command prompt in administrator mode may help.

pip install tldextract

tldextract is a third-party Python library (it is not part of the standard library, which is why it has to be installed with pip) that separates a URL into parts such as the subdomain, the domain name, and the public suffix. Its extract() function returns an ExtractResult object whose named fields make the separate parts easy to access. Read more about it in the tldextract documentation.

The following will extract the domain name from your URL.

import tldextract

# Get URL from user
url = input("Enter URL: ")

# Extract information from URL
extracted_info = tldextract.extract(url)

# Print all extracted information
print("The result after extraction is:", extracted_info)

# Print only the domain name
print("Domain name is:", extracted_info.domain)

The output is shown below. The exact form of the ExtractResult line may vary with the tldextract version.

Enter URL: https://www.askpython.com/
The result after extraction is: ExtractResult(subdomain='www', domain='askpython', suffix='com')
Domain name is: askpython
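For comparison, here is a rough sketch of what a naive approach with only the standard library might look like. The naive_domain helper is hypothetical and written just for this illustration. It assumes the suffix is always a single label such as “com”, which does not hold for suffixes like “co.uk”. That limitation is the reason a library that knows about public suffixes, such as tldextract, is the more dependable choice.

```python
from urllib.parse import urlsplit


def naive_domain(url):
    """Guess the domain name by taking the second-to-last label of the host.

    Illustrative only: assumes the suffix is exactly one label long.
    """
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    # With fewer than two labels there is no suffix to strip.
    return labels[-2] if len(labels) >= 2 else host


print(naive_domain("https://www.askpython.com/"))  # askpython
print(naive_domain("https://www.example.co.uk/"))  # co (wrong: the suffix has two labels)
```

The second call returns “co” instead of “example”, because the helper has no way of knowing that “co.uk” is a single suffix.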

Suggested: Introduction to Python Modules.

Conclusion

Now you have a clear understanding of URLs and their components, including the importance of domain names for credibility and branding. By leveraging Python and the tldextract library, you can easily extract domain names from URLs without relying on paid services. This opens up new opportunities for analyzing and working with web addresses in your projects.