Matching Entire Strings in Python using Regular Expressions

Regular expressions, also known as regex, are a powerful tool for searching and manipulating text. Python’s built-in regex library, re, makes it easy to match exact strings and perform other text-processing tasks. In this article, we will explore how to use the re library to match exact strings in Python, with worked examples.

What are Regular expressions?

Regular expressions, often abbreviated as “regex,” are used in computer programming, text processing, and data validation to match, search, and manipulate text patterns. In essence, a regular expression is a sequence of characters that defines a search pattern.

This pattern can be used to match a specific string, a set of strings that share a common format or structure, or even to identify and extract certain pieces of data from a larger dataset.

The syntax of regular expressions varies between implementations, but it generally combines ordinary characters with metacharacters that have special meanings. Before starting with the Python regex module, let’s see how to write a regex using metacharacters and special sequences.

Metacharacters in Regex

Some of the common metacharacters used in regular expressions are:

Metacharacter  Description
.              Matches any single character except a newline.
^              Matches the beginning of a string.
$              Matches the end of a string.
*              Matches zero or more occurrences of the preceding character.
+              Matches one or more occurrences of the preceding character.
?              Matches zero or one occurrence of the preceding character.
{n}            Matches exactly n occurrences of the preceding character.
{n,}           Matches at least n occurrences of the preceding character.
{n,m}          Matches between n and m occurrences of the preceding character.
[]             Matches any one of the characters inside the brackets.
[^]            Matches any one character that is not inside the brackets.
()             Groups a sequence of characters together for use with metacharacters like *, +, and ?.
\              Escapes the next character, so that it is treated literally rather than as a metacharacter. For example, \. matches a literal dot, rather than any character.
Metacharacters
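
As a rough illustration of the table above, the sketch below checks a handful of patterns against sample strings. The patterns and strings are made up for this example, and it uses re.fullmatch (covered later in this article) so that each pattern has to account for the whole string.

```python
import re

# Each tuple: (pattern, candidate string, whether the whole string should match)
cases = [
    (r"a.c", "abc", True),        # . matches any single character
    (r"ab*c", "ac", True),        # * allows zero occurrences of "b"
    (r"ab+c", "ac", False),       # + requires at least one "b"
    (r"colou?r", "color", True),  # ? makes the "u" optional
    (r"a{3}", "aaa", True),       # {3} requires exactly three "a" characters
    (r"[abc]x", "bx", True),      # [] matches one of the listed characters
    (r"[^abc]x", "bx", False),    # [^] rejects the listed characters
    (r"(ab)+", "ababab", True),   # () groups "ab" so + repeats the pair
    (r"1\.5", "1x5", False),      # \. matches only a literal dot
]

# Evaluate every case and print the outcome next to the pattern
results = [bool(re.fullmatch(pattern, s)) for pattern, s, _ in cases]
for (pattern, s, expected), got in zip(cases, results):
    print(f"{pattern!r:12} {s!r:10} -> {got}")
```

Each printed result should agree with the third element of its tuple.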

Special Sequences in Regex

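Special sequences are backslash shorthands that make commonly used patterns easier to write. Some of them, such as \A, \b, \B, and \Z, do not match an actual character; instead they specify the location in the search string where the match must occur. Others, such as \d, \s, and \w, match a whole class of characters.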

Sequence  Description                                                                          Example pattern  Example matches
\A        Matches only at the start of the string.                                             \Aask            ask in python
\b        Matches at a word boundary: \bask anchors to the start of a word, ask\b to the end.  \bask            askpython
\B        The opposite of \b: matches only where there is no word boundary.                    \Bask            task, flask
\d        Matches any decimal digit; equivalent to [0-9] for ASCII text.                       \d               123, ask2
\D        Matches any non-digit character; equivalent to [^0-9] for ASCII text.                \D               python, start
\s        Matches any whitespace character.                                                    \s               ask python
\S        Matches any non-whitespace character.                                                \S               abcd
\w        Matches any alphanumeric character or underscore; [a-zA-Z0-9_] for ASCII text.       \w               123
\W        Matches any character that \w does not match.                                        \W               >$
\Z        Matches only at the end of the string.                                               ac\Z             abrfac
Special Sequences
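
To make the difference between positional sequences and character classes more concrete, here is a minimal sketch using a made-up sample string. The variable names are illustrative only.

```python
import re

text = "ask task askpython 42"

# \b requires a word boundary before "ask"; \B requires that there is none
word_starts = [m.start() for m in re.finditer(r"\bask", text)]
inside_word = [m.start() for m in re.finditer(r"\Bask", text)]

# \d and \s match actual characters (digits and whitespace)
digits = re.findall(r"\d", text)
spaces = re.findall(r"\s", text)

# \A and \Z pin the match to the very start and very end of the string
starts_with_ask = re.search(r"\Aask", text) is not None
ends_with_python = re.search(r"python\Z", text) is not None

print(word_starts)       # [0, 9]
print(inside_word)       # [5]
print(digits)            # ['4', '2']
print(len(spaces))       # 3
print(starts_with_ask)   # True
print(ends_with_python)  # False
```

Here \bask finds "ask" at the start of "ask" and "askpython", while \Bask only finds the "ask" inside "task".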

Regex module in Python

Python ships with the built-in re module, which needs no installation and is what the examples in this article use. The separate regex module is a third-party, alternative regular expression engine that supports several advanced features, such as recursive patterns, atomic groups, and lookbehind assertions with variable-length patterns. To install the regex module, you can use pip, the Python package manager. Open a command prompt or terminal and enter the following command:

pip install regex

For detailed information about the module read: Official Documentation

How to match the entire string in a regular expression?

Let’s get right into the different Python methods we can use to match strings using regular expressions.

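1. Using re.search()

The re.search method scans the given string and returns the first location where the regular expression pattern matches. To look for a literal string, you can pass that string as the pattern. Note that this finds the string anywhere in the text rather than requiring the whole text to match, and any metacharacters in the string must be escaped. For example: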

import re

text = "The quick brown fox"
pattern = "quick"

match = re.search(pattern, text)
if match:
    print("Match found!")
else:
    print("Match not found")

Output: Match found!
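
Passing a string directly as the pattern only works as a literal search when the string contains no metacharacters. The sketch below, using a made-up sample string, shows one way to handle that case with re.escape from the standard library.

```python
import re

text = "Total: $5.00 (approx.)"
literal = "$5.00"

# Unescaped, "$" is an end-of-string anchor and "." matches any character,
# so this pattern cannot find the literal text
unescaped = re.search(literal, text)

# re.escape() backslash-escapes metacharacters so every character is literal
escaped = re.search(re.escape(literal), text)

print(unescaped)        # None
print(escaped.group())  # $5.00
```

For a plain substring check with no pattern syntax at all, the expression `literal in text` is usually simpler than a regex.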

2. Using re.match()

The re.match method works like re.search, but only matches the pattern at the beginning of the string, so a leading ^ anchor is harmless but redundant. To match an entire string, add the $ anchor so that the match must also reach the end of the string. Note that $ also matches just before a trailing newline. For example:

import re

text = "The quick brown fox"
pattern = "^The quick brown fox$"

match = re.match(pattern, text)
if match:
    print("Match found!")
else:
    print("Match not found")

Output: Match found!
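
One subtlety worth checking is how the $ anchor treats a trailing newline. The following sketch, which adds a newline to the sample string for illustration, compares $ with the stricter \Z sequence.

```python
import re

text = "The quick brown fox\n"  # note the trailing newline

# "$" matches at the end of the string or just before a final newline
with_dollar = re.match(r"The quick brown fox$", text)

# "\Z" matches only at the very end of the string
with_z = re.match(r"The quick brown fox\Z", text)

print(with_dollar is not None)  # True
print(with_z is not None)       # False
```

If a trailing newline should count as a mismatch, \Z is the safer anchor.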

3. Using re.fullmatch()

The re.fullmatch method succeeds only if the entire string matches the pattern. Because the start and end of the string are already implied, you do not need the ^ and $ anchors that re.match requires. For example:

import re

text = "The quick brown fox"
pattern = "The quick brown fox"

match = re.fullmatch(pattern, text)
if match:
    print("Match found!")
else:
    print("Match not found")

Output: Match found!
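
re.fullmatch is most useful when the pattern describes a format rather than a fixed string. As a minimal sketch, the helper below (is_date_like is a hypothetical name, and the YYYY-MM-DD format is an assumption made for illustration) accepts a string only if the whole string fits the pattern.

```python
import re

# Hypothetical format for illustration: four digits, two digits, two digits
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

def is_date_like(s):
    """Return True only if the entire string fits the date-like pattern."""
    return DATE_PATTERN.fullmatch(s) is not None

candidates = ["2024-01-15", "2024-01-15 extra", "on 2024-01-15", "2024-01-15\n"]
flags = [is_date_like(s) for s in candidates]
print(flags)  # [True, False, False, False]

# By contrast, match() only anchors the start, so trailing text is accepted
print(DATE_PATTERN.match("2024-01-15 extra") is not None)  # True
```

This only checks the shape of the string; it does not verify that the month or day is valid.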

4. Using re.findall()

The re.findall method finds all non-overlapping matches of the pattern in the string and returns them as a list. To match an exact word, you can wrap it in \b word boundaries so that it is not matched inside longer words. For example, suppose we have a text file, sample.txt, with the contents below:

This is a sample text file.
It contains some text that we will search using regular expressions.
We can find specific patterns of text using regular expressions.

We can read this file into a string variable using Python’s built-in open function and the file object’s read method, and then use regular expressions to search for specific patterns of text within the file:

import re

with open('sample.txt', 'r') as f:
    text = f.read()

# Find all occurrences of the word "text" in the file
matches = re.findall(r'\btext\b', text)

# Print the matches
print(matches)

Output: ['text', 'text', 'text']

In this example, we first open the sample.txt file using the open function, read its contents using the read method, and assign it to the text variable. We then use the re.findall function to search for all non-overlapping occurrences of the word “text” in the text string, using the regular expression pattern \btext\b which matches the word “text” when it appears as a standalone word surrounded by word boundaries. Finally, we print the list of matches to the console.
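
The shape of the list returned by re.findall depends on whether the pattern contains capturing groups. The sketch below uses the same sample text as an in-memory string (so no file is needed) to show both cases. The second pattern is made up for illustration.

```python
import re

text = (
    "This is a sample text file.\n"
    "It contains some text that we will search using regular expressions.\n"
    "We can find specific patterns of text using regular expressions.\n"
)

# No capturing groups: findall returns the full matches as strings
whole_words = re.findall(r"\btext\b", text)

# With capturing groups: findall returns one tuple of groups per match
pairs = re.findall(r"\b(\w+) (text)\b", text)

print(whole_words)  # ['text', 'text', 'text']
print(pairs)        # [('sample', 'text'), ('some', 'text'), ('of', 'text')]
```

If you need groups for structure but still want the full match, re.finditer with match.group(0) is one alternative.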

5. Using Pandas Series.str.extract()

Series.str gives access to the values of a Pandas Series as strings and lets you apply string methods to them. The Series.str.extract() function extracts the capture groups of a regular expression (its pat argument) as columns in a DataFrame. As an example, let’s create a DataFrame and extract the domain from each email address:

import pandas as pd

data = {"Name": ["John Doe", "Jane Smith", "Adam Johnson"],
        "Age": [32, 25, 42],
        # Placeholder addresses for illustration
        "Email": ["john.doe@example.com", "jane.smith@example.com", "adam.johnson@example.com"]}

df = pd.DataFrame(data)

# Extract the domain names from email addresses using regex
df["Domain"] = df["Email"].str.extract(r'@(\w+\.\w+)')

print(df)

Output:
           Name  Age                     Email       Domain
0      John Doe   32      john.doe@example.com  example.com
1    Jane Smith   25    jane.smith@example.com  example.com
2  Adam Johnson   42  adam.johnson@example.com  example.com

Summary

This article introduced regular expressions in Python and how to use them to match text. It covered the common metacharacters and special sequences, and demonstrated matching with re.search, re.match, re.fullmatch, and re.findall from the re module, as well as extracting matches in Pandas with Series.str.extract.