Regular expressions, also known as regex, are an incredibly powerful tool for searching and manipulating text. Python’s regex library, re, makes it easy to match exact strings and perform other types of text processing tasks. In this article, we will explore how to use the re library to match exact strings in Python, with good implementation examples.
What are Regular expressions?
Regular expressions, often abbreviated as “regex,” are a powerful tool used in computer programming, text processing, and data validation to match, search, and manipulate text patterns. In essence, a regular expression is a sequence of characters that defines a search pattern.
This pattern can be used to match a specific string, a set of strings that share a common format or structure, or even to identify and extract certain pieces of data from a larger dataset.
The syntax of regular expressions varies depending on the implementation and the specific task at hand, but it generally involves using a combination of characters and metacharacters that have special meanings when used in a certain way. Before starting with the Python regex module let’s see how to actually write regex using metacharacters or special sequences.
Metacharacters in Regex
Some of the common metacharacters used in regular expressions are:
Metacharacter | Description |
---|---|
. | Matches any single character except a newline. |
^ | Matches the beginning of a string. |
$ | Matches the end of a string. |
* | Matches zero or more occurrences of the preceding character or group. |
+ | Matches one or more occurrences of the preceding character or group. |
? | Matches zero or one occurrence of the preceding character or group. |
{n} | Matches exactly n occurrences of the preceding character or group. |
{n,} | Matches at least n occurrences of the preceding character or group. |
{n,m} | Matches between n and m occurrences of the preceding character or group. |
[] | Matches any one of the characters inside the brackets. |
[^] | Matches any one character that is not inside the brackets. |
() | Groups a sequence of characters together for use with metacharacters like * , + , and ? . |
\ | Escapes the next character, so that it is treated literally rather than as a metacharacter. For example, \. matches a literal dot, rather than any character. |
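As a quick illustration of the table above, the following sketch tries a few of these metacharacters with Python's built-in re module. The sample strings and variable names are made up for this example and are not part of any API.

```python
import re

# "." matches any single character except a newline
dot_match = re.fullmatch(r"c.t", "cat") is not None          # True

# "*" allows zero or more of the preceding element, "+" requires at least one
star_match = re.fullmatch(r"ab*", "a") is not None            # True
plus_match = re.fullmatch(r"ab+", "a") is not None            # False

# {n,m} bounds the number of repetitions
range_match = re.fullmatch(r"\d{2,4}", "2024") is not None    # True

# [] matches one character from a set, [^] one character not in the set
vowels = re.findall(r"[aeiou]", "regex")                      # ['e', 'e']
non_vowels = re.findall(r"[^aeiou]", "regex")                 # ['r', 'g', 'x']

# "\." is a literal dot, while an unescaped "." matches any character
literal_dot = re.fullmatch(r"a\.b", "axb") is not None        # False
any_char = re.fullmatch(r"a.b", "axb") is not None            # True

print(dot_match, star_match, plus_match, range_match)
print(vowels, non_vowels, literal_dot, any_char)
```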
Special Sequences in Regex
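Special sequences are backslash-prefixed shortcuts that make commonly used patterns easier to write. Some of them, such as \A, \b, and \Z, do not match any character themselves; instead, they specify the location in the search string where the match must occur. Others, such as \d, \s, and \w, match a whole class of characters.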
Sequence | Description | Example Query | Example Matches |
---|---|---|---|
\A | Matches if the string begins with the given characters. | ‘\Aask’ | ask in python |
\b | Matches at a word boundary. \b(string) checks for the beginning of a word and (string)\b checks for the end of a word. | ‘\bask’ | askpython |
\B | The opposite of \b: matches only where there is no word boundary, so the given characters must not start (or end) a word. | ‘\Bask’ | task, flask |
\d | Matches any decimal digit; for ASCII digits this is equivalent to the set class [0-9]. | ‘\d’ | 123, ask2 |
\D | Matches any non-digit character; this is equivalent to the set class [^0-9]. | ‘\D’ | python, start |
\s | Matches any whitespace character. | ‘\s’ | as kpython |
\S | Matches any non-whitespace character. | ‘\S’ | abcd |
\w | Matches any word character (a letter, a digit, or an underscore); for ASCII text this is equivalent to the class [a-zA-Z0-9_]. | ‘\w’ | 123 |
\W | Matches any non-word character. | ‘\W’ | >$ |
\Z | Matches if the string ends with the given regex. | ‘ac\Z’ | abrfac |
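The sequences in the table can be tried out with a short sketch like the one below. It uses the built-in re module, and the sample sentence is an arbitrary string chosen for this illustration.

```python
import re

text = "ask python, then task 42"

# \b requires a word boundary on both sides: only the standalone word "ask" qualifies
standalone = re.findall(r"\bask\b", text)                          # ['ask']

# \B requires that there is NO boundary: only the "ask" inside "task" qualifies
inside_word = [m.start() for m in re.finditer(r"\Bask", text)]     # [18]

# \d matches digits, \w word characters, \s whitespace
digits = re.findall(r"\d+", text)                                  # ['42']
words = re.findall(r"\w+", text)
space_count = len(re.findall(r"\s", text))                         # 4

# \A and \Z anchor to the very start and the very end of the string
starts_with_ask = re.search(r"\Aask", text) is not None            # True
ends_with_42 = re.search(r"42\Z", text) is not None                # True

print(standalone, inside_word, digits, words, space_count)
print(starts_with_ask, ends_with_42)
```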
Regex module in Python
Python ships with the built-in re module, which is what the examples in this article use. The separate regex module is a third-party alternative regular expression engine that supports several advanced features, such as recursive patterns, atomic groups, and lookbehind assertions with variable-length patterns. To install the regex module, you can use pip, the Python package manager. Open a command prompt or terminal and enter the following command:
pip install regex
For detailed information about the module read: Official Documentation
How to match the entire string in a regular expression?
Let’s get right into the different Python methods we can use to match strings using regular expressions.
1. Using re.search()
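The re.search method scans the given string and returns a match object for the first location where the regular expression pattern matches, or None if there is no match. To look for an exact substring, you can pass the string itself as the pattern, provided it contains no metacharacters (if it does, escape it first with re.escape). For example: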
import re

text = "The quick brown fox"
pattern = "quick"

# re.search returns a match object, or None if the pattern is not found
match = re.search(pattern, text)
if match:
    print("Match found!")
else:
    print("Match not found")
Output: Match found!
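When the string to look for contains characters that are also metacharacters, passing it unchanged can silently change its meaning. The sketch below, which uses a made-up price string, shows how re.escape keeps the search literal.

```python
import re

text = "Total price: $5.00 (approx.)"
literal = "$5.00"

# Used as-is, "$" acts as an end-of-string anchor, so this pattern cannot match here
unescaped = re.search(literal, text)

# re.escape() backslash-escapes the metacharacters so the string is matched literally
escaped = re.search(re.escape(literal), text)

print(unescaped)          # None
print(escaped.group())    # $5.00
print(escaped.span())     # (13, 18)
```

Keep in mind that re.search still finds the string anywhere in the text, so this checks for an exact substring rather than an exact whole-string match.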
2. Using re.match()
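The re.match method works like re.search, but only matches the pattern at the beginning of the string. To match an exact string, you can add the $ anchor so that the match must also reach the end of the string. The ^ anchor is optional here, because re.match already starts at the beginning. Note that $ also matches just before a trailing newline. For example: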
import re

text = "The quick brown fox"
pattern = "^The quick brown fox$"

# re.match only tries to match at the start of the string
match = re.match(pattern, text)
if match:
    print("Match found!")
else:
    print("Match not found")
Output: Match found!
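One subtlety of the $ anchor is worth seeing in action: it also matches just before a trailing newline, whereas \Z matches only at the very end of the string. A minimal sketch, with variable names chosen only for this illustration:

```python
import re

target = "The quick brown fox"
with_newline = target + "\n"

# "$" also matches just before a trailing newline, so this still succeeds
dollar = re.match(r"The quick brown fox$", with_newline)

# "\Z" matches only at the very end of the string, so the extra newline is rejected
strict = re.match(r"The quick brown fox\Z", with_newline)

# Without any end anchor, re.match accepts any string that starts with the pattern
prefix_only = re.match(r"The quick", target)

print(dollar is not None)     # True
print(strict is not None)     # False
print(prefix_only.group())    # The quick
```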
3. Using re.fullmatch()
The re.fullmatch method succeeds only if the entire string matches the pattern, so the ^ and $ anchors used with re.match are not needed. For example:
import re

text = "The quick brown fox"
pattern = "The quick brown fox"

# re.fullmatch requires the pattern to cover the whole string
match = re.fullmatch(pattern, text)
if match:
    print("Match found!")
else:
    print("Match not found")
Output: Match found!
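A small helper can combine re.fullmatch with re.escape to test for an exact match even when the expected string contains metacharacters. The function name is_exact_match is hypothetical and only serves this sketch.

```python
import re

def is_exact_match(candidate: str, expected: str) -> bool:
    """Return True only if candidate is exactly expected (hypothetical helper)."""
    # re.escape makes sure characters such as "." or "?" in expected are taken literally
    return re.fullmatch(re.escape(expected), candidate) is not None

print(is_exact_match("The quick brown fox", "The quick brown fox"))     # True
print(is_exact_match("The quick brown fox!", "The quick brown fox"))    # False
print(is_exact_match("The quick brown fox\n", "The quick brown fox"))   # False
print(is_exact_match("a.b", "a.b"))                                     # True
print(is_exact_match("axb", "a.b"))                                     # False
```

For purely literal comparisons, the == operator is simpler. A regex-based check like this becomes useful mainly when part of the expected text is a real pattern and only some pieces need escaping.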
4. Using re.findall()
The re.findall method finds all non-overlapping matches of the pattern in the string and returns them as a list. To match an exact word rather than every substring that happens to contain it, you can wrap the word in \b word-boundary anchors. For example, suppose we have the text file shown below:
This is a sample text file.
It contains some text that we will search using regular expressions.
We can find specific patterns of text using regular expressions.
We can read this file into a string variable using Python’s built-in open function and the file object’s read method, and then use regular expressions to search for specific patterns of text within the file:
import re

with open('sample.txt', 'r') as f:
    text = f.read()

# Find all occurrences of the word "text" in the file
matches = re.findall(r'\btext\b', text)

# Print the matches
print(matches)
Output: ['text', 'text', 'text']
In this example, we first open the sample.txt file using the open function, read its contents using the read method, and assign them to the text variable. We then use the re.findall function to search for all non-overlapping occurrences of the word “text” in the text string, using the regular expression pattern \btext\b, which matches the word “text” only when it appears as a standalone word surrounded by word boundaries. Finally, we print the list of matches to the console.
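The effect of the word boundaries is easier to see next to a looser pattern. The sketch below holds the same three sample lines in a string instead of reading sample.txt, so it runs without the file; the variable names are illustrative.

```python
import re

# The same three lines as the sample file, kept in a string for this sketch
text = (
    "This is a sample text file.\n"
    "It contains some text that we will search using regular expressions.\n"
    "We can find specific patterns of text using regular expressions.\n"
)

# With \b on both sides, only the standalone word "text" is returned
whole_word = re.findall(r"\btext\b", text)

# A looser pattern also picks up other words that merely contain "ex"
contains_ex = re.findall(r"\w*ex\w*", text)

# With one capturing group, findall returns the group instead of the whole match
word_after_text = re.findall(r"\btext (\w+)", text)

print(whole_word)         # ['text', 'text', 'text']
print(contains_ex)        # ['text', 'text', 'expressions', 'text', 'expressions']
print(word_after_text)    # ['file', 'that', 'using']
```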
5. Using Pandas Series.str.extract()
Series.str can be used to access the values of a Pandas Series as strings and apply several methods to them. The Series.str.extract() function extracts the capture groups in the regex pattern pat as columns in a DataFrame. As an example, let’s create a DataFrame and extract the domain from each email address:
import pandas as pd
data = {"Name": ["John Doe", "Jane Smith", "Adam Johnson"],
        "Age": [32, 25, 42],
        "Email": ["[email protected]", "[email protected]", "[email protected]"]}
df = pd.DataFrame(data)
# Extract the domain names from email addresses using regex
df["Domain"] = df["Email"].str.extract(r'@(\w+\.\w+)')
print(df)
Output
           Name  Age              Email       Domain
0      John Doe   32  [email protected]  example.com
1    Jane Smith   25  [email protected]  example.com
2  Adam Johnson   42  [email protected]  example.com
Summary
This article introduced regular expressions in Python and their use for matching exact strings. It covered common metacharacters and special sequences, and demonstrated how to use the re module’s search, match, fullmatch, and findall functions, as well as Pandas Series.str.extract(), to find and extract text.