Python’s ‘re’ module, short for regular expressions, provides a powerful toolset for pattern recognition in text. It is part of the standard library and contains built-in functions for searching, matching, and modifying strings.
Unlike an ordinary string comparison, which looks for one exact sequence of characters, a regular expression describes a pattern that can include letters, digits, symbols, and special characters.
In simple terms, a regular expression is a string of characters that defines a search pattern used to find a particular sequence of characters. With the re module, it is easy to identify and extract the keywords we need for various purposes.
Regular expressions form a small language of their own, which Python code uses through the re module.
Exploring the re Module: Common Commands and Functions
Like many other tools in Python, the re module is a built-in part of the standard library, so it needs no installation. It defines special sequences for identifying specific kinds of characters, as well as a number of functions that apply patterns to strings.
Some of the simplest special sequences are:
\w: This matches any word character, that is, an upper- or lowercase letter, a digit, or the underscore ([a-zA-Z0-9_] in ASCII mode; with str patterns, Unicode letters and digits also count).
\d: This matches any decimal digit ([0-9] in ASCII mode; with str patterns, other Unicode decimal digits also count).
\D: This matches any character that is not a decimal digit, the opposite of \d ([^0-9] in ASCII mode).
\W: This matches any character that is not a word character, the opposite of \w.
\s: This matches any whitespace character, such as a space, tab, or newline.
\S: This matches any character that is not whitespace.
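To see these sequences side by side, the following minimal sketch runs re.findall() (a module function not covered in this article's list, which returns all non-overlapping matches as a list) over a made-up sample sentence. The last lines illustrate how \d behaves with str patterns and how the re.ASCII flag narrows it to [0-9].

```python
import re

text = "Order #42 shipped on 2024-05-01 to Bob_Smith!"

# \w+ : runs of word characters (letters, digits and the underscore)
words = re.findall(r"\w+", text)
# \d+ : runs of decimal digits
numbers = re.findall(r"\d+", text)
# \D+ : runs of characters that are not digits
non_digits = re.findall(r"\D+", text)
# \W : every single character that is not a word character
non_word = re.findall(r"\W", text)
# \s : every whitespace character
spaces = re.findall(r"\s", text)
# \S+ : runs of non-whitespace characters, i.e. space-separated tokens
tokens = re.findall(r"\S+", text)

for name, result in [("\\w+", words), ("\\d+", numbers), ("\\D+", non_digits),
                     ("\\W", non_word), ("\\s", spaces), ("\\S+", tokens)]:
    print(name, result)

# With str patterns, \d also matches non-ASCII decimal digits such as
# U+0663 ARABIC-INDIC DIGIT THREE; re.ASCII restricts it to [0-9]
unicode_digits = re.findall(r"\d", "3 and \u0663")
ascii_digits = re.findall(r"\d", "3 and \u0663", re.ASCII)
print(unicode_digits)   # ['3', '٣']
print(ascii_digits)     # ['3']
```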
Some of the functions this module provides are:
re.search() -> This function scans the entire string for the first location where the pattern matches and returns a Match object, or None if there is no match.
re.match() -> This function checks for a match only at the beginning of the string and returns a Match object, or None.
re.sub() -> This function returns a new string in which occurrences of the pattern are replaced with a replacement string (all occurrences by default).
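The difference between these three functions is easiest to see on a single string. This is a small sketch using an invented sample sentence; the values in the comments are what the calls return for that input.

```python
import re

text = "Learn Python with AskPython"

# re.search(): scan the whole string, return the first match (or None)
found = re.search(r"Python", text)
print(found.group(), found.span())   # Python (6, 12)

# re.match(): only try at position 0, so this returns None
at_start = re.match(r"Python", text)
print(at_start)                      # None

# ...but a pattern that fits the beginning does match
learn = re.match(r"Learn", text)
print(learn.span())                  # (0, 5)

# re.sub(): replace every occurrence by default...
all_replaced = re.sub(r"Python", "Java", text)
print(all_replaced)                  # Learn Java with AskJava

# ...or only the first `count` occurrences
first_replaced = re.sub(r"Python", "Java", text, count=1)
print(first_replaced)                # Learn Java with AskPython
```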
Understanding re.compile() Method
re.compile() compiles a regular expression pattern, given as a string, into a regular expression (Pattern) object whose methods, such as match(), search(), and sub(), can then be reused. The syntax of the re.compile() method is as follows:
re.compile(pattern, flags=0)
pattern -> The required regular expression string to compile into a Pattern object.
flags -> An optional parameter that modifies how the pattern behaves, for example re.IGNORECASE or re.MULTILINE. Several flags can be combined with the | operator; the default, 0, means no flags.
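As a sketch of the syntax above, the following compiles one pattern with two flags combined and reuses the resulting Pattern object. The pattern and sample text are illustrative choices; re.MULTILINE (which lets ^ match at the start of every line) is used here only to show how flags combine.

```python
import re

# Compile once: the pattern string plus optional flags joined with |
greeting = re.compile(r"^hello\b", re.IGNORECASE | re.MULTILINE)

text = "Hello there\nsay hello\nHELLO again"

# The compiled object exposes the same methods as the module functions
line_starts = greeting.findall(text)
print(line_starts)                          # ['Hello', 'HELLO']

# Without re.MULTILINE, ^ only matches at the very start of the string
only_first = re.compile(r"^hello\b", re.IGNORECASE).findall(text)
print(only_first)                           # ['Hello']

# The Pattern object remembers its source string and flags
print(greeting.pattern)                     # ^hello\b
print(bool(greeting.flags & re.MULTILINE))  # True
```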
Example: Case-Insensitive Matching with re.compile()
In the following example, we use re.compile() to match a string regardless of case. Passing re.IGNORECASE as the flag makes the match succeed whether the letters are upper- or lowercase.
#importing module
import re
#assigning text
txt = "HelloWoRld"
# using re.compile and re.IGNORECASE
cpt = re.compile("hELLOWORld", re.IGNORECASE)
print(cpt.match(txt))
In this program, the string txt contains the same letters as the pattern passed to re.compile(); only their case differs. Because of re.IGNORECASE, case is ignored, and match() checks whether the pattern matches at the beginning of txt. Since it does, the output is a re.Match object. Let’s take a look at the output:
<re.Match object; span=(0, 10), match='HelloWoRld'>
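To make the role of the flag concrete, this sketch repeats the example with and without re.IGNORECASE. It also shows, using Pattern.fullmatch() (not mentioned elsewhere in this article), that match() only anchors at the start of the string, so a shorter pattern succeeds too.

```python
import re

txt = "HelloWoRld"

# Without the flag, the case differences prevent a match
strict = re.compile("hELLOWORld")
strict_result = strict.match(txt)
print(strict_result)                  # None

# With re.IGNORECASE the same pattern matches the whole string
relaxed = re.compile("hELLOWORld", re.IGNORECASE)
relaxed_result = relaxed.match(txt)
print(relaxed_result.group())         # HelloWoRld

# match() only anchors at the start, so a shorter pattern also succeeds;
# fullmatch() requires the pattern to cover the entire string
prefix = re.compile("hello", re.IGNORECASE)
prefix_match = prefix.match(txt)
prefix_full = prefix.fullmatch(txt)
print(prefix_match.span())            # (0, 5)
print(prefix_full)                    # None
```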
Alternative Methods for Case-Insensitive Matching
You can also match mixed-case text while ignoring case without calling re.compile(), by passing re.IGNORECASE directly to a module-level function.
Two such functions are shown here. First, re.search() accepts re.IGNORECASE as its flags argument. Let’s take a look at how we can do that:
#importing module
import re
#assigning text
txt = "askpyTHON"
print(re.search('ASKpython', txt, re.IGNORECASE))
The output would be:
<re.Match object; span=(0, 9), match='askpyTHON'>
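Because re.search() returns None when nothing matches, real code usually tests the result before using it. The helper below, find_ignoring_case(), is a hypothetical name used only for illustration; the sketch also contrasts search() and match() with the same flag on a sample string.

```python
import re

def find_ignoring_case(pattern, text):
    # Hypothetical helper for illustration, not part of the re module:
    # returns the matched text, or None when re.search finds nothing
    m = re.search(pattern, text, re.IGNORECASE)
    return m.group() if m is not None else None

text = "Welcome to AskPython"

print(find_ignoring_case("python", text))   # Python
print(find_ignoring_case("java", text))     # None

# search() finds the pattern anywhere; match() only looks at the start
searched = re.search("python", text, re.IGNORECASE)
matched = re.match("python", text, re.IGNORECASE)
print(searched.span())                      # (14, 20)
print(matched)                              # None
```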
Second, re.match() accepts the same flag directly as an argument; remember that it only checks for a match at the start of the string:
#importing module
import re
#assigning text
txt = "ASSIgnmeNT"
print(re.match('ASSIGNMENT', txt, re.IGNORECASE))
The output would be:
<re.Match object; span=(0, 10), match='ASSIgnmeNT'>
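The same flag works with other module-level functions. This sketch uses re.findall() to collect every case-insensitive occurrence and re.sub() to replace them. Note that re.sub()'s fourth positional parameter is count, so the flag is passed by keyword. The sample text is invented for illustration.

```python
import re

txt = "ASSIgnmeNT, assignment and Assignment"

# findall(pattern, string, flags): every case-insensitive occurrence
all_forms = re.findall("assignment", txt, re.IGNORECASE)
print(all_forms)   # ['ASSIgnmeNT', 'assignment', 'Assignment']

# re.sub's 4th positional parameter is count, not flags,
# so pass the flag by keyword to keep it from being read as count
replaced = re.sub("assignment", "task", txt, flags=re.IGNORECASE)
print(replaced)    # task, task and task
```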
Conclusion and Further Reading