Regular Expressions: Difference Between [0-9] and [0-9.]

A regular expression, also known as a ‘regex’, is one of the most powerful tools in programming. Working with strings becomes much easier with regular expressions, because they let you describe what you are looking for as a pattern instead of writing the search logic by hand. Much of the software you use on the internet every day relies on regular expressions.

In this article, we take an in-depth look at regular expressions and how to use them effectively. Along the way, we also explore the difference between [0-9] and [0-9.].

Origin of Regular Expressions

The origin of regular expressions dates back to the 1950s, when the mathematician Stephen Cole Kleene formalized the idea and introduced it to the world. Ken Thompson at Bell Labs, one of the developers of the Unix operating system, was among the first to implement regular expressions in software. They were then used in Unix utilities like ed and grep.

After that, many programming languages gradually added support for regular expressions. Python has a built-in module dedicated to regular expressions, named ‘re’. It provides many functions for searching for patterns in a string in different ways.
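As a rough illustration of what the re module offers, here is a minimal sketch using three of its functions. The sample text and variable names are made up for this example.

```python
import re

text = "order 66 shipped on day 7"

# re.search returns the first match anywhere in the string (or None).
first = re.search(r"[0-9]+", text)
print(first.group())  # 66

# re.findall returns every non-overlapping match as a list of strings.
numbers = re.findall(r"[0-9]+", text)
print(numbers)  # ['66', '7']

# re.fullmatch succeeds only if the whole string fits the pattern.
whole_is_number = re.fullmatch(r"[0-9]+", "66") is not None
text_is_number = re.fullmatch(r"[0-9]+", text) is not None
print(whole_is_number, text_is_number)  # True False
```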

Related: Regex in Python.

What is a Regular expression?

Let’s first understand what a regular expression is. A regular expression is a string of metacharacters and normal characters that defines a pattern. This expression can be used to search for that pattern in a given text.

What is pattern searching?

Pattern search is the process of searching for a pattern in the provided text. The pattern can be a single character like ‘a’ or something complex like a string starting with a number, having 3 a’s, and ending with an @ symbol.

Let’s consider an example to understand pattern searching better. Suppose you want to search for ‘aa’ in ‘abbaaba’. The pattern matches once in the string, at indices 3 and 4 (Python reports this as the span 3 to 5, because the end index is exclusive). A regular expression can be written for this purpose: the regular expression for ‘aa’ would be 'a{2}'.
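The search described above can be tried out with Python's re module. This is only a small sketch; the variable names are chosen for illustration.

```python
import re

match = re.search(r"a{2}", "abbaaba")
print(match.group())  # aa
print(match.span())   # (3, 5): starts at index 3, ends just before index 5

# findall confirms that the pattern occurs only once in this string.
all_matches = re.findall(r"a{2}", "abbaaba")
print(len(all_matches))  # 1
```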

Why do we need regular expressions?

In the above example, we had to find ‘aa’ in a given string. This can be done with a simple for loop that iterates through the string. Then why do we need an entire library for it? The example we just checked was a simple one. What if you want to find a pattern such as a string that starts with an a, has any number of digits in between, and ends with two ones? You may say that this is harder but still doable by hand, given a little time.

This is exactly why regular expressions exist: you don’t need to write a lot of code for pattern searching, and the pattern itself can be written in seconds.
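To make the comparison concrete, here is a sketch of the harder example both ways. The pattern and the helper function matches_by_hand are hypothetical; they are one possible reading of "starts with an a, any number of digits, ends with two ones".

```python
import re

# One possible pattern for the description: an 'a', then any number
# of digits, then two ones at the end.
PATTERN = re.compile(r"a[0-9]*11")

def matches_by_hand(s):
    """The same check written without regex, for comparison."""
    if len(s) < 3 or s[0] != "a" or not s.endswith("11"):
        return False
    middle = s[1:-2]  # everything between the 'a' and the final "11"
    return all(ch in "0123456789" for ch in middle)

candidates = ["a11", "a90211", "a9021", "b11", "a9x11"]
regex_results = [bool(PATTERN.fullmatch(c)) for c in candidates]
manual_results = [matches_by_hand(c) for c in candidates]

for c, r, m in zip(candidates, regex_results, manual_results):
    print(c, r, m)
```

The regex version is a single line, while the manual version needs several separate checks that are easy to get wrong.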

Related: Learn to extract emails from a text file using regex.

What are the metacharacters?

Metacharacters are an important part of a regular expression. They help define the pattern. They are characters that, inside a pattern, don’t behave like normal characters; they have a special meaning. Let’s check a few of the most important metacharacters and what they represent.

^

^, also known as a caret, anchors the pattern to the start of the string.
Ex:- ^a represents a string starting with an a.

A caret as the first character inside square brackets represents negation.

$

$ anchors the pattern to the end of the string.
Ex:- a$ represents a string ending with an a.

[]

Square brackets define a set of characters that are allowed.
Ex:- [abc] represents a single character which is from {a, b, c}.

+

+ matches one or more occurrences of the preceding character or group.
Ex:- a+ matches a, aa, and aaaa; in aab it matches the leading aa.

{}

{n} matches exactly n repetitions of the preceding character or group.
Ex:- a{3} doesn’t match aa but matches aaa.

[m-n]

[m-n] represents a set of characters from the range m to n.
Ex:- [a-z] represents any lowercase English letter; consequently, [^a-z] represents any character except a lowercase English letter.

\

\ escapes a metacharacter so that it is treated as a normal character. To escape \ itself, use another \, resulting in \\.
Ex:- \+ represents ‘+’ character.

\d

Represents any digit.

\n

Represents a newline character.

\w

Represents [A-Za-z0-9_].

\W

Represents [^A-Za-z0-9_].
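The metacharacters listed above can be checked quickly in Python. The following is a small sketch with made-up sample strings. One caveat: for ordinary Python 3 strings, \d and \w also match non-ASCII digits and letters unless the re.ASCII flag is passed, so the sets shown above describe the ASCII case.

```python
import re

# ^ anchors at the start of the string, $ at the end.
starts_with_a = bool(re.search(r"^a", "apple"))    # True
ends_with_a = bool(re.search(r"a$", "banana"))     # True

# A set, a negated range, and + for "one or more".
in_set = re.findall(r"[abc]", "cabbage")           # ['c', 'a', 'b', 'b', 'a']
not_lower = re.findall(r"[^a-z]", "ab1-C")         # ['1', '-', 'C']
runs_of_a = re.findall(r"a+", "a aab aaaa")        # ['a', 'aa', 'aaaa']

# {n} asks for exactly n repetitions.
three_a = bool(re.fullmatch(r"a{3}", "aaa"))       # True
two_a = bool(re.fullmatch(r"a{3}", "aa"))          # False

# A backslash escapes a metacharacter; \d, \w and \W are shorthand classes.
plus_signs = re.findall(r"\+", "1+2+3")            # ['+', '+']
digits = re.findall(r"\d", "a1b22")                # ['1', '2', '2']
words = re.findall(r"\w+", "snake_case, 42!")      # ['snake_case', '42']
non_word = re.findall(r"\W", "a-b c")              # ['-', ' ']

for result in (starts_with_a, ends_with_a, in_set, not_lower, runs_of_a,
               three_a, two_a, plus_signs, digits, words, non_word):
    print(result)
```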

How to write a regular expression?

Okay, now that we know what regular expressions and metacharacters are, we can learn how to write a regular expression.

Write a regular expression for a string that starts with a lowercase letter, may contain any number of further lowercase letters, and ends with a single digit.

^[a-z]+[0-9]{1}$

^ specifies the start of the string, and [] specifies a set or range of allowed characters. + means one or more repetitions of the preceding character or set. {} specifies the exact number of repetitions of the preceding character or set, and $ specifies the end of the string. So the above expression says that one or more lowercase letters must be followed by exactly one digit, and nothing else.
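A quick way to check the expression is to run it against a few strings that should and should not match. The sample strings below are made up for this sketch.

```python
import re

pattern = re.compile(r"^[a-z]+[0-9]{1}$")

samples = ["abc1", "a9", "abc", "abc12", "Abc1", "1"]
results = {s: bool(pattern.search(s)) for s in samples}

for sample, matched in results.items():
    print(sample, matched)
# abc1 and a9 match; abc has no digit, abc12 has two digits,
# Abc1 starts with an uppercase letter, and 1 has no letters.
```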

So whenever we see a pattern, we break it down into pieces and then try to construct each sub-pattern and later combine all of them.

Difference between [0-9] and [0-9.]

Now that we have some experience with regular expressions, we are ready to answer this article’s main question: what is the difference between [0-9] and [0-9.]? Try to guess it now. It’s very simple.

[0-9]

[m-n] represents a range of characters from m to n. Consequently, [0-9] represents a range of characters from 0 to 9.

[.]

[.] represents a period. That means it will match only a period.

[0-9.]

[0-9.] is the union of the sets [0-9] and [.]. It matches a single character that is either a digit from 0 to 9 or a period (.).

[0-9] vs [0-9.]

[0-9] specifies any digit, and [0-9.] represents any digit or a period (.).
To be more clear, [0-9] represents the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} and [0-9.] represents the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, .}.
That means “1” matches both [0-9] and [0-9.], but “.” matches only [0-9.] and not [0-9].
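The difference is easy to see in Python. This sketch uses made-up sample input and adds a + after each set so that the effect on a longer string is visible.

```python
import re

digits_only = re.compile(r"[0-9]")
digits_or_dot = re.compile(r"[0-9.]")

# Single characters: "1" fits both sets, "." fits only the second one.
single = {ch: (bool(digits_only.fullmatch(ch)), bool(digits_or_dot.fullmatch(ch)))
          for ch in ["1", ".", "a"]}
print(single)  # {'1': (True, True), '.': (False, True), 'a': (False, False)}

# With +, the period decides whether "3.14" is found as one piece or two.
without_dot = re.findall(r"[0-9]+", "version 3.14")
with_dot = re.findall(r"[0-9.]+", "version 3.14")
print(without_dot)  # ['3', '14']
print(with_dot)     # ['3.14']
```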

Avoiding Pitfalls and Misunderstandings

Beginners often think the period (.) here is some kind of metacharacter. Outside square brackets, the period is indeed a metacharacter: it represents any single character except an EOL (end of line). But inside square brackets, it’s just a literal period character. Don’t confuse it with the metacharacter here.

People also sometimes forget that [] can hold both ranges and individual characters, so they get confused when they see a period next to a range of characters.
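Both pitfalls can be checked with a short sketch. The sample strings and the mixed set [a-c0-2._] are made up for illustration.

```python
import re

# Outside brackets, an unescaped period matches any character except a newline.
any_char = re.findall(r".", "a.1")        # ['a', '.', '1']
skips_newline = re.findall(r".", "a\nb")  # ['a', 'b']

# Inside brackets, or escaped with a backslash, it matches only a literal period.
in_brackets = re.findall(r"[.]", "a.1")   # ['.']
escaped = re.findall(r"\.", "a.1")        # ['.']

# A set can freely mix ranges (a-c, 0-2) with individual characters (. and _).
mixed = re.findall(r"[a-c0-2._]", "d3c1_.a")  # ['c', '1', '_', '.', 'a']

print(any_char, skips_newline, in_brackets, escaped, mixed)
```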

Whenever you write regular expressions, remember the steps: break the problem pattern into sub-problems, write a regular expression for each sub-problem, and then combine them. Always test the regular expression for each sub-problem on its own, and test again once they are combined. This reduces the chance of errors.
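As a sketch of this workflow, assume the goal is to match a simple decimal number such as 3.14. The sub-pattern names are hypothetical, and the combined pattern uses a group with ?, which makes the group optional and is not covered in the list of metacharacters above.

```python
import re

# Hypothetical sub-patterns for a simple decimal number.
INTEGER_PART = r"[0-9]+"
FRACTION_PART = r"\.[0-9]+"

# Step 1: test each piece on its own.
integer_ok = bool(re.fullmatch(INTEGER_PART, "314"))
fraction_ok = bool(re.fullmatch(FRACTION_PART, ".14"))
fraction_rejects = not re.fullmatch(FRACTION_PART, "14")

# Step 2: combine the pieces (the fraction is optional) and test again.
DECIMAL = re.compile(INTEGER_PART + "(" + FRACTION_PART + ")?")
combined = {s: bool(DECIMAL.fullmatch(s)) for s in ["3.14", "42", "1..2", "."]}

# For contrast, the looser set-based pattern accepts malformed input.
loose_accepts = bool(re.fullmatch(r"[0-9.]+", "1..2"))

print(integer_ok, fraction_ok, fraction_rejects)
print(combined)
print(loose_accepts)
```

Testing the pieces separately makes it easier to see which part is at fault when the combined expression behaves unexpectedly.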

Conclusion

Regex can be hard for beginners because it has a lot of metacharacters, each with its own meaning. Memorizing them aside, just learning what each metacharacter does can be a challenge in itself. The key to regular expressions is practice. Keep practicing until you are thoroughly familiar with the metacharacters and with different patterns.
