How to Detect ASCII Characters in Python Strings


Python strings can hold far more than plain letters, and this article looks at one important subset of characters: ASCII. The American Standard Code for Information Interchange (ASCII) is a character encoding standard that defines 128 characters, including letters, digits, punctuation marks, and control characters. These correspond to Unicode code points 0 to 127.

To check whether a string consists only of ASCII characters, we can use the built-in isascii() string method. The ord() function, regular expressions, and the encode() method can do the same job.

Understanding ASCII characters

ASCII is a character encoding standard that includes the following:

  • Digits from 0 to 9.
  • Uppercase and lowercase letters from A to Z and a to z.
  • Special characters, such as punctuation marks, the space, and non-printing control characters.

Figure: ASCII code table (image source: VLSIFacts)
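
As a quick, illustrative check of the groups listed above, the sketch below uses the standard library's string module together with ord() and chr() to show where digits, letters, and punctuation fall within code points 0 to 127. The dictionary name ascii_groups is only an assumption made for this example.

```python
import string

# Every character in these standard-library constants is ASCII.
ascii_groups = {
    "digits": string.digits,              # '0123456789'
    "lowercase": string.ascii_lowercase,  # 'a' to 'z'
    "uppercase": string.ascii_uppercase,  # 'A' to 'Z'
    "punctuation": string.punctuation,    # e.g. '!', ',', '?'
}

for name, chars in ascii_groups.items():
    code_points = [ord(c) for c in chars]
    print(name, "-> code points", min(code_points), "to", max(code_points))

# chr() is the inverse of ord(): code point 65 is 'A' and 97 is 'a'.
print(chr(65), chr(97), ord("0"))
```

Each group stays below 128, which is why all of these characters count as ASCII.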

Methods for Detecting ASCII Characters in Strings in Python

Let’s get right into the different methods for detecting ASCII in strings.

Method 1: Using isascii() Function

string1 = "Hello, World!"  # contains only ASCII characters
string2 = "H€llo, Wørld!"  # contains non-ASCII characters

if string1.isascii():
    print(string1,": string1 is entirely ASCII")
else:
    print(string1,": string1 is not entirely ASCII")

if string2.isascii():
    print(string2,": string2 is entirely ASCII")
else:
    print(string2,": string2 is not entirely ASCII")

Explanation:

We define two strings: string1 contains only ASCII characters, while string2 contains the non-ASCII characters € and ø. The isascii() method, available since Python 3.7, returns True only if every character in the string is ASCII (an empty string also returns True). The first if-else block checks string1 and the second checks string2, and each prints a message describing the result.

Output:

Hello, World! : string1 is entirely ASCII
H€llo, Wørld! : string2 is not entirely ASCII
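
A few edge cases of isascii() are worth seeing in isolation. The following is a minimal sketch, assuming Python 3.7 or later; the wrapper name is_ascii is hypothetical and only there to keep the calls readable.

```python
def is_ascii(text: str) -> bool:
    """Return True if every character in text is ASCII (hypothetical helper)."""
    return text.isascii()

print(is_ascii("Hello, World!"))  # every character is ASCII
print(is_ascii("H€llo, Wørld!"))  # contains '€' and 'ø'
print(is_ascii(""))               # an empty string counts as ASCII
print(b"Hello".isascii())         # bytes objects provide isascii() as well
```

Because the method answers for the whole string, a single non-ASCII character is enough to make it return False.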

Method 2: Using ord() Function

string1 = "Hello, World!"  # contains only ASCII characters
string2 = "H€llo, Wørld!"  # contains non-ASCII characters

if any(ord(c) > 127 for c in string1):
    print(string1,": string1 contains non-ASCII characters")
else:
    print(string1,": string1 does not contain non-ASCII characters")

if any(ord(c) > 127 for c in string2):
    print(string2,": string2 contains non-ASCII characters")
else:
    print(string2,": string2 does not contain non-ASCII characters")

Explanation:

We again use two strings: string1 contains only ASCII characters, while string2 contains some non-ASCII characters. The ord() function is called on each character of a string to get its Unicode code point. The comparison > 127 tests whether that code point lies above the ASCII range, since the highest ASCII code point is 127. any() returns True as soon as one such character is found, and the matching message is printed.

Output:

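Hello, World! : string1 does not contain non-ASCII characters
H€llo, Wørld! : string2 contains non-ASCII characters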
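
Since ord() works character by character, the same idea can also report which characters fall outside the ASCII range and where they sit. This is a small sketch; the function name find_non_ascii is an assumption made for illustration.

```python
def find_non_ascii(text):
    """Return (index, character, code point) for each non-ASCII character."""
    return [(i, c, ord(c)) for i, c in enumerate(text) if ord(c) > 127]

for index, char, code_point in find_non_ascii("H€llo, Wørld!"):
    print(f"index {index}: {char!r} has code point {code_point}")
```

For a string that is entirely ASCII, the function returns an empty list.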

Method 3: Using Regular Expressions

import re

string1 = "Hello, World!"  # contains only ASCII characters
string2 = "H€llo, Wørld!"  # contains non-ASCII characters

if re.match(r'^[\x00-\x7F]+$', string1):
    print(string1,": string1 is entirely ASCII")
else:
    print(string1,": string1 is not entirely ASCII")

if re.match(r'^[\x00-\x7F]+$', string2):
    print(string2,": string2 is entirely ASCII")
else:
    print(string2,": string2 is not entirely ASCII")

Explanation:

In this example, we import the re module to use regular expressions. The pattern r'^[\x00-\x7F]+$' matches a string made up of one or more ASCII characters, i.e. code points from 0 to 127 (hexadecimal 00 to 7F). re.match() is called on each string, and a match means that only ASCII characters are present. Note that because of the + quantifier, an empty string does not match this pattern.

Output:

Hello, World! : string1 is entirely ASCII
H€llo, Wørld! : string2 is not entirely ASCII
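
One possible variation, shown here as a sketch rather than the only way to do it, is to use re.fullmatch() with the * quantifier so that an empty string is treated as ASCII, just as isascii() treats it. A second pattern with a negated character class lists the offending characters. The names ASCII_ONLY, NON_ASCII, and is_ascii_regex are assumptions made for this example.

```python
import re

# Zero or more characters, all within code points 0 to 127.
ASCII_ONLY = re.compile(r'[\x00-\x7F]*')
# Any single character outside the ASCII range.
NON_ASCII = re.compile(r'[^\x00-\x7F]')

def is_ascii_regex(text):
    # fullmatch() requires the whole string to match, so ^ and $ are not needed.
    return ASCII_ONLY.fullmatch(text) is not None

print(is_ascii_regex("Hello, World!"))
print(is_ascii_regex("H€llo, Wørld!"))
print(NON_ASCII.findall("H€llo, Wørld!"))  # the characters that break the match
```

Compiling the patterns once is a small convenience when many strings need to be checked.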

Method 4: Using encode() Function

string1 = "Hello, World!"  # contains only ASCII characters
string2 = "H€llo, Wørld!"  # contains non-ASCII characters

try:
    string1.encode('ascii')
    print(string1,": string1 is entirely ASCII")
except UnicodeEncodeError:
    print(string1,": string1 is not entirely ASCII")

try:
    string2.encode('ascii')
    print(string2,": string2 is entirely ASCII")
except UnicodeEncodeError:
    print(string2,": string2 is not entirely ASCII")

Explanation:

This method relies on the encoding step itself. encode('ascii') tries to convert the string to ASCII bytes. If every character is ASCII, the call succeeds and the rest of the try block runs. If the string contains a non-ASCII character, encode() raises UnicodeEncodeError and the except block runs instead.

Output:

Hello, World! : string1 is entirely ASCII
H€llo, Wørld! : string2 is not entirely ASCII
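
The exception raised by encode() also records where encoding failed, which can be handy for error messages. The sketch below reads the start attribute of UnicodeEncodeError; the function name first_non_ascii is hypothetical.

```python
def first_non_ascii(text):
    """Return (index, character) of the first non-ASCII character, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as err:
        # err.start is the index at which encoding first failed.
        return err.start, text[err.start]
    return None

print(first_non_ascii("Hello, World!"))
print(first_non_ascii("H€llo, Wørld!"))
```

Only the first failure is reported, because encode() stops at the first character it cannot convert.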

Wrapping Up

In this article, we learned how to check whether a string contains only ASCII characters using isascii(), ord(), regular expressions, and encode(). Checking for ASCII characters is useful in data validation, text processing, encoding and decoding, and network communication, which makes it a handy skill for handling text data in Python.
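
To see the four approaches side by side, the following sketch runs each of them on the same input and collects the results. The function name check_all is an assumption for illustration, and the regular expression uses fullmatch() with * so that an empty string is handled the same way by every check.

```python
import re

def check_all(text):
    """Run the four ASCII checks from this article on one string."""
    try:
        text.encode("ascii")
        encodes = True
    except UnicodeEncodeError:
        encodes = False
    return {
        "isascii": text.isascii(),
        "ord": all(ord(c) <= 127 for c in text),
        "regex": re.fullmatch(r'[\x00-\x7F]*', text) is not None,
        "encode": encodes,
    }

for sample in ["Hello, World!", "H€llo, Wørld!", ""]:
    print(repr(sample), check_all(sample))
```

For everyday use, isascii() is likely the simplest choice on Python 3.7 and later, while the other approaches remain useful when you need the position of a character or a custom pattern.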
