Python Speech Recognition Module - A Complete Introduction

Hey there! Today let’s learn how to convert speech to text using the SpeechRecognition library in Python. So let’s begin!

Introduction to Speech Recognition

Speech recognition is the automatic conversion of spoken language into text. It is one of the core tasks behind voice assistants such as Alexa and Siri.

Several third-party Python libraries support speech recognition. We will be using the SpeechRecognition library (imported as speech_recognition) because it is one of the simplest to learn.

Importing Speech Recognition Module

The first step, as always, is to import the required libraries. In this case, we only need to import the speech_recognition library.

import speech_recognition as SR

If the statement raises a ModuleNotFoundError, install the library first by running pip install SpeechRecognition.
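If you would rather check for the library before importing it, something like the following sketch works with the standard library alone. The helper name is_installed is just an illustrative choice, not part of any library.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can find a top-level module."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("speech_recognition"):
    print("speech_recognition is available")
else:
    # The package name on PyPI differs from the name used in the import.
    print("Missing; install it with: pip install SpeechRecognition")
```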

Implementing Speech Recognition in Python

To convert the speech in our audio to text, we need the Recognizer class from the speech_recognition module. We create a Recognizer object, which provides all the methods needed for further processing.

1. Loading Audio

Before we continue, we’ll need to download an audio file. The one I used to get started is a speech from Emma Watson which can be found here.

We downloaded the audio file and converted it to WAV format, which the library can read directly (AudioFile accepts WAV, AIFF, and FLAC files). Make sure you save it in the same folder as your Python file, because the code refers to it by a relative path.
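Before loading a converted file, it can help to confirm that it really is a readable WAV file and to see how long it is. Here is a minimal sketch using Python's built-in wave module. The helpers describe_wav and write_silence are hypothetical names, and the silent one-second file only stands in for a real recording so the example runs on its own.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def write_silence(path, seconds=1, rate=16000):
    """Write a mono, 16-bit silent WAV file (a stand-in for a real recording)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)


demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
write_silence(demo_path)
props = describe_wav(demo_path)
print(props)
```

For your own file, call describe_wav('speech.wav') instead. If the file is not a valid WAV file, wave.open raises wave.Error, which is a quick sign that the conversion did not work.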

To load the audio we will use the AudioFile class. Used in a with statement, it opens the file and makes its contents available through an AudioFile instance, which we call source.

Inside the with block we do the following things:

Calibrate the recognizer for background noise using the adjust_for_ambient_noise method. It does not strip noise out of the audio; it reads the start of the source (one second by default) to set the recognizer's energy threshold, and that portion of the file is consumed, so it is not part of the recording that follows.
Use the record method, which reads audio from the file and stores it in an AudioData object to be passed to the recognizer later on.

The complete code to load the audio is shown below.

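import speech_recognition as SR

# The Recognizer object provides the methods used below
SR_obj = SR.Recognizer()

# speech.wav must be in the same folder as this script
info = SR.AudioFile('speech.wav')
with info as source:
    # Calibrate on the first second of the file (that second is consumed)
    SR_obj.adjust_for_ambient_noise(source)
    # Read up to the next 100 seconds into an AudioData object
    audio_data = SR_obj.record(source, duration=100)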

Here we also pass the duration parameter, because recognizing speech in longer audio takes considerably more time. So we will only take the first 100 seconds of the audio.
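The record method also accepts an offset parameter, so a longer file can be read in pieces rather than cut off after the first 100 seconds. The sketch below is one possible way to do that, not the only one. The names plan_chunks and load_chunks are made up for this example, the 30-second chunk size is an arbitrary assumption, and chunk boundaries are only approximate because the library reads audio in fixed-size buffers.

```python
import wave


def plan_chunks(total_seconds, chunk_seconds):
    """Split [0, total_seconds) into (offset, duration) pairs."""
    chunks = []
    offset = 0
    while offset < total_seconds:
        chunks.append((offset, min(chunk_seconds, total_seconds - offset)))
        offset += chunk_seconds
    return chunks


def load_chunks(path, chunk_seconds=30):
    """Yield one AudioData object per chunk of the WAV file at `path`."""
    # Imported here so that plan_chunks can be used without the library.
    import speech_recognition as SR

    # Total length of the file, measured with the standard library.
    with wave.open(path, "rb") as wav:
        total_seconds = wav.getnframes() / wav.getframerate()

    recognizer = SR.Recognizer()
    for offset, duration in plan_chunks(total_seconds, chunk_seconds):
        # Reopen the file for each chunk so that offset counts from the start.
        with SR.AudioFile(path) as source:
            yield recognizer.record(source, offset=offset, duration=duration)


print(plan_chunks(100, 30))  # [(0, 30), (30, 30), (60, 30), (90, 10)]
```

Each AudioData object produced by load_chunks('speech.wav') can then be passed to a recognizer one at a time.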

2. Reading data from audio

Now that we have loaded the audio, we can call the recognize_google() method to recognize any speech in it.

The method sends the audio to Google's Web Speech API, so it needs an internet connection and can take several seconds depending on your connection speed. When it finishes, it returns the most likely transcription of the recorded 100 seconds as a string.

The code for the same is shown below.

import speech_recognition as SR

SR_obj = SR.Recognizer()

info = SR.AudioFile('speech.wav')
with info as source:
    SR_obj.adjust_for_ambient_noise(source)
    audio_data = SR_obj.record(source, duration=100)

# Send the audio to Google's Web Speech API and print the transcription
text = SR_obj.recognize_google(audio_data)
print(text)
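Because recognize_google() depends on a network service, the call can fail. The library raises UnknownValueError when the speech cannot be understood and RequestError when the API cannot be reached or rejects the request. Here is a hedged sketch of a wrapper that handles both. The function name transcribe and its choice to return None on failure are assumptions made for illustration.

```python
def transcribe(audio_path, language="en-US"):
    """Return the transcription of an audio file, or None if recognition fails."""
    # Third-party dependency: pip install SpeechRecognition
    import speech_recognition as SR

    recognizer = SR.Recognizer()
    with SR.AudioFile(audio_path) as source:
        # With no duration given, record reads the whole file.
        audio_data = recognizer.record(source)

    try:
        return recognizer.recognize_google(audio_data, language=language)
    except SR.UnknownValueError:
        print("The speech could not be understood")
    except SR.RequestError as error:
        print(f"The request to the API failed: {error}")
    return None
```

Calling print(transcribe('speech.wav')) would then print either the recognized text or None, instead of stopping with a traceback.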

Running the code on the first 100 seconds of the speech produces the text below. The result is reasonably good, although it has no punctuation and several words are misrecognized. The accuracy can be improved with further techniques, but for now this covers the basic functionality.

"I was appointed 6 months and I have realised for women's rights to often become synonymous with man heating if there is one thing I know it is that this has to stop someone is by definition is the belief that men and women should have equal rights and opportunities is the salary of the economic and social policy of the success of a long time ago when I was 8 I was confused sinkhole but I wanted to write the play Aise the width on preparing for the 14 isostasy sacralized elements of the media 15 my girlfriend Statue of Liberty sports team because they don't want to pay monthly 18 18 Mai Mela friends were unable to express their feelings I decided that I am business analyst at the seams and complicated to me some recent research has shown me feminism has become"

Conclusion

Congratulations! In this tutorial you learned how to recognize speech from an audio file and display the resulting text on your screen.

I would also like to mention that speech recognition is a very deep and vast concept, and what we have learned here barely scratches the surface of the whole subject.

Thank you for reading!