Hey there! Today let’s learn about converting speech to text using the speech_recognition library in the Python programming language. So let’s begin!
Introduction to Speech Recognition
Speech recognition is the automatic recognition of human speech, and it is one of the most important tasks behind applications like Alexa or Siri.
Python has several libraries that support speech recognition. We will be using the speech_recognition library because it is one of the simplest and easiest to learn.
Importing Speech Recognition Module
The first step, as always, is to import the required libraries. In this case, we only need to import the speech_recognition module.
import speech_recognition as SR
If the statement gives an error, you might need to install the library using the pip command.
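If you are not sure whether the library is installed, a quick check like the one below can tell you before the import fails. This is only a sketch: the helper name is_installed is made up for illustration, and the install hint assumes you use pip, where the library is published under the name SpeechRecognition.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the module, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("speech_recognition"):
    print("speech_recognition is available")
else:
    # Assumption: pip is the installer in use on this machine.
    print("Not found. Try: pip install SpeechRecognition")
```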
Implementing Speech Recognition in Python
To convert speech from our audio to text, we need the Recognizer class from the speech_recognition module to create an object which contains all the necessary functions for further processing.
1. Loading Audio
Before we continue, we’ll need to download an audio file. The one I used to get started is a speech from Emma Watson which can be found here.
We downloaded the audio file and converted it into WAV format because it works best for recognizing speech. Make sure you save it to the same folder as your Python file.
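Before loading the file, it can help to confirm that the conversion really produced a WAV file Python can read. The sketch below uses only the standard library's wave module. The helper names write_test_tone and describe_wav are made up for this example, and a generated test tone stands in for the real speech.wav so the snippet runs on its own.

```python
import math
import struct
import tempfile
import wave
from pathlib import Path


def write_test_tone(path, seconds=1.0, rate=16000, freq=440.0):
    """Write a mono, 16-bit PCM sine tone (a stand-in for a real recording)."""
    n_frames = int(seconds * rate)
    samples = (
        int(32767 * 0.3 * math.sin(2 * math.pi * freq * i / rate))
        for i in range(n_frames)
    )
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(b"".join(struct.pack("<h", s) for s in samples))


def describe_wav(path):
    """Return basic properties of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }


# Demo on a temporary file so nothing is left behind on disk.
with tempfile.TemporaryDirectory() as folder:
    tone_path = Path(folder) / "tone.wav"
    write_test_tone(tone_path)
    print(describe_wav(tone_path))
```

Pointing describe_wav at your own speech.wav tells you how long the recording is. If the file is not an uncompressed PCM WAV, wave.open will typically raise wave.Error, which is a hint that the conversion needs another look.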
To load the audio, we will be using the AudioFile class. It opens the file, reads its contents and stores all the information in an AudioFile instance, called info in the code below.
We will traverse through the source and do the following things:
- Account for the noise that every audio recording contains, using the adjust_for_ambient_noise method.
- Read the audio file with the record method, which stores the audio data in a variable to be read later on.
The complete code to load the audio is shown below.
import speech_recognition as SR

SR_obj = SR.Recognizer()
info = SR.AudioFile('speech.wav')
with info as source:
    # Calibrate for background noise, then read up to 100 seconds of audio
    SR_obj.adjust_for_ambient_noise(source)
    audio_data = SR_obj.record(source, duration=100)
Here we have also passed a parameter called duration, because recognizing speech in a longer audio takes a lot more time. So we will only be taking the first 100 seconds of the audio.
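If you later want more than the first 100 seconds, one option is to read the file in several pieces instead of one long one. The sketch below is an illustration, not part of the original code: plan_chunks and record_chunks are hypothetical helper names, the total length is read with the standard wave module, and it relies on successive record calls inside the same with block continuing from where the previous one stopped.

```python
import wave


def plan_chunks(total_seconds, chunk_seconds):
    """Split a total length into (offset, length) pairs, in seconds."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    chunks = []
    offset = 0
    while offset < total_seconds:
        chunks.append((offset, min(chunk_seconds, total_seconds - offset)))
        offset += chunk_seconds
    return chunks


def record_chunks(path, chunk_seconds=100):
    """Yield one AudioData object per chunk of the WAV file at `path`."""
    # Imported here so plan_chunks can be tried without the library installed.
    import speech_recognition as sr

    with wave.open(str(path), "rb") as wav:
        total_seconds = wav.getnframes() / wav.getframerate()

    recognizer = sr.Recognizer()
    with sr.AudioFile(str(path)) as source:
        for _offset, length in plan_chunks(total_seconds, chunk_seconds):
            # Each call picks up where the previous one ended.
            yield recognizer.record(source, duration=length)


# A 250-second file read in 100-second pieces:
print(plan_chunks(250, 100))  # [(0, 100), (100, 100), (200, 50)]
```

The chunk boundaries are approximate, since the library reads audio in small buffers, so a word that straddles a boundary may be cut in two.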
2. Reading data from audio
Now that we have successfully loaded the audio, we can invoke the recognize_google() method to recognize any speech in the audio.
The method sends the audio to Google's speech recognition service, so it can take several seconds depending on your internet connection speed. After processing, the method returns the best possible transcription that the program was able to recognize from the first 100 seconds.
The code for the same is shown below.
import speech_recognition as SR

SR_obj = SR.Recognizer()
info = SR.AudioFile('speech.wav')
with info as source:
    SR_obj.adjust_for_ambient_noise(source)
    audio_data = SR_obj.record(source, duration=100)

# Send the audio to Google's service and print the recognized text
print(SR_obj.recognize_google(audio_data))
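Because recognize_google() depends on a network request, the call can fail when the connection drops or when no speech can be made out. A minimal sketch of handling both cases is shown here. The function name transcribe and its parameters are made up for this example, while UnknownValueError, RequestError and the language argument come from the speech_recognition library.

```python
def transcribe(audio_path, language="en-US", max_seconds=100):
    """Return the recognized text, or None if recognition failed."""
    # Imported inside the function; assumes the library is installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:
        recognizer.adjust_for_ambient_noise(source)
        audio_data = recognizer.record(source, duration=max_seconds)

    try:
        return recognizer.recognize_google(audio_data, language=language)
    except sr.UnknownValueError:
        # The service answered but could not make out any speech.
        print("The speech could not be understood.")
    except sr.RequestError as error:
        # The service could not be reached or rejected the request.
        print(f"The request to the speech service failed: {error}")
    return None


# Example call (needs speech.wav and an internet connection):
# text = transcribe("speech.wav")
```

With the same speech.wav, transcribe should hand back the same kind of transcript as the code above, or None with a short message when something goes wrong.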
The output is a bunch of sentences from the audio, and it turns out to be pretty good. The accuracy can be increased by the use of more functions, but for now it covers the basic functionality.
"I was appointed 6 months and I have realised for women's rights to often become synonymous with man heating if there is one thing I know it is that this has to stop someone is by definition is the belief that men and women should have equal rights and opportunities is the salary of the economic and social policy of the success of a long time ago when I was 8 I was confused sinkhole but I wanted to write the play Aise the width on preparing for the 14 isostasy sacralized elements of the media 15 my girlfriend Statue of Liberty sports team because they don't want to pay monthly 18 18 Mai Mela friends were unable to express their feelings I decided that I am business analyst at the seams and complicated to me some recent research has shown me feminism has become"
Congratulations! Today in this tutorial you learned about recognizing speech from audio and displaying the same on your screen.
I would also like to mention that speech recognition is a very deep and vast concept, and what we have learned here barely scratches the surface of the whole subject.
Thank you for reading!