How to Build a Speech Recognition tool with Python and Flask - Tinker Tuesdays #3
Learn how to build a Speech-to-Text Transcription service on audio file uploads with Python and Flask using the SpeechRecognition module! A beginner-friendly project that gives you experience with GET and POST requests and with rendering the transcribed results of a speech file.
Back with another Tinker Tuesday! Today we are building a Speech-to-Text Transcription service on audio file uploads with Python and Flask using the SpeechRecognition module! This is one of my favorite projects, as I love working with audio, and it's a unique one to add to your portfolio!
Here's a roadmap for today's project:
- We'll learn how to use the SpeechRecognition module in Python
- Then, we'll use Flask to take in an audio file and create both a GET and POST request on the same route
- Finally, we'll render the transcribed results of the speech file to the user.
Before we begin, I want to mention that the guide below is an abridged version of the free video tutorial. You can find more free courses and projects on my website, TheCodex, to learn how to design and build applications. You can find all the code for this project at my GitHub Repo here.

Step 1: Getting the Audio File Input in Flask
The first step in this project is to build a simple Flask web application that takes in an input audio file from the user. Let's go ahead and initialize an empty project (PyCharm is my preference) and then create our Flask file, app.py.
For now, our app.py should just contain the simple Flask structure with one route, our home page that will facilitate both the audio upload and the rendering of the transcription.
from flask import Flask, render_template, request, redirect
import speech_recognition as sr

app = Flask(__name__)

@app.route("/", methods=["GET", "POST"])
def index():
    return "Hello World"

if __name__ == "__main__":
    app.run(debug=True, threaded=True)
You'll notice we've already imported the speech_recognition module for future use. If you need to install this module in your environment, run:
python3 -m pip install SpeechRecognition
Awesome! You'll notice that we added two methods to our route, a GET and a POST method. This is because our page needs a GET method to load the content of the site, and then the POST method will facilitate the retrieval of the audio file from the user and transcription of the audio.
The next step is to create an html template file for rendering the view for audio file input. In your project, create a new folder called templates and inside of that, create a new file called index.html with the following content:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Speech Recognition in Python</title>
    <link href="https://fonts.googleapis.com/css2?family=Lato:wght@400;700&display=swap" rel="stylesheet">
    <link rel="stylesheet" href="{{ url_for('static', filename='styles/index.css') }}" />
</head>
<body>
    <div id="speechContainer">
        <h1>Upload new File</h1>
        <form method="post" enctype="multipart/form-data">
            <input type="file" name="file"/>
            <br>
            <input type="submit" id="submitButton" value="Transcribe"/>
        </form>
    </div>
</body>
</html>
And then create a new folder called static, with a sub-folder called styles. Inside of styles, create a new file called index.css and leave it blank for now.
The HTML code here enables us to get the audio file as input from the user. We created a form with a POST method that will trigger the endpoint in the route we set up above. The elements also carry a few IDs that we'll use in the last step of this blog to style the page with CSS.
To show that our page can now handle both the GET and POST request as well as render our newly created template, let's modify our app.py file slightly:
@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        print("POST request received!")
    return render_template('index.html')
If you go ahead and run the project now with the command python3 app.py, you should see a simple file upload page, and the message above will be printed to the console once the submit button is clicked.

Step 2: Analyzing and Transcribing the Audio File
Now that we have a simple UI that can accept our audio file, let's go ahead and retrieve the file from our POST request and start analyzing it.
First, let's make sure that a file was actually sent in the POST request. If the file object exists, we also want to make sure that it has an associated filename. Right now, if the Transcribe button is pressed with no file uploaded, an empty file is sent, and we want to make sure we catch this edge case.
Once we've gotten the file object, we simply have to pass it into the SpeechRecognition module. There's a nifty class called sr.AudioFile that takes an audio file as input and gives us an AudioFile source. Once we have that, we simply record the file into an AudioData object that the SpeechRecognition module can recognize and then pass it to one of the several different speech recognition platforms supported by the module.
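To see that flow on its own before wiring it into Flask, here's a minimal standalone sketch, assuming you have a WAV file saved locally as sample.wav (the filename is just a placeholder):
import speech_recognition as sr

recognizer = sr.Recognizer()

# Wrap the WAV file in an AudioFile source, record it into AudioData,
# then hand the AudioData off to a recognizer backend.
with sr.AudioFile("sample.wav") as source:
    audio_data = recognizer.record(source)

print(recognizer.recognize_google(audio_data))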
To put all the above steps together, let's update our app.py as such:
@app.route("/", methods=["GET", "POST"])
def index():
    transcript = ""
    if request.method == "POST":
        print("FORM DATA RECEIVED")

        if "file" not in request.files:
            return redirect(request.url)

        file = request.files["file"]
        if file.filename == "":
            return redirect(request.url)

        if file:
            recognizer = sr.Recognizer()
            audioFile = sr.AudioFile(file)
            with audioFile as source:
                data = recognizer.record(source)
            transcript = recognizer.recognize_google(data, key=None)
            print(transcript)

    return render_template('index.html', transcript=transcript)
Breaking this down line by line, we can see that we first check to make sure "file" exists in the request's files. Once that's done, we ensure that the file actually has a filename.
If a file has successfully been delivered, we run the above SpeechRecognition code to convert it into an analyzable format and then run Google's Speech Recognition API on the file. Read more about the SpeechRecognition module to see all the amazing speech recognition tasks you can perform with it.
Note: The basic recognizer.recognize_google call allows for roughly one minute of audio transcription with the default key. If you want to analyze larger files, you'll need to specify your own API key and possibly upgrade to a paid tier of Google's API. Check out SpeechRecognition on PyPI for more info: https://pypi.org/project/SpeechRecognition/
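If you do have your own key, recognize_google accepts it through the key parameter; a quick sketch (the key value below is just a placeholder):
# Pass your own API key instead of relying on the module's default key.
transcript = recognizer.recognize_google(data, key="YOUR_GOOGLE_SPEECH_API_KEY", language="en-US")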
Try playing around with the recognizer object of the module. SpeechRecognition is an amazing library that hosts a wide variety of different recognizers that you can implement.
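One thing worth experimenting with is error handling: recognize_google raises sr.UnknownValueError when it can't make out the speech and sr.RequestError when the API can't be reached, so you might wrap the call like this (a sketch, not part of the app above):
# A defensive wrapper around the transcription call.
try:
    transcript = recognizer.recognize_google(data)
except sr.UnknownValueError:
    transcript = "Sorry, the speech could not be understood."
except sr.RequestError as e:
    transcript = "Could not reach the speech recognition service: {}".format(e)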

Good work, but hold up. We wrote all this code, but we don't have a file to test it with. The Google speech recognizer expects a WAV file to be uploaded for analysis, and there's a high chance you don't have one lying around. Let's head over to the Open Speech Repository and download a sample WAV file of someone speaking. Any of the WAV files listed there should work. Download one and upload it to your web service running on localhost.
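If you'd rather test the endpoint from the command line first, here's a quick sketch using the requests library, assuming the Flask app is running locally on its default port and your sample is saved as sample.wav (both are assumptions):
import requests

# Post the downloaded WAV file to the running Flask app.
with open("sample.wav", "rb") as f:
    response = requests.post("http://127.0.0.1:5000/", files={"file": f})

print(response.status_code)  # 200 means the upload was accepted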
If everything worked, you should now see the printed out transcript at the bottom of your Python console.
Step 3: Displaying the Transcription + Final Touches
Almost done! The last step we have is to take the transcription we're printing out and pass it to our template, rendering the results to our user.
We did something very similar in a previous project - Building a Weather Dashboard with Python and Flask. Check it out if you enjoyed this project and want to build more Python applications!
Let's go ahead and pass in the transcription as a variable in our render_template method. Your final app.py index function should look like this:
@app.route("/", methods=["GET", "POST"])
def index():
    transcript = ""
    if request.method == "POST":
        print("FORM DATA RECEIVED")

        if "file" not in request.files:
            return redirect(request.url)

        file = request.files["file"]
        if file.filename == "":
            return redirect(request.url)

        if file:
            recognizer = sr.Recognizer()
            audioFile = sr.AudioFile(file)
            with audioFile as source:
                data = recognizer.record(source)
            transcript = recognizer.recognize_google(data, key=None)

    return render_template('index.html', transcript=transcript)
Now, all we have to do is use some Jinja2 in our template and render the transcript to the user. Heading back over to our index.html file, let's update the div holding our form with the following code:
<div id="speechContainer">
    <h1>Upload new File</h1>
    <form method="post" enctype="multipart/form-data">
        <input type="file" name="file"/>
        <br>
        <input type="submit" id="submitButton" value="Transcribe"/>
    </form>
    {% if transcript != "" %}
    <div id="speechTranscriptContainer">
        <h1>Transcript</h1>
        <p id="speechText">{{ transcript }}</p>
    </div>
    {% endif %}
</div>
We're writing a simple Jinja2 if statement that only renders the transcript div when a transcription has been passed in from our route. If the transcript exists, we render it and display the final text to the user.
The final task is to beautify our page. Remember that index.css file you made in Step 1? Let's head back over to it and add the following code:
h1, p, input {
    font-family: 'Lato', sans-serif;
}

#speechContainer {
    margin: 20px;
}

#submitButton {
    background-color: #0191FE;
    color: white;
    border-radius: 5px;
    border: none;
    padding: 10px 30px;
    margin-top: 20px;
}

#submitButton:hover {
    cursor: pointer;
}

#speechTranscriptContainer {
    margin-top: 20px;
}

#speechText {
    font-size: 18px;
    width: 500px;
}
Feel free to modify the CSS however you like - this just helps me beautify our Speech Recognition application. And voila! You're done. If you've followed along, you should see the following result when uploading your WAV file:

That's it folks! You just built an end-to-end Flask app that can take in any WAV file and transcribe the speech spoken in it. You can find all the code for this project at our GitHub Repo here. As always, if you face any trouble building this project, join our Discord and the TheCodex community can help!
For those of you interested in more project walkthroughs: Every Tuesday, I release a new Python/Data Science Project tutorial. I was honestly just tired of watching webcasted lectures and YouTube videos of instructors droning on with robotic voices teaching pure theory, so I started recording my own fun and practical projects. Next Tuesday, I'll be releasing a tutorial on how to build a COVID-19 Case Tracker to map the global spread of the virus!

Hey! I'm Avi - your new Python and data science teacher. I've taught over 600,000 students around the world not just how to code, but how to build real projects. I'm on a mission to help you jumpstart your career by helping you master python and data science. Start your journey on TheCodex here: https://thecodex.me/