[Workshop] Quickstart to Speech Recognition: Google Speech-to-Text with Colab for Beginners

Mar 23, 2024

Introduction

Google’s Speech-to-Text API converts spoken audio into written text. Typical uses include transcribing meetings, building voice-driven chatbots, captioning videos, running sentiment analysis on speech, and making learning resources more accessible. This guide shows how to call the API from Google Colab so you can start transcribing audio quickly.

Prerequisites

Ensure you have the following ready:

A Google account
Access to Google Colab
A Google Cloud Platform (GCP) account with available credit

Colab with Full Code

https://colab.research.google.com/drive/1_2VmDQZ1BMv4RkS5xrSYseNJMLY3hFyG?usp=sharing

Step 1: Create a Project in GCP

Visit the GCP Console Project Selector page: https://console.cloud.google.com/projectselector2/
Sign in with your Google account if prompted.
Click on the “New Project” button.
Enter a project name and click “Create” to initialize your new project.

Step 2: Enable the Cloud Speech-to-Text API

In your GCP project dashboard, navigate to the “APIs & Services” dashboard.
Click “Enable APIs and Services”.
Search for “Speech-to-Text API” and select it.
Click “Enable” to activate the API for your project.

Preparing Google Colab

Step 1: Authentication

Open Google Colab: https://colab.research.google.com/
Install the Speech-to-Text client library for Python by running the following command in a cell:

!pip install google-cloud-speech

Authenticate your session with the following code:

from google.colab import auth
# Replace 'your-project-id' with your actual GCP project ID.
# The first image shows an example of a project ID; it is also listed in the GCP Console project selector.
auth.authenticate_user(project_id='your-project-id')

Speech Recognition with Python

Import the necessary library:

from google.cloud import speech

Set up your GCP credentials (this is usually handled automatically in Colab after authentication).

Prepare your audio file (audiofile.wav) by uploading it to Colab.
Use the following script to transcribe the audio file:

# Initialize the speech client to interact with the Google Cloud Speech-to-Text API.
client = speech.SpeechClient()

# Specify the name of the audio file to be transcribed.
file_name = "audiofile.wav"

# Open the audio file in read-binary mode and read its content.
with open(file_name, "rb") as audio_file:
    audio_content = audio_file.read()

# Prepare the audio content for recognition by wrapping it in a RecognitionAudio object.
audio = speech.RecognitionAudio(content=audio_content)

# Configure the recognition request settings such as audio encoding, language, model type,
# number of audio channels, and whether to enable word time offsets in the response.
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,  # Specify the audio encoding type.
    language_code="es-US",  # Language of the audio as a BCP-47 tag; "es-US" is Spanish (United States).
    model="default",  # Use the default model for recognition.
    audio_channel_count=1,  # Indicate that the audio is mono (one channel).
    enable_word_time_offsets=True,  # Request time offsets for each word in the transcript.
)

# Initiate an asynchronous request for speech recognition and wait for it to complete.
operation = client.long_running_recognize(config=config, audio=audio)
print("Waiting for operation to complete...")
response = operation.result(timeout=90)  # Wait up to 90 seconds for the operation to complete.

# Iterate through the results and print the transcript of each segment of the audio.
for result in response.results:
    print("Transcript: {}".format(result.alternatives[0].transcript))

This code configures the recognition settings (such as language and audio format), sends the audio file for processing, and prints out the transcribed text.
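The script sets enable_word_time_offsets=True but only prints the transcript. The sketch below shows one way the per-word timings could be read from the same response. It assumes a 2.x release of google-cloud-speech, where each entry in alternatives[0].words exposes word, start_time, and end_time, and the two time fields are datetime.timedelta values. The helper name format_word_timings and the stand-in data are hypothetical, used only so the example runs without calling the API.

```python
from datetime import timedelta
from types import SimpleNamespace


def format_word_timings(results):
    """Return one 'word: start - end' line per recognized word.

    `results` is expected to look like `response.results` from the script above.
    """
    lines = []
    for result in results:
        if not result.alternatives:
            continue  # Skip segments for which nothing was recognized.
        best = result.alternatives[0]
        for word_info in best.words:
            # Assumption: start_time and end_time are datetime.timedelta objects.
            start = word_info.start_time.total_seconds()
            end = word_info.end_time.total_seconds()
            lines.append(f"{word_info.word}: {start:.2f}s - {end:.2f}s")
    return lines


# Hypothetical stand-in for response.results, shaped like the real objects.
fake_results = [
    SimpleNamespace(
        alternatives=[
            SimpleNamespace(
                transcript="hola mundo",
                words=[
                    SimpleNamespace(
                        word="hola",
                        start_time=timedelta(seconds=0.0),
                        end_time=timedelta(seconds=0.4),
                    ),
                    SimpleNamespace(
                        word="mundo",
                        start_time=timedelta(seconds=0.5),
                        end_time=timedelta(seconds=1.1),
                    ),
                ],
            )
        ]
    )
]

for line in format_word_timings(fake_results):
    print(line)
```

In the Colab notebook, the same helper could be called as format_word_timings(response.results) once the operation has completed.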

By following these steps, you should be able to successfully convert speech to text using Google’s Cloud Speech-to-Text API and Google Colab.
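If the transcript comes back empty, a mismatch between the audio file and the recognition settings is one possible cause. The configuration above uses LINEAR16 (16-bit PCM, so 2 bytes per sample) and audio_channel_count=1 (mono). The sketch below uses Python's standard wave module to read those properties from a WAV header before sending the file. The helper name describe_wav and the file example_silence.wav are hypothetical; the silent file is generated only so the example runs without a real recording.

```python
import wave


def describe_wav(path):
    """Read channel count, sample width, sample rate, and duration from a WAV header."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate_hertz": rate,
            "duration_seconds": frames / rate,
        }


# Write one second of 16 kHz, 16-bit, mono silence as a stand-in for a real recording.
with wave.open("example_silence.wav", "wb") as wav_file:
    wav_file.setnchannels(1)      # mono
    wav_file.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
    wav_file.setframerate(16000)  # 16 kHz
    wav_file.writeframes(b"\x00\x00" * 16000)

info = describe_wav("example_silence.wav")
print(info)

# A file matching the configuration in this guide should be mono with 2-byte samples.
matches_config = info["channels"] == 1 and info["sample_width_bytes"] == 2
print("Matches LINEAR16 mono config:", matches_config)
```

For your own recording, replace "example_silence.wav" with "audiofile.wav". If the file turns out to be stereo, either convert it to mono or adjust audio_channel_count to match.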

Written by Miluska Romero
