[Workshop] Quickstart to Speech Recognition: Google Speech-to-Text with Colab for Beginners
Introduction
Google’s Speech-to-Text API converts spoken audio into written text. Typical uses include transcribing meetings, building voice-driven chatbots, captioning videos, running sentiment analysis on spoken content, and making learning resources more accessible. This guide walks you through setting up the API in Google Cloud and transcribing your first audio file from a Colab notebook.
Prerequisites
Ensure you have the following ready:
- A Google account
- Access to Google Colab
- A Google Cloud Platform (GCP) account with available credit
Colab with Full Code
https://colab.research.google.com/drive/1_2VmDQZ1BMv4RkS5xrSYseNJMLY3hFyG?usp=sharing
Step 1: Create a Project in GCP
- Visit the GCP Console Project Selector page: https://console.cloud.google.com/projectselector2/
- Sign in with your Google account if prompted.
- Click on the “New Project” button.
- Enter a project name and click “Create” to initialize your new project.

Step 2: Enable the Cloud Speech-to-Text API
- In your GCP project dashboard, navigate to the “APIs & Services” dashboard.
- Click “Enable APIs and Services”.
- Search for “Speech-to-Text API” and select it.
- Click “Enable” to activate the API for your project.

Preparing Google Colab
Step 1: Authentication
- Open Google Colab: https://colab.research.google.com/
- Install the Cloud Speech-to-Text Python client library by running the following command in a cell:
!pip install google-cloud-speech
- Authenticate your session with the following code:
from google.colab import auth
# Replace 'your-project-id' with your actual GCP project ID.
# The first image shows an example of a project ID.
auth.authenticate_user(project_id='your-project-id')
Speech Recognition with Python
Import the necessary library:
from google.cloud import speech
- Set up your GCP credentials (this is usually handled automatically in Colab after authentication).
- Prepare your audio file (audiofile.wav) by uploading it to Colab.
- Use the following script to transcribe the audio file:
# Initialize the speech client to interact with the Google Cloud Speech-to-Text API.
client = speech.SpeechClient()
# Specify the name of the audio file to be transcribed.
file_name = "audiofile.wav"
# Open the audio file in read-binary mode and read its content.
with open(file_name, "rb") as audio_file:
    audio_content = audio_file.read()
# Prepare the audio content for recognition by wrapping it in a RecognitionAudio object.
audio = speech.RecognitionAudio(content=audio_content)
# Configure the recognition request settings such as audio encoding, language, model type,
# number of audio channels, and whether to enable word time offsets in the response.
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,  # 16-bit linear PCM, the usual WAV encoding.
    language_code="es-US",  # Language of the audio (US Spanish here; use e.g. "en-US" for English).
    model="default",  # Use the default model for recognition.
    audio_channel_count=1,  # Indicate that the audio is mono (one channel).
    enable_word_time_offsets=True,  # Request time offsets for each word in the transcript.
)
# Initiate an asynchronous request for speech recognition and wait for it to complete.
operation = client.long_running_recognize(config=config, audio=audio)
print("Waiting for operation to complete...")
response = operation.result(timeout=90) # Wait up to 90 seconds for the operation to complete.
# Iterate through the results and print the transcript of each segment of the audio.
for result in response.results:
    print("Transcript: {}".format(result.alternatives[0].transcript))
This code configures the recognition settings (such as language and audio format), sends the audio file for processing, and prints out the transcribed text.
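The configuration above sets enable_word_time_offsets=True, but the loop only prints the transcript. The sketch below is one way to display the per-word timings as well. The helper name format_word_timings is hypothetical, and it assumes each word entry exposes word, start_time, and end_time, with the two times being datetime.timedelta values as in version 2.x of the google-cloud-speech library.

```python
def format_word_timings(words):
    """Return one text line per word with its start and end offset in seconds.

    Assumption: each item has `.word` (str) plus `.start_time` and `.end_time`
    as datetime.timedelta objects (google-cloud-speech 2.x behavior).
    """
    lines = []
    for info in words:
        start = info.start_time.total_seconds()
        end = info.end_time.total_seconds()
        lines.append(f"{start:.2f}s - {end:.2f}s  {info.word}")
    return lines


# Usage with the `response` object from the script above:
# for result in response.results:
#     for line in format_word_timings(result.alternatives[0].words):
#         print(line)
```

Keeping the formatting in a small function makes it easy to reuse the timings later, for example to build caption files.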
By following these steps, you should be able to successfully convert speech to text using Google’s Cloud Speech-to-Text API and Google Colab.
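If the request is rejected, or the transcript comes back empty, a common cause is an audio file that does not match the configuration (LINEAR16 encoding, one channel). The following sketch uses Python's standard wave module to inspect the file before sending it. The function names describe_wav and matches_tutorial_config are hypothetical helpers, not part of the Google library, and the check only works for uncompressed PCM WAV files.

```python
import wave


def describe_wav(path):
    """Read basic properties from the header of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frame_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": frame_rate,
            "duration_s": wav.getnframes() / frame_rate,
        }


def matches_tutorial_config(info):
    """True if the file is mono with 16-bit samples, as the config in this guide expects."""
    return info["channels"] == 1 and info["sample_width_bytes"] == 2


# Usage in Colab after uploading the file:
# info = describe_wav("audiofile.wav")
# print(info, matches_tutorial_config(info))
```

If the check fails, either convert the file to mono 16-bit PCM or adjust audio_channel_count and encoding in the configuration to match the file.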