What you'll build
A voice assistant that listens to what you say, understands your question, and responds by speaking. Like Alexa or Siri, but yours.
You speak into the microphone: "What's the weather today?" The assistant converts your voice to text, sends it to Gemini to get an intelligent response, and reads the answer out loud, all in a continuous conversation loop.
When finished, you'll have a voice assistant in Python that uses speech_recognition to listen, Gemini to think, and gTTS to speak. You can customize it for use cases such as home automation, accessibility, or productivity.
The prompt to start
Create a voice assistant in Python that:
- Listens from microphone with speech_recognition
- Converts voice to text
- Sends to Gemini for processing
- Converts response to speech with gTTS
- Continuous listening loop
What the AI will create
import os
import tempfile

import google.generativeai as genai
import speech_recognition as sr
from gtts import gTTS
from playsound import playsound

# Read the API key from the GEMINI_API_KEY environment variable
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

# Initialize the recognizer and the default microphone
recognizer = sr.Recognizer()
mic = sr.Microphone()

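def listen() -> str:
    """Listen on the microphone and return the recognized text ("" on failure)."""
    with mic as source:
        print("🎤 Listening...")
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        try:
            audio = recognizer.listen(source, timeout=5)
        except sr.WaitTimeoutError:
            # No speech started within 5 seconds; let the caller try again
            return ""
    try:
        text = recognizer.recognize_google(audio, language="en-US")
        print(f"I heard: {text}")
        return text
    except sr.UnknownValueError:
        # The speech could not be understood
        return ""
    except sr.RequestError as e:
        # The recognition service could not be reached
        print(f"Error: {e}")
        return ""
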
def think(text: str) -> str:
    """Send the transcribed text to Gemini and return its reply."""
    response = model.generate_content(
        f"You are a friendly assistant. Respond briefly: {text}"
    )
    return response.text

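def speak(text: str) -> None:
    """Convert text to speech with gTTS and play it."""
    tts = gTTS(text=text, lang="en")
    # Create the temp file and close it before writing, so gTTS and
    # playsound can reopen it by name (needed on Windows)
    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as f:
        path = f.name
    try:
        tts.save(path)
        playsound(path)
    finally:
        # Remove the temp file even if saving or playback fails
        os.unlink(path)
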
def run_assistant() -> None:
    """Main assistant loop."""
    print("🤖 Assistant started. Say 'exit' to quit.")
    while True:
        text = listen()
        if not text:
            # Nothing recognized; listen again
            continue
        if "exit" in text.lower():
            speak("Goodbye!")
            break
        response = think(text)
        print(f"🤖: {response}")
        speak(response)

# Run
if __name__ == "__main__":
    run_assistant()
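
The loop above needs a microphone, speakers, and an API key, which makes it awkward to try out in isolation. Here is a minimal sketch of one way around that, assuming you are free to restructure the loop: pass the listen, think, and speak steps in as callables so stand-ins can replace them. The names run_loop, is_exit_command, and max_turns are hypothetical and do not appear in the generated code. The exit check in this sketch matches "exit" as a whole word, so a sentence containing "exited" does not stop the assistant.

```python
import re
from typing import Callable, Optional


def is_exit_command(text: str) -> bool:
    """Return True if 'exit' appears as a whole word (hypothetical helper)."""
    return re.search(r"\bexit\b", text.lower()) is not None


def run_loop(
    listen: Callable[[], str],
    think: Callable[[str], str],
    speak: Callable[[str], None],
    max_turns: Optional[int] = None,
) -> int:
    """Run listen -> think -> speak until 'exit' is heard.

    Returns the number of questions that were answered.
    """
    answered = 0
    while max_turns is None or answered < max_turns:
        text = listen()
        if not text:
            # Nothing recognized; listen again
            continue
        if is_exit_command(text):
            speak("Goodbye!")
            break
        speak(think(text))
        answered += 1
    return answered


# Stand-ins for the microphone, Gemini, and the speaker
heard = iter(["", "hello there", "exit"])
spoken = []
turns = run_loop(
    listen=lambda: next(heard),
    think=lambda text: f"You said: {text}",
    speak=spoken.append,
)
print(turns, spoken)
```

With the real functions from the script, the call would be run_loop(listen, think, speak).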
Installation
pip install SpeechRecognition gTTS playsound pyaudio google-generativeai
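
A missing package or API key otherwise only shows up as a crash when the script starts. Here is a minimal preflight sketch, assuming the import names listed below correspond to the pip packages in the install command. The helpers missing_modules and preflight are hypothetical, not part of any of these libraries.

```python
import importlib.util
import os

# Import names assumed to correspond to the pip packages above
REQUIRED_MODULES = [
    "speech_recognition",   # SpeechRecognition
    "gtts",                 # gTTS
    "playsound",            # playsound
    "pyaudio",              # pyaudio
    "google.generativeai",  # google-generativeai
]


def missing_modules(names):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (for example "google") is absent
            found = False
        if not found:
            missing.append(name)
    return missing


def preflight(env=os.environ):
    """Return a list of problems; an empty list means the script can start."""
    problems = [f"missing module: {m}" for m in missing_modules(REQUIRED_MODULES)]
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    return problems


for problem in preflight():
    print(problem)
```

Run it once after installing; no output means every module was found and the key is set.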
Next level
→ Multimodal App