🎙️

Voice Assistant

👨‍🍳👑 Master Chef ⏱️ 45 minutes

📋 Suggested prerequisites

  • Python
  • Microphone

What you'll build

A voice assistant that listens to what you say, understands your question, and responds by speaking. Like Alexa or Siri, but yours.

You speak into the microphone: "What's the weather today?" The assistant converts your voice to text, sends the text to Gemini to get an intelligent response, and reads the answer out loud. All of this runs in a continuous conversation loop.
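
The flow described above can be tried without a microphone or an API key by passing the three stages in as plain functions. This is only a sketch under assumptions: run_turn, fake_listen, and fake_think are hypothetical names used for illustration, and the stand-in stages return canned text instead of calling speech_recognition, Gemini, or gTTS.

```python
from typing import Callable


def run_turn(
    listen: Callable[[], str],
    think: Callable[[str], str],
    speak: Callable[[str], None],
) -> bool:
    """Run one listen -> think -> speak turn. Returns False when the user asks to stop."""
    heard = listen()
    if not heard:
        return True  # nothing was recognized; keep looping
    if heard.strip().lower() == "exit":
        speak("Goodbye!")
        return False
    speak(think(heard))
    return True


# Stand-in stages (hypothetical) so the flow runs without audio hardware or network access
scripted_inputs = iter(["What's the weather today?", "", "exit"])
spoken: list[str] = []


def fake_listen() -> str:
    return next(scripted_inputs)


def fake_think(text: str) -> str:
    return f"You asked: {text}"


while run_turn(fake_listen, fake_think, spoken.append):
    pass

print(spoken)
```

Swapping the stand-ins for real listen, think, and speak functions leaves the loop itself unchanged, which makes each stage easier to test separately.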

When you're finished, you'll have a voice assistant in Python that uses speech_recognition to listen, Gemini to think, and gTTS to speak. You can customize it for use cases such as home automation, accessibility, or productivity.
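
As one possible way to customize the assistant, fixed phrases can be answered locally before anything is sent to Gemini. The sketch below is an assumption, not part of the generated code: LOCAL_COMMANDS, route, and the two handlers are hypothetical, and the handlers only return text where a real home-automation call would go.

```python
from typing import Callable, Optional


# Hypothetical handlers: each returns the sentence the assistant should speak
def lights_on() -> str:
    return "Turning on the lights."


def start_timer() -> str:
    return "Timer started."


# Hypothetical mapping from a spoken phrase to its handler
LOCAL_COMMANDS: dict[str, Callable[[], str]] = {
    "turn on the lights": lights_on,
    "start a timer": start_timer,
}


def route(text: str) -> Optional[str]:
    """Return a local reply if the utterance contains a known phrase, else None."""
    normalized = text.lower()
    for phrase, handler in LOCAL_COMMANDS.items():
        if phrase in normalized:
            return handler()
    return None


print(route("Please turn on the lights"))
print(route("What is the capital of France?"))
```

In the main loop, something like `response = route(text) or think(text)` would try the local commands first and fall back to Gemini for everything else.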


The prompt to start

Create a voice assistant in Python that:

  1. Listens from microphone with speech_recognition
  2. Converts voice to text
  3. Sends to Gemini for processing
  4. Converts response to speech with gTTS
  5. Repeats in a continuous listening loop

What the AI will create

import speech_recognition as sr
from gtts import gTTS
import google.generativeai as genai
import os
import tempfile
from playsound import playsound

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

# Initialize recognizer
recognizer = sr.Recognizer()
mic = sr.Microphone()

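def listen() -> str:
    """Listen from the microphone and return the recognized text ("" if nothing was understood)"""
    with mic as source:
        print("🎤 Listening...")
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        try:
            audio = recognizer.listen(source, timeout=5)
        except sr.WaitTimeoutError:
            # No speech started within the 5-second timeout; let the main loop retry
            return ""

    try:
        # Uses Google's web speech API, so this step needs an internet connection
        text = recognizer.recognize_google(audio, language="en-US")
        print(f"I heard: {text}")
        return text
    except sr.UnknownValueError:
        # Audio was captured but could not be understood
        return ""
    except sr.RequestError as e:
        print(f"Error: {e}")
        return ""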

def think(text: str) -> str:
    """Process with Gemini"""
    response = model.generate_content(
        f"You are a friendly assistant. Respond briefly: {text}"
    )
    return response.text

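def speak(text: str):
    """Convert text to speech and play it"""
    tts = gTTS(text=text, lang="en")
    # Reserve a temp path and close the file first; Windows cannot reopen or delete a file that is still open
    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as f:
        path = f.name
    try:
        tts.save(path)
        playsound(path)
    finally:
        # Remove the temp file even if saving or playback fails
        os.unlink(path)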

def run_assistant():
    """Main assistant loop"""
    print("🤖 Assistant started. Say 'exit' to quit.")

    while True:
        text = listen()

        if not text:
            continue

        # Match "exit" as a whole word so that words like "Brexit" don't end the session
        if "exit" in text.lower().split():
            speak("Goodbye!")
            break

        response = think(text)
        print(f"🤖: {response}")
        speak(response)

# Run only when executed as a script, not when imported
if __name__ == "__main__":
    run_assistant()
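
Note that think() sends each utterance to Gemini on its own, so a follow-up such as "and tomorrow?" arrives without context. One possible way to carry recent turns is to fold them into the prompt. The sketch below is an assumption, not part of the generated code: ConversationMemory and its methods are hypothetical names, and the default limit of 5 turns is arbitrary.

```python
from collections import deque


class ConversationMemory:
    """Keep the most recent exchanges and fold them into the next prompt (hypothetical helper)."""

    def __init__(self, max_turns: int = 5):
        # deque with maxlen silently drops the oldest turn once the limit is reached
        self.turns = deque(maxlen=max_turns)

    def add(self, user_text: str, reply: str) -> None:
        self.turns.append((user_text, reply))

    def build_prompt(self, user_text: str) -> str:
        lines = ["You are a friendly assistant. Respond briefly."]
        for asked, answered in self.turns:
            lines.append(f"User: {asked}")
            lines.append(f"Assistant: {answered}")
        lines.append(f"User: {user_text}")
        return "\n".join(lines)


memory = ConversationMemory(max_turns=2)
memory.add("What's the capital of France?", "Paris.")
print(memory.build_prompt("How many people live there?"))
```

Inside think(), the prompt would then come from memory.build_prompt(text), followed by memory.add(text, response.text) once the reply is known.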

Installation

pip install SpeechRecognition gTTS playsound pyaudio google-generativeai
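
The script also reads the GEMINI_API_KEY environment variable at startup, so it fails immediately if the key is not set. A small check like the one below can confirm the setup before the first run. It is a sketch under assumptions: preflight and is_importable are hypothetical helper names, and the module list reflects the import names that usually correspond to the packages above.

```python
import importlib.util
import os

# Import names assumed for the installed packages (they differ from some PyPI names)
REQUIRED_MODULES = ["speech_recognition", "gtts", "playsound", "pyaudio", "google.generativeai"]


def is_importable(module_name: str) -> bool:
    """Report whether a module can be found without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (for example "google") is missing
        return False


def preflight(modules=REQUIRED_MODULES, env_var: str = "GEMINI_API_KEY") -> list[str]:
    """Return a list of human-readable setup problems; an empty list means ready."""
    problems = [f"missing module: {name}" for name in modules if not is_importable(name)]
    if not os.environ.get(env_var):
        problems.append(f"environment variable not set: {env_var}")
    return problems


for problem in preflight():
    print(problem)
```

If nothing is printed, the imports and the API key are in place; microphone access still has to be tested by running the assistant itself.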

Next level

→ Multimodal App