What you'll build

A voice assistant that listens to what you say, understands your question, and answers out loud. Like Alexa or Siri, but yours.

You speak into the microphone: "¿Qué tiempo hace hoy?" ("What's the weather like today?"). The assistant converts your voice to text, sends it to Gemini to get an answer, and reads that answer back to you aloud. All of this runs in a continuous conversation loop.

By the end you'll have a voice assistant in Python that uses speech_recognition to listen, Gemini to think, and gTTS to speak. You can adapt it to different use cases, such as home automation, accessibility, or productivity.
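One possible way to adapt the assistant to a use case is to vary the instruction sent to Gemini along with the transcribed text. The sketch below is an assumption and not part of the generated code: `PERSONAS` and `build_prompt` are hypothetical names, and the instruction texts are placeholders to adjust.

```python
# Hypothetical helper: pick an instruction per use case and prepend it
# to the transcribed text before sending it to the model.
PERSONAS = {
    "default": "You are a friendly assistant. Answer briefly.",
    "home": "You are a home automation assistant. Confirm actions in one short sentence.",
    "accessibility": "You are an assistant for screen-free use. Use short, plain sentences.",
}


def build_prompt(user_text: str, persona: str = "default") -> str:
    """Return the full prompt for the given use case.

    Unknown persona names fall back to "default" so a configuration typo
    does not crash the conversation loop.
    """
    instruction = PERSONAS.get(persona, PERSONAS["default"])
    return f"{instruction}\n\nUser: {user_text.strip()}"


if __name__ == "__main__":
    print(build_prompt("Turn on the living room lights", persona="home"))
```

The returned string could then be passed to the model in place of a fixed instruction.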

The prompt to get started

Create a voice assistant in Python that:

Listens to the microphone with speech_recognition

Converts speech to text

Sends the text to Gemini for processing

Converts the response to speech with gTTS

Runs in a continuous listening loop

What the AI will create

import speech_recognition as sr
from gtts import gTTS
import google.generativeai as genai
import os
import tempfile
from playsound import playsound

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

# Initialize the recognizer and the microphone
recognizer = sr.Recognizer()
mic = sr.Microphone()

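def listen() -> str:
    """Listen on the microphone and return the recognized text ("" on failure)."""
    with mic as source:
        print("🎤 Listening...")
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        try:
            audio = recognizer.listen(source, timeout=5)
        except sr.WaitTimeoutError:
            # Nobody spoke within 5 seconds; let the caller try again.
            return ""

    try:
        # The assistant is configured for Spanish (es-ES).
        text = recognizer.recognize_google(audio, language="es-ES")
        print(f"Heard: {text}")
        return text
    except sr.UnknownValueError:
        # The speech was unintelligible.
        return ""
    except sr.RequestError as e:
        # The recognition service could not be reached.
        print(f"Error: {e}")
        return ""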

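def think(text: str) -> str:
    """Process the text with Gemini and return its reply."""
    # The instruction stays in Spanish so replies match the Spanish voice.
    response = model.generate_content(
        f"Eres un asistente amigable. Responde brevemente: {text}"
    )
    return response.text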

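def speak(text: str):
    """Convert text to speech and play it."""
    tts = gTTS(text=text, lang="es")
    # Reserve a temp file name and close the handle before writing to it,
    # since an open temp file cannot be reopened on some platforms (Windows).
    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as f:
        path = f.name
    try:
        tts.save(path)
        playsound(path)
    finally:
        # Remove the file even if saving or playback fails.
        os.unlink(path)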

def run_assistant():
    """Main assistant loop."""
    print("🤖 Assistant started. Say 'salir' to exit.")

    while True:
        text = listen()

        if not text:
            continue

        # "salir" ("exit") is the spoken exit word, since recognition is in Spanish.
        if "salir" in text.lower():
            speak("¡Hasta luego!")
            break

        response = think(text)
        print(f"🤖: {response}")
        speak(response)

# Run only when executed as a script
if __name__ == "__main__":
    run_assistant()
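Model replies often contain Markdown markers (asterisks, headings, bullets), which a text-to-speech engine may read aloud literally. A possible cleanup step before calling `speak` is sketched below. `clean_for_speech` is a hypothetical helper that is not part of the generated code, and its patterns cover only common cases.

```python
import re


def clean_for_speech(text: str) -> str:
    """Strip common Markdown markers so they are not read aloud."""
    # Remove heading and bullet markers at the start of a line.
    text = re.sub(r"^\s*(#{1,6}|[-*+])\s+", "", text, flags=re.MULTILINE)
    # Remove emphasis and inline-code markers, keeping the enclosed words.
    text = re.sub(r"[*`]+", "", text)
    # Collapse whitespace, including newlines, into single spaces.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(clean_for_speech("## Resumen\n- **Hola**\n- `mundo`"))
```

In the loop, this would amount to calling `speak(clean_for_speech(response))` instead of `speak(response)`.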

Installation

pip install SpeechRecognition gTTS playsound pyaudio google-generativeai
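After installing, a quick check can confirm that the packages are importable and that the API key is set before the microphone is involved. This is a minimal sketch: `missing_modules` is a hypothetical helper, and the module names listed are the import names assumed to correspond to the pip packages above.

```python
import importlib.util
import os


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised, for example, when a parent package such as "google" is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    required = [
        "speech_recognition",
        "gtts",
        "playsound",
        "pyaudio",
        "google.generativeai",
    ]
    for name in missing_modules(required):
        print(f"Missing module: {name}")
    if not os.environ.get("GEMINI_API_KEY"):
        print("GEMINI_API_KEY is not set")
```

If the script prints nothing, the imports at the top of the assistant should resolve and the key lookup should not raise a `KeyError`.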

Next level

→ Multimodal App
