
Vector Search

👨‍🍳👑 Master Chef · ⏱️ 40 minutes

📋 Suggested prerequisites

  • Basic Python
  • ChromaDB

What you'll build

A semantic search engine that understands the MEANING of what you're looking for, not just keywords.

Imagine searching for "domestic animal resting" and finding documents about "the cat sleeps on the sofa". That's vector search: it converts text into mathematical vectors and finds conceptual similarities.
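
To make "finds conceptual similarities" concrete, here is a minimal sketch of the ranking step in plain Python. The three-dimensional vectors and their axis meanings are invented for illustration only; a real embedding model produces its own vectors, with hundreds of dimensions and no human-readable axes.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings" (axes: animal-ness, rest-ness, tech-ness).
# These numbers are made up; they do not come from Gemini or any model.
toy_vectors = {
    "The cat sleeps on the sofa": [0.9, 0.8, 0.0],
    "My dog runs in the park": [0.9, 0.1, 0.0],
    "Python is a programming language": [0.0, 0.1, 0.9],
}
query_vector = [0.8, 0.9, 0.1]  # stand-in for "domestic animal resting"

# Rank documents from most to least similar to the query vector.
ranked = sorted(
    toy_vectors,
    key=lambda text: cosine_similarity(query_vector, toy_vectors[text]),
    reverse=True,
)
for text in ranked:
    print(f"{cosine_similarity(query_vector, toy_vectors[text]):.3f}  {text}")
```

With these toy numbers the cat sentence ranks first and the Python sentence last, even though the query shares no keywords with either. A vector database does the same comparison, only with model-generated vectors and an index that avoids scanning every document.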

When finished, you'll have a Python system with ChromaDB that indexes documents, creates embeddings with Gemini, and allows searching by meaning. Perfect for searching FAQs, articles, or any text collection.


The prompt to start

Create a vector search system in Python with:

  1. ChromaDB as the vector database
  2. Gemini embeddings (free tier)
  3. A function to add documents
  4. A function to search for similar documents
  5. A similarity score shown for each result

What the AI will create

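import hashlib
import os

import chromadb
from chromadb.utils import embedding_functions

# Configure embeddings with Gemini (the API key is read from the environment)
gemini_ef = embedding_functions.GoogleGenerativeAiEmbeddingFunction(
    api_key=os.environ["GEMINI_API_KEY"],
    model_name="models/embedding-001"
)

# Create client and collection; "cosine" makes queries return cosine distances
client = chromadb.PersistentClient(path="./vector_db")
collection = client.get_or_create_collection(
    name="documents",
    embedding_function=gemini_ef,
    metadata={"hnsw:space": "cosine"}
)

def add_documents(docs: list[dict]):
    """Add documents with metadata; re-adding the same text does not duplicate it"""
    # Derive a stable ID from the text. Positional IDs such as doc_0 would
    # collide on a second call or a second run against the persistent database.
    unique = {
        hashlib.sha256(d["text"].encode("utf-8")).hexdigest()[:16]: d
        for d in docs
    }
    collection.upsert(
        ids=list(unique),
        documents=[d["text"] for d in unique.values()],
        metadatas=[{"source": d.get("source", "unknown")} for d in unique.values()]
    )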

def search(query: str, n_results: int = 5):
    """Search similar documents"""
    results = collection.query(
        query_texts=[query],
        n_results=n_results,
        include=["documents", "distances", "metadatas"]
    )

    for i, (doc, dist, meta) in enumerate(zip(
        results["documents"][0],
        results["distances"][0],
        results["metadatas"][0]
    )):
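        similarity = 1 - dist  # Cosine distance -> cosine similarity (valid for hnsw:space "cosine")
        preview = doc if len(doc) <= 100 else doc[:100] + "..."  # Ellipsis only when truncated
        print(f"{i+1}. [{similarity:.2%}] {preview}")
        print(f"   Source: {meta['source']}")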

# Example
docs = [
    {"text": "The cat sleeps on the sofa", "source": "pets.txt"},
    {"text": "Python is a programming language", "source": "tech.txt"},
    {"text": "My dog runs in the park", "source": "pets.txt"},
    {"text": "JavaScript is used for web", "source": "tech.txt"},
]

add_documents(docs)
search("domestic animal resting")
# → Finds "The cat sleeps on the sofa" with high similarity
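
The `1 - dist` conversion in `search` only makes sense because the collection was created with the cosine space. The sketch below reimplements the three distance formulas as they are described in Chroma's documentation (cosine, inner product, squared L2) in plain Python so the difference is visible. The function name `chroma_style_distance` is hypothetical, and the formulas are an assumption to verify against the Chroma version you have installed.

```python
import math

def chroma_style_distance(a: list[float], b: list[float], space: str) -> float:
    """Distance between two vectors for a given space name (assumed formulas)."""
    dot = sum(x * y for x, y in zip(a, b))
    if space == "cosine":
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return 1.0 - dot / norm  # 0 = same direction, 2 = opposite
    if space == "ip":
        return 1.0 - dot  # inner product, not normalized by length
    if space == "l2":
        return sum((x - y) ** 2 for x, y in zip(a, b))  # squared Euclidean
    raise ValueError(f"unknown space: {space}")

same = chroma_style_distance([1.0, 0.0], [2.0, 0.0], "cosine")
orthogonal = chroma_style_distance([1.0, 0.0], [0.0, 1.0], "cosine")
opposite = chroma_style_distance([1.0, 0.0], [-1.0, 0.0], "cosine")

# With cosine distance, 1 - dist recovers the cosine similarity in [-1, 1].
for label, dist in [("same", same), ("orthogonal", orthogonal), ("opposite", opposite)]:
    print(f"{label}: distance={dist:.2f} similarity={1 - dist:.2f}")

# With squared L2 the distance is unbounded, so 1 - dist is not a similarity.
print(chroma_style_distance([0.0, 0.0], [3.0, 4.0], "l2"))
```

Two practical consequences: the percentage printed by `search` can be negative for unrelated texts, and if the collection is switched to `"l2"` or `"ip"`, the score needs a different conversion or should be shown as a raw distance.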

SQL vs Vector comparison

Traditional SQL      | Vector search
LIKE '%cat%'         | Searches literal "cat"
Only exact matches   | Understands synonyms
No context           | "pet" → "cat", "dog"
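
The SQL side of this comparison can be tried with Python's built-in `sqlite3` module. This is a small sketch that loads the same four example documents into an in-memory table; the table name and the `keyword_search` helper are illustrative choices, not part of the original project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (text TEXT, source TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [
        ("The cat sleeps on the sofa", "pets.txt"),
        ("Python is a programming language", "tech.txt"),
        ("My dog runs in the park", "pets.txt"),
        ("JavaScript is used for web", "tech.txt"),
    ],
)

def keyword_search(term: str) -> list[str]:
    """Return texts containing the term as a literal substring (SQL LIKE)."""
    rows = conn.execute(
        "SELECT text FROM documents WHERE text LIKE ?", (f"%{term}%",)
    ).fetchall()
    return [row[0] for row in rows]

print(keyword_search("cat"))                      # → ['The cat sleeps on the sofa']
print(keyword_search("pet"))                      # → []
print(keyword_search("domestic animal resting"))  # → []
```

The last two queries return nothing because no document contains those exact characters, which is the gap the vector search above is meant to close.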

Next level

→ Custom MCP Server