What you'll build
A semantic search engine that understands the MEANING of what you're looking for, not just keywords.
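Imagine searching for "domestic animal resting" and finding documents about "the cat sleeps on the sofa". That's vector search: it converts text into numerical vectors (embeddings) and returns the documents whose vectors lie closest to the query's vector, which in practice means the documents closest in meaning.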
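To make "closest" concrete: vector search commonly compares embeddings with cosine similarity, the cosine of the angle between two vectors. The sketch below assumes hand-made 3-dimensional vectors whose axes we pretend mean [animal, rest/sleep, programming]. These numbers are hypothetical, for illustration only; real Gemini embeddings have hundreds of dimensions that are not human-readable.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: 1.0 = same direction, ~0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings", hand-made for this example (not model output).
vectors = {
    "domestic animal resting": [0.9, 0.8, 0.0],
    "The cat sleeps on the sofa": [0.8, 0.9, 0.1],
    "Python is a programming language": [0.0, 0.1, 0.9],
}

query = vectors["domestic animal resting"]
for text in ("The cat sleeps on the sofa", "Python is a programming language"):
    # prints 0.99 for the cat sentence and 0.07 for the Python sentence
    print(f"{cosine_similarity(query, vectors[text]):.2f}  {text}")
```

No word is shared between the query and the cat sentence; the score is high only because their vectors point in almost the same direction.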
When finished, you'll have a Python system with ChromaDB that indexes documents, creates embeddings with Gemini, and allows searching by meaning. Perfect for searching FAQs, articles, or any text collection.
The prompt to start
Create a vector search system in Python with:
- ChromaDB as vector database
- Gemini embeddings (free)
- Function to add documents
- Function to search similar
- Show similarity score
What the AI will create
```python
import hashlib
import os

import chromadb
from chromadb.utils import embedding_functions

# Configure embeddings with Gemini (the API key is read from the GEMINI_API_KEY env var)
gemini_ef = embedding_functions.GoogleGenerativeAiEmbeddingFunction(
    api_key=os.environ["GEMINI_API_KEY"],
    model_name="models/embedding-001",
)

# Create a client that persists to ./vector_db and a collection that uses cosine distance
client = chromadb.PersistentClient(path="./vector_db")
collection = client.get_or_create_collection(
    name="documents",
    embedding_function=gemini_ef,
    metadata={"hnsw:space": "cosine"},
)


def add_documents(docs: list[dict]):
    """Add documents with metadata.

    Each ID is a hash of the document text, so separate calls never reuse
    each other's IDs, and re-adding the same text updates it instead of
    creating a duplicate.
    """
    collection.upsert(
        documents=[d["text"] for d in docs],
        metadatas=[{"source": d.get("source", "unknown")} for d in docs],
        ids=[hashlib.sha256(d["text"].encode("utf-8")).hexdigest() for d in docs],
    )


def search(query: str, n_results: int = 5):
    """Search for the documents most similar in meaning to the query"""
    results = collection.query(
        query_texts=[query],
        n_results=n_results,
        include=["documents", "distances", "metadatas"],
    )
    # Results are nested per query; we sent a single query, so take index [0]
    for i, (doc, dist, meta) in enumerate(zip(
        results["documents"][0],
        results["distances"][0],
        results["metadatas"][0],
    )):
        similarity = 1 - dist  # cosine distance = 1 - cosine similarity
        snippet = doc if len(doc) <= 100 else doc[:100] + "..."
        print(f"{i+1}. [{similarity:.2%}] {snippet}")
        print(f"   Source: {meta['source']}")


# Example
docs = [
    {"text": "The cat sleeps on the sofa", "source": "pets.txt"},
    {"text": "Python is a programming language", "source": "tech.txt"},
    {"text": "My dog runs in the park", "source": "pets.txt"},
    {"text": "JavaScript is used for web", "source": "tech.txt"},
]
add_documents(docs)
search("domestic animal resting")
# → Finds "The cat sleeps on the sofa" with high similarity
```
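Conceptually, `collection.query` embeds the query and ranks stored vectors by distance. The sketch below does that ranking exactly, by scanning every vector; ChromaDB's HNSW index approximates the same ranking without a full scan. The function names `cosine_distance` and `brute_force_query` and the toy vectors are hypothetical, chosen for illustration; the distance formula matches the one used with `hnsw:space` set to `"cosine"`, which is why `search` converts with `1 - dist`.

```python
import heapq
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity (0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def brute_force_query(query_vec, index, n_results=5):
    """Hypothetical exact k-nearest-neighbour search over every stored vector.

    Returns (doc_id, distance) pairs, smallest distance (most similar) first.
    """
    scored = ((cosine_distance(query_vec, vec), doc_id) for doc_id, vec in index.items())
    return [(doc_id, dist) for dist, doc_id in heapq.nsmallest(n_results, scored)]


# Hypothetical toy vectors standing in for stored embeddings.
index = {
    "cat_sofa": [0.8, 0.9, 0.1],
    "dog_park": [0.9, 0.3, 0.0],
    "python_lang": [0.0, 0.1, 0.9],
}

for doc_id, dist in brute_force_query([0.9, 0.8, 0.0], index, n_results=2):
    # Same conversion as in search(): similarity = 1 - distance
    print(f"{doc_id}: distance={dist:.3f} similarity={1 - dist:.2%}")
```

A full scan costs one distance computation per stored document per query, which is fine for a few thousand documents; an approximate index like HNSW is what keeps queries fast as the collection grows.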
SQL vs Vector comparison
| Traditional SQL | Vector search |
|---|---|
| `LIKE '%cat%'` matches only the literal string "cat" | Matches by meaning, not by spelling |
| Only literal (substring) matches | Understands synonyms |
| No context | "pet" → "cat", "dog" |
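The SQL column can be checked directly with Python's built-in `sqlite3` module. This sketch loads the four example documents into an in-memory table, plus one hypothetical extra line ("How to concatenate strings") added only to show a false positive. `like_search` is a hypothetical helper name; note that SQLite's `LIKE` is case-insensitive for ASCII letters, while some other databases are case-sensitive.

```python
import sqlite3

# In-memory table with the example documents plus one hypothetical extra line.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO documents (text) VALUES (?)",
    [
        ("The cat sleeps on the sofa",),
        ("Python is a programming language",),
        ("My dog runs in the park",),
        ("JavaScript is used for web",),
        ("How to concatenate strings",),
    ],
)


def like_search(term: str) -> list[str]:
    """Substring match, as in WHERE text LIKE '%term%' (no notion of meaning)."""
    rows = conn.execute(
        "SELECT text FROM documents WHERE text LIKE ? ORDER BY id", (f"%{term}%",)
    )
    return [text for (text,) in rows]


print(like_search("cat"))                      # ['The cat sleeps on the sofa', 'How to concatenate strings']
print(like_search("pet"))                      # []
print(like_search("domestic animal resting"))  # []
```

`LIKE` matches "cat" inside "concatenate", finds nothing for "pet" even though two documents are about pets, and finds nothing for the semantic query from the introduction.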