What are vector databases?
Vector databases store embeddings (numeric vectors that encode meaning) and retrieve them by semantic similarity.
The traditional problem
SQL: WHERE title LIKE '%cat%'
→ Only matches the literal text "cat", not "feline" or "pet"
Vector: embedding("cat")
→ Finds semantically similar concepts
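To make the keyword limitation concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `articles` table and its titles are hypothetical, invented only for illustration.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (title TEXT)")
conn.executemany(
    "INSERT INTO articles (title) VALUES (?)",
    [
        ("My cat sleeps all day",),
        ("Feline nutrition guide",),
        ("Choosing a pet",),
        ("Product category list",),
    ],
)

# LIKE '%cat%' is a substring match: it knows nothing about meaning.
rows = conn.execute(
    "SELECT title FROM articles WHERE title LIKE '%cat%' ORDER BY title"
).fetchall()
keyword_hits = [title for (title,) in rows]
print(keyword_hits)
```

The query returns the "cat" title and the unrelated "category" title, while the "Feline" and "pet" titles are missed. Those are the rows a similarity search over embeddings is meant to recover.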
How they work
| Step | Process |
|---|---|
| 1. Embed | Text → Vector [0.1, 0.3, ...] |
| 2. Store | Save the vector in the DB |
| 3. Query | Embed the query and search for similar vectors |
| 4. Return | Results ranked by cosine similarity or distance |
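The four steps can be sketched in plain Python. This is a toy illustration, not how any particular database is implemented: the 3-dimensional vectors in `TOY_EMBEDDINGS` are hand-written assumptions standing in for a real embedding model, and `embed`, `store`, and `search` are hypothetical names.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-written for illustration.
# A real embedding model would produce hundreds of dimensions.
TOY_EMBEDDINGS = {
    "The cat sleeps": [0.9, 0.1, 0.8],
    "The dog runs": [0.8, 0.9, 0.1],
    "Stock prices fell": [0.0, 0.2, 0.1],
    "pet resting": [0.85, 0.15, 0.7],
}


def embed(text):
    """Step 1 (Embed): map text to a vector, here by table lookup."""
    return TOY_EMBEDDINGS[text]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Step 2 (Store): keep each document's vector in an in-memory dict.
store = {}
for doc in ["The cat sleeps", "The dog runs", "Stock prices fell"]:
    store[doc] = embed(doc)


def search(query, k=1):
    """Steps 3 and 4 (Query, Return): score every stored vector, return the top k."""
    query_vector = embed(query)
    scored = [(cosine_similarity(query_vector, vec), doc) for doc, vec in store.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc for _, doc in scored[:k]]


top = search("pet resting", k=2)
print(top)
```

With these toy vectors, "The cat sleeps" ranks first for the query "pet resting" even though the two share no words, because their vectors point in nearly the same direction.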
Popular databases
| DB | Type | Ideal for |
|---|---|---|
| Pinecone | Cloud | Easy production |
| Weaviate | Self-host | Full control |
| ChromaDB | Local | Development |
| pgvector | PostgreSQL extension | Teams already using Postgres |
| Qdrant | Self-host | High performance |
ChromaDB example
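import chromadb

# Create an in-memory client (data is lost when the process exits)
client = chromadb.Client()
collection = client.create_collection("docs")

# Add documents; Chroma embeds them with its default embedding function
collection.add(
    documents=["The cat sleeps", "The dog runs"],
    ids=["doc1", "doc2"],
)

# Search for the document closest in meaning to the query
results = collection.query(
    query_texts=["pet resting"],
    n_results=1,
)
print(results["documents"])
# → Expected to find "The cat sleeps"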
Vector indexes
| Algorithm | Speed | Accuracy (recall) |
|---|---|---|
| Flat (brute force) | Slow | 100% (exact) |
| IVF | Medium | ~95% (depends on tuning) |
| HNSW | Fast | ~95% (depends on tuning) |
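The trade-off in the table can be illustrated with a small pure-Python sketch comparing a flat (exhaustive) search with a simplified IVF-style search. Everything here is an assumption made for illustration: the data is random, the centroids are sampled from the data rather than trained with k-means as real IVF implementations typically do, and `nprobe` (the number of buckets scanned) is a hypothetical parameter name. HNSW, which is graph-based, is not covered.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_search(vectors, query, k):
    """Exact search: compare the query against every stored vector."""
    order = sorted(range(len(vectors)), key=lambda i: sq_dist(vectors[i], query))
    return order[:k]


def build_ivf(vectors, centroids):
    """Assign each vector index to the bucket of its nearest centroid."""
    buckets = [[] for _ in centroids]
    for i, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(centroids[c], vec))
        buckets[nearest].append(i)
    return buckets


def ivf_search(vectors, centroids, buckets, query, k, nprobe):
    """Approximate search: scan only the nprobe buckets closest to the query."""
    by_closeness = sorted(range(len(centroids)), key=lambda c: sq_dist(centroids[c], query))
    candidates = [i for c in by_closeness[:nprobe] for i in buckets[c]]
    candidates.sort(key=lambda i: sq_dist(vectors[i], query))
    return candidates[:k], len(candidates)


rng = random.Random(0)
vectors = [[rng.random() for _ in range(8)] for _ in range(2000)]
centroids = rng.sample(vectors, 16)  # simplification: no k-means training
buckets = build_ivf(vectors, centroids)
query = [rng.random() for _ in range(8)]

exact = flat_search(vectors, query, k=10)
approx, scanned = ivf_search(vectors, centroids, buckets, query, k=10, nprobe=4)
recall = len(set(exact) & set(approx)) / len(exact)
print(f"scanned {scanned} of {len(vectors)} vectors, recall@10 = {recall:.2f}")
```

Raising `nprobe` scans more buckets, which costs time but pushes recall toward the flat result. Setting it to the number of centroids scans everything and reproduces the exact answer.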
Practice
→ Vector Search