What are vector databases?

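Vector databases store embeddings (numeric vectors that represent the meaning of text, images, or other data) and retrieve them by semantic similarity rather than by exact match.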

The traditional problem

SQL: WHERE title LIKE '%cat%'
→ Matches only titles that contain the string "cat"; misses "feline" or "pet"

Vector: embedding("cat")
→ Finds semantically similar concepts, even when no words are shared
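The contrast can be sketched in a few lines of Python. This is an illustration only: the titles, the 3-dimensional vectors, and the query vector standing in for embedding("cat") are all made up, whereas a real embedding model would produce vectors with hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

titles = ["cat care tips", "feline nutrition", "pet grooming", "tax filing guide"]

# Keyword search: roughly what LIKE '%cat%' does (substring match).
keyword_hits = [t for t in titles if "cat" in t]

# Hypothetical embeddings, hand-written for this example (not from a model).
toy_embeddings = {
    "cat care tips":    [0.9, 0.8, 0.1],
    "feline nutrition": [0.8, 0.9, 0.2],
    "pet grooming":     [0.7, 0.6, 0.3],
    "tax filing guide": [0.1, 0.0, 0.9],
}
query_vector = [0.9, 0.9, 0.1]  # stand-in for embedding("cat")

# Vector search: rank every title by similarity to the query vector.
ranked = sorted(
    titles,
    key=lambda t: cosine_similarity(query_vector, toy_embeddings[t]),
    reverse=True,
)

print(keyword_hits)
print(ranked)
```

With these made-up numbers, the keyword search returns a single title, while the similarity ranking places all three animal-related titles ahead of the unrelated one.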

How they work

Step	Process
1. Embed	Convert text to a vector, e.g. [0.1, 0.3, ...]
2. Store	Save the vector in the database
3. Query	Embed the query and search for the nearest vectors
4. Return	Return results ranked by cosine similarity or distance
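The four steps can be sketched as a toy in-memory store. Everything here is hypothetical: ToyVectorStore, toy_embed, and the TOPICS word lists are invented for illustration, and the "embedding" simply counts words per topic. A real system would call an embedding model and use an index instead of scanning every row.

```python
import math

# Hypothetical word lists; each one defines a dimension of the toy embedding.
TOPICS = [
    {"cat", "dog", "pet", "feline"},   # dimension 0: animals
    {"sleeps", "resting", "nap"},      # dimension 1: rest
    {"runs", "sprinting", "jogging"},  # dimension 2: motion
]

def toy_embed(text):
    """Step 1 stand-in: count how many words fall into each topic."""
    words = text.lower().split()
    return [float(sum(w in topic for w in words)) for topic in TOPICS]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # a zero vector matches nothing

class ToyVectorStore:
    """In-memory stand-in for a vector database (illustrative only)."""

    def __init__(self, embed):
        self.embed = embed  # function: text -> list of floats
        self.rows = []      # (id, text, vector) tuples

    def add(self, doc_id, text):
        # Steps 1 and 2: embed the text, then store the vector.
        self.rows.append((doc_id, text, self.embed(text)))

    def query(self, text, k=1):
        # Step 3: embed the query and compare it with every stored vector.
        q = self.embed(text)
        scored = [(cosine(q, vec), doc_id, doc) for doc_id, doc, vec in self.rows]
        # Step 4: return the k best matches, highest similarity first.
        scored.sort(key=lambda item: item[0], reverse=True)
        return [(doc_id, doc, round(score, 3)) for score, doc_id, doc in scored[:k]]

store = ToyVectorStore(toy_embed)
store.add("doc1", "The cat sleeps")
store.add("doc2", "The dog runs")
top = store.query("pet resting", k=2)
print(top)
```

The query shares no words with either document, yet "The cat sleeps" ranks first because both map to the same animal and rest dimensions.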

Popular databases

DB	Type	Ideal for
Pinecone	Managed cloud	Easy production
Weaviate	Self-host or cloud	Full control
ChromaDB	Local	Development
pgvector	PostgreSQL extension	Teams already using Postgres
Qdrant	Self-host or cloud	High performance

ChromaDB example

import chromadb

# Create an in-memory client (data is lost when the process exits)
client = chromadb.Client()
collection = client.create_collection("docs")

# Add documents; Chroma embeds them with its default embedding model
collection.add(
    documents=["The cat sleeps", "The dog runs"],
    ids=["doc1", "doc2"]
)

# Search for the document closest in meaning to the query
results = collection.query(
    query_texts=["pet resting"],
    n_results=1
)
print(results["documents"])
# → Expected to find "The cat sleeps"

Vector indexes

Algorithm	Speed	Recall
Flat	Slow (exhaustive scan)	100% (exact)
IVF	Medium	~95% (tunable)
HNSW	Fast	~95% (tunable)
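The trade-off between Flat and IVF can be illustrated with a minimal sketch. It assumes two hand-picked centroids rather than centroids learned by clustering, uses four made-up 2-D points, and borrows the parameter name nprobe (the number of lists scanned) as a convention. HNSW is graph-based and is not covered by this sketch.

```python
import math

def flat_search(points, query):
    """Exact search: compare the query against every stored vector."""
    return min(points, key=lambda name: math.dist(points[name], query))

def build_ivf(points, centroids):
    """Assign each vector to the list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for name, vec in points.items():
        nearest = min(range(len(centroids)), key=lambda i: math.dist(centroids[i], vec))
        lists[nearest].append(name)
    return lists

def ivf_search(points, centroids, lists, query, nprobe=1):
    """Approximate search: scan only the nprobe lists closest to the query."""
    order = sorted(range(len(centroids)), key=lambda i: math.dist(centroids[i], query))
    candidates = [name for i in order[:nprobe] for name in lists[i]]
    return min(candidates, key=lambda name: math.dist(points[name], query), default=None)

# Made-up data: "b" is the true nearest neighbor of the query,
# but it sits just across the boundary in the other centroid's list.
points = {"a": (3.0, 0.0), "b": (5.2, 0.0), "c": (1.0, 1.0), "d": (9.0, 1.0)}
centroids = [(0.0, 0.0), (10.0, 0.0)]  # hypothetical, not trained
lists = build_ivf(points, centroids)
query = (4.9, 0.0)

exact = flat_search(points, query)
approx_1 = ivf_search(points, centroids, lists, query, nprobe=1)
approx_2 = ivf_search(points, centroids, lists, query, nprobe=2)
print(exact, approx_1, approx_2)
```

Scanning one list is cheaper but misses the true neighbor in this example; scanning both lists recovers it at the cost of examining every vector, which is why the recall of an approximate index depends on how it is tuned.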

Practice

→ Vector Search
