🗄️

Vector Databases

👨‍🍳👑 Master Chef

What are vector databases?

They store embeddings (numeric vectors) for semantic similarity search.


The traditional problem

SQL: WHERE title LIKE '%cat%'
→ Only finds "cat", not "feline" or "pet"

Vector: embedding("cat")
→ Finds semantically SIMILAR concepts
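
The contrast above can be sketched in a few lines of Python. This is only an illustration: the three-dimensional vectors in `toy_embeddings` are hypothetical, hand-picked values rather than the output of a real embedding model, and `cosine_similarity` is a helper written for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings, hand-picked for illustration.
# Real embedding models produce hundreds of dimensions.
toy_embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "feline": [0.85, 0.75, 0.15],
    "pet": [0.7, 0.9, 0.2],
    "car": [0.1, 0.2, 0.95],
}
titles = list(toy_embeddings)

# Keyword search: the equivalent of LIKE '%cat%'.
keyword_hits = [t for t in titles if "cat" in t]

# Semantic search: rank every title by similarity to the "cat" vector.
query = toy_embeddings["cat"]
ranked = sorted(
    titles,
    key=lambda t: cosine_similarity(query, toy_embeddings[t]),
    reverse=True,
)

print(keyword_hits)
print(ranked)
```

With these toy values, the substring match returns only the literal word, while the similarity ranking places "feline" and "pet" well ahead of "car".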

How they work

Step        Process
1. Embed    Text → Vector [0.1, 0.3, ...]
2. Store    Save vector in DB
3. Query    Search similar vectors
4. Return   Results ranked by cosine similarity or distance
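
The four steps above can be sketched as a tiny in-memory store. This is a minimal sketch under stated assumptions, not how any particular database is implemented: `TinyVectorStore`, `VOCAB`, and `embed` are hypothetical names, and the word-count `embed` function is only a stand-in for a real embedding model, so it matches shared words rather than meaning.

```python
import math

VOCAB = ["cat", "dog", "sleeps", "runs", "pet"]  # hypothetical toy vocabulary

def embed(text):
    """Step 1 (stand-in): turn text into a vector of word counts over VOCAB."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

def cosine_similarity(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class TinyVectorStore:
    def __init__(self):
        self._rows = []  # list of (doc_id, text, vector)

    def add(self, doc_id, text):
        """Step 2: store the vector alongside the original text."""
        self._rows.append((doc_id, text, embed(text)))

    def query(self, text, n_results=1):
        """Steps 3 and 4: score every stored vector, return the best matches."""
        q = embed(text)
        scored = [
            (cosine_similarity(q, vec), doc_id, doc)
            for doc_id, doc, vec in self._rows
        ]
        scored.sort(key=lambda row: row[0], reverse=True)
        return [(doc_id, doc, score) for score, doc_id, doc in scored[:n_results]]

store = TinyVectorStore()
store.add("doc1", "The cat sleeps")
store.add("doc2", "The dog runs")
top = store.query("a sleeping cat", n_results=1)
print(top)
```

Scoring every stored vector on each query is the simplest possible strategy; it stays exact but becomes slow as the collection grows.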

Popular databases

DB          Type         Ideal for
Pinecone    Cloud        Easy production
Weaviate    Self-host    Full control
ChromaDB    Local        Development
pgvector    PostgreSQL   If using Postgres
Qdrant      Self-host    High performance

ChromaDB example

import chromadb

# Create client
client = chromadb.Client()
collection = client.create_collection("docs")

# Add documents (auto-embedding)
collection.add(
    documents=["The cat sleeps", "The dog runs"],
    ids=["doc1", "doc2"]
)

# Search similar
results = collection.query(
    query_texts=["pet resting"],
    n_results=1
)
# → Finds "The cat sleeps"

Vector indexes

Algorithm   Speed    Precision
Flat        Slow     100%
IVF         Medium   ~95%
HNSW        Fast     ~95%
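
The trade-off between the Flat and IVF rows can be illustrated with a rough pure-Python sketch. It assumes random 2-D points and picks centroids by random sampling, whereas real IVF indexes usually train centroids with k-means. The function names and the `nprobe` parameter are illustrative choices for this example, and HNSW is not shown.

```python
import math
import random

def flat_search(vectors, query):
    """Exhaustive scan: compares the query against every vector (exact)."""
    return min(range(len(vectors)), key=lambda i: math.dist(vectors[i], query))

def build_ivf(vectors, centroids):
    """Assign each vector index to the bucket of its nearest centroid."""
    buckets = {c: [] for c in range(len(centroids))}
    for i, vec in enumerate(vectors):
        nearest = min(
            range(len(centroids)), key=lambda c: math.dist(centroids[c], vec)
        )
        buckets[nearest].append(i)
    return buckets

def ivf_search(vectors, centroids, buckets, query, nprobe=1):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    order = sorted(
        range(len(centroids)), key=lambda c: math.dist(centroids[c], query)
    )
    candidates = [i for c in order[:nprobe] for i in buckets[c]]
    if not candidates:
        return None
    return min(candidates, key=lambda i: math.dist(vectors[i], query))

random.seed(0)
vectors = [[random.random(), random.random()] for _ in range(1000)]
centroids = random.sample(vectors, 16)  # assumption: sampled, not k-means
buckets = build_ivf(vectors, centroids)

# Fraction of queries where the approximate search finds the exact neighbor.
queries = [[random.random(), random.random()] for _ in range(200)]
hits = sum(
    flat_search(vectors, q) == ivf_search(vectors, centroids, buckets, q, nprobe=2)
    for q in queries
)
recall = hits / len(queries)
print(f"IVF recall vs. flat search: {recall:.2%}")
```

Raising `nprobe` scans more buckets, which moves the result closer to the exact flat search at the cost of more distance computations.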

Practice

→ Vector Search