Text as vectors
Embeddings convert text into numbers (vectors). This allows searching by meaning, not just exact words.
What is an embedding?
"Hello world" → [0.12, -0.34, 0.56, ..., 0.78]
(1536 dimensions)
Texts with similar meanings produce vectors that point in similar directions.
Use cases
| Use | Example |
|---|---|
| Semantic search | "italian food" finds "pizza" |
| Recommendations | Similar products |
| RAG | Find relevant context for LLMs |
| Classification | Group texts by topic |
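As an illustration of the classification row above, here is a minimal sketch that assigns a text to the topic whose reference vector is most similar. The 3-dimensional vectors are made-up stand-ins for real embeddings, and the `classify` and `cosine` names are assumptions for this example only.

```typescript
// Toy vectors stand in for real embeddings (assumption for illustration).
type Labeled = { label: string; vector: number[] }

function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Return the label whose reference vector is most similar to the input.
function classify(vector: number[], topics: Labeled[]): string {
  let best = topics[0]
  for (const topic of topics) {
    if (cosine(vector, topic.vector) > cosine(vector, best.vector)) best = topic
  }
  return best.label
}

const topics: Labeled[] = [
  { label: 'food', vector: [0.9, 0.1, 0.0] },
  { label: 'finance', vector: [0.1, 0.9, 0.1] },
]

const pizzaReview = [0.8, 0.2, 0.1] // hypothetical embedding of a pizza review
console.log(classify(pizzaReview, topics)) // "food"
```

In a real system the reference vectors would come from an embedding model, for example by averaging the embeddings of a few labeled texts per topic.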
Embedding models
| Model | Dimensions | Provider |
|---|---|---|
| text-embedding-3-small | 1536 | OpenAI |
| text-embedding-3-large | 3072 | OpenAI |
| voyage-3 | 1024 | Voyage AI |
| all-MiniLM-L6-v2 | 384 | HuggingFace (local) |
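The dimension count directly affects storage. The sketch below estimates raw vector size for the models in the table, assuming each dimension is stored as a 32-bit float (4 bytes) and ignoring index and metadata overhead. The `storageBytes` helper is hypothetical.

```typescript
// Bytes needed to store `count` vectors as 32-bit floats (4 bytes per dimension).
// Assumption: no compression, no index overhead.
function storageBytes(count: number, dimensions: number): number {
  return count * dimensions * 4
}

const models = [
  { name: 'text-embedding-3-small', dimensions: 1536 },
  { name: 'text-embedding-3-large', dimensions: 3072 },
  { name: 'voyage-3', dimensions: 1024 },
  { name: 'all-MiniLM-L6-v2', dimensions: 384 },
]

for (const model of models) {
  const gigabytes = storageBytes(1000000, model.dimensions) / 1e9
  console.log(`${model.name}: ${gigabytes.toFixed(2)} GB per million vectors`)
}
```

Under these assumptions, one million 1536-dimensional vectors take about 6.14 GB, so a smaller model can be a reasonable trade-off when storage or search speed matters more than accuracy.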
Example with OpenAI
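import OpenAI from 'openai'

// By default the client reads the API key from the OPENAI_API_KEY environment variable
const openai = new OpenAI()

const response = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'Your text here'
})

// One embedding is returned per input; take the first
const vector = response.data[0].embedding
// [0.12, -0.34, 0.56, ...]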
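The embeddings endpoint also accepts an array of inputs, so many texts can be embedded in one request. The following sketch batches texts before embedding them. The `EmbedFn` type, `toBatches`, `embedAll`, and the batch size of 100 are assumptions for illustration, not part of the OpenAI SDK.

```typescript
// `EmbedFn` is a hypothetical wrapper: it takes a batch of texts and
// resolves to one vector per text, in the same order.
type EmbedFn = (inputs: string[]) => Promise<number[][]>

// Split an array into consecutive chunks of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}

// Embed all texts, one request per batch, preserving input order.
async function embedAll(
  texts: string[],
  embed: EmbedFn,
  batchSize = 100 // illustrative default, not a documented limit
): Promise<number[][]> {
  const vectors: number[][] = []
  for (const batch of toBatches(texts, batchSize)) {
    vectors.push(...(await embed(batch)))
  }
  return vectors
}
```

With the client from the example above, `embed` could be written as `async (inputs) => (await openai.embeddings.create({ model: 'text-embedding-3-small', input: inputs })).data.map(d => d.embedding)`.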
Cosine similarity
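Measures how closely two vectors point in the same direction. The score ranges from -1 (opposite) to 1 (same direction); higher means more similar: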
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
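The function above assumes two non-zero vectors of equal length. A slightly more defensive variant, sketched here under the hypothetical name `safeCosineSimilarity`, rejects mismatched dimensions and avoids dividing by zero. The small vectors show the three reference values of the score.

```typescript
// Sketch: cosine similarity with input checks (the name is an assumption).
function safeCosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`)
  }
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB)
  // A zero vector has no direction; returning 0 is a convention chosen for this sketch.
  return denominator === 0 ? 0 : dot / denominator
}

console.log(safeCosineSimilarity([1, 0], [1, 0]))  // 1: same direction
console.log(safeCosineSimilarity([1, 0], [0, 1]))  // 0: orthogonal
console.log(safeCosineSimilarity([1, 0], [-1, 0])) // -1: opposite direction
console.log(safeCosineSimilarity([0, 0], [1, 0]))  // 0: zero-vector convention
```

The dimension check matters in practice because vectors from different models (for example 1536 and 384 dimensions) cannot be compared with each other.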
Vector databases
To store and search embeddings efficiently:
| DB | Type | Ideal for |
|---|---|---|
| Pinecone | Cloud | Production |
| Supabase pgvector | Cloud | Full-stack |
| ChromaDB | Local | Development |
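To make the idea concrete, here is a minimal in-memory sketch of what a vector database does at its core: store vectors by id and return the closest ones to a query. The `InMemoryVectorStore` class and its method names are hypothetical and do not mirror the API of any product in the table. Real databases typically replace this brute-force scan with approximate nearest-neighbor indexes to stay fast at scale.

```typescript
type Entry = { id: string; vector: number[] }
type Match = { id: string; score: number }

function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Hypothetical store for illustration: brute-force search over all entries.
class InMemoryVectorStore {
  private entries: Entry[] = []

  // Insert a vector, or overwrite the one already stored under this id.
  upsert(id: string, vector: number[]): void {
    const existing = this.entries.find((e) => e.id === id)
    if (existing) existing.vector = vector
    else this.entries.push({ id, vector })
  }

  // Return the `topK` entries most similar to the query, best first.
  query(vector: number[], topK: number): Match[] {
    return this.entries
      .map((e) => ({ id: e.id, score: cosine(vector, e.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK)
  }
}

const store = new InMemoryVectorStore()
store.upsert('pizza', [0.9, 0.1])  // toy 2-dimensional vectors
store.upsert('pasta', [0.8, 0.3])
store.upsert('stocks', [0.1, 0.9])

console.log(store.query([1, 0], 2).map((m) => m.id)) // ["pizza", "pasta"]
```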
🎯 Real Case: Anomaly Detection
Embeddings can help detect unusual behavior in fintech: a transaction whose vector lies far from a user's normal profile is a candidate for review.
Example: Suspicious Transactions
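// embedTransaction, flagForReview, and notifyUser are application helpers not shown here.
// embedTransaction is assumed to return a number[] embedding.

// Vectorize the user's "normal" behavior
const normalProfile = await embedTransaction({
  averageAmount: 150,
  usualHours: '9am-6pm',
  locations: ['New York', 'Boston'],
  frequentMerchants: ['Amazon', 'Uber', 'Starbucks']
})

// New transaction (keep the raw record so it can be flagged later)
const newTransaction = {
  amount: 5000,
  time: '3:47am',
  location: 'Lagos, Nigeria',
  merchant: 'CryptoExchange_xyz'
}
const transaction = await embedTransaction(newTransaction)

// Calculate similarity
const similarity = cosineSimilarity(normalProfile, transaction)

// 0.3 is an illustrative threshold; tune it against labeled data
if (similarity < 0.3) {
  await flagForReview(newTransaction) // 🚨 Manual review
  await notifyUser('We detected unusual activity')
}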
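`embedTransaction` is not defined above. One possible first step, sketched here as an assumption, is to turn the structured record into a sentence that a text embedding model can consume. The `Transaction` type, the `describeTransaction` helper, and the sentence template are all hypothetical.

```typescript
// Hypothetical shape, matching the fields used in the example above.
type Transaction = {
  amount: number
  time: string
  location: string
  merchant: string
}

// Turn a transaction record into a sentence for a text embedding model.
// The wording of the template is an arbitrary choice for this sketch.
function describeTransaction(tx: Transaction): string {
  return `Payment of $${tx.amount} at ${tx.time} in ${tx.location} to ${tx.merchant}`
}

console.log(
  describeTransaction({
    amount: 5000,
    time: '3:47am',
    location: 'Lagos, Nigeria',
    merchant: 'CryptoExchange_xyz',
  })
)
// "Payment of $5000 at 3:47am in Lagos, Nigeria to CryptoExchange_xyz"
```

The resulting string would then be passed to an embedding model. Text models may not represent numeric magnitudes faithfully, so amounts are often better handled as separate numeric features.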
Fintech Applications
| Case | Input | Output |
|---|---|---|
| Fraud detection | History + new transaction | Risk score 0-100 |
| Credit scoring | Profile data | Similarity to "good" profiles |
| Automated KYC | Documents + behavior | Identity match |
| Segmentation | User history | Behavior cluster |
💡 Embeddings can capture patterns that hand-written rules tend to miss.
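The "Risk score 0-100" output in the table can be derived from a similarity value in several ways. The sketch below uses one simple linear mapping, chosen here purely for illustration: a similarity of 1 maps to risk 0 and a similarity of -1 maps to risk 100. The `riskScore` function is hypothetical.

```typescript
// Map a cosine similarity in [-1, 1] to a 0-100 risk score.
// Assumption: a linear mapping; a real system would calibrate this on labeled data.
function riskScore(similarity: number): number {
  // Guard against values slightly outside [-1, 1] from floating-point error.
  const clamped = Math.max(-1, Math.min(1, similarity))
  return Math.round(((1 - clamped) / 2) * 100)
}

console.log(riskScore(1))   // 0: identical to the normal profile
console.log(riskScore(0))   // 50
console.log(riskScore(-1))  // 100: opposite of the normal profile
```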
Practice
→ Vector Search
Useful links
- OpenAI Embeddings
- Pinecone Docs