🧮

Embeddings

๐Ÿง‘โ€๐Ÿณ Cook

Text as vectors

Embeddings convert text into numbers (vectors). This allows searching by meaning, not just exact words.


What is an embedding?

"Hello world" → [0.12, -0.34, 0.56, ..., 0.78]
                     (1536 dimensions with text-embedding-3-small)

Texts with similar meanings produce vectors that are close to each other.


Use cases

| Use | Example |
|---|---|
| Semantic search | "italian food" finds "pizza" |
| Recommendations | Similar products |
| RAG (retrieval-augmented generation) | Find relevant context for LLMs |
| Classification | Group texts by topic |
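To make the classification row concrete, here is a minimal nearest-centroid sketch: average the vectors of a few labeled examples per topic, then assign a new text to the topic whose average is most similar. The 3-dimensional vectors and the helpers `centroid` and `classify` are hypothetical stand-ins; real vectors would come from an embedding model and have hundreds or thousands of dimensions.

```typescript
type Vec = number[]

// Cosine similarity: dot product divided by the product of the vector lengths
function cosineSimilarity(a: Vec, b: Vec): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Hypothetical helper: average several vectors element by element into one "centroid"
function centroid(vectors: Vec[]): Vec {
  const sum = new Array<number>(vectors[0].length).fill(0)
  for (const v of vectors) {
    for (let i = 0; i < v.length; i++) sum[i] += v[i]
  }
  return sum.map((x) => x / vectors.length)
}

// Hypothetical helper: pick the topic whose centroid is most similar to the vector
function classify(vector: Vec, topics: Record<string, Vec[]>): string {
  let bestTopic = ''
  let bestScore = -Infinity
  for (const [topic, examples] of Object.entries(topics)) {
    const score = cosineSimilarity(vector, centroid(examples))
    if (score > bestScore) {
      bestScore = score
      bestTopic = topic
    }
  }
  return bestTopic
}

// Hypothetical labeled examples (toy 3-d vectors instead of real embeddings)
const topics: Record<string, Vec[]> = {
  food: [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
  sports: [[0.1, 0.9, 0.1], [0.0, 0.8, 0.2]],
}

console.log(classify([0.85, 0.15, 0.05], topics)) // food
```

Nearest-centroid is only one simple way to group texts by topic; it works best when each topic's examples cluster around a single direction.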

Embedding models

| Model | Dimensions | Provider |
|---|---|---|
| text-embedding-3-small | 1536 | OpenAI |
| text-embedding-3-large | 3072 | OpenAI |
| voyage-3 | 1024 | Voyage AI |
| all-MiniLM-L6-v2 | 384 | Hugging Face (runs locally) |
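The number of dimensions drives storage cost. A rough sketch, assuming each dimension is stored as a 32-bit float (4 bytes) and ignoring index and metadata overhead; `storageMB` is a hypothetical helper:

```typescript
// Assumption: one 32-bit float (4 bytes) per dimension, no index/metadata overhead
const BYTES_PER_FLOAT32 = 4

// Dimensions taken from the model table
const modelDimensions: Record<string, number> = {
  'text-embedding-3-small': 1536,
  'text-embedding-3-large': 3072,
  'voyage-3': 1024,
  'all-MiniLM-L6-v2': 384,
}

// Hypothetical helper: raw storage for `vectorCount` vectors, in megabytes (10^6 bytes)
function storageMB(model: string, vectorCount: number): number {
  const dims = modelDimensions[model]
  if (dims === undefined) throw new Error(`Unknown model: ${model}`)
  return (dims * BYTES_PER_FLOAT32 * vectorCount) / 1_000_000
}

for (const [model, dims] of Object.entries(modelDimensions)) {
  console.log(`${model} (${dims} dims): ${storageMB(model, 1_000_000)} MB per million vectors`)
}
```

Under this assumption, one million text-embedding-3-small vectors take about 6 GB of raw storage, four times as much as the same number of all-MiniLM-L6-v2 vectors.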

Example with OpenAI

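import OpenAI from 'openai'

// The client reads the API key from the OPENAI_API_KEY environment variable
const openai = new OpenAI()

// Top-level await requires an ES module (or wrap this in an async function)
const response = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'Your text here'
})

// One embedding per input; text-embedding-3-small returns 1536 numbers by default
const vector = response.data[0].embedding
// [0.12, -0.34, 0.56, ...]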

Cosine similarity

Measures how similar two vectors are by the angle between them, from -1 (opposite directions) through 0 (unrelated) to 1 (same direction):

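// Cosine similarity: dot product divided by the product of the vector lengths
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same number of dimensions')
  }
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]     // sum of element-wise products
    normA += a[i] * a[i]   // squared length of a
    normB += b[i] * b[i]   // squared length of b
  }
  // Undefined for a zero vector; this version returns 0 by convention
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}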
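A quick sanity check of the range, using tiny hand-picked vectors: same direction gives 1, perpendicular gives 0, opposite gives -1. The last lines assume unit-length vectors (OpenAI documents its embeddings as normalized to length 1); for those, a plain dot product returns the same value as cosine similarity. `normalize` and `dotProduct` are illustrative helpers, not library functions.

```typescript
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Illustrative helper: scale a vector to length 1
function normalize(v: number[]): number[] {
  const length = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0))
  return v.map((x) => x / length)
}

// Illustrative helper: plain dot product
function dotProduct(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0)
}

console.log(cosineSimilarity([1, 0], [3, 0]))  // 1 (same direction, length ignored)
console.log(cosineSimilarity([1, 0], [0, 1]))  // 0 (perpendicular)
console.log(cosineSimilarity([1, 0], [-1, 0])) // -1 (opposite)

// For unit-length vectors, the dot product alone gives the cosine similarity
console.log(dotProduct(normalize([3, 4]), normalize([4, 3]))) // ≈ 0.96
console.log(cosineSimilarity([3, 4], [4, 3]))                 // ≈ 0.96
```

Skipping the two square roots is why many vector databases offer a dot-product (inner product) metric for pre-normalized embeddings.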

Vector databases

To store and search embeddings efficiently:

| Database | Type | Ideal for |
|---|---|---|
| Pinecone | Cloud | Production |
| Supabase pgvector | Cloud | Full-stack |
| ChromaDB | Local | Development |
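The core operation all of these databases provide is nearest-neighbor search. This hypothetical in-memory store does it by brute force, scoring every stored vector, which is fine for small datasets; the databases above add persistence and (typically approximate) nearest-neighbor indexes so queries stay fast at scale. The toy 3-dimensional vectors stand in for real embeddings.

```typescript
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Hypothetical brute-force store: exact search over every stored vector
class InMemoryVectorStore {
  private items = new Map<string, number[]>()

  // Insert a vector, or replace the one already stored under this id
  upsert(id: string, vector: number[]): void {
    this.items.set(id, vector)
  }

  // Score every stored vector against the query and return the k best matches
  query(vector: number[], k: number): { id: string; score: number }[] {
    const scored = Array.from(this.items, ([id, v]) => ({
      id,
      score: cosineSimilarity(vector, v),
    }))
    scored.sort((x, y) => y.score - x.score)
    return scored.slice(0, k)
  }
}

const store = new InMemoryVectorStore()
store.upsert('pizza', [0.9, 0.1, 0.0])
store.upsert('pasta', [0.8, 0.2, 0.1])
store.upsert('football', [0.1, 0.9, 0.1])

// Pretend this toy vector is the embedding of "italian food"
const results = store.query([0.85, 0.15, 0.05], 2)
console.log(results.map((r) => r.id).join(', ')) // pizza, pasta
```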

🎯 Real-World Case: Anomaly Detection

Embeddings can help detect unusual behavior in fintech: a transaction whose vector is far from a user's usual pattern becomes a candidate for review.

Example: Suspicious Transactions

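// Hypothetical helpers (not a library API): embedTransaction() returns an
// embedding vector (number[]) for a record; flagForReview() and notifyUser()
// stand in for your own review and notification logic.

// Vectorize the user's "normal" behavior
const normalProfile = await embedTransaction({
  averageAmount: 150,
  usualHours: '9am-6pm',
  locations: ['New York', 'Boston'],
  frequentMerchants: ['Amazon', 'Uber', 'Starbucks']
})

// New transaction: keep the raw record so reviewers can see the details
const transaction = {
  amount: 5000,
  time: '3:47am',
  location: 'Lagos, Nigeria',
  merchant: 'CryptoExchange_xyz'
}
const transactionVector = await embedTransaction(transaction)

// Calculate similarity between the usual profile and the new transaction
const similarity = cosineSimilarity(normalProfile, transactionVector)

// 0.3 is an illustrative threshold; tune it on real data
if (similarity < 0.3) {
  await flagForReview(transaction) // 🚨 Manual review
  await notifyUser('We detected unusual activity')
}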
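The suspicious-transaction example compares against a single embedded profile. A variation, sketched here under assumptions: build the "normal" profile as the average of the user's past transaction vectors, then flag a new vector whose cosine similarity to that average falls below a threshold. The toy vectors, the 0.3 default, and the helpers `meanVector` and `isAnomalous` are hypothetical.

```typescript
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Hypothetical helper: element-wise average of past transaction vectors
function meanVector(vectors: number[][]): number[] {
  const sum = new Array<number>(vectors[0].length).fill(0)
  for (const v of vectors) {
    for (let i = 0; i < v.length; i++) sum[i] += v[i]
  }
  return sum.map((x) => x / vectors.length)
}

// Hypothetical helper: flag when the new vector points away from average behavior.
// The 0.3 default matches the illustrative threshold used earlier; it is not tuned.
function isAnomalous(history: number[][], candidate: number[], threshold = 0.3): boolean {
  return cosineSimilarity(meanVector(history), candidate) < threshold
}

// Toy vectors standing in for embeddings of the user's past transactions
const history = [
  [0.9, 0.1, 0.1],
  [0.8, 0.2, 0.0],
  [0.85, 0.15, 0.05],
]

console.log(isAnomalous(history, [0.82, 0.18, 0.04])) // false (close to usual behavior)
console.log(isAnomalous(history, [0.0, 0.1, 0.95]))   // true (very different direction)
```

Averaging many real transactions smooths out one-off purchases, at the cost of blurring users who have several distinct "normal" patterns.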

Fintech Applications

| Case | Input | Output |
|---|---|---|
| Fraud detection | History + new transaction | Risk score 0-100 |
| Credit scoring | Profile data | Similarity to "good" profiles |
| Automated KYC | Documents + behavior | Identity match |
| Segmentation | User history | Behavior cluster |
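The fraud detection row outputs a 0-100 risk score. One hypothetical way to get there from a similarity is a linear mapping of the cosine range [-1, 1] onto [100, 0], so identical behavior scores 0 and opposite behavior scores 100. `riskScore` is illustrative only; a production system would more likely calibrate scores against labeled fraud data.

```typescript
// Hypothetical mapping: cosine similarity (-1..1) → risk score (100..0)
function riskScore(similarity: number): number {
  // Clamp in case of floating-point values slightly outside [-1, 1]
  const clamped = Math.min(1, Math.max(-1, similarity))
  return Math.round(((1 - clamped) / 2) * 100)
}

console.log(riskScore(1))    // 0   (matches usual behavior)
console.log(riskScore(0))    // 50  (unrelated)
console.log(riskScore(-1))   // 100 (opposite behavior)
console.log(riskScore(0.96)) // 2
```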

💡 Embeddings can capture patterns that fixed, hand-written rules are likely to miss.


Practice

→ Vector Search


Useful links