Retrieval Augmented Generation
RAG = search relevant info + give it to the LLM = precise answers with sources
Why RAG?
| Without RAG | With RAG |
|---|---|
| LLM only knows what it learned in training | LLM accesses YOUR documents |
| May invent answers ("hallucinate") | Cites real sources |
| Static knowledge | Up-to-date information |
RAG Architecture
┌─────────────────┐
│  Your question  │
└────────┬────────┘
         ↓
┌────────────────────────────────────────────┐
│ 1. RETRIEVAL                               │
│    Query → Vector DB → Top-K documents     │
└────────┬───────────────────────────────────┘
         ↓
┌────────────────────────────────────────────┐
│ 2. AUGMENTATION                            │
│    Prompt + document context               │
└────────┬───────────────────────────────────┘
         ↓
┌────────────────────────────────────────────┐
│ 3. GENERATION                              │
│    LLM generates response with context     │
└────────────────────────────────────────────┘
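In code, the three steps collapse into a few lines. A minimal sketch, assuming a populated `vector_db` and an `llm` client already exist (both hypothetical names here, loosely following LangChain's interface):

# Minimal RAG loop: retrieve -> augment -> generate
def rag_answer(question: str, k: int = 3) -> str:
    # 1. RETRIEVAL: top-K most similar chunks for the question
    chunks = vector_db.similarity_search(question, k=k)
    # 2. AUGMENTATION: inject the chunks into the prompt
    context = "\n\n".join(c.page_content for c in chunks)
    prompt = f"Answer using ONLY this context:\n\n{context}\n\nQUESTION: {question}"
    # 3. GENERATION: the LLM answers grounded in the retrieved context
    return llm.invoke(prompt)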
Indexing flow
# 1. Load documents
docs = load_pdfs("./docs/")
# 2. Chunking (split into parts)
chunks = split_text(docs, chunk_size=500)
# 3. Embeddings
embeddings = model.embed(chunks)
# 4. Save to vector DB
vector_db.add(embeddings, chunks)
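One concrete way to run this flow, assuming LangChain with pypdf and faiss-cpu installed (the library choices are assumptions; any loader/splitter/vector store combination works):

from langchain.document_loaders import PyPDFDirectoryLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# 1. Load every PDF in the folder
docs = PyPDFDirectoryLoader("./docs/").load()
# 2. Chunking: ~500 characters with a small overlap so ideas aren't cut in half
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
# 3 + 4. Embeddings + vector DB in one step: FAISS embeds each chunk as it indexes
vector_db = FAISS.from_documents(chunks, OpenAIEmbeddings())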
Chunking strategies
| Strategy | When to use |
|---|---|
| Fixed size | Simple documents |
| Sentence | Natural text |
| Semantic | High precision |
| Recursive | Long documents |
RAG prompt template
Answer using ONLY information from the context.
If not in context, say "I don't have that information".
CONTEXT:
{relevant_chunks}
QUESTION: {user_question}
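Filling the template is plain string formatting. A sketch, assuming `results` holds the documents returned by a similarity search:

RAG_PROMPT = """Answer using ONLY information from the context.
If not in context, say "I don't have that information".

CONTEXT:
{relevant_chunks}

QUESTION: {user_question}"""

prompt = RAG_PROMPT.format(
    relevant_chunks="\n\n".join(doc.page_content for doc in results),
    user_question="What are the limits without additional KYC?",
)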
Quality metrics
| Metric | What it measures |
|---|---|
| Relevance | Were the correct chunks retrieved? |
| Faithfulness | Is the response grounded in the context? |
| Answer quality | Is the response actually useful? |
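Relevance is the easiest to check without extra tooling: a hand-labeled set of questions plus the document each one should retrieve. A minimal sketch (the questions and files are made up for illustration; `vector_db` is the store from the indexing flow):

# question -> source file that must appear in the top-K results
labeled = {
    "What are the KYC limits?": "kyc_manual.pdf",
    "How do we escalate suspicious transactions?": "aml_policy.pdf",
}

hits = 0
for question, expected in labeled.items():
    docs = vector_db.similarity_search(question, k=3)
    if any(d.metadata.get("source", "").endswith(expected) for d in docs):
        hits += 1

print(f"Hit rate @ 3: {hits / len(labeled):.0%}")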
🏦 Fintech Case: Compliance Documents
Regulations (PCI DSS, SOC 2, AML) generate hundreds of PDFs. RAG lets employees query them in natural language:
# Index compliance documents
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Qdrant

# Load AML policy, KYC manual, internal procedures
docs = []
for pdf in ["aml_policy.pdf", "kyc_manual.pdf", "fraud_procedures.pdf"]:
    docs.extend(PyPDFLoader(f"compliance/{pdf}").load())

# Index (PyPDFLoader already attaches "source" and "page" metadata to each page)
vectorstore = Qdrant.from_documents(
    docs,
    OpenAIEmbeddings(),
    location=":memory:",  # swap for your Qdrant server URL in production
    collection_name="compliance",
)

# Employee query
query = "What are the maximum limits for transactions without additional KYC verification?"
results = vectorstore.similarity_search(query, k=3)
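Because PyPDFLoader stores source and page in each document's metadata, citing the evidence is just a matter of reading it back:

# Show each hit with the file and page it came from
for doc in results:
    print(f"{doc.metadata['source']} (page {doc.metadata['page']})")
    print(doc.page_content[:200], "...")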
Why it's valuable in Fintech
| Without RAG | With RAG |
|---|---|
| Manually search 50 PDFs | Natural language query |
| "I don't know where that policy is" | Answer + exact source |
| Employees make up answers | Based on real documents |
| Auditor asks for evidence โ panic | Direct link to paragraph |
Security considerations
# ALWAYS include the source for the audit trail
response = {
    "answer": "...",
    "sources": [
        {"doc": "aml_policy.pdf", "page": 12, "section": "4.2"},
        {"doc": "kyc_manual.pdf", "page": 5, "section": "2.1"},
    ],
    "confidence": 0.92,
}

# If there's no confident match, DON'T make things up
if response["confidence"] < 0.7:
    response["answer"] = "I didn't find specific information. Please consult with Compliance."
💡 RAG for compliance reduces search time from hours to seconds, and always cites the source.
Practice
→ RAG with PDF Documents