Retrieval Augmented Generation
RAG = search relevant info + give it to the LLM = precise answers with sources
Why RAG?
| Without RAG | With RAG |
|---|---|
| LLM only knows what it learned in training | LLM accesses YOUR documents |
| May invent answers ("hallucinate") | Cites real sources |
| Static knowledge | Up-to-date information |
RAG Architecture
┌─────────────────┐
│  Your question  │
└────────┬────────┘
         ↓
┌────────────────────────────────────────────┐
│ 1. RETRIEVAL                               │
│    Query → Vector DB → Top-K documents     │
└────────┬───────────────────────────────────┘
         ↓
┌────────────────────────────────────────────┐
│ 2. AUGMENTATION                            │
│    Prompt + document context               │
└────────┬───────────────────────────────────┘
         ↓
┌────────────────────────────────────────────┐
│ 3. GENERATION                              │
│    LLM generates response with context     │
└────────────────────────────────────────────┘
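In code, the three steps collapse into a few lines. A minimal sketch, assuming a populated `vector_db` and an `llm` client already exist (both hypothetical names here, loosely following LangChain's interface):

# Minimal RAG loop: retrieve -> augment -> generate
def rag_answer(question: str, k: int = 3) -> str:
    # 1. RETRIEVAL: top-K most similar chunks for the question
    chunks = vector_db.similarity_search(question, k=k)
    # 2. AUGMENTATION: inject the chunks into the prompt
    context = "\n\n".join(c.page_content for c in chunks)
    prompt = f"Answer using ONLY this context:\n\n{context}\n\nQUESTION: {question}"
    # 3. GENERATION: the LLM answers grounded in the retrieved context
    return llm.invoke(prompt)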
Indexing flow
# 1. Load documents
docs = load_pdfs("./docs/")
# 2. Chunking (split into parts)
chunks = split_text(docs, chunk_size=500)
# 3. Embeddings
embeddings = model.embed(chunks)
# 4. Save to vector DB
vector_db.add(embeddings, chunks)
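One concrete way to run this flow, assuming LangChain with pypdf and faiss-cpu installed (the library choices are assumptions; any loader/splitter/vector store combination works):

from langchain.document_loaders import PyPDFDirectoryLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# 1. Load every PDF in the folder
docs = PyPDFDirectoryLoader("./docs/").load()
# 2. Chunking: ~500 characters with a small overlap so ideas aren't cut in half
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
# 3 + 4. Embeddings + vector DB in one step: FAISS embeds each chunk as it indexes
vector_db = FAISS.from_documents(chunks, OpenAIEmbeddings())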
Chunking strategies
| Strategy | When to use |
|---|---|
| Fixed size | Simple documents |
| Sentence | Natural text |
| Semantic | High precision |
| Recursive | Long documents |
RAG prompt template
Answer using ONLY information from the context.
If not in context, say "I don't have that information".
CONTEXT:
{relevant_chunks}
QUESTION: {user_question}
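Filling the template is plain string formatting. A sketch, assuming `results` holds the documents returned by a similarity search:

RAG_PROMPT = """Answer using ONLY information from the context.
If not in context, say "I don't have that information".

CONTEXT:
{relevant_chunks}

QUESTION: {user_question}"""

prompt = RAG_PROMPT.format(
    relevant_chunks="\n\n".join(doc.page_content for doc in results),
    user_question="What are the limits without additional KYC?",
)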
Quality metrics
| Metric | What it measures |
|---|---|
| Relevance | Were the correct chunks retrieved? |
| Faithfulness | Is the response grounded in the context? |
| Answer quality | Is the response actually useful? |
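Relevance is the easiest to check without extra tooling: a hand-labeled set of questions plus the document each one should retrieve. A minimal sketch (the questions and files are made up for illustration; `vector_db` is the store from the indexing flow):

# question -> source file that must appear in the top-K results
labeled = {
    "What are the KYC limits?": "kyc_manual.pdf",
    "How do we escalate suspicious transactions?": "aml_policy.pdf",
}

hits = 0
for question, expected in labeled.items():
    docs = vector_db.similarity_search(question, k=3)
    if any(d.metadata.get("source", "").endswith(expected) for d in docs):
        hits += 1

print(f"Hit rate @ 3: {hits / len(labeled):.0%}")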
🏦 Fintech Case: Compliance Documents
Regulations (PCI DSS, SOC 2, AML) generate hundreds of PDFs. RAG lets employees query them in natural language:
# Index compliance documents
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Qdrant

# Load AML policy, KYC manual, internal procedures
docs = []
for pdf in ["aml_policy.pdf", "kyc_manual.pdf", "fraud_procedures.pdf"]:
    docs.extend(PyPDFLoader(f"compliance/{pdf}").load())

# Index (PyPDFLoader already attaches "source" and "page" metadata to each page)
vectorstore = Qdrant.from_documents(
    docs,
    OpenAIEmbeddings(),
    location=":memory:",  # swap for your Qdrant server URL in production
    collection_name="compliance",
)

# Employee query
query = "What are the maximum limits for transactions without additional KYC verification?"
results = vectorstore.similarity_search(query, k=3)
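Because PyPDFLoader stores source and page in each document's metadata, citing the evidence is just a matter of reading it back:

# Show each hit with the file and page it came from
for doc in results:
    print(f"{doc.metadata['source']} (page {doc.metadata['page']})")
    print(doc.page_content[:200], "...")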
Why it's valuable in Fintech
| Without RAG | With RAG |
|---|---|
| Manually search 50 PDFs | Natural language query |
| "I don't know where that policy is" | Answer + exact source |
| Employees make up answers | Based on real documents |
| Auditor asks for evidence โ panic | Direct link to paragraph |
Security considerations
# ALWAYS include the source for the audit trail
response = {
    "answer": "...",
    "sources": [
        {"doc": "aml_policy.pdf", "page": 12, "section": "4.2"},
        {"doc": "kyc_manual.pdf", "page": 5, "section": "2.1"},
    ],
    "confidence": 0.92,
}

# If there's no confident match, DON'T make things up
if response["confidence"] < 0.7:
    response["answer"] = "I didn't find specific information. Please consult with Compliance."
💡 RAG for compliance reduces search time from hours to seconds, and always cites the source.
Practice
→ RAG with PDF Documents