RAG Architecture Diagram Template

Diagram a RAG retrieval pipeline — query embedding, vector database, retriever, reranker, and LLM, plus the offline ingestion flow.


What you get

Online query flow: query embedding, retriever, reranker, LLM
Offline ingestion: documents, chunker, embedding model, index write
Vector database at the center where both flows meet

What this template is for

A RAG (Retrieval-Augmented Generation) architecture diagram shows how a language model answers a question using your own data instead of relying only on what it memorized during training. This template lays out the two flows every RAG system has: the online query flow — user question → query embedding → vector search → reranking → LLM generation — and the offline ingestion flow that prepares the knowledge base, where raw documents are chunked, embedded, and written into a vector database. Use it to design a new retrieval system, document an existing one for a design review, or explain to stakeholders exactly where your RAG pipeline retrieves, ranks, and generates.
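
As a rough illustration of how the two flows meet at the vector database, here is a minimal sketch in Python. Everything in it is a stand-in chosen for this example: `embed` is a toy term-count vector rather than a real embedding model, `VectorStore` is an in-memory list rather than a real vector database, `rerank` scores by simple term overlap, and `generate` only builds the prompt that a real LLM call would receive. All function and class names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy stand-in for an embedding model: a sparse term-count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """In-memory stand-in for the vector database both flows share."""

    def __init__(self):
        self.rows = []  # list of (vector, chunk text)

    def write(self, vector, chunk_text):
        self.rows.append((vector, chunk_text))

    def search(self, query_vector, k):
        ranked = sorted(self.rows, key=lambda r: cosine(query_vector, r[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def chunk(document, size=8):
    """Split a document into fixed-size word windows (an assumed strategy)."""
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def ingest(documents, store):
    """Offline ingestion: documents -> chunker -> embedding -> index write."""
    for document in documents:
        for piece in chunk(document):
            store.write(embed(piece), piece)


def rerank(question, candidates, top_n):
    """Toy reranker: order candidates by shared terms with the question."""
    q_terms = set(tokenize(question))
    return sorted(candidates, key=lambda c: len(q_terms & set(tokenize(c))), reverse=True)[:top_n]


def generate(question, context):
    """Placeholder for the LLM call: returns the prompt it would receive."""
    return "Context:\n" + "\n".join(context) + "\nQuestion: " + question


def answer(question, store, k=3, top_n=1):
    """Online query flow: query embedding -> retriever -> reranker -> LLM."""
    candidates = store.search(embed(question), k)
    context = rerank(question, candidates, top_n)
    return generate(question, context)
```

Calling `ingest(docs, store)` once and then `answer(question, store)` per request mirrors the split the diagram shows: ingestion runs ahead of time, while the query path runs on every question.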

When to use this template

Design the retrieval pipeline for a new document Q&A system before writing any code.
Document an existing RAG system's query path for a technical design review.
Explain the difference between the online query flow and the offline ingestion flow to a new engineer.
Decide where to add a reranker and show how it changes retrieval precision.
Trace latency by walking the full path from user question to generated answer.
Compare a simple RAG setup against an advanced one (hybrid search, query rewriting) by editing the diagram.
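
When comparing a simple setup against one with hybrid search, it can help to see what the extra box in the diagram might do. One common way to merge a vector-search ranking with a keyword-search ranking is reciprocal rank fusion; the sketch below assumes that technique, which the template itself does not prescribe. The function name, the document IDs, and the constant `k=60` (a conventional default) are illustrative choices.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so items ranked highly by more than one retriever rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["a", "b", "c"]   # hypothetical result of semantic search
keyword_hits = ["c", "a", "d"]  # hypothetical result of keyword search
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

In a diagram, this fusion step would sit between the two retrievers and the reranker, so the reranker only sees one merged candidate list.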

How to use it

1. Start at the top with the user and the application or chat UI that receives the question.
2. Add the orchestrator (LangChain, LlamaIndex), which coordinates embedding, retrieval, and the LLM call.
3. Draw the online query flow: query embedding → retriever → reranker.
4. Add the vector database below the retriever and connect them; this is where semantic search happens.
5. Connect the orchestrator to the LLM that generates the final answer from retrieved context.
6. Add the offline ingestion flow: documents → chunker → embedding model → index write.
7. Connect the ingestion index step into the vector database to show how the knowledge base is built.
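
The steps above can also be written down as a list of edges, which makes it easy to check that the finished diagram is connected the way you intended. This is only a sketch: the node names are hypothetical, the arrow directions and the edge from the orchestrator to the query embedding step are assumptions, and the Mermaid output is one possible rendering target rather than the format this template necessarily uses.

```python
from collections import deque

# Edges follow the steps above. Arrow directions, and the edge from the
# orchestrator to the query embedding step, are assumptions for illustration.
EDGES = [
    ("user", "app"),
    ("app", "orchestrator"),
    ("orchestrator", "query_embedding"),
    ("query_embedding", "retriever"),
    ("retriever", "reranker"),
    ("retriever", "vector_db"),
    ("orchestrator", "llm"),
    ("documents", "chunker"),
    ("chunker", "embedding_model"),
    ("embedding_model", "index_write"),
    ("index_write", "vector_db"),
]


def reachable(start, goal, edges=EDGES):
    """Breadth-first search: True if goal can be reached from start."""
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, []).append(dst)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in neighbors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def to_mermaid(edges=EDGES):
    """Render the edge list as Mermaid flowchart text (top-down layout)."""
    lines = ["flowchart TD"]
    lines += [f"    {src} --> {dst}" for src, dst in edges]
    return "\n".join(lines)
```

With these edges, `reachable("user", "llm")` and `reachable("documents", "vector_db")` both hold, while `reachable("documents", "llm")` does not, which matches the idea that ingestion only feeds the vector database and never calls the LLM directly.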