RAG Architecture Diagram Template
Diagram a RAG retrieval pipeline — query embedding, vector database, retriever, reranker, and LLM, plus the offline ingestion flow.
What you get
- Online query flow: query embedding, retriever, reranker, LLM
- Offline ingestion: documents, chunker, embedding model, index write
- Vector database at the center where both flows meet
What this template is for
A RAG (Retrieval-Augmented Generation) architecture diagram shows how a language model answers a question using your own data instead of relying only on what it memorized during training. This template lays out the two flows every RAG system has: the online query flow — user question → query embedding → vector search → reranking → LLM generation — and the offline ingestion flow that prepares the knowledge base, where raw documents are chunked, embedded, and written into a vector database. Use it to design a new retrieval system, document an existing one for a design review, or explain to stakeholders exactly where your RAG pipeline retrieves, ranks, and generates.
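To make the two flows concrete, here is a minimal sketch in plain Python, assuming a toy bag-of-words "embedding", an in-memory list as the vector database, and a placeholder in place of the LLM call. Every name in it (`embed`, `VectorStore`, `ingest`, `rerank`, `generate`, `answer`) is hypothetical and chosen only to mirror the boxes in the diagram; a real system would swap in an actual embedding model, vector database, reranker, and LLM.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def chunk(document, max_words=12):
    """Split a document into fixed-size word windows."""
    words = document.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

class VectorStore:
    """In-memory stand-in for the vector database where both flows meet."""
    def __init__(self):
        self.rows = []  # list of (vector, chunk text)

    def write(self, vector, text):
        self.rows.append((vector, text))

    def search(self, query_vector, k):
        ranked = sorted(self.rows, key=lambda row: cosine(query_vector, row[0]), reverse=True)
        return [text for _, text in ranked[:k]]

# Offline ingestion flow: documents -> chunker -> embedding model -> index write
def ingest(documents, store):
    for document in documents:
        for piece in chunk(document):
            store.write(embed(piece), piece)

def rerank(question, candidates, top_n):
    """Toy reranker: order candidates by the number of distinct terms shared with the question."""
    question_terms = set(embed(question))
    return sorted(candidates, key=lambda c: len(question_terms & set(embed(c))), reverse=True)[:top_n]

def generate(question, context):
    """Placeholder for the LLM call: returns the prompt an LLM would receive."""
    return "Context:\n" + "\n".join(context) + "\nQuestion: " + question

# Online query flow: question -> query embedding -> vector search -> reranking -> generation
def answer(question, store, k=3, top_n=1):
    candidates = store.search(embed(question), k)
    context = rerank(question, candidates, top_n)
    return generate(question, context)

store = VectorStore()
ingest(
    [
        "Refunds are issued within 14 days of purchase.",
        "Support is available on weekdays from 9 to 5.",
    ],
    store,
)
prompt = answer("When are refunds issued?", store)
print(prompt)
```

The only component the two flows share is `store`: ingestion writes to it and the query path reads from it, which is why the vector database sits at the center of the diagram.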
When to use this template
- Design the retrieval pipeline for a new document Q&A system before writing any code.
- Document an existing RAG system's query path for a technical design review.
- Explain the difference between the online query flow and the offline ingestion flow to a new engineer.
- Decide where to add a reranker and show how it changes retrieval precision.
- Trace latency by walking the full path from user question to generated answer.
- Compare a simple RAG setup against an advanced one (hybrid search, query rewriting) by editing the diagram.
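For the latency use case, one way to walk the query path is to time each stage separately. The sketch below assumes the four online stages are available as plain callables; `stage`, `traced_query`, and the lambda stubs are hypothetical names used for illustration, not part of any framework.

```python
import time
from contextlib import contextmanager

timings = {}  # stage name -> elapsed seconds, in the order the stages ran

@contextmanager
def stage(name):
    """Record how long the wrapped block takes under the given stage name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start

def traced_query(question, embed, retrieve, rerank, generate):
    """Run the online query flow, timing each box in the diagram."""
    with stage("query_embedding"):
        vector = embed(question)
    with stage("retrieval"):
        candidates = retrieve(vector)
    with stage("reranking"):
        context = rerank(question, candidates)
    with stage("generation"):
        return generate(question, context)

# Stub stages standing in for real components (assumptions for the example).
result = traced_query(
    "When are refunds issued?",
    embed=lambda q: [float(len(q))],
    retrieve=lambda v: ["chunk A", "chunk B"],
    rerank=lambda q, c: c[:1],
    generate=lambda q, ctx: f"answer based on {ctx[0]}",
)

slowest = max(timings, key=timings.get)
print(f"slowest stage: {slowest}")
```

Annotating each arrow in the diagram with the measured time for its stage makes it easier to see which box dominates end-to-end latency.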
How to use it
1. Start at the top with the user and the application or chat UI that receives the question.
2. Add the orchestrator (LangChain, LlamaIndex), which coordinates embedding, retrieval, and the LLM call.
3. Draw the online query flow: query embedding → retriever → reranker.
4. Add the vector database below the retriever and connect them. This is where semantic search happens.
5. Connect the orchestrator to the LLM that generates the final answer from retrieved context.
6. Add the offline ingestion flow: documents → chunker → embedding model → index write.
7. Connect the ingestion index step into the vector database to show how the knowledge base is built.
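The steps above can also be written down as a plain edge list, which is handy for checking that nothing is left unconnected before you draw. This is a sketch only: the node labels are illustrative, the edge from the orchestrator to the query embedding step is an assumption based on the orchestrator coordinating embedding, and `to_dot` is a hypothetical helper that emits Graphviz DOT text.

```python
# Edges follow the seven steps; labels are illustrative, not fixed template names.
EDGES = [
    ("User", "Application / Chat UI"),         # step 1
    ("Application / Chat UI", "Orchestrator"),  # step 2
    ("Orchestrator", "Query Embedding"),        # assumed: orchestrator triggers embedding
    ("Query Embedding", "Retriever"),           # step 3
    ("Retriever", "Reranker"),                  # step 3
    ("Retriever", "Vector Database"),           # step 4
    ("Orchestrator", "LLM"),                    # step 5
    ("Documents", "Chunker"),                   # step 6
    ("Chunker", "Embedding Model"),             # step 6
    ("Embedding Model", "Index Write"),         # step 6
    ("Index Write", "Vector Database"),         # step 7
]

def to_dot(edges):
    """Render the edge list as a Graphviz DOT digraph."""
    lines = ["digraph rag {"]
    for source, target in edges:
        lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)

dot = to_dot(EDGES)

# The vector database should be reached from both the query flow and the ingestion flow.
feeds_vector_db = [source for source, target in EDGES if target == "Vector Database"]
print(dot)
```

If `feeds_vector_db` does not contain one node from each flow, the diagram is missing the connection that shows where the query path and the ingestion path meet.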
Quick example
Document Q&A over a company knowledge base
Start editing online
Open the template in CodePic, replace the sample nodes, and turn it into your own RAG architecture diagram in a few minutes.
See examples: /templates/rag-architecture/examples


