Retrieval-augmented generation, commonly called RAG, is an architecture pattern used to make AI language models more accurate and grounded in specific, up-to-date information. Rather than relying solely on what a model learned during training, RAG systems first retrieve relevant documents, records, or data from a knowledge base, then pass that retrieved content to the model as context alongside the user's query.
The practical effect is that the model can answer questions about information it was never trained on, including internal company documents, proprietary data, or events that occurred after its training cutoff. Because the answer is generated with reference to specific retrieved text, the model can also cite its sources, making outputs easier to verify and audit.
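As a minimal sketch of how retrieved text can be passed to a model in a form that supports citation, the snippet below numbers each passage and labels it with its source before adding the user's question. The `Passage` type, the `build_prompt` function, the file names, and the passage text are all made up for illustration. The instruction wording is only one possible phrasing, and the call to the language model itself is left out.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    source_id: str  # hypothetical identifier, e.g. a file name or record key
    text: str


def build_prompt(query: str, passages: list[Passage]) -> str:
    """Place numbered, source-labelled passages ahead of the user's question."""
    context = "\n".join(
        f"[{i}] ({p.source_id}) {p.text}"
        for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the numbered passages below. "
        "Cite the passage numbers you relied on, e.g. [1].\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {query}\n"
    )


# Illustrative data only; in a real system these come from the retrieval step.
passages = [
    Passage("hr-handbook.txt", "Employees accrue 1.5 vacation days per month."),
    Passage("it-policy.txt", "Laptops are refreshed every three years."),
]
prompt = build_prompt("How many vacation days do I earn each month?", passages)
print(prompt)
```

Because each passage carries a number and a source label, a bracketed citation in the model's answer can be traced back to the document it came from.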
RAG has become a widely used architecture pattern for enterprise AI assistants, internal knowledge tools, customer service bots, and other applications where factual accuracy and traceability matter more than creative generation. Grounding answers in retrieved text tends to reduce hallucination compared with using the same model without retrieval, although it does not eliminate it.
Common components of a RAG system include a document ingestion pipeline that processes and chunks source materials, a vector database that stores embeddings for semantic search, a retrieval layer that selects the most relevant chunks for a given query, and a language model that synthesizes those chunks into a coherent response. The quality of the retrieval step largely determines the quality of the final output, since the model can only ground its answer in the chunks it is given.
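The components above can be sketched in a few lines, with heavy simplifications. In this illustration, chunking is a fixed-size word split, the "embedding" is a plain word-count vector rather than the output of a learned model, and the "vector database" is an in-memory list. All names (`chunk`, `embed`, `cosine`, `InMemoryIndex`) and the sample documents are hypothetical. Word counts capture only lexical overlap, so a real system would likely substitute a trained embedding model and a dedicated vector store. The final generation step is omitted.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 40) -> list[str]:
    """Ingestion: split a document into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: a sparse vector of lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryIndex:
    """Toy stand-in for a vector database: stores (chunk, vector) pairs in a list."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, Counter]] = []

    def add_document(self, text: str, max_words: int = 40) -> None:
        for piece in chunk(text, max_words):
            self.entries.append((piece, embed(piece)))

    def search(self, query: str, k: int = 2) -> list[str]:
        """Retrieval: return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


# Illustrative documents only.
index = InMemoryIndex()
index.add_document("Refunds are issued within 14 days of a return request.")
index.add_document("Support is available by email on weekdays.")
top = index.search("How long do refunds take?", k=1)
print(top)
```

The chunks returned by `search` are what would be handed to the language model as context. If the relevant chunk is not among them, the model has nothing accurate to draw on, which is why retrieval quality matters so much.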