Retrieval-Augmented Generation (RAG) combines a retriever, typically a vector store or a hybrid of BM25 and vector search, with a language model. Relevant documents are retrieved at query time and injected into the model prompt, so the LLM can ground its output in specific source material instead of relying purely on pretraining.
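To make the retrieve-then-inject flow concrete, here is a minimal sketch in Python. It is not a production setup: a bag-of-words cosine similarity stands in for a real embedding model and vector store, and the names `retrieve`, `build_prompt`, and `DOCS`, along with the prompt wording, are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=2):
    """Return up to k documents most similar to the query.

    Assumption: term counts stand in for learned embeddings.
    Documents with no term overlap (score 0) are dropped.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:k] if score > 0]


def build_prompt(query, documents, k=2):
    """Inject the retrieved documents into a prompt for the language model."""
    context = retrieve(query, documents, k)
    sources = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(context))
    return (
        "Answer using only the sources below.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {query}"
    )


# Hypothetical knowledge-base snippets.
DOCS = [
    "Refunds are processed within 5 business days.",
    "Password resets require access to the registered email.",
    "The office is closed on public holidays.",
]

if __name__ == "__main__":
    print(build_prompt("How long do refunds take?", DOCS, k=1))
```

The string returned by `build_prompt` is what would be sent to the language model. Swapping `retrieve` for a real vector or hybrid search changes the ranking quality but not the overall shape of the pipeline.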
RAG is the standard architecture for enterprise knowledge-base bots, support automation, and internal copilots in 2026. Its most common failures trace back to three things: poor chunking strategy, weak embedding quality, and low retrieval recall.
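Two of these three are easy to illustrate in a few lines. The sketch below shows one simple chunking strategy (fixed-size word windows with overlap) and one way to measure retrieval recall (recall@k against a hand-labeled set of relevant chunks). The function names, default sizes, and document IDs are assumptions made for illustration. Embedding quality is not covered, since evaluating it depends on the specific model in use.

```python
def chunk_words(text, size=50, overlap=10):
    """Split text into word windows of `size` words, each sharing
    `overlap` words with the previous window."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once a window has reached the end of the text.
        if start + size >= len(words):
            break
    return chunks


def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant items that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = set(retrieved_ids[:k]) & relevant
    return len(hits) / len(relevant)


if __name__ == "__main__":
    print(chunk_words("a b c d e f g h i j", size=4, overlap=1))
    # Hypothetical IDs: the retriever returned d1, d7, d3;
    # a human labeled d3 and d9 as relevant.
    print(recall_at_k(["d1", "d7", "d3"], ["d3", "d9"], k=3))
```

Overlap is one common way to reduce the chance that a fact is split across a chunk boundary, at the cost of storing some text twice. Recall@k only says whether the right chunks were retrieved, not whether the model used them well, so it is usually tracked alongside answer-level evaluation.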