Retrieval Augmented Generation

Retrieval augmented generation (RAG) does not modify the underlying model in any way. Instead, it influences the model's responses by changing what goes into the prompt.

In practice, and in a significant simplification, RAG is about injecting data into a Large Language Model's prompt.

For example, let’s say the user asks the LLM what the latest articles on the site are.

To augment the response, you need to intercept the user’s question and instruct the LLM to respond more or less like this:

You are a <insert persona here>. Tell the user that the latest articles on our site are <insert latest articles metadata here>

That is greatly simplified, but generally, that is how it works. Along the way, embeddings and vector databases are typically involved in finding the data to inject.
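The flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation: the article list, the three-dimensional "embeddings", and the helper names (retrieve, build_prompt) are all hypothetical. A real system would compute the vectors with an embedding model, store them in a vector database, and send the final prompt to an LLM. The prompt wording here is also adapted slightly from the template above so that it fits similarity-based retrieval.

```python
import math

# Hypothetical article metadata with toy, hand-written vectors.
# In a real setup an embedding model would produce these vectors
# and a vector database would store and search them.
ARTICLES = [
    {"title": "Getting started with RAG", "vector": [0.9, 0.1, 0.0]},
    {"title": "Deploying models on Kubernetes", "vector": [0.1, 0.9, 0.1]},
    {"title": "Prompt templates in practice", "vector": [0.7, 0.2, 0.3]},
]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vector, articles, top_k=2):
    """Return the top_k articles most similar to the query vector."""
    ranked = sorted(
        articles,
        key=lambda article: cosine_similarity(query_vector, article["vector"]),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(persona, question, retrieved):
    """Inject the retrieved metadata into the prompt sent to the LLM."""
    titles = "; ".join(article["title"] for article in retrieved)
    return (
        f"You are a {persona}. "
        f"Answer the user's question using these articles from our site: {titles}.\n"
        f"User question: {question}"
    )


# Pretend this vector is the embedding of the user's question.
query_vector = [0.8, 0.1, 0.2]
retrieved = retrieve(query_vector, ARTICLES)
prompt = build_prompt(
    "helpful site assistant",
    "What are the latest articles on the site?",
    retrieved,
)
print(prompt)
```

Note that the model itself is never touched: only the text of the prompt changes, which is why RAG can be added to, or removed from, an application without retraining anything.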

LLMOps Handbook (work in progress)