Is there a future for RAG (Retrieval-Augmented Generation) at all?
If so, where does it belong?
In simple terms, the RAG process currently works in two stages:
Stage 1: Data is identified, collected, chunked, and stored in a VectorDB.
Stage 2: At prompt time (or when an AI agent performs a query), the prompt or query context is converted to a vector, and RAG searches the VectorDB for stored vectors similar to it. The matching chunks (including enterprise data) are then sent to the LLM (Large Language Model) along with the prompt.
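To make the two stages concrete, here is a minimal sketch in Python. It is an illustration under loud assumptions, not how any particular product works: the word-count `embed` function stands in for a real embedding model, a plain in-memory list stands in for the VectorDB, and every function name and sample document is hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding (assumption): word counts instead of a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk(text, size=8):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_index(documents, size=8):
    """Stage 1: chunk the selected documents and store (chunk, vector) pairs."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc, size)]


def retrieve(index, query, k=2):
    """Stage 2: embed the query and return the k most similar chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical sample data chosen by the "RAG designer".
documents = [
    "Quarterly revenue grew in the northern region after the new pricing launch",
    "The support team closed most open tickets before the end of the month",
]
index = build_index(documents)
context = retrieve(index, "how did revenue change in the northern region", k=1)
# `context` is what would be sent to the LLM along with the prompt.
```

Note that only what was passed to `build_index` can ever come back from `retrieve`. Anything left out in Stage 1 is invisible in Stage 2.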
This approach seems straightforward, but let’s dig deeper into Stage 1.
Here, the RAG designer determines which data is collected, chunked, and stored in the VectorDB. In other words, a human decides in advance what data will be available to the prompt or AI agent. The data is filtered down to a reduced dataset, and by passing only this limited data to the LLM, we risk losing critical insights that the omitted data could have produced.
LLMs are designed to process vast amounts of information. By pre-filtering the data, we limit their potential, resulting in suboptimal outcomes.
Additionally, RAG struggles with dynamic or frequently updated data. Consider an online application like Salesforce. Because the VectorDB holds a snapshot taken at indexing time, RAG can return outdated information (e.g., revenue or opportunities from 24 hours ago) unless the index is refreshed frequently. Even then, updates made between refreshes can be missed.
While RAG performs well with static data sources, such as file systems or document management systems, it wasn’t designed to handle the high-frequency updates required for dynamic environments.
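The staleness problem can be sketched in a few lines of Python. This is a simplified illustration, not a real integration: the `Record` type, the integer timestamps, the `opp-1` key, and the dollar values are all made up for the example, and a plain dictionary stands in for both the live application and the index.

```python
from dataclasses import dataclass


@dataclass
class Record:
    text: str
    updated_at: int  # hypothetical "last modified" timestamp, in seconds


def snapshot(source, now):
    """Copy the source into an index, remembering when each record was indexed."""
    return {key: (rec.text, now) for key, rec in source.items()}


def stale_keys(source, index):
    """Keys whose source record changed after it was indexed, or was never indexed."""
    return sorted(
        key for key, rec in source.items()
        if key not in index or rec.updated_at > index[key][1]
    )


# The live system at indexing time.
source = {"opp-1": Record("Opportunity value: 50000", updated_at=100)}
index = snapshot(source, now=200)

# The live system changes after the snapshot was taken.
source["opp-1"] = Record("Opportunity value: 80000", updated_at=300)

answer_from_index = index["opp-1"][0]       # still the old value
needs_refresh = stale_keys(source, index)   # records the index has fallen behind on
```

Until the next refresh runs, any query answered from `index` sees the old opportunity value. Refreshing more often narrows that window but does not close it.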
This brings us back to the original question:
Is there a future for RAG?
Yes, RAG is an excellent starting point for generating insights and knowledge from corporate data.
However, for environments that demand real-time data retrieval, or that need comprehensive knowledge drawn from uncensored, unfiltered sources, we need a new solution.
What’s the alternative?
Stay tuned.