Retrieval-augmented generation (RAG) is one of the key techniques for augmenting LLMs with external data. However, simple RAG cannot address every task that requires data-augmented LLMs; as a new study by Microsoft Research puts it, there is no one-size-fits-all solution for data-augmented LLM applications.
The study provides a framework that classifies RAG applications into four categories based on the type of external data required and the cognitive processing involved in generating accurate and relevant responses:
Explicit fact queries
This is the simplest type of query, focusing on retrieving factual information directly stated in the provided data.
The most common approach for addressing these queries is using basic RAG, where the LLM retrieves relevant information from a knowledge base and uses it to generate a response.
However, even basic RAG can be improved at the indexing, retrieval, and answer generation phases.
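To make this concrete, here is a minimal sketch of such a pipeline. The word-overlap retriever and the call_llm placeholder are hypothetical stand-ins for a real embedding index and LLM API; only the three-stage structure (indexing, retrieval, generation) reflects the approach described above.

```python
# Minimal sketch of a basic RAG pipeline for explicit fact queries.
# `call_llm` is a hypothetical stand-in for a real LLM API call, and the
# word-overlap retriever stands in for a proper embedding-based index.

def call_llm(prompt: str) -> str:
    # Placeholder: in a real system this would call your LLM provider.
    return f"[answer grounded in]\n{prompt}"

# 1. Indexing: tokenize each document in the knowledge base once.
documents = [
    "The Eiffel Tower is 330 meters tall.",
    "The Golden Gate Bridge opened in 1937.",
]
index = [(doc, set(doc.lower().split())) for doc in documents]

# 2. Retrieval: rank documents by word overlap with the query.
def retrieve(query: str, k: int = 1) -> list[str]:
    q_words = set(query.lower().split())
    ranked = sorted(index, key=lambda item: len(q_words & item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# 3. Generation: ask the model to answer using only the retrieved context.
def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer from the context only."
    return call_llm(prompt)

print(answer("How tall is the Eiffel Tower?"))
```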
Implicit fact queries
These queries require the LLM to perform some level of reasoning or deduction on the retrieved data to answer the question. Sometimes, the queries might require “multi-hop question answering,” where the model breaks the original query into multiple steps and retrieves contextual information for each step separately.
An example (that I often use to check different RAG-based systems) is this: “When was the last time that Jerry Rice and Steve Young played on the same NFL team?” This query requires the model to retrieve the career games of the two NFL players separately, then perform some type of reasoning to see when they played together for the last time (it was against the Arizona Cardinals in 1999, Young’s last game).
These queries require advanced RAG techniques such as IRCoT and RAT, which use chain-of-thought prompting to guide the retrieval process based on previously recalled information. Graph-based RAG systems are also relevant here.
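The sketch below illustrates the general interleaving idea behind these techniques, not a faithful reimplementation of IRCoT or RAT: each reasoning step the model produces becomes the next retrieval query until the model can commit to an answer. The retrieve and call_llm functions are hypothetical placeholders.

```python
# Hedged sketch of interleaved retrieval and chain-of-thought reasoning
# for implicit fact (multi-hop) queries. `call_llm` and `retrieve` are
# hypothetical stand-ins for your model and retriever; the loop structure
# is the point: each reasoning step drives the next retrieval.

def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM call.
    return "ANSWER: (model output would appear here)"

def retrieve(query: str, k: int = 3) -> list[str]:
    # Placeholder for a real retriever (BM25, dense, graph-based, etc.).
    return [f"passage relevant to: {query}"]

def multi_hop_answer(question: str, max_hops: int = 4) -> str:
    context: list[str] = []
    query = question
    for _ in range(max_hops):
        context.extend(retrieve(query))
        prompt = (
            "Context:\n" + "\n".join(context) +
            f"\n\nQuestion: {question}\n"
            "Think step by step. If you can answer, start the line with ANSWER:. "
            "Otherwise, state the next fact you need to look up."
        )
        step = call_llm(prompt)
        if step.startswith("ANSWER:"):
            return step
        query = step  # the model's next reasoning step becomes the next retrieval query
    return "Could not answer within the hop budget."

print(multi_hop_answer(
    "When was the last time Jerry Rice and Steve Young played on the same NFL team?"
))
```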
Interpretable rationale queries
These queries require LLMs to understand factual data and also apply domain-specific rules. These rationales might not be present in the LLM’s pre-training data, but they are also not hard to find in the knowledge corpus.
For example, a customer service chatbot might need to integrate documented guidelines on handling returns or refunds with the context provided by a customer’s complaint.
These queries require special prompt tuning and optimization techniques that can automatically adapt the prompt to the task and domain. Some relevant techniques are OPRO and Automate-CoT.
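As a rough illustration of the idea (not OPRO or Automate-CoT themselves), the sketch below grounds answers in a documented policy and then selects whichever candidate instruction performs best on a tiny evaluation set. The policy text, candidate instructions, evaluation pairs, and call_llm are all hypothetical.

```python
# Hedged sketch of grounding answers in documented guidelines and picking
# the best-performing instruction automatically. This is a simplified stand-in
# for prompt-optimization methods, not a reimplementation of them.

def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM call.
    return "model response"

RETURN_POLICY = "Items can be returned within 30 days with a receipt for a full refund."

candidate_instructions = [
    "Answer the customer using the policy below. Quote the relevant rule.",
    "You are a support agent. Apply the policy strictly and explain your reasoning.",
]

# A tiny evaluation set of (complaint, expected keyword) pairs, purely illustrative.
eval_set = [
    ("I bought this 10 days ago and want my money back.", "refund"),
    ("Can I return something after two months?", "30 days"),
]

def score(instruction: str) -> int:
    # Count how often a candidate instruction leads to an answer that
    # mentions the expected part of the policy.
    hits = 0
    for complaint, expected in eval_set:
        prompt = f"{instruction}\n\nPolicy: {RETURN_POLICY}\n\nCustomer: {complaint}"
        if expected.lower() in call_llm(prompt).lower():
            hits += 1
    return hits

# Select the instruction that best applies the documented rationale.
best = max(candidate_instructions, key=score)
print("Selected instruction:", best)
```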
Hidden rationale queries
These queries involve domain-specific reasoning methods that are not explicitly stated in the data. The LLM must uncover these hidden rationales and apply them to answer the question.
For example, an AI legal assistant that is helping with a case needs to retrieve documents on other cases that are relevant (but not necessarily semantically similar), extract patterns from them, project them to the current case, and make suggestions.
This is the most challenging type of query and stretches RAG systems to their limits. It often requires domain-specific fine-tuning and the use of in-context learning techniques.
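The sketch below shows one common in-context-learning pattern for this category: retrieved prior cases are presented as worked examples so the model can infer the unstated reasoning and apply it to a new case. The retrieval and LLM calls are hypothetical placeholders.

```python
# Hedged sketch of in-context learning for hidden rationale queries:
# prior cases become worked examples from which the model must extract
# the domain-specific reasoning pattern and apply it to a new case.
# `call_llm` and `retrieve_relevant_cases` are hypothetical stand-ins.

def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM call.
    return "suggested approach"

def retrieve_relevant_cases(case_summary: str, k: int = 2) -> list[dict]:
    # Placeholder: in practice this would use domain-aware retrieval
    # (e.g., shared legal issues), not just semantic similarity.
    return [
        {"facts": "Prior case A facts ...", "outcome": "Outcome and reasoning A ..."},
        {"facts": "Prior case B facts ...", "outcome": "Outcome and reasoning B ..."},
    ][:k]

def suggest(case_summary: str) -> str:
    examples = retrieve_relevant_cases(case_summary)
    shots = "\n\n".join(
        f"Example case:\n{ex['facts']}\nHow it was resolved:\n{ex['outcome']}"
        for ex in examples
    )
    prompt = (
        f"{shots}\n\n"
        "Identify the reasoning pattern shared by the examples above, "
        f"then apply it to this new case and suggest next steps:\n{case_summary}"
    )
    return call_llm(prompt)

print(suggest("Client claims breach of contract over a late software delivery."))
```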
Read more about the framework on VentureBeat
Read the full paper on arXiv