“Self-correction” refers to a set of techniques that prompt LLMs to review and revise their own responses. A body of research suggests that self-correction improves performance across a range of tasks.
But a recent study conducted by Google DeepMind in collaboration with the University of Illinois at Urbana-Champaign reveals that LLMs often falter when self-correcting their responses without external feedback.
The study suggests that self-correction can sometimes impair the performance of these models, challenging the prevailing understanding of this popular technique.
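For concreteness, here is a minimal sketch of what a self-correction loop without external feedback looks like in practice. The generate() helper and the prompts are assumptions for illustration, not the study’s actual setup.

```python
# Minimal sketch of a self-correction loop. generate() is a hypothetical
# stand-in for whatever LLM API you use, and the prompts are illustrative,
# not the ones used in the study.

def generate(prompt: str) -> str:
    """Placeholder: call your LLM of choice and return its text reply."""
    raise NotImplementedError


def self_correct(question: str, rounds: int = 2) -> str:
    """Ask for an answer, then have the model critique and revise it."""
    answer = generate(f"Question: {question}\nAnswer step by step.")
    for _ in range(rounds):
        # The model reviews its own answer with no external feedback
        # (no labels, tools, or human input): "intrinsic" self-correction.
        critique = generate(
            f"Question: {question}\nProposed answer: {answer}\n"
            "Review this answer for mistakes and explain any you find."
        )
        answer = generate(
            f"Question: {question}\nPrevious answer: {answer}\n"
            f"Critique: {critique}\nWrite an improved final answer."
        )
    return answer
```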
Key findings:
Self-correction works well when the model has access to an external source of knowledge, such as human feedback or a code executor (a sketch of this setup follows the list)
But in many applications, the model will not have access to external knowledge during self-correction
The researchers define “intrinsic self-correction” as a model’s ability to use its internal knowledge to reflect on and revise its own answers
The study shows that intrinsic self-correction does not work on reasoning tasks
In many cases, self-correction turns a correct answer into an incorrect one, actively hurting the model’s performance
Previous studies used benchmark labels as part of the self-correction evaluation, which produced misleading results.
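By contrast, here is a hedged sketch of self-correction driven by an external signal, assuming the task is code generation: the candidate program is actually executed, and the interpreter’s error output, rather than the model’s own reflection, drives the revision. The generate() helper is the same hypothetical placeholder as in the earlier sketch.

```python
# Sketch of self-correction with external feedback (a code executor).
# Assumes a code-generation task; generate() is the same hypothetical
# LLM placeholder as in the earlier sketch.

import subprocess
import sys
import tempfile


def generate(prompt: str) -> str:
    """Placeholder: call your LLM of choice and return its text reply."""
    raise NotImplementedError


def run_code(code: str) -> tuple[bool, str]:
    """Execute candidate code in a subprocess; return (succeeded, output)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=30
    )
    return proc.returncode == 0, proc.stdout + proc.stderr


def correct_with_executor(task: str, max_tries: int = 3) -> str:
    """Revise generated code based on real execution errors, not self-reflection."""
    code = generate(f"Write a Python script that {task}. Output only the code.")
    for _ in range(max_tries):
        succeeded, feedback = run_code(code)
        if succeeded:
            break
        # External feedback: the interpreter's error message tells the model
        # something it could not reliably infer by re-reading its own output.
        code = generate(
            f"Task: {task}\nYour code:\n{code}\nRunning it failed with:\n"
            f"{feedback}\nFix the code. Output only the code."
        )
    return code
```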
Read the full article on TechTalks.
This is so unexpected. "Please check your work and make sure this is accurate" seems to work well if I'm researching something and get some unexpected data mixed in there. What's different about me doing this vs the LLM doing it all by itself?
Feels like there's an awful lot we don't quite understand yet.