Even before they speak their first words, human babies develop mental models about objects and people. This is one of the key capabilities that allows us humans to learn to live socially and cooperate (or compete) with each other.
But for artificial intelligence, even the most basic behavioral reasoning tasks remain a challenge.
To help fill this gap, scientists at IBM, the Massachusetts Institute of Technology, and Harvard University have developed a series of tests, codenamed AGENT, to evaluate whether AI models can reason the way children do: by observing the world and making sense of it. Applied to two baseline models, AGENT highlights the limits of current AI systems.
Presented at this year’s International Conference on Machine Learning (ICML), AGENT provides an important benchmark for measuring the reasoning capabilities of AI systems.