How LLMs and VLMs are revolutionizing robotics
We are still scratching the surface of what is possible in the physical world with LLMs and VLMs
Recent months have seen several impressive projects and demos that use LLMs and VLMs to control robots. This new wave of innovation draws on the vast knowledge captured in these foundation models to improve the ability of robotic systems to understand natural language commands.
LLMs and VLMs are also giving robotic systems complex planning and reasoning capabilities.
In my latest article for VentureBeat, I spoke to AI and robotics researcher Chris Paxton about the sizeable impact that LLMs and VLMs have had on robotics.
We discussed SayCan, the first project to showcase the promise of LLMs in robotics. We also explored two major trends in how LLMs and VLMs are being used:
1- The use of pre-trained models as reasoning and orchestration modules. The model analyzes perception data and natural language commands, reasons about the steps needed to achieve the desired goal, and maps them to the actions and affordances of the robotic system (e.g., Figure, OK-Robot). A minimal sketch of this pattern follows the list.
2- The creation of specialized foundation models for robotics. In these projects, pre-trained language and vision models are modified to directly output robot actions. The model leverages the broad knowledge gained from language and vision pre-training on internet data and is then fine-tuned on robot data. The architecture is also adjusted to generate action tokens (e.g., Covariant RFM-1, Google RT-2 and RT-2-X). See the tokenization sketch after the orchestration example below.
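To make the first pattern concrete, here is a minimal sketch of an LLM acting as an orchestrator: it turns a natural language command into a sequence of calls to skills the robot already has, and keeps only steps that match a known affordance. The skill names and the `query_llm` stub are illustrative assumptions, not the actual interfaces used by Figure or OK-Robot.

```python
# Orchestration pattern sketch: an LLM plans over a fixed set of robot skills.
# Skill names and query_llm() are hypothetical stand-ins for illustration.

# Affordances: the low-level skills the robot can actually execute.
AFFORDANCES = {"locate", "pick", "move_to", "place"}

def query_llm(prompt: str) -> str:
    """Stand-in for a call to a hosted LLM; returns a canned plan here."""
    return "locate(mug); pick(mug); move_to(sink); place(mug)"

def plan(command: str) -> list[tuple[str, str]]:
    """Ask the LLM for a step-by-step plan, keep only valid affordances."""
    prompt = (
        f"Robot skills: {sorted(AFFORDANCES)}.\n"
        f"Command: {command}\n"
        "Reply with semicolon-separated calls like skill(object)."
    )
    steps = []
    for step in query_llm(prompt).split(";"):
        name, _, arg = step.strip().partition("(")
        if name in AFFORDANCES:  # ground the plan in executable actions
            steps.append((name, arg.rstrip(")")))
    return steps

if __name__ == "__main__":
    for skill, target in plan("Put the mug in the sink"):
        print(f"execute {skill} -> {target}")
```

The key design choice is that the LLM never controls the robot directly: it only selects from a closed vocabulary of skills, so hallucinated steps are filtered out before execution.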
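For the second pattern, the core trick in models like RT-2 is to represent robot actions as tokens so a language model can emit them like words. The sketch below shows one simple way to discretize a continuous action dimension into token bins and decode it back; the bin count and action ranges are illustrative assumptions, not the values used by RT-2 or RFM-1.

```python
# Action tokenization sketch: continuous actions become discrete tokens
# that a fine-tuned language model can generate. Values are illustrative.

N_BINS = 256  # assumed number of action tokens per action dimension

def to_token(value: float, low: float, high: float) -> int:
    """Map a continuous action dimension to one of N_BINS discrete tokens."""
    clipped = min(max(value, low), high)
    return round((clipped - low) / (high - low) * (N_BINS - 1))

def from_token(token: int, low: float, high: float) -> float:
    """Invert the mapping: recover the (quantized) continuous value."""
    return low + token / (N_BINS - 1) * (high - low)

# Example: an end-effector displacement in meters, bounded to [-0.1, 0.1].
dx = 0.037
tok = to_token(dx, -0.1, 0.1)
print(tok, from_token(tok, -0.1, 0.1))  # token id and its decoded value
```

Because actions live in the same token space as text, the same transformer that was pre-trained on internet-scale language and vision data can be fine-tuned end to end to output robot commands.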
Both approaches show a lot of promise, and we can expect exciting developments in the coming months. As LLMs continue to advance, the benefits will keep spilling over into the world of robotics.
Read the full article on VentureBeat.