CoTools enables LLMs to use vast toolsets without costly retraining
Enable your LLM to use hundreds of tools without being trained on them.
Chain-of-Tools (CoTools), a framework designed by researchers at Soochow University in China, boosts tool use in LLMs by enabling them to employ tools they have not been explicitly trained on.
Tool use is an essential and challenging part of building real-world applications with LLMs. Tools can range from APIs and databases to applications like browsers.
Current methods for enabling tool use in LLMs have significant trade-offs. One popular approach is to fine-tune the LLM on examples of tool usage. This is effective (albeit expensive), but it restricts the model to the tools it has seen during training. Fine-tuning can also hurt the foundational capabilities of LLMs, such as chain-of-thought (CoT) reasoning, instruction-following, and basic knowledge.
Another approach is in-context learning (ICL), where you provide tool descriptions, instructions, and examples of tool use directly in the prompt. This method offers flexibility but can result in complex and lengthy prompts, and its efficiency decreases as the number of tools grows.
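For illustration, a naive ICL tool-use setup might look like the hypothetical sketch below: every tool description and demonstration has to be packed into the prompt, so the context keeps growing with the toolset. The tool names and examples here are made up for this sketch.

```python
# Hypothetical ICL tool-use prompt: descriptions and demonstrations for every
# tool must live inside the prompt, so it grows with the number of tools.
tools = [
    {"name": "calculator", "desc": "Evaluates arithmetic, e.g. calculator('13*24')."},
    {"name": "wiki_search", "desc": "Returns a short passage about an entity."},
    # ... with hundreds of tools, hundreds more entries would go here ...
]

demos = (
    "Q: What is 13 * 24?\n"
    "A: I need arithmetic, so I call calculator('13*24') -> 312. The answer is 312.\n"
)

user_question = "Who founded Soochow University?"

prompt = (
    "You can use the following tools:\n"
    + "\n".join(f"- {t['name']}: {t['desc']}" for t in tools)
    + "\n\nExamples:\n" + demos
    + "\nQ: " + user_question + "\nA:"
)
print(prompt)
```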
CoTools overcomes these limitations by combining aspects of both fine-tuning and in-context learning. CoTools keeps the backbone LLM frozen and does not change its weights. Instead, it trains lightweight, specialized modules that work alongside the LLM at inference time to choose the right tools.
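Conceptually, that setup might look like the following PyTorch-style sketch: the backbone is loaded and frozen, and only small heads that read its hidden states receive gradient updates. The module shapes and names here are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM

# Frozen backbone: its weights are never updated.
backbone = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
for p in backbone.parameters():
    p.requires_grad = False

hidden_size = backbone.config.hidden_size

# Tool Judge: scores a decoding step's hidden state for "should a tool be called here?"
tool_judge = nn.Sequential(
    nn.Linear(hidden_size, hidden_size), nn.GELU(), nn.Linear(hidden_size, 1)
)

# Tool Retriever: projects hidden states and tool descriptions into a shared
# embedding space where they can be compared by similarity.
query_encoder = nn.Linear(hidden_size, hidden_size)
tool_encoder = nn.Linear(hidden_size, hidden_size)

# Only the lightweight modules are trained.
trainable = (
    list(tool_judge.parameters())
    + list(query_encoder.parameters())
    + list(tool_encoder.parameters())
)
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```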
The CoTools framework comprises three main components that work together during inference (a simplified code sketch of the full loop follows this walkthrough):
1) As the LLM generates each token, a Tool Judge analyzes the model’s “hidden states” to determine whether calling a tool is appropriate at that specific point in the reasoning chain.
2) If a tool is needed, a Tool Retriever module chooses the most suitable tool for the task. The Retriever computes an embedding of the query and the response generated so far and compares it to the embeddings of the available tools, picking the one that is most semantically relevant. Because selection relies on embedding similarity, it can also surface tools that were not included in its training data.
3) Once the tool is selected, CoTools fills in the tool call through an ICL prompt that contains demonstrations for that specific tool. This way, CoTools avoids the inefficiency of packing thousands of demonstrations into the prompt just to select a tool.
After executing the tool, CoTools inserts the results into the model’s context and continues the response generation.
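Putting the pieces together, here is a minimal, hypothetical decoding loop in the spirit of CoTools. It assumes standard Hugging Face model and tokenizer objects, pre-computed (L2-normalized) tool-description embeddings, and a placeholder `call_with_icl` helper that runs the per-tool ICL prompt and executes the call; these names and the scoring threshold are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def generate_with_tools(model, tokenizer, prompt, tool_judge, query_encoder,
                        tool_embeddings, tools, max_new_tokens=256, threshold=0.5):
    """Simplified CoTools-style loop: generate token by token, let the judge
    inspect hidden states, and splice tool results back into the context."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        out = model(input_ids, output_hidden_states=True)
        hidden = out.hidden_states[-1][:, -1, :]  # last token's hidden state

        # 1) Tool Judge: decide whether a tool call is appropriate at this step.
        if torch.sigmoid(tool_judge(hidden)).item() > threshold:
            # 2) Tool Retriever: pick the most semantically similar tool, which
            #    also works for tools never seen during module training.
            query_vec = F.normalize(query_encoder(hidden), dim=-1)
            scores = query_vec @ tool_embeddings.T          # (1, num_tools)
            tool = tools[scores.argmax().item()]

            # 3) Tool calling: a short ICL prompt with this tool's demonstrations
            #    fills in the call; the result is appended to the context.
            result = tool.call_with_icl(model, tokenizer, input_ids)  # placeholder helper
            result_ids = tokenizer(str(result), return_tensors="pt",
                                   add_special_tokens=False).input_ids
            input_ids = torch.cat([input_ids, result_ids], dim=-1)
            continue

        # Otherwise decode the next token greedily and keep generating.
        next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(input_ids[0], skip_special_tokens=True)
```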
This method allows CoTools to integrate hundreds of tools into an LLM application without compromising the model's accuracy or exploding inference costs. It is worth noting, however, that since CoTools requires access to the model's hidden states, it can only be applied to open-weight models.
The researchers evaluated CoTools on numerical reasoning tasks using arithmetic tools and on knowledge base question answering (KBQA) tasks that require retrieving facts from knowledge bases.
On math benchmarks that required basic arithmetic operations and more complex math functions, LLaMA2-7B with CoTools matched the performance of ChatGPT and outperformed ToolkenGPT. This indicates that CoTools enhances the capabilities of the underlying foundation model.
For the KBQA tasks, the researchers created a new dataset called SimpleToolQuestions (STQuestions), featuring 1,836 tools, 837 of which appear only in the test set. CoTools showed superior tool selection accuracy and particularly excelled in scenarios with many tools and unseen tools.
Techniques like Chain-of-Tools are promising for building LLM-powered agents and real-world applications, especially as new standards such as the Model Context Protocol (MCP) make it easier for developers to integrate external tools and resources into their applications.