Photo by Geoffrey Baumbach on Unsplash

An IBM blog post authored by Kim Martineau highlights how TOUCAN, a tool-calling dataset "of 1.5 million task scenarios, field-tested and open-sourced by IBM and University of Washington, is designed to improve how agents interact with the world and get things done." 

TOUCAN is a massive new dataset designed to accelerate the development of AI assistants, specifically LLM agents, that can use various software tools to complete complex tasks. Researchers have lacked high-quality, real-world data showing how an agent uses tools, particularly when a task requires multiple steps or switching between different tools. To bridge this gap, the TOUCAN team synthesized 1.5 million data points from nearly 500 real-world Model Context Protocol (MCP) servers. MCP is an open standard for connecting AI models to external tools and data sources, and each MCP server exposes a set of tools that an agent can call. Unlike older, less diverse datasets, TOUCAN's data is realistic, complex, and includes the actual execution logs of the tools being used, making it effective at teaching AI agents how to plan, interact, and course-correct when using software.
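To make the idea of a multi-step, multi-tool data point with execution logs more concrete, here is a minimal Python sketch of what one such record could look like. The class names, field names, and tool names (`get_weather`, `search_flights`) are hypothetical and chosen for illustration only; they are not TOUCAN's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """One step of a trajectory: a tool invocation plus its logged result."""
    tool_name: str      # hypothetical tool name, not taken from TOUCAN
    arguments: dict     # arguments the agent passed to the tool
    result: str         # logged output (or error message) from the tool
    ok: bool = True     # whether the call succeeded


@dataclass
class Trajectory:
    """A single illustrative data point: a question and the steps taken."""
    question: str
    steps: list = field(default_factory=list)
    final_answer: str = ""

    def tools_used(self):
        # Distinct tool names, sorted for a stable order.
        return sorted({step.tool_name for step in self.steps})

    def is_multi_tool(self):
        # True when the task required switching between different tools.
        return len(self.tools_used()) > 1

    def recovered_from_error(self):
        # True when a failed call is followed by a later successful call,
        # i.e. the agent course-corrected.
        seen_failure = False
        for step in self.steps:
            if not step.ok:
                seen_failure = True
            elif seen_failure:
                return True
        return False


# Made-up example data, for illustration only.
example = Trajectory(
    question="Is it a good day to fly to Seattle?",
    steps=[
        ToolCall("get_weather", {"city": "Seatle"}, "error: unknown city", ok=False),
        ToolCall("get_weather", {"city": "Seattle"}, "light rain"),
        ToolCall("search_flights", {"to": "SEA"}, "3 flights found"),
    ],
    final_answer="Light rain is expected and 3 flights are available.",
)
```

In this sketch, `example.is_multi_tool()` and `example.recovered_from_error()` both hold, which is the kind of planning and course-correction behavior the paragraph above describes.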

ACTION researchers led by Radha Poovendran developed a pipeline to ensure the high quality and diversity of the TOUCAN dataset. First, they used multiple language models to generate a broad array of realistic tool-use questions. These initial questions were then passed through a rigorous quality filter to discard any poor examples. Next, three different "teacher" models were used to generate the correct, step-by-step solutions (called trajectories), simulating how a top-performing AI agent would interact with the tools to solve the problem. The team also incorporated methods to simulate multi-turn conversations and introduce variety into the tasks, making the dataset more challenging and reflective of real-world use. By making TOUCAN the largest publicly available dataset of its kind, the authors aim to accelerate the creation of highly capable, open-source AI agents that can seamlessly use software tools to automate tasks across a wide range of fields.
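The stages described above (question generation by several models, quality filtering, then solution generation by three teacher models) can be sketched in Python roughly as follows. This is a simplified illustration, not the authors' implementation: `build_dataset`, the stand-in generators, the toy quality heuristic, the threshold, and the teacher names are all assumptions, and plain functions stand in for real language models.

```python
def build_dataset(tools, question_generators, quality_score, teachers, min_score=0.5):
    """Illustrative three-stage pipeline: generate, filter, solve."""
    # Stage 1: several generators each propose tool-use questions.
    questions = []
    for generate in question_generators:
        questions.extend(generate(tools))
    # Drop exact duplicates while preserving order.
    questions = list(dict.fromkeys(questions))

    # Stage 2: keep only questions that pass the quality filter.
    kept = [q for q in questions if quality_score(q) >= min_score]

    # Stage 3: every teacher produces a step-by-step trajectory per question.
    records = []
    for question in kept:
        for name, teacher in teachers.items():
            records.append({
                "question": question,
                "teacher": name,
                "trajectory": teacher(question, tools),
            })
    return records


# Stand-ins for real models (hypothetical, for illustration only).
def gen_single_tool(tools):
    return [f"Use {t} to complete a task." for t in tools]


def gen_multi_tool(tools):
    return [f"Combine {a} and {b} to complete a task."
            for a, b in zip(tools, tools[1:])]


def toy_quality_score(question):
    # Toy heuristic: score multi-tool questions higher than single-tool ones.
    return 1.0 if " and " in question else 0.4


def make_teacher(name):
    def teacher(question, tools):
        # A real teacher model would call the tools; this returns a fixed outline.
        return [f"{name}: plan", f"{name}: call tool", f"{name}: answer"]
    return teacher


teachers = {n: make_teacher(n) for n in ("teacher_a", "teacher_b", "teacher_c")}
records = build_dataset(
    ["calendar", "email", "maps"],
    [gen_single_tool, gen_multi_tool],
    toy_quality_score,
    teachers,
)
```

With these stand-ins, only the multi-tool questions survive the filter, and each surviving question receives one trajectory per teacher. The multi-turn simulation and task diversification steps mentioned above are omitted from the sketch.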

TOUCAN realizes research from Thrusts AI-1 and AI-3 and will play a role in SEC-4 in future reporting periods.
