Are Agents always the best choice?

Lukas Bieberich
16. Sept.
2 Min. Lesezeit

In the fast-evolving world of AI applications, agents have become a go-to strategy for leveraging the capabilities of large language models (LLMs). They promise flexibility, autonomy, and intelligent task management. But just because something is possible doesn’t mean it’s always the right choice.

Large language models like GPT-4o unlock impressive capabilities—especially when paired with autonomous agents that can plan and execute tasks independently. But not every application benefits from this level of autonomy. One key question should always be asked first:

Can a fixed, deterministic workflow be defined?

If so, that approach is usually preferable. Using agents means delegating control over your application’s workflow to an LLM—which inherently carries the risk of unexpected behavior and potential security vulnerabilities.

Additionally, it’s important to consider that tool or function descriptions are provided to the LLM by the framework via the system prompt. This means they are appended to the prompt behind the scenes for every user message before the model is invoked. Even in short conversations, this can lead to high token consumption—resulting in increased API costs and performance overhead.

Example Cost Estimate:

Let's assume we adopt the typical method of a React Agent, where the model verbalizes its thoughts, utilizes tools based on these thoughts, and reviews the tool outcomes in a loop until it reaches a final answer. With an average of three cycles per user request, we can estimate the tool overhead model-api costs for a 20-turn agent conversation as follows (model output tokens and other conversation tokens not included).

Component	Details	Token Overhead
Tool Descriptions	50 tools × 500 tokens/tool	25,000 input tokens/request
ReAct Pattern	3 cycles/message × 25,000 tokens	75,000 input tokens/message
Conversation	20 turns × 75,000 tokens	1.5 million tokens total
Cost (OpenAI o3)	$10 per 1 million input tokens	$15 tool overhead

As of April 2025.

It’s important to note that these figures reflect only the tool description costs, not the actual conversation tokens. Even when using more cost-efficient models than o3, the setup can quickly become unsustainable if long conversations or a large number of users are expected. A carefully considered workflow or agent setup is therefore not just a matter of quality improvement, but a critical cost factor.

If autonomy is still desired or required, a multi-agent pattern with expert routing—as described above—can help keep token consumption within reasonable limits.

While agents offer flexibility and autonomy, they come with trade-offs that shouldn’t be overlooked. Tool descriptions alone can create a substantial token overhead, even before real conversation tokens are considered. For many use cases, a clearly structured workflow delivers more predictable, efficient, and affordable results. If agent-based autonomy is still required, a multi-agent setup with expert routing can help balance power and cost. Ultimately, thoughtful design—rather than defaulting to agents—makes all the difference.

Are Agents always the best choice?

Wir schreiben nicht nur über KI - wir entwickeln sie.

Hier zum Newsletter anmelden