
jambit AI Insights

- Unlocking the Future of AI: Insights from Trusted Experts -

What makes a system an agent?

  • Lukas Bieberich
  • Sept 9
  • 3 min read

Agents are a core concept in modern AI design—far beyond the world of language models. But what exactly makes a system an “agent”? And how do classical agent-based frameworks compare to what today’s LLMs can do?


In this first part of the series, we take a closer look at the theoretical foundations: What does it mean for a system to act autonomously? What role does the so-called policy play in governing agent behavior? And how can we apply this thinking to language models, which on the surface seem to “just” generate text?


""


The Concept of an Agent

The term “agent” long predates modern language models. Generally, it refers to a system that perceives the state of its environment and responds to it through actions. These actions, in turn, modify the state of the environment. In this sense, the system is considered autonomous: it reacts to its surroundings on its own, without external instructions.


The function that describes the connection between state and action is referred to as the policy. It defines the behavior of the agent. A policy can be specified by a fixed set of rules or be parameterized and learned (as in reinforcement learning). Applications of this perspective range from simple program logic and robotic control to the human brain.
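To make this concrete, here is a minimal sketch of that loop. All names are illustrative and not tied to any framework: a rule-based policy maps the current state to an action, and each action in turn modifies the environment.

def rule_based_policy(state: dict) -> str:
    # A fixed-rule policy: the behavior is hard-coded instead of learned.
    if state["temperature"] > 25:
        return "open_window"
    return "do_nothing"

def apply_action(action: str, state: dict) -> dict:
    # Toy environment transition: the action changes the state.
    if action == "open_window":
        return {**state, "temperature": state["temperature"] - 5}
    return state

def run_agent(policy, state: dict, steps: int = 3) -> dict:
    for _ in range(steps):
        action = policy(state)              # policy: state -> action
        state = apply_action(action, state) # action modifies the environment
    return state

print(run_agent(rule_based_policy, {"temperature": 32}))  # {'temperature': 22}

A learned policy (as in reinforcement learning) would have the same signature; only the function body would be parameterized and trained rather than hand-written.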



How Do Language Models Function as Agents?


If we treat actions as calls to predefined programs or functions (tools), a language model can act as an agent: it is provided with structured descriptions of these functions in the prompt. During a conversation, it autonomously decides whether a tool call is necessary and, if so, which one. If required, the language model returns an instruction for the appropriate tool call, along with all necessary arguments, as structured output (e.g., JSON, Markdown, or code).
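As an illustration, a tool description and the matching structured output might look like this. The schema below follows the widely used OpenAI-style function-calling format; the concrete tool and argument names are invented for this example.

import json

# Hypothetical tool description, passed to the model alongside the prompt:
weather_tool = {
    "name": "get_weather",
    "description": "Return the current weather for a given city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# If the model decides a tool call is needed, it emits structured output
# such as this JSON string instead of a free-form answer:
model_output = '{"tool": "get_weather", "arguments": {"city": "Munich"}}'

call = json.loads(model_output)
print(call["tool"], call["arguments"])  # get_weather {'city': 'Munich'}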


As long as it adheres to the specified structure, the output can be formally parsed and executed by the framework. The tool response can then be passed back to the LLM in a subsequent call, which determines whether additional tool calls are needed or if a direct response to the user should be generated (ReAct pattern).
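The framework side of this handshake can be sketched in a few lines, continuing the parsed call from the example above. Here get_weather is a stub rather than a real weather API, and the registry is a plain dictionary; real frameworks do the same dispatch with more bookkeeping.

def get_weather(city: str) -> str:
    # Stub instead of a real API call.
    return f"Sunny, 22 °C in {city}"

TOOLS = {"get_weather": get_weather}  # registry of executable tools

def execute(call: dict) -> str:
    # Dispatch the parsed tool call to the matching Python function.
    return TOOLS[call["tool"]](**call["arguments"])

observation = execute({"tool": "get_weather", "arguments": {"city": "Munich"}})
# `observation` is now appended to the conversation and sent back to the LLM.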



The ReAct Pattern in 5 Simple Steps


ReAct stands for Reasoning + Acting.

A language model like ChatGPT first thinks through a problem out loud, then decides what action to take (e.g., search a database, write and execute code, perform a calculation, etc.).


[Diagram: The ReAct pattern. A user query goes into the LLM for reasoning; the LLM may call external tools or act within an environment, receives observations back, and finally generates an answer for the user.]

1. Receive Input (Question or Problem)

The model receives a task or question from the user.


Example:

“What was Apple’s revenue in 2023?”

2. Reasoning: Think It Through

The model reflects on what it needs to do to answer the question.

It writes down its thoughts—much like a human planning a solution.


Example Thought:

“To answer this question, I need to find up-to-date information about Apple’s 2023 revenue.”

3. Acting: Tool Call

The model chooses an action, such as:

  • Starting a web search

  • Querying a database

  • Using a tool to perform a calculation


This action is then actually executed.


Example Action:

“Run web search for: Apple revenue 2023”

4. Observation

The model receives the result of the action (e.g., the search result) and examines it.


Example Result:

“Apple reported a revenue of 383 billion USD in 2023.”

5. Generate Answer

Now the model uses the gathered information to formulate a final response and return it to the user.


Example Answer:

“Apple reported a revenue of 383 billion USD in 2023.”

Loops Are Possible

Sometimes one step isn’t enough—the model may go through multiple reasoning and acting cycles until it has enough information.
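The whole cycle compresses into a short loop. In the sketch below, call_llm is a canned fake standing in for a real chat-completion API so the loop runs end to end; in practice a framework such as LangGraph drives this loop against an actual model.

def call_llm(messages: list) -> dict:
    # Fake model: requests one web search, then answers from the observation.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "web_search", "arguments": {"query": "Apple revenue 2023"}}
    return {"content": "Apple reported a revenue of 383 billion USD in 2023."}

def web_search(query: str) -> str:
    # Canned search result instead of a real web search.
    return "Apple reported a revenue of 383 billion USD in 2023."

TOOLS = {"web_search": web_search}

def react_loop(question: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = call_llm(messages)                    # reasoning + decision
        if "tool" in reply:                           # acting: tool call
            result = TOOLS[reply["tool"]](**reply["arguments"])
            messages.append({"role": "tool", "content": result})  # observation
        else:
            return reply["content"]                   # generate answer
    return "Stopped after too many steps."

print(react_loop("What was Apple's revenue in 2023?"))

The max_steps cap matters: because the model may loop through several reasoning and acting cycles, a bound prevents it from calling tools indefinitely.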



Multi-Agent Systems and the Supervisor Pattern

The ability of language models to function as agents depends directly on their capacity to generate structured output (such as JSON or Markdown), enabling parsable tool or function calls—whether for weather APIs, databases, or shell commands. This capability is developed through conversational fine-tuning.


To users of ChatGPT, Cursor, or other AI applications powered by agents in the background, this multi-step process appears as a single LLM call. It is important to note that language models merely define the policy; the actual execution is handled by an intermediary framework (such as LangGraph, CrewAI, etc.).


For more complex tasks, it can be useful to connect multiple agents. In the supervisor pattern, a central agent handles the initial routing to the appropriate expert. Technically, this can be implemented—similar to tool descriptions—via the system prompt. Especially in cases where tasks are clearly separated, multi-agent systems are advantageous, as tool descriptions can be distributed across agents. This saves tokens and improves response quality.
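A routing step of this kind can be sketched briefly. Here ask_llm is a placeholder for a real chat-completion call and the expert prompts are invented; frameworks such as LangGraph or CrewAI ship ready-made versions of this supervisor pattern.

EXPERTS = {
    "finance": "You are a finance expert with access to financial databases.",
    "weather": "You are a weather expert with access to a weather API.",
}

SUPERVISOR_PROMPT = (
    "Route the user's request to exactly one expert. "
    "Answer with a single word: finance or weather."
)

def ask_llm(system_prompt: str, user_message: str) -> str:
    # Placeholder for a real chat-completion call.
    if system_prompt is SUPERVISOR_PROMPT:
        return "finance" if "revenue" in user_message else "weather"
    return f"({system_prompt.split('.')[0]}) answering: {user_message}"

def handle(user_message: str) -> str:
    expert = ask_llm(SUPERVISOR_PROMPT, user_message)  # routing step
    return ask_llm(EXPERTS[expert], user_message)      # expert answers with its own tools

print(handle("What was Apple's revenue in 2023?"))

Because each expert carries only its own tool descriptions in its system prompt, no single prompt has to list every tool, which is where the token savings come from.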


We don't just write about AI, we build it.

From tailor-made data platforms to GenAI applications, we develop solutions that fit your company.