SECTION 05 CORE LAB

Understanding Agents

In Section 03 you ran your first tool loop. Now you go one level deeper: understanding the architectural patterns behind every real-world agent, how an agent perceives and acts on its environment, how to make reasoning visible through explicit traces, and how to decide whether a task actually needs an agent at all. The lab builds a ReAct agent with step-by-step reasoning you can read and debug.

01 · AGENT ARCHITECTURE PATTERNS

Four Patterns — One Spectrum of Complexity

Not all agents are equal. The architecture you choose determines how the agent reasons, how much it costs to run, and what failure modes you inherit. There are four patterns in wide production use as of 2024–2026, ranging from a simple reactive loop to agents that self-critique and revise their own plans.

PATTERN 01
Simple Reactive Loop
Prompt → LLM call → tool call (optional) → response. No explicit reasoning trace. Fast and cheap. Used for straightforward tool-augmented chat.
User → [LLM + Tools] → Answer
WIDELY USED (2023–2026)
PATTERN 02
ReAct (Reason + Act)
Interleaves explicit Thought steps with Action steps. Each iteration the model reasons about what to do before doing it. Reasoning is visible and debuggable. (Yao et al., ICLR 2023)
Thought → Action → Observation → Thought → ...
WIDELY USED (2023–2026)
PATTERN 03
Plan-and-Execute
A planner LLM call first decomposes the task into a step list. A separate executor call works through the list. Enables parallelism and clearer task decomposition for complex goals.
[Planner LLM] → [Step 1, 2, 3] → [Executor LLM × N]
EMERGING (2024–2026)
PATTERN 04
Reflection / Self-Critique
After producing an output, the agent (or a second LLM call) critiques it against goals and revises. Based on Reflexion (Shinn et al., 2023). Higher quality at higher cost.
Act → Critique → Revise → Critique → ... → Done
EMERGING (2023–2026)
Which pattern should you use? Start with the simplest pattern that solves your problem. ReAct is a strong default for multi-step tasks because the explicit reasoning trace makes debugging straightforward. Add plan-and-execute when tasks are complex enough to benefit from decomposition. Add reflection only when output quality justifies the extra API calls and cost.
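Pattern 04 is the easiest to misjudge on cost, so here is a minimal sketch of an act → critique → revise loop. It assumes a generic `llm(prompt)` callable wrapping a single model call; the APPROVED convention and the prompt wording are illustrative choices, not from any library.

```python
# Minimal reflection loop (Pattern 04). `llm` is any callable that takes a
# prompt string and returns the model's text. Each revision round costs two
# extra model calls: one critique, one rewrite.
def reflect(llm, goal: str, max_rounds: int = 3) -> str:
    draft = llm(f"Complete this task:\n{goal}")
    for _ in range(max_rounds):
        critique = llm(
            f"Task: {goal}\nDraft: {draft}\n"
            "Critique the draft against the task. Reply APPROVED if it "
            "fully satisfies the task; otherwise list concrete problems."
        )
        if critique.strip().startswith("APPROVED"):
            break  # the critic is satisfied; stop paying for revisions
        draft = llm(
            f"Task: {goal}\nDraft: {draft}\nCritique: {critique}\n"
            "Rewrite the draft, fixing every problem in the critique."
        )
    return draft
```

The `max_rounds` cap matters: without it, a critic that never approves turns the quality win into an unbounded bill.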
02 · THE PERCEPTION-ACTION CYCLE

How an Agent Experiences Its Environment

Every agent operates within a perception-action cycle — a concept borrowed from cognitive science and robotics, applied to LLM-based systems. The agent does not "see" the world directly; it perceives a representation of the world through its context window, decides what to do, acts via tools, and receives a new observation that updates its representation.

// THE AGENT'S WORLD MODEL

ENVIRONMENT (Files · APIs · Web · DB)
     ↑ Action            ↓ Observation
AGENT (LLM + Context Window)
     ↑ Request           ↓ Response
USER / ORCHESTRATOR (Goal · Feedback)
👁️
Perception
What the agent can see: the system prompt, conversation history, tool results, and any documents injected into context. Everything outside this window is invisible to the agent.
🧭
Decision
The LLM reasons over its current perception and selects an action — a tool call, a clarifying question, or a final response. This is the only step in which the agent actually "thinks."
🎬
Action
The chosen action is executed by your application code — not the LLM. The LLM only produces a description of the action; your code runs it and returns the result.
The LLM never executes tools directly. When a model calls a tool, it outputs a structured request (a JSON object). Your application code intercepts that request, runs the real function or API call, and returns the result as a new message. The model never touches external systems — your code does. This boundary is where you implement security controls and error handling.
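That interception point can be sketched as a small dispatcher. The allowlist, the registry dict, and the error strings below are illustrative choices, assuming tool-use requests carry a name and an input dict as in the Anthropic-style convention.

```python
# Sketch of the app-side boundary: the model only names a tool; this code
# decides whether and how it actually runs.
ALLOWED = {"web_search", "count_words"}

def dispatch(name: str, tool_input: dict, registry: dict) -> str:
    if name not in ALLOWED:
        # Security control: the model cannot invoke anything unregistered
        return f"Refused: '{name}' is not a registered tool."
    try:
        return registry[name](**tool_input)
    except Exception as exc:
        # Error handling: failures go back to the model as text,
        # never as exceptions that crash the loop
        return f"Tool '{name}' failed: {exc}"
```

Because every tool call funnels through this one function, it is the natural place for logging, rate limits, and input validation.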
03 · ANATOMY OF A PRODUCTION AGENT

Five Components Every Agent Has

Whether you write a raw-SDK agent in 80 lines or use a framework like LangGraph, every production agent is composed of the same five functional components. Understanding each one lets you reason about agent behavior independently of which library you use.

COMPONENT WHAT IT DOES IMPLEMENTED AS
System Prompt Defines the agent's role, capabilities, constraints, output format, and available tools. The agent's "personality and rules." The system parameter in the API call
Tool Registry The list of tools available to the agent — their names, descriptions, and input schemas. The agent can only call what is registered. The tools list in the API call
Message History The accumulated conversation — user messages, assistant reasoning, tool calls, and tool results. This is the agent's working memory. The messages list, grown each iteration
Loop Controller The application code that calls the API, inspects stop_reason, routes tool calls to executors, appends results, and decides when to stop. Your Python while / for loop
Tool Executors The functions that actually run when a tool is called — hitting APIs, running code, reading files. They return a result string that goes back into the message history. Your Python functions, dispatched by tool name
Frameworks don't change this anatomy. LangGraph, AutoGen, and CrewAI all implement these same five components — they just give you higher-level abstractions and pre-built tool executors. When a framework "does something unexpected," trace it back to one of these five components to find the root cause.
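The anatomy can be compressed into a skeleton. Everything below is a generic stand-in, assuming a chat-style `client.create(...)` that returns content, a stop reason, and structured tool calls; it is not any particular SDK's signature.

```python
# Skeleton agent: the five components, annotated. The response and call
# shapes are placeholders for whatever your SDK actually returns.
def run_agent(client, user_goal, tools, executors, max_iters=10):
    system = "You are a helpful agent."                    # 1. system prompt
    messages = [{"role": "user", "content": user_goal}]    # 3. message history

    for _ in range(max_iters):                             # 4. loop controller
        response = client.create(system=system,
                                 tools=tools,              # 2. tool registry
                                 messages=messages)
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return response.content                        # agent is done
        for call in response.tool_calls:
            result = executors[call.name](**call.input)    # 5. tool executors
            messages.append({"role": "user",
                             "content": f"[{call.name} result] {result}"})
    return "max iterations reached"
```

When a framework misbehaves, asking "which of these five lines is it getting wrong?" is usually the fastest route to the root cause.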
04 · WHEN NOT TO USE AN AGENT

The Decision Framework

Agents are powerful but expensive — in tokens, latency, and error surface. Anthropic's official guidance states plainly: "use the simplest solution that works." An agent loop where a single prompt would suffice is over-engineering, not good architecture. Use this decision tree before choosing an agent.

// AGENT OR NOT? — DECISION FLOW
Can a single LLM call solve this? → Yes → Use a prompt, not an agent.
↓ No
Does the task require tools or external data? → No → Use chain-of-thought or a multi-turn conversation.
↓ Yes
Is the number of steps known and fixed? → Yes → Use a hardcoded chain (faster, cheaper, more reliable).
↓ No
Does the task require branching decisions based on intermediate results? → Yes → Use an agent.
USE AN AGENT WHEN
✓ Steps are unknown until runtime
✓ Task requires dynamic tool selection
✓ Output of one step determines the next
✓ Recovery from tool failures is needed
✓ Task may need to ask for clarification
SKIP THE AGENT WHEN
✗ One prompt with a retrieval step handles it
✗ The pipeline is always the same N steps
✗ Latency is critical and you can't afford multiple API calls
✗ The task is purely generative (writing, summarizing)
✗ You don't have a way to verify tool outputs
05 · AGENT OBSERVABILITY

You Cannot Debug What You Cannot See

An agent that fails silently is worse than no agent at all. A core practice in production agent development is structured tracing — logging every iteration's inputs, tool calls, tool results, and stop reasons in a format that can be replayed and inspected. Without it, debugging a multi-step failure is nearly impossible.

📋
Iteration Logs
Log the iteration number, stop_reason, and number of tool calls for every loop cycle. The first sign of an infinite loop is a monotonically increasing iteration count with no end_turn.
PRODUCTION ESSENTIAL
🔍
Tool Call Traces
Log every tool call: name, input arguments, and return value. This trace is the primary artifact for understanding what the agent decided and why. Store it alongside the final output.
PRODUCTION ESSENTIAL
💬
Reasoning Traces
In ReAct-style agents, the model's Thought content is explicit reasoning you can read. Log it. When the agent makes a surprising decision, the Thought often reveals the wrong assumption.
PRODUCTION ESSENTIAL
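All three trace types can feed one sink. Here is a minimal JSON-lines sketch; the file path and event fields are illustrative choices, not a standard schema.

```python
import json
import time

# One JSON object per line: trivially appendable, greppable, and replayable.
def log_event(path: str, event: dict) -> None:
    event["ts"] = time.time()  # timestamp every event for latency analysis
    with open(path, "a") as f:
        f.write(json.dumps(event) + "\n")

# Example events from inside an agent loop:
# log_event("trace.jsonl", {"iteration": 1, "stop_reason": "tool_use",
#                           "tool": "web_search",
#                           "input": {"query": "ReAct paper"}})
# log_event("trace.jsonl", {"iteration": 1,
#                           "observation": "ReAct: Synergizing..."})
```

Storing the trace next to the final output means any surprising answer can be replayed step by step later.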
Token budget monitoring: Track cumulative input tokens across iterations. An agent's context grows with every tool result appended to the message history. If you don't monitor this, you will eventually hit the context window limit mid-task with no graceful recovery. Log total_tokens after every API call and add a hard stop before the limit.
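That hard stop can be sketched as a small budget guard, assuming you read `usage.input_tokens` off each response as the lab code does. The default limit below is an illustrative number, not any real model's window size.

```python
# Budget guard: accumulate input tokens and refuse to start an iteration
# that risks overflowing the context window.
class TokenBudget:
    def __init__(self, hard_limit: int = 150_000):
        self.hard_limit = hard_limit
        self.total = 0

    def record(self, input_tokens: int) -> None:
        self.total += input_tokens

    def exhausted(self) -> bool:
        return self.total >= self.hard_limit

# In the loop:
#   budget.record(response.usage.input_tokens)
#   if budget.exhausted():
#       return "Stopped: token budget exhausted."
```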
SOURCES USED IN THIS SECTION

Verified References

Every claim in this section is grounded in one of these sources.

SOURCE  TYPE  COVERS  RECENCY
Yao et al. — ReAct Academic paper ReAct pattern, Thought/Action/Observation trace, ICLR 2023 Oct 2022 / ICLR 2023
Shinn et al. — Reflexion Academic paper Reflection pattern, self-critique, verbal reinforcement 2023
Lilian Weng — LLM Powered Autonomous Agents Blog / Survey Architecture patterns, perception-action, agent taxonomy June 2023
Anthropic — Tool Use & Agent Docs Official docs Tool loop mechanics, stop_reason, agent design guidance Maintained 2024–2026
LangGraph Documentation Official docs Graph-based agent architecture, state management Maintained 2024–2026
HANDS-ON LAB

Build a ReAct Agent with Explicit Reasoning Traces

You will build a research agent that uses the ReAct pattern explicitly — printing Thought, Action, and Observation at every step so you can watch the agent reason in real time. It has two tools: a simulated web search and a word counter. The goal is to understand the loop deeply enough that you can trace any agent failure back to a specific step.

🔬
ReAct Research Agent — Explicit Trace
PYTHON · ~90 LINES · ANTHROPIC API KEY REQUIRED
1
Set up the project

If you already have the agent-lab directory from Section 03, just create a new file in it. Otherwise, set up fresh.

BASH
mkdir -p agent-lab && cd agent-lab
python -m venv .venv
source .venv/bin/activate    # Windows: .venv\Scripts\activate
pip install anthropic python-dotenv
echo "ANTHROPIC_API_KEY=your_key_here" > .env

Get your API key from console.anthropic.com → API Keys.

2
Define the system prompt and tools

Create react_agent.py. The system prompt explicitly instructs the model to reason step by step before every action — this is what makes it a ReAct agent rather than a simple reactive loop.

PYTHON — react_agent.py
import os
import anthropic
from dotenv import load_dotenv

load_dotenv()
client = anthropic.Anthropic()

# The system prompt enforces ReAct: think before every action
SYSTEM = """You are a research agent that follows the ReAct pattern strictly.
Before every tool call, write a Thought explaining your reasoning.
After receiving a tool result, write an Observation summarising what you learned.
Keep thinking and acting until you have enough information to give a complete answer.
When you are done, write your final answer clearly."""

# Two tools: simulated search and a word counter
TOOLS = [
    {
        "name": "web_search",
        "description": (
            "Search the web for information on a topic. "
            "Returns a short summary of the top result. "
            "Use this when you need factual information you don't know."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query"
                }
            },
            "required": ["query"]
        }
    },
    {
        "name": "count_words",
        "description": "Counts the number of words in a given text string.",
        "input_schema": {
            "type": "object",
            "properties": {
                "text": {
                    "type": "string",
                    "description": "The text to count words in"
                }
            },
            "required": ["text"]
        }
    }
]
3
Implement the tool executors

The web search is simulated with a static lookup — in a real agent you would call a search API like Brave or Serper. The word counter is exact Python logic. Notice the pattern: both return a plain string to pass back to the model.

PYTHON — react_agent.py (continued)
# Simulated search DB — swap for a real search API in production
SEARCH_DB = {
    "ReAct paper": (
        "ReAct: Synergizing Reasoning and Acting in Language Models. "
        "Yao et al., arXiv:2210.03629, published at ICLR 2023. "
        "Interleaves chain-of-thought reasoning with action steps."
    ),
    "LangGraph": (
        "LangGraph is a library for building stateful, multi-actor applications with LLMs. "
        "Built by LangChain. Uses a graph of nodes and edges to model agent workflows. "
        "Supports cycles, branching, and human-in-the-loop patterns."
    ),
    "Anthropic": (
        "Anthropic is an AI safety company founded in 2021. "
        "Creators of the Claude model family. Developed Constitutional AI (CAI). "
        "Focused on AI safety research and interpretability."
    ),
}

def web_search(query: str) -> str:
    for key, result in SEARCH_DB.items():
        if key.lower() in query.lower():
            return result
    return f'No results found for "{query}". Try a different query.'


def count_words(text: str) -> str:
    count = len(text.split())
    return f"Word count: {count}"


def execute_tool(name: str, tool_input: dict) -> str:
    if name == "web_search":
        return web_search(tool_input["query"])
    if name == "count_words":
        return count_words(tool_input["text"])
    return f"Unknown tool: {name}"
4
Implement the ReAct loop with trace printing

This loop is similar to Section 03's, but it prints the full reasoning trace — every Thought block and every tool call — so you can see the agent's reasoning in real time. Study how the token budget accumulates.

PYTHON — react_agent.py (continued)
def run_react_agent(user_message: str, max_iterations: int = 8) -> str:
    messages = [{"role": "user", "content": user_message}]
    total_input_tokens = 0

    print(f"\n{'='*60}")
    print(f"USER: {user_message}")
    print(f"{'='*60}")

    for iteration in range(max_iterations):
        print(f"\n[iteration {iteration + 1}]")

        response = client.messages.create(
            model="claude-opus-4-6",  # check docs.anthropic.com for current models
            max_tokens=1024,
            system=SYSTEM,
            tools=TOOLS,
            messages=messages
        )

        total_input_tokens += response.usage.input_tokens
        print(f"  stop_reason  : {response.stop_reason}")
        print(f"  input_tokens : {response.usage.input_tokens} (total: {total_input_tokens})")

        # Print the full reasoning trace from this iteration
        for block in response.content:
            if hasattr(block, "text") and block.text.strip():
                print(f"\n  THOUGHT/TEXT:\n  {block.text.strip()}")
            elif block.type == "tool_use":
                print(f"\n  ACTION: {block.name}({block.input})")

        # Append assistant turn to history
        messages.append({"role": "assistant", "content": response.content})

        # Done
        if response.stop_reason == "end_turn":
            for block in response.content:
                if hasattr(block, "text"):
                    print(f"\n{'='*60}")
                    print(f"FINAL ANSWER:\n{block.text}")
                    print(f"{'='*60}")
                    print(f"Total input tokens used: {total_input_tokens}")
                    return block.text
            return "(end_turn with no text)"

        # Execute tool calls, collect observations
        if response.stop_reason == "tool_use":
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = execute_tool(block.name, block.input)
                    print(f"  OBSERVATION: {result}")
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })
            messages.append({"role": "user", "content": tool_results})

    return "Max iterations reached."


if __name__ == "__main__":
    run_react_agent(
        "Search for the ReAct paper, then count how many words "
        "are in the result you find."
    )
5
Run it and read the trace
BASH
python react_agent.py
EXPECTED OUTPUT
============================================================
USER: Search for the ReAct paper, then count how many words are in the result you find.
============================================================

[iteration 1]
  stop_reason  : tool_use
  input_tokens : 612 (total: 612)

  THOUGHT/TEXT:
  I'll search for the ReAct paper first.

  ACTION: web_search({'query': 'ReAct paper'})
  OBSERVATION: ReAct: Synergizing Reasoning and Acting in Language Models.
               Yao et al., arXiv:2210.03629, published at ICLR 2023.
               Interleaves chain-of-thought reasoning with action steps.

[iteration 2]
  stop_reason  : tool_use
  input_tokens : 743 (total: 1355)

  THOUGHT/TEXT:
  I found the ReAct paper. Now I'll count the words in that result.

  ACTION: count_words({'text': 'ReAct: Synergizing Reasoning ...'})
  OBSERVATION: Word count: 22

[iteration 3]
  stop_reason  : end_turn
  input_tokens : 821 (total: 2176)

  THOUGHT/TEXT:
  I have all the information needed to answer.

============================================================
FINAL ANSWER:
The ReAct paper (Yao et al., arXiv:2210.03629, ICLR 2023) describes a pattern
that interleaves chain-of-thought reasoning with action steps. The search result
summary contains 22 words.
============================================================
Total input tokens used: 2176
Read the trace carefully. Notice that the input token count grows with each iteration — this is the context window accumulating. After 3 iterations, it has grown from 612 to 2,176 tokens. A long agent run with large tool outputs can reach tens of thousands of tokens. This is why monitoring token usage is a production essential, not an afterthought.
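One mitigation you could bolt onto the lab code is truncating tool results in older turns once the history grows. This is a heuristic sketch only (production agents more often summarise or persist results outside the context); the message shapes match the lab's tool_result dicts, but `trim_tool_results` and its parameters are illustrative.

```python
# Shorten tool_result text in all but the most recent `keep_last` tool turns.
def trim_tool_results(messages: list, keep_last: int = 2,
                      max_chars: int = 200) -> list:
    # Tool-result turns are the user turns with list-valued content
    tool_turns = [i for i, m in enumerate(messages)
                  if m["role"] == "user" and isinstance(m["content"], list)]
    old_turns = tool_turns[:-keep_last] if keep_last else tool_turns
    for i in old_turns:
        for block in messages[i]["content"]:
            if (block.get("type") == "tool_result"
                    and len(block["content"]) > max_chars):
                block["content"] = block["content"][:max_chars] + " ...[truncated]"
    return messages
```

Call it right before each `client.messages.create(...)`: the most recent observations stay intact, which is usually where the agent's next decision lives.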
6
Extension: add a third tool and test a failed search

Add a get_current_date tool that returns today's date. Then run the agent with a query the search DB doesn't have — observe how it handles the "No results found" observation and either retries with a different query or gracefully reports the failure. This is how you discover your agent's error recovery behavior before production.

PYTHON — add to TOOLS list
{
    "name": "get_current_date",
    "description": "Returns today's date in ISO 8601 format (YYYY-MM-DD).",
    "input_schema": {"type": "object", "properties": {}}
}
PYTHON — add to execute_tool
from datetime import date

if name == "get_current_date":
    return str(date.today())
PYTHON — new test call
run_react_agent("What is today's date? Also, search for 'quantum computing'.")

Watch how the agent handles the "No results found" response — does it give up, retry with a different query, or tell the user it couldn't find anything? This reveals whether your system prompt provides adequate guidance for failure recovery.
