An LLM agentic loop is only as reliable as the weakest step in its chain of reasoning, and most frameworks don’t expose that weakness clearly enough.
Let’s watch an agent plan a trip for a user who wants to fly from San Francisco to Tokyo for 7 days on a $3000 budget and visit the Ghibli Museum.
```python
import json
from typing import Any, Dict, List, Optional

import anthropic
from pydantic import BaseModel, ConfigDict, Field

# With no arguments, the client reads the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

# Appended to every prompt, because json.loads() fails on prose or code fences.
JSON_ONLY = "Respond with only the JSON object, with no prose and no code fences."


class TravelPlan(BaseModel):
    # Also validate on attribute assignment, so a malformed payload assigned
    # to plan.flights or plan.accommodation raises a ValidationError at once.
    model_config = ConfigDict(validate_assignment=True)

    destination: str
    duration_days: int
    budget_usd: int
    activities: List[str]
    # Filled in by later steps, so the initial plan may omit them.
    flights: Dict[str, Any] = Field(default_factory=dict)
    accommodation: Dict[str, Any] = Field(default_factory=dict)


def call_llm(prompt: str, tool_code: Optional[str] = None) -> str:
    """Send one prompt to Claude and return the text of its reply."""
    messages = [{"role": "user", "content": prompt}]
    if tool_code:
        # A trailing assistant message acts as a prefill: the model continues
        # from this text, and the returned text does not repeat it.
        messages.append({"role": "assistant", "content": tool_code})
    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=2000,
        messages=messages,
    )
    return response.content[0].text


def plan_trip_agent(user_request: str) -> TravelPlan:
    # The model cannot follow "the TravelPlan schema" unless it is shown it.
    schema = json.dumps(TravelPlan.model_json_schema())

    # Step 1: Initial planning and information extraction
    planning_prompt = f"""
    User Request: "{user_request}"
    Your task is to extract key information and create an initial travel plan.
    Output the plan as a JSON object matching this JSON Schema: {schema}
    If any information is missing, make a reasonable assumption.
    {JSON_ONLY}
    """
    plan_json_str = call_llm(planning_prompt)
    print(f"--- Step 1: Initial Plan JSON ---\n{plan_json_str}\n")
    plan_data = json.loads(plan_json_str)  # may raise json.JSONDecodeError
    plan = TravelPlan.model_validate(plan_data)  # may raise ValidationError

    # Step 2: Flight search (simulated: the LLM invents the results)
    flight_search_prompt = f"""
    Based on the following travel plan, find suitable flight options.
    Plan: {plan.model_dump_json()}
    Search for round-trip flights from San Francisco (SFO) to Tokyo (TYO) for {plan.duration_days} days,
    keeping within the total trip budget of ${plan.budget_usd}.
    Output the flight details as a single JSON object. No flight API is called, so use plausible placeholder data.
    {JSON_ONLY}
    """
    flight_details_str = call_llm(flight_search_prompt)
    flight_details = json.loads(flight_details_str)
    plan.flights = flight_details  # validated because of validate_assignment
    print(f"--- Step 2: Flight Details ---\n{json.dumps(plan.flights, indent=2)}\n")

    # Step 3: Accommodation search (simulated)
    accommodation_search_prompt = f"""
    Based on the following travel plan, find suitable accommodation options.
    Plan: {plan.model_dump_json()}
    Search for accommodation in Tokyo for {plan.duration_days} nights,
    keeping the total budget of ${plan.budget_usd} in mind.
    Output the accommodation details as a single JSON object. No accommodation API is called, so use plausible placeholder data.
    {JSON_ONLY}
    """
    accommodation_details_str = call_llm(accommodation_search_prompt)
    accommodation_details = json.loads(accommodation_details_str)
    plan.accommodation = accommodation_details
    print(f"--- Step 3: Accommodation Details ---\n{json.dumps(plan.accommodation, indent=2)}\n")

    # Step 4: Activity integration and budget reconciliation
    activity_integration_prompt = f"""
    Refine the travel plan by integrating the requested activities and ensuring budget constraints.
    Current Plan: {plan.model_dump_json()}
    Ensure the Ghibli Museum is included. Adjust flight and accommodation costs if necessary to stay within budget.
    Output the final, refined travel plan as a JSON object matching this JSON Schema: {schema}
    {JSON_ONLY}
    """
    final_plan_str = call_llm(activity_integration_prompt)
    final_plan_data = json.loads(final_plan_str)
    final_plan = TravelPlan.model_validate(final_plan_data)
    print(f"--- Step 4: Final Refined Plan ---\n{final_plan.model_dump_json()}\n")
    return final_plan


# Example usage
if __name__ == "__main__":
    user_request = "Plan a 7-day trip from San Francisco to Tokyo with a budget of $3000. I want to visit the Ghibli Museum."
    final_trip = plan_trip_agent(user_request)
    print("\n--- Final Output ---\n")
    print(final_trip.model_dump_json(indent=2))
```
This agent breaks the complex task of travel planning into sequential steps: initial extraction, flight search, accommodation search, and activity integration. Each step takes the output of the previous one as its input, forming a chain of reasoning and action. The TravelPlan Pydantic model acts as the state, carrying information between these steps.
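Because each step only reads and writes the shared state, the chain can be exercised without any API calls by swapping call_llm for a stub. The sketch below is a simplified stand-in, not the code above: it uses a plain dict as the state instead of TravelPlan, and stub_llm, run_chain, STEPS, the "STEP:" prompt markers, and the canned replies are all hypothetical.

```python
import json
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical canned replies, keyed by a marker string each step's prompt contains.
CANNED: Dict[str, str] = {
    "STEP:extract": '{"destination": "Tokyo", "duration_days": 7, "budget_usd": 3000, "activities": []}',
    "STEP:flights": '{"airline": "ExampleAir", "price_usd": 1200}',
    "STEP:hotel": '{"name": "Example Inn", "price_usd": 900}',
    "STEP:activities": '{"activities": ["Ghibli Museum"]}',
}

# The four steps in order: (prompt marker, state key to fill, or None to merge the reply).
STEPS: List[Tuple[str, Optional[str]]] = [
    ("STEP:extract", None),
    ("STEP:flights", "flights"),
    ("STEP:hotel", "accommodation"),
    ("STEP:activities", None),
]


def stub_llm(prompt: str) -> str:
    """Stand-in for call_llm: return the canned reply whose marker appears in the prompt."""
    for marker, reply in CANNED.items():
        if marker in prompt:
            return reply
    raise KeyError("no canned reply matches this prompt")


def run_chain(llm: Callable[[str], str], request: str) -> Dict[str, Any]:
    """Run the steps in order, passing the full current state into every prompt."""
    state: Dict[str, Any] = {"request": request}
    for marker, key in STEPS:
        prompt = f"{marker}\nCurrent state: {json.dumps(state)}"
        reply = json.loads(llm(prompt))
        if key is None:
            state.update(reply)  # top-level fields such as destination or activities
        else:
            state[key] = reply  # a nested section such as flights
    return state
```

Passing the model in as a parameter is what makes the chain testable: the stub replaces the network call while the state handling stays exactly the same.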
The real power comes from how the LLM can be prompted to act as a "tool" or "function" within each step. Notice that call_llm accepts an optional tool_code argument, which in a more sophisticated setup could carry actual code or a tool call for an API. Here it is never passed; if it were, it would be appended as a prefilled assistant turn, which steers the format of the model’s reply rather than executing anything. The agent effectively "calls" the LLM for each sub-task, passing the current state and receiving an updated state or a specific action result.
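In a fuller setup, the model’s reply would name a tool and its arguments, and Python would run the matching function. The Anthropic Messages API supports this natively through its tools parameter; the stdlib-only sketch below shows just the dispatch side, assuming a hypothetical reply format {"tool": ..., "args": {...}} and a hypothetical search_flights function that returns canned data instead of calling a real flight API.

```python
import json
from typing import Any, Callable, Dict


def search_flights(origin: str, destination: str, max_price_usd: int) -> Dict[str, Any]:
    """Hypothetical tool: returns canned data instead of calling a flight API."""
    return {"origin": origin, "destination": destination, "price_usd": min(1150, max_price_usd)}


# Registry mapping the tool names the model may emit to Python callables.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {"search_flights": search_flights}


def dispatch(model_reply: str) -> Dict[str, Any]:
    """Parse a JSON tool call from the model and execute the named tool."""
    call = json.loads(model_reply)
    name = call.get("tool")
    if name not in TOOLS:
        # Refuse tools the model invented rather than guessing what it meant.
        raise ValueError(f"unknown tool: {name!r}")
    return TOOLS[name](**call.get("args", {}))
```

The registry is the safety boundary: the model can only trigger functions you listed, and an invented tool name becomes an error you can feed back to it.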
The mental model is one of a state machine where the LLM is the transition function. The TravelPlan is the state. Each step prompts the LLM to either update the state, perform an action that modifies the state (like finding flights), or execute a tool based on the state. The user request seeds the initial state.
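The state-machine view can be made explicit. In plan_trip_agent the order of transitions is hard-coded; the sketch below instead lets a decide function (standing in for an LLM call) pick the next action from the current state, and caps the number of transitions so a confused model cannot loop forever. decide, run_agent, ACTIONS, and MAX_STEPS are hypothetical names, and the actions return canned data.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]


def find_flights(state: State) -> State:
    """Hypothetical action: returns a new state with canned flight data."""
    return {**state, "flights": {"price_usd": 1200}}


def find_hotel(state: State) -> State:
    """Hypothetical action: returns a new state with canned lodging data."""
    return {**state, "accommodation": {"price_usd": 900}}


ACTIONS: Dict[str, Callable[[State], State]] = {
    "find_flights": find_flights,
    "find_hotel": find_hotel,
}
MAX_STEPS = 10


def decide(state: State) -> str:
    """Stand-in for an LLM call that names the next transition given the state."""
    if "flights" not in state:
        return "find_flights"
    if "accommodation" not in state:
        return "find_hotel"
    return "done"


def run_agent(initial: State, decide_fn: Callable[[State], str] = decide) -> State:
    state = initial
    for _ in range(MAX_STEPS):
        action = decide_fn(state)
        if action == "done":
            return state
        if action not in ACTIONS:
            raise ValueError(f"model chose an unknown action: {action!r}")
        # Each transition returns a new state instead of mutating the old one.
        state = ACTIONS[action](state)
    raise RuntimeError(f"no 'done' after {MAX_STEPS} transitions")
```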
The most surprising thing is how brittle this can be when the LLM hallucinates a tool’s output, or when schemas mismatch between steps. In this example every "search" result is hallucinated by design, since no real API is called. A common failure is JSON that looks like the TravelPlan schema but is subtly invalid, causing a Pydantic ValidationError; an even more basic one is a reply wrapped in prose or a Markdown code fence, which makes json.loads raise a JSONDecodeError. Either way the agent stops, but the LLM may already have "decided" on a flight price that is now impossible to reconcile with the budget.
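Two cheap defenses address these failures: pull the first JSON object out of the reply instead of assuming the whole reply is JSON, and check hard constraints such as the budget in Python rather than trusting the model’s arithmetic. In this sketch, extract_json and check_budget are hypothetical helpers, and the price_usd field name is an assumption, since the prompts above never fix a shape for flights or accommodation.

```python
import json
from typing import Any, Dict


def extract_json(text: str) -> Dict[str, Any]:
    """Return the JSON object starting at the first '{', ignoring prose or fences around it."""
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")
    # raw_decode parses one JSON value beginning at `start` and ignores trailing text.
    # A truncated object raises json.JSONDecodeError, which is a ValueError subclass.
    obj, _end = json.JSONDecoder().raw_decode(text, start)
    return obj


def check_budget(flights: Dict[str, Any], accommodation: Dict[str, Any], budget_usd: int) -> float:
    """Return the budget left after flights and lodging, or raise if it is exceeded."""
    try:
        spent = float(flights["price_usd"]) + float(accommodation["price_usd"])
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"missing or non-numeric price_usd: {exc!r}") from exc
    if spent > budget_usd:
        raise ValueError(f"flights + accommodation cost ${spent:.0f}, over the ${budget_usd} budget")
    return budget_usd - spent
```

Raising ValueError in both helpers keeps the failure modes uniform, so a single except clause can turn any of them into feedback for the model.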
To build reliable multi-step agents, you need robust error handling and validation at every step. That means not just catching Python exceptions, but also validating the LLM’s output against the expected schema and asking the LLM to self-correct when its output is invalid. For instance, if flight_details does not match the expected schema, you’d ideally re-prompt the LLM with the error message and the schema, asking it to fix its output. This "re-prompting for correction" loop is crucial.
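A minimal sketch of that re-prompting loop, assuming llm is any function from prompt to reply text (call_llm would fit) and validate is any function that raises ValueError on bad output. With Pydantic v2, TravelPlan.model_validate_json could serve as validate, since its ValidationError subclasses ValueError, as does json.JSONDecodeError. call_with_repair, validate_flights, and max_attempts are hypothetical names.

```python
import json
from typing import Any, Callable, Dict, List, TypeVar

T = TypeVar("T")


def call_with_repair(
    llm: Callable[[str], str],
    prompt: str,
    validate: Callable[[str], T],
    max_attempts: int = 3,
) -> T:
    """Call llm, validate the reply, and re-prompt with the error until it passes."""
    current = prompt
    errors: List[str] = []
    for _ in range(max_attempts):
        reply = llm(current)
        try:
            return validate(reply)
        except ValueError as exc:
            errors.append(str(exc))
            # Show the model its own output and the exact error, then ask for a fix.
            current = (
                f"{prompt}\n\nYour previous reply was:\n{reply}\n"
                f"It failed validation with this error:\n{exc}\n"
                "Return a corrected JSON object only."
            )
    raise RuntimeError(f"still invalid after {max_attempts} attempts: {errors}")


def validate_flights(text: str) -> Dict[str, Any]:
    """Example validator: the reply must be a JSON object with a numeric price_usd."""
    data = json.loads(text)
    if not isinstance(data, dict) or not isinstance(data.get("price_usd"), (int, float)):
        raise ValueError("price_usd must be a number")
    return data
```

Capping max_attempts matters: a model that keeps producing the same bad output should fail loudly rather than burn tokens indefinitely.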
The next concept to explore is how to implement memory and context management for more complex, long-running agents that need to recall past interactions or maintain a consistent persona over many turns.