The Gemini API doesn’t just complete text; it can reason about tool use and orchestrate complex, multi-step processes.

Let’s see this in action. Imagine you want to summarize recent news articles about a specific company and then check its stock price.

import google.generativeai as genai
from google.generativeai.types import Tool
import json

# Configure your API key
genai.configure(api_key="YOUR_API_KEY")

# Define the tools your agent can use
tools = [
    Tool(
        function_declarations=[
            genai.types.FunctionDeclaration(
                name="search_news",
                description="Searches for recent news articles about a given topic.",
                parameters={
                    "type": "object",
                    "properties": {
                        "query": {"type": "string", "description": "The topic to search for, e.g., 'Apple Inc.'"}
                    },
                    "required": ["query"],
                },
            ),
            genai.types.FunctionDeclaration(
                name="get_stock_price",
                description="Retrieves the current stock price for a given ticker symbol.",
                parameters={
                    "type": "object",
                    "properties": {
                        "ticker": {"type": "string", "description": "The stock ticker symbol, e.g., 'AAPL'"}
                    },
                    "required": ["ticker"],
                },
            ),
        ]
    )
]

# Initialize the model with tool support
model = genai.GenerativeModel(model_name="gemini-1.5-pro-preview-0409", tools=tools)

# Start a chat session
chat = model.start_chat(enable_automatic_function_calling=True)

# User's request
user_request = "Summarize the latest news about Tesla and tell me its current stock price."
response = chat.send_message(user_request)

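# The model decides which tool to call and with what arguments.
# The request arrives as a function_call part, not as text.
for part in response.parts:
    if fn := part.function_call:
        print(fn.name, dict(fn.args))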

When you run this, the response will contain a function_call part rather than plain text, indicating the model wants to use a tool. For instance, it might decide to call search_news first:

{
  "name": "search_news",
  "args": {"query": "Tesla latest news"}
}

You’d then run the matching function in your Python environment (or wherever your tools are hosted) with those arguments and collect the result. Let’s say the news search returns a summary. You’d then send that summary back to the model in the next turn:

# Assuming 'news_summary' is the result from executing search_news
news_summary = "Recent reports indicate Tesla is facing increased competition but also expanding its Supercharger network."
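response = chat.send_message(
    genai.protos.Part(
        function_response=genai.protos.FunctionResponse(
            name="search_news",
            response={"content": news_summary},
        )
    )
)
# The reply is expected to be another function call, so inspect the parts
# rather than reading response.text.
print(response.parts)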
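
The "run the matching function" step can be sketched as a small dispatcher. This is only an illustration: the search_news and get_stock_price bodies are hypothetical stubs that return canned data, and TOOL_REGISTRY and execute_function_call are made-up names, not part of the SDK.

```python
from typing import Any, Callable, Dict


def search_news(query: str) -> Dict[str, Any]:
    """Hypothetical stub; a real version would query a news API."""
    return {"content": f"(canned) headlines for {query}"}


def get_stock_price(ticker: str) -> Dict[str, Any]:
    """Hypothetical stub; a real version would query a market-data API."""
    return {"content": f"(canned) price for {ticker}"}


# Map the names used in the function declarations to local callables.
TOOL_REGISTRY: Dict[str, Callable[..., Dict[str, Any]]] = {
    "search_news": search_news,
    "get_stock_price": get_stock_price,
}


def execute_function_call(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Run the tool the model asked for and return a JSON-like result."""
    tool = TOOL_REGISTRY.get(name)
    if tool is None:
        # Report unknown tools back to the model instead of crashing.
        return {"error": f"unknown tool: {name}"}
    return tool(**args)


result = execute_function_call("search_news", {"query": "Tesla"})
print(result)
```

In the chat flow, the name and arguments would come from the function_call part of the model's response, and the returned dictionary is what you would send back as the function response.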

The model, now having the news summary, might then decide to call get_stock_price:

{
  "name": "get_stock_price",
  "args": {"ticker": "TSLA"}
}

You execute this, get the stock price (e.g., "$175.50"), and send it back:

# Assuming 'stock_price' is the result from executing get_stock_price
stock_price = "$175.50"
response = chat.send_message(
    genai.protos.Part(
        function_response=genai.protos.FunctionResponse(
            name="get_stock_price",
            response={"content": stock_price},
        )
    )
)
print(response.text)

Finally, the model synthesizes the information and provides the complete answer: "Recent reports indicate Tesla is facing increased competition but also expanding its Supercharger network. Its current stock price is $175.50."

This is an agentic workflow: the model acts as a reasoning engine that understands a user’s intent, breaks it down into executable steps, requests calls to external tools to get information, and synthesizes the results into a coherent answer. The core problem this solves is moving beyond simple text generation to task automation. Instead of just writing about how to find news and stock prices, the model can drive the task by requesting calls to the functions you define, which your code then executes.

The model never executes anything itself; it reasons about when a tool is needed and what arguments to pass, and your code does the actual work. It also reads the output you send back and uses it in subsequent steps. The enable_automatic_function_calling=True flag is not what makes this possible: it is intended for tools supplied as Python callables, which the SDK can then run and answer on your behalf. With schema-only declarations like the ones above, the SDK has nothing to execute, so each function call is handed back to you as a structured request (a name plus arguments), as in the walkthrough.

The magic happens in how the model interprets the function descriptions and parameters. It uses this metadata to infer which function is appropriate for a given sub-task and how to format the arguments according to the OpenAPI schema you provide. It’s a sophisticated form of prompt engineering where the "prompt" includes not just the user’s query but also the definitions of the world your agent can interact with.
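
Because the arguments are generated by the model from that schema, it can be worth checking them before running a tool. The following is a minimal sketch, assuming the same parameter dictionary as the search_news declaration above. validate_args is a hypothetical helper, and it only checks required keys and string types rather than the full schema.

```python
from typing import Any, Dict, List

# Same shape as the "parameters" dictionary in the search_news declaration.
SEARCH_NEWS_PARAMS: Dict[str, Any] = {
    "type": "object",
    "properties": {"query": {"type": "string"}},
    "required": ["query"],
}


def validate_args(schema: Dict[str, Any], args: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the args look usable."""
    problems: List[str] = []
    properties = schema.get("properties", {})

    # Every required parameter must be present.
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required argument: {key}")

    # Every supplied parameter must be declared and, if typed as a
    # string, must actually be a string.
    for key, value in args.items():
        spec = properties.get(key)
        if spec is None:
            problems.append(f"unexpected argument: {key}")
        elif spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"argument {key} should be a string")

    return problems


print(validate_args(SEARCH_NEWS_PARAMS, {"query": "Tesla"}))
print(validate_args(SEARCH_NEWS_PARAMS, {"topic": 42}))
```

If the list is non-empty, one option is to send the problems back to the model as the function response so it can correct the call.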

The sequence of tool calls and responses forms a conversation. You send a message, the model responds with a tool call, you execute the tool and send the result back, and the model continues until it has enough information to answer the original request. This iterative process allows for complex, multi-hop reasoning that would be impossible with a single API call.
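
The loop described above can be sketched without the SDK. ScriptedModel below is a hypothetical stand-in that replays a fixed sequence of replies; in real code each call to send would be a chat.send_message call, and the replies would be the model's responses. The tool functions are canned stubs, and run_agent and max_steps are illustrative names.

```python
from typing import Any, Callable, Dict, List


class ScriptedModel:
    """Hypothetical stand-in for a chat session; replays fixed replies."""

    def __init__(self, replies: List[Dict[str, Any]]) -> None:
        self._replies = list(replies)
        self.received: List[Any] = []

    def send(self, message: Any) -> Dict[str, Any]:
        self.received.append(message)
        return self._replies.pop(0)


def run_agent(
    model: ScriptedModel,
    tools: Dict[str, Callable[..., Dict[str, Any]]],
    user_request: str,
    max_steps: int = 5,
) -> str:
    """Keep answering function calls until the model replies with text."""
    reply = model.send(user_request)
    steps = 0
    while "function_call" in reply:
        if steps >= max_steps:
            raise RuntimeError("too many tool calls without a final answer")
        call = reply["function_call"]
        # Execute the requested tool locally with the model's arguments.
        result = tools[call["name"]](**call["args"])
        # Send the result back so the model can continue reasoning.
        reply = model.send(
            {"function_response": {"name": call["name"], "response": result}}
        )
        steps += 1
    return reply["text"]


model = ScriptedModel([
    {"function_call": {"name": "search_news", "args": {"query": "Tesla"}}},
    {"function_call": {"name": "get_stock_price", "args": {"ticker": "TSLA"}}},
    {"text": "Summary plus price."},
])
tools = {
    "search_news": lambda query: {"content": f"(canned) news for {query}"},
    "get_stock_price": lambda ticker: {"content": f"(canned) price for {ticker}"},
}
answer = run_agent(model, tools, "Summarize Tesla news and give its stock price.")
print(answer)
```

The max_steps cap is a design choice in this sketch: it stops a model that keeps requesting tools from looping indefinitely.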

It’s important to realize that the model generates each function call, both the function name and its arguments, on the fly. It’s not just looking up a pre-defined script. This means it can adapt to slightly different phrasing or new combinations of requests, as long as the underlying tools can support them.

The next logical step in building more complex agents is handling failures and retries gracefully, and managing state across longer, more intricate workflows.
