Function calling is the key to making LLMs useful beyond just chat.
Let’s see it in action. Imagine we have a simple Python function to get the current weather:
import datetime
def get_weather(city: str, unit: str = "celsius") -> dict:
"""Get the current weather for a specified city."""
weather_data = {
"city": city,
"temperature": 25,
"unit": unit,
"time": datetime.datetime.now().isoformat()
}
return weather_data
Now, we want our LLM to be able to call this function. We describe the function to the LLM in a specific format, usually a JSON schema.
{
"name": "get_weather",
"description": "Get the current weather for a specified city.",
"parameters": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "The city for which to get the weather."
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The unit of temperature to return."
}
},
"required": ["city"]
}
}
When you send a user’s query like "What’s the weather like in London?" to the LLM along with this function definition, the LLM doesn’t try to answer it directly. Instead, it outputs a structured JSON object indicating which function to call and with what arguments:
{
"tool_calls": [
{
"id": "call_abc123",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"city\": \"London\"}"
}
}
]
}
Your application then intercepts this output. It sees that get_weather was requested with city="London". Your code then executes get_weather(city="London"). The result of this function call (e.g., {"city": "London", "temperature": 25, "unit": "celsius", "time": "2023-10-27T10:30:00.123456"}) is then sent back to the LLM.
The LLM, now armed with the actual weather data, can then generate a natural language response for the user: "The current weather in London is 25 degrees Celsius."
This creates a loop: User Query -> LLM (identifies tool) -> Your Code (executes tool) -> LLM (generates final response).
The core problem function calling solves is bridging the gap between the LLM’s language understanding and the real-world actions or data access your application needs. LLMs are great at understanding intent and extracting structured information, but they don’t have direct access to APIs, databases, or executable code. Function calling provides a standardized, LLM-friendly way to expose these capabilities.
Internally, LLMs are trained on massive datasets that include code, API documentation, and examples of structured data. When presented with a function description (like the JSON schema), the LLM learns to map patterns in the user’s query to the parameters defined in that schema. The "magic" is in its ability to infer the correct function and arguments based on the semantic meaning of the user’s request. It’s essentially a sophisticated pattern-matching and slot-filling exercise guided by the provided tool definitions.
The tool_choice parameter in many LLM APIs is crucial for guiding the model. If you set tool_choice="auto", the LLM decides whether to call a tool or respond directly. If you set tool_choice={"type": "function", "function": {"name": "get_weather"}}, you’re forcing the LLM to call get_weather if it can, or to indicate it cannot fulfill the request using that specific tool. This explicit control is vital for building robust applications where you must use a particular tool.
The required field in the function’s parameter schema is critical. If a required parameter is missing from the user’s prompt, the LLM will often ask a clarifying question to obtain it, rather than hallucinating a value or failing silently. For instance, if the user asked "What’s the weather?", the LLM would likely respond by asking "Which city would you like the weather for?".
The most surprising aspect for many developers is how seamlessly LLMs handle ambiguity and follow-up questions. If a user asks "What’s the temperature in Paris, but in Fahrenheit?", the LLM can correctly parse both the city and the desired unit, mapping them to the city and unit parameters respectively. If the user had only said "What’s the temperature in Paris?", the LLM, seeing that unit is optional, would likely call the function with just the city and then append the default unit (Celsius in our example) to its natural language response.
The next step is handling multiple tool calls simultaneously or chaining them together.