Function Calling and Tool Use
Every module so far has treated the LLM as a text transformer: text goes in, text comes out. Function calling changes this. With function calling, the LLM can decide to invoke a function you have defined — passing it structured arguments — and use the function's return value to inform its final response. This is the mechanism that transforms a passive text generator into an active agent that can look things up, run calculations, query databases, call external APIs, and take actions in the world. Understanding function calling at the API level is essential before using any agent framework, because every framework is ultimately an abstraction over this mechanism.
How function calling works
The function calling protocol is a structured conversation between your code and the model. You define a set of tools — each described by a name, a description in natural language, and a JSON Schema specifying the arguments. You include these tool definitions in your API request. The model reads the tool descriptions, decides whether any tool is useful for the current task, and if so, returns a special response indicating which tool to call and with what arguments. Your code executes the function, collects the result, and appends it to the conversation. The model then generates its final response, informed by the function's output.
The model never executes your code directly. It generates a structured description of what to call and how to call it. Your code is responsible for the actual execution, input validation, error handling, and result formatting. This separation matters: the LLM provides the decision-making about what to call, your code provides the actual capabilities.
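To make this division of labour concrete, here is a minimal offline sketch that makes no API call. The `tool_call` dictionary is a hand-written stand-in that mimics the shape of what the model returns (a name plus arguments encoded as a JSON string). `TOOL_REGISTRY`, `dispatch`, and the stubbed `get_current_weather` are hypothetical names chosen for illustration.

```python
import json


def get_current_weather(city, units="celsius"):
    # Stub standing in for a real weather lookup; the values are made up.
    return {"city": city, "temperature": 21, "units": units}


# Maps the tool names advertised to the model onto real Python callables.
TOOL_REGISTRY = {"get_current_weather": get_current_weather}


def dispatch(tool_call):
    """Run the function the model asked for and return its result."""
    name = tool_call["function"]["name"]
    # Arguments arrive as a JSON-encoded string, not as a dict.
    args = json.loads(tool_call["function"]["arguments"])
    return TOOL_REGISTRY[name](**args)


# Hand-written stand-in for the structured description the model emits.
tool_call = {
    "id": "call_001",
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "arguments": '{"city": "Tokyo"}',
    },
}

result = dispatch(tool_call)
print(result)
```

The model's contribution ends at the `tool_call` dictionary. Everything from `json.loads` onward runs in your process, under your control.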
Defining a tool in the API
A tool definition has two top-level fields: type (always "function") and function, an object that holds the name, the description, and the parameters (a JSON Schema object). The description is critical: it is what the model reads to decide whether and when to use this tool. A good description specifies what the function does, what it returns, and when it should (and should not) be used.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a specific city. "
                           "Use this when the user asks about current weather conditions.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city name, e.g. 'San Francisco'"
                    },
                    "units": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature units"
                    }
                },
                "required": ["city"]
            }
        }
    }
]
The full function calling loop
A complete implementation handles three phases: sending the initial request with tools, detecting and dispatching tool calls in the response, and sending the tool results back.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]

# Step 1: Send request with tools
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools
)
choice = response.choices[0]

# Step 2: Check if model wants to call a tool
if choice.message.tool_calls:
    # Step 3: Append the assistant's tool-call message once, before any results
    messages.append(choice.message)
    for tool_call in choice.message.tool_calls:
        args = json.loads(tool_call.function.arguments)
        # Step 4: Execute the function (your code). get_current_weather is
        # assumed to be defined elsewhere and to return JSON-serializable data.
        result = get_current_weather(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result)
        })
    # Step 5: Send again to get final response
    final_response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools
    )
    print(final_response.choices[0].message.content)
else:
    # The model answered directly without calling a tool
    print(choice.message.content)
GPT-4o and Claude 3+ support parallel tool calling: the model can request several tool calls in a single response instead of one per round trip. This matters for performance when a task needs independent information from multiple sources, such as weather and stock data, because your code can run the calls concurrently rather than one after the other. Your dispatch loop must handle every tool call in the response and return a result for each one, matched by its tool call ID, before the model generates its final answer.
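One way to handle several tool calls in one response is sketched below, again offline. The tool calls are hand-written dictionaries in the shape used by the request format in this module, and the two tool functions are stubs with made-up values. `TOOL_REGISTRY`, `run_tool_call`, `run_all`, and `get_stock_price` are hypothetical names. Using a thread pool is an assumption that suits I/O-bound tools; it is not the only option.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def get_current_weather(city, units="celsius"):
    # Stub: a real implementation would call a weather service.
    return {"city": city, "temperature": 21, "units": units}


def get_stock_price(symbol):
    # Stub: a real implementation would call a market data service.
    return {"symbol": symbol, "price": 100.0}


TOOL_REGISTRY = {
    "get_current_weather": get_current_weather,
    "get_stock_price": get_stock_price,
}


def run_tool_call(tool_call):
    """Execute one tool call and wrap the result as a 'tool' message."""
    fn = TOOL_REGISTRY[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],  # ties the result to its request
        "content": json.dumps(fn(**args)),
    }


def run_all(tool_calls):
    """Run independent tool calls concurrently, one message per call."""
    with ThreadPoolExecutor() as pool:
        # pool.map yields results in the same order as the input list.
        return list(pool.map(run_tool_call, tool_calls))


tool_calls = [
    {"id": "call_a", "type": "function",
     "function": {"name": "get_current_weather",
                  "arguments": '{"city": "Tokyo"}'}},
    {"id": "call_b", "type": "function",
     "function": {"name": "get_stock_price",
                  "arguments": '{"symbol": "ACME"}'}},
]

tool_messages = run_all(tool_calls)
for message in tool_messages:
    print(message["tool_call_id"], message["content"])
```

Each returned message carries the ID of the call it answers, so all of them can be appended to the conversation together before the follow-up request.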
Tool choice control
By default (tool_choice set to "auto"), the model decides whether to call a tool or respond directly. The tool_choice parameter lets you override this. Setting it to "none" prevents tool use entirely (useful when you want a pure text response from a model that has tools available). Setting it to "required" forces the model to call at least one tool. Naming a specific function forces a call to that function; in the OpenAI request format used in this module, you do this by passing an object such as {"type": "function", "function": {"name": "get_current_weather"}} rather than a bare string. Forcing a specific function is useful when you need structured output in the tool call format but don't actually have a function to execute.
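The possible values can be summarised in a small helper. This is a sketch that assumes the OpenAI Chat Completions request format used elsewhere in this module; other providers express the same idea with different shapes. `build_tool_choice` and its `mode` argument are hypothetical names, not part of any SDK.

```python
def build_tool_choice(mode, function_name=None):
    """Return a value suitable for the tool_choice request parameter."""
    if mode in ("auto", "none", "required"):
        # The three string modes are passed through unchanged.
        return mode
    if mode == "function":
        if not function_name:
            raise ValueError("function_name is required when mode is 'function'")
        # Forcing one specific tool uses an object, not a bare string.
        return {"type": "function", "function": {"name": function_name}}
    raise ValueError(f"unknown tool_choice mode: {mode!r}")


print(build_tool_choice("auto"))
print(build_tool_choice("function", "get_current_weather"))
```

The returned value would be passed as `tool_choice=...` alongside `tools=...` in the request.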
Designing good tool interfaces
The model's ability to call your tools correctly depends almost entirely on how well you describe them. Several principles make tool definitions more reliable.
Error handling in tool execution
Tools fail. External APIs return errors, database queries time out, input validation catches bad arguments. How you surface these failures to the model matters — you can either propagate the error as a tool result (letting the model decide how to handle it) or handle it before the model sees it.
The recommended approach is to return error information as a structured tool result rather than raising an exception that aborts the whole conversation. A result like {"error": "rate_limited", "message": "Weather API rate limit exceeded. Try again in 60 seconds."} gives the model enough information to tell the user what happened and suggest alternatives. An uncaught exception that crashes your tool dispatch loop loses all conversation state and provides no recovery path.
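A minimal sketch of this pattern follows. The `safe_execute` wrapper, the `WeatherRateLimitError` exception, and the failing `flaky_weather` stub are all hypothetical; a real tool would raise whatever exceptions its underlying client library defines. The "tool_failed" error code is likewise an illustrative choice.

```python
import json


class WeatherRateLimitError(Exception):
    """Hypothetical exception raised by a weather client when throttled."""


def flaky_weather(city):
    # Stub that always fails, to demonstrate the error path.
    raise WeatherRateLimitError(
        "Weather API rate limit exceeded. Try again in 60 seconds."
    )


def safe_execute(fn, args):
    """Run a tool and always return a JSON string, even on failure."""
    try:
        return json.dumps(fn(**args))
    except WeatherRateLimitError as exc:
        # A known failure mode gets a specific, machine-readable code.
        return json.dumps({"error": "rate_limited", "message": str(exc)})
    except Exception as exc:
        # Anything unexpected is still reported instead of crashing the loop.
        return json.dumps({"error": "tool_failed", "message": str(exc)})


content = safe_execute(flaky_weather, {"city": "Tokyo"})
print(content)
```

The string returned by `safe_execute` can be used directly as the `content` of the tool message, so the conversation continues whether the tool succeeded or not.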
The model's tool call arguments should be treated as untrusted input, not as correct values. Validate all arguments before passing them to your actual function implementation: check that required fields are present and have the expected types, that string values don't exceed expected lengths, that numeric values are within expected ranges, and that any IDs or references actually exist in your system. A hallucinated user ID passed directly to a database query is a reliability bug; passed to a financial system, it is a safety issue.
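As an illustration, the checks for the weather tool defined earlier might look like the sketch below. `validate_weather_args`, `ALLOWED_UNITS`, and the 100-character limit are assumptions made for this example, not requirements of any API. A production system might use a JSON Schema validator instead of hand-written checks.

```python
ALLOWED_UNITS = {"celsius", "fahrenheit"}
MAX_CITY_LENGTH = 100  # arbitrary illustrative limit


def validate_weather_args(args):
    """Return a list of problems; an empty list means the args are usable."""
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]

    errors = []

    # Reject fields the schema never declared.
    unknown = set(args) - {"city", "units"}
    if unknown:
        errors.append(f"unexpected fields: {sorted(unknown)}")

    # Required field: present, correct type, sensible length.
    city = args.get("city")
    if not isinstance(city, str) or not city.strip():
        errors.append("city is required and must be a non-empty string")
    elif len(city) > MAX_CITY_LENGTH:
        errors.append(f"city must be at most {MAX_CITY_LENGTH} characters")

    # Optional field: if present, it must be one of the enum values.
    if "units" in args:
        units = args["units"]
        if not (isinstance(units, str) and units in ALLOWED_UNITS):
            errors.append("units must be 'celsius' or 'fahrenheit'")

    return errors


print(validate_weather_args({"city": "Tokyo", "units": "celsius"}))
print(validate_weather_args({"units": "kelvin"}))
```

When the list is non-empty, the problems can be returned to the model as a structured error result, which gives it a chance to retry with corrected arguments.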
Module 9 builds on function calling to cover the full stack of production AI application development: authentication, rate limit management, caching, error recovery, and the deployment patterns that turn a working prototype into a reliable production service.