Module 8 · 30 min read · Building with AI APIs

Function Calling and Tool Use

Every module so far has treated the LLM as a text transformer: text goes in, text comes out. Function calling changes this. With function calling, the LLM can decide to invoke a function you have defined — passing it structured arguments — and use the function's return value to inform its final response. This is the mechanism that transforms a passive text generator into an active agent that can look things up, run calculations, query databases, call external APIs, and take actions in the world. Understanding function calling at the API level is essential before using any agent framework, because every framework is ultimately an abstraction over this mechanism.

How function calling works

The function calling protocol is a structured conversation between your code and the model. You define a set of tools, each described by a name, a natural-language description, and a JSON Schema specifying the arguments. You include these tool definitions in your API request. The model reads the tool descriptions, decides whether any tool is useful for the current task, and if so, returns a special response indicating which tool to call and with what arguments. Your code executes the function, appends the result to the conversation, and sends the updated conversation back in a second request. The model then generates its final response, informed by the function's output.

The model never executes your code directly. It generates a structured description of what to call and how to call it. Your code is responsible for the actual execution, input validation, error handling, and result formatting. This separation matters: the LLM provides the decision-making about what to call, your code provides the actual capabilities.
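The division of labor described above can be sketched without any network call. In this minimal sketch, the tool call is a hand-written dict standing in for what the model would return, and `TOOL_REGISTRY`, `dispatch`, and the stub `get_current_weather` are hypothetical names chosen for illustration.

```python
import json

def get_current_weather(city, units="celsius"):
    # Stub standing in for a real weather lookup (hypothetical data).
    return {"city": city, "temperature": 22, "units": units}

# Maps tool names the model may emit to the Python callables that implement them.
TOOL_REGISTRY = {"get_current_weather": get_current_weather}

def dispatch(name, arguments_json):
    """Run the function the model asked for; the model supplied only a name and JSON args."""
    if name not in TOOL_REGISTRY:
        return {"error": "unknown_tool", "message": f"No tool named {name!r}"}
    args = json.loads(arguments_json)
    return TOOL_REGISTRY[name](**args)

# Hand-written stand-in for the model's structured request.
simulated_call = {"name": "get_current_weather", "arguments": '{"city": "Tokyo"}'}
result = dispatch(simulated_call["name"], simulated_call["arguments"])
```

The model's contribution ends at `simulated_call`; everything that touches real systems happens inside `dispatch`.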

Defining a tool in the API

A tool definition has a type (always "function") and a function object with three fields: name, description, and parameters (a JSON Schema object). The description is critical: it is what the model reads to decide whether and when to use this tool. A good description specifies what the function does, what it returns, and when it should (and should not) be used.

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a specific city. "
                           "Use this when the user asks about current weather conditions.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city name, e.g. 'San Francisco'"
                    },
                    "units": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature units"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

The full function calling loop

A complete implementation handles three phases: sending the initial request with tools, detecting and dispatching tool calls in the response, and sending the tool results back.

import json
from openai import OpenAI

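client = OpenAI()  # reads OPENAI_API_KEY from the environment

def get_current_weather(city, units="celsius"):
    # Placeholder implementation; replace with a real weather lookup.
    return {"city": city, "temperature": 22, "units": units, "conditions": "cloudy"}

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]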

# Step 1: Send request with tools
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools
)

choice = response.choices[0]

# Step 2: Check if the model wants to call one or more tools
if choice.finish_reason == "tool_calls":
    # Step 3: Append the assistant message that carries the tool calls
    messages.append(choice.message)

    # Step 4: Execute every requested call (your code) and append each result.
    # Each tool_call_id in the assistant message needs a matching tool message.
    for tool_call in choice.message.tool_calls:
        args = json.loads(tool_call.function.arguments)
        result = get_current_weather(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result)
        })

    # Step 5: Send again to get final response
    final_response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools
    )
    print(final_response.choices[0].message.content)

Parallel tool calls

GPT-4o and Claude 3+ support parallel tool calling: the model can request several tool calls in a single response rather than one per round trip, returning multiple tool call objects at once. This matters for performance when a task requires independent information from multiple sources, such as fetching weather and stock data together rather than one after the other. Your dispatch loop must handle multiple tool calls in a single response and return a result for every one of them, each matched to its tool call ID, before the model generates its final answer.
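A dispatch loop for multiple tool calls might look like the sketch below. It runs offline: `fake_call` builds objects that only mimic the attribute shape of an SDK tool call (`id`, `function.name`, `function.arguments`), and `get_stock_price`, `REGISTRY`, `run_one`, and `run_tool_calls` are hypothetical names. Running the calls in a thread pool is one possible choice, not something the API requires.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace

def get_current_weather(city):
    return {"city": city, "temperature": 22}   # stub data

def get_stock_price(symbol):
    return {"symbol": symbol, "price": 100.0}  # stub data

REGISTRY = {"get_current_weather": get_current_weather,
            "get_stock_price": get_stock_price}

def run_one(tool_call):
    """Execute one tool call and build the matching role="tool" message."""
    fn = REGISTRY[tool_call.function.name]
    result = fn(**json.loads(tool_call.function.arguments))
    return {"role": "tool", "tool_call_id": tool_call.id,
            "content": json.dumps(result)}

def run_tool_calls(tool_calls):
    """Run independent tool calls concurrently; results keep the input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_one, tool_calls))

def fake_call(call_id, name, args):
    # Mimics the attribute shape of an SDK tool call object (assumption for this sketch).
    return SimpleNamespace(
        id=call_id,
        function=SimpleNamespace(name=name, arguments=json.dumps(args)))

calls = [fake_call("call_1", "get_current_weather", {"city": "Tokyo"}),
         fake_call("call_2", "get_stock_price", {"symbol": "ACME"})]
tool_messages = run_tool_calls(calls)
```

Every entry in `tool_messages` would be appended to the conversation before the follow-up request is sent.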

Tool choice control

By default (tool_choice set to "auto"), the model decides whether to call a tool or respond directly. The tool_choice parameter lets you override this. Setting it to "none" prevents tool use entirely (useful when you want a pure text response from a model that has tools available). Setting it to "required" forces the model to call at least one tool. Setting it to an object that names a specific function, {"type": "function", "function": {"name": "get_current_weather"}}, forces a call to that function, which is useful when you need structured output in the tool call format but don't actually have a function to execute.
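The options can be compared side by side by building the request keyword arguments without sending them. This is a sketch: `build_request` and `force_function` are hypothetical helpers, and the value shapes follow the OpenAI Chat Completions API, where a forced function is expressed as an object rather than a bare name.

```python
def force_function(name):
    """Build the tool_choice value that forces a call to one named function."""
    return {"type": "function", "function": {"name": name}}

def build_request(messages, tools, tool_choice="auto"):
    """Assemble keyword arguments for client.chat.completions.create(**kwargs)."""
    return {"model": "gpt-4o", "messages": messages,
            "tools": tools, "tool_choice": tool_choice}

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]
tools = []  # tool definitions would go here

default_req = build_request(messages, tools)               # model decides
text_only_req = build_request(messages, tools, "none")     # no tool calls allowed
must_call_req = build_request(messages, tools, "required") # at least one tool call
forced_req = build_request(messages, tools,
                           force_function("get_current_weather"))
```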

Designing good tool interfaces

The model's ability to call your tools correctly depends almost entirely on how well you describe them. Several principles make tool definitions more reliable.

Write descriptions for the model, not for humans

The model reads your description to decide when to use the tool and what arguments to pass. Be explicit about the purpose, the expected input format, what the function returns, and any conditions under which it should or should not be called. "Search the knowledge base for relevant documents. Returns a list of document chunks ranked by relevance score. Use this when the user asks a question that may be answered by stored documents." is much better than "Search documents."

Use enums to constrain parameters

If a parameter has a fixed set of valid values, define it as an enum in the JSON Schema. The model is then far more likely to choose from the specified values than to hallucinate a plausible-but-invalid argument. For a sorting parameter, "enum": ["asc", "desc"] is better than "type": "string" with a description saying "either 'asc' or 'desc'".
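As an illustration, here is a parameters schema with an enum, plus a small server-side check that flags any value outside it. The schema, the `query` and `order` parameter names, and the `enum_violations` helper are all hypothetical.

```python
search_parameters = {
    "type": "object",
    "properties": {
        "query": {"type": "string", "description": "Search text"},
        "order": {
            "type": "string",
            "enum": ["asc", "desc"],
            "description": "Sort direction for the results",
        },
    },
    "required": ["query"],
}

def enum_violations(schema, args):
    """Return the names of arguments whose value is outside the schema's enum."""
    props = schema["properties"]
    return [name for name, value in args.items()
            if "enum" in props.get(name, {}) and value not in props[name]["enum"]]

ok = enum_violations(search_parameters, {"query": "llm", "order": "asc"})
bad = enum_violations(search_parameters, {"query": "llm", "order": "descending"})
```

Keeping a check like this in your own code is cheap insurance even when the schema already declares the enum.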

Mark parameters as required thoughtfully

Required parameters that the model cannot always know — a user ID that requires authentication context the model doesn't have — will cause the model to hallucinate values or fail to call the tool. If a parameter has a sensible default, make it optional. If a parameter truly requires context the model doesn't have, inject it at the orchestration layer rather than expecting the model to supply it.
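Injecting context at the orchestration layer could look like the following sketch. The function `get_order_history`, the `session` dict, and the helper `call_with_context` are hypothetical; the point is only that `user_id` never appears in the schema shown to the model.

```python
def get_order_history(user_id, limit=10):
    # Stub standing in for a real database query.
    return {"user_id": user_id, "orders": [], "limit": limit}

# The schema the model sees: no user_id, and limit is optional with a default.
model_visible_parameters = {
    "type": "object",
    "properties": {"limit": {"type": "integer", "description": "Max orders to return"}},
    "required": [],
}

def call_with_context(fn, model_args, session):
    """Merge model-supplied args with trusted values from the authenticated session."""
    trusted = {"user_id": session["user_id"]}
    # Trusted values are applied last, so they override anything the model sent.
    return fn(**{**model_args, **trusted})

session = {"user_id": "u_123"}  # set by your auth layer, not by the model
result = call_with_context(get_order_history,
                           {"limit": 5, "user_id": "hallucinated"}, session)
```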

Return structured results, not prose

Tool results should be structured data — JSON with named fields — not prose descriptions. A weather function should return {"temperature": 22, "conditions": "cloudy", "humidity": 65} rather than "It is 22 degrees with cloudy conditions and 65% humidity." Structured results give the model unambiguous data to work with and are easier to validate programmatically.

Error handling in tool execution

Tools fail. External APIs return errors, database queries time out, input validation catches bad arguments. How you surface these failures to the model matters — you can either propagate the error as a tool result (letting the model decide how to handle it) or handle it before the model sees it.

The recommended approach is to return error information as a structured tool result rather than raising an exception that aborts the whole conversation. A result like {"error": "rate_limited", "message": "Weather API rate limit exceeded. Try again in 60 seconds."} gives the model enough information to tell the user what happened and suggest alternatives. An uncaught exception that crashes your tool dispatch loop loses all conversation state and provides no recovery path.
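One way to implement this is a wrapper around tool execution that converts exceptions into structured results. In the sketch below, `RateLimitError`, `flaky_weather`, and `safe_execute` are hypothetical names, and the error codes are illustrative rather than a standard.

```python
import json

class RateLimitError(Exception):
    """Hypothetical exception raised by a weather client when throttled."""

def flaky_weather(city):
    # Stub that always fails, to exercise the error path.
    raise RateLimitError("Weather API rate limit exceeded. Try again in 60 seconds.")

def safe_execute(fn, args):
    """Run a tool and always return a JSON string, even when the tool fails."""
    try:
        return json.dumps(fn(**args))
    except RateLimitError as exc:
        return json.dumps({"error": "rate_limited", "message": str(exc)})
    except Exception as exc:
        # Catch-all so one failing tool cannot crash the dispatch loop.
        return json.dumps({"error": "tool_failed", "message": str(exc)})

content = safe_execute(flaky_weather, {"city": "Tokyo"})
```

The returned string goes into the `content` field of the tool message, so the conversation continues and the model can explain the failure.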

Validate before executing

The model's tool call arguments should be treated as untrusted input, not as correct values. Validate all arguments before passing them to your actual function implementation: check that required fields are present and have the expected types, that string values don't exceed expected lengths, that numeric values are within expected ranges, and that any IDs or references actually exist in your system. A hallucinated user ID passed directly to a database query is a reliability bug; passed to a financial system, it is a safety issue.
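The checks listed above can be sketched in plain Python. The tool here is a hypothetical money transfer; `KNOWN_USER_IDS` stands in for a real database lookup, and the length and amount limits are arbitrary values chosen for the example.

```python
KNOWN_USER_IDS = {"u_123", "u_456"}  # hypothetical stand-in for a database lookup

def validate_transfer_args(args):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    user_id = args.get("user_id")
    amount = args.get("amount")

    if not isinstance(user_id, str):
        problems.append("user_id must be a string")
    elif len(user_id) > 64:
        problems.append("user_id is too long")
    elif user_id not in KNOWN_USER_IDS:
        problems.append("user_id does not exist")

    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        problems.append("amount must be a number")
    elif not 0 < amount <= 10_000:
        problems.append("amount must be between 0 and 10000")

    return problems

valid = validate_transfer_args({"user_id": "u_123", "amount": 50})
invalid = validate_transfer_args({"user_id": "u_999", "amount": -5})
```

When the list is non-empty, the problems can be returned to the model as a structured error result instead of executing the call.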

Module 9 builds on function calling to cover the full stack of production AI application development: authentication, rate limit management, caching, error recovery, and the deployment patterns that turn a working prototype into a reliable production service.