Module 2 · 25 min read · Building with AI APIs

Making Your First API Call

Theory only takes you so far. This module is about getting your hands on a working AI API integration — real code that runs, sends a request to an AI model, and gives you a response you can use. By the end you will have a pattern you can adapt for nearly any AI integration, and you will understand every line of it well enough to debug and extend it.

Setting up your Python environment

Python is the dominant language for AI API work. The openai SDK wraps the raw HTTP calls so you write clean Python instead of constructing request dictionaries manually. Understanding the underlying HTTP is valuable when things go wrong or when you are working with a provider that does not have an SDK, but for day-to-day work the SDK is the right tool.

Virtual environments

Always work in a virtual environment to isolate dependencies per project. This prevents version conflicts between projects and makes your requirements file reliable. Use Python 3.9 or higher.

mkdir my-ai-project
cd my-ai-project
python3 -m venv venv

# Activate on macOS/Linux
source venv/bin/activate

# Activate on Windows
venv\Scripts\activate

# Install the SDK and dotenv for environment variable management
pip install openai python-dotenv
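
Once the environment is activated, it can be worth confirming from inside Python that you are on a supported interpreter and actually inside the virtual environment. The sketch below is one way to do that with only the standard library. The helper names python_is_supported and in_virtualenv are illustrative and not part of any SDK.

```python
import sys

MIN_VERSION = (3, 9)  # minimum version recommended in this module


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the interpreter meets the minimum version."""
    return tuple(version_info[:2]) >= MIN_VERSION


def in_virtualenv() -> bool:
    """Return True when running inside an environment made by `python -m venv`.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("In virtualenv:", in_virtualenv())
```

If the second line prints False after activation, the shell is most likely still using the system interpreter.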

API key security: environment variables

Your API key is a secret credential that grants billing authority over your account. The rule is simple: never hardcode it in source files, never commit it to version control, never expose it in client-side code. The correct pattern for development is environment variables — storing the key in your operating system's environment or a local file that is excluded from source control.

Using a .env file

Create a file named .env in your project root:

OPENAI_API_KEY=sk-proj-your-actual-key-here

Immediately add .env to your .gitignore. Do this before your first commit:

echo ".env" >> .gitignore

The .gitignore step is not optional

Thousands of API keys are leaked to GitHub every day. Automated bots scan new commits looking for API key patterns. If your key is committed even once, it should be considered compromised — rotate it immediately. The .gitignore line before your first commit is the safest habit you can build.

For production deployments: Use your hosting platform's secrets management — AWS Secrets Manager, Railway environment variables, Vercel environment variables, Heroku config vars. Never ship .env files inside containers or commit them to repositories.
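
Whether the key comes from a local .env file or a platform's secrets manager, your code reads it from the environment in the same way. The following is a minimal sketch of a fail-fast check, assuming you want a clear error at startup rather than a confusing authentication failure later. The names require_env and mask_secret are hypothetical helpers, not part of the openai or python-dotenv packages.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or to your "
            "platform's secrets configuration."
        )
    return value


def mask_secret(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the value is safe to log."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Calling require_env("OPENAI_API_KEY") right after load_dotenv() surfaces a missing key immediately, and mask_secret lets you confirm which key is loaded without printing it in full.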

Your first completion: the minimal example

Here is the simplest possible working OpenAI API call. Every line is intentional — nothing is boilerplate you can safely ignore:

from openai import OpenAI
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Initialize the client — reads OPENAI_API_KEY from the environment automatically
client = OpenAI()

# Make the API call
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

# Extract and print the response text
print(response.choices[0].message.content)

Save this as app.py, run it with python app.py, and you will see the model's response in your terminal. Let us understand every part.

The OpenAI client

OpenAI() creates a client instance. With no arguments, it reads your API key from the OPENAI_API_KEY environment variable — which load_dotenv() has loaded from your .env file. The client handles HTTP connection pooling, retry logic on network errors, and request formatting.

The model parameter

gpt-4o-mini is OpenAI's efficient, lower-cost model — excellent for development and prototyping. It costs roughly $0.15 per million input tokens and $0.60 per million output tokens. Use gpt-4o for tasks needing top capability at higher cost. The model name is just a string — switching providers means changing this string and the client initialization.

The messages array

The messages parameter is a list of message objects, each with a role and content. For a simple single-turn question, you need one message with role: "user". We will explore the full role system extensively in Module 3.
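
Because the messages list is plain Python data, it can be built and inspected before any request is sent. Here is a small sketch of a builder for the single-turn case. The function name build_messages is an illustration, and the optional system entry anticipates the role used in the Q&A helper later in this module.

```python
from typing import Dict, List, Optional


def build_messages(question: str, system: Optional[str] = None) -> List[Dict[str, str]]:
    """Build a messages list for a single-turn chat completion request."""
    messages: List[Dict[str, str]] = []
    if system:
        # An optional system message goes first and sets overall behavior
        messages.append({"role": "system", "content": system})
    # The user's question is always the final entry for a single-turn call
    messages.append({"role": "user", "content": question})
    return messages


if __name__ == "__main__":
    print(build_messages("What is the capital of France?"))
```

The returned list can be passed directly as the messages argument of client.chat.completions.create.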

Reading the full response object

The response contains far more than just the text. Understanding its full structure helps you use all available information and handle edge cases properly.

from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Explain recursion in one paragraph."}
    ],
    max_tokens=200
)

# Response metadata
print("ID:", response.id)           # e.g. "chatcmpl-abc123"
print("Model:", response.model)     # exact version served, e.g. "gpt-4o-mini-2024-07-18"
print("Created:", response.created) # Unix timestamp

# The main content
choice = response.choices[0]
print("Finish reason:", choice.finish_reason)  # "stop", "length", or "content_filter"
print("Role:", choice.message.role)            # always "assistant"
print("Content:", choice.message.content)

# Token usage — critical for cost tracking
print("Prompt tokens:", response.usage.prompt_tokens)
print("Completion tokens:", response.usage.completion_tokens)
print("Total tokens:", response.usage.total_tokens)

The finish_reason field

This field tells you why the model stopped generating. Three values matter most for plain text completions (requests that use tool calling can also return "tool_calls"):

"stop" — The model finished naturally. This is what you want.
"length" — The response was cut off because it reached max_tokens. The output may be incomplete. Increase max_tokens or use chunking.
"content_filter" — The response was blocked by the provider's content policy. The content field may be null or partial.

Always check finish_reason in production code. A response that looks complete might actually be truncated if finish_reason is "length".
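
One way to make that check routine is to wrap it in a helper that every call site uses. The sketch below assumes the response has the attribute layout shown above (choices[0].finish_reason and choices[0].message.content). The names read_completion and fake_response are hypothetical, and fake_response exists only so the example runs without a network call.

```python
from types import SimpleNamespace
from typing import Tuple


def read_completion(response) -> Tuple[str, bool]:
    """Return (text, truncated) for the first choice, or raise if it was filtered."""
    choice = response.choices[0]
    if choice.finish_reason == "content_filter":
        raise RuntimeError("Response was blocked by the provider's content filter")
    # content can be None in some cases, so fall back to an empty string
    text = choice.message.content or ""
    truncated = choice.finish_reason == "length"
    return text, truncated


def fake_response(content, finish_reason):
    """Offline stand-in that mimics the attribute layout of an SDK response."""
    message = SimpleNamespace(role="assistant", content=content)
    choice = SimpleNamespace(message=message, finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])


if __name__ == "__main__":
    text, truncated = read_completion(fake_response("Recursion is...", "length"))
    print(text, "(truncated)" if truncated else "")
```

Returning the truncated flag instead of printing a warning leaves the decision (retry with a higher max_tokens, show a notice, or accept the partial text) to the caller.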

Understanding tokens in practice

The token usage fields in every response are your direct window into cost. AI models do not process text character by character or word by word — they process tokens, which are chunks of text roughly corresponding to 3-4 characters or about 0.75 words in English.

Pricing is always quoted per million tokens. The total cost of any API call is:

cost = (prompt_tokens / 1_000_000 * input_price) + (completion_tokens / 1_000_000 * output_price)

For gpt-4o-mini at $0.15/M input and $0.60/M output, a call with 100 prompt tokens and 200 completion tokens costs:

cost = (100 / 1_000_000 * 0.15) + (200 / 1_000_000 * 0.60)
     = $0.000015 + $0.000120
     = $0.000135 (about 0.014 cents)

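This seems tiny, and individual calls are cheap, but these costs compound with scale. A system making 10,000 calls per day at an average of 500 total tokens per call processes 5 million tokens per day. At the gpt-4o-mini prices above, that costs between $0.75/day (if every token were input) and $3.00/day (if every token were output), or about $2.25/day with the same 1:2 input-to-output split as the example. A larger model such as gpt-4o is priced considerably higher per token, so the same traffic costs proportionally more. Building cost awareness into your code from the start is a professional habit. Module 4 covers cost optimization in depth.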
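
The cost formula above translates directly into a small helper that can sit next to every API call. This is a sketch under the assumption that you pass in current prices yourself. The function name estimate_cost is illustrative, and the prices are the gpt-4o-mini figures quoted in this module, which may change.

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Return the cost in dollars of one call, given per-million-token prices."""
    input_cost = prompt_tokens / 1_000_000 * input_price_per_m
    output_cost = completion_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


if __name__ == "__main__":
    # Prices quoted in this module for gpt-4o-mini; check current pricing before relying on them
    cost = estimate_cost(100, 200, input_price_per_m=0.15, output_price_per_m=0.60)
    print(f"${cost:.6f}")  # prints $0.000135
```

In real code the two token counts would come from response.usage.prompt_tokens and response.usage.completion_tokens.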

Making a call with the raw requests library

While the SDK is the right choice for production code, understanding the raw HTTP call demystifies what the SDK is doing and is valuable when troubleshooting or working with providers without an SDK:

import os
import requests
from dotenv import load_dotenv

load_dotenv()

url = "https://api.openai.com/v1/chat/completions"

headers = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "Content-Type": "application/json"
}

payload = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "user", "content": "Name three programming languages."}
    ],
    "max_tokens": 100
}

# A timeout keeps the call from hanging indefinitely if the server never responds
response = requests.post(url, headers=headers, json=payload, timeout=30)

if response.status_code == 200:
    data = response.json()
    print(data["choices"][0]["message"]["content"])
else:
    print(f"Error {response.status_code}: {response.text}")

Compare this to the SDK version: the SDK handles constructing the URL, setting the Authorization header, serializing the payload as JSON, and parsing the response — but it is just HTTP POST with JSON. This is exactly what every AI API is under the hood.
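
To make the "just HTTP POST with JSON" point concrete, the three pieces of the request can be assembled and inspected without sending anything. The sketch below uses only the standard library. The function name build_chat_request is hypothetical, and the URL, headers, and payload fields are the ones used in the raw example above.

```python
import json
from typing import Dict, Tuple

API_URL = "https://api.openai.com/v1/chat/completions"


def build_chat_request(
    api_key: str, model: str, prompt: str, max_tokens: int = 100
) -> Tuple[str, Dict[str, str], bytes]:
    """Return (url, headers, body) for a chat completion request without sending it."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    # The body on the wire is simply the payload serialized as UTF-8 JSON
    body = json.dumps(payload).encode("utf-8")
    return API_URL, headers, body


if __name__ == "__main__":
    url, headers, body = build_chat_request("sk-demo", "gpt-4o-mini", "Name three programming languages.")
    print("POST", url)
    print(body.decode("utf-8"))
```

Printing the body like this is a quick way to see exactly what a provider receives when a request behaves unexpectedly.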

Your first working integration: a simple Q&A helper

Let us put this together into a simple interactive program that reads questions from the user in a loop and gets AI answers. This is a real, usable tool:

from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()
client = OpenAI()

def ask(question: str) -> str:
    """Send a question to GPT-4o-mini and return the answer text."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "You are a helpful assistant. Give concise, accurate answers."
            },
            {
                "role": "user",
                "content": question
            }
        ],
        max_tokens=500,
        temperature=0.7
    )

    # Check for truncation
    if response.choices[0].finish_reason == "length":
        print("[Warning: response may be truncated]")

    return response.choices[0].message.content

def main():
    print("AI Q&A Helper (type 'quit' to exit)")
    print("-" * 40)

    while True:
        question = input("\nYour question: ").strip()
        if question.lower() in ("quit", "exit", "q"):
            break
        if not question:
            continue

        try:
            answer = ask(question)
            print(f"\nAnswer: {answer}")
        except Exception as e:
            print(f"Error: {e}")

if __name__ == "__main__":
    main()

Notice several things this code demonstrates correctly:

The ask() function is a clean abstraction — the calling code does not need to know about response objects or finish reasons
A system message is included to set the assistant's behavior (more on this in Module 3)
We check finish_reason and warn the user if the response was truncated
Exceptions are caught so the program does not crash on API errors
The loop allows multiple questions without restarting the program

Handling errors gracefully

Production code must handle API errors without crashing. The OpenAI SDK raises typed exceptions that you can catch and handle appropriately:

from openai import OpenAI, RateLimitError, APIStatusError, APIConnectionError
from dotenv import load_dotenv
import time

load_dotenv()
client = OpenAI()

def call_with_retry(messages: list, max_retries: int = 3) -> str:
    """Call the API with simple exponential backoff on rate limit errors."""
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=messages,
                max_tokens=500
            )
            return response.choices[0].message.content

        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            wait = 2 ** attempt  # 1s, 2s, ... (doubles on each attempt)
            print(f"Rate limited. Waiting {wait}s before retry {attempt + 1}/{max_retries}")
            time.sleep(wait)

        except APIConnectionError as e:
            # Re-raise on the final attempt so the original error is not lost
            if attempt == max_retries - 1:
                raise
            print(f"Connection error: {e}. Retrying...")
            time.sleep(1)

        except APIStatusError as e:
            # 4xx errors are usually your fault — don't retry
            if 400 <= e.status_code < 500:
                raise
            # 5xx errors are server-side — retry with backoff
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)

    raise RuntimeError("Max retries exceeded")

This pattern — retry on transient errors, fail fast on client errors — is the foundation of production reliability. Module 9 covers full production hardening including circuit breakers and observability.
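
The same retry-or-fail-fast idea can be written independently of any one SDK. The sketch below is a provider-agnostic variant, not the module's own implementation: TransientError is a hypothetical marker exception you would raise for rate limits, 5xx responses, and network failures, and the random jitter is an addition that the example above does not use. The sleep parameter exists so the delay can be replaced in tests.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (rate limits, 5xx, network)."""


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying TransientError with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            # Double the delay on each attempt and add up to 25% random jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.25))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Any exception other than TransientError propagates immediately, which matches the fail-fast treatment of 4xx errors described above.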

Working with the Anthropic SDK

Because the course covers multiple providers, here is the same basic call using Anthropic's Claude API. Install the SDK with pip install anthropic and add ANTHROPIC_API_KEY to your .env file first. The conceptual model is identical, but the SDK and schema differ slightly:

import anthropic
from dotenv import load_dotenv

load_dotenv()

# Reads ANTHROPIC_API_KEY from environment
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-5-haiku-20241022",  # Fast, affordable Claude model
    max_tokens=500,
    messages=[
        {"role": "user", "content": "What is recursion?"}
    ]
)

# Anthropic's response structure differs slightly from OpenAI
print(message.content[0].text)

# Token usage
print(f"Input tokens: {message.usage.input_tokens}")
print(f"Output tokens: {message.usage.output_tokens}")

Key differences from OpenAI: Anthropic uses max_tokens as a required field (not optional). The response is accessed via message.content[0].text rather than response.choices[0].message.content. The system prompt is a separate parameter (system="...") rather than a messages entry. These are the kinds of provider-specific differences that Module 10's abstraction strategies address.
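
The system prompt difference is the one most likely to trip up code that was written for OpenAI first. As a rough sketch of how the two shapes relate, the helper below converts an OpenAI-style messages list into keyword arguments for Anthropic's messages.create. The function name to_anthropic_kwargs is hypothetical, and joining multiple system messages with blank lines is an assumption made for this illustration.

```python
from typing import Any, Dict, List


def to_anthropic_kwargs(
    messages: List[Dict[str, str]], model: str, max_tokens: int = 500
) -> Dict[str, Any]:
    """Convert OpenAI-style messages into keyword arguments for messages.create."""
    # Anthropic takes the system prompt as a separate top-level parameter
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    chat = [m for m in messages if m["role"] != "system"]
    # max_tokens is always included because Anthropic requires it
    kwargs: Dict[str, Any] = {"model": model, "max_tokens": max_tokens, "messages": chat}
    if system_parts:
        kwargs["system"] = "\n\n".join(system_parts)
    return kwargs


if __name__ == "__main__":
    openai_style = [
        {"role": "system", "content": "Give concise answers."},
        {"role": "user", "content": "What is recursion?"},
    ]
    print(to_anthropic_kwargs(openai_style, model="claude-3-5-haiku-20241022"))
```

With the Anthropic client from the example above, the result could be passed as client.messages.create(**to_anthropic_kwargs(openai_style, model="claude-3-5-haiku-20241022")).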

You have everything you need to start building

With a working API call, environment variable security, response parsing, and basic error handling, you have the complete foundation for an AI integration. Every more advanced concept in this course builds on exactly what you have learned here. The next module dives into the messages system — where the real power of chat-based APIs lives.