Making Your First API Call
Theory only takes you so far. This module is about getting your hands on a working AI API integration — real code that runs, sends a request to an AI model, and gives you a response you can use. By the end you will have a pattern you can adapt for nearly any AI integration, and you will understand every line of it well enough to debug and extend it.
Setting up your Python environment
Python is the dominant language for AI API work. The openai SDK wraps the raw HTTP calls so you write clean Python instead of constructing request dictionaries manually. Understanding the underlying HTTP is valuable when things go wrong or when you are working with a provider that does not have an SDK, but for day-to-day work the SDK is the right tool.
Virtual environments
Always work in a virtual environment to isolate dependencies per project. This prevents version conflicts between projects and makes your requirements file reliable. Use Python 3.9 or higher.
API key security: environment variables
Your API key is a secret credential that grants billing authority over your account. The rule is simple: never hardcode it in source files, never commit it to version control, never expose it in client-side code. The correct pattern for development is environment variables — storing the key in your operating system's environment or a local file that is excluded from source control.
Using a .env file
Create a file named .env in your project root:
Immediately add .env to your .gitignore. Do this before your first commit:
Thousands of API keys are leaked to GitHub every day. Automated bots scan new commits looking for API key patterns. If your key is committed even once, it should be considered compromised — rotate it immediately. The .gitignore line before your first commit is the safest habit you can build.
For production deployments: Use your hosting platform's secrets management — AWS Secrets Manager, Railway environment variables, Vercel environment variables, Heroku config vars. Never ship .env files inside containers or commit them to repositories.
Your first completion: the minimal example
Here is the simplest possible working OpenAI API call. Every line is intentional — nothing is boilerplate you can safely ignore:
Run this with python app.py and you will see the model's response in your terminal. Let us understand every part.
The OpenAI client
OpenAI() creates a client instance. With no arguments, it reads your API key from the OPENAI_API_KEY environment variable — which load_dotenv() has loaded from your .env file. The client handles HTTP connection pooling, retry logic on network errors, and request formatting.
The model parameter
gpt-4o-mini is OpenAI's efficient, lower-cost model — excellent for development and prototyping. It costs roughly $0.15 per million input tokens and $0.60 per million output tokens. Use gpt-4o for tasks needing top capability at higher cost. The model name is just a string — switching providers means changing this string and the client initialization.
The messages array
The messages parameter is a list of message objects, each with a role and content. For a simple single-turn question, you need one message with role: "user". We will explore the full role system extensively in Module 3.
Reading the full response object
The response contains far more than just the text. Understanding its full structure helps you use all available information and handle edge cases properly.
The finish_reason field
This field tells you why the model stopped generating. It has three important values:
- "stop" — The model finished naturally. This is what you want.
- "length" — The response was cut off because it reached
max_tokens. The output may be incomplete. Increasemax_tokensor use chunking. - "content_filter" — The response was blocked by the provider's content policy. The content field may be null or partial.
Always check finish_reason in production code. A response that looks complete might actually be truncated if finish_reason is "length".
Understanding tokens in practice
The token usage fields in every response are your direct window into cost. AI models do not process text character by character or word by word — they process tokens, which are chunks of text roughly corresponding to 3-4 characters or about 0.75 words in English.
Pricing is always quoted per million tokens. The total cost of any API call is:
For gpt-4o-mini at $0.15/M input and $0.60/M output, a call with 100 prompt tokens and 200 completion tokens costs:
This seems tiny — and individual calls are cheap — but these costs compound with scale. A system making 10,000 calls per day with 500 average total tokens will cost roughly $4.50/day on gpt-4o-mini, or $90/day on gpt-4o. Building cost awareness into your code from the start is a professional habit. Module 4 covers cost optimization in depth.
Making a call with the raw requests library
While the SDK is the right choice for production code, understanding the raw HTTP call demystifies what the SDK is doing and is valuable when troubleshooting or working with providers without an SDK:
Compare this to the SDK version: the SDK handles constructing the URL, setting the Authorization header, serializing the payload as JSON, and parsing the response — but it is just HTTP POST with JSON. This is exactly what every AI API is under the hood.
Your first working integration: a simple Q&A helper
Let us put this together into a simple interactive program that reads questions from the user in a loop and gets AI answers. This is a real, usable tool:
Notice several things this code demonstrates correctly:
- The
ask()function is a clean abstraction — the calling code does not need to know about response objects or finish reasons - A system message is included to set the assistant's behavior (more on this in Module 3)
- We check
finish_reasonand warn the user if the response was truncated - Exceptions are caught so the program does not crash on API errors
- The loop allows multiple questions without restarting the program
Handling errors gracefully
Production code must handle API errors without crashing. The OpenAI SDK raises typed exceptions that you can catch and handle appropriately:
This pattern — retry on transient errors, fail fast on client errors — is the foundation of production reliability. Module 9 covers full production hardening including circuit breakers and observability.
Working with the Anthropic SDK
Because the course covers multiple providers, here is the same basic call using Anthropic's Claude API. The conceptual model is identical but the SDK and schema differ slightly:
Key differences from OpenAI: Anthropic uses max_tokens as a required field (not optional). The response is accessed via message.content[0].text rather than response.choices[0].message.content. The system prompt is a separate parameter (system="...") rather than a messages entry. These are the kinds of provider-specific differences that Module 10's abstraction strategies address.
With a working API call, environment variable security, response parsing, and basic error handling, you have the complete foundation for an AI integration. Every more advanced concept in this course builds on exactly what you have learned here. The next module dives into the messages system — where the real power of chat-based APIs lives.