Module 1 · 22 min read · Building with AI APIs

API Fundamentals for AI Builders

Every AI application you will ever build rests on the same foundation: the ability to send a request to a remote system and receive a structured response. Understanding that foundation deeply — not just at the surface level of "paste API key, get output" — is what separates builders who can debug, optimize, and scale their applications from those who are permanently confused when things go wrong. This module covers everything from HTTP basics to the full landscape of AI API providers.

What is an API?

An Application Programming Interface is a defined contract between two software systems. One system (the client) makes requests in a specified format; the other system (the server) processes those requests and returns responses in a specified format. The contract defines what requests are valid, what parameters they accept, and what responses look like. Everything outside that contract is the server's internal business — the client doesn't need to know how the computation is performed, only that it returns the right result.

Think of an API as a restaurant menu. The menu defines exactly what you can order and what you will receive. You don't need to know how the kitchen works to order effectively. The chef doesn't need to explain their techniques — they just need to deliver what the menu promises. If you order outside the menu (an invalid API request), you get rejected, not a creative improvisation.

AI APIs follow this same pattern but with a specific focus: they expose the capabilities of large machine learning models over the network, so you can use billion-parameter models without running them yourself. The model computation happens on the provider's servers — you simply send your input and receive the model's output.

How HTTP and REST work

HTTP (HyperText Transfer Protocol) is the language of the web and the foundation of nearly every AI API. When you make an API call, you're sending an HTTP message from your program to a server somewhere. HTTP defines how messages are structured and what verbs describe the type of operation you're performing.

The core HTTP verbs are: GET (retrieve a resource), POST (create or process something), PUT (replace a resource), PATCH (partially update a resource), and DELETE (remove a resource). AI API calls are almost always POST requests — you're sending data to be processed and receiving a result.

REST (Representational State Transfer) is a set of architectural conventions layered on top of HTTP. RESTful APIs use URLs to identify resources, HTTP verbs to describe operations, and HTTP status codes to indicate outcomes. For example, a REST AI API might have a URL like https://api.openai.com/v1/chat/completions — the URL identifies what you're requesting (a chat completion), and POST is the verb used to submit the request body.

The Request-Response Model

Every API interaction follows the same cycle: your program builds a request (method + URL + headers + body), sends it to the server, and waits. The server processes the request and sends back a response (status code + headers + body). Your program reads the response and acts on it. Everything in AI API development is a variation of this cycle.

The anatomy of an HTTP request

An HTTP request has four components that matter for AI API work:

1. The URL (endpoint): Where to send the request. For OpenAI's chat API, this is https://api.openai.com/v1/chat/completions. The URL encodes which service and which specific capability you're calling.

2. The HTTP method: POST for AI completions. This tells the server what kind of operation you're performing.

3. Headers: Metadata attached to the request. For AI APIs, the two most important are the header that carries your API key (usually Authorization) and Content-Type: application/json (telling the server your body is JSON).

4. The body: The actual payload. For AI APIs, this is a JSON object describing your request — the model to use, the messages, and any configuration parameters.
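
The four components above can be assembled with Python's standard library without sending anything over the network. This is a minimal sketch rather than a recommended client: the function name build_chat_request and the placeholder key are assumptions made for illustration, while the URL and body fields are the ones used in this module.

```python
import json
import urllib.request

OPENAI_CHAT_URL = "https://api.openai.com/v1/chat/completions"


def build_chat_request(api_key: str, prompt: str, model: str = "gpt-4o") -> urllib.request.Request:
    """Assemble (but do not send) a chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=OPENAI_CHAT_URL,                    # 1. the endpoint
        method="POST",                          # 2. the HTTP method
        headers={                               # 3. the headers
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        data=json.dumps(body).encode("utf-8"),  # 4. the body, as JSON bytes
    )


# "sk-placeholder" is a made-up value, not a working key.
request = build_chat_request("sk-placeholder", "Explain what an API is in one sentence.")
print(request.get_method(), request.full_url)
# urllib stores header names via str.capitalize(), hence "Content-type" here.
print(request.get_header("Content-type"))
print(request.data.decode("utf-8"))
```

Passing the object to urllib.request.urlopen(request, timeout=30) would actually send it; that step is left out so the example runs without a key or a network connection.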

JSON as the universal language

JSON (JavaScript Object Notation) is the format that virtually every AI API uses for both request bodies and response bodies. It's a human-readable text format that represents structured data as key-value pairs and arrays. You don't need to know JavaScript to use JSON — it's just a data format.

JSON has six data types: strings (text in double quotes), numbers, booleans (true/false), null, objects (key-value pairs in curly braces), and arrays (ordered lists in square brackets). AI API requests and responses are deeply nested JSON objects that combine all of these types.

A minimal OpenAI chat completion request body looks like this:

    {
      "model": "gpt-4o",
      "messages": [
        { "role": "user", "content": "Explain what an API is in one sentence." }
      ],
      "temperature": 0.7,
      "max_tokens": 150
    }

Every key is a string. The value of "messages" is an array of objects. The value of "temperature" is a number. The same building blocks (a model name, a list of messages, and a handful of tuning parameters) appear in most providers' request bodies, though the specific keys differ. Once you understand JSON deeply, learning any AI provider's API becomes much faster.
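
To make the type mapping concrete, here is a small sketch using Python's built-in json module. It serializes the request body from this section and parses it back. The "stream" and "stop" keys in the last line are only there to illustrate booleans and null; treat them as illustrative rather than as a statement about any provider's schema.

```python
import json

request_body = {
    "model": "gpt-4o",
    "messages": [
        {"role": "user", "content": "Explain what an API is in one sentence."}
    ],
    "temperature": 0.7,
    "max_tokens": 150,
}

# Serialize the Python dict to JSON text, which is what travels over the wire.
wire_text = json.dumps(request_body, indent=2)
print(wire_text)

# Parse the text back; each JSON type maps onto a Python type.
parsed = json.loads(wire_text)
print(type(parsed).__name__)                 # JSON object -> dict
print(type(parsed["messages"]).__name__)     # JSON array  -> list
print(type(parsed["model"]).__name__)        # JSON string -> str
print(type(parsed["temperature"]).__name__)  # JSON number -> float
print(type(parsed["max_tokens"]).__name__)   # JSON number -> int

# JSON true/false/null become Python True/False/None.
flags = json.loads('{"stream": false, "stop": null}')
print(flags)
```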

Authentication patterns

AI APIs need to know who is making requests for two reasons: billing (they need to charge you) and rate limiting (they need to enforce fair use). Authentication is how they verify your identity. There are several patterns used across the AI API landscape.

API Keys

The most common pattern. You register for a provider, they issue you a long random string (the API key), and you include it in every request. The server looks up the key in its database to identify you. API keys are typically sent in the Authorization header:

Authorization: Bearer sk-proj-abc123yourApiKeyHere...

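The word "Bearer" comes from the OAuth 2.0 Bearer Token standard (RFC 6750): whoever "bears" the credential is granted its authority, with no further proof of identity. OpenAI and most OpenAI-compatible providers, including Together AI and Groq, use this pattern. Some providers carry the key in a dedicated header instead: Anthropic uses x-api-key, and the Gemini API accepts x-goog-api-key. The idea is the same; only the header name changes.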

OAuth 2.0

A more complex protocol used when applications need to act on behalf of users, not just themselves. OAuth involves a multi-step flow where users grant your application specific permissions. For simple AI API integrations where you're calling the API with your own credentials (not on behalf of end users), you don't need OAuth — API keys suffice.

Never expose API keys in client-side code

API keys are secrets. Anyone who has your API key can make requests billed to your account. Never put API keys in browser-side JavaScript, mobile app binaries, or public repositories. We'll cover secure key management in depth in the next module, but the rule is simple: API keys belong on your server, in environment variables, not anywhere a user could extract them.
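
As a rough sketch of that rule in server-side Python, the key can be read from an environment variable at startup and only ever logged in redacted form. The helper names here (load_api_key, bearer_headers, redact) are hypothetical, and OPENAI_API_KEY is assumed as the variable name by convention; use whatever name your deployment defines.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read an API key from the environment, failing loudly if it is absent."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key


def bearer_headers(api_key: str) -> dict[str, str]:
    """Build the headers for a Bearer-authenticated JSON request."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def redact(api_key: str) -> str:
    """Return a log-safe form showing at most the last four characters."""
    return "***" + api_key[-4:] if len(api_key) > 8 else "***"


# Demo only: a real deployment sets the variable outside the program
# (shell profile, secrets manager, container config), never in source code.
os.environ.setdefault("OPENAI_API_KEY", "sk-placeholder-1234")
key = load_api_key()
print(redact(key))
print(sorted(bearer_headers(key)))
```

Failing at startup when the variable is missing tends to be easier to debug than a 401 response discovered later in a request path.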

Rate limits and quotas

AI providers enforce limits on how much you can use their APIs, for two practical reasons: protecting their infrastructure from abuse and ensuring fair access across customers. Understanding rate limits is essential for building reliable applications.

There are typically two types of limits. Rate limits cap how many requests you can make per unit of time (requests per minute, tokens per minute). If you hit them, the server returns a 429 error and you must wait before retrying. Quotas cap your total usage per billing period — you might have a monthly token limit before you need to upgrade your plan.

Rate limits are almost always expressed in two dimensions simultaneously: Requests Per Minute (RPM) and Tokens Per Minute (TPM). Even if you're well under your RPM limit, if you're sending very long prompts and receiving very long responses, you might hit your TPM limit. Both matter.

Provider Tier         | Typical RPM       | Typical TPM    | Behavior at Limit
Free / Starter        | 3–60 RPM          | 40K–60K TPM    | HTTP 429, Retry-After header
Pay-as-you-go Tier 1  | 500–3,500 RPM     | 90K–200K TPM   | HTTP 429, exponential backoff
Higher Tiers          | 5,000–10,000 RPM  | 800K–2M+ TPM   | Same, limits negotiable
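
Because RPM and TPM are enforced together, a client can track both budgets locally and avoid sending requests that would be rejected anyway. The class below is a hypothetical sketch of a sliding-window check, not any provider's algorithm: the name DualRateLimiter, the 60-second window, and the idea of passing an estimated token count are all assumptions. Providers count tokens on their side, so a local estimate is only approximate.

```python
import time
from collections import deque


class DualRateLimiter:
    """Client-side sliding-window check for both RPM and TPM budgets."""

    def __init__(self, rpm: int, tpm: int, clock=time.monotonic, window: float = 60.0):
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock      # injectable so the demo can use a fake clock
        self.window = window    # window length in seconds
        self.events = deque()   # (timestamp, tokens) for each admitted request

    def _expire(self, now: float) -> None:
        # Drop admitted requests that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()

    def try_acquire(self, tokens: int) -> bool:
        """Admit and record the request if both budgets allow it."""
        now = self.clock()
        self._expire(now)
        if len(self.events) + 1 > self.rpm:
            return False  # would exceed requests per minute
        if sum(t for _, t in self.events) + tokens > self.tpm:
            return False  # would exceed tokens per minute
        self.events.append((now, tokens))
        return True


# Demo with a fake clock so nothing actually waits.
fake_now = [0.0]
limiter = DualRateLimiter(rpm=3, tpm=1000, clock=lambda: fake_now[0])
print(limiter.try_acquire(400))  # admitted
print(limiter.try_acquire(400))  # admitted
print(limiter.try_acquire(400))  # rejected on TPM, although only the third request
fake_now[0] = 61.0
print(limiter.try_acquire(400))  # admitted: earlier requests left the window
```

The third call shows the point made above: the request count is still within the RPM budget, but the token budget is already spent.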

HTTP status codes and error handling

HTTP status codes are 3-digit numbers that summarize the outcome of your request. Understanding them is crucial for handling errors gracefully in production applications.

The major categories: 2xx (success — your request worked), 4xx (client error — something wrong with your request), 5xx (server error — the provider's system had a problem).

200 OK
Your request was successful. The response body contains the model's output. This is the happy path.
400 Bad Request
Your request was malformed — missing a required field, invalid parameter value, or malformed JSON. Fix your request before retrying.
401 Unauthorized
Your API key is missing, invalid, or expired. Check your authentication headers. Don't retry automatically — this won't fix itself.
429 Too Many Requests
You've hit a rate limit. The response usually includes a Retry-After header. Use exponential backoff — wait, then retry with increasing delays.
500 / 503 Server Error
The provider's infrastructure has a problem. These are usually transient. Implement retry logic with backoff for 5xx errors.
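
The retry guidance above can be sketched as a small loop. This is one possible shape under stated assumptions, not a definitive implementation: send is a hypothetical callable returning (status, headers, body), Retry-After is assumed to be given in seconds, and 502 and 504 are added to the retryable set alongside the 500 and 503 named above. The random delay between zero and a capped exponential bound is the variant often called "full jitter".

```python
import random
import time

# Assumption: 502 and 504 are treated like the 500/503 cases described above.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def should_retry(status: int) -> bool:
    """Rate limits and server errors are worth retrying; 400 and 401 are not."""
    return status in RETRYABLE_STATUSES


def backoff_delay(attempt: int, retry_after=None, base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to wait before the next attempt (attempt counts from 0)."""
    if retry_after is not None:
        return float(retry_after)  # honor the server's hint (seconds assumed)
    bound = min(cap, base * (2 ** attempt))
    return random.uniform(0, bound)  # jitter spreads out simultaneous retries


def call_with_retries(send, max_attempts: int = 5, sleep=time.sleep):
    """Call send() until it succeeds, fails permanently, or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if 200 <= status < 300:
            return body
        if not should_retry(status) or attempt == max_attempts - 1:
            raise RuntimeError(f"request failed with HTTP {status}")
        sleep(backoff_delay(attempt, headers.get("Retry-After")))
    raise RuntimeError("max_attempts must be at least 1")


# Demo with canned responses; delays are recorded instead of slept.
responses = iter([
    (429, {"Retry-After": "2"}, None),
    (503, {}, None),
    (200, {}, {"ok": True}),
])
waits = []
result = call_with_retries(lambda: next(responses), sleep=waits.append)
print(result, len(waits))
```

A real HTTP client usually exposes headers case-insensitively; the plain dict used here is case-sensitive, which is fine for a demo but worth remembering when adapting it.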

The AI API landscape

Multiple providers offer powerful language model APIs, each with different strengths, pricing models, and capabilities. Understanding the landscape helps you make informed decisions about which provider to use for which use case — and the multi-provider strategies we'll cover in Module 10 become much more tractable when you understand what each provider offers.

OpenAI

The pioneering provider and market leader. OpenAI's API serves the GPT-4o, o1, and o3 models. Its API design has become the de facto standard that many other providers have adopted or remain compatible with. A strong ecosystem, excellent documentation, and the largest community mean the most available examples and tooling. Its frontier-tier models tend to sit at the expensive end among the major providers, so check current pricing before committing.

Anthropic

Maker of the Claude model family. Claude models are known for strong reasoning, long context windows (up to 200K tokens), and following complex instructions carefully. Anthropic offers the Messages API, which has a slightly different structure from OpenAI's but covers the same functionality. Strong emphasis on safety and predictable behavior makes it popular for enterprise applications.

Google Gemini

Google's AI API offering, based on the Gemini model family. Gemini 1.5 Pro offers extremely long context windows (up to 1M tokens) and strong multimodal capabilities (text, image, video, audio). Available through Google AI Studio (simpler) or Vertex AI (enterprise-grade). Competitive pricing especially for output tokens.

Open-source alternatives via Ollama and Together

Not every use case requires paying a cloud provider. Open-source models — Llama 3, Mistral, Qwen, Phi-3 — have closed much of the capability gap with frontier commercial models and can be run several ways:

  • Ollama: Runs models locally on your machine. Free after hardware cost, great for development and privacy-sensitive use cases. Point your client at localhost and use Ollama's OpenAI-compatible API.
  • Together AI: Hosts open-source models in the cloud with an API. Often cheaper than OpenAI for models of similar capability tier. OpenAI-compatible endpoint makes switching easy.
  • Groq: Custom LPU hardware delivers extremely fast inference for open-source models. Great for latency-sensitive applications.
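
Because these options expose OpenAI-compatible endpoints, switching between them can come down to changing a base URL. The mapping below is a sketch under assumptions: the base URLs reflect each project's documented OpenAI-compatible endpoint as best understood at the time of writing (Ollama's assumes its default local port), so verify them against current provider documentation. The helper name chat_completions_url is hypothetical.

```python
# Assumed OpenAI-compatible base URLs; confirm against each provider's docs.
BASE_URLS = {
    "openai": "https://api.openai.com/v1",
    "ollama": "http://localhost:11434/v1",      # default local Ollama port
    "together": "https://api.together.xyz/v1",
    "groq": "https://api.groq.com/openai/v1",
}


def chat_completions_url(provider: str) -> str:
    """Return the chat completions endpoint for a known provider."""
    try:
        base = BASE_URLS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return base.rstrip("/") + "/chat/completions"


for name in BASE_URLS:
    print(f"{name:9s} {chat_completions_url(name)}")
```

Model names still differ between providers, so a real switch changes the "model" field in the request body as well as the URL.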

Choosing a starting provider

For learning and prototyping, start with OpenAI or Anthropic — their documentation is excellent and the community support is largest. Once you understand the patterns, migrating to alternative providers or open-source models is straightforward. This course focuses on concepts that apply across providers, using OpenAI examples for familiarity.

The request-response cycle in detail

Let's trace what actually happens when you make an AI API call, step by step. This mental model will serve you whenever you're debugging unexpected behavior.

Step 1: Your code builds a request. It assembles the URL, headers (including your API key), and a JSON body describing your parameters — model name, messages, temperature, max tokens.

Step 2: Your HTTP client sends the request. Under the hood, it opens a TCP connection to the server, sends the HTTP request over TLS (encrypted), and waits for a response.

Step 3: The API server authenticates you. It checks your API key against its database, verifies you're within your rate limits, and validates the request body against its schema.

Step 4: The model generates a response. The server routes your request to the model inference infrastructure. The model processes your input and generates output tokens one at a time. For non-streaming requests, the server accumulates all tokens before responding.

Step 5: The server sends the response. A JSON object arrives with the completion text, token usage counts, finish reason, and other metadata.

Step 6: Your code parses and uses the response. You extract the text from the nested JSON, check the finish reason, and use the output in your application.
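
Step 6 is where many first bugs appear, so here is a sketch of it in isolation. The sample_response below is hand-written to follow the general shape of an OpenAI chat completion response; its id, text, and token counts are invented for illustration. The function name parse_completion is hypothetical.

```python
import json

# Hand-written sample; the values are illustrative, not real API output.
sample_response = """
{
  "id": "chatcmpl-example",
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "An API is a contract that lets programs talk to each other."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 17, "completion_tokens": 13, "total_tokens": 30}
}
"""


def parse_completion(raw: str) -> dict:
    """Pull the fields an application usually needs out of the nested JSON."""
    data = json.loads(raw)
    choice = data["choices"][0]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        # "length" means the output stopped because it hit the token limit.
        "truncated": choice["finish_reason"] == "length",
        "total_tokens": data["usage"]["total_tokens"],
    }


summary = parse_completion(sample_response)
print(summary["text"])
print(summary["finish_reason"], summary["total_tokens"])
```

Checking the finish reason before using the text helps catch answers that were cut off partway through.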

Latency characteristics

AI API calls are slow compared to most web API calls — typically 1-10 seconds for non-streaming completions, depending on output length. This is dominated by model inference time, not network latency. Plan your UX accordingly: loading states, streaming (Module 6), and async processing are all important tools for managing this inherent latency.
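
When several independent calls are needed, running them concurrently hides some of this latency, since each call mostly waits on the server. The sketch below uses a thread pool with a stand-in function: fake_completion is hypothetical and sleeps for 0.2 seconds instead of touching the network, so the timings only illustrate the shape of the effect.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_completion(prompt: str) -> str:
    """Stand-in for a slow API call; sleeps instead of using the network."""
    time.sleep(0.2)
    return f"answer to: {prompt}"


prompts = ["summarize A", "summarize B", "summarize C", "summarize D"]

# One call after another: total time is roughly the sum of the calls.
start = time.perf_counter()
sequential = [fake_completion(p) for p in prompts]
sequential_s = time.perf_counter() - start

# Overlapping calls: total time is closer to the slowest single call.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_results = list(pool.map(fake_completion, prompts))  # keeps input order
parallel_s = time.perf_counter() - start

print(f"sequential: {sequential_s:.2f}s, concurrent: {parallel_s:.2f}s")
```

Concurrent calls still count against rate limits, so the worker count should stay within your RPM and TPM budgets.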

Versioning and API stability

AI APIs evolve rapidly. Providers deprecate old models, introduce new ones, and occasionally change API schemas. The "v1" in most AI API URLs (e.g., /v1/chat/completions) is a versioning mechanism — it promises that requests to this version of the API will continue to work even as the provider introduces changes in future versions.

However, model behavior is not as stable as the API schema. Even with the same model name and parameters, model responses can change when providers update a model. Build your applications to handle variation in AI outputs rather than expecting deterministic consistency. Test regularly against production prompts and monitor output quality over time.

Best practices for API versioning: pin specific model versions where output consistency matters (e.g., gpt-4o-2024-08-06 rather than just gpt-4o), read provider changelogs and deprecation notices, and build your integration layer to make model switching easy.
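
One way to follow these practices is to keep model names in a single mapping rather than scattering them through the code. This is a sketch of that idea only: the role names "chat" and "draft" and the helpers resolve_model and build_body are hypothetical, while the two model identifiers are the ones mentioned above.

```python
# Single place where model choices live; application code refers to roles.
MODEL_PINS = {
    "chat": "gpt-4o-2024-08-06",  # dated snapshot, pinned for consistent output
    "draft": "gpt-4o",            # floating alias, may change when updated
}


def resolve_model(role: str) -> str:
    """Look up the model configured for a role."""
    try:
        return MODEL_PINS[role]
    except KeyError:
        raise ValueError(f"no model configured for role {role!r}") from None


def build_body(role: str, prompt: str) -> dict:
    """Build a request body without hard-coding a model name."""
    return {
        "model": resolve_model(role),
        "messages": [{"role": "user", "content": prompt}],
    }


print(build_body("chat", "Explain what an API is in one sentence.")["model"])
```

When a provider deprecates a snapshot, the change is then a one-line edit to the mapping, followed by a re-run of your prompt tests.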

API keys across providers: a practical overview

Each provider issues API keys through their developer portal. Here's how to get started with the main providers:

  • OpenAI: platform.openai.com → API keys section. Keys start with sk-
  • Anthropic: console.anthropic.com → API keys. Keys start with sk-ant-
  • Google Gemini: aistudio.google.com → "Get API key". For Vertex AI, use service account credentials.
  • Together AI: api.together.xyz → Settings → API Keys
  • Groq: console.groq.com → API Keys

You can and should get API keys from multiple providers. As you'll see in Module 10, a multi-provider strategy offers resilience, cost optimization, and model capability diversity. Getting keys now (most providers have free tiers) lets you follow along with examples across the course.

Mental model

Think of AI API providers as different cloud services for compute. Just as you might use AWS for some infrastructure and GCP for others based on pricing and features, you'll likely use different AI providers for different tasks. The good news: the underlying protocol (HTTP + JSON) and the conceptual model (request messages, receive completion) are the same across providers. Learning one deeply makes all others immediately accessible.

What you need to know before Module 2

With the foundation in place, here's the conceptual checklist you should feel confident about:

  • An API is a contract between client and server, defining valid requests and responses
  • AI APIs use HTTP POST requests with JSON bodies to send prompts and receive completions
  • Authentication is handled via API keys, most commonly sent in the Authorization header as Bearer tokens
  • Rate limits (RPM and TPM) are enforced by all providers — 429 errors require backoff and retry
  • HTTP status codes summarize the outcome: 2xx = success, 4xx = your fault, 5xx = their fault
  • The major providers are OpenAI, Anthropic, Google Gemini, and open-source options via Ollama/Together

Up next: writing your first real API call

In Module 2, you'll go from theory to working code. You'll set up your Python environment, securely manage your API key, and make your first real completion request — seeing exactly what the response object looks like and how to use it.