
Working with LLM APIs

University of Central Florida
Arete Capital Partners

CAP-6640: Computational Understanding of Natural Language

Spencer Lyon

Prerequisites

Outcomes

References


Why APIs?

In Part 02, we explored how foundation models are customized through fine-tuning and alignment. But here’s a practical question: how do you actually use these models?

You can’t download GPT-5.4 — it’s a closed model with hundreds of billions of parameters running on OpenAI’s infrastructure. Same for Claude Opus 4.6 and Gemini 3.1 Pro. The only way to interact with these frontier models is through an API (Application Programming Interface): you send a request over the internet, the provider runs inference on their hardware, and you get a response back.

This turns out to be incredibly powerful. Instead of needing a GPU cluster to run a model, you need a few lines of Python and an API key. The trade-off is clear: you give up control over the model in exchange for instant access to the most capable systems in the world.

But there’s a catch — every provider has its own SDK, its own authentication scheme, its own request format. OpenAI uses one library, Anthropic another, Google yet another. In this lecture, we’ll solve that problem with two tools: LiteLLM as a unified API gateway and PydanticAI as our type-safe Python framework for talking to any model through a single interface.


The API Gateway Pattern

The Problem: SDK Sprawl

If you wanted to compare responses from GPT-5.4, Claude Sonnet 4.6, and Gemini 3.1 Pro, you’d normally need to:

  1. Install three different Python packages (openai, anthropic, google-genai)

  2. Manage three different API keys

  3. Learn three different request/response formats

  4. Handle three different error types

That’s a lot of friction just to ask a question.

The Solution: LiteLLM Proxy

LiteLLM is an API gateway that sits between your code and the LLM providers. It exposes a single, OpenAI-compatible endpoint — meaning any tool that can talk to OpenAI can automatically talk to Claude, Gemini, or dozens of other providers. The proxy handles the translation.
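Concretely, “OpenAI-compatible” means the proxy accepts the same chat-completions request shape that OpenAI’s API uses. A minimal sketch of that request body (the field values here are illustrative, not a request you need to send by hand — the SDKs below build this for you):

```python
import json

# Sketch of an OpenAI-compatible chat-completions request body.
# Any model the proxy knows about goes in the "model" field; the
# gateway translates this shape into the provider's native format.
request_body = {
    "model": "claude-haiku-4-5",  # routed to Anthropic by the proxy
    "messages": [
        {"role": "system", "content": "You are a concise NLP tutor."},
        {"role": "user", "content": "What is tokenization?"},
    ],
}

payload = json.dumps(request_body, indent=2)
print(payload)
```

Because every provider is exposed through this one shape, swapping models is a one-string change rather than a new SDK.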

Figure 1: The API gateway pattern: your code talks to one endpoint, and the gateway routes requests to the appropriate provider. Students authenticate with personal API keys; the gateway manages the actual provider credentials.

For this course, we’ve set up a shared LiteLLM proxy with three models available:

| Model Name | Provider | Capability Tier |
| --- | --- | --- |
| gpt-5.4 | OpenAI | Frontier |
| claude-sonnet-4-6 | Anthropic | Frontier |
| claude-haiku-4-5 | Anthropic | Fast & affordable |

Each of you has a personal API key that gives you access to all three models through a single URL. The proxy tracks your usage and enforces per-student budgets — so experiment freely, but be mindful of cost.


PydanticAI: Your LLM Framework

What Is PydanticAI?

PydanticAI is a Python framework for building LLM-powered applications, created by the team behind Pydantic (the data validation library you may know from FastAPI). Its key selling points: a single interface that works across model providers, type-safe structured outputs validated by Pydantic, and minimal boilerplate for common patterns like system prompts and usage tracking.

Think of it as the “FastAPI of LLM development” — it handles the boilerplate so you can focus on what matters.

Connecting to Our Proxy

Let’s set up our connection. You’ll need your API key — create a file called .env in the project root (or the same directory as your notebook) with:

CAP6640_API_KEY="sk-your-personal-key-here"

The setup code below uses python-dotenv to load this file automatically, so you don’t need to set environment variables manually.

import os
from dotenv import load_dotenv
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider

# Load API key from .env file
load_dotenv()

# Course LiteLLM proxy — one URL for all models
PROXY_URL = "https://litellm.6640.ucf.spencerlyon.com"

def get_model(model_name: str) -> OpenAIChatModel:
    """Create a model connection through our LiteLLM proxy."""
    return OpenAIChatModel(
        model_name,
        provider=OpenAIProvider(
            base_url=PROXY_URL,
            api_key=os.environ["CAP6640_API_KEY"],
        ),
    )

That’s the entire setup. The get_model function creates a connection to any model available on our proxy. Let’s use it.

Your First API Call

# Create an agent with GPT-5.4
agent = Agent(
    get_model("gpt-5.4"),
    instructions="You are a concise NLP tutor. Answer in 2-3 sentences.",
)

result = await agent.run("What is tokenization in NLP?")
print(result.output)
Tokenization is the process of splitting text into smaller units called tokens, such as words, subwords, or characters. It helps NLP models turn raw text into pieces they can analyze, count, or convert into numerical representations.

Let’s unpack what happened:

  1. Agent is the core PydanticAI abstraction — it wraps a model with configuration (like a system prompt)

  2. instructions provides directives that shape the model’s behavior. It is PydanticAI’s recommended alternative to system_prompt: instructions are excluded from message history between runs, while system_prompt is preserved.

  3. await agent.run(...) sends the user message to the model and waits for a response. We use await because PydanticAI’s API is asynchronous.

  4. result.output contains the model’s text response
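A note on the async API: in a notebook, top-level await works directly, but in a plain Python script you must drive the event loop yourself with asyncio.run. A minimal sketch — the ask coroutine here is a hypothetical stand-in for a real agent.run call:

```python
import asyncio

async def ask(question: str) -> str:
    # Stand-in for `await agent.run(question)`; we simulate the
    # asynchronous round trip without touching the network.
    await asyncio.sleep(0)
    return f"answer to: {question}"

# In a notebook: `await ask(...)` works at the top level.
# In a script: hand the coroutine to asyncio.run.
answer = asyncio.run(ask("What is tokenization?"))
print(answer)
```

PydanticAI also provides a synchronous agent.run_sync(...) if you want to avoid async code entirely.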

Understanding Token Usage

Every API call consumes tokens — and tokens cost money. Let’s inspect the usage:

print(f"Input tokens:  {result.usage().input_tokens}")
print(f"Output tokens: {result.usage().output_tokens}")
print(f"Total tokens:  {result.usage().total_tokens}")
Input tokens:  32
Output tokens: 49
Total tokens:  81

Why does this matter? LLM providers charge per token, with output tokens typically costing 3-5x more than input tokens.

The cost difference between model tiers is significant — which is why choosing the right model for your task matters.
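To make the cost model concrete, here is a small estimator. The per-million-token prices are hypothetical placeholders, not the actual rates for any model on our proxy — check each provider’s pricing page for real numbers:

```python
# Hypothetical prices in USD per million tokens -- illustrative only.
PRICES = {
    "frontier": {"input": 3.00, "output": 15.00},
    "fast":     {"input": 0.25, "output": 1.25},
}

def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one call at the given price tier."""
    p = PRICES[tier]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# The 81-token call above (32 in, 49 out) costs fractions of a cent:
frontier_cost = estimate_cost("frontier", 32, 49)
fast_cost = estimate_cost("fast", 32, 49)
print(f"frontier: ${frontier_cost:.6f}, fast: ${fast_cost:.6f}")
```

Even with made-up numbers, the structure is the point: output tokens dominate the bill, and tier choice changes the cost by an order of magnitude.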


Comparing Models

One of the most valuable things you can do with API access is compare models side by side. Different models have different strengths: some are more concise, some more creative, some faster, some cheaper.

Let’s ask all three models the same question and compare:

models = {
    "GPT-5.4": get_model("gpt-5.4"),
    "Claude Sonnet 4.6": get_model("claude-sonnet-4-6"),
    "Claude Haiku 4.5": get_model("claude-haiku-4-5"),
}

prompt = "Explain the difference between stemming and lemmatization in exactly 3 sentences."

for name, model in models.items():
    agent = Agent(model, instructions="You are a concise NLP instructor.")
    result = await agent.run(prompt)
    usage = result.usage()
    print(f"--- {name} ---")
    print(result.output)
    print(f"  [tokens: {usage.input_tokens} in / {usage.output_tokens} out]\n")
--- GPT-5.4 ---
Stemming reduces words to a crude base form by chopping off prefixes or suffixes, often without ensuring the result is a real word.  
Lemmatization reduces words to their dictionary base form, using vocabulary and often part-of-speech information to return valid words.  
For example, stemming might turn “studies” into “studi,” while lemmatization turns it into “study.”
  [tokens: 32 in / 84 out]

--- Claude Sonnet 4.6 ---
Stemming is a rule-based process that strips suffixes from words to reduce them to a root form, often producing non-real words (e.g., "running" → "runn"). Lemmatization, by contrast, uses vocabulary and morphological analysis to return a word to its true dictionary base form, called a lemma (e.g., "running" → "run"). While stemming is faster and simpler, lemmatization is more accurate and linguistically meaningful, making it preferable when precision matters.
  [tokens: 35 in / 114 out]

--- Claude Haiku 4.5 ---
**Stemming** removes suffixes from words using rule-based algorithms to reduce them to a root form, which may not be a valid word (e.g., "running" → "runn"). **Lemmatization** uses linguistic knowledge and vocabulary to convert words to their canonical dictionary form, ensuring the result is a real word (e.g., "running" → "run"). Lemmatization is more accurate but computationally expensive, while stemming is faster but produces less precise results.
  [tokens: 34 in / 109 out]

What to Look For

When comparing models, pay attention to length and verbosity, formatting choices (note Haiku’s bold headers above), the examples each model reaches for, and token usage — which translates directly into cost.

Choosing the Right Model

There’s no single “best” model — it depends on your task:

| Use Case | Recommended Model | Why |
| --- | --- | --- |
| Complex reasoning, analysis | GPT-5.4 or Claude Sonnet 4.6 | Maximum capability |
| Simple classification, extraction | Claude Haiku 4.5 | Fast and cheap |
| Creative writing | Experiment! | Style varies by model |
| High-volume processing | Claude Haiku 4.5 | Cost-effective at scale |

The general principle: use the cheapest model that meets your quality bar. Start with a fast model, evaluate its output, and only upgrade if needed.
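That principle can be sketched as an escalation loop. Everything here is illustrative: call_model and meets_quality_bar are hypothetical stand-ins for a real agent call and a real quality check (a validator, a length threshold, an LLM judge):

```python
# Cheapest-first model ladder, matching the proxy's three models.
MODEL_LADDER = ["claude-haiku-4-5", "claude-sonnet-4-6", "gpt-5.4"]

def call_model(model_name: str, prompt: str) -> str:
    # Stub: pretend only the frontier models give a long-enough answer.
    return prompt * (1 if model_name == "claude-haiku-4-5" else 5)

def meets_quality_bar(output: str) -> bool:
    # Toy quality check; replace with real evaluation logic.
    return len(output) >= 20

def run_with_escalation(prompt: str) -> tuple[str, str]:
    """Try the cheapest model first; escalate until the output passes."""
    for model_name in MODEL_LADDER:
        output = call_model(model_name, prompt)
        if meets_quality_bar(output):
            return model_name, output
    return model_name, output  # fall back to the strongest model's answer

model_used, output = run_with_escalation("short")
print(model_used)
```

The design choice worth noting: the quality bar is explicit code, so “only upgrade if needed” becomes something you can measure rather than a gut feeling.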


A Taste of Structured Output

So far, our models have returned free-form text. That’s fine for conversation, but what if you need the output in a specific format — say, a Python dictionary or a JSON object?

Parsing free text is fragile. What if the model adds extra words? What if the format changes slightly between calls? This is where PydanticAI’s structured output shines.

The Idea

Instead of getting a string back, you define a Pydantic model describing the shape of the output you want. PydanticAI sends the schema to the LLM, validates the response, and returns a proper Python object — with type checking and all.

from pydantic import BaseModel, Field


class SentimentResult(BaseModel):
    """Structured output for sentiment analysis."""
    text: str = Field(description="The original text that was analyzed")
    sentiment: str = Field(description="positive, negative, or neutral")
    confidence: float = Field(ge=0, le=1, description="Confidence score between 0 and 1")
    reasoning: str = Field(description="Brief explanation of the sentiment judgment")


agent = Agent(
    get_model("claude-sonnet-4-6"),
    output_type=SentimentResult,
    instructions="Analyze the sentiment of the given text.",
)

result = await agent.run("The new spaCy update is incredibly fast but the documentation is lacking.")
print(f"Sentiment:  {result.output.sentiment}")
print(f"Confidence: {result.output.confidence}")
print(f"Reasoning:  {result.output.reasoning}")
Sentiment:  neutral
Confidence: 0.85
Reasoning:  The text contains both a strong positive sentiment ("incredibly fast") and a negative sentiment ("documentation is lacking"). These opposing sentiments balance each other out, resulting in an overall neutral sentiment. The use of "but" explicitly signals a contrast between the praise and the criticism.

Notice what happened: result.output is not a string — it’s a SentimentResult object with typed fields. PydanticAI handled the schema conversion, the API call, the response parsing, and the validation automatically.
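You can see that validation layer in action without any API call, since it is plain Pydantic underneath. A trimmed-down sketch of the same schema:

```python
from pydantic import BaseModel, Field, ValidationError

# Two fields from SentimentResult above, enough to show what the
# validation layer catches before a malformed response reaches you.
class Sentiment(BaseModel):
    sentiment: str
    confidence: float = Field(ge=0, le=1)

ok = Sentiment(sentiment="positive", confidence=0.85)  # passes

try:
    Sentiment(sentiment="positive", confidence=1.5)  # out of [0, 1]
    rejected = False
except ValidationError:
    rejected = True  # the bad value never becomes a Sentiment object
```

When a live model’s response fails validation like this, PydanticAI can ask the model to retry rather than handing you malformed data.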

This is a preview of what we’ll explore much more deeply in Week 9, where we’ll cover structured output extraction, function calling, and building full data pipelines with LLMs.


Wrap-Up

Key Takeaways

  1. Frontier models are closed: you access them through provider APIs rather than downloading weights.

  2. The API gateway pattern (LiteLLM) gives your code one OpenAI-compatible endpoint and one key for many providers.

  3. PydanticAI’s Agent wraps any model behind a single interface, with instructions, usage tracking, and typed structured output.

  4. Tokens cost money: inspect usage, and use the cheapest model that meets your quality bar.

What’s Next

In Week 9, we’ll go much deeper into the art and science of working with LLMs. Part 01 covers prompt engineering — techniques like zero-shot and few-shot prompting, chain-of-thought reasoning, and system prompt design. Part 02 explores structured outputs and function calling in depth, building full data extraction pipelines. And in the lab, you’ll put it all together in an API power workshop.