
Introduction to LLM APIs Lab

University of Central Florida
Arete Capital Partners

CAP-6640: Computational Understanding of Natural Language

Spencer Lyon

Prerequisites

Outcomes

References


Setup & Warm-Up

Let’s make sure everyone is connected to the course proxy and can reach all three models. We’ll reuse the get_model helper from Part 03.

import os
from dotenv import load_dotenv
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider

# Load API key from .env file
load_dotenv()

PROXY_URL = "https://litellm.6640.ucf.spencerlyon.com"

def get_model(model_name: str) -> OpenAIChatModel:
    """Create a model connection through our LiteLLM proxy."""
    return OpenAIChatModel(
        model_name,
        provider=OpenAIProvider(
            base_url=PROXY_URL,
            api_key=os.environ["CAP6640_API_KEY"],
        ),
    )
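We will hit the same three course models repeatedly in this lab. A small mapping of display names to proxy model ids (the ids that appear throughout the lab; the dict name is our own choice) saves retyping:

```python
# Display names -> proxy model ids used throughout this lab
COURSE_MODELS = {
    "GPT-5.4": "gpt-5.4",
    "Sonnet 4.6": "claude-sonnet-4-6",
    "Haiku 4.5": "claude-haiku-4-5",
}

for name, model_id in COURSE_MODELS.items():
    print(f"{name} -> {model_id}")
```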

Generation Parameters

In Part 03, we used models with their default settings. But LLMs have several generation parameters that control how they produce text. The two most important are temperature and max_tokens.

Temperature: The Creativity Dial

Temperature controls the randomness of the model’s output by rescaling the probability distribution over possible next tokens. Values near 0 sharpen the distribution, so the most likely token is chosen almost every time; values near 1 (or above) flatten it, giving less likely tokens a real chance of being sampled.
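Under the hood, temperature divides the model’s logits before the softmax. A quick back-of-the-envelope sketch (the logit values here are made up) shows how the distribution sharpens or flattens:

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Divide logits by temperature, then apply softmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical next-token logits
for t in [0.5, 1.0, 2.0]:
    probs = softmax_with_temperature(logits, t)
    # lower temperature -> probability mass concentrates on the top token
    print(t, [round(p, 3) for p in probs])
```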

Let’s see this in action:

from pydantic_ai.settings import ModelSettings

prompt = "Write a one-sentence description of a haunted house."

for temp in [0.0, 0.7, 1.0]:
    agent = Agent(
        get_model("claude-haiku-4-5"),
        instructions="You are a creative writer.",
        model_settings=ModelSettings(temperature=temp),
    )
    print(f"--- Temperature {temp} ---")
    # Run the same prompt 3 times to see variation (or lack thereof)
    for i in range(3):
        result = await agent.run(prompt)
        print(f"  Run {i+1}: {result.output}")
    print()
--- Temperature 0.0 ---
  Run 1: A decaying Victorian mansion shrouded in perpetual fog harbors the restless spirits of its tragic past, their anguished whispers echoing through empty halls where shadows move of their own accord.
  Run 2: A decaying Victorian mansion shrouded in perpetual fog harbors the restless spirits of its tragic past, their anguished whispers echoing through empty halls where shadows move of their own accord.
  Run 3: A decaying Victorian mansion stands shrouded in perpetual fog, its windows glowing with spectral light as the anguished whispers of its former inhabitants echo through corridors where time itself seems to have stopped.

--- Temperature 0.7 ---
  Run 1: A decrepit Victorian mansion looms against the storm clouds, its broken windows like hollow eyes watching the living, while the anguished whispers of its tormented past echo through halls where time itself seems to have frozen in eternal darkness.
  Run 2: A decrepit Victorian mansion shrouded in perpetual fog harbors the restless spirits of its tragic past, their anguished whispers echoing through empty halls where shadows move of their own accord.
  Run 3: A crumbling Victorian mansion stands shrouded in perpetual fog, its windows glowing with an eerie light as the anguished whispers of its former inhabitants echo through corridors where time itself seems to have stopped.

--- Temperature 1.0 ---
  Run 1: A Victorian mansion shrouded in perpetual mist harbors the restless spirits of its tragic past, their anguished whispers echoing through shadowed halls where time itself seems to have frozen in the moment of their doom.
  Run 2: A crumbling Victorian mansion stands shrouded in perpetual fog, its broken windows watching like hollow eyes as the tormented spirits of its former residents wander endless halls, forever trapped between the living world and whatever darkness claims them.
  Run 3: A decrepit Victorian mansion shrouded in perpetual fog harbors the restless spirits of its tragic past, their anguished wails echoing through halls where time itself seems to have frozen in despair.

At temperature 0.0, all three runs should produce nearly identical output. At 1.0, each run will be noticeably different.

Max Tokens: The Length Cap

The max_tokens parameter sets an upper limit on how many tokens the model can generate in its response. This is useful for controlling cost (you pay per output token), bounding latency, and guarding against runaway responses.

prompt = "Explain the transformer architecture."

for max_tok in [20, 50, 200]:
    agent = Agent(
        get_model("claude-haiku-4-5"),
        instructions="You are an NLP instructor.",
        model_settings=ModelSettings(max_tokens=max_tok),
    )
    result = await agent.run(prompt)
    usage = result.usage()
    print(f"--- max_tokens={max_tok} ---")
    print(f"  Output ({usage.output_tokens} tokens): {result.output}")
    print()
--- max_tokens=20 ---
  Output (20 tokens): # The Transformer Architecture

The Transformer is a deep learning model introduced in "Attention is

--- max_tokens=50 ---
  Output (50 tokens): # The Transformer Architecture

The Transformer is a neural network architecture designed for sequence-to-sequence tasks. Here's a comprehensive breakdown:

## Core Components

### 1. **Self-Attention Mechanism**
The heart of

--- max_tokens=200 ---
  Output (200 tokens): # The Transformer Architecture

## Overview
The Transformer is a deep learning model that revolutionized NLP by replacing recurrence with **self-attention**, enabling parallel processing and capturing long-range dependencies efficiently.

---

## Core Components

### 1. **Self-Attention Mechanism**
The heart of transformers—allows each token to attend to every other token in a sequence.

**How it works:**
- Input embeddings are projected into three matrices: **Query (Q)**, **Key (K)**, **Value (V)**
- Attention scores are calculated: `Attention(Q,K,V) = softmax(QK^T/√d_k)V`
- This computes how much each position should "focus on" other positions

**Key insight:** Tokens can directly access any other token, unlike RNNs that process sequentially.

### 2. **Multi-Head

Notice that max_tokens=20 cuts the response off mid-sentence — the model doesn’t know in advance how many tokens it gets. The limit is a hard ceiling, not a soft suggestion.
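Because truncation is silent in the output text itself, it can help to flag responses that were likely cut off. One rough heuristic (our own, not a pydantic-ai feature) is to check whether the text ends in terminal punctuation:

```python
def looks_truncated(text: str) -> bool:
    """Rough heuristic: output that doesn't end with terminal
    punctuation was likely cut off by the max_tokens ceiling."""
    return not text.rstrip().endswith((".", "!", "?", '"'))

print(looks_truncated("The heart of"))                # True -- mid-sentence cut-off
print(looks_truncated("Attention is all you need."))  # False -- complete sentence
```

This misfires on outputs that legitimately end without punctuation (headings, lists, code blocks), so treat it as a hint, not a guarantee.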


The Model Showdown

Now let’s put everything together. In this mini-project, you’ll pick a piece of text and run three different NLP tasks across all three course models — building a comparison table of results.

Here’s a sample text to work with (or bring your own):

sample_text = """
OpenAI announced GPT-5.4 at their Spring 2026 developer conference in San Francisco.
The model achieves state-of-the-art performance on reasoning benchmarks, surpassing
its predecessor by 15% on the MMLU-Pro evaluation suite. Critics argue that the
environmental cost of training such large models remains a significant concern, while
supporters point to breakthroughs in scientific research enabled by the technology.
CEO Sam Altman described it as "the most capable model we've ever built."
"""

Let’s define three NLP tasks and run them across all models:

from pydantic import BaseModel, Field


# --- Task 1: Summarization ---
# --- Task 1: Summarization ---
summarize_agents = {
    name: Agent(
        get_model(model_id),
        instructions="Summarize the given text in exactly one sentence.",
    )
    for name, model_id in [
        ("GPT-5.4", "gpt-5.4"),
        ("Sonnet 4.6", "claude-sonnet-4-6"),
        ("Haiku 4.5", "claude-haiku-4-5"),
    ]
}

print("=== Task 1: One-Sentence Summary ===\n")
for name, agent in summarize_agents.items():
    result = await agent.run(sample_text)
    print(f"  {name}: {result.output}\n")
=== Task 1: One-Sentence Summary ===

  GPT-5.4: OpenAI announced GPT-5.4 at its Spring 2026 developer conference in San Francisco, touting state-of-the-art reasoning performance and a 15% MMLU-Pro improvement over its predecessor, while critics raised environmental concerns and supporters highlighted scientific breakthroughs.

  Sonnet 4.6: OpenAI unveiled GPT-5.4 at its Spring 2026 developer conference, touting it as their most capable model yet with a 15% improvement on reasoning benchmarks, though environmental concerns over large-scale AI training persist.

  Haiku 4.5: OpenAI announced GPT-5.4 at their Spring 2026 conference, achieving state-of-the-art reasoning performance with a 15% improvement over its predecessor, though concerns about environmental costs persist alongside recognition of its scientific research benefits.

# --- Task 2: Entity Extraction (structured output) ---
class ExtractedEntities(BaseModel):
    people: list[str] = Field(description="Names of people mentioned")
    organizations: list[str] = Field(description="Names of organizations mentioned")
    locations: list[str] = Field(description="Names of locations mentioned")

print("=== Task 2: Entity Extraction ===\n")
for name, model_id in [("GPT-5.4", "gpt-5.4"), ("Sonnet 4.6", "claude-sonnet-4-6"), ("Haiku 4.5", "claude-haiku-4-5")]:
    agent = Agent(
        get_model(model_id),
        output_type=ExtractedEntities,
        instructions="Extract named entities from the given text.",
    )
    result = await agent.run(sample_text)
    ents = result.output
    print(f"  {name}:")
    print(f"    People:        {ents.people}")
    print(f"    Organizations: {ents.organizations}")
    print(f"    Locations:     {ents.locations}\n")
=== Task 2: Entity Extraction ===

  GPT-5.4:
    People:        ['Sam Altman']
    Organizations: ['OpenAI']
    Locations:     ['San Francisco']

  Sonnet 4.6:
    People:        ['Sam Altman']
    Organizations: ['OpenAI']
    Locations:     ['San Francisco']

  Haiku 4.5:
    People:        ['Sam Altman']
    Organizations: ['OpenAI']
    Locations:     ['San Francisco']

# --- Task 3: Sentiment Classification (structured output) ---
class SentimentResult(BaseModel):
    sentiment: str = Field(description="positive, negative, or mixed")
    confidence: float = Field(ge=0, le=1, description="Confidence score")
    reasoning: str = Field(description="One-sentence explanation")

print("=== Task 3: Sentiment Classification ===\n")
for name, model_id in [("GPT-5.4", "gpt-5.4"), ("Sonnet 4.6", "claude-sonnet-4-6"), ("Haiku 4.5", "claude-haiku-4-5")]:
    agent = Agent(
        get_model(model_id),
        output_type=SentimentResult,
        instructions="Classify the overall sentiment of the given text.",
    )
    result = await agent.run(sample_text)
    s = result.output
    usage = result.usage()
    print(f"  {name}: {s.sentiment} (confidence: {s.confidence})")
    print(f"    Reasoning: {s.reasoning}")
    print(f"    [tokens: {usage.input_tokens} in / {usage.output_tokens} out]\n")
=== Task 3: Sentiment Classification ===

  GPT-5.4: mixed (confidence: 0.94)
    Reasoning: The passage presents strong positive achievements and praise for the model alongside notable criticism about environmental costs, resulting in an overall mixed sentiment.
    [tokens: 281 in / 54 out]

  Sonnet 4.6: mixed (confidence: 0.92)
    Reasoning: The text presents both positive elements (state-of-the-art performance, scientific breakthroughs, CEO praise) and negative concerns (environmental cost of training large models), resulting in a balanced mixed sentiment.
    [tokens: 852 in / 115 out]

  Haiku 4.5: mixed (confidence: 0.75)
    Reasoning: The text presents both positive elements (performance achievements, scientific breakthroughs, CEO praise) and negative concerns (environmental cost criticisms), creating a balanced mixed sentiment overall.
    [tokens: 851 in / 107 out]
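The mini-project asks for a comparison table. Once you have collected (model, task, output) triples from the runs above, a small plain-text formatter is enough to line them up; the helper below is a sketch of our own, not a pydantic-ai feature (the sample rows echo the sentiment results above):

```python
def format_table(rows: list[tuple[str, str, str]]) -> str:
    """Render (model, task, output) triples as an aligned plain-text table."""
    all_rows = [("Model", "Task", "Output")] + rows
    w0 = max(len(r[0]) for r in all_rows)
    w1 = max(len(r[1]) for r in all_rows)
    return "\n".join(f"{r[0]:<{w0}}  {r[1]:<{w1}}  {r[2]}" for r in all_rows)

print(format_table([
    ("GPT-5.4", "sentiment", "mixed (0.94)"),
    ("Sonnet 4.6", "sentiment", "mixed (0.92)"),
    ("Haiku 4.5", "sentiment", "mixed (0.75)"),
]))
```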


Reflection


Wrap-Up

Key Takeaways

What’s Next

In Week 9, we move from basic API calls to mastering them. Part 01 covers prompt engineering — systematic techniques like few-shot prompting and chain-of-thought reasoning that dramatically improve output quality. Part 02 dives deep into structured outputs and function calling, building full data extraction pipelines. And the lab puts it all together in an API power workshop.