Introduction to LLM APIs Lab
CAP-6640: Computational Understanding of Natural Language
Spencer Lyon
Prerequisites
L08.03: Working with LLM APIs — PydanticAI, the get_model() helper, LiteLLM proxy connection, basic API calls and structured output
Outcomes
Verify API connectivity and make calls to all three course models
Explore how generation parameters (temperature, max_tokens) affect model output
Build a mini text analysis tool that compares models on a real NLP task
Reason about model selection trade-offs based on empirical observation
References
Setup & Warm-Up¶
Let’s make sure everyone is connected to the course proxy and can reach all three models. We’ll reuse the get_model helper from Part 03.
import os
from dotenv import load_dotenv
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider
# Load API key from .env file
load_dotenv()
PROXY_URL = "https://litellm.6640.ucf.spencerlyon.com"
def get_model(model_name: str) -> OpenAIChatModel:
"""Create a model connection through our LiteLLM proxy."""
return OpenAIChatModel(
model_name,
provider=OpenAIProvider(
base_url=PROXY_URL,
api_key=os.environ["CAP6640_API_KEY"],
),
)
Generation Parameters¶
In Part 03, we used models with their default settings. But LLMs have several generation parameters that control how they produce text. The two most important are temperature and max_tokens.
Temperature: The Creativity Dial¶
Temperature controls the randomness of the model’s output. It adjusts the probability distribution over possible next tokens:
Temperature 0.0: The model always picks the most likely token. Output is nearly deterministic — run the same prompt twice and you will usually get the same answer, though provider-side nondeterminism can still produce occasional small variations. Best for factual tasks where you want consistency.
Temperature 0.5–0.7: A moderate amount of randomness. Good default for most tasks.
Temperature 1.0: High randomness (the upper limit for Anthropic models; some providers, such as OpenAI, accept values up to 2.0). More creative and varied output. Useful for brainstorming or creative writing.
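Mechanically, temperature divides the model's logits before the softmax, so lower values sharpen the next-token distribution and higher values flatten it. A minimal sketch with hypothetical logit values:

```python
import math

def softmax_with_temperature(logits: list[float], temp: float) -> list[float]:
    """Scale logits by 1/temp, then apply softmax. Lower temp sharpens the distribution."""
    scaled = [l / temp for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical next-token logits
for temp in [0.1, 0.7, 1.0]:
    probs = softmax_with_temperature(logits, temp)
    print(f"T={temp}: {[round(p, 3) for p in probs]}")
```

At T=0.1 nearly all probability mass lands on the top token; at T=1.0 the lower-ranked tokens keep a real chance of being sampled, which is where the output variety comes from.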
Let’s see this in action:
from pydantic_ai.settings import ModelSettings
prompt = "Write a one-sentence description of a haunted house."
for temp in [0.0, 0.7, 1.0]:
agent = Agent(
get_model("claude-haiku-4-5"),
instructions="You are a creative writer.",
model_settings=ModelSettings(temperature=temp),
)
print(f"--- Temperature {temp} ---")
# Run the same prompt 3 times to see variation (or lack thereof)
for i in range(3):
result = await agent.run(prompt)
print(f" Run {i+1}: {result.output}")
print()
--- Temperature 0.0 ---
Run 1: A decaying Victorian mansion shrouded in perpetual fog harbors the restless spirits of its tragic past, their anguished whispers echoing through empty halls where shadows move of their own accord.
Run 2: A decaying Victorian mansion shrouded in perpetual fog harbors the restless spirits of its tragic past, their anguished whispers echoing through empty halls where shadows move of their own accord.
Run 3: A decaying Victorian mansion stands shrouded in perpetual fog, its windows glowing with spectral light as the anguished whispers of its former inhabitants echo through corridors where time itself seems to have stopped.
--- Temperature 0.7 ---
Run 1: A decrepit Victorian mansion looms against the storm clouds, its broken windows like hollow eyes watching the living, while the anguished whispers of its tormented past echo through halls where time itself seems to have frozen in eternal darkness.
Run 2: A decrepit Victorian mansion shrouded in perpetual fog harbors the restless spirits of its tragic past, their anguished whispers echoing through empty halls where shadows move of their own accord.
Run 3: A crumbling Victorian mansion stands shrouded in perpetual fog, its windows glowing with an eerie light as the anguished whispers of its former inhabitants echo through corridors where time itself seems to have stopped.
--- Temperature 1.0 ---
Run 1: A Victorian mansion shrouded in perpetual mist harbors the restless spirits of its tragic past, their anguished whispers echoing through shadowed halls where time itself seems to have frozen in the moment of their doom.
Run 2: A crumbling Victorian mansion stands shrouded in perpetual fog, its broken windows watching like hollow eyes as the tormented spirits of its former residents wander endless halls, forever trapped between the living world and whatever darkness claims them.
Run 3: A decrepit Victorian mansion shrouded in perpetual fog harbors the restless spirits of its tragic past, their anguished wails echoing through halls where time itself seems to have frozen in despair.
At temperature 0.0, all three runs should produce nearly identical output. At 1.0, each run will be noticeably different.
Max Tokens: The Length Cap¶
The max_tokens parameter sets an upper limit on how many tokens the model can generate in its response. This is useful for:
Controlling costs: Fewer output tokens = lower cost
Enforcing brevity: Force the model to be concise
Preventing runaway generation: Stop the model from producing a novel when you wanted a sentence
prompt = "Explain the transformer architecture."
for max_tok in [20, 50, 200]:
agent = Agent(
get_model("claude-haiku-4-5"),
instructions="You are an NLP instructor.",
model_settings=ModelSettings(max_tokens=max_tok),
)
result = await agent.run(prompt)
usage = result.usage()
print(f"--- max_tokens={max_tok} ---")
print(f" Output ({usage.output_tokens} tokens): {result.output}")
print()
--- max_tokens=20 ---
Output (20 tokens): # The Transformer Architecture
The Transformer is a deep learning model introduced in "Attention is
--- max_tokens=50 ---
Output (50 tokens): # The Transformer Architecture
The Transformer is a neural network architecture designed for sequence-to-sequence tasks. Here's a comprehensive breakdown:
## Core Components
### 1. **Self-Attention Mechanism**
The heart of
--- max_tokens=200 ---
Output (200 tokens): # The Transformer Architecture
## Overview
The Transformer is a deep learning model that revolutionized NLP by replacing recurrence with **self-attention**, enabling parallel processing and capturing long-range dependencies efficiently.
---
## Core Components
### 1. **Self-Attention Mechanism**
The heart of transformers—allows each token to attend to every other token in a sequence.
**How it works:**
- Input embeddings are projected into three matrices: **Query (Q)**, **Key (K)**, **Value (V)**
- Attention scores are calculated: `Attention(Q,K,V) = softmax(QK^T/√d_k)V`
- This computes how much each position should "focus on" other positions
**Key insight:** Tokens can directly access any other token, unlike RNNs that process sequentially.
### 2. **Multi-Head
Notice that max_tokens=20 cuts the response off mid-sentence — the model doesn’t know in advance how many tokens it gets. It’s a hard ceiling, not a soft suggestion.
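One simple way to flag this programmatically (a heuristic of our own, not a built-in PydanticAI feature) is to compare the reported output token count against the cap you set — a response that used its entire budget was probably cut off:

```python
def probably_truncated(output_tokens: int, max_tokens: int) -> bool:
    """Heuristic: a response that used its full token budget was likely cut off."""
    return output_tokens >= max_tokens

print(probably_truncated(20, 20))    # hit the cap: likely truncated
print(probably_truncated(142, 200))  # finished with budget to spare
```

In the loop above you could call this with `usage.output_tokens` and the `max_tok` value to warn when an answer is incomplete.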
The Model Showdown¶
Now let’s put everything together. In this mini-project, you’ll pick a piece of text and run three different NLP tasks across all three course models — building a comparison table of results.
Here’s a sample text to work with (or bring your own):
sample_text = """
OpenAI announced GPT-5.4 at their Spring 2026 developer conference in San Francisco.
The model achieves state-of-the-art performance on reasoning benchmarks, surpassing
its predecessor by 15% on the MMLU-Pro evaluation suite. Critics argue that the
environmental cost of training such large models remains a significant concern, while
supporters point to breakthroughs in scientific research enabled by the technology.
CEO Sam Altman described it as "the most capable model we've ever built."
"""Let’s define three NLP tasks and run them across all models:
from pydantic import BaseModel, Field
# --- Task 1: Summarization ---
summarize_agent = {
name: Agent(
get_model(model_id),
instructions="Summarize the given text in exactly one sentence.",
)
for name, model_id in [
("GPT-5.4", "gpt-5.4"),
("Sonnet 4.6", "claude-sonnet-4-6"),
("Haiku 4.5", "claude-haiku-4-5"),
]
}
print("=== Task 1: One-Sentence Summary ===\n")
for name, agent in summarize_agent.items():
result = await agent.run(sample_text)
print(f" {name}: {result.output}\n")
=== Task 1: One-Sentence Summary ===
GPT-5.4: OpenAI announced GPT-5.4 at its Spring 2026 developer conference in San Francisco, touting state-of-the-art reasoning performance and a 15% MMLU-Pro improvement over its predecessor, while critics raised environmental concerns and supporters highlighted scientific breakthroughs.
Sonnet 4.6: OpenAI unveiled GPT-5.4 at its Spring 2026 developer conference, touting it as their most capable model yet with a 15% improvement on reasoning benchmarks, though environmental concerns over large-scale AI training persist.
Haiku 4.5: OpenAI announced GPT-5.4 at their Spring 2026 conference, achieving state-of-the-art reasoning performance with a 15% improvement over its predecessor, though concerns about environmental costs persist alongside recognition of its scientific research benefits.
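Since the instruction demands exactly one sentence, you can spot-check compliance with a rough heuristic. This is a sketch — the regex below approximates sentence boundaries and will miscount text with abbreviations like "Dr." or "e.g.":

```python
import re

def sentence_count(text: str) -> int:
    """Rough sentence count: split on ., !, or ? followed by whitespace or end-of-string."""
    parts = re.split(r"[.!?](?:\s|$)", text.strip())
    return len([p for p in parts if p.strip()])

# Note: the period inside "GPT-5.4" is not followed by whitespace, so it doesn't split
summary = ("OpenAI unveiled GPT-5.4 at its Spring 2026 developer conference, "
           "touting it as their most capable model yet.")
print(sentence_count(summary))  # 1
```

Running this over each model's summary gives a quick, automatable check that all three actually followed the one-sentence constraint.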
# --- Task 2: Entity Extraction (structured output) ---
class ExtractedEntities(BaseModel):
people: list[str] = Field(description="Names of people mentioned")
organizations: list[str] = Field(description="Names of organizations mentioned")
locations: list[str] = Field(description="Names of locations mentioned")
print("=== Task 2: Entity Extraction ===\n")
for name, model_id in [("GPT-5.4", "gpt-5.4"), ("Sonnet 4.6", "claude-sonnet-4-6"), ("Haiku 4.5", "claude-haiku-4-5")]:
agent = Agent(
get_model(model_id),
output_type=ExtractedEntities,
instructions="Extract named entities from the given text.",
)
result = await agent.run(sample_text)
ents = result.output
print(f" {name}:")
print(f" People: {ents.people}")
print(f" Organizations: {ents.organizations}")
print(f" Locations: {ents.locations}\n")
=== Task 2: Entity Extraction ===
GPT-5.4:
People: ['Sam Altman']
Organizations: ['OpenAI']
Locations: ['San Francisco']
Sonnet 4.6:
People: ['Sam Altman']
Organizations: ['OpenAI']
Locations: ['San Francisco']
Haiku 4.5:
People: ['Sam Altman']
Organizations: ['OpenAI']
Locations: ['San Francisco']
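All three models agree here, but on messier texts they often won't. A small helper can report which entities every model found (the example values are copied from the run above):

```python
def common_entities(results: dict[str, list[str]]) -> set[str]:
    """Return the entities that every model extracted (set intersection across models)."""
    sets = [set(entities) for entities in results.values()]
    return set.intersection(*sets)

# People extracted by each model in the run above
people = {
    "GPT-5.4": ["Sam Altman"],
    "Sonnet 4.6": ["Sam Altman"],
    "Haiku 4.5": ["Sam Altman"],
}
print(common_entities(people))  # {'Sam Altman'}
```

Entities outside the intersection are the interesting cases — they show where models disagree and are good candidates for manual review.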
# --- Task 3: Sentiment Classification (structured output) ---
class SentimentResult(BaseModel):
sentiment: str = Field(description="positive, negative, or mixed")
confidence: float = Field(ge=0, le=1, description="Confidence score")
reasoning: str = Field(description="One-sentence explanation")
print("=== Task 3: Sentiment Classification ===\n")
for name, model_id in [("GPT-5.4", "gpt-5.4"), ("Sonnet 4.6", "claude-sonnet-4-6"), ("Haiku 4.5", "claude-haiku-4-5")]:
agent = Agent(
get_model(model_id),
output_type=SentimentResult,
instructions="Classify the overall sentiment of the given text.",
)
result = await agent.run(sample_text)
s = result.output
usage = result.usage()
print(f" {name}: {s.sentiment} (confidence: {s.confidence})")
print(f" Reasoning: {s.reasoning}")
print(f" [tokens: {usage.input_tokens} in / {usage.output_tokens} out]\n")
=== Task 3: Sentiment Classification ===
GPT-5.4: mixed (confidence: 0.94)
Reasoning: The passage presents strong positive achievements and praise for the model alongside notable criticism about environmental costs, resulting in an overall mixed sentiment.
[tokens: 281 in / 54 out]
Sonnet 4.6: mixed (confidence: 0.92)
Reasoning: The text presents both positive elements (state-of-the-art performance, scientific breakthroughs, CEO praise) and negative concerns (environmental cost of training large models), resulting in a balanced mixed sentiment.
[tokens: 852 in / 115 out]
Haiku 4.5: mixed (confidence: 0.75)
Reasoning: The text presents both positive elements (performance achievements, scientific breakthroughs, CEO praise) and negative concerns (environmental cost criticisms), creating a balanced mixed sentiment overall.
[tokens: 851 in / 107 out]
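To close out the showdown, you can tabulate the usage numbers observed above (the values below are copied from this particular run — yours will differ):

```python
# (input_tokens, output_tokens) per model, from the Task 3 run above
usage_by_model = {
    "GPT-5.4": (281, 54),
    "Sonnet 4.6": (852, 115),
    "Haiku 4.5": (851, 107),
}

print(f"{'Model':<12} {'input':>7} {'output':>7}")
for name, (tokens_in, tokens_out) in usage_by_model.items():
    print(f"{name:<12} {tokens_in:>7} {tokens_out:>7}")
```

Collecting these numbers across tasks is the raw material for the model-selection trade-off listed in the outcomes: token usage is a direct proxy for cost, so a model that answers just as well with fewer tokens may be the better pick.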
Reflection¶
Wrap-Up¶
Key Takeaways¶
What’s Next¶
In Week 9, we move from basic API calls to mastering them. Part 01 covers prompt engineering — systematic techniques like few-shot prompting and chain-of-thought reasoning that dramatically improve output quality. Part 02 dives deep into structured outputs and function calling, building full data extraction pipelines. And the lab puts it all together in an API power workshop.