NLP Glossary & Quick Reference
CAP-6640: Computational Understanding of Natural Language
Spencer Lyon
This glossary serves as a living reference document for key terminology encountered throughout the course. Terms are organized by the week in which they are first introduced. Use this page to quickly look up unfamiliar concepts.
Week 1: Foundations & Overview
- Natural Language Processing (NLP)
- The field of computer science and artificial intelligence concerned with enabling computers to understand, interpret, and generate human language. NLP bridges linguistics, computer science, and machine learning.
- corpus
- A large, structured collection of text used for linguistic analysis or training language models. Plural: corpora. Examples include Wikipedia dumps, news archives, or social media datasets.
- token
- A single unit of text, typically a word, subword, or character, that serves as input to an NLP model. The process of splitting text into tokens is called tokenization.
- tokenization
- The process of breaking raw text into smaller units called tokens. Different strategies exist: word-level, character-level, and subword-level (e.g., Byte Pair Encoding).
- structured data
- Data organized in a predefined format with clear relationships, typically stored in tables with rows and columns. Examples: databases, spreadsheets, CSV files. Contrast with unstructured data.
- unstructured data
- Data without a predefined format or organization, such as free-form text, images, audio, or video. Most human-generated content (emails, articles, social media posts) is unstructured. Contrast with structured data.
- rule-based system
- An NLP approach that relies on hand-crafted linguistic rules and patterns rather than learned statistical models. Common in early NLP systems. Example: using regular expressions to extract phone numbers.
- foundation model
- A large-scale model trained on broad data that can be adapted to many downstream tasks. Examples include BERT, GPT, and LLaMA. Also called pretrained models. See also: LLM.
- Large Language Model, LLM
- A type of foundation model specifically trained on text data, typically with billions of parameters. LLMs can generate, summarize, translate, and reason about text. Examples: GPT-4, Claude, LLaMA.
- machine translation
- The task of automatically translating text from one language to another. One of the oldest NLP applications, now dominated by neural approaches.
- sentiment analysis
- The task of determining the emotional tone or opinion expressed in text (e.g., positive, negative, neutral). Common in social media monitoring and customer feedback analysis.
- named entity recognition, NER
- The task of identifying and classifying named entities (people, organizations, locations, dates, etc.) in text. Example: extracting “UCF” as an organization from a sentence.
- chatbot
- A software application that simulates human conversation through text or voice. Modern chatbots are often powered by LLMs.
- question answering, QA
- The task of automatically answering questions posed in natural language, either from a given context (extractive QA) or from learned knowledge (generative QA).
- summarization
- The task of producing a shorter version of a document while preserving its key information. Can be extractive (selecting sentences) or abstractive (generating new text).
- information retrieval, IR
- The task of finding relevant documents or passages from a large collection given a query. Search engines are the most common IR application.
- SpaCy
- An open-source Python library for industrial-strength NLP. Provides efficient implementations of tokenization, POS tagging, NER, dependency parsing, and more via processing pipelines.
- dependency parsing
- The task of analyzing the grammatical structure of a sentence to determine how words relate to each other. Creates a tree structure showing syntactic dependencies between words.
- part-of-speech tagging, POS tagging
- The task of labeling each word in a sentence with its grammatical role (noun, verb, adjective, etc.). Essential for understanding sentence structure.
- Doc
- In SpaCy, the primary container object that holds processed text and all linguistic annotations. Created when text is processed through an nlp pipeline.
- text classification
- The task of assigning predefined categories or labels to text documents. Examples include spam detection, sentiment analysis, and topic classification.
Week 2: Text Processing
- stemming
- A text normalization technique that reduces words to their root form by removing suffixes using heuristic rules. Example: “running” → “run”. Faster but less accurate than lemmatization.
- lemmatization
- A text normalization technique that reduces words to their dictionary form (lemma) using vocabulary and morphological analysis. Example: “better” → “good”. More accurate than stemming but slower.
- stop word
- A common word (e.g., “the”, “is”, “at”) often filtered out during text preprocessing because it carries little semantic meaning. Stop word lists are language-specific.
- normalization
- The process of transforming text into a standard, consistent format. Includes lowercasing, removing punctuation, expanding contractions, and applying stemming or lemmatization.
- regular expression, regex
- A sequence of characters defining a search pattern, used for text matching, extraction, and substitution. Essential for text cleaning and pattern-based tokenization.
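A minimal illustration with Python's built-in re module; the pattern is a toy US-style phone format for demonstration, not a general-purpose phone-number grammar:

```python
import re

text = "Call (407) 555-0182 or email ta@ucf.edu by Jan 5."

# Match "(ddd) ddd-dddd" with an optional space after the area code
phone_pattern = r"\(\d{3}\)\s?\d{3}-\d{4}"
print(re.findall(phone_pattern, text))  # ['(407) 555-0182']
```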
- pipeline
- In NLP, a sequence of processing steps applied to text. In SpaCy, a pipeline consists of components (tokenizer, tagger, parser, NER) that process a document in order.
Week 3: Text Representation
- bag of words
- A text representation that treats a document as an unordered collection of words, ignoring grammar and word order. Each document becomes a vector of word counts. Also called BoW.
- TF-IDF
- A numerical statistic reflecting how important a word is to a document within a corpus. Combines term frequency (how often a word appears) with inverse document frequency (how rare it is across documents). Stands for Term Frequency–Inverse Document Frequency.
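A toy computation using one common variant (raw term counts, unsmoothed idf); libraries such as scikit-learn apply smoothing and normalization that change the exact numbers:

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs are pets".split(),
]

def tf_idf(term, doc, docs):
    tf = Counter(doc)[term]                 # raw count in this document
    df = sum(1 for d in docs if term in d)  # number of docs containing the term
    idf = math.log(len(docs) / df)          # rarer terms get a larger weight
    return tf * idf

# "cat" appears in 1 of 3 docs, "sat" in 2 of 3, so "cat" is weighted higher
print(round(tf_idf("cat", docs[0], docs), 3))  # 1.099  (1 * ln 3)
print(round(tf_idf("sat", docs[0], docs), 3))  # 0.405  (1 * ln 1.5)
```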
- sparse representation
- A vector representation where most values are zero. Bag of words and TF-IDF produce sparse vectors. Contrast with dense representation.
- dense representation
- A vector representation where most or all values are non-zero, typically with lower dimensionality than sparse representations. Word embeddings are dense.
- word embedding
- A dense vector representation of a word that captures semantic meaning. Words with similar meanings have similar vectors. See: Word2Vec, GloVe.
- Word2Vec
- A neural network-based method for learning word embeddings from text. Uses either CBOW (predict word from context) or Skip-gram (predict context from word) architectures.
- GloVe
- A method for learning word embeddings by factorizing word co-occurrence matrices. Captures both local and global statistical information. Stands for Global Vectors for Word Representation.
- BPE
- A subword tokenization algorithm that iteratively merges the most frequent character pairs. Used by GPT models. Balances vocabulary size with handling of rare words. Stands for Byte Pair Encoding.
- subword tokenization
- A tokenization approach that breaks words into smaller units (subwords). Handles rare and out-of-vocabulary words effectively. See: BPE, WordPiece, SentencePiece.
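A minimal sketch of a single BPE merge step on a toy vocabulary (the word counts and the `</w>` end-of-word marker are illustrative; real implementations build a full merge table and often operate on bytes):

```python
from collections import Counter

# Toy corpus: each word is a tuple of symbols, mapped to its frequency
vocab = Counter({
    ("l", "o", "w", "</w>"): 5,
    ("l", "o", "w", "e", "r", "</w>"): 2,
    ("n", "e", "w", "e", "s", "t", "</w>"): 6,
})

def most_frequent_pair(vocab):
    # Count every adjacent symbol pair, weighted by word frequency
    pairs = Counter()
    for word, freq in vocab.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(pair, vocab):
    # Rewrite every word, fusing occurrences of the chosen pair into one symbol
    merged = Counter()
    for word, freq in vocab.items():
        out, i = [], 0
        while i < len(word):
            if i < len(word) - 1 and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] += freq
    return merged

pair = most_frequent_pair(vocab)
print(pair)  # ('w', 'e') — appears 2 + 6 = 8 times, the most of any pair
vocab = merge_pair(pair, vocab)
print(("n", "e", "we", "s", "t", "</w>") in vocab)  # True
```

Repeating this merge step a fixed number of times yields the learned subword vocabulary.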
Week 4: Classical NLP Tasks
- Naive Bayes
- A probabilistic classifier based on Bayes’ theorem with strong independence assumptions between features. Despite its simplicity, often effective for text classification.
- SVM, Support Vector Machine
- A supervised learning algorithm that finds the optimal hyperplane separating classes. Effective for high-dimensional text data.
- precision
- The fraction of positive predictions that are correct. Precision = TP / (TP + FP). See also: recall, F1 score.
- recall
- The fraction of actual positives that are correctly identified. Recall = TP / (TP + FN). See also: precision, F1 score.
- F1 score
- The harmonic mean of precision and recall. F1 = 2 * (precision * recall) / (precision + recall). Balances both metrics.
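The three formulas above can be checked in a few lines of plain Python; the error counts here are made up for illustration:

```python
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical classifier: 8 true positives, 2 false alarms, 4 misses
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# precision=0.80 recall=0.67 f1=0.73
```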
- confusion matrix
- A table showing the counts of true positives, true negatives, false positives, and false negatives for a classifier. Used to compute precision, recall, and other metrics.
- sequence labeling
- The task of assigning a label to each token in a sequence, rather than a single label to an entire document. Key examples include POS tagging and NER.
- BIO tagging
- A labeling scheme for sequence labeling that marks each token as B (beginning of an entity), I (inside/continuation of an entity), or O (outside any entity). Enables the representation of multi-word entities at the token level.
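A sketch of decoding BIO tags back into entity spans; the sentence and tags are hand-written for illustration:

```python
tokens = ["Spencer", "Lyon", "teaches", "at", "UCF", "in", "Orlando", "."]
tags   = ["B-PER",   "I-PER", "O",      "O",  "B-ORG", "O", "B-LOC",  "O"]

def extract_entities(tokens, tags):
    entities, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):          # a new entity begins
            if current:
                entities.append(current)
            current = (tag[2:], [token])
        elif tag.startswith("I-") and current:
            current[1].append(token)      # continue the current entity
        else:                             # O tag: close any open entity
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return [(label, " ".join(words)) for label, words in entities]

print(extract_entities(tokens, tags))
# [('PER', 'Spencer Lyon'), ('ORG', 'UCF'), ('LOC', 'Orlando')]
```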
- K-means clustering
- An unsupervised learning algorithm that partitions data into K groups by minimizing the distance from each point to its cluster centroid. For text, operates on TF-IDF or embedding vectors.
- topic modeling
- An unsupervised approach to discovering latent themes (topics) in a collection of documents. Each topic is a distribution over words, and each document is a mixture of topics. See: LDA.
- LDA, Latent Dirichlet Allocation
- A generative probabilistic model for topic modeling. Assumes each document is a mixture of topics and each topic is a distribution over words. Discovers latent themes from word co-occurrence patterns.
Week 5: Neural Networks for NLP
- neural network
- A computational model inspired by biological neurons, consisting of layers of interconnected nodes that learn to transform inputs into outputs through training.
- feed-forward network
- A neural network where information flows in one direction — from input through hidden layers to output — with no cycles or loops. The simplest type of neural network architecture.
- linear layer
- A neural network layer that applies a linear transformation y = Wx + b, where W (weights) and b (bias) are learned parameters. Also called a dense or fully-connected layer. In PyTorch: nn.Linear.
- softmax
- A function that converts a vector of real numbers into a probability distribution by exponentiating each element and normalizing by the sum: softmax(x_i) = exp(x_i) / Σ_j exp(x_j). Used in attention mechanisms and output layers.
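A plain-Python sketch; subtracting the maximum before exponentiating is a standard numerical-stability trick that leaves the result unchanged:

```python
import math

def softmax(xs):
    m = max(xs)                            # shift for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(sum(probs))                    # sums to 1 (within float precision)
```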
- activation function
- A mathematical function applied to a neuron’s output to introduce non-linearity. Common choices include ReLU, sigmoid, and tanh. Without activation functions, a multi-layer network would collapse to a single linear transformation.
- backpropagation
- An algorithm for computing gradients of the loss function with respect to network weights, enabling training via gradient descent. Works by applying the chain rule backward through the network layers.
- RNN, Recurrent Neural Network
- A neural network architecture that processes sequences by maintaining a hidden state that captures information from previous steps. Suffers from vanishing gradient problems on long sequences.
- LSTM, Long Short-Term Memory
- A type of RNN with gating mechanisms that allow it to learn long-range dependencies by controlling what information to remember or forget.
- GRU, Gated Recurrent Unit
- A simplified variant of LSTM with fewer parameters. Uses reset and update gates to control information flow.
- vanishing gradient
- A problem in deep networks where gradients become very small during backpropagation, preventing earlier layers from learning. Particularly problematic for RNNs on long sequences.
- sequence-to-sequence, seq2seq
- A model architecture that maps an input sequence to an output sequence, potentially of different length. Used for machine translation, summarization, etc.
- attention mechanism
- A technique allowing models to focus on relevant parts of the input when producing each part of the output. Foundation for transformer architectures.
Week 6: Transformers I
- transformer
- A neural network architecture based entirely on self-attention, without recurrence. Enables parallel processing and captures long-range dependencies efficiently.
- self-attention
- An attention mechanism where each position in a sequence attends to all other positions in the same sequence, computing relevance weights.
- multi-head attention
- Running multiple self-attention operations in parallel, allowing the model to attend to different aspects of the input simultaneously.
- positional encoding
- A technique for injecting sequence order information into transformer models, which otherwise have no inherent notion of position.
- encoder
- In a transformer, the component that processes the input sequence and produces contextual representations. Used for understanding tasks.
- decoder
- In a transformer, the component that generates output sequences, attending to both its own previous outputs and the encoder’s representations.
- residual connection
- A connection that adds the input of a layer directly to its output, helping gradients flow through deep networks. Standard in transformers.
- layer normalization
- A technique that normalizes activations across features for each example, stabilizing training in transformers.
Week 7: Transformers II & Hugging Face
- BERT, Bidirectional Encoder Representations from Transformers
- An encoder-only transformer model pretrained on masked language modeling. Published by Devlin et al. (2018). Excels at understanding tasks like classification and NER.
- RoBERTa
- A robustly optimized version of BERT (Liu et al., 2019). Improves on BERT by training longer, on more data, with dynamic masking, and dropping the Next Sentence Prediction objective.
- GPT, Generative Pre-trained Transformer
- A decoder-only transformer model trained on causal language modeling (next-token prediction). Excels at text generation. The architecture behind ChatGPT and most modern LLMs.
- T5, Text-to-Text Transfer Transformer
- An encoder-decoder transformer that frames all NLP tasks as text-to-text problems. Pretrained with span corruption. Versatile for many tasks.
- encoder-only
- A transformer architecture using only the encoder, with bidirectional attention. Best for understanding tasks like classification and extraction. Examples: BERT, RoBERTa.
- decoder-only
- A transformer architecture using only the decoder, with causal (left-to-right) attention. Best for generation tasks and, at scale, general-purpose NLP. Examples: GPT, LLaMA, Claude.
- masked language modeling, MLM
- A pretraining objective for encoder-only models. Randomly masks a fraction of input tokens and trains the model to predict them from the surrounding bidirectional context. Used by BERT and RoBERTa.
- causal language modeling
- A pretraining objective for decoder-only models. Trains the model to predict the next token given all previous tokens, using a causal (left-to-right) attention mask. Also called autoregressive language modeling.
- in-context learning
- The ability of a large decoder-only model to perform new tasks by conditioning on examples provided in the prompt, without any parameter updates. First demonstrated at scale by GPT-3. Also called few-shot prompting.
- Hugging Face
- A company and open-source platform providing tools for NLP, including the Transformers library, model hub, and datasets.
- pipeline (Hugging Face)
- A high-level API in Hugging Face Transformers that provides easy access to common NLP tasks with pretrained models.
Week 8: Foundation Models & LLM Landscape
- fine-tuning
- Adapting a pretrained model to a specific task by continuing training on task-specific data. Can be full fine-tuning or parameter-efficient.
- LoRA, Low-Rank Adaptation
- A parameter-efficient fine-tuning method that trains small, low-rank matrices instead of updating all model weights. Reduces memory and compute requirements.
- RLHF, Reinforcement Learning from Human Feedback
- A training technique that aligns LLMs with human preferences by training a reward model on human comparisons and optimizing the LLM using reinforcement learning.
- instruction tuning
- Training a model to follow natural language instructions by fine-tuning on instruction-response pairs.
- scaling laws
- Empirical findings showing that language model performance follows predictable power-law relationships as model size, dataset size, and compute are increased. Key papers: Kaplan et al. (2020), Hoffmann et al. (2022, “Chinchilla”).
- Chinchilla
- A 70B parameter language model from DeepMind (Hoffmann et al., 2022) that demonstrated compute-optimal training: for a fixed compute budget, model size and training data should be scaled roughly equally. Outperformed the much larger 280B Gopher model by training on more data.
- compute-optimal training
- The principle, established by the Chinchilla paper, that language models achieve the best performance for a given compute budget when model size and dataset size are scaled in proportion. Many early LLMs were “undertrained” — too many parameters, too little data.
- pretraining
- The first phase of training a foundation model, where the model learns general language patterns from a massive, unlabeled text corpus using self-supervised objectives like causal language modeling or masked language modeling.
- GPT-3
- A 175B parameter decoder-only LLM from OpenAI (Brown et al., 2020). Landmark model that first demonstrated in-context learning and few-shot prompting at scale, showing that a single model could perform diverse tasks without fine-tuning.
- ChatGPT
- An OpenAI product (launched November 2022) that wrapped GPT-3.5 in a conversational interface with RLHF alignment. Became the fastest-growing consumer application in history and brought LLMs into mainstream awareness.
- emergent capabilities
- Abilities that appear in LLMs only at sufficient scale and were not explicitly trained. Examples include in-context learning, chain-of-thought reasoning, code generation, and instruction following.
- chain-of-thought reasoning, CoT
- An emergent capability where LLMs solve multi-step problems by generating intermediate reasoning steps (“thinking out loud”) before producing a final answer. Can be elicited via prompting (e.g., “Let’s think step by step”).
- code generation
- An emergent capability where LLMs write working programs from natural language descriptions. Enabled by including source code in pretraining data.
- instruction following
- An emergent capability where LLMs understand and execute complex, multi-part natural language instructions. Enhanced by instruction tuning but observed even in base models at sufficient scale.
- LLaMA
- A family of open models released by Meta, beginning in February 2023 (7B–65B parameters). The original weights were leaked within a week of release, sparking an explosion of community-built derivatives (Alpaca, Vicuna, Koala, and 7,000+ others on Hugging Face). Meta embraced the movement with Llama 2 (July 2023), then Llama 3 and Llama 4. LLaMA’s leak and the community response catalyzed the modern open model ecosystem.
- Mixture-of-Experts, MoE
- A model architecture where only a subset of parameters (“experts”) are activated for each input token, routed by a learned gating network. Allows models to have very large total parameter counts while keeping per-token computation manageable. Used by Llama 4, DeepSeek V3, and Qwen 3.
- zero-shot
- Using a model on a task without providing any examples in the prompt. The model relies solely on its pretraining knowledge and instruction tuning.
- few-shot
- Providing a small number of examples in the prompt to guide the model’s behavior on a new task, without any parameter updates. Also called in-context learning.
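A sketch of what a few-shot prompt might look like for sentiment classification; the reviews and label names are made up for illustration:

```python
few_shot_prompt = """Classify the sentiment of each review as positive or negative.

Review: The food was amazing and the staff were friendly.
Sentiment: positive

Review: Cold food, rude service, never again.
Sentiment: negative

Review: A delightful experience from start to finish.
Sentiment:"""

# The model is expected to continue the pattern and emit "positive"
print(few_shot_prompt)
```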
- open model
- A model whose weights are publicly available for download and local deployment. Examples: Llama 4, DeepSeek V3, Qwen 3. Contrast with closed model. Licensing varies — “open weights” does not always mean “open source.”
- closed model
- A model accessible only through an API, with weights not publicly available. Examples: GPT-5, Claude, Gemini. Contrast with open model.
- QLoRA
- A fine-tuning technique that combines LoRA with 4-bit quantization of the base model. Enables fine-tuning of large models on consumer hardware by dramatically reducing memory requirements while maintaining most of the quality of full-precision LoRA.
- supervised fine-tuning, SFT
- The process of continuing to train a pretrained model on labeled instruction-response pairs using standard supervised learning. The first step of post-training, before preference alignment. See also: instruction tuning.
- DPO, Direct Preference Optimization
- An alignment technique that directly optimizes a language model on preference data without training a separate reward model. Simpler alternative to RLHF that achieves comparable results with less infrastructure complexity.
- RLAIF, Reinforcement Learning from AI Feedback
- A variant of RLHF where preference labels are generated by another LLM instead of human annotators. Reduces the cost and time of collecting human preference data while maintaining alignment quality.
- domain adaptation
- The process of specializing a general-purpose model for a specific field (e.g., medicine, law, finance) through continued pretraining on domain text, fine-tuning on domain tasks, or both.
- catastrophic forgetting
- A phenomenon where a neural network loses previously learned knowledge when trained on new data. A key risk of full fine-tuning that parameter-efficient methods like LoRA help mitigate.
- system prompt
- Instructions provided to an LLM that define its behavior, personality, or constraints for a conversation. Typically set by the application developer, not the end user.
- structured output
- Constraining an LLM to produce output in a specific format (e.g., JSON matching a schema) rather than free-form text. Enables reliable programmatic consumption of model responses.
Week 9: Prompt Engineering & Structured Outputs
- prompt engineering
- The practice of crafting input text (prompts) to elicit desired behavior from LLMs. Includes techniques like few-shot learning and chain-of-thought.
- chain-of-thought
- A prompt engineering technique that encourages models to show reasoning steps before giving a final answer, improving performance on complex tasks. Modern frontier models have internalized this capability via built-in thinking modes.
- function calling, tool use
- The ability of an LLM to invoke external functions or APIs based on natural language requests. Enables agents to take actions.
Week 10: Retrieval-Augmented Generation
- RAG, Retrieval-Augmented Generation
- A pattern that combines retrieval from external knowledge sources with LLM generation. Grounds responses in specific documents, reducing hallucination. Introduced by Lewis et al. (2020) at NeurIPS.
- vector database
- A database optimized for storing and searching high-dimensional vectors (embeddings). Used in RAG for similarity search. Examples: ChromaDB, FAISS, Pinecone, Weaviate.
- embedding model
- A model that converts text into dense vector representations for semantic similarity search. Used to encode documents and queries in RAG systems. Examples: sentence-transformers, OpenAI text-embedding models.
- chunking
- Splitting documents into smaller segments for indexing in a RAG system. Strategies include fixed-size, sentence-aware, recursive, semantic, and document-aware splitting. Chunk size and strategy significantly affect retrieval quality.
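A sketch of the simplest strategy, fixed-size character chunking with overlap; the chunk_size and overlap defaults are arbitrary, and sentence-aware or semantic splitters are usually preferred in practice:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks, with consecutive chunks overlapping."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final chunk reached the end of the text
    return chunks

# Tiny sizes so the overlap is visible
print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```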
- similarity search
- Finding items (documents, passages) whose vector representations are closest to a query vector. The core retrieval mechanism in dense RAG systems, typically scored with cosine similarity or the dot product.
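A minimal cosine-similarity sketch over toy 3-dimensional vectors; real embedding vectors have hundreds of dimensions, and vector databases use approximate nearest-neighbor indexes rather than brute-force comparison:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.2, 0.8, 0.1]
doc1  = [0.1, 0.9, 0.0]  # points in nearly the same direction as the query
doc2  = [0.9, 0.1, 0.3]  # points in a different direction

# doc1 is retrieved ahead of doc2 because its angle to the query is smaller
print(cosine_similarity(query, doc1) > cosine_similarity(query, doc2))  # True
```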
- BM25
- Best Matching 25 — a classical information retrieval algorithm that ranks documents by term frequency and inverse document frequency, accounting for document length. A refinement of TF-IDF that remains competitive with neural methods for exact keyword matching.
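A sketch of the scoring function using the common Lucene-style idf variant; documents are assumed to be pre-tokenized lists, and the defaults k1=1.5, b=0.75 are conventional choices, not part of the definition:

```python
import math

def bm25_score(query_terms, doc, docs, k1=1.5, b=0.75):
    """Score one tokenized document against a query."""
    avgdl = sum(len(d) for d in docs) / len(docs)  # average document length
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in docs if term in d)     # document frequency
        if df == 0:
            continue
        idf = math.log((len(docs) - df + 0.5) / (df + 0.5) + 1)
        tf = doc.count(term)                       # term frequency in this doc
        # Term frequency saturates via k1; b controls length normalization
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

docs = [d.split() for d in ["the cat sat", "the dog sat on the log", "cats purr"]]
print(bm25_score(["cat"], docs[0], docs))  # > 0: docs[0] contains "cat"
print(bm25_score(["cat"], docs[1], docs))  # 0.0: "cat" does not occur in docs[1]
```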
- dense retrieval
- An information retrieval approach that uses learned embedding models to encode queries and documents as dense vectors, retrieving by nearest-neighbor search. Solves the vocabulary mismatch problem of keyword-based methods like BM25. See: Dense Passage Retrieval (DPR).
- hybrid search
- A retrieval strategy that combines sparse keyword matching (BM25) with dense retrieval (vector similarity) to capture both exact term matches and semantic meaning. Results are typically merged using reciprocal rank fusion (RRF). The production standard for enterprise RAG systems in 2025–26.
- reranking
- A second-stage retrieval step that reorders initial search results using a more sophisticated model (typically a cross-encoder) to improve precision. The “retrieve wide, rerank narrow” pattern — retrieve top-50 candidates, rerank to top-5 — is a common best practice.
- bi-encoder
- A retrieval model architecture where the query and document are encoded separately into vectors, then compared by cosine similarity. Fast at retrieval time (encode once, compare many) but less accurate than a cross-encoder because the model never sees query and document together. Used for first-pass retrieval in RAG systems. Example: sentence-transformers.
- cross-encoder
- A retrieval model architecture where the query and document are fed together through a transformer, producing a single relevance score. More accurate than a bi-encoder because it can attend to fine-grained interactions, but too slow for first-pass retrieval. Used for reranking in the “retrieve wide, rerank narrow” pattern.
- hallucination
- When an LLM generates plausible-sounding but factually incorrect or unsupported information. A key motivation for RAG, which grounds responses in retrieved documents.
- lost-in-the-middle
- A phenomenon where LLMs perform worse on information placed in the middle of long contexts compared to information at the beginning or end. A key limitation of the long-context alternative to RAG.
Week 11: LLM Evaluation & Benchmarks
- benchmark
- A standardized dataset and evaluation protocol for measuring model performance on specific tasks.
- BLEU, Bilingual Evaluation Understudy
- A metric for evaluating machine translation quality by comparing n-gram overlap between generated and reference text.
- ROUGE, Recall-Oriented Understudy for Gisting Evaluation
- A set of metrics for evaluating summarization by measuring n-gram overlap with reference summaries.
- BERTScore
- An evaluation metric that uses BERT embeddings to compute semantic similarity between generated and reference text.
- MMLU, Massive Multitask Language Understanding
- A benchmark testing LLMs on 57 academic subjects, from STEM to humanities.
- LLM-as-judge
- Using an LLM to evaluate the outputs of another LLM, often for subjective qualities like helpfulness or safety.
- red-teaming
- Adversarial testing of AI systems to find vulnerabilities, failure modes, or harmful outputs.
Week 12: Building LLM-Powered Agents
- agent
- An LLM with access to tools, running in a loop. The LLM decides which tools to call, observes the results, and repeats until the task is complete.
- ReAct, Reasoning and Acting
- An agent pattern that interleaves reasoning (thinking about what to do) with acting (using tools), improving task completion.
- memory (agent)
- The ability of an agent to store and retrieve information across interactions. Includes conversation history (short-term) and vector databases (long-term).
- planning
- An agent’s ability to break complex goals into subtasks and determine the sequence of actions needed to achieve them.
- reasoning
- An agent’s ability to think through problems, draw conclusions, and make decisions based on available information.
- dependency injection
- A design pattern where external state (databases, API clients, configuration) is passed into a component at runtime rather than hardcoded. In PydanticAI, implemented via RunContext[DepsType], where a dataclass of dependencies is injected at run() time and accessible in tools and system prompts.
- agent loop
- The iterative cycle an agent executes: receive input → reason → call tools → observe results → repeat until a final answer is produced. In PydanticAI, implemented as a graph of UserPromptNode → ModelRequestNode → CallToolsNode nodes.
Week 13: Agentic Architectures & Orchestration
- multi-agent system
- An architecture where multiple specialized agents collaborate to solve complex problems, each with distinct roles or capabilities.
- orchestration
- Coordinating the execution of multiple agents or workflow steps, managing dependencies and information flow.
- workflow
- A defined sequence of steps or processes that an agent or system follows to complete a task.
- LangGraph
- A framework for building stateful, graph-based agent workflows with support for cycles and conditional branching.
- human-in-the-loop
- A design pattern where human oversight or approval is required at certain points in an automated workflow.
Week 14: Ethics & Responsible AI
- bias (AI)
- Systematic errors in AI systems that lead to unfair outcomes for certain groups. Can originate from training data, model design, or deployment context.
- fairness
- The principle that AI systems should not discriminate against individuals or groups based on protected characteristics.
- PII, Personally Identifiable Information
- Data that can identify an individual, such as names, addresses, or social security numbers. Must be handled carefully in NLP applications.
- memorization
- When an LLM reproduces training data verbatim, potentially leaking private information or copyrighted content.
- alignment
- Ensuring AI systems behave according to human values and intentions. A core challenge in LLM development.
- EU AI Act
- European Union legislation establishing requirements for AI systems based on risk levels, including transparency, human oversight, and prohibited uses.
Quick Reference: Code Snippets
Tokenization with SpaCy
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Natural language processing is fascinating!")
# Access tokens
for token in doc:
    print(token.text, token.pos_, token.lemma_)
Stemming vs Lemmatization
from nltk.stem import PorterStemmer, WordNetLemmatizer
# Requires the WordNet data: nltk.download("wordnet")
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
words = ["running", "ran", "better", "studies"]
for word in words:
    # Note: "better" lemmatizes to "good" only with pos='a' (adjective), not pos='v'
    print(f"{word} -> stem: {stemmer.stem(word)}, lemma: {lemmatizer.lemmatize(word, pos='v')}")
Hugging Face Pipeline
from transformers import pipeline
# Sentiment analysis
classifier = pipeline("sentiment-analysis")
result = classifier("I love this course!")
print(result) # [{'label': 'POSITIVE', 'score': 0.99}]
# Named Entity Recognition
ner = pipeline("ner", grouped_entities=True)
entities = ner("Spencer Lyon teaches at UCF in Orlando.")
print(entities)