
The Evolution and Practice of NLP

University of Central Florida
Arete Capital Partners

CAP-6640: Computational Understanding of Natural Language

Spencer Lyon



A Brief History of NLP

Understanding where NLP came from helps us appreciate where it’s going. The history of the field is a story of big dreams, spectacular failures, and hard-won insights about what actually works.

The 1950s-1960s: The Dream Begins

The field of NLP began with remarkable optimism. In 1950, Alan Turing proposed his famous test: could a machine converse so convincingly that a human couldn’t tell it wasn’t human? The goal was set.

In 1954, the Georgetown-IBM experiment demonstrated automatic translation of 60 Russian sentences into English. The researchers boldly predicted that machine translation would be “solved” within 3-5 years.

They were... wrong.

By 1966, the ALPAC report delivered a devastating assessment: machine translation wasn’t anywhere close to working. Funding dried up. The first “AI Winter” had begun.

But that same year, something interesting happened. Joseph Weizenbaum created ELIZA, a simple chatbot that simulated a Rogerian psychotherapist:

Human: I am feeling sad today.
ELIZA: How long have you been feeling sad today?
Human: My mother doesn't understand me.
ELIZA: Tell me more about your family.

ELIZA used simple pattern matching — no real understanding. Yet people found it compelling, even knowing it was a machine. Weizenbaum was disturbed by how easily humans anthropomorphized his creation.
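To see how little machinery that takes, here is a toy ELIZA-style exchange built from two regular-expression rules. The rules are invented for illustration and are not Weizenbaum's originals:

import re

# Two toy ELIZA-style rules: match a pattern, reflect part of it back as a question
rules = [
    (r"i am feeling (.*)", r"How long have you been feeling \1?"),
    (r"my (mother|father|family)\b.*", "Tell me more about your family."),
]

def eliza_reply(utterance):
    text = utterance.lower().rstrip(".!?")
    for pattern, response in rules:
        match = re.match(pattern, text)
        if match:
            return match.expand(response)
    return "Please go on."

print(eliza_reply("I am feeling sad today."))           # How long have you been feeling sad today?
print(eliza_reply("My mother doesn't understand me."))  # Tell me more about your family.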

What this era established: The goal of machine language understanding, and the sobering realization that it was much harder than anyone expected.

The 1970s-1980s: Knowledge Engineering

If we can’t learn language from data, perhaps we can encode it manually. This era was defined by knowledge engineering — the painstaking process of hand-crafting rules, grammars, and knowledge bases for rule-based systems.

The poster child was SHRDLU (1970), Terry Winograd’s system that could understand and respond to commands about a simulated “blocks world”:

Human: Put the red block on the blue block.
SHRDLU: OK.
Human: What is on the blue block?
SHRDLU: The red block.
Human: Why did you put the red block there?
SHRDLU: Because you asked me to.

Impressive! But notice the constraint: a tiny world with just colored blocks. In this microworld, SHRDLU worked beautifully.

The problem? The real world isn’t a blocks world. Attempts to scale these approaches to unrestricted language failed. The rules became unwieldy, the exceptions multiplied, and the systems grew brittle — breaking on inputs their designers hadn’t anticipated.

By the late 1980s, another AI Winter had set in.

The 1990s-2000s: The Statistical Revolution

The breakthrough came from an unexpected direction: speech recognition research at IBM. Fred Jelinek famously quipped (paraphrased):

“Every time I fire a linguist, the performance of our speech recognizer goes up.”

The insight was radical: instead of encoding linguistic rules, let the data decide. Count how often words appear together. Calculate probabilities. Let statistics do the work.

N-gram models exemplify this approach. Consider the phrase:

“recognize speech” vs “wreck a nice beach”

These sound almost identical when spoken. How does a computer choose? By calculating which sequence of words is more probable in English.
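A bigram language model makes that comparison concrete. The sketch below multiplies word-pair probabilities for each candidate; the probability values are invented for illustration, where a real system would estimate them from a large corpus.

# Toy bigram probabilities P(next word | previous word); values are made up for illustration
bigram_prob = {
    ("<s>", "recognize"): 0.002, ("recognize", "speech"): 0.30,
    ("<s>", "wreck"): 0.0001, ("wreck", "a"): 0.05,
    ("a", "nice"): 0.01, ("nice", "beach"): 0.001,
}

def sequence_prob(words):
    """Probability of a word sequence under the toy bigram model."""
    prob = 1.0
    for prev, word in zip(["<s>"] + words, words):
        prob *= bigram_prob.get((prev, word), 1e-8)  # unseen pairs get a tiny probability
    return prob

print(sequence_prob(["recognize", "speech"]))          # ~6.0e-04
print(sequence_prob(["wreck", "a", "nice", "beach"]))  # ~5.0e-11

The first sequence wins by several orders of magnitude, so the recognizer outputs "recognize speech".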

No linguistic rules needed — just word frequency statistics from millions of documents.

This era also saw the rise of machine learning for NLP: Naive Bayes for text classification, Hidden Markov Models for part-of-speech tagging, statistical parsing algorithms.
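As a taste of that era's toolkit, here is a minimal Naive Bayes text classifier built with scikit-learn. The four training texts and their labels are made up for illustration:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up training set: word counts are the only features
texts = ["win a free prize now", "meeting at 3pm tomorrow",
         "free cash offer inside", "lunch with the team today"]
labels = ["spam", "ham", "spam", "ham"]

# CountVectorizer turns text into word-count vectors; MultinomialNB learns per-class word probabilities
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["claim your free prize"]))  # most likely ['spam']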

The key insight: More data beats clever algorithms. A simple model trained on massive data often outperforms a complex model with limited data.

The 2010s: Deep Learning Changes Everything

The statistical revolution relied on hand-crafted features. Someone had to decide what to count — which word combinations, which patterns. Feature engineering was an art.

Deep learning changed that. Instead of hand-crafted features, neural networks learn their own representations from raw data.

The watershed moment was Word2Vec (2013). Tomas Mikolov and colleagues at Google showed that you could represent words as dense vectors — points in a high-dimensional space — learned purely from text.

The magic? These vectors captured meaning. Words with similar meanings clustered together. And you could do arithmetic:

king - man + woman ≈ queen
paris - france + italy ≈ rome

This wasn’t programmed. It emerged from patterns in billions of words.
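You can try the arithmetic yourself with gensim and a small set of pretrained vectors. The model name below is just one convenient choice; any pretrained word embeddings behave similarly:

import gensim.downloader as api

# Download and load small pretrained GloVe vectors (model choice is illustrative)
vectors = api.load("glove-wiki-gigaword-50")

# king - man + woman: expect "queen" among the nearest neighbors
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
# paris - france + italy: expect "rome" among the nearest neighbors
print(vectors.most_similar(positive=["paris", "italy"], negative=["france"], topn=3))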

Recurrent Neural Networks (RNNs) and their variants (LSTMs, GRUs) added the ability to process sequences — essential for language, which unfolds word by word.

The key insight: Learned representations beat hand-crafted features. Let the model discover what matters.

2017-Present: The Transformer Revolution

In 2017, a paper from Google with a provocative title changed everything: “Attention Is All You Need.”

The Transformer architecture abandoned the sequential processing of RNNs for a mechanism called self-attention that could look at all words in a sentence simultaneously, weighing their relevance to each other.

This enabled unprecedented parallelization and scaling. Models could now be trained on more data, with more parameters, faster than ever before.
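As a rough sketch of the core computation (omitting the learned query/key/value projections, multiple heads, and everything else a real Transformer adds), self-attention is a softmax-weighted mixing of all token vectors at once:

import numpy as np

def self_attention(X):
    """Simplified scaled dot-product self-attention over token vectors X (n_tokens x dim)."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                   # how relevant each token is to every other token
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability for the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ X                              # each output mixes information from all tokens

X = np.random.randn(4, 8)        # 4 "tokens", 8-dimensional embeddings
print(self_attention(X).shape)   # (4, 8): one updated vector per token

Because every token attends to every other token in one matrix multiplication, the whole sentence can be processed in parallel instead of one word at a time.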

What followed was an explosion of ever-larger pretrained language models: BERT, the GPT series, and the chat assistants built on top of them.

Remember the Winograd Schema Challenge from the previous lecture?

"The trophy wouldn't fit in the suitcase because it was too big."

Modern LLMs solve these effortlessly. They’ve seen so much text that common-sense knowledge emerges from patterns.

The key insight: Scale + architecture + data = emergent capabilities. Things that seemed impossible become possible at sufficient scale.


NLP Applications in the Real World

Now that we understand where NLP came from, let’s survey where it’s used today. NLP is everywhere — often invisibly.

Core NLP Tasks

| Task | Description | Examples |
|---|---|---|
| Machine Translation | Convert text between languages | Google Translate, DeepL |
| Sentiment Analysis | Determine emotional tone | Brand monitoring, stock sentiment |
| Named Entity Recognition | Find people, places, organizations | News extraction, legal discovery |
| Question Answering | Answer questions from text or knowledge | Search engines, customer support |
| Summarization | Condense long text | News digests, meeting notes |
| Chatbots/Dialogue | Conversational interaction | Customer service, virtual assistants |
| Text Classification | Assign categories to text | Spam detection, topic tagging |

Each of these is a multi-billion dollar industry. And each builds on the techniques we’ll learn in this course.

Industry Impact

Technology: Search engines are NLP at massive scale. Every Google query is a natural language understanding problem. Recommendations (“customers who liked X also liked Y”) often involve text analysis. Content moderation on social platforms is impossible without NLP.

Healthcare: Clinical notes are unstructured text. Extracting diagnoses, medications, and procedures requires NLP. Drug discovery involves reading millions of research papers. Patient chatbots handle routine questions.

Finance: Sentiment analysis moves markets. Hedge funds analyze news, earnings calls, and social media to predict stock movements. Compliance requires reviewing contracts for specific clauses. Fraud detection examines transaction descriptions.

Legal: Contract analysis extracts key terms and risks. Discovery in litigation involves searching millions of documents. Legal research means finding relevant precedents in case law.


Hands-On: SpaCy Basics

Let’s return to code and explore SpaCy’s core concepts in detail. In the previous lecture, we saw the “magic” — now let’s understand what’s actually happening.

The Doc Object

When you process text with SpaCy, you get back a Doc object — a container for all the linguistic annotations.

import spacy

nlp = spacy.load("en_core_web_sm")

# Processing text creates a Doc object
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion")

A Doc is a sequence of Token objects. Each token has rich annotations:

print("=== TOKENS ===")
for token in doc:
    print(f"{token.text:12} | POS: {token.pos_:6} | DEP: {token.dep_:10} | HEAD: {token.head.text}")
=== TOKENS ===
Apple        | POS: PROPN  | DEP: nsubj      | HEAD: looking
is           | POS: AUX    | DEP: aux        | HEAD: looking
looking      | POS: VERB   | DEP: ROOT       | HEAD: looking
at           | POS: ADP    | DEP: prep       | HEAD: looking
buying       | POS: VERB   | DEP: pcomp      | HEAD: at
a            | POS: DET    | DEP: det        | HEAD: U.K.
U.K.         | POS: PROPN  | DEP: dobj       | HEAD: buying
startup      | POS: NOUN   | DEP: advcl      | HEAD: looking
for          | POS: ADP    | DEP: prep       | HEAD: startup
$            | POS: SYM    | DEP: quantmod   | HEAD: billion
1            | POS: NUM    | DEP: compound   | HEAD: billion
billion      | POS: NUM    | DEP: pobj       | HEAD: for

Let’s break down these attributes:

- token.text: the token exactly as it appears in the input
- token.pos_: the coarse part-of-speech tag (PROPN, VERB, DET, and so on)
- token.dep_: the token's syntactic dependency relation to its head (nsubj, dobj, prep, ...)
- token.head: the token this one attaches to in the dependency parse

Named Entities

Named Entity Recognition identifies real-world objects: people, organizations, locations, dates, money, etc.

print("=== ENTITIES ===")
for ent in doc.ents:
    print(f"{ent.text:20} | {ent.label_:10} | {spacy.explain(ent.label_)}")
=== ENTITIES ===
Apple                | ORG        | Companies, agencies, institutions, etc.
U.K.                 | GPE        | Countries, cities, states
$1 billion           | MONEY      | Monetary values, including unit

The spacy.explain() function gives you human-readable descriptions of the labels.

Sentence Segmentation

SpaCy automatically splits text into sentences:

text = "NLP is fascinating. It powers many applications we use daily. Let's explore further."
doc = nlp(text)

print("=== SENTENCES ===")
for sent in doc.sents:
    print(f"- {sent.text}")
=== SENTENCES ===
- NLP is fascinating.
- It powers many applications we use daily.
- Let's explore further.

Exploring Token Attributes

Tokens have many more useful attributes. Let’s explore a few:

doc = nlp("The quick brown foxes are jumping over the lazy dogs.")

print("=== EXTENDED TOKEN INFO ===")
for token in doc:
    print(f"{token.text:10} | lemma: {token.lemma_:10} | is_stop: {token.is_stop} | is_alpha: {token.is_alpha}")
=== EXTENDED TOKEN INFO ===
The        | lemma: the        | is_stop: True | is_alpha: True
quick      | lemma: quick      | is_stop: False | is_alpha: True
brown      | lemma: brown      | is_stop: False | is_alpha: True
foxes      | lemma: fox        | is_stop: False | is_alpha: True
are        | lemma: be         | is_stop: True | is_alpha: True
jumping    | lemma: jump       | is_stop: False | is_alpha: True
over       | lemma: over       | is_stop: True | is_alpha: True
the        | lemma: the        | is_stop: True | is_alpha: True
lazy       | lemma: lazy       | is_stop: False | is_alpha: True
dogs       | lemma: dog        | is_stop: False | is_alpha: True
.          | lemma: .          | is_stop: False | is_alpha: False

These attributes become crucial for text preprocessing, which we’ll cover in depth next week.
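As a preview, here is one common pattern built from exactly the attributes above: keep only alphabetic, non-stop-word tokens and reduce each to its lowercase lemma.

doc = nlp("The quick brown foxes are jumping over the lazy dogs.")

# Keep alphabetic, non-stop-word tokens and lowercase their lemmas
cleaned = [token.lemma_.lower() for token in doc if token.is_alpha and not token.is_stop]
print(cleaned)  # ['quick', 'brown', 'fox', 'jump', 'lazy', 'dog']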


Wrap-Up

What We Covered Today

  1. The history of NLP — from rule-based dreams to the transformer revolution

  2. Why each paradigm shift mattered — data beats rules, learned features beat hand-crafted ones, scale unlocks emergence

  3. NLP applications — everywhere in tech, healthcare, finance, legal

  4. SpaCy fundamentals — tokens, entities, sentences, and their attributes

What’s Next

In the next session, you’ll apply what you’ve learned in a hands-on lab. You’ll analyze text from different domains, discover what SpaCy gets right and wrong, and start building intuition for the challenges of real-world NLP.