The Evolution and Practice of NLP
CAP-6640: Computational Understanding of Natural Language
Spencer Lyon
Prerequisites
L01.01: What is NLP?
Basic Python and SpaCy setup complete
Outcomes
Trace the evolution of NLP from rule-based systems to modern LLMs
Understand why each paradigm shift occurred (not just what changed)
Identify major NLP application areas and their industry impact
Use SpaCy to perform tokenization, POS tagging, NER, and dependency parsing
Interpret SpaCy’s linguistic annotations on real text
References
Jurafsky & Martin, Speech and Language Processing (3rd ed. draft), Chapters 1 and 2
A Brief History of NLP
Understanding where NLP came from helps us appreciate where it’s going. The history of the field is a story of big dreams, spectacular failures, and hard-won insights about what actually works.
The 1950s-1960s: The Dream Begins
The field of NLP began with remarkable optimism. In 1950, Alan Turing proposed his famous test: could a machine converse so convincingly that a human couldn’t tell it wasn’t human? The goal was set.
In 1954, the Georgetown-IBM experiment demonstrated automatic translation of 60 Russian sentences into English. The researchers boldly predicted that machine translation would be “solved” within 3-5 years.
They were... wrong.
By 1966, the ALPAC report delivered a devastating assessment: machine translation wasn’t anywhere close to working. Funding dried up. The first “AI Winter” had begun.
But that same year, something interesting happened. Joseph Weizenbaum created ELIZA, a simple chatbot that simulated a Rogerian psychotherapist:
Human: I am feeling sad today.
ELIZA: How long have you been feeling sad today?
Human: My mother doesn't understand me.
ELIZA: Tell me more about your family.
ELIZA used simple pattern matching — no real understanding. Yet people found it compelling, even knowing it was a machine. Weizenbaum was disturbed by how easily humans anthropomorphized his creation.
What this era established: The goal of machine language understanding, and the sobering realization that it was much harder than anyone expected.
The 1970s-1980s: Knowledge Engineering
If we can’t learn language from data, perhaps we can encode it manually. This era was defined by knowledge engineering — the painstaking process of hand-crafting rules, grammars, and knowledge bases for rule-based systems.
The poster child was SHRDLU (1970), Terry Winograd’s system that could understand and respond to commands about a simulated “blocks world”:
Human: Put the red block on the blue block.
SHRDLU: OK.
Human: What is on the blue block?
SHRDLU: The red block.
Human: Why did you put the red block there?
SHRDLU: Because you asked me to.
Impressive! But notice the constraint: a tiny world with just colored blocks. In this microworld, SHRDLU worked beautifully.
The problem? The real world isn’t a blocks world. Attempts to scale these approaches to unrestricted language failed. The rules became unwieldy, the exceptions multiplied, and the systems grew brittle — breaking on inputs their designers hadn’t anticipated.
By the late 1980s, another AI Winter had set in.
The 1990s-2000s: The Statistical Revolution
The breakthrough came from an unexpected direction: speech recognition research at IBM. Fred Jelinek famously quipped (paraphrased):
“Every time I fire a linguist, the performance of our speech recognizer goes up.”
The insight was radical: instead of encoding linguistic rules, let the data decide. Count how often words appear together. Calculate probabilities. Let statistics do the work.
N-gram models exemplify this approach. Consider the phrase:
“recognize speech” vs “wreck a nice beach”
These sound almost identical when spoken. How does a computer choose? By calculating which sequence of words is more probable in English:
P(“recognize speech”) = pretty high (common phrase)
P(“wreck a nice beach”) = pretty low (weird phrase)
No linguistic rules needed — just word frequency statistics from millions of documents.
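To make that concrete, here is a minimal sketch of a bigram model over a tiny invented corpus (the sentences and counts are made up for illustration; real systems estimate these probabilities from millions of documents and smooth the counts so unseen word pairs do not get probability zero):

```python
# Minimal bigram-model sketch: score a phrase by multiplying P(word | previous word),
# where each probability comes from raw co-occurrence counts.
from collections import Counter

# Tiny invented corpus (a real model would use millions of documents)
corpus = (
    "we recognize speech on phones "
    "systems recognize speech well "
    "people recognize speech easily "
    "a nice beach is fun "
    "we wreck nothing"
).split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def phrase_probability(phrase: str) -> float:
    """Approximate P(phrase) as the product of bigram probabilities."""
    words = phrase.split()
    prob = 1.0
    for prev, curr in zip(words, words[1:]):
        prob *= bigram_counts[(prev, curr)] / unigram_counts[prev] if unigram_counts[prev] else 0.0
    return prob

print(phrase_probability("recognize speech"))     # high: this pair is frequent in the toy corpus
print(phrase_probability("wreck a nice beach"))   # 0.0 here, since "wreck a" never occurs
```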
This era also saw the rise of machine learning for NLP: Naive Bayes for text classification, Hidden Markov Models for part-of-speech tagging, statistical parsing algorithms.
The key insight: More data beats clever algorithms. A simple model trained on massive data often outperforms a complex model with limited data.
The 2010s: Deep Learning Changes Everything
The statistical revolution relied on hand-crafted features. Someone had to decide what to count — which word combinations, which patterns. Feature engineering was an art.
Deep learning changed that. Instead of hand-crafted features, neural networks learn their own representations from raw data.
The watershed moment was Word2Vec (2013). Tomas Mikolov and colleagues at Google showed that you could represent words as dense vectors — points in a high-dimensional space — learned purely from text.
The magic? These vectors captured meaning. Words with similar meanings clustered together. And you could do arithmetic:
king - man + woman ≈ queen
paris - france + italy ≈ rome
This wasn’t programmed. It emerged from patterns in billions of words.
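You can reproduce these analogies with pretrained vectors, for example through gensim's downloader. This is a sketch: the first call downloads the vectors, and the exact nearest neighbors depend on which embedding set you load (small GloVe vectors are used here; the original word2vec-google-news-300 vectors work the same way but are a much larger download).

```python
# Sketch: word-vector arithmetic with pretrained embeddings loaded through gensim.
# Assumes gensim is installed; the vectors are downloaded on first use.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")  # small pretrained GloVe vectors

# king - man + woman: the nearest remaining word is typically "queen"
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

# paris - france + italy: typically "rome"
print(vectors.most_similar(positive=["paris", "italy"], negative=["france"], topn=1))
```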
Recurrent Neural Networks (RNNs) and their variants (LSTMs, GRUs) added the ability to process sequences — essential for language, which unfolds word by word.
The key insight: Learned representations beat hand-crafted features. Let the model discover what matters.
2017-Present: The Transformer Revolution
In 2017, a paper from Google with a provocative title changed everything: “Attention Is All You Need.”
The Transformer architecture abandoned the sequential processing of RNNs for a mechanism called self-attention that could look at all words in a sentence simultaneously, weighing their relevance to each other.
This enabled unprecedented parallelization and scaling. Models could now be trained on more data, with more parameters, faster than ever before.
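The core computation behind self-attention is compact enough to sketch. Below is a toy, single-head scaled dot-product attention in NumPy, with random matrices standing in for learned weights, just to show that every position attends to every other position in a single matrix multiplication:

```python
# Toy single-head scaled dot-product self-attention (random data, illustrative only).
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                         # 4 "words", 8-dimensional embeddings
x = rng.normal(size=(seq_len, d_model))         # stand-in for word embeddings

# Learned projections in a real model; random matrices here just to show the shapes
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v

scores = Q @ K.T / np.sqrt(d_model)             # all-pairs relevance scores
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
output = weights @ V                            # each position mixes info from all positions

print(weights.round(2))   # row i shows how much word i attends to each word j
```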
What followed was an explosion:
BERT (2018): Bidirectional understanding of text. State-of-the-art on virtually every NLP benchmark.
GPT-2 (2019): Generated such convincing text that OpenAI initially withheld it.
GPT-3 (2020): 175 billion parameters. Few-shot learning. The beginning of “prompt engineering.”
ChatGPT (2022): NLP goes mainstream. Your grandmother knows what a chatbot is now.
Remember the Winograd Schema Challenge from the previous lecture?
"The trophy wouldn't fit in the suitcase because it was too big."Modern LLMs solve these effortlessly. They’ve seen so much text that common-sense knowledge emerges from patterns.
The key insight: Scale + architecture + data = emergent capabilities. Things that seemed impossible become possible at sufficient scale.
NLP Applications in the Real World
Now that we understand where NLP came from, let’s survey where it’s used today. NLP is everywhere — often invisibly.
Core NLP Tasks
| Task | Description | Examples |
|---|---|---|
| Machine Translation | Convert text between languages | Google Translate, DeepL |
| Sentiment Analysis | Determine emotional tone | Brand monitoring, stock sentiment |
| Named Entity Recognition | Find people, places, organizations | News extraction, legal discovery |
| Question Answering | Answer questions from text or knowledge | Search engines, customer support |
| Summarization | Condense long text | News digests, meeting notes |
| Chatbots/Dialogue | Conversational interaction | Customer service, virtual assistants |
| Text Classification | Assign categories to text | Spam detection, topic tagging |
Each of these is a multi-billion dollar industry. And each builds on the techniques we’ll learn in this course.
Industry Impact
Technology: Search engines are NLP at massive scale. Every Google query is a natural language understanding problem. Recommendations (“customers who liked X also liked Y”) often involve text analysis. Content moderation on social platforms is impossible without NLP.
Healthcare: Clinical notes are unstructured text. Extracting diagnoses, medications, and procedures requires NLP. Drug discovery involves reading millions of research papers. Patient chatbots handle routine questions.
Finance: Sentiment analysis moves markets. Hedge funds analyze news, earnings calls, and social media to predict stock movements. Compliance requires reviewing contracts for specific clauses. Fraud detection examines transaction descriptions.
Legal: Contract analysis extracts key terms and risks. Discovery in litigation involves searching millions of documents. Legal research means finding relevant precedents in case law.
Hands-On: SpaCy Basics
Let’s return to code and explore SpaCy’s core concepts in detail. In the previous lecture, we saw the “magic” — now let’s understand what’s actually happening.
The Doc Object
When you process text with SpaCy, you get back a Doc object — a container for all the linguistic annotations.
import spacy
nlp = spacy.load("en_core_web_sm")
# Processing text creates a Doc object
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion")A Doc is a sequence of Token objects. Each token has rich annotations:
print("=== TOKENS ===")
for token in doc:
print(f"{token.text:12} | POS: {token.pos_:6} | DEP: {token.dep_:10} | HEAD: {token.head.text}")=== TOKENS ===
Apple | POS: PROPN | DEP: nsubj | HEAD: looking
is | POS: AUX | DEP: aux | HEAD: looking
looking | POS: VERB | DEP: ROOT | HEAD: looking
at | POS: ADP | DEP: prep | HEAD: looking
buying | POS: VERB | DEP: pcomp | HEAD: at
a | POS: DET | DEP: det | HEAD: U.K.
U.K. | POS: PROPN | DEP: dobj | HEAD: buying
startup | POS: NOUN | DEP: advcl | HEAD: looking
for | POS: ADP | DEP: prep | HEAD: startup
$ | POS: SYM | DEP: quantmod | HEAD: billion
1 | POS: NUM | DEP: compound | HEAD: billion
billion | POS: NUM | DEP: pobj | HEAD: for
Let’s break down these attributes:
text: The actual word
pos_: Part-of-speech tag (NOUN, VERB, ADJ, etc.)
dep_: Dependency label (how this word relates to its head)
head: The word this token is syntactically attached to
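Because every token records its head, you can walk the parse as a tree. A quick sketch using the same doc from above:

```python
# Sketch: navigating the dependency tree via the head/children links on the same doc.
root = [token for token in doc if token.dep_ == "ROOT"][0]
print("Root:", root.text)

# Tokens attached directly to the root
print("Children of root:", [child.text for child in root.children])

# Noun chunks are flat noun phrases spaCy derives from the parse
print("Noun chunks:", [chunk.text for chunk in doc.noun_chunks])
```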
Named Entities
Named Entity Recognition identifies real-world objects: people, organizations, locations, dates, money, etc.
print("=== ENTITIES ===")
for ent in doc.ents:
print(f"{ent.text:20} | {ent.label_:10} | {spacy.explain(ent.label_)}")=== ENTITIES ===
Apple | ORG | Companies, agencies, institutions, etc.
U.K. | GPE | Countries, cities, states
$1 billion | MONEY | Monetary values, including unit
The spacy.explain() function gives you human-readable descriptions of the labels.
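It works for part-of-speech and dependency labels too:

```python
# spacy.explain also covers POS tags and dependency labels
print(spacy.explain("PROPN"))   # proper noun
print(spacy.explain("nsubj"))   # nominal subject
```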
Sentence Segmentation
SpaCy automatically splits text into sentences:
text = "NLP is fascinating. It powers many applications we use daily. Let's explore further."
doc = nlp(text)
print("=== SENTENCES ===")
for sent in doc.sents:
print(f"- {sent.text}")=== SENTENCES ===
- NLP is fascinating.
- It powers many applications we use daily.
- Let's explore further.
Exploring Token Attributes
Tokens have many more useful attributes. Let’s explore a few:
doc = nlp("The quick brown foxes are jumping over the lazy dogs.")
print("=== EXTENDED TOKEN INFO ===")
for token in doc:
print(f"{token.text:10} | lemma: {token.lemma_:10} | is_stop: {token.is_stop} | is_alpha: {token.is_alpha}")=== EXTENDED TOKEN INFO ===
The | lemma: the | is_stop: True | is_alpha: True
quick | lemma: quick | is_stop: False | is_alpha: True
brown | lemma: brown | is_stop: False | is_alpha: True
foxes | lemma: fox | is_stop: False | is_alpha: True
are | lemma: be | is_stop: True | is_alpha: True
jumping | lemma: jump | is_stop: False | is_alpha: True
over | lemma: over | is_stop: True | is_alpha: True
the | lemma: the | is_stop: True | is_alpha: True
lazy | lemma: lazy | is_stop: False | is_alpha: True
dogs | lemma: dog | is_stop: False | is_alpha: True
. | lemma: . | is_stop: False | is_alpha: False
lemma_: The base form of the word (“foxes” → “fox”, “jumping” → “jump”)
is_stop: Is this a common stop word (the, a, is, etc.)?
is_alpha: Does the token consist only of alphabetic characters?
These attributes become crucial for text preprocessing, which we’ll cover in depth next week.
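As a preview, here is a sketch of one common preprocessing pass that combines these attributes: keep the lowercase lemmas of content words and drop stop words and punctuation.

```python
# Sketch: keep lowercase lemmas of content words; drop stop words and punctuation.
doc = nlp("The quick brown foxes are jumping over the lazy dogs.")

content_lemmas = [
    token.lemma_.lower()
    for token in doc
    if token.is_alpha and not token.is_stop
]
print(content_lemmas)   # ['quick', 'brown', 'fox', 'jump', 'lazy', 'dog']
```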
Wrap-Up
What We Covered Today
The history of NLP — from rule-based dreams to the transformer revolution
Why each paradigm shift mattered — data beats rules, learned features beat hand-crafted ones, scale unlocks emergence
NLP applications — everywhere in tech, healthcare, finance, legal
SpaCy fundamentals — tokens, entities, sentences, and their attributes
What’s Next
In the next session, you’ll apply what you’ve learned in a hands-on lab. You’ll analyze text from different domains, discover what SpaCy gets right and wrong, and start building intuition for the challenges of real-world NLP.