This article explains how to use the WordNet lexical database through NLTK (the Natural Language Toolkit).

We cover the basics of WordNet: finding synonyms, antonyms, hypernyms, hyponyms, and holonyms of a word, and measuring the similarity between two words.

WordNet is a large network of English words in which terms are grouped by meaning and linked through linguistic relationships such as synonymy, hypernymy, and hyponymy.

Nouns, verbs, adjectives, and adverbs are organized into groups of cognitive synonyms called synsets, with each synset representing a unique concept. These synsets are connected through various conceptual-semantic and lexical relationships.

WordNet ships as one of the NLTK corpora. If it is not yet installed locally, download it once with nltk.download('wordnet').

Loading WordNet Corpus

Here, we look up all the synsets for a specific word.

from nltk.corpus import wordnet as wn

print (wn.synsets('good'))
'''
Output:

[Synset('good.n.01'), Synset('good.n.02'), Synset('good.n.03'), Synset('commodity.n.01'), Synset('good.a.01'), Synset('full.s.06'), Synset('good.a.03'), Synset('estimable.s.02'), Synset('beneficial.s.01'), Synset('good.s.06'), Synset('good.s.07'), Synset('adept.s.01'), Synset('good.s.09'), Synset('dear.s.02'), Synset('dependable.s.04'), Synset('good.s.12'), Synset('good.s.13'), Synset('effective.s.04'), Synset('good.s.15'), Synset('good.s.16'), Synset('good.s.17'), Synset('good.s.18'), Synset('good.s.19'), Synset('good.s.20'), Synset('good.s.21'), Synset('well.r.01'), Synset('thoroughly.r.02')]
'''
    

The synsets function returns every synset (word sense) that contains the word "good". A synset is a set of synonyms that share a single meaning. Each synset is identified by a three-part name of the form word.pos.nn: a representative word, a part-of-speech tag, and a two-digit sense number.
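To make the word.pos.nn naming convention concrete, here is a minimal sketch that splits a synset name into its three parts. The helper parse_synset_name is hypothetical (it is not part of NLTK) and works on plain strings only.

```python
def parse_synset_name(name):
    """Split a 'word.pos.nn' synset name into (word, pos, sense_number)."""
    # Split from the right so a word that itself contains dots stays intact.
    word, pos, sense = name.rsplit(".", 2)
    return word, pos, int(sense)


print(parse_synset_name("good.n.01"))   # ('good', 'n', 1)
print(parse_synset_name("full.s.06"))   # ('full', 's', 6)
```

The second call shows why 'full.s.06' can appear in the results for "good": the name uses one representative word of the synset, not necessarily the word that was searched for.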

The synsets function also accepts a second parameter, which specifies the part of speech (POS) tag for the word.

The part-of-speech (POS) tags are: 'a' for adjectives (ADJ), 's' for adjective satellites (ADJ_SAT), 'r' for adverbs (ADV), 'n' for nouns (NOUN), and 'v' for verbs (VERB). The names in parentheses are the matching constants on the wordnet module, so pos=wn.NOUN is equivalent to pos='n'.

# print (wn.synsets('good', pos=wn.NOUN))
print (wn.synsets('good', pos='n'))
'''
Output:
        
[Synset('good.n.01'), Synset('good.n.02'), Synset('good.n.03'), Synset('commodity.n.01')]
'''
        
my_word = wn.synset('good.n.01') 
print (my_word.definition()) # Output: benefit
print (my_word.examples())
'''
Output:
        
['for your own good', "what's the good of worrying?"]
'''
        
my_word = wn.synset('good.n.02') 
print (my_word.definition()) # Output: moral excellence or admirableness
print (my_word.examples())
'''
Output:
        
['there is much good to be found in people']
'''
        
my_word = wn.synset('good.n.03') 
print (my_word.definition()) # Output: that which is pleasing or valuable or useful
print (my_word.examples())
'''
Output:
        
['weigh the good against the bad', 'among the highest goods of all are happiness and self-realization']
'''
        
my_word = wn.synset('good.a.01') 
print (my_word.definition()) # Output: having desirable or positive qualities especially those suitable for a thing specified
print (my_word.examples())
'''
Output:
        
['a good report card', 'when she was good she was very very good', 'a good knife is one good for cutting', 'this stump will make a good picnic table', 'a good check', 'a good joke', 'a good exterior paint', 'a good secretary', 'a good dress for the office']
'''
        
my_word = wn.synset('good.a.03') 
print (my_word.definition()) # Output: morally admirable
print (my_word.examples()) # Output: []
    

SYNONYMS & ANTONYMS

We can use the lemmas() method of a synset to obtain the synonyms that belong to that synset.

Synonyms

my_word = wn.synset('good.n.01') 
print (my_word.lemmas()) # Output: [Lemma('good.n.01.good')]
print (my_word.lemmas()[0].name()) # Output: good
print (my_word.lemmas()[0].antonyms()) # Output: []

my_word = wn.synset('good.n.02') 
print (my_word.lemmas()) # Output: [Lemma('good.n.02.good'), Lemma('good.n.02.goodness')]
print (my_word.lemmas()[0].name()) # Output: good
print (my_word.lemmas()[1].name()) # Output: goodness
    

Antonyms

First, we get the lemmas (synonyms) of a synset with lemmas(). Antonymy is defined between lemmas rather than between synsets, so we then call antonyms() on each lemma.

my_word = wn.synset('good.n.02')
print (my_word.lemmas()) # Output: [Lemma('good.n.02.good'), Lemma('good.n.02.goodness')]

print (my_word.lemmas()[0].name()) # Output: good
print (my_word.lemmas()[0].antonyms()) # Output: [Lemma('evil.n.03.evil')]
print (my_word.lemmas()[0].antonyms()[0].name()) # Output: evil

print (my_word.lemmas()[1].name()) # Output: goodness
print (my_word.lemmas()[1].antonyms()) # Output: [Lemma('evil.n.03.evilness')]
print (my_word.lemmas()[1].antonyms()[0].name()) # Output: evilness
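The per-lemma steps above can be wrapped in a small helper that gathers synonyms and antonyms across several synsets at once. This is a sketch under the assumption that each object passed in behaves like an NLTK synset (it has lemmas(), and each lemma has name() and antonyms()); the function name synonyms_and_antonyms is hypothetical.

```python
def synonyms_and_antonyms(synsets):
    """Collect lemma names and antonym names from synset-like objects."""
    synonyms, antonyms = set(), set()
    for synset in synsets:
        for lemma in synset.lemmas():
            # Every lemma of a synset is a synonym for that sense.
            synonyms.add(lemma.name())
            # Antonyms are attached to individual lemmas, not to the synset.
            for antonym in lemma.antonyms():
                antonyms.add(antonym.name())
    return sorted(synonyms), sorted(antonyms)
```

With WordNet loaded as wn, a call such as synonyms_and_antonyms(wn.synsets('good', pos='n')) would gather the names across all noun senses of "good".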

SIMILARITY BETWEEN TWO WORDS

NLTK offers various methods to measure similarity. These include:

  • Path Similarity: Provides a similarity score between two word senses by calculating the shortest path that links them within the is-a (hypernym/hyponym) hierarchy.
  • Leacock-Chodorow (LCH) Similarity: Returns a similarity score for two word senses based on the shortest path between them (similar to Path Similarity) and the maximum depth of the taxonomy where these senses exist.
  • Wu-Palmer (WUP) Similarity: Offers a similarity score by considering the depth of both word senses in the taxonomy and the depth of their closest shared ancestor (Least Common Subsumer).
  • Resnik (RES) Similarity: Scores two word senses by the Information Content (IC) of their closest shared ancestor (Least Common Subsumer).
  • Jiang-Conrath (JCN) Similarity: Combines the IC of the Least Common Subsumer (lcs) with the IC of the two input senses as 1 / (IC(s1) + IC(s2) - 2 * IC(lcs)).
  • Lin Similarity: Uses the same three IC values as a ratio, 2 * IC(lcs) / (IC(s1) + IC(s2)). Like RES and JCN, it takes an information-content dictionary as an extra argument.
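As a rough sketch of the two path-based measures: path similarity is commonly given as 1 / (d + 1) and LCH as -log((d + 1) / (2 * D)), where d is the number of edges on the shortest is-a path and D is the maximum depth of the taxonomy. The helper names, the edge counts (4 for dog/cat, 2 for ship/boat), and the depth value 19 are assumptions, picked because they reproduce the path and LCH scores printed for those pairs further down in this section.

```python
import math


def path_score(shortest_path_edges):
    # 1 / (number of edges on the shortest is-a path + 1)
    return 1.0 / (shortest_path_edges + 1)


def lch_score(shortest_path_edges, max_depth):
    # -log((edges + 1) / (2 * maximum taxonomy depth))
    return -math.log((shortest_path_edges + 1) / (2.0 * max_depth))


ASSUMED_NOUN_DEPTH = 19  # assumption inferred from the article's outputs

print(path_score(4))                               # 0.2 (dog vs. cat)
print(round(lch_score(4, ASSUMED_NOUN_DEPTH), 3))  # 2.028 (dog vs. cat)
print(round(lch_score(2, ASSUMED_NOUN_DEPTH), 3))  # 2.539 (ship vs. boat)
```

Two identical senses have d = 0, so their path score is 1.0, the maximum.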

The examples below apply WUP, path, and LCH similarity to a few word pairs.

print (wn.synsets('bad'))
'''
Output:

[Synset('bad.n.01'), Synset('bad.a.01'), Synset('bad.s.02'), Synset('bad.s.03'), Synset('bad.s.04'), Synset('regretful.a.01'), Synset('bad.s.06'), Synset('bad.s.07'), Synset('bad.s.08'), Synset('bad.s.09'), Synset('bad.s.10'), Synset('bad.s.11'), Synset('bad.s.12'), Synset('bad.s.13'), Synset('bad.s.14'), Synset('badly.r.05'), Synset('badly.r.06')]
'''

word_1 = wn.synset('good.n.01')
word_2 = wn.synset('bad.n.01')
print (word_1.wup_similarity(word_2)) # Output: 0.666666666667
print (word_2.wup_similarity(word_1)) # Output: 0.666666666667

word_1 = wn.synset('good.n.01')
word_2 = wn.synset('evil.n.01')
print (word_1.wup_similarity(word_2)) # Output: 0.25

word_1 = wn.synset('bad.n.01')
word_2 = wn.synset('evil.n.01')
print (word_1.wup_similarity(word_2)) # Output: 0.285714285714

print (wn.synsets('eat'))
'''
Output:

[Synset('eat.v.01'), Synset('eat.v.02'), Synset('feed.v.06'), Synset('eat.v.04'), Synset('consume.v.05'), Synset('corrode.v.01')]
'''

print (wn.synsets('sleep'))
'''
Output:

[Synset('sleep.n.01'), Synset('sleep.n.02'), Synset('sleep.n.03'), Synset('rest.n.05'), Synset('sleep.v.01'), Synset('sleep.v.02')]
'''

word_1 = wn.synset('eat.v.01')
word_2 = wn.synset('sleep.v.01')
print (word_1.wup_similarity(word_2)) # Output: 0.25


word_1 = wn.synset('dog.n.01')
word_2 = wn.synset('cat.n.01')
print (word_1.wup_similarity(word_2)) # Output: 0.857142857143
print (word_1.path_similarity(word_2)) # Output: 0.2
print (word_1.lch_similarity(word_2)) # Output: 2.02814824729

word_1 = wn.synset('ship.n.01')
word_2 = wn.synset('boat.n.01')
print (word_1.wup_similarity(word_2)) # Output: 0.909090909091
print (word_1.path_similarity(word_2)) # Output: 0.333333333333
print (word_1.lch_similarity(word_2)) # Output: 2.53897387106
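To show where a Wu-Palmer score comes from, the following sketch computes 2 * depth(lcs) / (depth(a) + depth(b)) on a tiny hand-written is-a tree. The tree and all function names are illustrative assumptions, not WordNet data or NLTK API.

```python
# Toy is-a hierarchy (child -> parent); an illustrative assumption.
PARENT = {
    "animal": "entity",
    "carnivore": "animal",
    "canine": "carnivore",
    "feline": "carnivore",
    "dog": "canine",
    "cat": "feline",
}


def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(node):
    # The root has depth 1, so depth is the node count on the path to the root.
    return len(path_to_root(node))


def least_common_subsumer(a, b):
    ancestors_of_a = set(path_to_root(a))
    # Walking upward from b, the first node shared with a is the deepest one.
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    return None


def wup(a, b):
    lcs = least_common_subsumer(a, b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


print(least_common_subsumer("dog", "cat"))  # carnivore
print(wup("dog", "cat"))                    # 0.6
```

The toy score of 0.6 is lower than the 0.857 that WordNet reports for dog and cat. The formula rewards a deep shared ancestor, and the real taxonomy presumably places that ancestor much deeper than this five-level tree does.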
    

HYPERNYMS, HYPONYMS, & HOLONYMS

All synsets are linked to one another through different semantic relationships. Some examples of these relationships are:

  • Hypernyms: Y is a hypernym of X if every X is a (kind of) Y.
  • Hyponyms: Y is a hyponym of X if every Y is a (kind of) X.
  • Holonyms: Y is a holonym of X if X is a part of Y.
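The first two definitions are inverses of each other: if Y is a hypernym of X, then X is a hyponym of Y. A minimal sketch of that idea, using a small hand-copied subset of the dog relationships shown in this article (the dictionary and the invert helper are illustrative, not NLTK API):

```python
from collections import defaultdict

# X -> list of Y, where every X is a kind of Y (hypernym links).
HYPERNYMS = {
    "dog": ["canine", "domestic_animal"],
    "basenji": ["dog"],
    "corgi": ["dog"],
}


def invert(relation):
    """Turn X -> [Y, ...] links into Y -> [X, ...] links."""
    inverse = defaultdict(list)
    for source, targets in relation.items():
        for target in targets:
            inverse[target].append(source)
    return dict(inverse)


HYPONYMS = invert(HYPERNYMS)
print(HYPONYMS["dog"])     # ['basenji', 'corgi']
print(HYPONYMS["canine"])  # ['dog']
```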

In the example code below, we can see the following:

  1. Canine is a broader category that includes dogs.
    According to the definition of a hypernym, Canine (Y) is a hypernym of Dog (X) because every Dog (X) is a type of Canine (Y).
  2. Basenji is a breed of hunting dog.
    Following the definition of a hyponym, Basenji (Y) is a hyponym of Dog (X) because every Basenji (Y) is a type of Dog (X).
  3. Canis is a genus in the Canidae family that includes species such as wolves, dogs, and coyotes. These species are known for their moderate to large size, strong skulls and teeth, long legs, and relatively short ears and tails.
    Based on the definition of a holonym, Canis (Y) is a member holonym of Dog (X) because Dog (X) is a member of the Canis (Y) genus.

dog = wn.synset('dog.n.01')

print (dog.hypernyms())
'''
Output:
        
[Synset('canine.n.02'), Synset('domestic_animal.n.01')]
'''
        
print (dog.hyponyms())
'''
Output:
        
[Synset('basenji.n.01'), Synset('corgi.n.01'), Synset('cur.n.01'), Synset('dalmatian.n.02'), Synset('great_pyrenees.n.01'), Synset('griffon.n.02'), Synset('hunting_dog.n.01'), Synset('lapdog.n.01'), Synset('leonberg.n.01'), Synset('mexican_hairless.n.01'), Synset('newfoundland.n.01'), Synset('pooch.n.01'), Synset('poodle.n.01'), Synset('pug.n.01'), Synset('puppy.n.01'), Synset('spitz.n.01'), Synset('toy_dog.n.01'), Synset('working_dog.n.01')]
'''
        
print (dog.member_holonyms())
'''
Output:
        
[Synset('canis.n.01'), Synset('pack.n.06')]
'''
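Because hypernyms() returns synsets that have hypernyms of their own, the relationship can be followed upward step by step. The sketch below walks one such chain; it assumes only that each object has a hypernyms() method returning a list, as NLTK synsets do, and the function name hypernym_chain is hypothetical.

```python
def hypernym_chain(synset):
    """Follow the first hypernym repeatedly until a synset has none."""
    chain = [synset]
    while chain[-1].hypernyms():
        # A synset may have several hypernyms; this sketch takes the first.
        chain.append(chain[-1].hypernyms()[0])
    return chain
```

With WordNet loaded, [s.name() for s in hypernym_chain(wn.synset('dog.n.01'))] lists one route from dog up to the root of the noun hierarchy. NLTK synsets also provide hypernym_paths(), which returns every such path rather than just one.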