What Is Lemmatization?

People with a linguistics background may be familiar with the terms “lemmatization” and “lemmatize”, but for the rest of us they are not words we use in everyday life.

What is Lemmatization? 

Lemmatization is closely related to stemming. In linguistics, it is the process of grouping together the different inflected forms of a word so they can be analyzed as a single item. For example, “computers” is an inflected form of “computer”, just as “dogs” is an inflected form of “dog”.
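To make "analyzed as a single item" concrete, here is a minimal sketch in Python. The lookup table and the function names (`LEMMA_TABLE`, `lemmatize`, `count_by_lemma`) are made up for illustration; a real lemmatizer relies on a full dictionary and morphological rules rather than a two-entry table.

```python
from collections import Counter

# Hypothetical hand-written table mapping inflected forms to their lemma.
# A real lemmatizer would use a full dictionary instead.
LEMMA_TABLE = {
    "computers": "computer",
    "dogs": "dog",
}


def lemmatize(word: str) -> str:
    """Return the lemma for a word, or the lowercased word if it is not in the table."""
    key = word.lower()
    return LEMMA_TABLE.get(key, key)


def count_by_lemma(tokens):
    """Count tokens after collapsing each one to its lemma."""
    return Counter(lemmatize(token) for token in tokens)


counts = count_by_lemma(["dog", "dogs", "Dogs", "computer", "computers"])
print(counts)
```

Here "dog", "dogs" and "Dogs" are all counted under the single entry "dog", which is the grouping the definition describes.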

In simple terms, I would explain lemmatization as returning the different forms of a word to its root form. Both examples above are nouns, but do not conclude that lemmatization applies only to nouns. It works the same way for other parts of speech, such as verbs and adjectives. For example:

  • Constructing – (Lemmatization) -> Construct
  • Extracts – (Lemmatization) -> Extract
  • Singing – (Lemmatization) -> Sing
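The examples above can be sketched in Python as a lookup keyed by both the word and its part of speech. This is only an illustration under stated assumptions: the table `LEMMAS_BY_POS`, the tag names ("VERB", "NOUN", "ADJ") and the function `lemmatize_with_pos` are hypothetical, and the adjective entry is added here to show that the same mechanism is not limited to nouns and verbs.

```python
# Hypothetical table keyed by (word form, part of speech).
# The part-of-speech tags are illustrative labels, not a standard tag set.
LEMMAS_BY_POS = {
    ("constructing", "VERB"): "construct",
    ("extracts", "VERB"): "extract",
    ("extracts", "NOUN"): "extract",
    ("singing", "VERB"): "sing",
    ("bigger", "ADJ"): "big",
}


def lemmatize_with_pos(word: str, pos: str) -> str:
    """Look up the lemma for a (word, part of speech) pair.

    Falls back to the lowercased word when the pair is not in the table.
    """
    key = (word.lower(), pos)
    return LEMMAS_BY_POS.get(key, word.lower())


examples = [
    ("Constructing", "VERB"),
    ("Extracts", "VERB"),
    ("Singing", "VERB"),
    ("bigger", "ADJ"),
]
for word, pos in examples:
    print(f"{word} -> {lemmatize_with_pos(word, pos)}")
```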

It may seem pretty straightforward at a glance, but confusion sets in with words like “worker” and “speaker”. “Worker” is not an inflected form of “work”, and “speaker” is not an inflected form of “speak”. A “speaker” (noun) is someone who talks (especially someone who delivers a public speech, or someone especially garrulous), while “speak” (verb) is the act of talking. The “-er” ending creates a new word with its own meaning rather than another grammatical form of the same word. In cases like these, even though the word appears to contain the root of another word, it should not be treated as an inflected form of that word.

The simple rule to remember is that lemmatization changes the form of a word while keeping its meaning the same.
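The “worker” case is also where lemmatization differs from crude suffix stripping. The Python sketch below contrasts the two under loud assumptions: `naive_stem` is a deliberately simplistic suffix stripper written for this example (not a real stemming algorithm such as Porter's), and `INFLECTIONS` is a hypothetical table that lists only inflected forms.

```python
def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters.

    A crude illustration only; real stemmers use far more careful rules.
    """
    for suffix in ("ing", "er", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical table that records inflected forms only.
# "worker" and "speaker" are absent as keys because they are words in their own right.
INFLECTIONS = {
    "working": "work",
    "works": "work",
    "workers": "worker",
    "speaking": "speak",
    "speakers": "speaker",
}


def lemmatize_inflection_only(word: str) -> str:
    """Return the lemma for a known inflected form, otherwise the word unchanged."""
    return INFLECTIONS.get(word, word)


for word in ["working", "worker", "workers", "speaker"]:
    print(word, naive_stem(word), lemmatize_inflection_only(word))
```

The suffix stripper reduces “worker” to “work” and so changes the meaning, while the table-based lookup leaves “worker” alone and only maps “workers” back to “worker”.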

Now that you’ve learned the basic concept of lemmatization, try it out on the Twinword Lemmatizer API demo page. Simply copy and paste in text extracts and let the Lemmatizer API return the lemmatized form of the words!


Let us know in the comments below what roots you extracted!
