Last week, we learnt about Lemmatization. (Post: What Is Lemmatization?) Today, we’ll be focusing on the actual application of the Lemmatizer API.
English is one of the most widely used languages in the world, with over 335 million native speakers. (Source: Ethnologue, 2014) Since this number only counts people who speak English as a first language, the actual number of English speakers (including those who use English as a second or third language) far exceeds that figure.
Depending on where you are from, your English might be a little different. Accent and slang aside, there are times when a word can have two different spellings: the British English way and the American English way. To list a few examples:
How can we program the computer to recognize the same word with different spellings? One solution is to integrate the Lemmatizer API. Besides returning words to their root form, the Lemmatizer API also recognizes that “Colour” = “Color” and “Organise” = “Organize”.
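To make the idea concrete, here is a minimal Python sketch of spelling normalization. It is not the Lemmatizer API itself: the `SPELLING_VARIANTS` table and the `normalize` and `same_word` functions are hypothetical names made up for illustration, and a real lemmatizer would rely on a far larger lexicon than this two-entry lookup.

```python
# Hypothetical lookup table mapping British spellings to American ones.
# This is an illustrative assumption, not how the Lemmatizer API works internally.
SPELLING_VARIANTS = {
    "colour": "color",
    "organise": "organize",
}


def normalize(word: str) -> str:
    """Lowercase the word and map a known British spelling to its American form."""
    key = word.lower()
    return SPELLING_VARIANTS.get(key, key)


def same_word(a: str, b: str) -> bool:
    """Treat two words as the same if they normalize to the same spelling."""
    return normalize(a) == normalize(b)


if __name__ == "__main__":
    print(same_word("Colour", "Color"))
    print(same_word("Organise", "Organize"))
    print(same_word("Colour", "Organize"))
```

Words that are not in the table pass through unchanged (apart from lowercasing), so the sketch only ever merges spellings it has been told about.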
PS: Did you know that “is”, “was” and “were” are inflected forms of “be”?