Last week, we learnt about lemmatization. (Post: What Is Lemmatization?) Today, we’ll focus on practical applications of the Lemmatizer API.
English is one of the most widely used languages in the world, with over 335 million native speakers. (Source: Ethnologue, 2014) Since this number counts only people who speak English as a first language, the actual number of English speakers, including those who use English as a second or third language, far exceeds it.
Depending on where you are from, your English might be a little different. Accent and slang aside, some words have two different spellings, one British and one American, such as “theatre” and “theater”.
How can we program a computer to recognize the same word with different spellings? One solution is to integrate the Lemmatizer API. Besides returning words to their root form, the Lemmatizer API also recognizes that “Colour” = “Color” and “Organise” = “Organize”.
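To make the idea concrete, here is a minimal sketch of what “recognizing the same word” means. It is not the Lemmatizer API and does not call it; the `LEMMAS` table and the `to_lemma` and `same_word` helpers are hypothetical, hand-written stand-ins for the much larger dictionary a real lemmatizer would use.

```python
# Toy lookup table written by hand for illustration only.
# A real lemmatizer covers far more words; this is NOT the Lemmatizer API's data.
LEMMAS = {
    "colour": "color",      # British spelling -> American spelling
    "organise": "organize",
    "theatre": "theater",
    "dogs": "dog",          # plural -> singular
    "goes": "go",           # inflected verb forms -> root verb
    "gone": "go",
    "went": "go",
}


def to_lemma(word: str) -> str:
    """Return the canonical form of a word, or the lowercased word if it is unknown."""
    key = word.lower()
    return LEMMAS.get(key, key)


def same_word(a: str, b: str) -> bool:
    """Two words count as the same word when they share a canonical form."""
    return to_lemma(a) == to_lemma(b)
```

With this table, `same_word("Colour", "Color")` is true because both reduce to “color”, and “goes”, “gone” and “went” all reduce to “go”.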
USES OF LEMMATIZER API
- Search Engines/Tools/Extensions
Lemmatization is very useful for search software. For example, searching for “big dogs” will also trigger a search for “big dog”, and “Theatre in San Francisco” will trigger a search for “Theater in San Francisco”. Beyond search engines, search tools are commonly built with the same functionality. Our advanced search Chrome extension, Twinword Finder, is one example: it recognizes lemmatized terms, enabling you to search through web pages efficiently. (Read more about Twinword Finder here: [Press Release] Twinword Finder)
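A rough sketch of how a search tool might use lemmas is shown here. It assumes a tiny hand-written table (`TOY_LEMMAS`) in place of a real lemmatizer, and the `normalize` and `matches` functions are hypothetical names for illustration; this is not how Twinword Finder or the Lemmatizer API is implemented.

```python
import re

# Hand-written stand-in for a lemmatizer; an assumption for this example only.
TOY_LEMMAS = {"dogs": "dog", "theatre": "theater"}


def normalize(text):
    """Lowercase the text, split it into words, and map each word to its lemma."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [TOY_LEMMAS.get(token, token) for token in tokens]


def matches(query, document):
    """Return True when every lemmatized query term appears in the lemmatized document."""
    doc_terms = set(normalize(document))
    return all(term in doc_terms for term in normalize(query))
```

Because both the query and the page text are normalized the same way, `matches("Theatre in San Francisco", "A theater in San Francisco opens tonight")` succeeds even though the spellings differ.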
- Educational Software/Applications
The Lemmatizer API is also useful for English learning software and applications. For example, you could build an app that asks learners to identify the different forms of the same word and match them to the root form, such as recognizing “goes”, “gone” and “went” as inflected forms of the root verb “go”.
- Text Analysis
As text analysis grows in importance for businesses, a comprehensive text analysis tool becomes essential, and lemmatization is one of its building blocks. Check out the demo page of our Lemmatizer API to see what I mean!
PS: Did you know that “is”, “was” and “were” are inflected forms of “be”?