Unstructured text content is rich with information, but it’s not always easy to find what’s relevant to you.
With the enormous amount of data that comes from social media, email, blogs, news and academic articles, it becomes increasingly hard to extract, categorize, and learn from that information.
Named Entity Recognition is a technique that extracts information from unstructured text data and sorts it into categories. For example, if there's a mention of "San Diego" in your data, named entity recognition would classify that as "Location."
Algorithmia offers two named entity recognition algorithms, based on Stanford CoreNLP and Apache OpenNLP. Both are accessible as API endpoints for seamless integration with your application or data science pipeline, and either can be called with a few lines of code so you can take advantage of our scalable serverless architecture.
What is Named Entity Recognition?
Named Entity Recognition is a form of text mining that sifts through unstructured text data and locates noun phrases called named entities. Named entities can then be organized under predefined categories, such as “person,” “organization,” “location,” “number,” or “duration.”
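To make the idea of locating and categorizing entities concrete, here is a toy sketch in Scala. It swaps in the simplest possible technique, a hand-written dictionary ("gazetteer") lookup that prefers the longest matching phrase. The object name `GazetteerTagger` and its word list are hypothetical. Real systems such as Stanford CoreNLP use trained statistical models rather than a fixed list, so treat this only as an illustration of the input and output shape.

```scala
// Toy dictionary-based ("gazetteer") entity tagger. It only illustrates the idea of
// locating entities and assigning categories; it is NOT how Stanford CoreNLP or
// Apache OpenNLP work (they use trained statistical models).
object GazetteerTagger {
  // Hypothetical hand-written dictionary mapping lowercase phrases to categories.
  val gazetteer: Map[String, String] = Map(
    "san diego" -> "LOCATION",
    "chicago"   -> "LOCATION",
    "cubs"      -> "ORGANIZATION"
  )

  // Length, in words, of the longest phrase in the dictionary.
  private val maxLen: Int = gazetteer.keys.map(_.split(" ").length).max

  /** Returns (phrase, category) pairs in the order they appear in `text`. */
  def tag(text: String): List[(String, String)] = {
    // Split on anything that is not a letter or digit, dropping empty tokens.
    val tokens = text.split("[^A-Za-z0-9]+").filter(_.nonEmpty).toVector
    val found = List.newBuilder[(String, String)]
    var i = 0
    while (i < tokens.length) {
      // Try the longest candidate phrase first so "San Diego" wins over "San".
      val hit = (math.min(maxLen, tokens.length - i) to 1 by -1).iterator
        .map(n => tokens.slice(i, i + n).mkString(" "))
        .find(p => gazetteer.contains(p.toLowerCase))
      hit match {
        case Some(phrase) =>
          found += (phrase -> gazetteer(phrase.toLowerCase))
          i += phrase.split(" ").length // skip past every word of the match
        case None =>
          i += 1
      }
    }
    found.result()
  }

  def main(args: Array[String]): Unit =
    // Prints List((Cubs,ORGANIZATION), (San Diego,LOCATION), (Chicago,LOCATION))
    println(tag("The Cubs played in San Diego before heading home to Chicago."))
}
```

A dictionary lookup misses any entity it has never seen, which is exactly why trained models are used in practice.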
Named entity recognition supports a wide variety of use cases. One common goal is to make information easier to locate. Named entities are first located and categorized under different labels; the data can then be aggregated under those labels for easy information retrieval. For instance, entities in job listing data could be labeled "organization" or "location," and users could then search by those specific categories.
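The label-based retrieval described above can be sketched as an inverted index keyed on (category, entity). This is a minimal sketch assuming entities have already been extracted by an NER step; `EntityIndex`, `Listing`, and the sample job listings are hypothetical, and no NER library is called.

```scala
// Minimal sketch: aggregate entities under their labels, then search within a label.
// The listings and their (entity, label) pairs are hypothetical stand-ins for NER output.
object EntityIndex {
  final case class Listing(id: Int, title: String, entities: List[(String, String)])

  // Builds (label, lowercased entity) -> ids of the listings that mention it.
  def buildIndex(listings: Seq[Listing]): Map[(String, String), Set[Int]] =
    listings
      .flatMap(l => l.entities.map { case (entity, label) => (label, entity.toLowerCase) -> l.id })
      .groupBy(_._1)
      .map { case (key, pairs) => key -> pairs.map(_._2).toSet }

  // Case-insensitive lookup of one entity within one category.
  def search(index: Map[(String, String), Set[Int]], label: String, entity: String): Set[Int] =
    index.getOrElse((label, entity.toLowerCase), Set.empty)

  def main(args: Array[String]): Unit = {
    val listings = List(
      Listing(1, "Data Engineer", List("Acme Corp" -> "ORGANIZATION", "San Diego" -> "LOCATION")),
      Listing(2, "ML Researcher", List("Globex" -> "ORGANIZATION", "Seattle" -> "LOCATION")),
      Listing(3, "Backend Developer", List("Globex" -> "ORGANIZATION", "San Diego" -> "LOCATION"))
    )
    val index = buildIndex(listings)
    println(search(index, "LOCATION", "San Diego"))  // Set(1, 3)
    println(search(index, "ORGANIZATION", "globex")) // Set(2, 3)
  }
}
```

Keying on the label as well as the entity text keeps a search for an organization from matching a location that happens to share the same name.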
Named entity recognition can also serve as a first step in a broader information extraction process. Once named entities are discovered and categorized, patterns between different entities can be found. For example, pairing entities in the "Location" category, such as "San Diego," with entities in the "Organization" category, such as the "Democratic" party, could be used to uncover demographic information about people's association with political parties.
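One simple way to look for such patterns, assuming each document has already been tagged, is to count how often each location appears alongside each organization in the same document. The sketch below uses hypothetical names (`CoOccurrence`, `locationOrgPairs`) and hand-written sample entities; it illustrates plain co-occurrence counting and is not a method described in the original text.

```scala
// Minimal sketch of co-occurrence counting between LOCATION and ORGANIZATION entities.
// Each document is represented by a hypothetical, hand-written list of (entity, label) pairs.
object CoOccurrence {
  type Entities = List[(String, String)] // (entity text, label)

  // Counts, for every (location, organization) pair, how many documents mention both.
  def locationOrgPairs(docs: Seq[Entities]): Map[(String, String), Int] =
    docs
      .flatMap { ents =>
        // distinct: count each pair at most once per document
        val locs = ents.collect { case (e, "LOCATION") => e }.distinct
        val orgs = ents.collect { case (e, "ORGANIZATION") => e }.distinct
        for (l <- locs; o <- orgs) yield (l, o)
      }
      .groupBy(identity)
      .map { case (pair, occurrences) => pair -> occurrences.size }

  def main(args: Array[String]): Unit = {
    val docs = List(
      List("San Diego" -> "LOCATION", "Democratic Party" -> "ORGANIZATION"),
      List("San Diego" -> "LOCATION", "Democratic Party" -> "ORGANIZATION", "Padres" -> "ORGANIZATION"),
      List("Chicago" -> "LOCATION", "Cubs" -> "ORGANIZATION")
    )
    // Most frequent pairs first
    locationOrgPairs(docs).toList.sortBy(-_._2).foreach { case ((loc, org), n) =>
      println(s"$loc / $org: $n")
    }
  }
}
```

Counting each pair at most once per document keeps a single long article that repeats the same names from dominating the totals.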
Why You Need Named Entity Recognition
You shouldn’t have to spend all your time setting up complicated environments or worrying about installing and managing dependencies to get a library to work.
Because we wrap these libraries as API endpoints, developers get access to named entity recognition algorithms that perform well at scale. Using our hosted implementations of Stanford CoreNLP and Apache OpenNLP saves developers time and frees them from worrying about server maintenance or downtime.
How to Categorize Data Using Named Entity Recognition
For this example we will show how to use the Stanford CoreNLP named entity recognition algorithm with Scala, but you could call it using any of our supported clients.
To get started using the algorithm, you’ll need a free API key from Algorithmia.
Sample API Call:
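import com.algorithmia._
import com.algorithmia.algo._

// Four quotes on each side yield a JSON string literal: the outer """ delimits a
// Scala multi-line string and the inner " characters become part of the JSON input.
val input = """"Incredible, Chicago city officials estimate that there's a record 5 million people at Cubs parade""""

// Replace "your_api_key" with the key from your Algorithmia account
val client = Algorithmia.client("your_api_key")
val algo = client.algo("algo://StanfordNLP/NamedEntityRecognition/0.1.1")

// pipeJson sends the input as already-serialized JSON
val result = algo.pipeJson(input)
println(result.asJsonString)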
[ [ ["Chicago", "LOCATION"], ["5", "NUMBER"], ["million", "NUMBER"], ["Cubs", "ORGANIZATION"] ] ]
And there you go: your entities are categorized in only a few lines of code, without installing dependencies or managing the environment the algorithm runs in. There are no servers to provision and no scalability concerns, since we take care of all that for you.
Let us know what you think @Algorithmia.