Algorithmia Blog

Introduction to Natural Language Processing (NLP)

Natural Language Processing Summary

The field of study that focuses on the interactions between human language and computers is called Natural Language Processing, or NLP for short. It sits at the intersection of computer science, artificial intelligence, and computational linguistics (Wikipedia).

“Nat­ur­al Lan­guage Pro­cessing is a field that cov­ers com­puter un­der­stand­ing and ma­nip­u­la­tion of hu­man lan­guage, and it’s ripe with pos­sib­il­it­ies for news­gath­er­ing,” Anthony Pesce said in Natural Language Processing in the kitchen. “You usu­ally hear about it in the con­text of ana­lyz­ing large pools of legis­la­tion or other doc­u­ment sets, at­tempt­ing to dis­cov­er pat­terns or root out cor­rup­tion.”

 

What is Natural Language Processing?

NLP is a way for computers to analyze, understand, and derive meaning from human language in a smart and useful way. By utilizing NLP, developers can organize and structure knowledge to perform tasks such as automatic summarization, translation, named entity recognition, relationship extraction, sentiment analysis, speech recognition, and topic segmentation.

“Apart from common word processor operations that treat text like a mere sequence of symbols, NLP considers the hierarchical structure of language: several words make a phrase, several phrases make a sentence and, ultimately, sentences convey ideas,” John Rehling, an NLP expert at Meltwater Group, said in How Natural Language Processing Helps Uncover Social Media Sentiment. “By analyzing language for its meaning, NLP systems have long filled useful roles, such as correcting grammar, converting speech to text and automatically translating between languages.”

NLP is used to analyze text, allowing machines to understand how human’s speak. This human-computer interaction enables real-world applications like automatic text summarization, sentiment analysis, topic extraction, named entity recognition, parts-of-speech tagging, relationship extraction, stemming, and more. NLP is commonly used for text mining, machine translation, and automated question answering.

NLP is characterized as a hard problem in computer science. Human language is rarely precise, or plainly spoken. To understand human language is to understand not only the words, but the concepts and how they’re linked together to create meaning. Despite language being one of the easiest things for humans to learn, the ambiguity of language is what makes natural language processing a difficult problem for computers to master.

What Can Developers Use NLP Algorithms For?

NLP algorithms are typically based on machine learning algorithms. Instead of hand-coding large sets of rules, NLP can rely on machine learning to automatically learn these rules by analyzing a set of examples (i.e. a large corpus, like a book, down to a collection of sentences), and making a statical inference. In general, the more data analyzed, the more accurate the model will be.

Open Source NLP Libraries

These libraries provide the algorithmic building blocks of NLP in real-world applications. Algorithmia provides a free API endpoint for many of these algorithms, without ever having to setup or provision servers and infrastructure.

A Few NLP Examples

NLP algorithms can be extremely helpful for web developers, providing them with the turnkey tools needed to create advanced applications, and prototypes.

Example Natural Language Processing Use Cases

NLP algorithms are typically based on machine learning algorithms. Instead of hand-coding large sets of rules, NLP can rely on machine learning to automatically learn these rules by analyzing a set of examples (i.e. a large corpus, like a book, down to a collection of sentences), and making a statical inference. In general, the more data analyzed, the more accurate the model will be.

Social media analysis is a great example of NLP use. Brands track conversations online to understand what customers are saying, and glean insight into user behavior.

“One of the most compelling ways NLP offers valuable intelligence is by tracking sentiment — the tone of a written message (tweet, Facebook update, etc.) — and tag that text as positive, negative or neutral,” Rehling said.

Build your own social media monitoring tool

  1. Start by using the algorithm Retrieve Tweets With Keyword to capture all mentions of your brand name on Twitter. In our case, we search for mentions of Algorithmia.
  2. Then, pipe the results into the Sentiment Analysis algorithm, which will assign a sentiment rating from 0-4 for each string (Tweet).

Similarly, Facebook uses NLP to track trending topics and popular hashtags.

“Hashtags and topics are two different ways of grouping and participating in conversations,” Chris Struhar, a software engineer on News Feed, said in How Facebook Built Trending Topics With Natural Language Processing. “So don’t think Facebook won’t recognize a string as a topic without a hashtag in front of it. Rather, it’s all about NLP: natural language processing. Ain’t nothing natural about a hashtag, so Facebook instead parses strings and figures out which strings are referring to nodes — objects in the network. We look at the text, and we try to understand what that was about.”

It’s not just social media that can use NLP to it’s benefit. Publishers are hoping to use NLP to improve the quality of their online communities by leveraging technology to “auto-filter the offensive comments on news sites to save moderators from what can be an ‘exhausting process’,” Francis Tseng said in Prototype winner using ‘natural language processing’ to solve journalism’s commenting problem.

Other practical uses of NLP include monitoring for malicious digital attacks, such as phishing, or detecting when somebody is lying.

Use NLP to build your own RSS reader

You can build a machine learning RSS reader in less than 30-minutes using the follow algorithms:

  1. ScrapeRSS to grab the title and content from an RSS feed.
  2. Html2Text to keep the important text, but strip all the HTML from the document.
  3. AutoTag uses Latent Dirichlet Allocation to identify relevant keywords from the text.
  4. Sentiment Analysis is then used to identify if the article is positive, negative, or neutral.
  5. Summarizer is finally used to identify the key sentences.

Recommended NLP Books for Beginners

NLP Tutorials

If you’re interested in learning more, this free introductory course from Stanford University will help you will learn the fundamentals of natural language processing, and how you can use it to solve practical problems.

Once you’ve gotten the fundamentals down, apply what you’ve learned using Python and NLTK, the most popular framework for Python NLP.

NLP Videos

NLP Courses