This introduction to Natural Language Processing, or NLP for short, describes one of the central problems of artificial intelligence. NLP focuses on the interactions between human language and computers, and it sits at the intersection of computer science, artificial intelligence, and computational linguistics.
What is Natural Language Processing?
NLP is used to analyze text, allowing machines to understand how humans speak. This human-computer interaction enables real-world applications such as automatic text summarization, sentiment analysis, topic extraction, named entity recognition, part-of-speech tagging, relationship extraction, stemming, and more. NLP is commonly used for text mining, machine translation, and automated question answering.
NLP is characterized as a hard problem in computer science. Human language is rarely precise or plainly spoken. To understand human language is to understand not only the words, but also the concepts and how they are linked together to create meaning. For example, "I saw the man with the telescope" can mean either that the speaker used a telescope or that the man was holding one. Although language is one of the easiest things for humans to learn, this ambiguity is what makes natural language processing a difficult problem for computers to master.
Natural Language Processing Algorithms
NLP algorithms are typically based on machine learning algorithms. Instead of hand-coding large sets of rules, NLP can rely on machine learning to learn these rules automatically by analyzing a set of examples (i.e., a corpus, which can range from an entire book down to a collection of sentences) and making statistical inferences. In general, the more data analyzed, the more accurate the model tends to be.
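To make "learning rules from examples" concrete, here is a minimal sketch of one classic statistical learner: a multinomial naive Bayes classifier with add-one smoothing, written in plain Python. The whitespace tokenizer, the six training sentences, and the `pos`/`neg` labels are made up for illustration; a real model would be trained on a far larger corpus and would use a proper tokenizer.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        # examples: list of (text, label) pairs
        self.label_counts = Counter(label for _, label in examples)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in examples:
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.label_counts.items():
            # log P(label) + sum over words of log P(word | label)
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data.
training = [
    ("what a great movie", "pos"),
    ("i loved the acting", "pos"),
    ("great fun and great music", "pos"),
    ("what a boring movie", "neg"),
    ("i hated the plot", "neg"),
    ("boring and far too long", "neg"),
]
model = NaiveBayes().fit(training)
print(model.predict("great acting"))
print(model.predict("boring plot"))
```

No rule such as "'great' means positive" is written by hand. The classifier infers it from word counts in the labeled examples, and adding more examples refines those estimates.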
Popular Open Source NLP Libraries:
- Apache OpenNLP: a machine learning toolkit that provides tokenizers, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, coreference resolution, and more.
- Natural Language Toolkit (NLTK): a Python library that provides modules for processing text, classifying, tokenizing, stemming, tagging, parsing, and more.
- Stanford NLP: a suite of NLP tools that provides part-of-speech tagging, named entity recognition, coreference resolution, sentiment analysis, and more.
- MALLET: a Java package that provides Latent Dirichlet Allocation, document classification, clustering, topic modeling, information extraction, and more.
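Several of these toolkits begin with the same preprocessing steps: sentence segmentation, tokenization, and stemming. The sketch below is a deliberately naive pure-Python illustration of what those steps produce. It does not use or reproduce the API of any library listed above. The regular expressions and the suffix list are assumptions chosen for brevity.

```python
import re

# Hypothetical suffix list, checked in order; far cruder than the Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def split_sentences(text):
    # Naive rule: a sentence ends at ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # A token is a run of letters, digits, or apostrophes, or a single punctuation mark.
    return re.findall(r"[A-Za-z0-9']+|[^\w\s]", sentence)


def stem(word):
    # Strip the first matching suffix, keeping a stem of at least three letters.
    lower = word.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return lower[: -len(suffix)]
    return lower


text = "The cats were jumping. Dogs barked loudly!"
for sentence in split_sentences(text):
    print([stem(token) for token in tokenize(sentence)])
```

Rules this simple break quickly. They split after abbreviations such as "Dr.", mishandle contractions, and miss irregular forms such as "ran". Handling those cases is much of what the libraries above provide.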
These libraries provide the algorithmic building blocks of NLP in real-world applications. Algorithmia provides a free API endpoint for many of these algorithms, so developers can use them without having to set up or provision servers and infrastructure.
A Few NLP Examples:
- Use Summarizer to automatically summarize a block of text, extracting topic sentences and ignoring the rest.
- Generate keyword topic tags from a document using LDA (Latent Dirichlet Allocation), which identifies the most relevant words in a document. This algorithm is at the heart of the Auto-Tag and Auto-Tag URL microservices.
- Sentiment Analysis, based on StanfordNLP, can be used to identify the feeling, opinion, or belief expressed in a statement, on a scale from very negative through neutral to very positive. Developers often use a sentiment algorithm to identify the sentiment of a term in a sentence, or use sentiment analysis to analyze social media.
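Extractive summarization, which keeps a few important sentences and drops the rest, can be sketched with one common heuristic. The heuristic scores each sentence by the average document-wide frequency of its non-stopword words, then keeps the top-scoring sentences in their original order. This is an assumed technique for illustration, not necessarily how the Summarizer algorithm works. The stopword list and the sample text are also assumptions.

```python
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "it", "on", "for", "with"}


def content_words(sentence):
    """Lowercased words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, num_sentences=2):
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Document-wide word frequencies act as a rough proxy for "topic" words.
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        tokens = content_words(sentence)
        # Average rather than sum, so long sentences are not automatically favored.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])  # restore the original sentence order
    return " ".join(sentences[i] for i in keep)


text = ("NLP lets computers process human language. "
        "Language is ambiguous, so NLP models learn from language data. "
        "The weather was pleasant yesterday. "
        "Statistical NLP models improve with more language data.")
print(summarize(text))
```

The off-topic sentence about the weather shares no frequent words with the rest of the text, so it receives the lowest score and is dropped.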
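LDA models each document as a mixture of topics and each topic as a distribution over words. Below is a compact sketch of fitting LDA with collapsed Gibbs sampling in plain Python. It also includes a hypothetical `tag_document` helper that returns the top words of a document's dominant topic as tags. The hyperparameters (`alpha`, `beta`, iteration count), the toy corpus, and the tagging rule are all assumptions. This is not necessarily how the Auto-Tag microservices work. For real workloads, a tested implementation such as MALLET's is preferable.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA. docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]              # n(d, t)
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # n(t, w)
    topic_total = [0] * num_topics                             # n(t)
    assignments = []
    # Start by assigning every token to a random topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the current token from all counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n(d,t) + alpha) * (n(t,w) + beta) / (n(t) + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def tag_document(doc_index, doc_topic, topic_word, top_n=3):
    """Hypothetical tagging rule: top words of the document's dominant topic."""
    counts = doc_topic[doc_index]
    best = max(range(len(counts)), key=counts.__getitem__)
    ranked = sorted(topic_word[best].items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, c in ranked[:top_n] if c > 0]


docs = [
    "football goal match team score".split(),
    "team match football player goal".split(),
    "python code compiler bug code".split(),
    "compiler python bug program code".split(),
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
print(tag_document(0, doc_topic, topic_word))
```

Sampling is randomized, so the exact tags depend on the seed and iteration count. On a corpus this small, the two topics may not separate cleanly.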
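The five-point scale (very negative, negative, neutral, positive, very positive) can be illustrated with a much simpler technique than the trained model behind Stanford's sentiment analysis. The sketch below swaps in a lexicon-based scorer. Word polarities are summed, a preceding negation word flips the next polarity, and the clamped total is mapped onto the five labels. The lexicon, the negation words, and the scoring rule are assumptions for illustration only.

```python
import re

# Hypothetical word polarities in [-2, 2]; a real lexicon is far larger.
LEXICON = {"terrible": -2, "awful": -2, "bad": -1, "boring": -1,
           "good": 1, "nice": 1, "great": 2, "excellent": 2}
NEGATORS = {"not", "never", "no"}
LABELS = ["very negative", "negative", "neutral", "positive", "very positive"]


def sentiment(sentence):
    words = re.findall(r"[a-z']+", sentence.lower())
    total, negate = 0, False
    for w in words:
        if w in NEGATORS:
            negate = True  # flip the polarity of the next sentiment word
            continue
        if w in LEXICON:
            total += -LEXICON[w] if negate else LEXICON[w]
            negate = False
    # Clamp the summed score to [-2, 2] and map it to one of five labels.
    score = max(-2, min(2, total))
    return LABELS[score + 2]


for s in ["The food was excellent", "The plot was not good",
          "It was a movie", "Awful acting and a boring script"]:
    print(s, "->", sentiment(s))
```

Lexicon rules like these miss sarcasm, context, and long-range negation. Trained models are meant to handle exactly these cases.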
NLP algorithms can be extremely helpful for web developers, providing them with the turnkey tools needed to create advanced applications and prototypes.
Natural Language Processing Tutorials
Recommended NLP Books
- Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit
“This is a book about Natural Language Processing. By ‘natural language’ we mean a language that is used for everyday communication by humans; languages like English, Hindi or Portuguese. At one extreme, it could be as simple as counting word frequencies to compare different writing styles.”
- Speech and Language Processing, 2nd Edition
“An explosion of Web-based language techniques, merging of distinct fields, availability of phone-based dialogue systems, and much more make this an exciting time in speech and language processing. The first of its kind to thoroughly cover language technology – at all levels and with all modern technologies – this text takes an empirical approach to the subject, based on applying statistical and other machine-learning algorithms to large corpora. The authors cover areas that traditionally are taught in different courses, to describe a unified vision of speech and language processing.”
- Introduction to Information Retrieval
“As recently as the 1990s, studies showed that most people preferred getting information from other people rather than from information retrieval systems. However, during the last decade, relentless optimization of information retrieval effectiveness has driven web search engines to new quality levels where most people are satisfied most of the time, and web search has become a standard and often preferred source of information finding. For example, the 2004 Pew Internet Survey (Fallows, 2004) found that 92% of Internet users say the Internet is a good place to go for getting everyday information. To the surprise of many, the field of information retrieval has moved from being a primarily academic discipline to being the basis underlying most people’s preferred means of information access.”
- Natural Language Processing With Python and NLTK p.1 Tokenizing words and Sentences
- NLP Fundamentals: What is Natural Language Processing?
- Stanford Natural Language Processing on Coursera
“This course covers a broad range of topics in natural language processing, including word and sentence tokenization, text classification and sentiment analysis, spelling correction, information extraction, parsing, meaning extraction, and question answering. We will also introduce the underlying theory from probability, statistics, and machine learning that are crucial for the field, and cover fundamental algorithms like n-gram language modeling, naive bayes and maxent classifiers, sequence models like Hidden Markov Models, probabilistic dependency and constituent parsing, and vector-space models of meaning.”
- Stanford Machine Learning on Coursera
“Machine learning is the science of getting computers to act without being explicitly programmed. Many researchers also think it is the best way to make progress towards human-level AI. In this class, you will learn about the most effective machine learning techniques, and gain practice implementing them and getting them to work for yourself.”
- Udemy’s Introduction to Natural Language Processing
“This course introduces Natural Language Processing through the use of Python and the Natural Language Tool Kit. Through a practical approach, you’ll get hands-on experience working with and analyzing text. As a student of this course, you’ll get updates for free, which include lecture revisions, new code examples, and new data projects.”
- Certificate in Natural Language Technology
“When you talk to your mobile device or car navigation system – or it talks to you – you’re experiencing the fruits of developments in natural language processing. This field, which focuses on the creation of software that can analyze and understand human languages, has grown rapidly in recent years and now has many technological applications. In this three-course certificate program, we’ll explore the foundations of computational linguistics, the academic discipline that underlies NLP.”
Related NLP Topics
- Six Natural Language Processing Algorithms for Web Developers
- Getting Started With Natural Language Processing (NLP)
- A curated list of speech and natural language processing resources
- NLP research group at Google
- General Introduction to NLP
- Natural language processing: an introduction
- Stanford CS 224D Video: Deep Learning for Natural Language Processing
- CS 388: Natural Language Processing
- COMS W4705: Natural Language Processing
- CS 674: Natural Language Processing
- CS918 Natural Language Processing
- Everything You Need to Know about Natural Language Processing