All posts in Algorithm Spotlight

From Crawling to Sprinting: Advances in Natural Language Processing 

Gif highlighting different Natural Language Processing branches e.g. prediction, translation

Natural language processing (NLP) is one of the fastest-evolving branches of machine learning and among the most fundamental. It has applications in diplomacy, aviation, big data sentiment analysis, language translation, customer service, healthcare, policing and criminal justice, and countless other industries.

NLP is the reason we’ve been able to move from CTRL-F searches for single words or phrases to conversational interactions about the contents and meanings of long documents. We can now ask computers questions and have them answer. 

Algorithmia hosts more than 8,000 individual models. Many of them are NLP models that perform tasks such as sentence parsing, text extraction and classification, translation, and language identification.

Allen Institute for AI NLP Models on Algorithmia 

The Allen Institute for Artificial Intelligence (Ai2) is a non-profit created by Microsoft co-founder Paul Allen. Since its founding in 2013, Ai2 has worked to advance the state of AI research, especially in natural language applications. We are pleased to announce that we have worked with the producers of AllenNLP, one of the leading NLP libraries, to make their state-of-the-art models available with a simple API call in the Algorithmia AI Layer.

Among the algorithms new to the platform are:

  • Machine Comprehension: Input a body of text and a question about it, and get back the answer (always a substring of the original text).
  • Textual Entailment: Determine whether one statement follows logically from another.
  • Semantic Role Labeling: Determine “who” did “what” to “whom” in a body of text.

These and other algorithms are based on a collection of pre-trained models that are published on the AllenNLP website.  

Algorithmia provides an easy-to-use interface for getting answers out of these models. The underlying AllenNLP models produce more verbose output, aimed at researchers who need to understand the models and debug their performance. To get this additional information, set debug=True.

The Ins and Outs of the AllenNLP Models 

Machine Comprehension: Create natural-language interfaces to extract information from text documents. 

This algorithm answers a question based on a piece of text, using a state-of-the-art model. It takes in a passage of text and a question about that passage, and returns the substring of the passage that the model predicts to be the correct answer.

This model could feature in the backend of a chatbot or provide customer support based on a user’s manual. It could also be used to extract structured data from textual documents. For example, a collection of doctors’ reports could be turned into a table that lists, for every report, the patient’s concern, what the patient should do, and when they should schedule a follow-up appointment.
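The table-building idea can be sketched roughly as follows. This is an illustration only, not Algorithmia or AllenNLP code: `reports_to_table`, the column names, and the three questions are all invented for the example, and `toy_answer_fn` is a crude keyword-based stand-in for a real call to the Machine Comprehension model so that the sketch runs offline.

```python
from typing import Callable, Dict, List

# Hypothetical column names and questions, invented for this sketch.
QUESTIONS: Dict[str, str] = {
    "concern": "What is the patient's concern?",
    "advice": "What should the patient do?",
    "follow_up": "When should the patient schedule a follow-up?",
}


def reports_to_table(
    reports: List[str],
    answer_fn: Callable[[str, str], str],
    questions: Dict[str, str] = QUESTIONS,
) -> List[Dict[str, str]]:
    """Ask every question of every report and collect one row per report."""
    table = []
    for passage in reports:
        row = {}
        for column, question in questions.items():
            answer = answer_fn(passage, question)
            # The model is described as returning a substring of the passage,
            # so anything else points to a problem with answer_fn.
            if answer not in passage:
                raise ValueError(f"answer {answer!r} is not a substring of the passage")
            row[column] = answer
        table.append(row)
    return table


def toy_answer_fn(passage: str, question: str) -> str:
    """Stand-in for the real model: return the first sentence containing a cue word."""
    if "follow-up" in question:
        cue = "follow-up"
    elif "should" in question:
        cue = "should"
    else:
        cue = "reports"
    for sentence in passage.split(". "):
        if cue in sentence:
            return sentence.rstrip(".")
    return ""  # no sentence matched; the empty string is still a valid substring
```

In real use, `answer_fn` would wrap whatever client call sends the passage and question to the hosted model; the rest of the loop stays the same.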

Machine Comprehension screenshot

Screenshot from Machine Comprehension model on Algorithmia.


Entailment: This algorithm provides state-of-the-art natural language reasoning. It takes in a premise, expressed in natural language, and a hypothesis that may or may not follow from it. It determines whether the hypothesis follows from the premise, contradicts the premise, or is unrelated. Its input and output are as follows:

Input
The input JSON blob should have the following fields:

  • premise: a descriptive piece of text
  • hypothesis: a statement that may or may not follow from the premise of the text

Any additional fields will pass through into the AllenNLP model.

Output
The following output fields will always be present:

  • contradiction: Probability that the hypothesis contradicts the premise
  • entailment: Probability that the hypothesis follows from the premise
  • neutral: Probability that the hypothesis is independent from the premise

Entailment

Screenshot from Entailment model on Algorithmia.
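As a rough sketch of working with the Entailment fields listed above, the snippet below builds the input JSON blob and picks the most probable label from an output. The helper names `build_entailment_input` and `most_likely_label` are hypothetical, the sample probabilities in the usage note are made up, and passing `debug` as an extra input field is an assumption about how that option is supplied.

```python
import json
from typing import Any, Dict

# The three output fields named in the description above.
LABELS = ("entailment", "contradiction", "neutral")


def build_entailment_input(premise: str, hypothesis: str, **extra: Any) -> str:
    """Serialize the input JSON blob; extra fields are included unchanged."""
    payload = {"premise": premise, "hypothesis": hypothesis, **extra}
    return json.dumps(payload)


def most_likely_label(output: Dict[str, float]) -> str:
    """Return whichever of the three output fields has the highest probability."""
    missing = [label for label in LABELS if label not in output]
    if missing:
        raise KeyError(f"output is missing fields: {missing}")
    return max(LABELS, key=lambda label: output[label])
```

For example, `most_likely_label({"contradiction": 0.02, "entailment": 0.91, "neutral": 0.07})` returns `"entailment"`, and `build_entailment_input("Two dogs run.", "Animals are moving.", debug=True)` produces a JSON string with the two required fields plus the extra `debug` field.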


Semantic role labeling: This algorithm provides state-of-the-art natural language reasoning by decomposing a sentence into a structured representation of the relationships it describes.

The idea behind this algorithm is to treat each verb as a predicate and the entities involved in it as its arguments, much like predicates in logic. The arguments describe who or what performs the action of the verb, to whom or what it is done, and so on.
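To make the verb-and-arguments picture concrete, here is a minimal sketch that groups per-word role tags into phrases. It assumes the labels arrive as one BIO-style tag per word (for example `B-ARG0`, `I-ARG0`, `B-V`, `O`) with PropBank-style role names; that format is an assumption for illustration and is not described in this post, and `roles_from_bio` is a hypothetical helper.

```python
from typing import Dict, List, Optional


def roles_from_bio(words: List[str], tags: List[str]) -> Dict[str, str]:
    """Group BIO-tagged words into a {role: phrase} mapping for one verb.

    If a role occurs more than once, only its last span is kept.
    """
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    roles: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            # A "B-" tag opens a new span for the role that follows the prefix.
            current = tag[2:]
            roles[current] = [word]
        elif tag.startswith("I-") and current == tag[2:]:
            # An "I-" tag continues the span that is currently open.
            roles[current].append(word)
        else:
            # "O" (outside any role) or a continuation with no matching open span.
            current = None
    return {role: " ".join(tokens) for role, tokens in roles.items()}
```

With the words `["The", "cat", "chased", "the", "mouse", "."]` and tags `["B-ARG0", "I-ARG0", "B-V", "B-ARG1", "I-ARG1", "O"]`, the function returns `{"ARG0": "The cat", "V": "chased", "ARG1": "the mouse"}`: who did what to whom.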

Semantic role labeling

Screenshot from Semantic role labeling model on Algorithmia.

NLP Moving Forward

NLP applications are already common in everyday life, and they will only continue to expand and improve as computers get better at understanding written and spoken human language and acting on it.

Adding multilingual support to any algorithm: pre-translation in NLP

We often get asked whether we’re planning to add any non-English NLP algorithms. As much as we would love to train NLP models on other languages, there aren’t many usable training datasets in these languages. And, due to the linguistic structure of these languages, training with pre-existing approaches doesn’t always give the best results.

Until better training sets can be generated, one passable solution is to translate the text to English before sending it to the algorithm. Read More…
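The pre-translation idea amounts to composing two steps. The sketch below shows that composition under stated assumptions: `with_pre_translation` is a hypothetical helper, and both `translate` and `algorithm` are placeholders for whatever translation service and English-only NLP algorithm you actually call.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def with_pre_translation(
    translate: Callable[[str], str],
    algorithm: Callable[[str], T],
) -> Callable[[str], T]:
    """Wrap an English-only algorithm so its input is translated to English first."""

    def wrapped(text: str) -> T:
        english = translate(text)  # step 1: translate the input to English
        return algorithm(english)  # step 2: run the unchanged English algorithm

    return wrapped
```

The wrapped function has the same call shape as the original algorithm, so it can be dropped in wherever the English-only version was used. Output quality still depends on the quality of the translation step.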

Introduction to Character Recognition

This is easy to understand, right?

easy ocr

How about this? A bit harder?

moderate natural

Are you able to decipher this one at all?

hard natural
courtesy of Faris Algosaibi

The first example can be easily recognized by most character recognition algorithms. However, as your text gets progressively more complex, this seemingly simple task becomes more and more difficult for even the best machine learning algorithms to successfully complete. Read More…

Traveling Salesman by API

Traveling Salesman is one of the classic NP-hard problems: finding the optimal solution can take a long time, but there are some great shortcuts available that come close! Algorithmia now brings you a fast, near-optimal way to find the fastest route through multiple cities, thanks to the power of genetic algorithms and easily accessible APIs. Read More…