Natural language processing (NLP) is one of the fastest-evolving branches of machine learning and among the most fundamental. It has applications in diplomacy, aviation, big-data sentiment analysis, language translation, customer service, healthcare, policing and criminal justice, and countless other industries.
NLP is the reason we’ve been able to move from CTRL-F searches for single words or phrases to conversational interactions about the contents and meanings of long documents. We can now ask computers questions and have them answer.
Algorithmia hosts more than 8,000 individual models, many of which are NLP models and complete tasks such as sentence parsing, text extraction and classification, as well as translation and language identification.
Allen Institute for AI NLP Models on Algorithmia
The Allen Institute for Artificial Intelligence (Ai2) is a non-profit created by Microsoft co-founder Paul Allen. Since its founding in 2013, Ai2 has worked to advance the state of AI research, especially in natural language applications. We are pleased to announce that we have worked with the producers of AllenNLP—one of the leading NLP libraries—to make their state-of-the-art models available with a simple API call in the Algorithmia AI Layer.
Among the algorithms new to the platform are:
- Machine Comprehension: Input a body of text and a question based on it and get back the answer (strictly a substring of the original body of text).
- Textual Entailment: Determine whether one statement follows logically from another
- Semantic role labeling: Determine “who” did “what” to “whom” in a body of text
These and other algorithms are based on a collection of pre-trained models that are published on the AllenNLP website.
Algorithmia provides an easy-to-use interface for getting answers out of these models. The underlying AllenNLP models provide a more verbose output, which is aimed at researchers who need to understand the models and debug their performance—this additional information is returned if you simply set debug=True.
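As a minimal sketch (the field names below are illustrative, and placing the flag directly in the input JSON is an assumption on our part), the debug flag simply rides along with a model's ordinary input fields:

```python
# Sketch: attaching the optional debug flag to a model's input JSON.
# Field names here are illustrative assumptions, not a documented schema.

def build_payload(fields, debug=False):
    """Copy the model's input fields, optionally requesting verbose output."""
    payload = dict(fields)
    if debug:
        payload["debug"] = True  # return the verbose, research-oriented output
    return payload

payload = build_payload({"premise": "Two dogs run in a field.",
                         "hypothesis": "Animals are outdoors."},
                        debug=True)
```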
The Ins and Outs of the AllenNLP Models
Machine Comprehension: Create natural-language interfaces to extract information from text documents.
This algorithm provides the state-of-the-art ability to answer a question based on a piece of text. It takes in a passage of text and a question based on that passage, and returns the substring of the passage that the model predicts to be the correct answer.
This model could power the backend of a chatbot or provide customer support based on a user’s manual. It could also extract structured data from textual documents: for example, a collection of doctors’ reports could be turned into a table listing, for every report, the patient’s concern, what the patient should do, and when they should schedule a follow-up appointment.
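A quick sketch of what a call might look like. The algorithm path and API key below are placeholder assumptions, not documented endpoints, and the passage is invented for illustration:

```python
# Sketch: querying the Machine Comprehension model through the Algorithmia
# Python client. The algorithm path below is a placeholder assumption.

def build_mc_input(passage, question):
    """Machine Comprehension takes a passage plus a question about it."""
    return {"passage": passage, "question": question}

payload = build_mc_input(
    "The patient reported mild chest pain. She was advised to rest and "
    "to schedule a follow-up appointment in two weeks.",
    "When should the patient schedule a follow-up appointment?",
)

# Uncomment to run against a live endpoint (requires an API key):
# import Algorithmia
# client = Algorithmia.client("YOUR_API_KEY")
# result = client.algo("allenai/MachineComprehension/0.1.0").pipe(payload).result
# The returned answer is always a substring of the passage.
```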
Entailment: This algorithm provides state-of-the-art natural language reasoning. It takes in a premise, expressed in natural language, and a hypothesis that may or may not follow from it, and determines whether the hypothesis follows from the premise, contradicts it, or is unrelated.
The input JSON blob should have the following fields:
- premise: a descriptive piece of text
- hypothesis: a statement that may or may not follow from the premise of the text
Any additional fields will pass through into the AllenNLP model.
The following output fields will always be present:
- contradiction: Probability that the hypothesis contradicts the premise
- entailment: Probability that the hypothesis follows from the premise
- neutral: Probability that the hypothesis is independent of the premise
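The input and output shapes can be sketched as follows. The sentences and the probability values are illustrative, not real model output; a well-calibrated model should put most of its probability mass on entailment for this pair:

```python
# Sketch: the input/output shape of the Textual Entailment model.
# The probabilities below are invented for illustration.

entailment_input = {
    "premise": "A soccer game with multiple males playing.",
    "hypothesis": "Some men are playing a sport.",
}

# Example output of the documented shape (three probabilities summing to 1):
example_output = {"entailment": 0.97, "contradiction": 0.01, "neutral": 0.02}

best_label = max(example_output, key=example_output.get)
print(best_label)  # entailment
```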
Semantic role labeling: This algorithm provides state-of-the-art natural language reasoning—decomposing a sentence into a structured representation of the relationships it describes.
The algorithm treats each verb in a sentence as a logical predicate, with the entities involved as its arguments. The arguments describe who or what performs the action of the verb, to whom or what it is done, and so on.
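As an illustration of the kind of structured representation this produces, here is a hand-written frame for one sentence. The tag names follow PropBank conventions (ARG0, ARG1, ARGM-MNR); the parse itself is our own example, not real model output:

```python
# Sketch: the structured output semantic role labeling produces.
# This frame is hand-written for illustration, not real model output.

sentence = "The chef sliced the onions with a sharp knife."

srl_frame = {
    "verb": "sliced",
    "ARG0": "The chef",                 # who did it
    "ARG1": "the onions",               # to whom/what it was done
    "ARGM-MNR": "with a sharp knife",   # how it was done
}

# Every labeled span is drawn directly from the original sentence.
for span in srl_frame.values():
    assert span in sentence
```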
NLP Moving Forward
NLP applications abound in everyday life, and they will only continue to expand and improve as computers get better at understanding written and spoken human language and acting on it.
We often get asked whether we’re planning to add any non-English NLP algorithms. As much as we would love to train NLP models in other languages, there aren’t many usable training datasets for them. And, due to the linguistic structure of these languages, training with pre-existing approaches doesn’t always give the best results.
Until better training sets can be generated, one passable workaround is to translate the text to English before sending it to the algorithm.
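That workaround can be sketched as a two-step pipeline. The `translate` and `comprehend` parameters stand in for any translation model and any English-only NLP model; the stand-in functions below are assumptions for illustration, not real endpoints:

```python
# Sketch: translate-then-analyze workaround for non-English text.
# `translate` and `comprehend` are injected so any models can be plugged in.

def answer_in_any_language(passage, question, translate, comprehend):
    """Translate both inputs to English, then run an English-only model."""
    return comprehend({"passage": translate(passage),
                       "question": translate(question)})

# Usage with stand-in functions:
fake_translate = lambda text: text.upper()          # pretend "translation"
fake_comprehend = lambda blob: blob["passage"][:5]  # pretend "comprehension"
result = answer_in_any_language("hola mundo", "¿qué?",
                                fake_translate, fake_comprehend)
```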
[Three images of text at increasing difficulty, from easily legible to nearly indecipherable; courtesy of Faris Algosaibi]
The first example can be easily recognized by most character-recognition algorithms. However, as the text grows progressively more complex, this seemingly simple task becomes more and more difficult for even the best machine learning algorithms to complete successfully.
Your website publishes thousands of articles each day. Your writers create stories, embed images, and tag them for SEO purposes. It’s your job to share them out on social media… but you’re struggling to keep up with the volume.
After coming up with a snappy tagline, you still have to select the best image and crop it to different sizes for Facebook, Twitter, LinkedIn, and all the other networks. Using a batch image-cropper might remove something important from the photo — like Elon Musk’s face, or half of the car being featured — so you put in a lot of time cropping and resizing by hand.
What if you had an automated way of handling the image picking and cropping process? Well, there’s now an algorithm for that. Today we’ll talk about how we’ve managed to bring together many different algorithms into a single ensemble that can intelligently select, crop, and score images for social media sharing.
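The ensemble idea can be sketched in a few lines: generate candidate crops, score each with several scorers, and keep the best. The scorers below are toy stand-ins for the real face-detection and saliency models, and the crop format is our own assumption:

```python
# Sketch: an ensemble that scores candidate crops and keeps the best one.
# The scorers here are toy stand-ins, not the real face/saliency models.

def best_crop(candidates, scorers):
    """Return the candidate crop with the highest combined score."""
    return max(candidates, key=lambda c: sum(s(c) for s in scorers))

# Stand-in crops as (left, top, right, bottom) boxes, plus a toy scorer:
crops = [(0, 0, 100, 100), (50, 0, 150, 100), (25, 25, 125, 125)]
centered = lambda c: -abs((c[0] + c[2]) / 2 - 75)  # prefer center near x=75
print(best_crop(crops, [centered]))
```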
Traveling Salesman is one of the classic NP-hard problems: finding the optimal solution can take a long time, but there are some great shortcuts that come close. Algorithmia now brings you a fast, near-optimal way to find the fastest route through multiple cities, thanks to the power of genetic algorithms and easily accessible APIs.
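To give a feel for the genetic-algorithm approach, here is a minimal sketch (our own illustration, not Algorithmia's implementation): evolve a population of candidate tours using order crossover and swap mutation, keeping the shortest tours each generation:

```python
# Sketch of a genetic algorithm for the Traveling Salesman Problem.
# Minimal illustration, not Algorithmia's implementation.
import math
import random

def tour_length(tour, cities):
    """Total length of the closed tour visiting cities in the given order."""
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def order_crossover(p1, p2):
    """Copy a random slice from parent 1, fill the rest in parent 2's order."""
    a, b = sorted(random.sample(range(len(p1)), 2))
    middle = set(p1[a:b])
    fill = iter(c for c in p2 if c not in middle)
    return [p1[i] if a <= i < b else next(fill) for i in range(len(p1))]

def mutate(tour, rate=0.2):
    """Occasionally swap two cities to keep the population diverse."""
    if random.random() < rate:
        i, j = random.sample(range(len(tour)), 2)
        tour[i], tour[j] = tour[j], tour[i]

def solve_tsp(cities, pop_size=60, generations=300):
    n = len(cities)
    pop = [random.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: tour_length(t, cities))
        parents = pop[: pop_size // 2]   # truncation selection (keeps the best)
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = random.sample(parents, 2)
            child = order_crossover(p1, p2)
            mutate(child)
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda t: tour_length(t, cities))

random.seed(0)
cities = [(random.random(), random.random()) for _ in range(15)]
best = solve_tsp(cities)
```

Because the best tours always survive into the next generation, the champion tour's length never gets worse; the crossover and mutation operators both preserve the tour as a valid permutation of the cities.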