All posts by James Sutton

Document Classifier: use cases for your business

[Image: document classifier. Source: TDS]

We recently went into detail about the Document Classifier algorithm in our spotlight. That’s all well and good, but it’s not immediately clear what you can do with it.

In this post, we’ll focus on potential use cases. We’ll start with a quick refresher on what this algorithm does, and then look at concrete examples of real-world problems that this algorithm can tackle – and why it makes sense for you to give it a go. Read More…

Challenges productionizing embedding engines

[Image: what is an embedding]

As many applied ML practitioners know, productionizing ML tools can be deceptively difficult.

At Algorithmia we’re always striving to make our algorithms best in class. We recently made a series of performance and UX changes to our Document Classifier algorithm, and worked towards generalizing it to problem spaces outside of NLP. The changes were dramatic: we reduced our lookup time from O(n) to O(log(n)) and drastically improved the user experience by reducing unnecessary clutter. But it was far from easy.
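This excerpt doesn't say how the lookup was sped up. Purely to illustrate what a move from O(n) to O(log(n)) means, here is a minimal Python sketch of a nearest-value lookup. The function names and the one-dimensional data are hypothetical, and this is not the Document Classifier's actual implementation: real embeddings are high-dimensional and would need a tree or index structure rather than a sorted list.

```python
from bisect import bisect_left

def nearest_linear(values, query):
    """O(n): compare the query against every stored value."""
    return min(values, key=lambda v: abs(v - query))

def nearest_sorted(sorted_values, query):
    """O(log n): binary-search a pre-sorted list, then check the
    (at most two) neighbours around the insertion point."""
    i = bisect_left(sorted_values, query)
    candidates = sorted_values[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda v: abs(v - query))

# Hypothetical 1-D "embeddings", assumed non-empty.
values = [0.9, 0.1, 0.5, 0.3]
index = sorted(values)  # one-off O(n log n) cost, paid before any lookups

print(nearest_linear(values, 0.45))
print(nearest_sorted(index, 0.45))
```

The trade-off is the usual one: the sorted index costs extra work up front, and each lookup afterwards touches only a handful of entries instead of all of them.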

Read More…

Advanced grammar and Natural Language Processing with Syntaxnet

[Image: Parsey McParseface]

Let’s play a game: can you tell the difference between these two sentences?

“Most of the time, travellers worry about their luggage.”

“Most of the time travellers worry about their luggage.”

Whoa, remove the comma and all of a sudden we’re having an entirely different conversation!

The little nuances of language can be hard enough for a human to understand, let alone a computer! How could we possibly teach a computer to understand the difference?

Read More…

Introduction to Character Recognition

This is easy to understand, right?

[Image: easy OCR example]

How about this? A bit harder?

[Image: moderately difficult natural-scene text]

Are you able to decipher this one at all?

[Image: hard natural-scene text, courtesy of Faris Algosaibi]

The first example can be easily recognized by most character recognition algorithms. However, as your text gets progressively more complex, this seemingly simple task becomes more and more difficult for even the best machine learning algorithms to successfully complete. Read More…