All posts by James Sutton

Challenges productionizing embedding engines

[Image: what is an embedding]

As many applied ML practitioners know, productionizing ML tools can be deceptively difficult.

At Algorithmia we’re always striving to make our algorithms best in class. We recently made a series of performance and UX changes to our Document Classifier algorithm and began work on generalizing it to problem spaces outside of NLP. These changes were dramatic: we reduced lookup time from O(n) to O(log(n)) and drastically improved the user experience by removing unnecessary clutter. But it was far from easy.
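The excerpt doesn't say how the lookup was restructured. As a generic illustration of what moving from O(n) to O(log(n)) lookup means, here is a Python sketch that compares a linear scan with a binary search over sorted keys, using the standard `bisect` module. The function names and keys are hypothetical. Real nearest-neighbour search over embedding vectors needs more specialized index structures, so treat this only as an illustration of the two complexity classes, not as Algorithmia's implementation.

```python
import bisect


def linear_lookup(keys, target):
    """O(n): check every key in turn until a match is found (hypothetical helper)."""
    for i, key in enumerate(keys):
        if key == target:
            return i
    return -1


def log_lookup(sorted_keys, target):
    """O(log(n)): binary search; only correct if the keys are already sorted."""
    i = bisect.bisect_left(sorted_keys, target)
    if i < len(sorted_keys) and sorted_keys[i] == target:
        return i
    return -1


# Toy keys, sorted once up front so that binary search is valid.
keys = sorted(["classifier", "document", "embedding", "keyword", "model"])

print(linear_lookup(keys, "keyword"))  # 3
print(log_lookup(keys, "keyword"))     # 3
```

Both functions return the same index; the difference is that the binary search halves the remaining range on each step instead of visiting every element, at the cost of keeping the keys sorted.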


Advanced grammar and Natural Language Processing with Syntaxnet

[Image: Parsey McParseface]

Let’s play a game: can you tell the difference between these two sentences?

“Most of the time, travellers worry about their luggage.”

“Most of the time travellers worry about their luggage.”

Whoa, remove the comma and all of a sudden we’re talking about time travellers, an entirely different conversation!

The little nuances of language can be hard enough for a human to understand, let alone a computer! How could we possibly teach a computer to understand the difference?


Introduction to Character Recognition

This is easy to understand, right?

[Image: easy OCR]

How about this? A bit harder?

[Image: moderate natural]

Are you able to decipher this one at all?

[Image: hard natural, courtesy of Faris Algosaibi]

The first example can be recognized easily by most character recognition algorithms. However, as text becomes progressively more complex, this seemingly simple task becomes harder and harder for even the best machine learning algorithms to complete.

Train a Machine to Turn Documents into Keywords, via Document Classification

Figuring out the meaning of a document was once a very hard problem for computers to solve. Even for humans, understanding the complexity of natural language can be tricky!

Fortunately, there are some great tools that can help with this problem. The Document Classifier turns your existing documents and their associated keywords into a model that predicts the most appropriate keywords for new blocks of text.
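The excerpt doesn't describe the Document Classifier's internals or API. As a hedged illustration of the general idea (documents plus keywords in, keyword predictions for new text out), here is a minimal bag-of-words sketch in Python: it builds one word-count profile per keyword and ranks keywords by cosine similarity to the new text. The functions `train` and `predict` and the toy training data are hypothetical, and this is not the algorithm Algorithmia uses.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(documents):
    """Hypothetical trainer: one word-count profile per keyword from (text, keywords) pairs."""
    profiles = defaultdict(Counter)
    for text, keywords in documents:
        words = Counter(tokenize(text))
        for keyword in keywords:
            profiles[keyword].update(words)
    return profiles


def predict(profiles, text, top_k=2):
    """Hypothetical predictor: rank keywords by profile similarity to the new text."""
    words = Counter(tokenize(text))
    scores = {kw: cosine(words, profile) for kw, profile in profiles.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy training data: existing documents and their associated keywords.
training = [
    ("The team won the match in the final minute", ["sports"]),
    ("The striker scored two goals in the match", ["sports"]),
    ("Stocks fell as the market reacted to interest rates", ["finance"]),
    ("Investors moved money into bonds as rates rose", ["finance"]),
]
model = train(training)

print(predict(model, "Bond market reacts to interest rates", top_k=1))  # ['finance']
print(predict(model, "Late goals decided the match", top_k=1))          # ['sports']
```

A production classifier would at least drop stop words and weight terms (for example with TF-IDF); raw counts are used here only to keep the sketch short.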