As many applied ML practitioners know, productionizing ML tools can be deceptively difficult.
At Algorithmia we’re always striving to make our algorithms best in class. We recently made a series of performance and UX changes to our Document Classifier algorithm, and worked toward generalizing it to problem spaces outside of NLP. These changes were dramatic; we reduced our lookup time from O(log(n)) and drastically improved the user experience by reducing unnecessary clutter, but it was far from easy.
Let’s play a game: can you tell the difference between these two sentences?
“Most of the time, travellers worry about their luggage.”
“Most of the time travellers worry about their luggage.”
Whoa, remove the comma and all of a sudden we’re having an entirely different conversation!
The little nuances of language can be hard enough for a human to understand, let alone a computer! How could we possibly teach a computer to understand the difference?
The new Open Images dataset gives us everything we need to train computer vision models, and just happens to be perfect for a demo! TensorFlow’s Object Detection API and its ability to handle large volumes of data make it a perfect choice, so let’s jump right in…
This is easy to understand, right?
How about this? A bit harder?
Are you able to decipher this one at all?
courtesy of Faris Algosaibi
The first example can be easily recognized by most character recognition algorithms. However, as your text gets progressively more complex, this seemingly simple task becomes more and more difficult for even the best machine learning algorithms to successfully complete.
Figuring out the meaning of a document was once a very hard problem for computers to solve… even for humans, understanding the complexity of natural language can be tricky!
Fortunately, there are some great tools that can help address those concerns. The Document Classifier turns your existing documents and associated keywords into a model which can be used to predict the most appropriate keywords for new blocks of text.
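To make the idea of "documents plus keywords in, keyword predictions out" concrete, here is a minimal Python sketch. It is not the Document Classifier's actual implementation or API: the functions `tokenize`, `train`, and `predict`, the word-overlap scoring, and the two sample documents are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_docs):
    """Build a word-count profile per keyword from (text, keywords) pairs."""
    profiles = defaultdict(Counter)
    for text, keywords in labeled_docs:
        tokens = tokenize(text)
        for keyword in keywords:
            profiles[keyword].update(tokens)
    return profiles


def predict(profiles, text, top_n=1):
    """Rank keywords by the fraction of each profile's word counts found in the new text."""
    tokens = set(tokenize(text))
    scores = {}
    for keyword, counts in profiles.items():
        total = sum(counts.values())
        overlap = sum(c for word, c in counts.items() if word in tokens)
        scores[keyword] = overlap / total if total else 0.0
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda k: (-scores[k], k))
    return ranked[:top_n]


# Hypothetical training data: each document comes with its keywords.
docs = [
    ("The striker scored a late goal in the football match", ["sports"]),
    ("The central bank raised interest rates again", ["finance"]),
]
model = train(docs)
print(predict(model, "Who scored the winning goal?"))  # ['sports']
```

A real classifier would weight words more carefully (common words like "the" dominate this toy score), but the train-then-predict shape is the same.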