Part of making algorithms more discoverable is creating meta-data tags to classify them. Often sites will allow users to pick their own tags but what if the content had already been generated? This is the problem we faced when trying to tag all the algorithms in our API. Each algorithm had a description page and we believed that using some simple machine learning algorithms already in our API we could generate tags for each one.
By generating tag data, it becomes easy to classify documents, make recommendations, optimize SEO, etc. Below we show how we approached this task, using HackerNews as an example data source.
So how did we do this? Our secret sauce is that Algorithmia is designed to make it extremely easy to combine algorithms, to create a pipeline for processing and generating tags for almost any site.
- Given a site, pull the data and iterate over every link
- Extract the text from each linked page
- Run the text though topic analysis algorithm (such as Latent Dirichlet Allocation)
- Return tagged data
- Render tags next to links
All this really is, is a pipeline of algorithms (plus some clever front-end js :P). In most cases, this would require some serious code stitching and algorithm development, but most of the components already existed in the Algorithmia API, and the solution was extremely clean, check it out:
This also works for arbitrary webpages. Enter a URL below to automatically generate tags for it:
Once we had these its becomes easy to classify algorithms, make recommendations and relate one algorithm to another. We’ll leave that for the next post…
Get topic tags for your site dynamically – contact us !
Highly complex algorithm development is often at the core of innovative software products. The most famous algorithm of the last decade led to the development of the $396 billion dollar titan Google, which changed our lives forever with a simple search box.
Today, an estimated 80% of equity trading in the U.S. is done using automated algorithms. Even our phones use algorithms to figure out where we are, what we are doing, and what we might want to do next. A few days ago a company, DeepMind, that developed the next generation of Artificial Intelligence algorithms was bought for a record sum. These examples are just scratching the surface of what algorithms can do.
As the volumes of data we collect on a day-to-day basis grow, data architects are on a race to have more, better, and faster hardware; yet Moore’s Law is showing a slow down, indicating that throwing more hardware at a data problem is no longer going to fulfill the needs of organizations. As organizations are increasingly and continuously hungry for understanding on how they operate and how they can improve, they must concentrate more on the efficiency and quality of the underlying algorithms that compose suites of data analysis packages versus the hardware they run on.
The Current State of Algorithm Development
It’s safe to say that on any given day there are thousands of brilliant computer scientists developing the state of the art and pushing the limits of software, yet there lies a serious problem: Algorithms are being developed all the time, but are not getting into the hands of people and applications that could benefit from them.
- A vast majority of algorithmic developments occurs in one of two places: either private Research and Development programs (e.g., Microsoft Research, Google, etc.) or in one of more than 320+ academic institutions with computer science departments around the world. The output of these types of research and development is almost always private white papers or academic papers.
- Companies and application developers have a hard time taking advantage of advanced algorithms. Academic papers are hard to find and even harder to implement. Even in the best case, when published code is available, more time is spent setting up infrastructure and fighting with undocumented libraries and dependencies than actually doing any algorithm development.
For example, a while back I was researching Latent Dirichlet Allocation. In layman’s terms, it is an algorithm that allows for the extraction of topics from documents without an understanding of the language itself. A quick search on Google brought up a dozen research papers and a couple of libraries where this algorithm had been implemented. So where do I start? I guess I could go ahead and implement one of the libraries, but which one? Do I even know if this is going to work? Once I do figure it out will it work in my current system?
The Birth of Algorithmia
Algorithmia was born out of frustration with these problems and the current state of algorithm development and deployment. Algorithmia is moving away from developers working in isolation and toward providing a community for algorithm developers to share knowledge, test algorithms, and run them directly in their applications. What’s different is that every algorithm is live, so no more spending time finding the right libraries, compilers, and/or virtual machines—Algorithmia takes care of all of that for you. Find a useful algorithm in our API? Great, use that in your own applications.
Algorithmia is a live, crowd-sourced algorithm API. Our goal with Algorithmia is to make applications smarter, by building a community around algorithm development, where state-of-the-art algorithms are always live and accessible to anyone.
Intrigued? Come check us out.