An algorithmic approach to GitHub exploration
In our last blog post, we showed you how we used some of the algorithms in Algorithmia to generate topic tags for any URL. Internally, we used the topic generation algorithm to generate tags from each algorithm’s description and later used these tags as part of our recommender algorithm.
With today’s post, we want to show you how easy it is to integrate this type of recommender algorithm (one already available in Algorithmia) into your own workflow by showing it in action with one of our favorite developer tools, GitHub.
As GitHub has grown to almost 50 million repositories since 2011 (per our calculations), discovering new repositories has become less and less straightforward once you navigate outside the most popular or featured ones. With this in mind, we looked for a way to tackle some of this complexity using algorithms available in Algorithmia.
So here you go: give us any GitHub repository (as long as it has a README) and we will recommend other repositories based on the information we extract from the README.
How we built this
- The first step was to figure out the URL for every repository on GitHub. This might seem like a daunting task, but the folks over at GitHub have been nice enough to make their entire data set available on Google BigQuery. You can head over there and generate a list of every public repository since 2011.
- The second step was to fetch every public repository’s README file and run it through Algorithmia’s topic analysis algorithm.
- Once the topic analysis algorithm returns a set of tags, we save them as a mapping from URL -> [tags]. This mapping is the data model we later use for our recommender algorithm.
- Repeat this a couple million times (not the easiest task, but the Algorithmia platform gave us a virtually unlimited cluster to parallelize the work), and voilà: a tagged data model of the entire GitHub world.
- Finally, it’s time to put our recommender algorithm to work. First, we point it to the data model we built; then we send it the new set of tags generated from the URL you provided. Given these two inputs, the recommender algorithm returns a number of relevant repositories based on the automatically generated tags.
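To make the last two steps a bit more concrete, here is a minimal Python sketch of a tag-based recommender working on a URL -> [tags] mapping. This is not the recommender algorithm hosted on Algorithmia; we are assuming a simple Jaccard overlap between tag sets as a stand-in for the ranking. The function names (build_model, jaccard, recommend) and the sample repository URLs are made up for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple


def build_model(tagged: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Normalize a URL -> [tags] mapping into URL -> set of lowercase tags."""
    return {
        url: {t.strip().lower() for t in tags if t.strip()}
        for url, tags in tagged.items()
    }


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: shared tags divided by all distinct tags."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(
    model: Dict[str, Set[str]],
    query_tags: List[str],
    top_n: int = 3,
    exclude: Optional[str] = None,
) -> List[Tuple[str, float]]:
    """Return up to top_n (url, score) pairs, best match first.

    Repositories that share no tag with the query are dropped, and
    `exclude` lets the caller skip the repository being queried.
    """
    query = {t.strip().lower() for t in query_tags if t.strip()}
    scored = [
        (url, jaccard(query, tags))
        for url, tags in model.items()
        if url != exclude
    ]
    scored = [(url, score) for url, score in scored if score > 0]
    # Highest score first; ties are broken alphabetically by URL.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


if __name__ == "__main__":
    # Hypothetical sample data standing in for the real tagged data model.
    sample = build_model({
        "https://github.com/example/web-framework": ["Python", "web", "http"],
        "https://github.com/example/http-client": ["python", "http", "requests"],
        "https://github.com/example/image-tools": ["images", "cpp"],
    })
    for url, score in recommend(sample, ["python", "web", "http"], top_n=2):
        print(f"{score:.2f}  {url}")
```

In this sketch the data model is just an in-memory dictionary; at the scale of millions of repositories you would more likely want an inverted index from tag to URLs, so that only repositories sharing at least one tag get scored.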
— Your friends at Algorithmia