Incorporating Datasets from data.world in your Algorithms

We are happy to share a new set of algorithms from our partner data.world. If you aren't familiar with data.world, they have an amazing marketplace of datasets covering a wide spectrum of topics. I can't think of a better pairing for these datasets than the many algorithms available in the Algorithmia directory!

It's very easy to consume and publish data.world datasets via Algorithmia. The data.world team has published four helper utility algorithms that you can use in your own algorithms. Since you can easily compose, chain, and pipe output across multiple algorithms, you'll have many possibilities for processing datasets from data.world. You can find all of their new algorithms on their organization page on Algorithmia.
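Chaining works because one algorithm's output can be handed straight to the next algorithm as its input. The minimal Python sketch below models that idea locally: each stage is a plain function, and `algorithm_stage` is a hypothetical helper (not part of the Algorithmia client) that wraps the `client.algo(name).pipe(x).result` call pattern used later in this post, so hosted algorithms and local functions could be mixed in one pipeline.

```python
from functools import reduce
from typing import Any, Callable, Iterable

# A stage takes one input and returns one output, like a single algorithm call.
Stage = Callable[[Any], Any]


def run_pipeline(stages: Iterable[Stage], data: Any) -> Any:
    """Feed `data` through each stage in order, piping each output into the next stage."""
    return reduce(lambda acc, stage: stage(acc), stages, data)


def algorithm_stage(client: Any, algo_name: str) -> Stage:
    """Hypothetical helper: wrap a hosted algorithm as a stage.

    Uses the client.algo(name).pipe(x).result pattern from the Algorithmia
    Python client; `client` would come from Algorithmia.client().
    """
    return lambda x: client.algo(algo_name).pipe(x).result


if __name__ == "__main__":
    # Local stand-ins for hosted algorithms, so the sketch runs offline.
    def extract(rows):
        return [row["attendance"] for row in rows]

    def summarize(values):
        return {"min": min(values), "max": max(values)}

    rows = [{"attendance": 700000}, {"attendance": 750000}]
    print(run_pipeline([extract, summarize], rows))  # {'min': 700000, 'max': 750000}
```

Keeping each stage as a one-argument function is what makes the stages interchangeable: a local function can be swapped for a hosted algorithm without changing the pipeline code.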

First-Time Configuration

Before you start incorporating datasets or publishing to your data.world datasets, you must configure your Algorithmia account to store your data.world credentials. The easiest way is with the data.world "configure" helper algorithm, which stores your credentials in a known location in your Hosted Data storage on Algorithmia. Call it once, and you're ready to go!

Using Open Datasets in Your Algorithm Pipeline

Once your credentials are configured, you can call the data.world "query" helper algorithm to pull data from data.world directly into any algorithm you create.
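The request sent to the query helper is just a dictionary, and the rows come back under a `"data"` key. The Python sketch below separates those two steps, reusing the request fields from this post's Lakers example (`dataset_key`, `query`, `query_type`, `parameters`). The function names are hypothetical, and the response shape (a list of row dictionaries under `"data"`) is an assumption based on how the example reads the result.

```python
from typing import Any, Callable, Dict, List

Request = Dict[str, Any]


def build_query_request(dataset_key: str, sql: str) -> Request:
    """Build a query-helper input using the same fields as this post's example."""
    return {
        "dataset_key": dataset_key,  # e.g. "gmoney/nba-team-annual-attendance"
        "query": sql,
        "query_type": "sql",
        "parameters": [],
    }


def fetch_column(pipe: Callable[[Request], Dict[str, Any]],
                 request: Request, column: str) -> List[Any]:
    """Send `request` through `pipe` and return one column from the returned rows.

    Assumes the response is a dict whose "data" key holds a list of row dicts,
    matching the `.result["data"]` access in the example. Inside a hosted
    algorithm, `pipe` could be:
        lambda req: client.algo("datadotworld/query").pipe(req).result
    """
    rows = pipe(request)["data"]
    return [row[column] for row in rows]
```

Passing `pipe` in as a function keeps the row-handling logic testable against a fake response, without credentials or network access.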

Let's put together a quick NBA Annual Team Attendance History analysis function that takes advantage of several of the most popular Time Series algorithms available on Algorithmia. There's an awesome dataset on data.world from Gabe Salzer with attendance metrics, which lets us home in on the Lakers.

import Algorithmia


def apply(input):
    # `input` is not used here; the query below is hard-coded for the Lakers.
    client = Algorithmia.client()

    query_request = {
        "dataset_key": "gmoney/nba-team-annual-attendance",
        "query": "SELECT home_total_attendance FROM `nba_team_annual_attendance` WHERE team='Lakers'",
        "query_type": "sql",
        "parameters": []
    }

    # load the dataset rows from data.world via the query helper algorithm
    algo = client.algo("datadotworld/query")
    dataset = algo.pipe(query_request).result["data"]

    # process the dataset: extract the attendance values and summarize them as a time series
    all_values = [d["home_total_attendance"] for d in dataset]
    metrics = client.algo("TimeSeries/TimeSeriesSummary").pipe({"uniformData": all_values}).result

    return metrics

This algorithm then returns a summary of Time Series metrics for the annual attendance of the Lakers:

{
  "correlation": 0.33052435441847194,
  "geometricMean": 766172.0799467923,
  "intercept": 747591.4338235295,
  "kurtosis": 15.675479207885754,
  "max": 778877,
  "mean": 767146.0625000001,
  "min": 626901,
  "populationVariance": 1322295814.9335918,
  "rmse": 34319.671109063434,
  "skewness": -3.9439454970253087,
  "slope": 2607.283823529394,
  "standardDeviation": 37555.94319495252,
  "var": 1410448869.262498
}

Very nice!
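To make these metrics less opaque, the Python sketch below recomputes a subset of them locally. It assumes, without confirmation from the TimeSeriesSummary documentation, that `slope`, `intercept`, and `correlation` come from an ordinary least-squares fit of the values against their index 0, 1, ..., n-1, that `var` is the sample variance (dividing by n - 1), and that `populationVariance` divides by n. The output above is consistent with those definitions: `var` / `populationVariance` is 16/15, implying 16 seasons, `intercept` plus 7.5 times `slope` equals `mean` (7.5 being the mean of the indices 0 through 15), and `standardDeviation` is the square root of `var`.

```python
import math
from typing import Dict, Sequence


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Recompute a subset of the TimeSeriesSummary metrics (assumed definitions).

    Regression metrics fit the values against their index 0..n-1 by ordinary
    least squares; "var" divides by n - 1, "populationVariance" by n.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    xs = range(n)
    mean_x = (n - 1) / 2          # mean of the indices 0..n-1
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean_y,
        "populationVariance": syy / n,
        "var": syy / (n - 1),
        "standardDeviation": math.sqrt(syy / (n - 1)),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
        # Pearson correlation between index and value; undefined for a constant series
        "correlation": sxy / math.sqrt(sxx * syy) if syy else float("nan"),
    }
```

For example, `summarize([1.0, 2.0, 3.0, 4.0])` gives a slope of 1.0, an intercept of 1.0, and a correlation of 1.0, since those values lie exactly on the line y = x + 1.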

There is a really awesome interactive Time Series algorithms demo if you want to check out how they work with other types of datasets.

I'm impressed by the number of datasets already available in the data.world directory. We are excited to see how developers will pair the algorithms and microservices available from Algorithmia with the wealth of datasets at data.world! To show how much we appreciate members of the community, we have a special promo code for 100,000 additional credits after signing up for Algorithmia: DATADOTWORLD

We’re looking forward to seeing how you make use of the new functions!

Head of Product at Algorithmia, empowering developers and data scientists to create production-ready algorithms and ensuring they are discoverable by developers around the globe building amazing apps & services in need of them.
