Building a Timeline of your Video: Automatically Identify Objects, Sequence Times, and Integrate with Timeline.js

When we implemented InceptionNet, a microservice to detect and label objects (features) in photos, we knew it would be helpful. Then, we built out VideoMetadataExtraction, a video pipeline which allows you to run feature-detection algorithms (and others) on an entire video. This allowed for some really powerful activities — like automatically scanning through home security footage to find all the cars of a specific make & model, or stripping out all the nudity-containing scenes of a movie to make a G-rated version.

Today, we’ll go further by showing you how to visualize all the features in your video, thanks to the VideoTagSequencer and Timeline.js, a beautiful JavaScript library for displaying timelines on the web.

If you want to skip directly to the demo, please do, but come back afterward for a full breakdown of the integration pipeline and code samples!

Fundamentals: what tools will we be using, and why?

Algorithmia provides a ton of algorithms, from simple utilities to complex machine learning tools, and exposes them as microservices. This means that, with just a few lines of boilerplate code, we can run some extremely powerful algos. For example, if we have a photo and want to get a list of all the features (real objects such as chairs or people) contained in the image, all we need to write is:

var input = "http://i.imgur.com/YKDmneL.jpg";
Algorithmia.client("your_api_key")
  .algo("algo://deeplearning/InceptionNet/1.0.4")
  .pipe(input)
  .then(function(output) {
    console.log(output);
  });

This calls the InceptionNet service, pipes in the image, and prints the results… and while this example is in JavaScript, Algorithmia can be run from just about any language. Everything runs in Algorithmia’s scalable cloud, so there’s no need to buy or rent a whole machine to get some significant horsepower. There’s also no need to spend a ton of time building up expertise in machine learning, training and testing a model, etc. Algorithmia has done that already, allowing users to immediately take advantage of GPU-enabled deep learning without spending time reinventing the wheel.

InceptionNet works well for an individual image, but in order to extract features from entire videos, we need to break the video into individual frames, run InceptionNet on each frame, then combine the results. This sounds annoying… but fortunately, there’s an algorithm for that. VideoMetadataExtraction accepts a video, runs each frame of the video against an image algorithm that you specify, and returns the aggregated results:

var input = {
  "input_file": "data://media/videos/kenny_test.mp4",            // source video
  "output_file": "data://.algo/temp/kenny_test_inception.json",  // where the frame-by-frame results are written
  "algorithm": "deeplearning/InceptionNet",                      // image algorithm to run on each frame
  "fps": 10,                                                     // sample 10 frames per second
  "advanced_input": "$SINGLE_INPUT"                              // pass each frame as the algorithm's sole input
};
Algorithmia.client("your_api_key")
  .algo("algo://media/VideoMetadataExtraction/0.5.6")
  .pipe(input)
  .then(function(output) {
    console.log(output);
  });

Great! This creates a file containing frame-by-frame lists of what objects are shown at each timepoint. For example, it tells us that at 1.2 seconds into the video, there’s a volcano. It also tells us that there is one at 1.3s, and at 1.4s, etc. This is a useful way to look at the data if we want to know what’s going on at each point in time, but it’s a cumbersome format for building timelines. For that, we want results which give time ranges for each feature: “there’s a volcano in this video from 1.2s until 3.2s”.
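To give a feel for that frame-by-frame output, here’s an illustrative sketch of what the aggregated file contains. The exact field names depend on what InceptionNet returns for each frame, so treat this as an assumption-laden example rather than a verbatim dump; it’s also why the sequencer call below has to be told which keys hold the label and the confidence:

[
  {
    "timestamp": 1.2,
    "tags": [
      { "class": "volcano", "confidence": 0.74 },
      { "class": "lakeside", "confidence": 0.08 }
    ]
  },
  {
    "timestamp": 1.3,
    "tags": [
      { "class": "volcano", "confidence": 0.72 }
    ]
  }
]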

Normally this would require us to spend a day or two mucking about with the data format, finding all the edge cases, and coming up with some custom code to reprocess the output into a usable form… but of course, there’s an algorithm for that!

var input = {
  "source": "data://.algo/media/VideoMetadataExtraction/temp/kenny_test_inception.json",
  "tag_key": "class",                   // which key in each detection holds the label
  "confidence_key": "confidence",       // which key holds the confidence score
  "traversal_path": {"tags": "$ROOT"},  // where the detections live within each frame's result
  "minimum_confidence": 0.1,            // ignore detections below this confidence
  "minimum_sequence_length": 5          // drop sequences shorter than this many frames
};
Algorithmia.client("your_api_key")
  .algo("algo://media/VideoTagSequencer/0.1.8")
  .pipe(input)
  .then(function(output) {
    console.log(output);
  });

We take the output from our call to VideoMetadataExtraction, send it into VideoTagSequencer with some algorithm-specific parameters (see the algorithm’s documentation for more examples), and get back a list of sequences which look like this:

{
  "tag": {"class": "volcano"},
  "sequences": [
    {
      "number_of_frames": 20,
      "start_time": 1.206,
      "stop_time": 3.218
    }
  ]
}

Perfect! Now we have the data in the format we need, and just need a way to visualize it.

Integrating with Timeline.js: the power of JSON and ready-to-run libraries

Timeline.js has a bunch of powerful features, allowing you to construct both a layered timeline and a slider-style view of each event. We’ll be using just a small part of that capability, but Timeline.js doesn’t seem to have a way to remove the components we don’t need, so we’ll simply hide them with a quick CSS rule:

.tl-storyslider, .tl-menubar {
  display: none;
}

Next, we need to transform our output from VideoTagSequencer into a JSON block that Timeline.js understands. Fortunately, the spec and examples are pretty good. Using this, we can put together a fairly simple loop to iterate through our results and build out the required structure:

var events = [];
for (var i in sequencerResults) {
  var s = sequencerResults[i];
  // s.tag is an object such as {"class": "volcano"}; pull out the label text
  for (var j in s.tag) {
    // each label can show up in several separate time ranges
    for (var k in s.sequences) {
      var event = {
        text: { text: s.tag[j] }
      };
      // Timeline.js expects dates, so we treat seconds as years (see below)
      event.start_date = {
        year: Math.floor(s.sequences[k].start_time)
      };
      event.end_date = {
        year: Math.floor(s.sequences[k].stop_time)
      };
      events.push(event);
    }
  }
}

Since Timeline.js doesn’t work well with items which are only a few seconds apart, we’ll simply pretend that the seconds are years. While this doesn’t make informational sense, it does space things out properly on the timeline, and that’s all we care about right now. We add a bit of boilerplate code for embedding and initializing the timeline, plus some HTML to list the features in a nice table, and we’re done.
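For completeness, here’s a rough sketch of that embedding boilerplate. It uses the standard Timeline.js constructor, new TL.Timeline(containerId, data); the container id, asset paths, and title text below are placeholders rather than anything from the original demo:

<div id="timeline-embed" style="width: 100%; height: 400px;"></div>

<link rel="stylesheet" href="timeline.css">  <!-- the Timeline.js stylesheet, wherever you host it -->
<script src="timeline.js"></script>          <!-- the Timeline.js library itself -->
<script>
  // "events" is the array we built in the loop above
  var timelineData = {
    title: { text: { headline: "Detected features" } },  // placeholder title
    events: events
  };
  new TL.Timeline("timeline-embed", timelineData);
</script>

With the CSS rule from earlier hiding the slider and menu bar, what’s left is just the bare timeline populated with our detected features.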

Try it yourself!

Now that you’ve seen how to build it, try out the demo and grab the source code, or build your own nifty tool from scratch, picking from any of our 3500+ algorithms and enjoying 5,000 free credits per month. If you’d like more examples, check out our collection of recipes as well. Enjoy!
