Introduction to Video Transform

At Algorithmia, we have strived to develop a variety of powerful and useful image transformation algorithms that utilize cutting-edge machine learning techniques. These are the building blocks which let any developer build more complex algorithms and solve harder problems, regardless of their preferred language and development platform.

Video Transform is a direct extension of this work. It allows users to transform videos on a frame-by-frame basis, using any existing or future image transformation algorithm on the Algorithmia marketplace.

What the heck is Video Transform?

Video Transform is an algorithm (written in Rust, but callable from any language) that utilizes ffmpeg to:

• split any video into batches of frames
• process each batch in parallel, using any chosen image processing algorithm
• combine the transformed frames back into a video, re-attaching any audio/subtitle streams from the source video along the way
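The split/process/recombine flow above can be sketched in pure Python. This is a minimal illustration of the batching pattern, not Video Transform's actual implementation: `transform_batch` is a hypothetical stand-in for whatever image algorithm you'd apply, and the real algorithm uses ffmpeg to extract and reassemble frames.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_batches(frames, batch_size):
    """Split a sequence of frames into fixed-size batches."""
    return [frames[i:i + batch_size] for i in range(0, len(frames), batch_size)]

def transform_batch(batch):
    """Stand-in for an image transformation applied to one batch of frames."""
    return [f"transformed:{frame}" for frame in batch]

def transform_video(frames, batch_size=4):
    batches = split_into_batches(frames, batch_size)
    # map() preserves batch order, so frames recombine in the original sequence
    with ThreadPoolExecutor() as pool:
        transformed = pool.map(transform_batch, batches)
    # Flatten the processed batches back into a single frame sequence
    return [frame for batch in transformed for frame in batch]
```

Because `map()` yields results in submission order, the output video's frames stay in sync with the source even though batches finish at different times.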

When we say any video, we mean any video! Ffmpeg is spectacular in its universality; combine it with the Smart Video Downloader and you can transform any hosted video with ease!

Why transforming videos matters

Here are some “in the wild” use cases:

• An ecologist, looking to study the movements of whales near the shore, places cameras around a beach and uses Salnet to help detect anomalies in the video streams.

• A tech-savvy artist develops their own Style Transfer model for Deep Filter and then applies it to a larger video project.

• A media company looking to create a live stream of its festival, while preserving the privacy of its guests, employs the Censor Face algorithm to automatically blur faces of participants in real time.

• Police looking to improve the resolution of security camera footage use the Enhance Resolution algorithm to help catch the perpetrator of a crime.

As you may notice, these examples all utilize existing image transformation algorithms on the Algorithmia marketplace. None of these algorithms was originally designed for video processing, yet each works effortlessly with Video Transform; compositional functionality like this is what Algorithmia is all about.

Sample videos

We started with a simple mp4 of a bus station… here’s the raw footage:

Running this through Salnet, we immediately see the “hotspots” that our eyes naturally focus on, including people and objects in motion:

Applying a Deep Filter gives us one of many possible stylized videos:

How Do I Use Video Transform?

It only takes a few lines of code to configure and start Video Transform:

import Algorithmia

client = Algorithmia.client('YOUR_API_KEY_HERE')

input = {
    # Illustrative fields; check the algorithm's documentation for the full input schema
    "input_file": "data://path/to/source_video.mp4",
    "output_file": "data://path/to/transformed_video.mp4",
    "algorithm": "algo://deeplearning/SalNet"  # any image transformation algorithm
}
result = client.algo('media/VideoTransform?timeout=3000').pipe(input).result

Note that we’ve specified timeout=3000 in the .algo() call, allowing it to run for up to 3000 seconds if needed. Once done, it simply returns the location of the generated video file:

    {
        "output_file": "data://.algo/temp/altered_lounge_demo.mp4"
    }

And just like that, you can transform any video using any image processing algorithm, in fewer lines of code than it takes to brew a cup of coffee! In the next spotlight we'll look at Video Transform's sister algorithm, Video Metadata Extraction, and how you can leverage it for even more value.

Try running a Video Transform