Algorithmia Blog

Introduction to Video Tag Sequencing

Sifting through unlabelled videos can be difficult and time-consuming. Even for the most seasoned analyst, fatigue leads to mistakes. Whether you’re trying to detect anomalies in mission-critical infrastructure — or you just want to find all of the segments in your vacation videos that contain ducks — we have a microservice that can help reduce the workload.

What is the Video Tag Sequencer? How does it work?

The VideoTagSequencer is an algorithm that takes the time-series point data generated by VideoMetadataExtraction and converts it into an index of the labels and sequences detected in the video. In a nutshell, it takes frame-by-frame results and converts them into a list of time ranges at which each result occurs.
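
Conceptually, the sequencing step collapses runs of consecutive frames that share a tag into time ranges. Here's a minimal illustrative sketch of that idea in plain Python; the frame data, timings, and function here are made up for demonstration and aren't the VideoTagSequencer's actual implementation:

# Illustrative only: collapse per-frame labels into (start, stop) time ranges.
# The frame data and frame duration are made-up examples.
frames = [
    (0.00, "duck"), (0.04, "duck"), (0.08, "duck"),
    (0.12, None),   (0.16, "duck"), (0.20, "duck"),
]

def to_sequences(frames, frame_duration=0.04):
    sequences = []  # each entry: {"tag", "start_time", "stop_time"}
    current = None
    for timestamp, tag in frames:
        if current is not None and current["tag"] == tag:
            # same tag as the previous frame: extend the current sequence
            current["stop_time"] = timestamp + frame_duration
        else:
            # tag changed: close the current sequence and maybe open a new one
            current = None
            if tag is not None:
                current = {"tag": tag, "start_time": timestamp,
                           "stop_time": timestamp + frame_duration}
                sequences.append(current)
    return sequences

print(to_sequences(frames))
# [{'tag': 'duck', 'start_time': 0.0, 'stop_time': 0.12},
#  {'tag': 'duck', 'start_time': 0.16, 'stop_time': 0.24}]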

By creating an index of labelled sequences, it's possible both to search for particular labels (as with our video search demo) and to build special-purpose detector algorithms like video nudity detection.

The algorithm parses the output JSON schema of whichever image processing algorithm produced the data, rather than having any schemas hard-coded. This means it's capable of working with the vast majority of existing and future image detection and classification algorithms, just like VideoMetadataExtraction and VideoTransform do!

What can I do with Video Tag Sequencer?

Algorithms like VideoMetadataExtraction are incredibly powerful and enable brand-new use cases that weren't possible before, but they can be cumbersome to use directly. Here are some examples of where Video Tag Sequencing can be really beneficial:

Oil Pipeline Inspection

You work for an oil pipeline inspection company, and you're using autonomous drones with thermal cameras to quickly and frequently scan pipelines for defects and leaks. By pairing a specialised image classification algorithm with VideoMetadataExtraction it's possible to quickly generate a huge amount of data, but that raw output isn't easy to understand. The VideoTagSequencer converts that huge volume of point data into an easily digestible, consumable form, letting analysts skip large portions of video and spend their time looking at the more problematic areas.

Age based content filtering

In a world where violence, nudity and sex are common in film, how can you enjoy content you love while knowing that your kids won’t be exposed to stuff that isn’t age appropriate?

By using algorithms such as Video Nudity Detection (which uses Video Tag Sequencer) it’s now possible to find exactly where in a film the problematic scenes are and remove them. Entire companies like VidAngel have sprung up to cater to the market, and now you have the power at your fingertips to do so as well.

How Do I Use the Video Tag Sequencer?
Did the above examples inspire you to try it out? Well, let's get started with a quick example building on the demo from our Introduction to Video Metadata Extraction blog post:

bus stop – raw footage from James Sutton on Vimeo.

In the metadata blog post we ran something like this (client setup added here for completeness):

import Algorithmia

client = Algorithmia.client("YOUR_API_KEY")  # replace with your own API key

input = {
    "input_file": "data://media/videos/bus_stop.mp4",
    "output_file": "data://.my/extractions/bus_video_car_detection.json",
    "algorithm": "algo://LgoBE/CarMakeandModelRecognition/0.3.4",
    # "$SINGLE_INPUT" is the placeholder that VideoMetadataExtraction replaces
    # with each extracted frame; we assume the car recognition algorithm
    # expects that frame under an "image" key
    "advanced_input": {"image": "$SINGLE_INPUT"}
}

result = client.algo('media/VideoMetadataExtraction?timeout=3000').pipe(input).result

This wrote a point data file to data://.my/extractions/bus_video_car_detection.json.
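
If you'd like to peek at that raw point data before sequencing it, you can pull the file down with the Data API. A quick sketch, reusing the client from above:

# Optional: preview the frame-by-frame point data produced above
point_data = client.file("data://.my/extractions/bus_video_car_detection.json").getJson()
print(str(point_data)[:200])  # glance at the structure of the results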

Now we feed that file directly into the Video Tag Sequencer:

input = {
    # the point data file written by VideoMetadataExtraction above
    "source": "data://.my/extractions/bus_video_car_detection.json",
    # the fields that, combined, make up a single tag
    "tag_key": [
        "body_style",
        "make",
        "model",
        "model_year"
    ],
    # the field holding each detection's confidence score
    "confidence_key": "confidence",
    # where the detections live within the JSON document
    "traversal_path": "$ROOT",
    # detections below this confidence are ignored
    "minimum_confidence": 0.45,
    # sequences shorter than this many frames are discarded
    "minimum_sequence_length": 8
}
result = client.algo('media/VideoTagSequencer').pipe(input).result

This provides us with the following sequences:

[  
   {  
      "sequences":[  
         {  
            "mean":0.8335714285714285,
            "mode":1,
            "number_of_frames":13,
            "start_time":16.725041666666666,
            "stop_time":17.22554166666667
         }
      ],
      "tag":{  
         "body_style":"Hatchback",
         "make":"Smart",
         "model":"Forfour",
         "model_year":"2014"
      }
   },
   {  
      "sequences":[  
         {  
            "mean":0.9266666666666666,
            "mode":0.98,
            "number_of_frames":11,
            "start_time":3.878875,
            "stop_time":4.295958333333333
         }
      ],
      "tag":{  
         "body_style":"Sedan",
         "make":"Audi",
         "model":"S6",
         "model_year":"2011"
      }
   },
   {  
      "sequences":[  
         {  
            "mean":0.7682857142857143,
            "mode":0.7,
            "number_of_frames":34,
            "start_time":8.425083333333333,
            "stop_time":9.801458333333333
         },
         {  
            "mean":0.736,
            "mode":0.78,
            "number_of_frames":9,
            "start_time":7.924583333333333,
            "stop_time":8.25825
         }
      ],
      "tag":{  
         "body_style":"Convertible",
         "make":"Bugatti",
         "model":"Veyron 16.4",
         "model_year":"2009"
      }
   }
]
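
From there it's straightforward to turn the sequences into something human-readable. A minimal sketch (assuming result holds the parsed list above) that prints one line per detected range:

# Print a human-readable summary of each detected sequence
for entry in result:
    tag = entry["tag"]
    label = "{model_year} {make} {model} ({body_style})".format(**tag)
    for seq in entry["sequences"]:
        print("{}: {:.2f}s to {:.2f}s ({} frames, mean confidence {:.2f})".format(
            label, seq["start_time"], seq["stop_time"],
            seq["number_of_frames"], seq["mean"]))
# e.g. "2011 Audi S6 (Sedan): 3.88s to 4.30s (11 frames, mean confidence 0.93)"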

And, just like that, we were able to take the labelled frame data from VideoMetadataExtraction and process it into a concise format that’s easy to read and understand. Inspired to give it a try? Take a look at our other video processing algorithms as well: