
Introduction to Video Metadata Extraction

Last week we talked about how Video Transform changed the way users handle video transformation tasks. What’s even better than being able to transform videos at will? Getting actual, structured information out of them! This week we introduce you to Video Transform’s sister, Video Metadata Extraction.

What’s the difference between Metadata Extraction and Transform?

Video Metadata Extraction is a Rust algorithm that works very much like Video Transform. Instead of running algorithms that transform images, however, it runs algorithms that classify or extract information from images, and returns that information as a structured, timestamped JSON array file.

This key difference unlocks a whole universe of potential, allowing us to extract any kind of information from any video, given we have the right image processing algorithm.
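Conceptually, the extractor samples frames from your video, hands each frame to whichever image algorithm you choose, and records that algorithm’s result next to the frame’s timestamp. Here is a minimal sketch of that idea in Python (illustrative only; the real implementation is a Rust algorithm running on Algorithmia, and classify_frame stands in for any image classification or extraction algorithm you might pick):

# Illustrative sketch -- not the actual implementation.
def extract_metadata(frames, classify_frame, fps=24.0):
    """frames: decoded video frames in order; classify_frame: any function
    that takes a frame and returns a JSON-serializable result."""
    results = []
    for index, frame in enumerate(frames):
        results.append({
            "timestamp": index / fps,        # seconds into the video
            "data": classify_frame(frame),   # whatever the image algorithm returns
        })
    return results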

Why extracting information from videos matters

Video is a new frontier in machine learning. Image classification and extraction algorithms have already transformed the way we use the web and spawned new industries, and now video is poised to do the same. Not convinced yet? Let’s walk through some examples:

Detecting and preventing extreme violence on live streams

(Image: a mugshot censored to preserve privacy with CensorFace)

Facebook has been under fire recently because its Facebook Live streaming service has been used to broadcast and glorify heinous acts of sexual violence and murder. Facebook has responded by hiring 3,000 new moderators for the service. However, moderating extreme forms of violence can take its toll on even the sanest minds, leading to mental breakdowns and PTSD.

Imagine a world where Facebook could stream its live broadcast data into a violence detection algorithm on Algorithmia. This would reduce the workload for those moderators and let them focus on the most problematic videos, improving Facebook’s response time and potentially saving lives. If you could save a life by using Video Metadata Extraction, what’s stopping you?

Preserving the ecology of the oceans

(Image courtesy of sntech.co.uk)

When fishing with trawling nets, accidentally catching the wrong fish is so common that it has its own word: bycatch. Because of this, most governments mandate that the amount of bycatch harvested be recorded. Most trawlers carry dedicated crew hired to document the types and amounts of bycatch caught, but they aren’t perfect and sometimes make mistakes, potentially leading to undetected ecological damage that can take decades to fix.

Imagine if there were a video processing tool that could count and identify bycatch accurately and transparently, providing governments the data they need to make informed decisions.

If you could help to save our oceans from overfishing using Video Metadata Extraction, shouldn’t you give it a shot?

Finding new advertising channels

(Image courtesy of socialsongbird.com)

Brands have a difficult time engaging with millennials and younger generations, who are not very receptive to conventional advertising techniques, and that is hurting the bottom line for many businesses. Some advertising companies have figured out that using content creators to champion their products can capture audiences that are normally immune to conventional techniques, and they have been doing this on sites like Instagram for years.

Video Metadata Extraction can be used to extract valuable information that can later be used to capture similarities between videos and determine brand applicability. This process allows you to find relevant videos in unstructured environments like social media websites, so you can identify videos with similar content that caters to the interests of your target audience and demographic.

If you could improve your brand’s value among younger consumers and generate new revenue sources using Video Metadata Extraction, could you afford not to?

As you can see from just a few examples, there are powerful reasons to use Video Metadata Extraction. Not to mention that since it runs on Algorithmia’s infrastructure, it will automatically scale to meet any demand.

How Do I Use Video Metadata Extraction?

Did the above examples inspire you to try it out? Here’s how you can get started with Video Metadata Extraction:

 

Input

import Algorithmia

# Create the Algorithmia client with your own API key
client = Algorithmia.client('YOUR_API_KEY')

input = {
    # Source video, and where the timestamped JSON results will be written
    "input_file": "data://media/videos/lounge_demo.mp4",
    "output_file": "data://.algo/temp/detected_objects.json",
    # The image algorithm to run over each extracted frame
    "algorithm": "algo://LgoBE/CarMakeandModelRecognition/0.3.4",
    # "$SINGLE_INPUT" marks where each frame is injected; it should mirror the
    # wrapped algorithm's input format (assumed here to be a single image path)
    "advanced_input": "$SINGLE_INPUT"
}

result = client.algo('media/VideoMetadataExtraction?timeout=3000').pipe(input).result

Output

[  
   ...,
   {  
      "data":[  
         {  
            "body_style":"SUV",
            "confidence":"0.47",
            "make":"Porsche",
            "model":"Macan",
            "model_year":"2014"
         },
         {  
            "body_style":"SUV",
            "confidence":"0.08",
            "make":"Mercedes-Benz",
            "model":"GLE-Class",
            "model_year":"2015"
         },
         {  
            "body_style":"SUV",
            "confidence":"0.06",
            "make":"Mercedes-Benz",
            "model":"GLC-Class",
            "model_year":"2016"
         }
      ],
      "timestamp":1.5015
   },
   {  
      "data":[  
         {  
            "body_style":"SUV",
            "confidence":"0.70",
            "make":"BMW",
            "model":"X4",
            "model_year":"2014"
         },
         {  
            "body_style":"SUV",
            "confidence":"0.16",
            "make":"BMW",
            "model":"X6",
            "model_year":"2014"
         },
         {  
            "body_style":"SUV",
            "confidence":"0.05",
            "make":"Porsche",
            "model":"Macan",
            "model_year":"2014"
         }
      ],
      "timestamp":1.5432083333333334
   },
   {  
      "data":[  
         {  
            "body_style":"SUV",
            "confidence":"0.29",
            "make":"BMW",
            "model":"X6",
            "model_year":"2014"
         },
         {  
            "body_style":"SUV",
            "confidence":"0.25",
            "make":"BMW",
            "model":"X4",
            "model_year":"2014"
         },
         {  
            "body_style":"SUV",
            "confidence":"0.11",
            "make":"Porsche",
            "model":"Macan",
            "model_year":"2014"
         }
      ],
      "timestamp":1.5849166666666668
   },
   {  
      "data":[  
         {  
            "body_style":"Hatchback",
            "confidence":"0.61",
            "make":"Infiniti",
            "model":"Q30",
            "model_year":"2016"
         },
         {  
            "body_style":"SUV",
            "confidence":"0.32",
            "make":"Acura",
            "model":"MDX",
            "model_year":"2014"
         },
         {  
            "body_style":"SUV",
            "confidence":"0.02",
            "make":"Mercedes-Benz",
            "model":"GLE-Class",
            "model_year":"2015"
         }
      ],
      "timestamp":1.626625
   },
   {  
      "data":[  
         {  
            "body_style":"SUV",
            "confidence":"0.56",
            "make":"Acura",
            "model":"MDX",
            "model_year":"2014"
         },
   ...
]
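Because the output is just a timestamped JSON array, slicing it up afterwards is easy. Here’s a minimal sketch that pulls the results file back down with the Python client and keeps only the frames whose top prediction clears a confidence threshold (the 0.5 cutoff is arbitrary, and you may need to adjust the data:// path to wherever your output_file actually lives):

import Algorithmia

client = Algorithmia.client('YOUR_API_KEY')

# Fetch the JSON array written by Video Metadata Extraction
# (point this at your own "output_file" location)
detections = client.file('data://.algo/temp/detected_objects.json').getJson()

# Keep only frames whose most confident prediction clears an arbitrary 0.5 bar
confident_frames = []
for frame in detections:
    if not frame["data"]:
        continue
    top = max(frame["data"], key=lambda d: float(d["confidence"]))
    if float(top["confidence"]) >= 0.5:
        confident_frames.append((frame["timestamp"], top))

for timestamp, guess in confident_frames:
    print("{:.2f}s: {} {} {} ({})".format(
        timestamp, guess["model_year"], guess["make"], guess["model"], guess["confidence"]))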

The core call is the same number of lines of code as Video Transform. Easy, right? Take a look at some of our image classification algorithms below for inspiration on what’s possible: