Algorithmia Blog - Deploying AI at scale

A Fast Way to Scrape Image URLs from Webpages


Let’s say you’ve created an awesome application that colorizes images. Everybody loves it, but some users are getting errors.

You realize they’re passing the URL of a webpage that contains an image instead of a direct path to the image itself. Your app is expecting a .JPG or .PNG file.

Despite a great description and awesome documentation, users are having issues. They don’t realize that on sites like Facebook or Flickr, the URL to a photo is actually a link to a photo album, not the file.

Without writing a custom web scraper, how could you make it easier for users to colorize their photos?

That’s where the Smart Image Downloader comes in. It supports links for Imgur, Dropbox, Twitter, Google Drive, 500px, and more. With a single API call, this algorithm can find the image on a webpage, download it, and resize it. The image files are stored in an Algorithmia data collection.

Smart Image Downloader makes it easy for your app to parse URLs from users and extract the correct image.
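As a rough sketch of how your app might decide when a user-supplied link needs this treatment, the check below only looks at the URL's path. The function name and the extension list are assumptions made for illustration, not part of the Smart Image Downloader itself.

```python
import os
from urllib.parse import urlparse

# Extensions treated as "direct image" links. This list is an assumption;
# adjust it to whatever your own app accepts.
DIRECT_IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def looks_like_direct_image(url: str) -> bool:
    """Return True when the URL path itself ends in a known image extension."""
    # Only the path is inspected. Query strings and fragments are ignored,
    # so a gallery page whose fragment mentions "photo.jpg" is not mistaken
    # for an image file.
    path = urlparse(url).path
    extension = os.path.splitext(path)[1].lower()
    return extension in DIRECT_IMAGE_EXTENSIONS
```

Links that fail a check like this are the ones worth handing to the Smart Image Downloader instead of rejecting.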

In our case, we used it to improve the user experience of our image colorization service. You might use it to simplify an image processing pipeline for creating thumbnails, or any number of other use cases.

How To Scrape Images

To get started using the algorithm, you’ll need a free API key from Algorithmia.

There are a few different input parameters that you can take advantage of when using the Smart Image Downloader. The only required one is the image URL. Be sure to check out the algorithm description page for more.
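Since only the image URL is required, one way to build the input is to add the optional keys only when they are set. This is a minimal sketch: the helper name `build_request` is hypothetical, and the key names are the ones used in the sample call in this post. Check the algorithm description page for the full list of parameters.

```python
from typing import Any, Dict, Optional


def build_request(image_url: str,
                  resize: Optional[int] = None,
                  image_format: Optional[str] = None) -> Dict[str, Any]:
    """Build an input payload, adding optional keys only when they are set."""
    if not image_url:
        raise ValueError("image_url is required")
    payload: Dict[str, Any] = {"image": image_url}
    if resize is not None:
        payload["resize"] = resize  # target width in pixels, as in this post
    if image_format is not None:
        payload["format"] = image_format  # e.g. "png", as in this post
    return payload
```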

Sample API Call:

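import Algorithmia

# Only "image" is required; "resize" and "format" are optional.
params = {
  "image": "https://en.wikipedia.org/wiki/Artificial_intelligence#/media/File:Kismet_robot_at_MIT_Museum.jpg",
  "resize": 600,
  "format": "png"
}

# Replace 'your_api_key' with your own Algorithmia API key.
client = Algorithmia.client('your_api_key')
algo = client.algo('util/SmartImageDownloader/0.1.17')
# pipe() returns a response object; .result holds the JSON output.
print(algo.pipe(params).result)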

The above is an example of extracting an image from Wikipedia and then automatically resizing the image to 600px wide. If you want a specific output dimension, you could set both the width and height to crop to that size. The format parameter, which can save images as .PNG, .JPG, or .BMP, is set to PNG here.
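If you only set a width, the height you get back appears to follow the original aspect ratio. The arithmetic below is a small sketch under that assumption, using the 2272 x 1704 source size reported in this post's sample output. The exact rounding the algorithm applies is not documented here, so treat `scaled_height` as a hypothetical helper for estimating the result.

```python
def scaled_height(original_width: int, original_height: int, target_width: int) -> int:
    """Height that keeps the aspect ratio when the width becomes target_width."""
    return round(original_height * target_width / original_width)


# 1704 * 600 / 2272 = 450
print(scaled_height(2272, 1704, 600))  # 450
```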

Sample Output:

{
  "originalDimensions": [{"height": 1704, "width": 2272}],
  "resizedDimensions": [{"height": 450, "width": 600}],
  "savePath": [ "data://.algo/util/SmartImageDownloader/temp/e21a5451-7bfc-49bc-a1ab-a945e45869c2.png" ]
}

The algorithm returns JSON with the original dimensions, the resized dimensions, and the path for retrieving your image from the Algorithmia Data API.
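Here is one way you might work with that output. The sketch assumes the result has the shape shown in the sample above, and that `client` is an `Algorithmia.client(...)` instance whose `file(...).getBytes()` call reads from the Data API. Both helper names are made up for illustration.

```python
from typing import Any, Dict, List, Tuple


def summarize_result(result: Dict[str, Any]) -> List[Tuple[str, int, int]]:
    """Pair each saved path with its resized (width, height)."""
    paths = result["savePath"]
    sizes = result["resizedDimensions"]
    return [(path, size["width"], size["height"]) for path, size in zip(paths, sizes)]


def download_image(client: Any, data_uri: str) -> bytes:
    """Fetch the saved file's bytes through the Algorithmia Data API."""
    # Assumption: `client` comes from Algorithmia.client('your_api_key').
    return client.file(data_uri).getBytes()


# The sample output from this post, used as test data.
sample = {
    "originalDimensions": [{"height": 1704, "width": 2272}],
    "resizedDimensions": [{"height": 450, "width": 600}],
    "savePath": ["data://.algo/util/SmartImageDownloader/temp/e21a5451-7bfc-49bc-a1ab-a945e45869c2.png"],
}

for path, width, height in summarize_result(sample):
    print(f"{path} ({width}x{height})")
```

From there, `download_image(client, path)` would give you the bytes to pass along to the next step of your pipeline, such as the colorizer.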

If you’re interested in other image-related posts, check out our guide to building an image pipeline with Amazon S3 or Dropbox. We also have a guide to scraping web data using the AnalyzeURL microservice.

Let us know what you think @Algorithmia.