Algorithmia Blog

Create an Amazon S3 Image Processing Pipeline in Python

Need to create a simple Amazon S3 image processing pipeline to batch edit images? Now that Algorithmia supports Amazon S3 integration, here’s a quick way to automatically create thumbnails with custom dimensions.

In this demo, we’ll use SmartThumbnail, a microservice that uses face detection to perfectly crop every photo to the same size without awkwardly cropped heads or faces.

Manually cropping just a handful of photos isn’t bad, but cropping hundreds or thousands of images in real time would be extremely expensive, time-consuming, and tedious.

So, instead of cropping by hand to make sure every face in every photo is preserved, we can run all the photos through SmartThumbnail. The output is consistent and predictable, each and every time.

We’re going to connect to an Amazon S3 bucket, process the images in a folder, and then save the new thumbnail images back to the folder. We’ll be using Python for this tutorial, but this could easily be done in JavaScript/Node, Rust, Scala, Java, or Ruby.

Don’t use Amazon S3? Want to use Dropbox instead? No problem. Here’s our guide to creating a Dropbox image processing pipeline.

Ready? Let’s go.

Step 1: Create a Free Account and Install Client

You’ll need a free Algorithmia account for this tutorial. Use the promo code “s3” to get an additional 50,000 credits when you sign up.

Next, make sure you have the latest Algorithmia Python client on your machine. Let’s do that really quick:

pip install algorithmia

To check that the installation was successful, run:

pip show algorithmia

The Algorithmia Python client should be at version 1.0.5 or later.
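If you would rather confirm the version from Python than read the pip output, here is a minimal sketch. It assumes Python 3.8 or later, where importlib.metadata ships with the standard library. The helper names installed_version and version_tuple are made up for this example and are not part of the Algorithmia client.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string for a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_tuple(text):
    """Turn a plain dotted version such as '1.0.5' into (1, 0, 5).

    Only handles purely numeric versions; pre-release tags like '1.0.5rc1'
    would raise ValueError.
    """
    return tuple(int(part) for part in text.split("."))
```

Calling installed_version("algorithmia") returns the version string when the client is installed, and comparing version_tuple results avoids the string-comparison trap where "1.0.10" sorts before "1.0.5".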


Step 2: Add Amazon S3 Credentials

Now that you have an Algorithmia account with the latest client installed, let’s connect your Amazon S3 account so that Algorithmia’s microservices can read and write to it.

Once logged in, navigate to the Algorithmia Data Portal, where you manage all your data, collections, and connected services, like Dropbox and Amazon S3.

  1. Select Add New Data Source 
  2. Connect to Amazon S3.
  3. Add your AWS Access Key ID and your Secret Access Key
  4. Check the Write Access box, and click Connect to Amazon S3

 

Note: The best practice with Amazon Web Services is to create an AWS IAM identity, and then grant AmazonS3FullAccess permissions so that you can read/write to the bucket.  

Now, when we want to read/write data to Amazon S3 from an Algorithmia microservice, we refer to it as s3://*. Let’s get to the fun part, and write the code to process our images.
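To make the s3:// convention concrete, here is a small sketch of a helper that joins a bucket name and optional path parts into a data URI. The data_uri function is a hypothetical convenience for this tutorial, not an Algorithmia API, and the subfolder in the last example is an assumed layout.

```python
def data_uri(bucket, *parts):
    """Join a bucket name and optional path parts into an s3:// data URI."""
    pieces = [bucket.strip("/")]
    # Drop empty parts and stray slashes so we never emit "//" inside the path.
    pieces.extend(p.strip("/") for p in parts if p.strip("/"))
    return "s3://" + "/".join(pieces)


print(data_uri("myimageassets"))
print(data_uri("myimageassets", "photo.jpg"))
print(data_uri("myimageassets/", "/holiday/", "photo.jpg"))
```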


Step 3: Amazon S3 Image Processing

We’re going to write a simple Python script to initialize the Algorithmia client, set the API key, loop through all the files in a specified Amazon S3 bucket, process each image, and then save a new thumbnail image back to the bucket.

There are three things you’ll need here:

  1. Your Algorithmia API key, which can be found under Credentials on your Algorithmia Profile page
  2. The Amazon S3 bucket path you want to process. In our example below, we’re going to process the myimageassets bucket.
  3. And, the image size of your new thumbnail. In this example, we’re generating 300×300 thumbnails.

################################
# Author: Diego Oppenheimer  ###
#                            ###
# Algorithmia, Inc           ###
################################

import Algorithmia

# Set your Algorithmia API key
apiKey = 'YOUR API KEY GOES HERE'

# Initialize the Algorithmia Python client
client = Algorithmia.client(apiKey)

# Pick the algorithm to use
algo = client.algo('opencv/SmartThumbnail/1.0.4')

# Set the folder URI path
uri = "s3://myimageassets"

# File extensions treated as images
imageTypes = ('.png', '.jpg', '.jpeg', '.bmp', '.gif')

# Iterate over the folder containing images in S3
for f in client.dir(uri).list():
    name = f.getName()

    # Check that the file type is an image
    if name.lower().endswith(imageTypes):
        # Report progress
        print("Reading " + name)

        # Define the algorithm input and parameters
        algoInput = [uri + '/' + name, uri + '/thumbnail_' + name, 300, 300, "FALSE"]

        # Call the algorithm
        output = algo.pipe(algoInput)

        print("Thumbnailing: thumbnail_" + name)
    else:
        print("File: " + name + " is not a type that is supported.")

print("Done processing...")

Above, we’re calling Algorithmia, and asking for a list of files in the bucket /myimageassets. We then iterate through all the files, checking to see if they’re a PNG, JPG, etc. If we find an image file, we’ll then pass it to the SmartThumbnail microservice, which processes the image.
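The extension check and the input list can also be pulled out into small functions, which makes them easy to try without touching S3. This is only a sketch: is_image and thumbnail_input are hypothetical helper names, and the order of the input list simply mirrors the script above (source URI, destination URI, width, height, and the trailing "FALSE" value).

```python
IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.bmp', '.gif')


def is_image(name):
    """Case-insensitive check against the extensions the script accepts."""
    return name.lower().endswith(IMAGE_EXTENSIONS)


def thumbnail_input(folder_uri, name, width=300, height=300):
    """Build the SmartThumbnail input list in the same order as the script."""
    source = folder_uri + '/' + name
    target = folder_uri + '/thumbnail_' + name
    return [source, target, width, height, "FALSE"]


for candidate in ["Team.JPG", "notes.txt"]:
    if is_image(candidate):
        print(thumbnail_input("s3://myimageassets", candidate))
    else:
        print("Skipping " + candidate)
```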

SmartThumbnail uses face detection to keep heads and faces in the frame. It then crops the image to the desired dimensions (in our case, a 300×300 thumbnail) and writes it back in the same format (PNG, JPG, etc.) to the Amazon S3 bucket with the “thumbnail_” prefix. Get the Gist here.
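One thing to watch: the script names each output by putting thumbnail_ in front of the original file name and saves it into the same folder it reads from. Run it twice and it would pick up its own thumbnails as fresh input. A minimal sketch of a guard is below; needs_thumbnail is a hypothetical helper, and existing_names is assumed to be the set of file names already in the folder.

```python
THUMBNAIL_PREFIX = "thumbnail_"


def needs_thumbnail(name, existing_names):
    """True only for original images that do not have a thumbnail yet."""
    if name.startswith(THUMBNAIL_PREFIX):
        # This file is itself an output of an earlier run.
        return False
    return (THUMBNAIL_PREFIX + name) not in existing_names


names = {"a.jpg", "b.jpg", "thumbnail_a.jpg"}
todo = sorted(n for n in names if needs_thumbnail(n, names))
print(todo)
```

With the sample names above, only b.jpg is left to process.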

Ready to process your images? Simply copy the above, change your settings, and save the file as processImages.py. Run it from the command line by typing:

python processImages.py
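If you would rather not paste your API key into processImages.py, one option is to read it from an environment variable. The sketch below assumes Python 3. The variable name ALGORITHMIA_API_KEY is an assumption for this example, and load_api_key is a hypothetical helper.

```python
import os


def load_api_key(env=os.environ, name="ALGORITHMIA_API_KEY"):
    """Fetch the API key from the environment, failing loudly if it is missing."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError("Set the " + name + " environment variable first")
    return key
```

In the script, you would then write client = Algorithmia.client(load_api_key()) instead of hard-coding the key.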

Pretty cool, right? There are more than 2,000 microservices in the Algorithmia library you could use to process Amazon S3 files. For instance, you could batch convert files from one type to another, convert audio to text (speech recognition), automatically tag and update the metadata on images, detect and sort images of people smiling, and more.

You could easily create an AWS Lambda function to watch for new images, and then run this script to automatically process images as they’re uploaded.
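Here is a rough sketch of what the Lambda side might look like, assuming Python 3 and the standard S3 event notification layout (Records, then s3.bucket.name and s3.object.key). The names image_uris_from_event and make_handler are hypothetical, and the process callable stands in for whatever you do per image, such as building the SmartThumbnail input and calling algo.pipe.

```python
from urllib.parse import unquote_plus


def image_uris_from_event(event):
    """Collect s3:// URIs for every object named in an S3 notification event."""
    uris = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        uris.append("s3://" + bucket + "/" + key)
    return uris


def make_handler(process):
    """Wrap a per-image callable into a Lambda-style handler(event, context)."""
    def handler(event, context):
        uris = image_uris_from_event(event)
        for uri in uris:
            process(uri)
        return {"processed": len(uris)}
    return handler


# Local dry run with a hand-written event; print stands in for the real work.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "myimageassets"},
                "object": {"key": "holiday/my+photo.jpg"}}}
    ]
}
seen = []
result = make_handler(seen.append)(sample_event, None)
print(seen, result)
```

If the thumbnails are written back to the same bucket that triggers the function, each new thumbnail would fire the trigger again. Skip keys that already start with thumbnail_, or write to a separate location.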

We’d love to hear what you think @Algorithmia.