Algorithmia

Algorithmia Now Supports Amazon S3 and Dropbox Integrations

We know how hard and frustrating it is to have data stored in one location when you need it in another. Wouldn’t it be better not to have to upload your data to a new service just to work with it?

That’s why we’re excited to announce the Algorithmia Data Portal. This dedicated I/O hub – a starting point for reading and writing your data from any data source – makes it easy to connect with Amazon S3 and Dropbox to access your data where it’s at. Now, application developers can read their data from an external source, process it using the algorithmic microservices from Algorithmia, and then write the output where it’s needed. No DevOps required.

“Algorithms allow you to gain insights from your data, but without data, what do you need algorithms for?” says Diego Oppenheimer, Algorithmia CEO and co-founder. “That’s why we’re enabling Algorithmia users to access their data where it’s stored, removing the friction of using state-of-the-art algorithms to interpret and extract insights from data.”

With the Algorithmia Data Portal, we’re addressing one of the core issues of data portability and interoperability. Algorithmia can now retrieve your data on-demand, removing the need for developers to ship or on-board their data in advance. We believe that ingesting, processing, and writing data should be as simple as an API call. 

To demonstrate, we’ve created an image processing pipeline in 10 lines of Python code. Simply by connecting to an existing data source, like Amazon S3 or Dropbox, we can easily batch process an entire folder of images. In this demo, we make an API call to our data source to list the files, then make another API call to Algorithmia’s SmartThumbnail microservice. The service processes each image, and then writes a new file to a folder in the same data source.

Here’s the image processing pipeline tutorial for Amazon S3, and for Dropbox.

Learn how to connect, configure, and read/write from your data in a few easy steps with our data portal guides below. If you have any questions or suggestions, please get in touch by email or @Algorithmia.

Don’t use Amazon S3 or Dropbox? No problem. Algorithmia also offers a free hosted data service for storing large files, preserving state, creating collections, and more.


Data Portal Guides

For the algorithm developer, Algorithmia hosted data is perfect for storing trained machine learning models and instantly turning them into a live, scalable API. For more, check out our guides for hosting your NLTK and scikit-learn models on Algorithmia.


Documentation

For Application Developers

For Algorithm Developers


More About Algorithmia

We’ve created an open marketplace for algorithms and algorithm development, making state-of-the-art algorithms accessible and discoverable by everyone. On Algorithmia, algorithms run as microservices, creating the building blocks of algorithmic intelligence developers can use to create, share, and remix at scale.

By making algorithms composable, interoperable, and portable, algorithms can be written in any supported language, and then made available to application developers where the code is always “on,” and available via a simple REST API.

Application developers can access the API via our clients for Java, Scala, Python, JavaScript, Node.js, Ruby, Rust, CLI, and cURL. Our AWS Lambda blueprint is perfect for those working on event-driven and IoT-type projects.
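To give you a sense of how little ceremony a call takes, here’s a minimal sketch using the Python client (the API key is a placeholder, and demo/Hello is just an example algorithm path):

import Algorithmia

# Your API key lives under Credentials on your Algorithmia profile page
client = Algorithmia.client("YOUR_API_KEY")

# Call a marketplace algorithm by its username/algorithm path
algo = client.algo("demo/Hello")
print(algo.pipe("World").result)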

Algorithmia is the largest marketplace for algorithms in the world, with more than 18,000 developers leveraging 2,000+ algorithms.

Create an Amazon S3 Image Processing Pipeline in Python

Amazon S3 Image Processing with Algorithmia

Need to create a simple Amazon S3 image processing pipeline to batch edit images? Now that Algorithmia supports Amazon S3 integration, here’s a quick way to automatically create thumbnails with custom dimensions.

In this demo, we’ll use SmartThumbnail, a microservice that uses face detection to perfectly crop every photo to the same size without awkwardly cropped heads or faces.

Manually cropping just a handful of photos isn’t bad, but cropping hundreds or thousands of images would be extremely expensive, time-consuming, and tedious.

So, instead of doing this by hand so that every face in every photo is perfectly preserved, we can run all the photos through SmartThumbnail. The output is both intuitive and expected, each and every time.

We’re going to connect to an Amazon S3 bucket, process the images in a folder, and then save new thumbnail images back to the folder. We’ll be using Python for this tutorial, but this could easily be done in JavaScript/Node, Rust, Scala, Java, or Ruby.

Don’t use Amazon S3? Want to use Dropbox instead? No problem. Here’s our guide to creating a Dropbox image processing pipeline.

Ready? Let’s go.

Step 1: Create a Free Account and Install Client

You’ll need a free Algorithmia account for this tutorial. Use the promo code “s3” to get an additional 50,000 credits when you sign up.

Next, make sure you have the latest Algorithmia Python client on your machine. Let’s do that really quick:

pip install algorithmia

and to check that installation was successful…

pip show algorithmia

The Algorithmia Python Client should be version 1.0.5.


Step 2: Add Amazon S3 Credentials

Now that you have an Algorithmia account with the latest client installed, let’s connect your Amazon S3 account so that Algorithmia’s microservices can read and write to it.

Once logged in, navigate to the Algorithmia Data Portal, where you manage all your data, collections, and connected services, like Dropbox and Amazon S3.

Connect a new Data Source

  1. Select Add New Data Source 
  2. Connect to Amazon S3.
  3. Add your AWS Access Key ID and your Secret Access Key
  4. Check the Write Access box, and click Connect to Amazon S3

Connect to Amazon S3 with Algorithmia


Note: The best practice with Amazon Web Services is to create an AWS IAM identity, and then grant AmazonS3FullAccess permissions so that you can read/write to the bucket.  

Now, when we want to read/write data to Amazon S3 from an Algorithmia microservice, we refer to it as s3://*. Let’s get to the fun part, and write the code to process our images.
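For a quick taste of what that looks like before we dive in, here’s a minimal sketch of listing and reading files from a connected bucket (the bucket and file names are illustrative):

import Algorithmia

client = Algorithmia.client("YOUR_API_KEY")

# List the contents of the connected bucket
for f in client.dir("s3://myimageassets").list():
    print(f.getName())

# Read a single file's contents straight from S3
contents = client.file("s3://myimageassets/notes.txt").getString()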


Step 3: Amazon S3 Image Processing

We’re going to write a simple Python script to initialize the Algorithmia client, set the API key, loop through all the files in a specified Amazon S3 bucket, process each image, and then save a new thumbnail image back to the bucket.

There are three things you’ll need here:

  1. Your Algorithmia API key, which can be found under Credentials on your Algorithmia Profile page
  2. The Amazon S3 bucket path you want to process. In our example below, we’re going to process the myimageassets bucket.
  3. And, the image size of your new thumbnail. In this example, we’re generating 300×300 thumbnails.
##############################
# Author: Diego Oppenheimer  #
# Algorithmia, Inc.          #
##############################

import Algorithmia

# Set your Algorithmia API key
apiKey = 'YOUR API KEY GOES HERE'

# Initialize the Algorithmia Python client
client = Algorithmia.client(apiKey)

# Pick the algorithm to use
algo = client.algo('opencv/SmartThumbnail/0.1.15')

# Set the folder URI path
uri = "s3://myimageassets"

# Iterate over the S3 bucket containing the images
for f in client.dir(uri).list():
    # Check that the file is an image
    if f.getName().lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.gif')):
        # Progress message
        print("Reading " + f.getName())

        # Define the input for the algorithm: [image URI, width, height]
        input = [uri + '/' + f.getName(), 300, 300]

        # Call the algorithm
        output = algo.pipe(input)

        print("Thumbnailing: thumbnail_" + f.getName())
        # Save the new image back to the same folder
        client.file(uri + '/thumbnail_' + f.getName()).put(output.result)
    else:
        print("File " + f.getName() + " is not a supported type.")

print("Done processing...")

Above, we’re calling Algorithmia, and asking for a list of files in the bucket /myimageassets. We then iterate through all the files, checking to see if they’re a PNG, JPG, etc. If we find an image file, we’ll then pass it to the SmartThumbnail microservice, which processes the image.

To ensure images are perfectly cropped, SmartThumbnail uses face detection to ensure heads and faces are in the frame. It then crops the image to the desired dimension (in our case it’s a 300×300 thumbnail), and then writes it back in the same format (i.e. PNG, JPG, etc.) to the Amazon S3 bucket with the “thumbnail_” prefix. Get the Gist here.

Ready to process your images? Simply copy the above, change your settings, and save the file as processImages.py. Run it from the command line by typing:

python processImages.py

Processed S3 Images

Pretty cool, right? There are more than 2,000 microservices in the Algorithmia library you could use to process Amazon S3 files. For instance, you could batch convert files from one type to another, convert audio to text (speech recognition), automatically tag and update the metadata on images, detect and sort images of people smiling, and more.

You could easily create an AWS Lambda function to watch for new images, and then run this script to automatically process images as they’re uploaded.
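Here’s a rough, untested sketch of what that Lambda handler might look like, assuming the function is subscribed to S3 ObjectCreated events and the Algorithmia package is bundled into the deployment (the API key is a placeholder):

import Algorithmia

API_KEY = "YOUR_API_KEY"

def lambda_handler(event, context):
    client = Algorithmia.client(API_KEY)
    algo = client.algo("opencv/SmartThumbnail/0.1.15")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Skip thumbnails we've already written so we don't loop forever
        if key.startswith("thumbnail_"):
            continue
        uri = "s3://" + bucket + "/" + key
        output = algo.pipe([uri, 300, 300])
        client.file("s3://" + bucket + "/thumbnail_" + key).put(output.result)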

We’d love to hear what you think @Algorithmia.

Create a Dropbox Image Processing Pipeline in Python

Before and after example of Dropbox image processing

Need to create a Dropbox image processing pipeline to batch edit images? Now that Algorithmia supports Dropbox integration, here’s a quick way to automatically create thumbnails with custom dimensions.

In this demo, we’ll use SmartThumbnail, a microservice that uses face detection to perfectly crop every photo to the same size without awkwardly cropped heads or faces.

Manually cropping photos isn’t bad when it’s just a few, but cropping hundreds or thousands of images would be extremely expensive, time-consuming, and tedious.

So, instead of doing this by hand so that every face in every photo is perfectly preserved, we can run all the photos through SmartThumbnail. The output is both intuitive and expected, each and every time.

We’re going to connect to your Dropbox, process the images in a folder, and then save new thumbnail images back to the folder. We’ll be using Python for this tutorial, but this could easily be done in JavaScript/Node, Rust, Scala, Java, or Ruby.

Don’t use Dropbox? Want to use Amazon S3 instead? No problem. Here’s our guide to creating an Amazon S3 image processing pipeline.

Ready? Let’s go.

Step 1: Create a Free Account and Install Client

You’ll need a free Algorithmia account for this tutorial. Use the promo code “dropbox” to get an additional 50,000 credits when you sign up.

Next, make sure you have the latest Algorithmia Python client on your machine. Let’s do that really quick:

pip install algorithmia

and to check that installation was successful…

pip show algorithmia

The Algorithmia Python Client should be version 1.0.5.


Step 2: OAuth Dropbox

Now that you have an Algorithmia account with the latest client installed, let’s OAuth your Dropbox so that Algorithmia’s microservices can read and write to it.

Once logged in, navigate to the Algorithmia Data Portal, where you manage all your data, collections, and connected services, like Dropbox and Amazon S3.

Connect a new Data Source

  1. Select Add New Data Source 
  2. Connect to Dropbox. A new Dropbox window will open.
  3. Click Allow to grant Algorithmia access to your Dropbox files and folders
  4. Click Manage Dropbox
  5. For this tutorial, Algorithmia will need both READ and WRITE access to your Dropbox. Check Write Access and then Save

Manage Dropbox Settings

Now, when we want to READ or WRITE data to Dropbox from an Algorithmia microservice, we refer to it as dropbox://*. Let’s get to the fun part, and write the code to process our Dropbox images.
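As a quick taste, here’s a minimal sketch of listing a connected Dropbox folder and writing a file back to it (the folder and file names are illustrative):

import Algorithmia

client = Algorithmia.client("YOUR_API_KEY")

# List the connected Dropbox folder
for f in client.dir("dropbox://Camera Uploads").list():
    print(f.getName())

# Write a new file back to the same folder
client.file("dropbox://Camera Uploads/hello.txt").put("Hello from Algorithmia")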


Step 3: Dropbox Image Processing

We’re going to write a simple Python script to initialize the Algorithmia client, set the API key, loop through all the files in a specified Dropbox folder, process each image, and then save a new thumbnail image back to Dropbox.

There are three things you’ll need here:

  1. Your Algorithmia API key, which can be found under Credentials on your Algorithmia Profile page.
  2. The Dropbox folder path you want to process. In our example below, we are going to process the /Camera Uploads folder.
  3. And, the image size of your new thumbnail. In this example, we’re generating 300×300 thumbnails.
##############################
# Author: Diego Oppenheimer  #
# Algorithmia, Inc.          #
##############################

import Algorithmia

# Set your Algorithmia API key
apiKey = 'YOUR API KEY GOES HERE'

# Initialize the Algorithmia Python client
client = Algorithmia.client(apiKey)

# Pick the algorithm to use
algo = client.algo('opencv/SmartThumbnail/0.1.15')

# Set the folder URI path
uri = "dropbox://Camera Uploads"

# Iterate over the Dropbox folder containing the images
for f in client.dir(uri).list():
    # Check that the file is an image
    if f.getName().lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.gif')):
        # Progress message
        print("Reading " + f.getName())

        # Define the input for the algorithm: [image URI, width, height]
        input = [uri + '/' + f.getName(), 300, 300]

        # Call the algorithm
        output = algo.pipe(input)

        print("Thumbnailing: thumbnail_" + f.getName())
        # Save the new image back to the same folder
        client.file(uri + '/thumbnail_' + f.getName()).put(output.result)
    else:
        print("File " + f.getName() + " is not a supported type.")

print("Done processing...")

Above, we’re calling Algorithmia, and asking for a list of files in the directory /Camera Uploads. We then iterate through all the files, checking to see if they’re a PNG, JPG, etc. If we find an image file, we’ll then pass it to the SmartThumbnail microservice, which processes the image.

To ensure images are perfectly cropped, SmartThumbnail uses face detection to ensure heads and faces are in the frame. It then crops the image to the desired dimension (in our case it’s a 300×300 thumbnail), and then writes it back in the same format (i.e. PNG, JPG, etc.) to the Dropbox folder with the “thumbnail_” prefix. Get the Gist here.

Ready to process your images? Simply copy the above, change your settings, and save the file as processImages.py. Run it from the command line by typing:

python processImages.py

The Dropbox image processing pipeline in action

Pretty cool, right? There are more than 2,000 microservices in the Algorithmia library you could use to process Dropbox files. For instance, you could batch convert files from one type to another, convert audio to text (speech recognition), automatically tag and update the metadata on images, detect and sort images of people smiling, and more.

You could easily create an AWS Lambda function to watch your Dropbox for new images, and then run this script to automatically process images as they’re uploaded.

We’d love to hear what you think @Algorithmia.

Twitter’s Machine Learning Plans, Is Deep Learning Magic?, Net Neutrality Upheld, and More

Issue 14

This week we check in on Twitter’s machine learning plans, ask if deep learning is magic or just math, celebrate the latest net neutrality ruling, and share some videos to watch. Plus, what we’re reading and a few things for you to try at home.

Not a subscriber? Join the Emergent // Future newsletter here.


Twitter Dives Into Machine Learning 📡

You Might Have Heard: Twitter announced Monday that it acquired the AI company Magic Pony Technology.

The London-based company uses machine learning and neural networks to identify features in video, enhance imagery, and create graphics for virtual and augmented reality.

In other words, Twitter’s big bet is that the algorithms will improve video streaming for Vine and Periscope by automatically filling in patchy video feeds, and increasing the resolution of pixelated video and images.

The new tech might come in handy this fall when Twitter begins providing free, live streaming video of the NFL’s Thursday Night Football games to more than 800 million users worldwide on mobile phones, tablets, PCs, and connected TVs.

Magic Pony is the third machine learning acquisition by Twitter, joining Madbits, and Whetlab. Sources say the deal was worth $150M and includes a team of 11 PhDs with expertise in computer vision, machine learning, high-performance computing, and computational neuroscience.

PLUS: Google opened a dedicated machine learning research center in Zurich to focus on machine intelligence, natural language processing, and machine perception.


Deep Learning Isn’t Magic 🔮

Deep learning isn’t a dangerous magic genie, and it’s far from magic, says Oren Etzioni, CEO of the Allen Institute for Artificial Intelligence and a computer scientist at the University of Washington.

Google, Facebook, Microsoft and others continue to push AI into everyday online services while pundits describe deep learning as an imitation of the human brain. It’s really just simple math, Oren says, executed on an enormous scale.

And, amazingly, the artificial intelligence we were promised is finally coming as deep learning algorithms can now recognize images, voice, text, and more.

“There is almost nothing we can think of that cannot be made new, different, or interesting by infusing it with some extra IQ,” Kevin Kelly writes.

It’s not all peaches and cream. Oxford philosopher and author Nick Bostrom warns that artificial intelligence is a greater threat to humanity than climate change.

So, what’s next for AI? The best minds in AI weigh in on what life will look like in the age of the machines.

Confused About AI? This two-part series on the AI revolution and the road to superintelligence is an excellent primer. (Part 1) (Part 2)

PLUS: A guide to staying human in the machine age


Net Neutrality: Can You Hear Me Now? 💬

A federal appeals court has ruled that high-speed internet can be defined as a utility, putting it on par with other essential services like power and the phone.

The ruling clears the way for rigorous policing of broadband providers like AT&T, Comcast, and Verizon, limiting their ability to shape the user experience by blocking content, prioritizing paid content, or creating fast and slow lanes on the internet.

“This is an enormous win for consumers,” said Gene Kimmelman, president of the public interest group Public Knowledge. “It ensures the right to an open internet with no gatekeepers.”

With the win, the internet will remain a platform for innovation, free expression, and economic growth, Tom Wheeler, chairman of the F.C.C., said in a statement.

But, hold the celebration, the fight for net neutrality isn’t over.

AT&T is already vowing to fight the decision, and expects this to ultimately be decided by the Supreme Court. 🙈



Videos to Watch 📺



What We’re Reading 📚

  • Serverless Architectures. Unlike traditional architectures, serverless is run in stateless compute containers that are event-triggered, ephemeral, and fully managed by a 3rd party. Think of this as “Functions as a service / FaaS.” AWS Lambda is one of the most popular implementations of FaaS. (Martin Fowler)
  • What’s Next for Artificial Intelligence. The best minds in the business on what life will look like in the age of the machines. (WSJ)
  • A Dozen Things I’ve Learned from Elon Musk About Business and Investing. Elon Musk is a classic missionary founder who is more interested in changing the world and creating enduring businesses than just the financial rewards that may flow to him from the product or service. (25iq)
  • The Forrest Gump of the Internet. Ev Williams became a billionaire by helping to create the free and open web. Now, he’s betting against it. (The Atlantic)
  • A Simple Explanation of How Image Recognition Works. Are you tired of reading endless news stories about deep learning and not really knowing what that means? Let’s change that! This time, learn how to write programs that recognize objects in images using deep learning. (Medium)

Try This At Home 🛠


Emergent Future is a weekly, hand-curated dispatch exploring technology through the lens of artificial intelligence, data science, and the shape of things to come. Subscribe here.

Improving Nudity Detection and NSFW Image Recognition

Last June we introduced isitnude.com, a demo for detecting nudity in images based on our Nudity Detection algorithm. However, we wouldn’t be engineers if we didn’t think we could improve the NSFW image recognition accuracy.

The challenge, of course, is how do you do that without a labeled dataset of tens of thousands of nude images? To source and manually label thousands of nude images would have taken months, and likely years of post-traumatic therapy sessions. Without this kind of labeled dataset, we had no way of utilizing computer vision techniques such as CNNs (Convolutional Neural Networks) to construct an algorithm that could detect nudity in images.

The Starting Point for NSFW Image Recognition

The original algorithm used nude.py at its core to identify and locate skin in an image. It looked for clues in an image, like the size and percentage of skin in the image, to classify it as either nude or non-nude.

We built upon this by combining nude.py with OpenCV’s nose detection, and face detection algorithms. By doing this, we could draw bounding polygons for the face and nose in an image, and then get the skin tone range for each polygon. We could then compare the skin tone from inside the box to the skin found outside the box.

OpenCV face detection algorithm

As a result, if we found a large area of skin in the image that matched the skin tone of the person’s face or nose, we could report with high confidence that there was nudity in the image.

Additionally, this method allowed us to limit false positives, and check for nudity when there were multiple people in an image. The downside to this method was that we could only detect nudity when a human face was present.

Detecting nudity using Algorithmia

All told, our Nudity Detection algorithm was ~60% accurate using the above method.

Improving Nudity Detection In Images

The first step in improving our ability to detect nudity in images was to find a pre-trained model that we could work with. After some digging, one of our algorithm developers found the Illustration2Vec algorithm. It’s trained on anime to extract feature vectors from illustrations. The creators of Illustration2Vec scraped more than 1.2 million anime images and their associated metadata, creating four categories of tags: general, copyright, character, and ratings.

This was a good starting point. We then created the Illustration Tagger microservice on Algorithmia, our implementation of the Illustration2Vec algorithm.

With the microservice running, we could simply call the API endpoint, and pass it an image. Below we pass an image of Justin Trudeau, Canada’s prime minister, to Illustration Tagger like so:

{
  "image": "https://upload.wikimedia.org/wikipedia/commons/9/9a/Trudeaujpg.jpg"
}

Testing Nudity Detection Illustration Tagger

And, the microservice will return the following JSON:

{
  "rating": [
    {"safe": 0.9863441586494446},
    {"questionable": 0.011578541249036789},
    {"explicit": 0.0006071273819543421}
  ],
  "character": [],
  "copyright": [{"real life": 0.4488498270511627}],
  "general": [
    {"1boy": 0.9083328247070312},
    {"solo": 0.8707828521728516},
    {"male": 0.6287103891372681},
    {"photo": 0.45845481753349304},
    {"black hair": 0.2932664752006531}
  ]
}

Side-note: Illustration2Vec was developed to make a keyword-based search engine to help novice drawers find reference images to base new work on. The pre-trained caffe model is available through Algorithmia, and available under the MIT License.

In all, Illustration2Vec can classify 512 tags across three different rating levels (“safe,” “questionable,” and “explicit”).

The important part here is that the ratings were used to … wait for it … label nudity and sexual implications in images. Why? Because anime tends to have lots of nudity and sexual themes.

Because of that, we were able to piggyback off their pre-trained model.

We used a subset of the 512 tags to create a new microservice called Nudity Detection I2V, which acts as a wrapper for Illustration Tagger, but only returns nude/not-nude with corresponding confidence values.
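Calling the wrapper looks like any other Algorithmia microservice call. A minimal sketch (the algorithm path and input format here are illustrative, so check the algorithm’s page for the exact ones):

import Algorithmia

client = Algorithmia.client("YOUR_API_KEY")

# Algorithm path and input format are illustrative
algo = client.algo("sfw/NudityDetectioni2v")
response = algo.pipe("https://upload.wikimedia.org/wikipedia/commons/9/9a/Trudeaujpg.jpg")
print(response.result)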

The same Trudeau image when passed to Nudity Detection I2V returns the following:

{
  "url": "https://upload.wikimedia.org/wikipedia/commons/9/9a/Trudeaujpg.jpg", 
  "confidence": 0.9829585656606923, 
  "nude": false 
}

For comparison, here’s the output of the famous Lenna photo from Nudity Detection, Nudity Detection I2V, and Illustration Tagger.

Image Recognition and NSFW

Nudity Detection

{
  "nude": "true", 
  "confidence": 1
}

Nudity Detection I2V

{
  "url": "http://www.log85.com/tmp/doc/img/perfectas/lenna/lenna1.jpg", 
  "confidence": 1, 
  "nude": true
}

Illustration Tagger

{ 
  "character": [], 
  "copyright": [{"original": 0.24073825776577}], 
  "general": [ 
    {"1girl": 0.7676181197166442},
    {"nude": 0.7233675718307494},
    {"photo": 0.5793498158454897},
    {"solo": 0.5685935020446777},
    {"breasts": 0.38033962249755865},
    {"long hair": 0.24592463672161105}
  ], 
  "rating": [ 
    {"questionable": 0.7255685329437256},
    {"safe": 0.23983716964721682},
    {"explicit": 0.032572492957115166}
  ] 
}

For every image analyzed using Nudity Detection I2V, we check to see if any of these ten tags are returned: “explicit”, “questionable”, “safe”, “nude”, “pussy”, “breasts”, “penis”, “nipples”, “puffy nipples”, or “no humans.”

To capture nudity related information from the above tags, we labelled a small ~100 image dataset and pushed our data to Excel. Once there, we ran Excel’s Solver plugin to neatly fit weights to each tag, maximizing our new detector’s accuracy across our sample set.

One interesting observation from the linear fit is that some tags (like “breasts”) correlate less with nudity than others (like “nipples”). Logically this makes sense: exposed breasts aren’t necessarily nudity, but visible nipples almost always are.
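Conceptually, the detector reduces to a weighted sum over tag confidences compared against a threshold. Here’s a sketch of the idea in Python (the weights and threshold are invented for illustration; the real ones came out of the Solver fit against our labelled sample):

# Invented weights for illustration only; the real weights were fit with
# Excel's Solver against our ~100 image labelled sample
TAG_WEIGHTS = {
    "explicit": 1.0, "questionable": 0.4, "safe": -1.0,
    "nude": 0.9, "pussy": 1.0, "breasts": 0.3, "penis": 1.0,
    "nipples": 0.8, "puffy nipples": 0.8, "no humans": -0.5,
}

def nudity_score(tags):
    # tags: dict mapping tag name -> confidence from Illustration Tagger
    return sum(TAG_WEIGHTS[t] * c for t, c in tags.items() if t in TAG_WEIGHTS)

def is_nude(tags, threshold=0.5):
    # Pick the threshold that maximizes accuracy on a labelled set
    return nudity_score(tags) > threshold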

By using this method, we were able to increase the accuracy rate from ~60% to ~85%.

Conclusion

Constructing a nudity detection algorithm from scratch is tough. Building a large unfiltered nudity dataset for CNN training is expensive and potentially damaging to one’s mental health. Our solution was to find a pre-trained model that “almost” fit our needs, and then fine tune the output to become an excellent nudity detector.

Accuracy of the original Nudity Detection method vs. Nudity Detection I2V:

  • Positive accuracy: 72.73% → 83.64%
  • Negative accuracy: 48.39% → 87.10%
  • Overall accuracy: 60.56% → 85.37%

As you can see from the above results, we’ve definitely succeeded in improving our accuracy. Don’t just take our word for it, try out Nudity Detection I2V.

In a future post, we’ll demonstrate how to use an ensemble to combine the two methods for even greater accuracy.

Emergent // Future Report: Flying Cars, Understanding AI and Machine Learning, WWDC News, and More

Issue 13

This week we check in on Larry Page’s flying-car project, get a primer on AI, machine learning, and deep learning, take a look at all the Apple WWDC news, and check out the Vive VR headset. Plus, what we’re reading and a few things for you to try at home.

Not a subscriber? Join the Emergent // Future newsletter here.


We Were Promised… Flying Cars? 🚗

You Might Have Heard: Larry Page has a secret flying-car factory.

Not only that, but he’s invested over $100M into the flying-car startup Zee.Aero, and is also an investor in the competing firm, Kitty Hawk.

Flying cars have always been a part of our collective imagination, for better or worse, but for flying cars to become a reality they need to be able to fly autonomously.

“With ultralight composites and better battery tech we may actually be drawing near a basic functional design,” TechCrunch writes. “In a few decades it may seem absurd that we drove our own cars for a century, with the dead from traffic accidents totaling in the millions.”


A Primer on AI, and Machine Learning 🤖

This video presentation by a16z Deal and Research head Frank Chen walks through the basics of AI, deep learning, and machine learning, explaining how we got here, why now, and what the next breakthrough will be.

Every company is now a data company, capable of using machine learning in the cloud to deploy intelligent apps at scale, thanks to three machine learning trends: data flywheels, the algorithm economy, and cloud-hosted intelligence.

Despite playing catch-up to Google and Microsoft, Facebook wants to dominate in AI and machine learning. It has tripled its investment in processing power for research, and hired more than 150 people.

Meanwhile, a patent for “Face Detection Using Machine Learning” was just granted. 😳

PLUS: A few considerations when setting up deep learning hardware.


News from WWDC 📱🖥⌚️

The big news from Apple WWDC this year is that they’re opening up Siri to app developers in an effort to keep pace with Amazon, Google, Facebook, and Microsoft, all of which are betting that voice commands and chatbots will be one of the next big computing paradigms.

HomeKit, the Apple Internet of Things platform, got a big update and a dedicated app to control all the devices in your home, and features access to connected cameras from the lock screen, geofencing, automation, Siri commands, and Apple Watch support.

One More Thing: Apple added facial and object recognition to the iPhone and iPad.

The computer vision tech runs natively on the device, and doesn’t require you to upload all the images to the cloud. The Photos app can now recognize faces and objects across your photo library, helping you find a specific photo with ease.

In addition, the app has a new feature called Memories, which bundles photos according to events and places.

PLUS: The 13 biggest announcements from Apple WWDC 2016


Vive VR Headset Now Shipping 🎮

The VR headset from HTC started shipping last week in 24 countries, with delivery in two to three days. Head to a Microsoft, GameStop, or Micro Center store for a demo before plopping down $799.

But, should you buy an HTC Vive right now?

Still unsure? Here’s an HTC Vive vs Oculus Rift comparison.

tl;dr
The Rift has better sound and ergonomics; The Vive has better controllers and smoother tracking of head and hand motions.

“Both of these devices are good enough to deliver a great VR experience to millions of people. This is the beginning of something big.”

PLUS: VR rollercoasters are coming, and they look amazing.

ALSO: Augmented reality startup Blippar unveils its “visual browser,” an app for recognizing real-world objects using machine learning.


What We’re Reading 📚

  • What are the odds we are living in a computer simulation? Citing the speed with which video games are improving, Elon Musk suggested that the development of simulations “indistinguishable from reality” was inevitable. The likelihood that we are living in “base reality,” Musk concluded, was just “one in billions.” (The New Yorker)
  • Ray Kurzweil’s four big insights for predicting the future. “How we eat, work, play, communicate, and travel are deeply affected by the development of new technology. But what is the underlying engine that drives technological progress? Does technological change progress at a steady rate? Can we predict what’s coming in 5 or 10 years?” (Singularity Hub)
  • Mastering Programming. “The theme here is scaling your brain. The journeyman learns to solve bigger problems by solving more problems at once. The master learns to solve even bigger problems than that by solving fewer problems at once.” (Kent Beck)
  • How big data and poker-playing bots are blurring the line between man and machine. “Science, mathematics, and gambling have long been intertwined, and thanks to advances in big data and machine learning, our sense of what’s predictable is growing, crowding out the spaces formerly ruled by chance.” (Kernel Mag)
  • A New Theory Explains How Consciousness Evolved. “If the wind rustles the grass and you misinterpret it as a lion, no harm done. But if you fail to detect an actual lion, you’re taken out of the gene pool.” (The Atlantic)
  • Jessica Livingston’s Pretty Complete List on How Not to Fail. “Nothing else you do will matter if you’re not making something people want. You can be the best spokesperson, the best fundraiser, the best programmer, but if you aren’t building a product that satisfies a real need, you’ll never succeed.” (The Macro)

Try This At Home 🛠


Emergent Future is a weekly, hand-curated dispatch exploring technology through the lens of artificial intelligence, data science, and the shape of things to come. Subscribe here.

Powered by Algorithmia

Machine Learning Trends and the Future of Artificial Intelligence 2016

The Future of AI and Machine Learning Trends 2016

Every company is now a data company, capable of using machine learning in the cloud to deploy intelligent apps at scale, thanks to three machine learning trends: data flywheels, the algorithm economy, and cloud-hosted intelligence.

That was the takeaway from the inaugural Machine Learning / Artificial Intelligence Summit, hosted by Madrona Venture Group* last month in Seattle, where more than 100 experts, researchers, and journalists converged to discuss the future of artificial intelligence, trends in machine learning, and how to build smarter applications.

With hosted machine learning models, companies can now quickly analyze large, complex data, and deliver faster, more accurate insights without the high cost of deploying and maintaining machine learning systems.

“Every successful new application built today will be an intelligent application,” said Soma Somasegar, venture partner at Madrona Venture Group. “Intelligent building blocks and learning services will be the brains behind apps.”

Below is an overview of the three machine learning trends leading to a new paradigm where every app has the potential to be a smart app.


Data Flywheels

Digital data and cloud storage follow Moore’s law: the world’s data doubles every two years, while the cost of storing that data declines at roughly the same rate. This abundance of data enables more features, and better machine learning models to be created.

“In the world of intelligent applications, data will be king, and the services that can generate the highest-quality data will have an unfair advantage from their data flywheel — more data leading to better models, leading to a better user experience, leading to more users, leading to more data,” Somasegar says.

Digital data growth and cost of storage

For instance, Tesla has collected 780 million miles of driving data, and they’re adding another million every 10 hours.

This data is fed into Autopilot, their assisted-driving program that uses ultrasonic sensors, radar, and cameras to steer, change lanes, and avoid collisions with little human interaction. Ultimately, this data will be the basis for the autonomous, self-driving car they plan to release in 2018.

Compare that to Google’s self-driving program, which has amassed just over 1.5 million miles of driving data: Tesla’s data flywheel is in full effect.


The Algorithm Economy

All the data in the world isn’t very useful if you can’t leverage it. Algorithms are how you efficiently scale the manual management of business processes.

“Everything at scale in this world is going to be managed by algorithms and data,” says Joseph Sirosh, CVP of Data Group and Machine Learning at Microsoft. In the near-future, “every business is an algorithmic business.”

This creates an algorithm economy, where algorithm marketplaces function as the global meeting place for researchers, engineers, and organizations to create, share, and remix algorithmic intelligence at scale. As composable building blocks, algorithms can be stacked together to manipulate data, and extract key insights.

The intelligent app stack

In the algorithm economy, state-of-the-art research is turned into functional, running code, and made available for others to use. The intelligent app stack illustrates the abstraction layers that form the building blocks needed to create intelligent apps.

“Algorithm marketplaces are similar to the mobile app stores that created the ‘app economy,'” Alexander Linden, research director at Gartner said. “The essence of the app economy is to allow all kinds of individuals to distribute and sell software globally without the need to pitch their idea to investors or set up their own sales, marketing and distribution channels.”


Cloud-Hosted Intelligence

Using algorithmic machine intelligence to iteratively learn from data is the only scalable way for a company to discover insights about its business. Historically, though, that has meant an expensive upfront investment with no guarantee of a significant return.

“Analytics and data science today are like tailoring 40 years ago,” Sirosh said. “It takes a long time and a tremendous amount of effort.”

For instance, an organization needs to first collect custom data, hire a team of data scientists, continually develop the models, and optimize them to keep pace with the rapidly changing and growing volumes of data — that’s just to get started.

The machine learning trend at Google

With more data becoming available, and the cost to store it dropping, machine learning is starting to move to the cloud, where a scalable web service is an API call away. Data scientists will no longer need to manage infrastructure or implement custom code. The systems will scale for them, generating new models on the fly, and delivering faster, more accurate results.

“When the effort to build and deploy machine learning models becomes a lot less — when you can ‘mass manufacture’ it — then the data to do that becomes widely available in the cloud,” Sirosh said.

Emerging machine intelligence platforms hosting pre-trained machine learning models-as-a-service will make it easy for companies to get started with ML, allowing them to rapidly take their applications from prototype to production.

“As companies adopt the microservices development paradigm, the ability to plug and play different machine learning models and services to deliver specific functionality becomes more and more interesting,” Somasegar said.

With open source machine learning and deep learning frameworks like scikit-learn, NLTK, NumPy, Caffe, TensorFlow, Theano, and Torch running in the cloud, companies will be able to easily leverage pre-trained, hosted models to tag images, recommend products, and handle general natural language processing tasks.


Recap of Machine Learning Trends

“Our world view is that every company today is a data company, and every application is an intelligent application,” Somasegar said. “How can companies get insights from huge amounts of data and learn from that? That’s something that has to be brought up with every organization in the world.”

As the data flywheels begin to turn, the cost to acquire, store, and compute that data will continue to drop.

This creates an algorithm economy, where the building blocks of machine intelligence live in the cloud. These pre-trained, hosted machine learning models make it possible for every app to tap into algorithmic intelligence at scale.

The confluence of data flywheels, the algorithm economy, and cloud-hosted intelligence means:

  • Every company can now be a data company
  • Every company can now access algorithmic intelligence
  • Every app can now be an intelligent app

“We have come a long way,” says Matt McIlwain, managing director at Madrona Venture Group. “But we still have a long way to go.”


*Disclosure: Madrona Venture Group is an investor in Algorithmia

Emergent // Future Report: Internet Trends Report, Facebook’s AI Tool, Elon Musk, Blade Runner, and More

Issue 12

This week we recap Mary Meeker’s latest Internet Trends Report, and check in on Facebook’s latest AI project. Prepare to be shocked by Elon Musk’s wildest ideas yet (spoiler: you live in a video game, and we’re going to Mars!), and see how a researcher used a neural network to recreate Blade Runner. Plus, what we’re reading and a few things for you to try at home.

Not a subscriber? Join the Emergent // Future newsletter here.


Internet Trends Report 📈

You Might Have Heard: Mary Meeker released her annual Internet Trends report.

Now in its 21st year (!), the 213-slide deck covers all sides of the internet economy, including the rise of messaging apps, voice assistants, and more. This essential report is the fastest way to learn everything going on in tech.

tl;dr

  1. The internet now reaches 3 billion users – or about 42% of the world’s population
  2. Smartphone adoption is slowing; Android is increasing marketshare
  3. Mobile video is rapidly growing, led by Snapchat and Facebook Live

Facebook, AI, and DeepText 🔮

Facebook announced DeepText, their AI system built to understand the meaning and sentiment of all text posts on the platform, with the goal of building a better search engine.

“With this new project, Facebook is essentially building the capacity to track all the information put into the network, just as Google crawls the entire web for information and indexes it,” Quartz writes.

DeepText will analyze thousands of posts per second across 20 languages with near-human accuracy.

“The gap between the AI haves and have-nots is widening,” TechCrunch writes.

“If every News Feed post looks interesting, you’ll spend more time on Facebook, you’ll share more text there, DeepText will get smarter, and the Facebook AI feedback wheel will spin faster and faster towards becoming the perfect content recommendation engine.”

Everybody’s Talking About: Elon Musk 🚀

The tech billionaire says “there’s a billion to one chance we’re living in base reality.”

Meaning, our existence is probably really just a video game.

Simulation or not, Musk said he plans to send people to Mars as early as 2024 during an interview with Kara Swisher and Walt Mossberg at Code Conference.

But that’s not all, here’s 7 other not-so-crazy crazy things Elon Musk believes.

WATCH: The full Elon Musk interview

PLUS: These are the vehicles that will be taking you to space

Sci-Fi Serious: Blade Runner 🤖

A London researcher trained a computer to watch the movie Blade Runner, then used a neural network to have the computer reconstruct its impression of the movie, in order, based on what it had seen.

The result is essentially a new film created from the “memories” the neural net had formed – the computer’s interpretation of the film through the eyes of an AI.

But that’s not the weird part.

Warner Bros., which owns the Blade Runner copyright, issued a DMCA takedown notice (!) for an “apparently real” film.

In other words, Vox writes: “Warner had just DMCA’d an artificial reconstruction of a film about artificial intelligence being indistinguishable from humans, because it couldn’t distinguish between the simulation and the real thing.” 😳

To its credit, Warner later rescinded the DMCA request.

The researcher, Terence Broad, has written a detailed post about autoencoding Blade Runner, and reconstructing films with artificial neural networks.

PLUS: ‘2001’ rendered in the style of Picasso using deep neural networks


What We’re Reading 📚

  • Former NASA chief Daniel Goldin unveiled his startup, KnuEdge, after ten years in stealth. He’s raised $100M to build “neural chips,” based on biological principles about how the brain uses power efficiently to get computing done. The goal is to make data centers more efficient in a hyperscale age. (VentureBeat)
  • Alan Kay dropped by Hacker News. Here are some of our favorite posts from the famed computer scientist on Dijkstra, Erlang, the relationship between OO and functional programming, his reading list, and more. (The Macro)
  • Cortana arrives on Xbox One for preview members. Cortana will appear in the Xbox One dashboard, and you’ll be able to access the voice assistant through the Kinect sensor or a headset. As a result, Microsoft is altering the way you activate Xbox voice commands to just “Hey Cortana.” (The Verge)
  • The Barbell Effect of Machine Learning. “On one hand, it will democratize basic intelligence through the commoditization and diffusion of services such as image recognition and translation into software broadly. On the other, it will concentrate higher-order intelligence in the hands of a relatively small number of incumbents that control the lion’s share of their industry’s data.” (Medium)
  • How To Design A Good API and Why it Matters. (YouTube)

Try This At Home 🛠


Emergent Future is a weekly, hand-curated dispatch exploring technology through the lens of artificial intelligence, data science, and the shape of things to come. Subscribe here.

Powered by Algorithmia

Algorithmia Earns Gartner Data Science Cool Vendor Award

Gartner Names Algorithmia Top Cool Vendor Data Science 2016

Algorithmia is excited to announce that it has been selected as a Gartner Cool Vendor in Data Science for 2016.

Gartner’s annual Cool Vendor report identifies the five most innovative, impactful, and intriguing data science vendors that help analytics leaders solve hard problems, and automate and scale their advanced analytics.

Algorithmia provides an open marketplace for algorithms, where developers can create, share, and remix the building blocks of algorithmic intelligence at scale.

On Algorithmia, algorithms run as serverless microservices, where a function can be written once — in any programming language — and then made available as part of a universal API, hosted on scalable infrastructure in the cloud.

This type of platform enables analytics leaders to quickly extract value from their datasets, by leveraging existing algorithms, and creating and hosting new models with ease.

For instance, a developer could create a simple, composable, and fully automated economic development report using predictive algorithms from Algorithmia.

Gartner is the world’s leading information technology research and advisory company.

Read the Gartner Cool Vendors in Data Science 2016 report here.

How We Turned Our GitHub README Model Into a Microservice

A few posts ago we talked about our data science approach to processing and analyzing a GitHub README.

We published the algorithm on our platform, and released our project as a Jupyter Notebook so you could try it for yourself. If you want to skip ahead, here’s the demo we made that grades your GitHub README.

It was a fun project that definitely motivated us to improve our own READMEs, and we hope it inspired you, too (I personally got a C on one of mine, ouch)!

In this post, we’d like to take a step back to talk about how we used Algorithmia to host our scikit-learn model as an algorithm, and discuss the benefits associated with using our platform to deploy your own algorithm.

Why Host Your Model on Algorithmia?

There are many advantages to using Algorithmia to host your models.

The first is that you can transform your machine learning model into an algorithm that becomes a scalable, serverless microservice in just a few minutes, written in the language of your choice. Well, as long as that choice is: Python, Java, Rust, Scala, JavaScript or Ruby. But don’t worry, we are always adding more!

Second, you can monetize your proprietary models by collecting royalties on the usage. This way your agonizingly late nights slamming coffee and turning ideas into code won’t be for nothing. On the other hand, if altruism and transparency are your thing (it’s ours!) then you can open-source your projects and publish your algorithms for free.

Either way, the runtime (server) costs are being billed to the user who calls your algorithm. Learn more about pricing and permissions here.


A Step-By-Step Guide to Hosting Your Model

Now, on to the process of how we hosted the GitHub README analyzer model and made it an algorithm! This walkthrough is designed as an introduction to model and algorithm hosting, even if you’ve never used Algorithmia before. For the GitHub README project, we developed, trained, and pickled our models locally. The following steps assume you’ve done the same, and will highlight the activities of getting your trained model onto Algorithmia.

Step 1: Add Your Data

The Algorithmia Data API is used when you have large data requirements or need to preserve state between calls. It allows algorithms to access data from within the same session, but ensures that your data is safe.

To use the Data API, log into your Algorithmia account and create a data collection via the Data Collections page. To get there, click the “Manage Data” link from the user profile icon dropdown in the upper right-hand corner.

Once there, click on “Add Collection” under the “My Collections” section on your data collections page. After you create and name your data collection, you’ll want to set the read and write access to make it public, or lock it down as private. For more information about the four different types of data collections and permissions, check out our Data API docs.

Besides setting your algorithm’s permissions, this is also where you upload your data, your models, or both. In our GitHub README example, we needed to load our pickled model rather than raw data. Loading your data or model is as easy as clicking the “Drop files here to upload” box, or dragging and dropping files from your desktop.

You’ll need the path to your files in the next step. For example:
data://username/collection_name/data_model.pkl.zip

Upload your model
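If you’d rather script the upload than drag and drop, the Data API can do that too. A minimal sketch (the paths are illustrative):

import Algorithmia

client = Algorithmia.client("YOUR_API_KEY")

# Push a local pickled model into your data collection
client.file("data://username/collection_name/data_model.pkl.zip").putFile("data_model.pkl.zip")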

Our recommendation is to preload your model before the apply() function (details below). This ensures the model is downloaded, and loaded into memory without causing a timeout when working with large model files. We support up to 4GB of memory per 5-minute session.

When preloading the model like this, only the initial call will load the model into memory, which can take several seconds with large files.

From there, only the apply() function will be called, which will return data much faster.

Step 2: Create an Algorithm

Now that we have our pickled model in a collection, we’ll create our algorithm and set our dependencies.

To add an algorithm, simply click “Add Algorithm” from the user profile icon. There, you will give it a name, pick the language, select permissions, and make the code either open or closed source.

Create your algorithm and set permissions.

Create your algorithm and set permissions.

Once you finish that step, go to your profile from the user profile icon where your algorithm will be listed by name. Click on the name and you’ll be taken to the algorithm’s page that you’ll eventually publish.

There is a purple tag that says “Edit Source.” Click that and it’ll open the editor where you can add your model and update your dependencies for the language of your choice. If you have questions about adding your dependencies check out our Algorithm Developer Guides.

Your new algorithm!

Step 3: Load Your Model

After you’ve set the dependencies, you can load the pickled model you uploaded in step 1. You’ll want to import the libraries and modules you’re using at the top of the file, and then create a function that loads the data.

Here’s an example in Python:

import Algorithmia
import pickle
import zipfile

client = Algorithmia.client()

def loadModel():
    # Fetch the zipped model file from your data collection
    file_path = 'data://.my/collection_name/model_file.pkl.zip'
    model_path = client.file(file_path).getFile().name
    # Unzip the compressed model file
    zip_ref = zipfile.ZipFile(model_path, 'r')
    zip_ref.extract('model_file.pkl')
    zip_ref.close()
    # Load the model into memory
    model_file = open('model_file.pkl', 'rb')
    model = pickle.load(model_file)
    model_file.close()
    return model

# Preload the model outside of apply() so that only the first call
# pays the cost of downloading and unpickling it
model = loadModel()

def apply(input):
    # Do something with your model and return the output for the user
    return some_data

Step 4: Publish Your Algorithm

Now, all you have to do is save/compile, test, and publish your algorithm! When you publish, you also set the permissions for public or private use, and whether to make it royalty-free or charge a per-call royalty. Also note that you can set version-level permissions, such as whether internet access is required, and whether your algorithm is allowed to call other algorithms.

Publish your algorithm!
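Once published, your model is callable like any other algorithm on the platform. A minimal sketch (the algorithm path is a placeholder for your own):

import Algorithmia

client = Algorithmia.client("YOUR_API_KEY")

# Call your newly published algorithm by its path
algo = client.algo("username/YourAlgorithm/0.1.0")
print(algo.pipe("some input").result)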

If you need more information and detailed steps for creating and publishing your algorithm, check out our detailed guide to publishing your first algorithm.

Now that you’ve hosted your model and published your algorithm, tell people about it! It’s exciting to see your hard work utilized by others. If you need some inspiration check out the GitHub ReadMe demo, and our use cases.

If you have more questions about building your algorithm check out our Algorithmia Developer Center or get in touch with us.