Azure Blob and Google Cloud Storage
In an effort to constantly improve products for our customers, this month we introduced two additional data providers into Algorithmia’s data abstraction service: Azure Blob Storage and Google Cloud Storage. This update allows algorithm developers to read and write data without worrying about the underlying data source. Additionally, developers who consume algorithms never need to worry about passing sensitive credentials to an algorithm since Algorithmia securely brokers the connection for them.
How Easy is it?
By creating an Algorithmia account, you automatically have access to our Hosted Data Source where you can store your data or algorithm output. If you have a Dropbox, Azure Blob Storage, Google Cloud Storage, or an Amazon S3 account, you can configure a new data source to permit Algorithmia to read and write files on your behalf. All data sources have a protocol and a label that you will use to reference your data.
We create these labels because you may want to add multiple connections to the same data provider account, and each connection needs a unique label for later reference in your algorithm. Multiple connections to the same source let you set different access permissions for each one, such as reading from one file and writing to a different folder.
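To make the protocol-and-label idea concrete, here is a minimal sketch of how a data reference could be split into its parts. The `protocol+label://path` shape and the `parse_data_uri` helper are assumptions made for illustration, not a description of how the platform parses references internally.

```python
from typing import NamedTuple, Optional


class DataUri(NamedTuple):
    protocol: str          # e.g. "s3", "dropbox", "data"
    label: Optional[str]   # the connection label, if one was given
    path: str              # everything after "://"


def parse_data_uri(uri: str) -> DataUri:
    """Split a reference such as 's3+reports://bucket/out.csv' (assumed format)."""
    scheme, sep, path = uri.partition("://")
    if not sep or not scheme:
        raise ValueError(f"not a data URI: {uri!r}")
    # An optional "+label" suffix on the scheme selects a specific connection.
    protocol, _, label = scheme.partition("+")
    return DataUri(protocol, label or None, path)
```

With this shape, two connections to the same provider (say a read-only one and a write-enabled one) differ only in the label part of the reference.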
These providers are available now in addition to Amazon S3, Dropbox, and the Algorithmia Hosted Data service. These options will provide our users with even more flexibility when incorporating Algorithmia’s services into their infrastructures.
Learn more about how Algorithmia enables data connection on our site.
We’d love to know which other data providers developers are interested in, and we’ll keep shipping new providers in future releases. Get in touch if you have suggestions!
Sometimes the best advertising is a small, nondescript company name etched onto an equally nondescript door in a back alley, only accessible by foot traffic. Lucky for us, Paul Borza of TalentSort—a recruiting search engine that mines open-source code and ranks software engineers based on their skills—was curious about Algorithmia when he happened to walk by our office near Pike Place Market one day.
“It’s funny how I stumbled on Algorithmia. I was waiting for a friend of mine in front of The Pink Door, but my friend was late so I started walking around. Next door I noticed a cool logo and the name ‘Algorithmia.’ Working in tech, I thought it must be a startup so I looked up the name and learned that Algorithmia was building an AI marketplace. It was such a coincidence!”
Paul Needed an Algorithm Marketplace
“Two weeks before I had tried monetizing my APIs on AWS but had given up because it was too cumbersome. So rather than waste my time with bad development experiences, I was willing to wait for someone else to develop a proper AI marketplace; then I stumbled upon Algorithmia.”
Paul Found Algorithmia
“I went home that day and in a few hours I managed to publish two of my machine learning models on Algorithmia. It was such a breeze! Publishing something similar on AWS would have taken at least a week.”
We asked Paul what made his experience using Algorithmia’s marketplace so easy:
“Before I started publishing algorithms, I wanted to see if Algorithmia fit our company’s needs. The ‘Run an Example’ feature was helpful in assessing the quality of an algorithm on the website; no code required. I loved the experience as a potential customer.”
“To create an API, I started the process on the Algorithmia website. Each API has its own git repository with some initial boilerplate code. I cloned that repository and added my code to the empty function that was part of the boilerplate code, and that was it! The algorithm was up and running on the Algorithmia platform. Then I added a description, a default JSON example, and documentation via Markdown.”
“The beauty of Algorithmia is that as a developer, you only care about the code. And that’s what I wanted to focus on: the code, not the customer sign-up or billing process. And Algorithmia allowed me to do that.”
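For readers who have not seen the boilerplate Paul mentions, here is a rough sketch of what filling in the empty function might look like for a Python algorithm. It assumes the entry point is a function named `apply` that receives the parsed request body. The input schema and the name-cleaning logic are hypothetical and are not Paul’s models.

```python
def apply(input):
    """Entry point (assumed name): takes the request body, returns a JSON-serializable result."""
    # Accept either a bare string or a {"name": ...} object (hypothetical schema).
    name = input["name"] if isinstance(input, dict) else str(input)
    # Collapse runs of whitespace so downstream code sees a normalized name.
    cleaned = " ".join(name.split())
    return {"name": cleaned, "tokens": cleaned.split(" ") if cleaned else []}
```

Everything outside the function body (hosting, sign-up, billing) is what the platform handles, which is the point Paul makes above.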
Paul is Smart; Be like Paul
Paul’s algorithms are the building blocks of TalentSort; they enable customers to improve their recruiting efficiency. The models are trained on 1.5 million names from more than 30 countries and have an accuracy rate of more than 95 percent at determining country of origin and gender. Also, the algorithms don’t call into any other external service, so there’s no data leakage. Try them out in the Algorithmia marketplace today.
Paul’s relentless curiosity led him to Algorithmia’s marketplace where his tools became part of more than 7,000 unique algorithms available for use now.
At Algorithmia, we’ve always been maniacally focused on the deployment of machine learning models at scale. Our research shows that deploying algorithms is the main challenge for most organizations exploring how machine learning can optimize their business.
In a survey we conducted this year, more than 500 business decision makers said that their data science and machine learning teams spent less than 25% of their time on training and iterating models. Most organizations get stuck deploying and productionizing their machine learning models at scale.
The challenge of productionizing models at scale comes late in the lifecycle of enterprise machine learning but is often critical to getting a return on investment on AI. Being able to support heterogeneous hardware, conduct versioning of models, and run model evaluations is underappreciated until problems crop up from not having taken these steps.
At the AWS re:Invent conference in Las Vegas this week, Amazon announced several updates to SageMaker, its machine learning service. Notable were mentions of forthcoming forecast models, a tool for building datasets to train models, an inference service for cost savings, and a small algorithm marketplace to—as AWS describes—“put [machine learning] in the hands of every developer.”
“What AWS just did was cement the notion that discoverability and accessibility of AI models are key to success and adoption at the industry level, and offering more marketplaces and options to customers is what will ultimately drive the advancement
–Kenny Daniel, CTO, Algorithmia
Amazon and other cloud providers are increasing their focus on novel uses for machine learning and artificial intelligence, which is great for the industry writ large. Algorithmia will continue to provide users seamless deployment of enterprise machine learning models at scale in a flexible, multi-cloud environment.
Deploying at Scale
For machine learning to make a difference at the enterprise level, deployment at scale is critical and making post-production deployment of models easy is mandatory. Algorithmia has four years of experience putting customer needs first, and we focus our efforts on providing scalability, flexibility, standardization, and extensibility.
We are heading toward a world of standardization for machine learning and AI, and companies will pick and choose the tools that will make them the most successful. We may be biased, but we are confident that Algorithmia is the best enterprise platform for companies looking to get the most out of their machine learning models because of our dedication to post-production service.
Being Steadfastly Flexible
Users want to be able to select from the best tools in data labeling, training, deployment, and productionization. Standard, customizable frameworks like PyTorch and TensorFlow and common file formats like ONNX increase flexibility for users for their specific needs. Algorithmia has been preaching and executing on this for years.
“Standard, customizable frameworks increase flexibility for users for their specific needs. Algorithmia has been preaching this for years.”
–Kenny Daniel, CTO, Algorithmia
For at-scale enterprise machine learning, companies need flexibility and modular applications that easily integrate with their existing infrastructure. Algorithmia hosts the largest machine learning model marketplace in the world, with more than 7,000 models, and more than 80,000 developers use our platform.
“I expect more AI marketplaces to pop up over time and each will have their strengths and weaknesses. We have been building these marketplaces inside the largest enterprises, and I see the advantages of doing this kind of build-out to accelerate widespread
–Diego Oppenheimer, CEO, Algorithmia
It is Algorithmia’s goal to remain focused on our customers’ success, pushing the machine learning industry forward. We encourage you to try out our platform, or better yet, book a demo with one of our engineers to see how Algorithmia’s AI layer is the best in class.
At Algorithmia, we have much to be thankful for—it’s even one of our core tenets. So in light of Thanksgiving, we have compiled a list of all that we’re particularly appreciative of this year. Some of our staff are thankful for the little things—snacks and a dog-friendly office—and some are glad of more practical things—the freedom to develop skills and experience for career development. Regardless, Algorithmia is eternally grateful for our customers and contributors.
“I’m thankful for the flexibility in where we live and when and how we get our work done!”
–Stephanie, Developer Advocate
“I am thankful that I work at a company that has a great set of values that we live by. One of them is actually, “We are thankful”! We are thankful for every single one of our users, customers, and contributors. We do not exist without them and always strive to make their experiences better.”
–Jonah-Kai, Head of Growth Marketing
“I’m thankful for working with some incredibly talented people.”
–Besir, Algorithm Engineer
“I’m thankful for how helpful and supportive my team is.”
–Adnaan, Back End Engineer
“I’m thankful for board game night.”
–James, Product Designer
“I’m thankful that I get to work on complex and creative projects!”
–Whitney, Content Marketing Manager
“I’m thankful for the growth mindset and intellectually curious culture!”
–Ken, Enterprise Sales Development Rep
“I am thankful for the team’s willingness to jump in and fix problems, always. I call it a winner attitude.”
“I am thankful for the remote friendly culture.”
–Rowell, Senior Platform Engineer
“I’m thankful for interesting, challenging, and creative opportunities every day.”
–Jon, Developer Advocate
“I’m thankful for the awesome views from the devpit (even if the blinds are down more often than not).”
–Ryan, Front End Engineering Lead
As we look toward the end of the year, we are also thankful to have the opportunity to give back to others and help underserved communities.
If your neural nets are getting larger and larger but your training sets aren’t, you’re going to hit an accuracy wall. If you want to train better models with less data, I’ve got good news for you.
Dataset augmentation – the process of applying simple and complex transformations like flipping or style transfer to your data – can help overcome the increasingly large data requirements of Deep Learning models. This post will walk through why dataset augmentation is important, how it works, and how Deep Learning fits into the equation.
It’s hard to build the right dataset from scratch
“What’s wrong with my dataset?!?”
Don’t worry, we didn’t mean to insult you. It’s not your fault: it’s Deep Learning’s fault. Algorithms are getting ever more complex, and neural nets are getting deeper and deeper. More layers in a neural net means more parameters that your model has to learn from your data. Some recent state-of-the-art models learn more than 100 million parameters during training.
When your model is trying to understand a relationship this deeply, it needs a lot of examples to learn from. That’s why popular datasets for models like these might have something like 10,000 images for training. That size of data is not at all easy to come by.
Even if you’re using simpler or smaller types of models, it’s challenging to organize a dataset large enough to train effectively. Especially as Machine Learning gets applied to newer and newer verticals, it’s becoming harder and harder to find reliable training data. If you wanted to create a classifier to distinguish iPhones from Google Pixels, how would you get thousands of different photos?
Finally, even with the right size training set, things can still go awry. Remember that algorithms don’t think like humans: while you classify images based on a natural understanding of what’s in the image, algorithms are learning that on the fly. If you’re creating a cat / dog classifier and most of your training images for dogs have a snowy background, your algorithm might end up learning the wrong rules. Having images from varied perspectives and with different contexts is crucial.
Dataset augmentation can multiply your data’s effectiveness
For all of the reasons outlined above, it’s important to be able to augment your dataset: to make it more effective without acquiring loads more training data. Dataset augmentation applies transformations to your training examples: they can be as simple as flipping an image, or as complicated as applying neural style transfer. The idea is that by changing the makeup of your data, you can improve your performance and increase your training set size.
For an idea of just how much this process can help, check out this benchmark that NanoNets ran in their explainer post. Their results showed an almost 20 percentage point increase in test accuracy with dataset augmentation applied.
It’s safer for us to assume the cause of this accuracy boost was a bit more complicated than just dataset augmentation, but the message is clear: it can really help.
Before we dive into what you might practically do to augment your data, it’s worth noting that there are two broad approaches to when to augment it. In offline dataset augmentation, transforms are applied en masse to your dataset before training. You might, for example, add a horizontally flipped copy of each of your images, resulting in a training set with twice as many examples. In online dataset augmentation, transforms are applied in real time as batches are passed into training. This won’t increase the size of the stored dataset, but it avoids generating and storing transformed copies up front, which makes it much quicker for larger training sets.
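The difference between the two approaches is easier to see in code. Below is a minimal NumPy sketch, assuming images are stored as an array of shape (N, H, W, C). The function names and the 50% flip probability are illustrative choices, not part of any particular library.

```python
import numpy as np


def hflip(batch):
    """Flip a batch of images (N, H, W, C) left to right."""
    return batch[:, :, ::-1, :]


def augment_offline(images):
    """Offline: build the flipped copies once, before training; the dataset doubles."""
    return np.concatenate([images, hflip(images)], axis=0)


def batches_online(images, batch_size, rng):
    """Online: yield shuffled batches, flipping each image with probability 0.5 on the fly."""
    order = rng.permutation(len(images))
    for start in range(0, len(images), batch_size):
        batch = images[order[start:start + batch_size]].copy()
        flip = rng.random(len(batch)) < 0.5
        batch[flip] = batch[flip][:, :, ::-1, :]
        yield batch
```

The offline version pays its cost in memory and disk; the online version keeps the dataset the same size and pays a small cost per batch instead, while showing the model a different random variant of each image in every epoch.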
How basic dataset augmentation works
Basic augmentation is super simple, at least when it comes to images: just try to imagine all the things you could do in Photoshop with a picture! A few of the simple and popular ones include:
- Flipping (both vertically and horizontally)
- Zooming and scaling
- Translating (moving along the x or y axis)
- Adding Gaussian noise (distortion of high frequency features)
Most of these transformations have fairly simple implementations in packages like TensorFlow. And though they might seem simple, combining them in creative ways across your dataset can yield impressive improvements in model accuracy.
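To show how little code these transforms need, here is a plain NumPy sketch of three of them (flipping, translating, and adding Gaussian noise). It assumes images are arrays of shape (H, W) or (H, W, C) with pixel values in the 0 to 255 range. The function names are made up for this example and are not the TensorFlow API.

```python
import numpy as np


def flip(image, horizontal=True):
    """Mirror the image left-right (horizontal) or top-bottom (vertical)."""
    return image[:, ::-1] if horizontal else image[::-1, :]


def translate(image, dx, dy, fill=0):
    """Shift by dx columns and dy rows; vacated pixels are set to `fill`."""
    h, w = image.shape[:2]
    out = np.full_like(image, fill)
    if abs(dx) >= w or abs(dy) >= h:
        return out  # shifted entirely out of frame
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = image[src_y, src_x]
    return out


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise and clip back to the assumed 0-255 range."""
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0, 255)
```

Chaining these (for example, a flip followed by a small translation and a little noise) is one simple way to combine transforms across a dataset.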
One issue that often comes up is input size requirements, which are one of the most frustrating parts of neural nets for practitioners. If you shift or rotate an image, you’re going to end up with something that’s a different size, and that needs to be fixed before training. Different approaches advocate filling in empty space with constant values, zooming in until you’ve reached the right size, or reflecting pixel values into your empty space. As with any preprocessing, testing and validating is the best way to find a definitive answer.
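Two of the fill strategies mentioned above (constant values and reflection) can be compared with NumPy’s `np.pad`, along with edge replication, a third option that `np.pad` also supports. This is a small sketch that assumes a rightward shift only; `shift_right` is a hypothetical helper, and the zoom-to-fit approach is not shown.

```python
import numpy as np


def shift_right(image, dx, mode="constant"):
    """Shift right by dx columns and keep the original width.

    `mode` is passed to np.pad and decides how the vacated columns are filled:
    "constant" (zeros), "reflect" (mirror the pixels), or "edge" (repeat the border).
    """
    w = image.shape[1]
    # Pad only on the left of the width axis; leave every other axis alone.
    pad = [(0, 0), (dx, 0)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad, mode=mode)
    return padded[:, :w]  # crop back to the size the network expects


row = np.array([[1, 2, 3, 4]])
constant_fill = shift_right(row, 2)                 # [[0, 0, 1, 2]]
reflect_fill = shift_right(row, 2, mode="reflect")  # [[3, 2, 1, 2]]
edge_fill = shift_right(row, 2, mode="edge")        # [[1, 1, 1, 2]]
```

Which fill works best depends on the data, so it is worth validating each option against a held-out set.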
Deep Learning for dataset augmentation
Moving from the simple to the complex, there are some more interesting things than just flips and rotations that you can do to your dataset to make it more robust.
Neural Style Transfer
Neural networks have proven effective in transferring stylistic elements from one image to another, like “Starry Stanford” here:
Source: Artists and Machine Intelligence
You can utilize pretrained nets that transfer exterior styles onto your training images as part of a dataset augmentation pipeline.
Generative Adversarial Networks
A newer class of algorithms called Generative Adversarial Networks (GANs) has been stealing headlines lately for its ability to generate content (of all types) that’s actually pretty good. Using these types of algorithms, researchers were able to apply image-to-image translation and get some interesting results. Here are a few of the images they worked on:
Source: UC Berkeley
Although it’s not entirely computationally feasible right now, it’s clear that this kind of technology can open doors for much more sophisticated dataset augmentation.
Google recently released a paper outlining a framework called AutoAugment, which uses Machine Learning to augment your dataset. This is Machine Learning to improve Machine Learning: Machine Learning-ception. The idea is that the right augmentations depend on your dataset and can be learned by a model, even though the actual augmentations themselves are pretty simple.
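As a rough illustration of what a learned augmentation policy looks like once it has been found, the sketch below applies a policy made of sub-policies, where each step is an (operation, probability, magnitude) triple. The two operations and the policy values are hand-written assumptions for this example. In AutoAugment the policy itself comes out of a search procedure, which is not shown here.

```python
import numpy as np

# Hypothetical operation table; what "magnitude" means is specific to each operation.
OPS = {
    "flip_lr": lambda img, magnitude: img[:, ::-1],  # magnitude is ignored
    "brighten": lambda img, magnitude: np.clip(img + magnitude, 0, 255),
}


def apply_sub_policy(image, sub_policy, rng):
    """Apply each (op_name, probability, magnitude) step with its own probability."""
    for op_name, prob, magnitude in sub_policy:
        if rng.random() < prob:
            image = OPS[op_name](image, magnitude)
    return image


def apply_policy(image, policy, rng):
    """Pick one sub-policy uniformly at random and apply it to the image."""
    sub_policy = policy[rng.integers(len(policy))]
    return apply_sub_policy(image, sub_policy, rng)


# A hand-written stand-in for a learned policy.
example_policy = [
    [("flip_lr", 0.5, 0), ("brighten", 0.8, 20)],
    [("brighten", 0.3, 40), ("flip_lr", 0.9, 0)],
]
```

Calling `apply_policy(image, example_policy, np.random.default_rng())` once per training image gives a different augmented variant each time, while the policy itself stays fixed.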
On Algorithmia, our Image Augmentation algorithm implements the ideas put forth in the paper. It’s as easy as sending a directory of images to our API endpoint and getting the augmented images back. Check it out and consider implementing it as part of your Machine Learning pipeline.