Algorithmia Blog

Hardware for Machine Learning

[Image source: https://www.anandtech.com/show/10864/discrete-desktop-gpu-market-trends-q3-2016]

If you’re trying to create value in your company through machine learning, you need to be using the best hardware for the task. With CPUs, GPUs, ASICs, and TPUs, things can get kind of confusing.

For most of computing history there was only one type of processor. But the growth of deep learning has led to two new entrants into the field: GPUs and ASICs. This post will walk through the different types of compute chips, where they’re available, and which ones are the best to boost your performance.

Overview and Introduction

Chips are integral to your computer because they’re the brain: processors deal with all of the instructions that other hardware and software throw around. When we’re talking specifically about machine learning, the processor has the role of executing the logic in a given algorithm. If we’re performing Gradient Descent to optimize the cost function, the processing unit is directing and executing it. That means running the basic mathematical computations (matrix multiplication) that drive the algorithm.
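
To make that concrete, here’s a minimal sketch of a single gradient descent step for linear regression. It uses NumPy, and the names (X, y, w, lr) are illustrative rather than anything from a specific library; notice that almost all of the work is matrix multiplication.

```python
import numpy as np

# Minimal sketch of one gradient descent step for linear regression.
# X, y, w, and lr are made-up illustrative names.
X = np.random.randn(1000, 10)      # feature matrix: 1,000 examples, 10 features
y = np.random.randn(1000)          # targets
w = np.zeros(10)                   # model weights
lr = 0.01                          # learning rate

grad = X.T @ (X @ w - y) / len(y)  # gradient of mean squared error: two matrix multiplies
w -= lr * grad                     # the update itself is trivial; the matmuls do the work
```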

The CPU (Central Processing Unit)

The OG processing unit is the CPU, which was first developed by Intel in the early 1970s.

As time went on, CPUs grew in speed and capability. For some context, according to Computer Hope, “the first microprocessor was the Intel 4004 that was released on 15 November 1971, and had 2,300 transistors and performed 60,000 operations per second. The Intel Pentium processor has 3,300,000 transistors and performs around 188,000,000 instructions per second.”

[Image source: http://www.singularity.com/charts/page61.html]

Most processors were designed with one core (one CPU), which meant they could only perform one operation at a time. IBM released the first dual-core processor in 2001, which was able to “focus” on two tasks at once. Since then, more and more cores have been packed into a single chip: some modern server processors have more than 40.

Even with recent advances, the fact remains that most computers only have a few cores at most. CPUs are designed for complex computations: they’re very good at rapidly parsing through a detailed and intertwined set of commands. And for most of the tasks a computer needs to do, like swimming aimlessly through a sea of Chrome tabs, that’s exactly what you want. But with machine learning, things can get a bit different.
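
If you’re curious, one line of Python will tell you how many (logical) cores the machine you’re reading this on exposes:

```python
import os

# Number of logical cores (including hyperthreads) visible to the OS
print(os.cpu_count())
```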

Machine Learning Poses a New Type of Challenge for Processing

The strength of the CPU is executing a few complex operations very efficiently, and machine learning presents the opposite challenge. Most of the computation in the training process is matrix multiplication, which is a simple but broad task—the calculations are very small and easy, but there are a ton of them. Effectively, the CPU is often overpowered but understaffed.
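
To get a feel for the scale, here’s a back-of-the-envelope sketch; the layer and batch sizes are made up, but the arithmetic is representative:

```python
# Rough count of multiply-adds for one dense layer on one training batch.
# The sizes below are hypothetical.
batch, n_in, n_out = 64, 1024, 1024
macs_per_batch = batch * n_in * n_out   # one multiply-add per weight per example
print(f"{macs_per_batch:,} multiply-adds for a single layer and a single batch")
# ~67 million tiny, independent operations: a poor fit for a handful of complex cores.
```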

Advances in data storage are some of the major drivers of the explosion of machine learning over the past decade, and they’ve also compounded this problem. Today we’re training algorithms on more data than ever before, which means more and more small calculations that max out our CPUs.

A chip far better optimized for machine learning is actually another mass-manufactured processor: one whose cores are only complex enough to perform basic operations, but which can run huge numbers of them at the same time. Luckily, that chip has been sitting in our computers for years, and it’s called a GPU.

GPUs Have Risen to the Occasion

GPUs, or Graphics Processing Units, have been around in gaming applications since the early 1970s. The late 80s saw GPUs being added to consumer computers, and by 2018 they’re absolutely standard. What makes a GPU unique is how it handles instructions: it takes essentially the opposite approach to a CPU.

GPUs utilize parallel architecture: while a CPU is excellent at handling one set of very complex instructions, a GPU is very good at handling many sets of very simple instructions.
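
Here’s a rough (and decidedly unscientific) way to see that difference for yourself; it assumes you have PyTorch installed and an Nvidia GPU available:

```python
import time
import torch

# Rough timing sketch, not a rigorous benchmark.
a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

start = time.time()
a @ b                              # one big matrix multiply on the CPU
print(f"CPU: {time.time() - start:.3f}s")

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()       # wait for the copies to finish
    start = time.time()
    a_gpu @ b_gpu
    torch.cuda.synchronize()       # wait for the GPU kernel to finish
    print(f"GPU: {time.time() - start:.3f}s")
```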

A few years ago, groups in the machine learning community started to realize that these architectural features (GPUs are excellent at parallel processing of simple operations) might lend themselves well to running machine learning algorithms. Over time, GPUs started to show massive improvements over CPUs for training models, often in the 10x ballpark for speed. The stock price of Nvidia, the best-known manufacturer of these kinds of chips, shows as much.

[Image source: https://blogs.nvidia.com/blog/2016/01/12/accelerating-ai-artificial-intelligence-gpus/]

Nvidia isn’t the only manufacturer of GPUs, but it’s certainly the default one. There are other vendors, like AMD, but their software for working with the chips lags far behind Nvidia’s. CUDA, Nvidia’s parallel computing platform, has become the standard foundation for machine learning applications on GPUs.

The degree to which GPUs have become popular is hard to overstate. They’re in high demand right now, for the original video game applications as well as for machine learning (and even cryptocurrency mining), and prices have been skyrocketing. The price for a standard Nvidia GPU manufactured last year is now higher than it was when released. Algorithmia is the only major vendor that supports serverless execution (FaaS) on GPUs.

ASICs: The Dark Horse

ASICs, or Application-Specific Integrated Circuits, are the next level of chip design: processors designed specifically for one type of task. The chip is built to be very good at executing a specific function, or type of function.

An awesome and relevant example is cryptocurrency mining, which people seem to attack with endless creativity. If you need an introduction to cryptocurrency, head over to Coindesk’s Blockchain 101 section. For our purposes, all you need to know is this: mining these currencies on your computer boils down to brute-force guessing at a number. It’s a very simple function with no variation, and the faster you can guess, the better your chance of winning, so you just keep guessing until you get it right. Because the task is so narrow and repetitive, purpose-built mining ASICs now vastly outperform CPUs and GPUs at it.
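
To get a feel for that guess-and-check loop, here’s a toy sketch in Python. Real mining is far more involved; the block data and “difficulty” target below are purely illustrative.

```python
import hashlib
from itertools import count

block_data = b"some transactions"   # placeholder for real block contents
target_prefix = "0000"              # toy "difficulty": hash must start with these zeros

for nonce in count():
    digest = hashlib.sha256(block_data + str(nonce).encode()).hexdigest()
    if digest.startswith(target_prefix):
        print(f"found nonce {nonce}: {digest}")
        break
```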

Google has also gotten into the ASIC game, but with a focus on machine learning. Its chip is called a TPU (Tensor Processing Unit): a Google-designed and manufactured processor built specifically for machine learning with TensorFlow, Google’s open-source deep learning framework. Google claims that TPUs are much faster than the best CPUs and GPUs for training and running neural nets, though there has been some dispute over how accurate that claim really is. A recently released third-party benchmark found that TPUs can be significantly more efficient than comparable GPUs.

As machine learning becomes more and more integrated into all the applications we use on a daily basis, expect more research to be done on how to create chips tailored for these tasks.

How to Access And Use CPUs, GPUs, and TPUs

Moving from theory and architecture to practice: as of 2018, it’s finally possible to access all of these kinds of chips for your machine learning applications.

CPUs

CPUs make up the bulk of modern public cloud offerings. If you train and run models on normal AWS (Amazon), GCP (Google), or Azure (Microsoft) instances, they’ll be using CPUs. Unless you’re using specific deep learning frameworks that target GPUs on your machine, running algorithms locally will also use your CPUs.
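
A quick way to check which devices your framework can actually see (this assumes TensorFlow 2.x; on a machine without a GPU build, only CPU devices show up):

```python
import tensorflow as tf

# Lists the hardware TensorFlow has detected, e.g.
# [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
print(tf.config.list_physical_devices())
```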

GPUs

Because of the skyrocketing demand and relatively modest supply, GPUs haven’t always been easy to access. Thankfully, the major cloud platforms have bought up enough of them to make it possible today: AWS offers GPU instances (the p3 family), as do Google and Microsoft. Algorithmia’s Serverless AI Layer integrates both GPUs and CPUs, and makes it easy to deploy on either.
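
On one of those GPU instances, you can pin work to the GPU explicitly. The sketch below assumes TensorFlow 2.x with GPU support installed:

```python
import tensorflow as tf

# Place a large matrix multiply on the first GPU (e.g., on an AWS p3 instance).
with tf.device("/GPU:0"):
    a = tf.random.normal((2048, 2048))
    b = tf.random.normal((2048, 2048))
    c = tf.matmul(a, b)

print(c.device)  # should report a GPU device if one is available
```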

ASICs

The only widely available ASIC for machine learning applications is Google’s TPU. TPUs recently became available to the public for compute through Google Cloud.
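
Targeting a TPU from TensorFlow looks roughly like the sketch below. The exact setup calls vary by TensorFlow version, and this assumes a Cloud TPU is already attached to your VM or notebook:

```python
import tensorflow as tf

# Connect to the attached Cloud TPU and build a model under a TPU strategy.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():  # model variables get placed on the TPU cores
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```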

Which Type of Processor is Best For You?

As with anything, the answer is it depends. Projects and products involving machine learning often have varying priorities, ranging from speed to accuracy to reliability. As a general rule, if you can get your hands on a state-of-the-art GPU, it’s your best bet for fast machine learning.

GPU compute will usually be about four times as expensive as CPU compute, so if you’re not getting a 4x speedup, or if speed is less of a priority than cost, you might want to stick with CPUs. Additionally, your training needs will be different from your deployment needs. A GPU may be necessary for training a large neural net, but a CPU is often more than powerful enough to serve the model once it’s built.
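
That train-on-GPU, serve-on-CPU pattern might look something like this sketch (PyTorch here, with a made-up toy model):

```python
import torch
import torch.nn as nn

# Train on the GPU if one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10)).to(device)
# ... training loop runs here on `device` ...

# For serving, a CPU is often fast enough: move the trained weights back.
model_cpu = model.to("cpu").eval()
with torch.no_grad():
    prediction = model_cpu(torch.randn(1, 784))
```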

TPUs are still experimental, so we’ll need more data before making judgements about tradeoffs.

Further Reading

What’s The Difference Between The CPU And GPU? – “GPUs can handle graphics better because graphics include thousands of tiny calculations that need to be conducted. Instead of sending those tiny equations to the CPU, which could only handle a few at a time, they’re sent to the GPU, which can handle many of them at once.”

The History of The Modern Graphics Processor – “While 3D graphics turned a fairly dull PC industry into a light and magic show, they owe their existence to generations of innovative endeavor. Over the next few weeks (this is the first installment on a series of four articles) we’ll be taking an extensive look at the history of the GPU, going from the early days of 3D consumer graphics, to the 3Dfx Voodoo game-changer, the industry’s consolidation at the turn of the century, and today’s modern GPGPU.”

The History of Nvidia’s GPUs – “Nvidia formed in 1993 and immediately began work on its first product, the NV1. Taking two years to develop, the NV1 was officially launched in 1995. An innovative chipset for its time, the NV1 was capable of handling both 2D and 3D video, along with included audio processing hardware.”

Will ASIC Chips Become The Next Big Thing in AI? – “When Google announced its second generation of ASICs to accelerate the company’s machine learning processing, my phone started ringing off the hook with questions about the potential impact on the semiconductor industry. Would the other members of the Super 7, the world’s largest data centers, all rush to build their own chips for AI? How might this affect NVIDIA, a leading supplier of AI silicon and platforms, and potentially other companies such as AMD, Intel, and the many startups that hope to enter this lucrative market? Is it game over for GPUs and FPGAs just when they were beginning to seem so promising? To answer these and other questions, let us get inside the heads of these Goliaths of the Internet and see what they may be planning.”

Papers

Deep Machine Learning on GPUs – “With the increasing use of GPUs in science and the resulting computational power, tasks which were too complex a few years back can now be realized and executed in a reasonable time. Deep Machine learning is one of these tasks.”

Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs – “Leveraging large data sets, deep Convolutional Neural Networks (CNNs) achieve state-of-the-art recognition accuracy. Due to the substantial compute and memory operations, however, they require significant execution time. The massive parallel computing capability of GPUs make them as one of the ideal platforms to accelerate CNNs and a number of GPU-based CNN libraries have been developed.”

Serving deep learning models in a serverless platform – “In this work we evaluate the suitability of a serverless computing environment for the inferencing of large neural network models. Our experimental evaluations are executed on the AWS Lambda environment using the MxNet deep learning framework. Our experimental results show that while the inferencing latency can be within an acceptable range, longer delays due to cold starts can skew the latency distribution and hence risk violating more stringent SLAs.”

Scaling Deep Learning on GPU and Knights Landing Clusters – “The speed of deep neural networks training has become a big bottleneck of deep learning research and development. For example, training GoogleNet by ImageNet dataset on one Nvidia K20 GPU needs 21 days. To speed up the training process, the current deep learning systems heavily rely on the hardware accelerators. However, these accelerators have limited on-chip memory compared with CPUs.”

Tutorials

How To Train TensorFlow Models Using GPUs – “GPUs can accelerate the training of machine learning models. In this post, explore the setup of a GPU-enabled AWS instance to train a neural network in TensorFlow.”

Deep Learning CNNs in TensorFlow with GPUs – “In this tutorial, you’ll learn the architecture of a convolutional neural network (CNN), how to create a CNN in TensorFlow, and provide predictions on labels of images. Finally, you’ll learn how to run the model on a GPU so you can spend your time creating better models, not waiting for them to converge.”

Which GPU(s) to Get for Deep Learning: My Experience and Advice for Using GPUs in Deep Learning – “With a good, solid GPU, one can quickly iterate over deep learning networks, and run experiments in days instead of months, hours instead of days, minutes instead of hours. So making the right choice when it comes to buying a GPU is critical. So how do you select the GPU which is right for you? This blog post will delve into that question and will lend you advice which will help you to make choice that is right for you.”

Using GPUs With TensorFlow (From Google)