Google recently announced that it set three performance records in the latest round of benchmarks from the MLPerf consortium (short for "machine learning performance," for the uninitiated). MLPerf v0.6 results are now available on GitHub, where Nvidia also took home a few new records. Benchmarks like these act as valuable information for AI experts looking to understand key performance characteristics of hardware and cloud resources.
Google decided to benchmark its new Google Cloud TPU (tensor processing unit) Pods. While Google has been using TPUs for in-house AI workloads since 2015, the newest version (v3) broke into public beta just a few months back. And it's safe to say that these pods are monsters. With a single pod containing over 1,000 TPU chips, these liquid-cooled AI workhorses bring over 100 petaflops of computing power to the table.
It's not surprising that Google took home some benchmark records, even against Nvidia's GPU powerhouse, the Nvidia DGX SuperPOD. This new wave of accelerator technology is rapidly changing the way data scientists measure and utilize AI, especially at research levels. And, with Nvidia, Google, and Intel all submitting systems to the benchmark consortium, there was sure to be some record-breaking action.
Google's Cloud TPU v3 Pods outperformed on-premise systems at large-scale AI training in three key areas. In two of them (Transformer and SSD), Google's TPU Pods were 84% faster at training models.
These three performance areas were:
- ResNet-50 image classification
- English-to-German machine translation with the Transformer model
- SSD (Single Shot Detection) object detection model training
Understanding TPUs and Google Cloud TPU Pods
Let's take a breather and discuss tensor processing units and why they're essential for large-scale cloud computing & AI in 2019. There's a common thread linking Google Maps, Google Search, Google Cloud, and all other Google-y things together, and that's tensor processing units (TPUs).
When Google first started developing complex neural networks, it tried a bunch of different "stuff" to see what worked well with large-scale AI and neural computing. CPUs and microprocessors weren't up to the task, and GPUs had their own set of issues. So Google decided to build its own solution: AI accelerator integrated circuits designed around TensorFlow, the open-source machine learning framework Google develops and uses in-house.
We won't dive too deep into the architecture here. But there's a great post from Google detailing some of the TPU's design, an approach that Nvidia and other AI competitors have watched closely.
The important thing to remember is that, by Google's own analysis, TPUs outperform contemporary CPUs and GPUs by around 15-30x in throughput, and by roughly 30-80x in performance per watt. Google Cloud TPU Pods stack over 1,000 of these TPUs together. That's over 100 petaflops of computing power. Google is serving them up to businesses looking to iterate on ML faster, boost their neural network capabilities, and leverage AI computing at scale.
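As a rough sanity check on that pod-level figure, the aggregate math works out if each chip contributes on the order of 100 teraflops. The per-chip number below is an assumption for illustration, not an official spec:

```python
# Back-of-the-envelope aggregate compute for a TPU v3 pod.
# TFLOPS_PER_CHIP is an assumed figure for illustration only.
CHIPS_PER_POD = 1024      # "over 1,000" TPU chips per pod
TFLOPS_PER_CHIP = 100.0   # assumed ~100 teraflops per chip

pod_tflops = CHIPS_PER_POD * TFLOPS_PER_CHIP
pod_petaflops = pod_tflops / 1000.0  # 1 petaflop = 1,000 teraflops

print(f"{pod_petaflops:.1f} petaflops")  # → 102.4 petaflops
```

Anything in that per-chip ballpark lands comfortably past the "over 100 petaflops" mark quoted for the pod.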
What Does This All Mean?
Let's be honest — AI is becoming a monstrous resource hog. The amount of computing power we go through at Bitvore is incredible. We can only imagine what some multinational organizations leveraging AI must employ.
Luckily, these massive computing operations that let us create extremely complex neural networks and AI programs (like AI that can curate news) are becoming cheaper and more efficient. This enables us (and you) to build and deliver better products. The minds at Google and Nvidia are driving much of that. For some context on how fast the TPU Pods are: two years ago, it took Nvidia 8 hours to train the ResNet-50 image classification model. Google just bested that in under 80 seconds.
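Taking those two figures from the text at face value, the implied speedup is easy to compute:

```python
# Implied speedup going from 8 hours down to 80 seconds
# (both figures as quoted above, not independently verified).
old_seconds = 8 * 3600   # 8 hours of ResNet-50 training
new_seconds = 80         # "under 80 seconds"

speedup = old_seconds / new_seconds
print(f"~{speedup:.0f}x faster")  # → ~360x faster
```

A two-orders-of-magnitude jump in two years, though some of that comes from scaling out hardware rather than from faster individual chips.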
Note: Nvidia calls its inferencing processors GPUs. Frankly, I'm not sure that a TPU pod is 80x faster than a GPU. Are we comparing apples to apples?
Google's TPU Pods (v3) are looking like a desirable option for those running ML/AI workloads on local GPU clusters. The pure speed, low price, and on-demand availability of modern ML cloud computing are simply stunning. TPU technology is also in an appealing spot: it's relatively inexpensive and, for many workloads, outperforms GPUs and CPUs.
To be clear, TPUs are not suitable for all machine learning jobs. Google is still running plenty of GPU and CPU clusters for a reason. However, if you're running a deep learning job that uses TensorFlow, you should at least take Google or Nvidia for a test drive. TPU Pods are breaking records and helping shape modern AI computing while they're at it.
Are we even close?
Despite the power of these clusters of devices, the world is still relatively far from being able to continuously train, deploy, and execute models. There are two parts to speed: reducing training time for large AI projects, which can take days or weeks on very big training sets, and reducing the latency of the actual inferencing when making a prediction.
Think of the parallel from years ago. Software developers used to edit their source code and then invoke a compiler. Compilation often took minutes or longer for smaller subsystems, and a complete build of more massive software, like an avionics suite or a complex banking application, could take hours.
When compiler technology progressed to the point that compilation happened in seconds, in the background, and continuously, it changed how people developed software. AI training and inferencing is at the same point compilers were years ago: still not fast enough.
Every change to the training set should trigger a rebuild of the AI model. Every combination of data used to test and validate the model against dozens of other models should be able to run simultaneously. Today, even with the fastest hardware and software, that takes a long time, when training and validation really should take only seconds, in the background.
Because of the time training takes, most machine learning developers don't train and validate across every combination of splits in the data, which would help avoid training biases. Instead, they train on one or two splits for one or two training runs. Ideally, developers would train numerous models in parallel, with various training parameters and multiple combinations of the training data, but having the time or computing resources to do that is very rare.
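The "all combinations of the data" workflow described above is essentially k-fold cross-validation. A minimal pure-Python sketch of generating those splits (the function name and fold count are illustrative, not from the article):

```python
# Minimal k-fold split generator: every sample lands in the
# validation fold exactly once across the k train/validate runs.
def k_fold_splits(n_samples, k):
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for fold in range(k):
        start = fold * fold_size
        # the last fold absorbs any remainder
        stop = n_samples if fold == k - 1 else start + fold_size
        val = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, val

# With enough compute, all k runs could execute in parallel, each
# training a model on `train` and scoring it on `val`.
for train, val in k_fold_splits(10, 5):
    print(len(train), len(val))  # → 8 2 on every fold
```

With k folds, the training bill is k times that of a single run, which is exactly why developers skip it when one run already takes hours.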
Even with one or two models, automatically choosing the best-performing one and deploying it is still a complicated task. After the model is deployed to cloud servers or edge devices, inferencing should be possible thousands of times a second, instead of one or a dozen predictions per minute, even on the smallest edge devices.
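To make those throughput targets concrete, here is the per-prediction latency budget each rate implies, assuming one prediction at a time with no batching or parallelism (my simplifying assumption, not the article's):

```python
# Per-prediction latency budgets implied by the rates above,
# assuming sequential, unbatched inference.
target_per_second = 1000           # "thousands of times a second"
budget_ms = 1000.0 / target_per_second
print(f"target pace: {budget_ms:.1f} ms per prediction")   # → 1.0 ms

slow_per_minute = 12               # "a dozen per minute"
slow_seconds = 60.0 / slow_per_minute
print(f"today's pace: one every {slow_seconds:.0f} s")     # → one every 5 s
```

That's a gap of over three orders of magnitude between where edge inferencing often sits and where it needs to be.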
While these new TPU/GPU clusters are impressive, I'm still hoping for the day that the software and hardware can continuously crank out and compare models for large training sets instantly and thoroughly. Until then, I'll take all the speedup I can get.
If you would like to learn more about Bitvore AI and its capabilities, check out our whitepaper, Using AI-Processed News Datasets to Perform Predictive Analytics.