It’s no big deal for IBM to update its Power Systems. It has done so repeatedly. Earlier this week, IBM dropped the line’s latest edition, featuring the brand-new Power9 processor and related systems. But it is a big deal that this generation has been tuned for an increasingly sought-after, leading-edge task: processing, at very high speed, the huge volumes of data involved in artificial intelligence (AI) operations.
Power has long sat between commodity servers based on Intel’s x86 architecture and IBM’s own mainframes, which provide industrial-grade computing to the largest enterprises, so its positioning has sometimes been less than obvious. IBM touts Power’s flexibility. It is designed for rapid scale-out of primarily Linux workloads, serving the needs of fast-growing but budget-conscious customers. Its PurePower configuration is aimed at converged infrastructure, where storage, networking, and compute proportions can be configured dynamically within the same system. In Linux clusters, it can be devoted to high-performance computing in applications like genomics, finance, computational chemistry, and oil and gas exploration. And the firm also offers an enterprise version, optimized for resiliency, availability, security, and performance in big commercial Linux installations, covering both “systems of record” and “systems of engagement.” So Power already has a number of jobs.
With this version, IBM adds one more chore to the pile: enterprise-scale AI. The very Power9 processor that runs the Power Systems AC922 was built for AI. In collaboration with AI industry leaders, the company has brought data acceleration, advanced analytics, and deep learning to the foreground. “Designed for the AI era” is how IBM describes its latest Power Systems generation. The “AC” in the product name stands for “accelerated computing,” emphasizing that acceleration is a key brick in the AI edifice. Through its established partnership with NVIDIA, IBM has incorporated massive parallelism into the design. The ability to scale this capability puts enterprise-class AI within reach of the data scientists who want it most.
At the lowest level, a combination of high-throughput cores, large caches, and extreme I/O (both on chip and externally to subsystems like memory, graphics, and open system buses) lets huge parallel workloads scream along like banshees. The Power9 chips themselves are built on a 14nm process node with 17 layers of metal, and the interconnect bus can push 7 terabytes of data per second. Both scale-up and scale-out designs are engineered right into the processor, along with native NVLink, PCIe 4.0, and high-speed paths to other potential partner devices.
The redesigned cores can be custom-configured for improved efficiency and workload alignment: 64-bit execution slices can be doubled up to 128, allowing for such optimization. Core sizes and counts are also flexible. Scale-out configurations designed for two-socket systems can be set up with 24 smaller cores per chip for Linux workloads or 12 larger cores for PowerVM workloads. Configurations designed for four-socket systems are built with the larger cores, but with the potential for adding more buffered memory.
And speaking of memory, one of the featured aspects of this flexible system is memory coherence. Across all its levels of cache and among all its various subsystems, nothing would work well without coherence, which is a native, low-level capability of the Power9.
Acceleration is achieved through 192 GB/s of full-duplex PCIe 4.0 bandwidth and 300 GB/s of full-duplex 25G Link bandwidth, which can operate in concert. In addition to PCIe devices and NVIDIA graphics processing units (GPUs), custom ASICs and FPGA devices also have this bandwidth at their disposal through various CAPI interfaces. This broad application of heterogeneous computing enables efficient programming models for complex analytic acceleration and demanding cognitive applications.
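To get a rough feel for what those link speeds mean in practice, the back-of-the-envelope sketch below estimates idealized transfer times. Only the two bandwidth figures come from the announcement; the 512 GB dataset size is a hypothetical example, and real transfers would see protocol overhead and contention.

```python
# Back-of-the-envelope transfer-time estimate using the bandwidth
# figures quoted above. The 512 GB working set is hypothetical,
# not an IBM spec.

PCIE4_GBPS = 192.0   # full-duplex PCIe 4.0 bandwidth, GB/s
LINK25G_GBPS = 300.0  # full-duplex 25G Link bandwidth, GB/s

def transfer_seconds(dataset_gb: float, bandwidth_gbps: float) -> float:
    """Idealized time to stream a dataset across a link,
    ignoring protocol overhead and contention."""
    return dataset_gb / bandwidth_gbps

dataset_gb = 512.0  # hypothetical working set
print(f"PCIe 4.0: {transfer_seconds(dataset_gb, PCIE4_GBPS):.2f} s")
print(f"25G Link: {transfer_seconds(dataset_gb, LINK25G_GBPS):.2f} s")
```

The point of the arithmetic is that at these rates a multi-hundred-gigabyte working set moves to accelerators in seconds rather than minutes, which is what keeps GPUs fed during training.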
Air-cooled versions of this latest generation of Power Systems are generally available now, and early customers include the heavyweights you’d expect for such computing capacity: the CORAL collaboration of Oak Ridge, Argonne, and Livermore laboratories; the National Nuclear Security Administration; and the Office of Science at the U.S. Department of Energy. The aggregated CORAL system itself is likely to become the most powerful supercomputer in the world when completed in 2018.
Toward the end of next year, water-cooled versions will come out with two levels of GPU support: a four-GPU version that enables faster data movement, and a six-GPU version that provides a balance of compute and data throughput.
Of course, hardware is only part of the story. The co-optimization of hardware and software specifically for machine learning, deep learning, and AI is what makes the new Power Systems tailored for these jobs. IBM has worked closely with software leaders in these areas to pull it all together.
With all this capability and performance, the AC922 offers the fastest way to deploy accelerated databases and deep-learning frameworks. In deep learning, the AC922 allows for faster training times, quicker model iteration, and training on larger and more complex data sets.
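For readers less familiar with what "training" and "model iteration" involve, the minimal sketch below shows the loop that accelerated systems speed up. It is plain NumPy on a synthetic linear-regression problem, not IBM's software stack: each epoch streams mini-batches through a gradient computation and a weight update, and this is the repetitive, bandwidth-hungry work that GPUs and fast links accelerate.

```python
import numpy as np

# Illustrative mini-batch gradient-descent loop (synthetic data,
# plain NumPy). Real deep-learning frameworks run the same kind of
# loop, but on GPUs and far larger models and data sets.

rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 8))                 # synthetic features
true_w = rng.normal(size=8)                    # hidden "true" weights
y = X @ true_w + 0.1 * rng.normal(size=1024)   # noisy targets

w = np.zeros(8)          # model parameters to learn
lr, batch = 0.1, 128     # learning rate and mini-batch size

def mse(w):
    """Mean-squared-error loss over the full data set."""
    return float(np.mean((X @ w - y) ** 2))

losses = [mse(w)]
for epoch in range(20):
    for i in range(0, len(X), batch):
        xb, yb = X[i:i + batch], y[i:i + batch]
        grad = 2 * xb.T @ (xb @ w - yb) / len(xb)  # MSE gradient
        w -= lr * grad                             # update step
    losses.append(mse(w))

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.4f}")
```

Faster hardware shortens each pass through this loop, which is what "faster training times and model iteration" cashes out to in practice.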
Because of its rapid end-to-end data flow, the AC922 can deliver insights — fished from the vast “data lake” through machine-learning, advanced-analytic, cognitive, and deep-learning modules — fast enough for decision makers to act on them in near real time. Such capacity can stop fraud before it takes place, predict terror attacks, prevent diseases, and personalize customer experiences.
In a briefing with analysts, IBM signaled that throughput, compute density, and bandwidth performance will only increase in future Power generations. So, important big-data insights will be ever closer at hand for decision makers in the coming years.