AMD is making the most of TSMC's 7nm process advantage over Intel. AMD today charted out its plans for the next few years of product development, with an array of new CPUs and GPUs in the development pipeline. On the GPU front are two new datacenter-oriented GPUs: the Radeon Instinct MI60 and MI50. Based on the Vega architecture and built on TSMC's 7nm process, the cards are aimed not primarily at graphics (despite what one might think given that they're called GPUs) but rather at machine learning, high-performance computing, and rendering applications. MI60 will come with 32GB of ECC HBM2 (second-generation High-Bandwidth Memory) while the MI50 gets 16GB, and both have a memory bandwidth up to 1TB/s. ECC is also used to protect all internal memory within the GPUs themselves. The cards will also support PCIe 4.0 (which doubles the transfer rate of PCIe 3.0) and direct GPU-to-GPU links using AMD's Infinity Fabric. This will offer up to 200GB/s of bandwidth (three times more than PCIe 4) between up to 4 GPUs. The cards will support a wide range of data types for computation; for neural networks and machine learning, there are half-precision 16-bit floating point and 4- and 8-bit integer support; for HPC workloads, there's single (32-bit) and double (64-bit) precision floating point. AMD claims that the MI60 will be the fastest double-precision accelerator at up to 7.4TFLOPS, with the MI50 not far behind at 6.7TFLOPS. The cards also include built-in support for virtualization, allowing one card to be securely shared between multiple virtual machines. This makes it easier for cloud operators to offer GPU-accelerated virtual machines. The MI60 will ship to datacenter customers by the end of the year; MI50 is coming a little later but should be available by the end of Q1 2019. On the CPU side of things, AMD talked extensively about the forthcoming Zen 2 architecture. 
The goal of the original Zen architecture was to get AMD, at the very least, competitive with what Intel had to offer. AMD knew that Zen would not take the performance lead from Intel, but the pricing and features of its chips made them nonetheless attractive, especially in workloads that highlighted certain shortcomings of Intel's parts (fewer memory channels, less I/O bandwidth). Zen 2 promises to be not merely competitive with Intel, but superior to it. Key to this is TSMC's 7nm process, which offers twice the transistor density of the 14nm process the original Zen parts used. For the same performance level, power is reduced by about 50 percent, or, conversely, at the same power consumption, performance is increased by about 25 percent. TSMC's 14nm and 12nm processes both trail behind Intel's 14nm process in terms of performance per watt, but with 7nm, TSMC will take the lead. Zen 2 will also address certain weak aspects of the original Zen. For example, the original Zen used 128-bit data paths to handle 256-bit AVX2 operations; each operation was split into two parts and processed sequentially. In workloads using AVX2, this gave Intel, with its native 256-bit implementation, a huge advantage. Zen 2 doubles the floating-point execution units and data paths to be 256-bit, doubling the bandwidth available and greatly improving the performance of this code. For integer workloads, branch prediction and prefetching have been made more accurate and some caches enlarged. Zen 2 will also offer improved hardware protection against some variants of the Spectre attacks. The original Zen used a multichip module design. Chips used one, two, or four dies (for Ryzen, first-generation Threadripper, and Epyc/second-generation Threadripper, respectively) all put together into a single package. Each die had two Core Complexes (blocks of four cores), two memory controllers, some Infinity Fabric links (for connections between dies), and some PCIe channels. 
This made it straightforward for AMD to scale from the single-die, 8-core/16-thread Ryzen up to the 32-core/64-thread Epyc. Zen 2 is taking a very different approach, albeit one that still uses a multichip design. Instead of having each die contain CPUs, memory controllers, and I/O, the new design splits up the different roles. There will be a single 14nm I/O die, with eight memory controllers, eight Infinity Fabric ports, and PCIe lanes, and then a number of 7nm "chiplets" containing only CPUs and Infinity Fabric. This new approach should remedy some of the more awkward aspects of the original Zen; for example, there is a significant latency overhead when a core on one Zen die has to use memory from another die. With the Zen 2 design, memory latency should become much more uniform. AMD says that Zen 2 is sampling now, with processors due to hit the market in 2019. Zen 3, using an enhanced version of the 7nm process, is currently "on track" and likely to land in 2020, and Zen 4, on a more advanced process, is currently in the design stage.
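The scaling of the original Zen design described above comes down to simple multiplication. Here is a minimal Python sketch of it, assuming two threads per core (implied by the 8-core/16-thread and 32-core/64-thread figures rather than stated outright); the constant and function names are illustrative only.

```python
# Original Zen die layout as described in the article.
CORES_PER_CCX = 4      # a Core Complex is a block of four cores
CCX_PER_DIE = 2        # each die carries two Core Complexes
THREADS_PER_CORE = 2   # assumption: inferred from the 8-core/16-thread Ryzen


def package_totals(dies: int) -> tuple[int, int]:
    """Return (cores, threads) for a package built from `dies` Zen dies."""
    cores = dies * CCX_PER_DIE * CORES_PER_CCX
    return cores, cores * THREADS_PER_CORE


if __name__ == "__main__":
    # One, two, and four dies correspond to the package sizes named above.
    for dies in (1, 2, 4):
        cores, threads = package_totals(dies)
        print(f"{dies} die(s): {cores} cores / {threads} threads")
```

Because every die is identical, the same function covers the whole range from the single-die part to the four-die part.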
You'll have to wait a while longer for gaming cards. AMD is following through on its promise of releasing 7-nanometer GPUs -- not that you can use one yet. The company has formally launched Radeon Instinct MI50 and MI60 cards that use the denser, more efficient chip technology to accelerate specialized computing tasks like AI, cloud services and scientific calculations. The MI60 in particular is billed as the fastest double-precision accelerator of its type, pumping out 7.4 teraflops when crunching 64-bit floating point data. Both boards pack very high-bandwidth (up to 1TB/s) HBM2 memory and can work together in "hive rings" of up to four GPUs thanks to 200GB/s peer-to-peer links. The MI60 will make the promise of 7nm GPUs a reality by shipping to data centers before the end of 2018, while its more accessible MI50 counterpart should arrive no later than the first quarter of 2019. This isn't the 7nm gaming card many people are clamoring for, but it's still a milestone for the computing industry -- you can finally find 7nm tech in a GPU outside of a mobile chip. NVIDIA's RTX graphics hardware remains built on a 12nm process. Look at this as AMD laying the groundwork for 2019, when 7nm is more likely to find its way inside your gaming rig.
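For a sense of scale, the 200GB/s peer-to-peer figure can be set against a PCIe 4.0 slot. The sketch below is a back-of-the-envelope calculation, not AMD's own math. It assumes a x16 slot, uses the PCIe 4.0 signaling rate of 16 GT/s per lane with 128b/130b encoding, and assumes the 200GB/s figure counts both directions. All names are illustrative.

```python
GT_PER_S_PER_LANE = 16e9          # PCIe 4.0 raw signaling rate per lane
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line encoding
LANES = 16                        # assumption: a full x16 slot
PEER_LINK_GB_S = 200              # figure quoted for the GPU-to-GPU links


def pcie4_bandwidth_gb_s(lanes: int = LANES) -> float:
    """One-direction bandwidth in GB/s, ignoring packet/protocol overhead."""
    bits_per_s = GT_PER_S_PER_LANE * ENCODING_EFFICIENCY * lanes
    return bits_per_s / 8 / 1e9


one_way = pcie4_bandwidth_gb_s()
both_ways = 2 * one_way             # assumption: compare like with like
ratio = PEER_LINK_GB_S / both_ways  # how many x16 slots the link is worth

if __name__ == "__main__":
    print(f"PCIe 4.0 x16, one direction:   {one_way:.1f} GB/s")
    print(f"PCIe 4.0 x16, both directions: {both_ways:.1f} GB/s")
    print(f"Peer link vs. PCIe 4.0 x16:    {ratio:.2f}x")
```

Under those assumptions the peer-to-peer links work out to roughly three times the bandwidth of a x16 PCIe 4.0 slot.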
Advanced Micro Devices revealed the Zen 2 architecture for the family of processors that it will launch in the coming years, starting in 2019. The move is a follow-up to the competitive Zen designs that AMD launched in March 2017, and it promises a twofold improvement in performance throughput. AMD hopes the Zen 2 processors will keep it ahead of or at parity with Intel, the world's biggest maker of PC processors. The earlier Zen designs enabled chips that could process 52 percent more instructions per clock cycle than the previous generation. Lisa Su, CEO of Santa Clara, California-based AMD, made the announcement at an AMD press and analyst event in San Francisco. "So much has really happened in the last two years," she said. "I've been CEO for four years. It's been an incredible four years. But we are just at the beginning of our journey." Zen has spawned AMD's most competitive chips in a decade, including Ryzen for the desktop, Threadripper (with up to 32 cores) for gamers, Ryzen Mobile for laptops, and Epyc for servers. You can expect to see Zen 2 cores in future models of those families of chips. AMD's focus is on making central processing units (CPUs), graphics processing units (GPUs), and accelerated processing units (APUs) that put the two other units together on the same chip. "Zen 2 is our next-generation system architecture," Su said, noting chips using it will be made with 7-nanometer manufacturing, where the width between circuits is seven billionths of a meter. Su said the new chips will be targeted at the workloads of the future, including machine learning, big data analytics, cloud, and other tasks. AMD is going after a total available market for data center chips that it expects to reach $29 billion by 2021. "We see strong double-digit growth for the foreseeable future for the overall market," she said. "We are not looking at incremental changes. The products you are seeing today are the products of the decisions we made four or five years ago. 
They were bets on where we think the market was going." The Zen-based designs are AMD's most competitive in a decade, and it now has every major computer maker using the Epyc chips for servers, from HP Enterprise to Dell. It is also feeding chips to data centers that run cloud deployments for Microsoft, Baidu, Tencent, Oracle, and others. AMD and Amazon Web Services announced today that Amazon Elastic Compute Cloud will use AMD Epyc CPUs, so customers can get access today to instances running on the AMD processors. Intel noted that it has an extensive relationship with AWS. The next-generation Epyc platform is code-named Rome, which will debut next year with 7-nanometer technology. Mark Papermaster, AMD chief technology officer, said AMD took a holistic design approach to creating Zen 2. "Zen 2 marks the delivery of our promise of continuity," he said. "We called a play and we are delivering. We are executing." Zen 2 chips are sampling today at 7-nanometer manufacturing, compared to the shipping 14-nanometer Zen processors that debuted in 2017. Zen 3 is on track to debut on 7-nanometer in 2020. AMD is using TSMC, the chip contract manufacturer, to make its 7-nanometer chips. Intel, meanwhile, has delayed its equivalent chips, dubbed 10-nanometer but at the same technology level, until late 2019. Zen 2 can get twice the throughput thanks to better branch prediction, that is, predicting what kind of processing will be necessary for the next computation. It also has better 256-bit load/store floating point processing, double that of the previous generation. Zen 2 will also have stronger built-in security, where data can be fully encrypted as it is transferred to memory. "You will see a huge jump as we go to Zen 2 products," Papermaster said. Intel has not yet made a comment, but it has scheduled a December 11 event to talk about its architecture. 
AMD also has a chiplet design approach with modular components on the chip that can more efficiently feed and receive data from processor cores. It will also have higher instructions per clock than the original Zen products. Kevin Krewell, analyst at Tirias Research, noted that AMD did not describe the instructions per clock cycle for Zen 2, but he assumes it will be better than the original Zen. He noted the doubled floating point performance figure was impressive. Forrest Norrod, senior vice president at AMD, said that Epyc adoption can lead to 45 percent lower total cost of ownership (TCO) compared to Intel-based systems. He said that comes as a result of lower admin, licensing, hardware, and space costs. Pete Ungaro, CEO of Cray, said onstage that Cray's next Shasta supercomputer will use AMD Epyc processors. The machine will be made for government agencies such as the Lawrence Berkeley National Laboratory and run at 100 petaflops.
Advanced Micro Devices unveiled its Radeon Instinct MI60 graphics processing unit (GPU) for the data center. It promises 1.25 times the performance and twice the transistor density of the previous generation. David Wang, senior vice president of engineering in the Radeon Technologies Group, made the announcement at AMD's press and analyst day in San Francisco. He said it can deliver up to 7.4 teraflops of 64-bit floating point peak performance. The new Vega-based GPUs debuting later this year will be built on a 7-nanometer manufacturing process. AMD also described its Zen 2 architecture for new families of central processing units (CPUs) coming in 2019. The GPUs in the cloud will be useful for cloud gaming, virtual desktops and workstations, machine learning, and high-performance computing, Wang said. The total available market is $12 billion by 2021, Wang said. "This is the world's first 7-nanometer GPU," Wang said. It has 13.2 billion transistors, or twice the density of the previous generation, and 1.25 times the performance. "It is the world's fastest floating point 64 and floating point 32 PCIe GPU," he said. AMD will also have an MI50 version GPU available. One Epyc central processing unit (CPU) can connect without bridges to four Radeon Instinct GPUs via the Infinity Fabric. The chip also has a third generation of AMD's hardware virtualization, so many users can use a single GPU. "This is really our differentiation, and it comes for free," Wang said. In the data center, the GPU can handle machine learning tasks. AMD is releasing ROCm 2.0 open source software for machine learning tasks. Supporters include Baidu, which is using AMD tech. On the DGEMM benchmark, AMD's MI60 GPU is 8.8 times faster than the previous-generation 14-nanometer MI25 GPU. On Resnet-50 image processing, it is 2.8 times faster. Wang claimed that AMD's chip can achieve comparable performance to Nvidia's rival Tesla V100 PCIe chip. 
More significant, noted analyst Kevin Krewell of Tirias Research, is that AMD's die size (size of the chip) is less than half the size of the Nvidia chip. That translates into lower costs and lower prices. Peter McGuinness, CEO of startup Highwai, showed how the chip can be used to produce simulated worlds for machine learning, using massive data sets. He showed how the AMD chip can be used to process data from a self-driving car in real time, simulating what would be necessary for a car moving down a street. The AMD Radeon Instinct MI60 chip is expected to ship to data center customers by the end of 2018, and the AMD Radeon Instinct MI50 accelerator is expected to begin shipping by the end of the first quarter of 2019. Wang also teased an MI Next chip coming in the future with software compatibility with previous chips. Patrick Moorhead, analyst at Moor Insights & Strategy, said, "AMD moved the ball down the field from a hardware perspective with Instinct's 7-nanometer design. I am impressed with its one terabyte per second memory bandwidth, ganging with Epyc and Infinity Fabric, and density. I believe its degree of success will be directly related to its uptake of ROCm 2.0 software into customers' workflow. AMD Radeon has always had good hardware and it takes hardware, software plus go-to-market to fully move the needle."
Advanced Micro Devices said that its next-generation Epyc server chip module, code-named Rome, will have 64 cores based on the Zen 2 architecture. It will also have twice the performance per central processing unit socket as the previous generation, and it will have four times the floating point performance per socket. Rome will consist of eight chips, with eight cores per die, all glued together in a multichip module with accompanying input-output functions, in a single socket. Lisa Su, CEO of AMD, said that the new chip module will debut next year with a 7-nanometer manufacturing process (where the circuits are seven billionths of a meter wide). Rome will have eight 7-nanometer cores per die (or chip), plus a 14-nanometer input/output die. "It is the best datacenter processor in the industry," said Su, speaking onstage at the AMD press and analyst day in San Francisco. "We are absolutely on track to debut Rome in 2019. This is our space. This is where we lead." The input-output chip will be made with a 14-nanometer manufacturing process. Su and AMD senior vice president Forrest Norrod showed a demo of Rome executing a benchmark. It completed the test in 28 seconds, in comparison to 30 seconds for an Intel two-socket solution with the Intel Xeon Scalable 8180M. "That's a pretty impressive result," said Bob O'Donnell, analyst at TECHnalysis Research. The Rome module will have Zen 2 cores, which are based on the second-generation architecture of the Zen platform that AMD introduced in the spring of 2017. Those Zen chips could execute 52 percent more instructions per clock cycle than the previous generation, and Su said that the Rome chips will beat that measure. Zen 2 chips are sampling today at 7-nanometer manufacturing, compared to the shipping 14-nanometer Zen processors that debuted in 2017. Zen 3 is on track to debut on 7-nanometer in 2020. AMD is using TSMC, the chip contract manufacturer, to make its 7-nanometer chips. 
Intel, meanwhile, has delayed its equivalent chips, dubbed 10-nanometer but at the same technology level, until late 2019. Zen 2 can get twice the throughput thanks to better branch prediction, that is, predicting what kind of processing will be necessary for the next computation. It also has better 256-bit load/store floating point processing, double that of the previous generation. Zen 2 will also have stronger built-in security, where data can be fully encrypted as it is transferred to memory. Norrod said that Epyc adoption can lead to 45 percent lower total cost of ownership (TCO) compared to Intel-based systems. He said that comes as a result of lower admin, licensing, hardware, and space costs. Pete Ungaro, CEO of Cray, said onstage that his company's upcoming Shasta supercomputer will use AMD Epyc processors. The machine will be made for government agencies such as the Lawrence Berkeley National Laboratory and will run at 100 petaflops. Patrick Moorhead, analyst at Moor Insights & Strategy, said he thinks the large multichip modules are the future of the entire chip industry. AMD called the chip components within this package chiplets. As for Rome, he said, "AMD is showing yet again its commitment to a very aggressive product improvement roadmap. With Rome, AMD is changing everything. It is changing its system-on-chip architecture to 7-nanometer chiplets with an improved Infinity Fabric, doubling cores per socket, doubling bandwidth per socket, adding PCIe 4.0 and improving core and FPU capabilities." He added, "AMD says this will deliver an impressive 2X performance per socket and 4X on floating point unit (FPU) per socket. With all these improvements, AMD made Rome socket-compatible with Naples, which should accelerate uptake with [computer makers] and ultimately, end customers. As AMD has delivered on its promises the last two years, I have little doubt AMD will deliver on-time, at quality."
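The headline Rome figures reduce to a few lines of arithmetic. Below is a minimal Python sketch using only numbers quoted in this article; the names are illustrative, and the demo comparison says nothing about the benchmark itself beyond its two run times.

```python
CHIPLETS = 8            # 7-nanometer CPU dies in one Rome module
CORES_PER_CHIPLET = 8   # cores on each of those dies

rome_cores = CHIPLETS * CORES_PER_CHIPLET

ROME_SECONDS = 28.0     # Rome demo run time
XEON_2S_SECONDS = 30.0  # two-socket Xeon Scalable 8180M run time


def time_reduction_percent(new_s: float, old_s: float) -> float:
    """Percentage by which the new run time is shorter than the old one."""
    return (old_s - new_s) / old_s * 100


if __name__ == "__main__":
    print(f"Rome cores per module: {rome_cores}")
    saved = time_reduction_percent(ROME_SECONDS, XEON_2S_SECONDS)
    print(f"Demo run time reduction: {saved:.1f}%")
```

The core count matches the 64 cores AMD quoted, and the demo gap works out to a run time a little under 7 percent shorter.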