
AMD flexes HPC and AI superiority at Intel and NVIDIA with Genoa-X, Bergamo, and the Instinct MI300X GPU

AMD HPC and AI updates

Note: This feature was first published on 20 June 2023.

AMD CEO Dr. Lisa Su holding up the new 4th-gen EPYC "Bergamo" cloud native server CPU.

During their recent Data Center and AI Technology Premiere event in San Francisco, AMD hammered home the message that they are the only microprocessor company with a CPU, GPU, and APU for every type of workload. Rivals Intel and NVIDIA, as we know, are each strong in one category, but not the other, and definitely not the third.

While no new generational microarchitecture announcements were made at last week’s event, AMD expanded their existing 4th-gen EPYC server processor and Instinct AI accelerator lineups with new dedicated products for technical computing, cloud native computing, and generative AI inference acceleration, aiming to match and beat every competing processor out there.


Let’s start with Genoa, AMD’s 4th-gen EPYC 9004-series CPUs. Built on the Zen 4 microarchitecture, the EPYC 9004-series is the server counterpart to AMD’s consumer Ryzen 7000-series. Now, to be fair, Genoa was already launched back in November 2022, though that didn’t stop AMD from dishing on Intel with some benchmarks for performance and energy efficiency.

AMD's Dr. Lisa Su presenting 4th-gen AMD EPYC "Genoa" dominance over Intel Xeon Platinum.

One of AMD’s major EPYC customers, Amazon Web Services (AWS), already has their next-generation Elastic Compute (EC2) M7a instances available for preview. Based on Genoa, EC2 M7a instances are said to offer 50% better performance and 50% more memory bandwidth (thanks to DDR5) compared with last-generation M6a instances. They would also open up support for new processor capabilities like AVX-512, VNNI, and BFloat16.

According to AWS EC2 VP, Dave Brown, “When we combine the performance of 4th-gen AMD EPYC processors with the AWS Nitro System, we’re advancing cloud technology for our customers by allowing them to do more with better performance on even more Amazon EC2 instances.”

AMD CEO Dr. Lisa Su and AWS EC2 VP, Dave Brown announcing the preview availability of Amazon EC2 M7a instances based on AMD 4th-gen EPYC "Genoa" CPUs.


Beginning with 3rd-gen EPYC processors, AMD started introducing a 3D V-Cache variant of the general purpose EPYC CPU specifically designed for technical workloads that required more L3 cache (such as computational fluid dynamics, electronic design automation, and structural analysis). It was called Milan-X. And now, the 4th-gen EPYC (Genoa) will also receive a 3D V-Cache version, unsurprisingly called Genoa-X. These processors will have the EPYC 9004X-series designation.

So, what’s new? We know that Genoa’s Zen 4 CCD features eight cores per core complex and 32MB of L3 cache. With Genoa-X, AMD adds a further 64MB of L3 using a 7nm X3D memory chiplet that’s hybrid bonded to the CCD, bringing the total to 96MB of L3 per CCD. With a maximum of 12 CCDs per CPU, Genoa-X can support a massive 1,152MB of L3 cache (up from 768MB on Milan-X).
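The cache arithmetic above is easy to check with a quick sketch. The figures are the ones quoted in this article; the function and variable names are made up for illustration and do not come from any AMD tool.

```python
# Per-socket L3 totals built from the per-CCD figures quoted in the article.
# All names here are illustrative, not part of any AMD software.

def total_l3_mb(ccds: int, base_l3_mb: int, stacked_l3_mb: int = 0) -> int:
    """Each CCD contributes its base L3 plus any stacked 3D V-Cache."""
    return ccds * (base_l3_mb + stacked_l3_mb)

MILAN_X_L3_MB = 768  # figure quoted in the article

genoa_l3 = total_l3_mb(ccds=12, base_l3_mb=32)                      # standard Genoa
genoa_x_l3 = total_l3_mb(ccds=12, base_l3_mb=32, stacked_l3_mb=64)  # Genoa-X

print(f"Genoa:   {genoa_l3} MB of L3 per socket")
print(f"Genoa-X: {genoa_x_l3} MB of L3 per socket")
print(f"Genoa-X vs Milan-X: {genoa_x_l3 / MILAN_X_L3_MB:.2f}x the L3")
```

The stacked cache triples the L3 on each CCD, so the per-socket total triples as well when the CCD count stays at 12.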

While Genoa-X CPUs can go up to 96 cores, AMD provided a direct comparison between a 32-core 9384X and a 32-core Intel Xeon Platinum 8462Y+.

Genoa-X is also already being implemented by AMD’s partners. For one, Microsoft has announced general availability of Azure HBv4 and HX instances which are powered by EPYC 9004X-series processors with 3D V-Cache.


AMD Bergamo chip close up.

And here's a side by side view of Genoa and Bergamo.

Another piece of the 4th-gen EPYC CPU family is codenamed Bergamo, and this one is designed for Cloud Native computing workloads, digital services and DevOps applications where computing density and efficiency are more important.

Bergamo is also the one CPU to actually have a minor architecture update, and the reason for this is so that AMD can cram more cores into each SoC. Genoa’s Zen 4 architecture allows for up to 12 CCDs and 96-cores (with an 8-core CCX) per socket.

Bergamo will employ a modified Zen 4c architecture that features a 35% smaller core than Zen 4. This allows AMD to double the cores per CCD (using a 2x 8-core CCX structure). You will notice in the chart below that all functionality remains identical between Zen 4 and Zen 4c, but L3 cache has been halved for each core. Because of this, each CCX only has 16MB of L3 cache. Since one CCD now comprises two CCXs, you technically get the same 32MB of L3 per CCD. However, the total number of CCDs per socket has also been reduced from 12 (Genoa) to 8 (Bergamo). This means that each Bergamo CPU can have up to 128 cores and 256MB of L3 cache. Bergamo CPUs will have the 4th-gen EPYC 97X4-series designation.
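To make the Genoa vs Bergamo topology concrete, here is a minimal sketch that derives core counts and L3 totals from the per-CCX figures given above. The `SocketLayout` class is hypothetical and exists only to show the multiplication.

```python
from dataclasses import dataclass

# Hypothetical helper for illustration; the numbers are those quoted in the article.
@dataclass(frozen=True)
class SocketLayout:
    name: str
    ccds: int            # core complex dies per socket
    ccx_per_ccd: int     # core complexes per die
    cores_per_ccx: int
    l3_per_ccx_mb: int

    @property
    def cores(self) -> int:
        return self.ccds * self.ccx_per_ccd * self.cores_per_ccx

    @property
    def l3_mb(self) -> int:
        return self.ccds * self.ccx_per_ccd * self.l3_per_ccx_mb

    @property
    def l3_per_core_mb(self) -> float:
        return self.l3_mb / self.cores

genoa = SocketLayout("Genoa (Zen 4)", ccds=12, ccx_per_ccd=1, cores_per_ccx=8, l3_per_ccx_mb=32)
bergamo = SocketLayout("Bergamo (Zen 4c)", ccds=8, ccx_per_ccd=2, cores_per_ccx=8, l3_per_ccx_mb=16)

for cpu in (genoa, bergamo):
    print(f"{cpu.name}: {cpu.cores} cores, {cpu.l3_mb} MB L3, "
          f"{cpu.l3_per_core_mb:.0f} MB L3 per core")
```

The trade-off shows up in the last column: Bergamo gains a third more cores per socket while each core gets half the L3 it would have on Genoa.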

There's basically no real difference between the Zen 4 and Zen 4c architectures besides the reduction in L3 per core, which shrinks the core's footprint.

For the geeks, a technical look at the "Bergamo" configuration.

AMD isn’t just claiming higher performance with Bergamo, but also massive space and power savings thanks to its core density, minimising opex and TCO for its clients. According to AMD, a single rack of 15 EPYC 9754 (Bergamo) servers provides the same performance as 43 Ampere Altra Max or 38 Intel Xeon Platinum servers.
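Taking AMD's server counts at face value, the consolidation ratios work out as in the sketch below. The server numbers are AMD's own claims as quoted here; the function name is invented for illustration, and nothing in it models power or cost.

```python
# Server counts are AMD's claims as quoted in the article, not independent measurements.
BERGAMO_SERVERS = 15
COMPETITOR_SERVERS = {
    "Ampere Altra Max": 43,
    "Intel Xeon Platinum": 38,
}

def consolidation(competitor_servers: int, bergamo_servers: int = BERGAMO_SERVERS):
    """Return (servers replaced per Bergamo server, percent fewer servers needed)."""
    ratio = competitor_servers / bergamo_servers
    fewer_pct = 100 * (competitor_servers - bergamo_servers) / competitor_servers
    return ratio, fewer_pct

for name, count in COMPETITOR_SERVERS.items():
    ratio, fewer_pct = consolidation(count)
    print(f"{name}: {ratio:.2f} servers per Bergamo server, "
          f"{fewer_pct:.1f}% fewer servers for the same claimed performance")
```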

Instinct MI300X

Last but not least, we arrive at the AI battle. Back in January during CES 2023, AMD announced the world’s first data centre APU, now officially known as the Instinct MI300A. Technically, this would also be counted as part of the 4th-gen EPYC lineup since it is a chip that features three Zen 4 CPU parts along with CDNA 3 GPU parts packaged together with AMD’s 3D chiplet packaging process. MI300A would feature 128GB of shared HBM3 memory across both CPU and GPU.

NVIDIA’s answer to the MI300A, announced during Computex 2023, is the GH200 Grace-Hopper “Superchip”, which is a massive board that fuses NVIDIA’s Grace CPU and Hopper GPU together. The difference is that these are two separate components, requiring separate memory pools (LPDDR5X for the CPU and HBM3 for the GPU), and interconnected with NVLink. AMD’s 3D chiplet approach, which fuses CPU and GPU into a single chip sharing ultra-fast HBM3 memory, is the more elegant design.

But in a role reversal, AMD has now gone and retrofitted the MI300A to become a full-GPU AI inference accelerator instead of a multi-purpose APU. AMD swapped out the three Zen 4 chiplets for two CDNA 3 chiplets. The result is the new(er) MI300X, and this single chip will now feature a whopping 153 billion transistors (up from the already impressive 146 billion in the MI300A). Total HBM3 memory has also been increased to 192GB, with 5.2TB/s of memory bandwidth. By comparison, NVIDIA’s standalone H100 (Hopper) GPU features 80 billion transistors, and 80GB of HBM3 memory with 3.35TB/s of memory bandwidth.


AMD’s Dr. Lisa Su showed a demo of a generative AI poem about San Francisco, and claimed that the MI300X is the first chip capable of running a large language AI model such as Falcon-40B with 40 billion parameters entirely within its memory. This would reduce the number of GPUs required for the same tasks, and as a continuing theme across everything AMD showed at the event, would reduce space, energy consumption, and TCO for its customers.
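A rough back-of-the-envelope sketch shows why memory capacity matters for a model of this size. It assumes 16-bit weights (2 bytes per parameter), which is an assumption on our part since the precision used in AMD's demo is not stated here, and it counts only the weights, ignoring the KV cache, activations, and runtime overhead that a real deployment also needs.

```python
# Weights-only estimate. ASSUMPTION: 2 bytes per parameter (16-bit weights);
# the precision used in AMD's demo is not given in the article.
FALCON_40B_PARAMS_BILLIONS = 40
BYTES_PER_PARAM_16BIT = 2

# Capacities quoted in the article, in GB.
GPU_MEMORY_GB = {"Instinct MI300X": 192, "H100": 80}

def weights_gb(params_billions: int, bytes_per_param: int) -> int:
    """1 billion parameters at 1 byte each is 1 GB (decimal), so this is a product."""
    return params_billions * bytes_per_param

model_gb = weights_gb(FALCON_40B_PARAMS_BILLIONS, BYTES_PER_PARAM_16BIT)
headroom_gb = {gpu: capacity - model_gb for gpu, capacity in GPU_MEMORY_GB.items()}

print(f"Falcon-40B weights at 16-bit: {model_gb} GB")
for gpu, spare in headroom_gb.items():
    print(f"{gpu}: {spare} GB left over for everything else")
```

Under these assumptions the weights alone would fill an 80GB card, while the MI300X keeps more than half its memory free for everything else.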

In addition to a single MI300X GPU, AMD also showcased the Instinct Platform, which is designed for Open Compute Project (OCP) industry compatibility. The Instinct Platform is a GPU cluster that boasts 8x MI300X GPUs and 1.5TB of HBM3 memory.
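The 1.5TB figure follows directly from the per-GPU capacity, as this small sketch shows. Note the assumption that 1TB is counted as 1,024GB; with decimal units the same total would read as roughly 1.54TB.

```python
# Figures quoted in the article; the TB conversion convention is an assumption.
GPUS_PER_PLATFORM = 8
HBM3_PER_GPU_GB = 192
GB_PER_TB = 1024  # ASSUMPTION: binary-style conversion, which matches the 1.5TB figure

total_gb = GPUS_PER_PLATFORM * HBM3_PER_GPU_GB
total_tb = total_gb / GB_PER_TB

print(f"Instinct Platform: {total_gb} GB = {total_tb} TB of HBM3")
```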

The AMD Instinct Platform, and next to it on the left, a single Instinct MI300X GPU.
