  • Ampere GPU
  • 10,752 NVIDIA® CUDA® Cores
  • 336 NVIDIA® Tensor Cores
  • 84 NVIDIA® RT Cores
  • 48GB GDDR6 Memory with ECC
  • Up to 768GB/s Memory Bandwidth
  • Max. Power Consumption: 300W
  • Graphics Bus: PCI-E 4.0 x16
  • Thermal Solution: Passive
  • Supports Quadro vDWS
  • NVLink: 2-way low profile (2-slot)

Modern data centers are evolving rapidly. Advanced technologies such as real-time ray tracing, AI, compute, simulation, and VR are common across industries. The shift to remote work has accelerated faster than anyone could have anticipated, with workloads that span the entire enterprise.

NVIDIA® A40 delivers the data center solution that designers, engineers, artists, and scientists need to meet today’s challenges. Built on the NVIDIA Ampere architecture, the A40 combines latest-generation RT Cores, Tensor Cores, and CUDA® Cores with 48GB of graphics memory for unprecedented graphics, rendering, compute, and AI performance. From powerful virtual workstations accessible from anywhere to dedicated render nodes, the A40 is built to tackle the most demanding visual computing workloads from the data center.

Performance Features

NVIDIA Ampere Architecture

NVIDIA A40 is the world's most powerful data center GPU for visual computing, offering high-performance real-time ray tracing, AI-accelerated compute, and professional graphics rendering. Building upon the major SM enhancements of the Turing GPU, the NVIDIA Ampere architecture improves ray tracing operations, tensor matrix operations, and concurrent execution of FP32 and INT32 operations.

CUDA Cores

The NVIDIA Ampere architecture’s CUDA cores bring up to 2.5X the single-precision floating point (FP32) throughput of the previous generation, providing significant performance improvements for graphics workflows such as 3D model development and for compute workloads such as desktop simulation for computer-aided engineering (CAE).
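To put FP32 throughput in concrete terms, theoretical peak is conventionally estimated as cores × 2 FLOPs per clock (one fused multiply-add) × clock rate. The sketch below applies that rule of thumb to the 10,752 CUDA cores listed above. The 1.74 GHz boost clock is an assumption for illustration, not a figure from this document, and sustained application throughput will be lower than this theoretical peak.

```python
def peak_fp32_tflops(cuda_cores: int, boost_clock_ghz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPS.

    Assumes each CUDA core retires one fused multiply-add (FMA) per clock
    and counts an FMA as 2 floating-point operations.
    """
    flops_per_core_per_clock = 2  # one FMA = one multiply + one add
    # cores * FLOPs/clock * GHz gives GFLOPS; divide by 1000 for TFLOPS.
    return cuda_cores * flops_per_core_per_clock * boost_clock_ghz / 1_000.0


# 10,752 CUDA cores comes from the A40 spec list.
A40_CUDA_CORES = 10_752
# ASSUMPTION: boost clock chosen for illustration; not stated in this document.
ASSUMED_BOOST_CLOCK_GHZ = 1.74

if __name__ == "__main__":
    tflops = peak_fp32_tflops(A40_CUDA_CORES, ASSUMED_BOOST_CLOCK_GHZ)
    print(f"Estimated peak FP32: {tflops:.1f} TFLOPS")
```

Changing `ASSUMED_BOOST_CLOCK_GHZ` shows how directly the estimate scales with clock rate.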

2nd Generation RT Cores

Incorporating 2nd generation ray tracing engines, the NVIDIA Ampere GPU architecture provides exceptional ray-traced rendering performance. A single NVIDIA A40 board can render complex professional models with physically accurate shadows, reflections, and refractions, giving users instant insight. Working in concert with applications that use APIs such as NVIDIA OptiX, Microsoft DXR, and Vulkan ray tracing, servers based on NVIDIA A40 power truly interactive design workflows with immediate feedback for unprecedented levels of productivity. NVIDIA A40 is up to 2X faster in ray tracing than the previous generation. Hardware acceleration of Motion BVH (bounding volume hierarchy) also speeds up the rendering of ray-traced motion blur by up to 7X, delivering faster results with greater visual accuracy.
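BVH traversal, which RT Cores accelerate in hardware, repeatedly tests rays against axis-aligned bounding boxes. The Python sketch below shows the standard "slab" ray/box test in software purely to illustrate that operation; it is not how RT Cores are programmed (applications reach them through APIs such as OptiX, DXR, or Vulkan ray tracing), and the function name `ray_aabb_hit` is illustrative.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def ray_aabb_hit(origin: Vec3, direction: Vec3,
                 box_min: Vec3, box_max: Vec3) -> Optional[Tuple[float, float]]:
    """Slab test: return (t_enter, t_exit) if the ray hits the box, else None.

    Only distances t >= 0 count, because the ray starts at `origin`.
    """
    t_enter, t_exit = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return None
            continue
        # Distances at which the ray crosses the two planes of this slab.
        t0 = (lo - o) / d
        t1 = (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        # Keep the overlap of all slab intervals seen so far.
        t_enter = max(t_enter, t0)
        t_exit = min(t_exit, t1)
        if t_enter > t_exit:
            return None  # intervals do not overlap: the ray misses the box
    return (t_enter, t_exit)
```

In a BVH, a miss on a node's box lets traversal skip every triangle beneath it, which is why this test dominates the work that ray tracing hardware offloads.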

3rd Generation Tensor Cores

Purpose-built for the deep learning matrix arithmetic at the heart of neural network training and inference, the NVIDIA A40's enhanced Tensor Cores accelerate additional data types (TF32 and BF16) and add a new Fine-Grained Structured Sparsity feature that delivers up to 2X throughput for tensor matrix operations compared to the previous generation.
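Fine-grained structured sparsity on Ampere uses a 2:4 pattern: in every group of four consecutive weights, at most two are nonzero, which lets the Sparse Tensor Cores skip the zero entries. The Python sketch below shows only the pruning pattern. Choosing which two values to keep by magnitude is an assumption (a common heuristic), `prune_2_of_4` is a hypothetical helper, and in practice networks are usually fine-tuned after pruning to recover accuracy.

```python
from typing import List


def prune_2_of_4(weights: List[float]) -> List[float]:
    """Apply a 2:4 structured-sparsity pattern.

    In every group of 4 consecutive weights, keep the 2 with the largest
    magnitude and set the other 2 to zero.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned: List[float] = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned
```

For example, `prune_2_of_4([0.1, -0.9, 0.5, 0.2])` keeps `-0.9` and `0.5` and zeroes the rest.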

Higher Speed GDDR6 Memory

The NVIDIA A40 is built with 48GB of GDDR6 memory, delivering up to 10% greater throughput for ray tracing, rendering, and AI workloads than the previous generation. It provides the industry’s largest graphics memory footprint to address the largest datasets and models in latency-sensitive professional applications.
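A quick way to reason about the 48GB footprint is to ask how many model weights fit in it. The sketch below assumes 48 GB means 48 GiB (48 × 2^30 bytes) and counts weights only, ignoring activations, optimizer state, and framework overhead, so the result is an upper bound rather than usable capacity. The helper name `max_params_billions` is illustrative.

```python
# Storage size of one weight in common formats (TF32 is a compute mode;
# its data is stored as FP32, so it is not listed separately).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def max_params_billions(memory_gib: float, dtype: str) -> float:
    """Upper bound on parameters (in billions) whose weights alone fit in memory."""
    total_bytes = memory_gib * 2**30  # ASSUMPTION: "GB" interpreted as GiB
    return total_bytes / BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: ~{max_params_billions(48, dtype):.1f}B parameters (weights only)")
```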

Error Correcting Code (ECC) on Graphics Memory

Meet strict data integrity requirements for mission critical applications with uncompromised computing accuracy and reliability.
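This document does not describe the specific ECC scheme used on the A40's GDDR6 memory. As a generic illustration of how extra check bits let hardware locate and repair a flipped bit, the sketch below implements the textbook Hamming(7,4) code. Production memory ECC typically uses wider SECDED codes, so treat this as a conceptual model only; the function names are illustrative.

```python
from typing import List


def hamming74_encode(d: List[int]) -> List[int]:
    """Encode 4 data bits into a 7-bit codeword laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c: List[int]) -> List[int]:
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)
    # Recompute each parity group (list indices are 1-based positions minus 1).
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # positions 4, 5, 6, 7
    # The syndrome spells out the 1-based position of the bad bit (0 = none).
    error_pos = s1 + 2 * s2 + 4 * s3
    if error_pos:
        c[error_pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]
```

Flipping any single bit of a codeword still decodes to the original data, which is the property ECC memory relies on.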

5th Generation NVDEC Engine [i]

NVDEC provides hardware-accelerated, real-time decoding, making it well suited for transcoding and video playback applications. The following video codecs are supported for hardware-accelerated decoding: MPEG-2, VC-1, H.264 (AVCHD), H.265 (HEVC), VP8, VP9, and AV1. By pairing NVDEC with Ampere Tensor Cores, the A40 can quickly apply AI inference to real-time video.

7th Generation NVENC Engine

NVENC can take on the most demanding 4K or 8K video encoding tasks to free up the graphics engine and the CPU for other operations. NVENC also enables virtual workstations to stream up to 8K content for high fidelity design and rendering workloads. In addition, the NVIDIA A40 provides better encoding quality than software-based x264 encoders.

Compute Preemption

Instruction-level preemption provides finer-grained control over compute and graphics tasks, preventing longer-running applications from either monopolizing system resources or timing out.

Multi-GPU Technology

3rd Generation NVLink [ii]

Connect two NVIDIA A40 cards with NVLink to double the effective memory footprint and scale application performance by enabling GPU-to-GPU data transfers at rates up to 112.5 GB/s (total bandwidth).
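As a rough guide to what the NVLink figure means, the sketch below estimates an idealized one-way transfer time between two bridged boards. It assumes the 112.5 GB/s total splits evenly into 56.25 GB/s per direction and ignores protocol and software overhead, so real transfers will take longer; `nvlink_transfer_seconds` is a hypothetical helper, not an NVIDIA API.

```python
def nvlink_transfer_seconds(gigabytes: float,
                            bidirectional_gb_s: float = 112.5) -> float:
    """Idealized one-way transfer time over a 2-way NVLink bridge.

    ASSUMPTION: the bidirectional figure splits evenly between directions.
    """
    per_direction_gb_s = bidirectional_gb_s / 2
    return gigabytes / per_direction_gb_s


# Two A40 boards bridged with NVLink expose 2 x 48 GB of memory.
POOLED_MEMORY_GB = 2 * 48

if __name__ == "__main__":
    print(f"Pooled memory: {POOLED_MEMORY_GB} GB")
    print(f"Copy 48 GB one way: ~{nvlink_transfer_seconds(48):.2f} s (idealized)")
```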

Display Features

DisplayPort 1.4

Supports up to three 5K monitors @ 60Hz, or dual 8K displays @ 60Hz per card. The NVIDIA A40 supports HDR color at 4K @ 60Hz for 10/12-bit HEVC decode and at up to 4K @ 60Hz for 10-bit HEVC encode. Each DisplayPort connector can drive ultra-high resolutions of 4096x2160 @ 120Hz with 30-bit color. By default, A40 is configured for virtualization with its physical display connectors disabled; the display outputs can be enabled via management software tools.

NVIDIA® Quadro® Mosaic Technology

Transparently scale the desktop and applications across up to 12 displays from 4 GPUs while delivering full performance and image quality.

NVIDIA® Quadro Sync II [iii]

Synchronize the display and image output of up to 32 displays [iii] from 8 GPUs (connected through two Sync II boards) in a single system, reducing the number of machines needed to create an advanced video visualization environment.

Frame Lock Connector Latch

Each frame lock connector is designed with a self-locking retention mechanism to secure its connection with the frame lock cable to provide robust connectivity and maximum productivity.

Software Support

Virtual GPU Software for Virtualization

Support for NVIDIA virtual GPU (vGPU) software enables A40 to be virtualized to accelerate high-end design, AI, and compute workloads. The NVIDIA Quadro Virtual Data Center Workstation (Quadro vDWS) license provides access to the world’s most powerful virtual workstations to enable flexible, work-from-anywhere solutions, while the NVIDIA Virtual Compute Server (vCS) license accelerates virtualized compute workloads such as high-performance computing, AI, and data science.
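The vGPU profile sizes in the specification list all divide the 48 GB frame buffer evenly, so a rough upper bound on the number of virtual GPUs per board is the frame buffer size divided by the profile size. The sketch assumes every vGPU on a board uses the same profile; the limits published for the vGPU software itself may be lower, especially for the smallest profiles. The helper name is illustrative.

```python
from typing import List

A40_FRAME_BUFFER_GB = 48
# Profile sizes as listed in the A40 specifications.
VGPU_PROFILES_GB: List[int] = [1, 2, 3, 4, 6, 8, 12, 16, 24, 48]


def max_vgpus_per_board(profile_gb: int,
                        frame_buffer_gb: int = A40_FRAME_BUFFER_GB) -> int:
    """Upper bound on vGPUs per board if every vGPU uses the same profile."""
    if profile_gb not in VGPU_PROFILES_GB:
        raise ValueError(f"unsupported profile: {profile_gb} GB")
    return frame_buffer_gb // profile_gb


if __name__ == "__main__":
    for p in VGPU_PROFILES_GB:
        print(f"{p:>2} GB profile -> at most {max_vgpus_per_board(p)} vGPUs")
```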

Software Optimized for AI

Deep learning frameworks such as Caffe2, MXNet, CNTK, TensorFlow, and others deliver dramatically faster training times and higher multi-node training performance. GPU-accelerated libraries such as cuDNN, cuBLAS, and TensorRT deliver higher performance for both deep learning inference and High-Performance Computing (HPC) applications.

NVIDIA® CUDA® Parallel Computing Platform

Natively execute standard programming languages like C/C++ and Fortran, and APIs such as OpenCL, OpenACC, and DirectCompute, to accelerate techniques such as ray tracing, video and image processing, and computational fluid dynamics.
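The CUDA model behind this platform maps each element of a data-parallel loop to a thread identified by its block index, block size, and thread index. The sketch below emulates that mapping for SAXPY (y = a*x + y) sequentially in plain Python so it runs anywhere; it does not use the GPU, and the names `saxpy_kernel` and `launch` are illustrative. In CUDA C++ the kernel body would be a `__global__` function launched with the `<<<grid, block>>>` syntax.

```python
from typing import List


def saxpy_kernel(block_idx: int, block_dim: int, thread_idx: int,
                 n: int, a: float, x: List[float], y: List[float]) -> None:
    """Body of one CUDA-style thread, written in Python for illustration."""
    i = block_idx * block_dim + thread_idx  # global index, as in CUDA C++
    if i < n:  # guard: the last block may extend past the end of the data
        y[i] = a * x[i] + y[i]


def launch(n: int, a: float, x: List[float], y: List[float],
           block_dim: int = 256) -> None:
    """Run every (block, thread) pair of the grid sequentially on the CPU."""
    grid_dim = (n + block_dim - 1) // block_dim  # ceil(n / block_dim) blocks
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            saxpy_kernel(block_idx, block_dim, thread_idx, n, a, x, y)
```

On a GPU the nested loops disappear: the hardware runs the threads in parallel, and the bounds check in the kernel body is what keeps the overhanging threads of the last block from writing out of range.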

Unified Memory

A single, seamless 49-bit virtual address space allows for the transparent migration of data between the full allocation of CPU and GPU memory.
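The 49-bit figure translates directly into an address-space size: 2^49 bytes = 512 TiB. The Python sketch below shows that arithmetic, plus a helper that checks whether a combined host-plus-GPU allocation fits; the 2 TiB host memory in the usage line is a hypothetical system, and the helper names are illustrative.

```python
VIRTUAL_ADDRESS_BITS = 49  # from the Unified Memory description


def address_space_tib(bits: int) -> int:
    """Size of a flat virtual address space in TiB (2**40 bytes)."""
    return 2**bits // 2**40


def fits_in_address_space(total_bytes: int,
                          bits: int = VIRTUAL_ADDRESS_BITS) -> bool:
    """True if `total_bytes` can be mapped into a `bits`-bit address space."""
    return total_bytes <= 2**bits


if __name__ == "__main__":
    print(f"{VIRTUAL_ADDRESS_BITS}-bit space: {address_space_tib(VIRTUAL_ADDRESS_BITS)} TiB")
    # Hypothetical host: 2 TiB of system memory plus two 48 GiB GPUs.
    print(fits_in_address_space(2 * 2**40 + 2 * 48 * 2**30))
```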

NVIDIA® GPUDirect® Support

Supports a family of technologies to speed communication between the GPU and devices like NICs or Video I/O boards by reducing CPU overhead and minimizing copies.

GPU Architecture: Ampere
CUDA Parallel Processing Cores: 10,752
NVIDIA Tensor Cores: 336
NVIDIA RT Cores: 84
Frame Buffer Memory: 48 GB GDDR6 with ECC
Memory Interface: 384-bit
Memory Bandwidth: 768 GB/s
Max Power Consumption: 300 W
Graphics Bus: PCI Express 4.0 x16
Display Connectors: DP 1.4 (3) **
Form Factor: 4.4” H x 10.5” L, Dual Slot
Product Weight: 1.179 kg
Thermal Solution: Passive
vGPU Software Support: NVIDIA® GRID®, NVIDIA Quadro® Virtual Data Center Workstation, NVIDIA Virtual Compute Server *
vGPU Profiles Supported: 1 GB, 2 GB, 3 GB, 4 GB, 6 GB, 8 GB, 12 GB, 16 GB, 24 GB, 48 GB
Power Connector: 1x 8-pin
Frame Lock: Compatible (with Quadro Sync II)
NVLink Interconnect: 112.5 GB/s (bidirectional)
NVLink: 2-way low profile (2-slot)

* A40 does not support MIG.

** Display ports are not active by default for A40. NVIDIA vGPU software is supported only when the display ports are not active.
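The 384-bit memory interface and 768 GB/s bandwidth in the specifications are related by peak bandwidth = (bus width / 8) × per-pin data rate. Inverting that formula gives an implied 16 Gbps per pin; that rate is derived here and is not stated in the document. The helper names are illustrative.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps_per_pin: float) -> float:
    """Peak DRAM bandwidth in GB/s = (bus width in bytes) x (per-pin data rate in Gbps)."""
    return bus_width_bits / 8 * data_rate_gbps_per_pin


def implied_data_rate_gbps(bus_width_bits: int, bandwidth_gb_s: float) -> float:
    """Invert the formula: the per-pin rate implied by a published bandwidth."""
    return bandwidth_gb_s / (bus_width_bits / 8)


if __name__ == "__main__":
    rate = implied_data_rate_gbps(384, 768)  # values from the A40 spec list
    print(f"Implied data rate: {rate} Gbps per pin")
```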

