PyTorch on AMD GPUs with ROCm
The most surprising thing about running PyTorch on AMD GPUs is that it isn’t just a matter of installing a different build of PyTorch. It’s a whole ecosystem shift, often requiring specific Linux distributions and kernel versions just to get started.
Let’s see it in action. Imagine you’ve got a new AMD Instinct MI250X. You’ve installed Ubuntu 22.04, and you’re ready to go. First, you need the ROCm drivers and libraries. These packages aren’t in Ubuntu’s default archive, so you first add AMD’s ROCm apt repository (for example, via AMD’s amdgpu-install package). After that, you’d typically install them with apt:
sudo apt update
sudo apt install rocm-dev rocm-smi
rocm-smi is your dashboard for checking GPU status, memory usage, and temperature. It’s the nvidia-smi equivalent for AMD.
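Now for PyTorch itself. You can’t just run pip install torch, because on Linux the default wheel on PyPI is built for CUDA. You need a ROCm-enabled build. The installation selector on the official PyTorch website generates the right command for your ROCm version. For ROCm 5.6, it might look like this: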
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6
Notice the --index-url. This points to a specific repository hosting the ROCm-compatible wheels, not the standard CUDA ones.
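One way to confirm which variant pip actually installed is to look at the local version label on torch.__version__. PyTorch’s own indexes typically tag wheels with suffixes such as +rocm5.6, +cu121, or +cpu, while the default PyPI wheels usually carry no suffix. The sketch below assumes that labeling convention. wheel_variant is a hypothetical helper, not part of PyTorch.

```python
import re

# Hypothetical helper. Assumes PyTorch's local-version convention, e.g.
# "2.1.0+rocm5.6", "2.1.0+cu121", "2.1.0+cpu".
_LOCAL_LABEL = re.compile(r"\+(rocm|cu|cpu)([0-9.]*)$")


def wheel_variant(version: str) -> str:
    """Classify a torch.__version__ string by its local version label."""
    match = _LOCAL_LABEL.search(version)
    if match is None:
        # No recognized label, e.g. a plain "2.1.0" wheel from PyPI.
        return "unlabeled (check torch.version.hip / torch.version.cuda)"
    kind, tag = match.groups()
    if kind == "rocm":
        return f"ROCm {tag}"
    if kind == "cu":
        return f"CUDA (cu{tag})"
    return "CPU-only"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print(torch.__version__, "->", wheel_variant(torch.__version__))
```

A label of ROCm 5.6 here means the --index-url took effect. A CUDA label means pip resolved the wheel from somewhere else.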
Let’s verify it’s working. In a Python interpreter:
import torch
print(torch.cuda.is_available()) # This will print True if ROCm is detected and working
print(torch.cuda.device_count())
print(torch.cuda.get_device_name(0))
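If you see True, a device count of 1 or more, and your GPU’s name (e.g., AMD Instinct MI250X), you’re golden. Note that an MI250X contains two graphics compute dies (GCDs), and each is exposed as a separate device, so a single card reports a device count of 2.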
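The three checks above can be folded into one diagnostic that also reports which backend the wheel was compiled for. torch.version.hip holds a HIP version string on ROCm builds and is None otherwise, and torch.version.cuda behaves the same way for CUDA builds. In this sketch the function takes the torch module as a parameter so it can be exercised without a GPU, and describe_backend is a hypothetical name. is_available() can, in some cases, report True even though kernels then fail (for example, on a GPU architecture the wheel wasn’t compiled for), so the script also runs a tiny matrix multiply. Note that the device string stays "cuda" on ROCm.

```python
def describe_backend(torch_mod) -> dict:
    """Summarize which GPU backend a torch build targets and what it can see.

    torch_mod is the imported torch module; passing it in keeps this testable.
    """
    hip = getattr(torch_mod.version, "hip", None)    # HIP version string on ROCm builds
    cuda = getattr(torch_mod.version, "cuda", None)  # CUDA version string on CUDA builds
    if hip:
        build = f"ROCm/HIP {hip}"
    elif cuda:
        build = f"CUDA {cuda}"
    else:
        build = "CPU-only"

    available = torch_mod.cuda.is_available()
    count = torch_mod.cuda.device_count() if available else 0
    names = [torch_mod.cuda.get_device_name(i) for i in range(count)]
    return {"build": build, "gpu_available": available, "devices": names}


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        info = describe_backend(torch)
        print(info)
        if info["gpu_available"]:
            # ROCm builds still use the "cuda" device string.
            x = torch.ones(2, 2, device="cuda")
            # Each entry of x @ x is 2.0, so the sum should be 8.0.
            print("smoke test sum:", (x @ x).sum().item())
```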
What problem does this solve? It lets deep learning practitioners use AMD GPUs for training and inference, offering an alternative to NVIDIA’s CUDA ecosystem, which has historically dominated the AI hardware space.
Internally, ROCm (Radeon Open Compute platform) is AMD’s open software platform for GPU computing. Its stack includes kernel drivers, compilers (hipcc, a compiler driver that builds HIP code for AMD GPUs or, through nvcc, for NVIDIA GPUs), and libraries such as rocBLAS for BLAS linear algebra, rocFFT for Fast Fourier Transforms, and MIOpen for deep neural network primitives. When PyTorch is built with ROCm support, it calls into these libraries to perform tensor operations on the GPU. The HIP (Heterogeneous-Compute Interface for Portability) layer is key here. It lets developers write code that compiles for both AMD GPUs (via the HIP runtime) and NVIDIA GPUs (via CUDA), though for PyTorch the focus is the ROCm path on AMD hardware.
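To make the portability idea concrete, here is a toy, regex-based sketch of the kind of renaming that HIP’s source-translation tools (hipify-perl, hipify-clang, and PyTorch’s own hipify script) perform. It only handles the common pattern where CUDA runtime names like cudaMalloc have HIP counterparts like hipMalloc, plus one header. The real tools cover far more, including library calls such as cuBLAS. toy_hipify and HEADER_MAP are illustrative names, not interfaces of any real tool.

```python
import re

# Illustrative subset only; real hipify tools ship much larger mapping tables.
HEADER_MAP = {"cuda_runtime.h": "hip/hip_runtime.h"}

# Matches CUDA runtime identifiers such as cudaMalloc, cudaMemcpyHostToDevice,
# cudaError_t: "cuda" at a word boundary followed by an uppercase letter.
_RUNTIME_IDENT = re.compile(r"\bcuda([A-Z]\w*)")


def toy_hipify(src: str) -> str:
    """Rewrite a few CUDA runtime names and headers to their HIP equivalents."""
    for cuda_header, hip_header in HEADER_MAP.items():
        src = src.replace(f"<{cuda_header}>", f"<{hip_header}>")
    return _RUNTIME_IDENT.sub(r"hip\1", src)


if __name__ == "__main__":
    cuda_src = (
        "#include <cuda_runtime.h>\n"
        "cudaMalloc(&d_buf, n);\n"
        "cudaMemcpy(d_buf, h_buf, n, cudaMemcpyHostToDevice);\n"
        "cudaFree(d_buf);\n"
    )
    print(toy_hipify(cuda_src))
```

Because the HIP runtime API mirrors the CUDA runtime API this closely, much CUDA code can be ported by mechanical renaming, which is what makes a single PyTorch codebase serve both vendors.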
The main levers you control are the ROCm version and the corresponding PyTorch build, and compatibility is paramount. A PyTorch wheel built for ROCm 4.x should not be expected to work on a ROCm 5.x system, so match the wheel to your installed ROCm release. The ROCm installation itself is also sensitive to the Linux distribution and kernel version. For example, Ubuntu 20.04 with the 5.15 HWE kernel might be supported, while Ubuntu 20.04 on its original 5.4 kernel might not. The compatibility matrix in the ROCm documentation is essential reading.
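The version-matching rule can be expressed as a small check. This sketch compares the ROCm version a PyTorch wheel was built against (torch.version.hip, which looks something like "5.6.31061-8c743ae5d") with the ROCm release installed on the system. It assumes the installed version can be read from /opt/rocm/.info/version, where ROCm installations commonly record it. Treat that path as an assumption and verify it on your system. The policy is deliberately conservative and is not an official rule: a major-version mismatch fails, and a minor-version difference is a prompt to consult the compatibility matrix. parse_major_minor and check_rocm_match are hypothetical helpers.

```python
from __future__ import annotations

import platform
import re
from pathlib import Path

# Assumption: the installed ROCm release is recorded in this file.
ROCM_VERSION_FILE = Path("/opt/rocm/.info/version")


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings like '5.6.31061-8c743ae5d' or '5.6.0-67'."""
    match = re.match(r"\s*(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_rocm_match(wheel_rocm: str, installed_rocm: str) -> str:
    """Conservative policy: a major mismatch fails; a minor mismatch needs checking."""
    wheel = parse_major_minor(wheel_rocm)
    system = parse_major_minor(installed_rocm)
    if wheel[0] != system[0]:
        return f"MISMATCH: wheel built for ROCm {wheel[0]}.x, system has ROCm {system[0]}.x"
    if wheel != system:
        return (f"CHECK: wheel targets ROCm {wheel[0]}.{wheel[1]}, system has "
                f"{system[0]}.{system[1]}; consult the compatibility matrix")
    return "OK"


if __name__ == "__main__":
    # The running kernel version, to compare against the ROCm support matrix.
    print("kernel:", platform.release())
    try:
        import torch
        wheel_hip = getattr(torch.version, "hip", None)
    except ImportError:
        wheel_hip = None
    if wheel_hip and ROCM_VERSION_FILE.exists():
        print(check_rocm_match(wheel_hip, ROCM_VERSION_FILE.read_text()))
    else:
        print("Need a ROCm build of PyTorch and a readable", ROCM_VERSION_FILE)
```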
In a ROCm build of PyTorch, the torch.cuda namespace is repurposed. PyTorch’s CUDA sources are converted (“hipified”) to HIP when the ROCm build is compiled, so what would be CUDA calls become HIP API calls, which then go through the ROCm driver to execute computations on the AMD GPU. That is also why device strings like "cuda" and "cuda:0" keep working unchanged. This abstraction lets PyTorch maintain a mostly unified API for GPU acceleration regardless of the hardware vendor, provided the correct backend is installed and configured. Performance characteristics can still differ significantly from CUDA, sometimes requiring tuning of ROCm-specific parameters or changes to the network architecture to achieve optimal speed.
A common pitfall is assuming that any ROCm installation will work. The installation process itself can be complex: the amdgpu kernel module must be loaded, and your user typically needs to be in the render and video groups to access the GPU. Running rocminfo provides a low-level report of detected hardware and ROCm software components, which is invaluable for debugging. If torch.cuda.is_available() returns False, check rocminfo and rocm-smi first to confirm that the ROCm stack itself recognizes the hardware and is running. Many issues stem from incomplete or misconfigured ROCm installations rather than from PyTorch itself.
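These checks can be scripted. This is a minimal sketch that assumes a Linux system where ROCm exposes the /dev/kfd and /dev/dri device nodes and grants GPU access through the render and video groups, which matches AMD’s usual setup instructions but may differ on your distribution. It also pulls the gfx ISA names (gfx90a for the MI250X, for example) out of rocminfo output. The parsing assumes rocminfo prints agent lines of the form "Name: gfx90a", so adjust the pattern if your version formats them differently. gfx_targets and rocm_sanity_report are hypothetical helpers.

```python
from __future__ import annotations

import grp
import os
import re
import shutil
import subprocess

# Assumption: rocminfo lists each GPU agent with a line like "  Name:   gfx90a".
_GFX_NAME = re.compile(r"^\s*Name:\s+(gfx[0-9a-z]+)\s*$", re.MULTILINE)


def gfx_targets(rocminfo_output: str) -> list[str]:
    """Return the unique gfx ISA names found in rocminfo output, in order."""
    targets: list[str] = []
    for name in _GFX_NAME.findall(rocminfo_output):
        if name not in targets:
            targets.append(name)
    return targets


def _group_names() -> set[str]:
    """Names of the current user's primary and supplementary groups."""
    names = set()
    for gid in set(os.getgroups()) | {os.getgid()}:
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:  # gid with no entry in the group database
            pass
    return names


def rocm_sanity_report() -> dict:
    """Collect basic facts that often explain is_available() returning False."""
    groups = _group_names()
    report = {
        "/dev/kfd present": os.path.exists("/dev/kfd"),
        "/dev/dri present": os.path.isdir("/dev/dri"),
        "in render group": "render" in groups,
        "in video group": "video" in groups,
        "rocminfo on PATH": shutil.which("rocminfo") is not None,
        "rocm-smi on PATH": shutil.which("rocm-smi") is not None,
    }
    if report["rocminfo on PATH"]:
        result = subprocess.run(["rocminfo"], capture_output=True, text=True, timeout=60)
        report["gfx targets"] = gfx_targets(result.stdout)
    return report


if __name__ == "__main__":
    for key, value in rocm_sanity_report().items():
        print(f"{key}: {value}")
```

Any False in the first four entries points at the installation or permissions rather than at PyTorch. An empty gfx target list means rocminfo ran but saw no GPU agents.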
The next hurdle you’ll likely encounter is optimizing performance, as PyTorch’s default kernels might not be as finely tuned for AMD hardware as they are for NVIDIA, requiring a deeper dive into ROCm profiling tools and potentially custom kernel development.