NVIDIA MIG allows you to slice up a single A100 GPU into up to seven smaller, fully isolated GPU instances, each with its own dedicated compute, memory, and cache.
Imagine you have a beast of an A100 GPU, but your workloads are small – maybe a few concurrent inference requests or a small training job. You’re wasting resources. MIG is like a chef’s knife for your GPU, letting you carve it into perfectly sized portions for each task. This isn’t just a software abstraction; MIG partitions the GPU’s hardware itself.
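Here’s how it looks in practice. Let’s say you have an A100 40GB. You can carve it into various combinations of GPU instances (GIs). For example, you could have any one of these layouts:
- Seven GIs of type 1g.5gb (1 compute slice, 5GB memory each)
- Two GIs of type 3g.20gb (3 compute slices, 20GB memory each)
- One GI of type 7g.40gb (7 compute slices, 40GB memory)
Mixed layouts, such as one 4g.20gb, one 2g.10gb, and one 1g.5gb, are possible too. On an A100 80GB the same slicing applies, but the profile names carry twice the memory (1g.10gb, 3g.40gb, 7g.80gb).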
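As a rough sanity check before asking the driver for a layout, the slice arithmetic can be sketched in a few lines of shell. This is only an illustration: mig_budget_ok is a hypothetical helper, and it assumes an A100 40GB with 7 compute slices and 40GB of memory. The driver applies stricter placement rules, so a layout that passes here can still be rejected.

```shell
# Rough budget check for a proposed MIG layout on an A100 40GB.
# Assumption: 7 compute slices and 40 GB of memory in total. The driver's
# placement rules are stricter, so passing is necessary but not sufficient.
mig_budget_ok() {
  total_g=0
  total_gb=0
  for p in "$@"; do
    g="${p%%g.*}"      # digits before "g.", e.g. 3 in 3g.20gb
    gb="${p#*g.}"      # text after "g.", e.g. 20gb
    gb="${gb%gb}"      # drop the "gb" suffix, leaving 20
    case "$g" in ''|*[!0-9]*) echo "unrecognized profile: $p" >&2; return 2 ;; esac
    case "$gb" in ''|*[!0-9]*) echo "unrecognized profile: $p" >&2; return 2 ;; esac
    total_g=$(( total_g + g ))
    total_gb=$(( total_gb + gb ))
  done
  # Succeed only if both the compute and the memory budget are respected.
  [ "$total_g" -le 7 ] && [ "$total_gb" -le 40 ]
}
```

With this helper, mig_budget_ok 4g.20gb 3g.20gb succeeds, while mig_budget_ok 7g.40gb 1g.5gb fails because it asks for eight compute slices.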
The key is that these GIs are physically separated. Each GI gets its own slice of the SMs (Streaming Multiprocessors), L2 cache, and memory controllers. This means no noisy neighbor problem – one GI’s intense computation won’t hog resources or impact the performance of another.
To get started, you need the nvidia-smi utility, which ships with the NVIDIA driver. Its mig subcommand handles all MIG management.
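First, you need to enable MIG mode on the GPU. This is a per-GPU setting that requires root privileges and a GPU reset to take effect; if the GPU cannot be reset because it is in use, rebooting the machine is the simplest route.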
sudo nvidia-smi -mig 1
Once the change takes effect, the GPU will be in MIG mode. It still shows up as a single PCI device (for example 00000000:01:00.0), but CUDA applications cannot run on it until you create at least one GPU instance with a compute instance inside it.
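To confirm that MIG mode is actually active rather than still pending, you can query the current and pending mode through nvidia-smi. The sketch below interprets that output; mig_mode_status is a hypothetical helper name, and it assumes the query prints a line such as "Enabled, Enabled" in csv,noheader format.

```shell
# Classify the output of:
#   nvidia-smi -i 0 --query-gpu=mig.mode.current,mig.mode.pending --format=csv,noheader
# Assumption: the line has the form "<current>, <pending>".
mig_mode_status() {
  current="${1%%,*}"       # field before the first comma
  pending="${1#*,}"        # field after the first comma
  pending="${pending# }"   # strip the single leading space
  if [ "$current" = "Enabled" ]; then
    echo "active"
  elif [ "$pending" = "Enabled" ]; then
    echo "pending"         # enabled, but waiting for a GPU reset or reboot
  else
    echo "disabled"
  fi
}

# Usage on a machine with a GPU at index 0:
#   mig_mode_status "$(nvidia-smi -i 0 --query-gpu=mig.mode.current,mig.mode.pending --format=csv,noheader)"
```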
Next, you need to create the actual GPU instances. You specify the GPU index and the desired instance profile. For example, to create a 1g.5gb instance on the first GPU (index 0), along with its default compute instance (the -C flag):
sudo nvidia-smi mig -i 0 -cgi 1g.5gb -C
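When you need several instances, it can help to script the creation step with the nvidia-smi mig subcommand. The following is a minimal sketch, not an official tool: mig_create_all and the DRY_RUN variable are hypothetical names, and by default the function only prints the commands so you can review them first.

```shell
# Create one GPU instance (plus default compute instance) per profile.
# DRY_RUN defaults to 1, which only prints the commands; set DRY_RUN=0 to run them.
mig_create_all() {
  gpu="$1"
  shift
  for profile in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "sudo nvidia-smi mig -i $gpu -cgi $profile -C"
    else
      sudo nvidia-smi mig -i "$gpu" -cgi "$profile" -C
    fi
  done
}

# Usage: preview a layout of one 3g.20gb and two 1g.5gb instances on GPU 0.
#   mig_create_all 0 3g.20gb 1g.5gb 1g.5gb
```

Defaulting to a dry run is a deliberate choice here, since creating instances changes what every other user of the GPU sees.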
You can list the available instance profiles and the GPU instances that currently exist with:
sudo nvidia-smi mig -i 0 -lgip
sudo nvidia-smi mig -i 0 -lgi
Once an instance is created, nvidia-smi -L lists it as a MIG device with its own UUID beneath the parent GPU; it is not a separate PCI device. You assign an application to it by setting CUDA_VISIBLE_DEVICES to that MIG UUID (rather than a plain device index), or by passing the UUID to your container runtime.
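MIG devices can be addressed by the UUIDs that nvidia-smi -L prints, so a small filter is often enough to pin a process to one instance. This is a sketch under assumptions: mig_uuids is a hypothetical helper, infer.py is a placeholder for your own program, and the pattern assumes each MIG UUID starts with "MIG-" as in the nvidia-smi -L listing.

```shell
# Read `nvidia-smi -L` output on stdin and print one MIG UUID per line.
# Assumption: MIG UUIDs start with "MIG-"; the pattern also tolerates the
# older "MIG-GPU-<uuid>/<gi>/<ci>" form.
mig_uuids() {
  grep -oE 'MIG-[0-9A-Za-z/-]+'
}

# Usage: run a (hypothetical) script on the first MIG device only.
#   CUDA_VISIBLE_DEVICES="$(nvidia-smi -L | mig_uuids | head -n 1)" python infer.py
```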
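The nvidia-smi mig subcommand is your primary interface for managing MIG. It allows you to create and destroy instances, list devices, and inspect the MIG configuration. To remove a GPU instance, first destroy its compute instances, then the GPU instance itself:
sudo nvidia-smi mig -i 0 -dci -gi <instance_id>
sudo nvidia-smi mig -i 0 -dgi -gi <instance_id>
The instance_id is the GPU instance ID reported when you create the instance; sudo nvidia-smi mig -i 0 -lgi shows it as well.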
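Teardown with nvidia-smi happens in two steps, compute instances first and then the GPU instance that contains them. The sketch below just prints that sequence for a given GPU index and GPU instance ID so the order is explicit; mig_destroy_cmds is a hypothetical helper, and nothing is executed.

```shell
# Print the two commands that remove one GPU instance, in the required order.
# $1 = GPU index, $2 = GPU instance ID (as shown by `nvidia-smi mig -lgi`).
mig_destroy_cmds() {
  gpu="$1"
  gi="$2"
  # 1. Destroy the compute instances inside the GPU instance.
  echo "sudo nvidia-smi mig -i $gpu -dci -gi $gi"
  # 2. Destroy the now-empty GPU instance.
  echo "sudo nvidia-smi mig -i $gpu -dgi -gi $gi"
}

# Usage: review, then pipe to a shell if the commands look right.
#   mig_destroy_cmds 0 7
```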
The most surprising thing about MIG is how it leverages the GPU’s physical architecture to achieve true isolation. It’s not just about software limits; the SMs, L2 cache slices, and memory controllers are physically partitioned. This means that performance isolation is guaranteed at the hardware level, preventing one GI from starving another of critical resources like compute units or cache bandwidth, which is a common problem with traditional GPU sharing methods.
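Destroying a MIG instance frees its slices so they can be used for new instances; no reboot is needed to reconfigure as long as the GPU stays in MIG mode. To return the GPU to its unpartitioned state, destroy all instances and disable MIG mode with sudo nvidia-smi -i 0 -mig 0.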
The next hurdle you’ll face is managing these smaller GPU instances efficiently across a cluster, which often involves container orchestration platforms like Kubernetes.