The Go runtime doesn’t run each goroutine on its own OS thread; it multiplexes many goroutines onto a much smaller pool of OS threads that it creates and manages itself.
Let’s see this in action. Imagine a simple Go program:
```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// worker simulates one second of blocking work.
func worker(id int) {
	fmt.Printf("Worker %d starting\n", id)
	time.Sleep(time.Second)
	fmt.Printf("Worker %d done\n", id)
}

func main() {
	runtime.GOMAXPROCS(2) // At most 2 OS threads execute Go code at once

	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			worker(id)
		}(i)
	}

	wg.Wait() // Block until all five workers have called Done
	fmt.Println("All workers finished")
}
```
If you run this, all 5 workers start almost immediately and finish about a second later, even though the runtime.GOMAXPROCS(2) call allows at most 2 operating system threads to execute Go code at the same time. Note that GOMAXPROCS does not cap the total number of threads in the process: the runtime also creates threads for background work such as the system monitor and garbage collection, and for goroutines blocked in system calls, so your system’s process monitor will usually show more than two. The Go scheduler is responsible for taking your 5 goroutines and mapping them onto the threads that are running Go code. When a goroutine blocks (in time.Sleep, on a channel, or on network I/O), the scheduler parks it and runs another ready goroutine on the same thread, keeping the thread busy.
The core of the Go scheduler revolves around three key data structures:
- M (Machine): Represents an OS thread. The Go runtime creates and manages these.
- P (Processor): Represents a logical processor, which is essentially a scheduling context. An M must hold a P to execute Go code. The number of Ps equals GOMAXPROCS. Each P has a local queue of goroutines that are ready to run.
- G (Goroutine): Represents a goroutine. Each goroutine has a state (running, runnable, waiting, etc.), its own stack, and its saved execution context.
When a goroutine is ready to run, it’s placed on the local run queue of a P (or, in some cases, on a global run queue shared by all Ps). An M that holds a P picks a goroutine from that P’s queue and executes it. If the goroutine blocks, say on a channel or a mutex, it’s parked and taken off the M, and the M picks the next goroutine from its P’s queue. If a P runs out of goroutines, it checks the global queue and can steal work from another P’s queue so that its M stays busy.
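The run-queue and work-stealing behavior can be illustrated with a toy model. This is a single-threaded simulation under simplifying assumptions, not the runtime’s implementation: the `G` and `P` types, `next`, and `simulate` are all hypothetical names, there is no M or global queue, and "running" a goroutine just means removing it from a queue.

```go
package main

import "fmt"

// G is a toy stand-in for a goroutine: only an ID.
type G struct{ ID int }

// P is a toy logical processor with a local run queue.
type P struct {
	ID   int
	runq []G
}

// next returns the next G for p to run. It prefers p's own queue and,
// if that is empty, steals about half of the first non-empty queue it finds.
func (p *P) next(all []*P) (G, bool) {
	if len(p.runq) == 0 {
		for _, victim := range all {
			if victim == p || len(victim.runq) == 0 {
				continue
			}
			n := (len(victim.runq) + 1) / 2
			p.runq = append(p.runq, victim.runq[:n]...)
			victim.runq = victim.runq[n:]
			break
		}
	}
	if len(p.runq) == 0 {
		return G{}, false // nothing local and nothing to steal
	}
	g := p.runq[0]
	p.runq = p.runq[1:]
	return g, true
}

// simulate queues numG goroutines on the first P only, then lets every P
// take turns until no work is left. It returns how many Gs each P ran.
func simulate(numP, numG int) []int {
	ps := make([]*P, numP)
	for i := range ps {
		ps[i] = &P{ID: i}
	}
	for i := 0; i < numG; i++ {
		ps[0].runq = append(ps[0].runq, G{ID: i})
	}
	ran := make([]int, numP)
	for {
		progress := false
		for _, p := range ps {
			if _, ok := p.next(ps); ok {
				ran[p.ID]++
				progress = true
			}
		}
		if !progress {
			return ran
		}
	}
}

func main() {
	// All work starts on P0; P1 only gets work by stealing.
	fmt.Println(simulate(2, 5))
}
```

Even though every goroutine starts on the first P, the second P ends up running some of them, which is the point of stealing: an idle P finds work instead of leaving its thread unused.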
The scheduler also handles system calls. When a goroutine makes a system call that blocks the OS thread (e.g., reading from a file), the runtime can detach the P from the blocked M and hand it to another M, waking an idle one or creating a new one if necessary. This lets the P keep running other goroutines, so parallelism stays high. Once the system call returns, the goroutine’s M tries to reacquire a P; if none is available, the goroutine is placed on the global run queue and the M goes idle. This dynamic management of Ms and Ps is crucial for achieving high concurrency without requiring one OS thread per goroutine.
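One way to observe this hand-off is to block a goroutine in a raw system call while only one P exists and check that other goroutines still make progress. The sketch below assumes a Unix-like system (it uses `syscall.Pipe`, `syscall.Read`, and `syscall.Write` directly so the read bypasses the network poller); `progressWhileBlocked` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
)

// progressWhileBlocked (hypothetical helper) blocks one goroutine in a raw
// read(2) on an empty pipe and counts loop iterations completed by the
// calling goroutine in the meantime, with GOMAXPROCS set to 1.
func progressWhileBlocked(iterations int) (int, error) {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var fds [2]int
	if err := syscall.Pipe(fds[:]); err != nil {
		return 0, err
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])

	entering := make(chan struct{})
	readDone := make(chan error, 1)
	go func() {
		buf := make([]byte, 1)
		close(entering)
		_, err := syscall.Read(fds[0], buf) // blocks this goroutine's OS thread
		readDone <- err
	}()
	<-entering

	// With a single P, this loop can only keep running if the runtime
	// moves the P away from the thread that is stuck in read(2).
	count := 0
	for i := 0; i < iterations; i++ {
		runtime.Gosched() // let the reader reach its system call
		count++
	}

	// Unblock the reader and wait for it to finish.
	if _, err := syscall.Write(fds[1], []byte{1}); err != nil {
		return count, err
	}
	return count, <-readDone
}

func main() {
	n, err := progressWhileBlocked(1000)
	fmt.Println("iterations completed while reader was blocked:", n, "err:", err)
}
```

The count itself is unremarkable; what matters is that the function returns at all. If the blocked thread kept the only P, the loop and the final write would never run.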
A surprising aspect of the scheduler is how it handles network I/O. Unlike traditional threading models where network I/O often blocks the entire thread, Go’s network poller (using mechanisms like epoll on Linux or kqueue on macOS/BSD) lets a goroutine wait on network I/O without blocking the underlying OS thread. The goroutine is parked in a "waiting" state and its thread goes on to run other goroutines. When the network event is ready, the poller notifies the Go runtime, and the goroutine is moved back to a runnable state on a run queue, all without an OS thread having to wait idly.
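As a rough illustration, the sketch below has many goroutines waiting in `Read` on loopback TCP connections at the same time, with GOMAXPROCS set to 1. It assumes loopback networking is available; `waitOnMany` is a hypothetical helper, and the 5-second read deadline is an arbitrary safety net so the example cannot hang.

```go
package main

import (
	"fmt"
	"net"
	"runtime"
	"sync"
	"time"
)

// waitOnMany (hypothetical helper) opens n loopback connections whose client
// goroutines all wait in Read, then counts how many receive the server's
// one-byte reply.
func waitOnMany(n int) (int, error) {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	ln, err := net.Listen("tcp", "127.0.0.1:0") // port 0: let the OS pick
	if err != nil {
		return 0, err
	}
	defer ln.Close()

	// Server: accept all n connections first, then reply to each one,
	// so every client is waiting in Read at the same time.
	go func() {
		conns := make([]net.Conn, 0, n)
		for len(conns) < n {
			c, err := ln.Accept()
			if err != nil {
				break
			}
			conns = append(conns, c)
		}
		for _, c := range conns {
			c.Write([]byte{'x'})
			c.Close()
		}
	}()

	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		received int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c, err := net.Dial("tcp", ln.Addr().String())
			if err != nil {
				return
			}
			defer c.Close()
			c.SetReadDeadline(time.Now().Add(5 * time.Second))
			buf := make([]byte, 1)
			// Read parks this goroutine until the poller reports data.
			if _, err := c.Read(buf); err == nil {
				mu.Lock()
				received++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return received, nil
}

func main() {
	got, err := waitOnMany(50)
	fmt.Println("replies received:", got, "err:", err)
}
```

With a thread-per-connection design, 50 pending reads would tie up 50 threads. Here the waiting goroutines are parked, and a single P is enough to serve all of them as their data arrives.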
The number of goroutines you can run is limited more by memory and the complexity of your program than by the scheduler itself: each goroutine starts with a stack of only a few kilobytes, and the scheduler efficiently maps them onto a small number of threads. The GOMAXPROCS setting (the environment variable or the runtime.GOMAXPROCS function) is your primary lever for controlling the degree of CPU parallelism.
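To get a feel for the memory cost, you can park a large number of goroutines and look at how much stack memory the runtime reports. This is a rough sketch: `spawnParked` is a hypothetical helper, and the per-goroutine figure it derives from `runtime.MemStats.StackInuse` is an estimate that varies with Go version and platform.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// spawnParked (hypothetical helper) starts n goroutines that wait on a
// channel and returns how many are alive plus an estimate of stack bytes
// used per goroutine.
func spawnParked(n int) (alive int, stackBytesPerG uint64) {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	base := runtime.NumGoroutine()

	release := make(chan struct{})
	var started, done sync.WaitGroup
	started.Add(n)
	done.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer done.Done()
			started.Done()
			<-release // stay parked until the measurement is taken
		}()
	}
	started.Wait()

	runtime.ReadMemStats(&after)
	alive = runtime.NumGoroutine() - base
	if after.StackInuse > before.StackInuse {
		stackBytesPerG = (after.StackInuse - before.StackInuse) / uint64(n)
	}

	close(release)
	done.Wait()
	return
}

func main() {
	alive, perG := spawnParked(10000)
	fmt.Printf("goroutines alive: %d, approx stack bytes each: %d\n", alive, perG)
}
```

Ten thousand OS threads would be a heavy load for most systems, while ten thousand parked goroutines typically cost on the order of tens of megabytes of stack at most.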
The next thing you’ll likely encounter is how channels and the select statement interact with this scheduler.