Channels, Network Poller, and the Runtime's Cross-Cutting Concerns

Advanced

Prerequisites

  • Articles 1-5: Full series through Memory and GC
  • Concurrent programming (atomic operations, lock-free structures)
  • I/O multiplexing concepts (epoll, kqueue)

We've traced Go from repository structure through the compiler pipeline, runtime bootstrap, scheduler, and memory management. This final article examines the high-level concurrency primitives and I/O infrastructure built atop those foundations: channels, the network poller, synchronization primitives, and the compiler directives that tie the runtime together. These are the systems that make Go's "don't communicate by sharing memory; share memory by communicating" philosophy possible at the implementation level.

Channel Implementation: hchan and sudog

Channels are the signature concurrency primitive of Go. Under the hood, they're implemented as a mutex-protected circular buffer with wait queues:

src/runtime/chan.go#L34-L55

type hchan struct {
    qcount   uint           // total data in the queue
    dataqsiz uint           // size of the circular queue
    buf      unsafe.Pointer // points to an array of dataqsiz elements
    elemsize uint16
    closed   uint32
    timer    *timer          // timer feeding this chan
    elemtype *_type          // element type
    sendx    uint            // send index
    recvx    uint            // receive index
    recvq    waitq           // list of recv waiters
    sendq    waitq           // list of send waiters
    bubble   *synctestBubble
    lock     mutex
}

The waitq type is a linked list of sudog structures — the runtime's representation of a goroutine waiting on a synchronization operation:

src/runtime/runtime2.go#L404-L446

sequenceDiagram
    participant G1 as Goroutine 1
    participant CH as hchan (buffered, cap=2)
    participant G2 as Goroutine 2

    Note over CH: buf: [_, _], sendx=0, recvx=0

    G1->>CH: ch <- "a" (chansend)
    Note over CH: buf: ["a", _], sendx=1

    G1->>CH: ch <- "b" (chansend)
    Note over CH: buf: ["a", "b"], sendx=0

    G1->>CH: ch <- "c" (buffer full!)
    Note over CH: G1 parked in sendq as sudog

    G2->>CH: <-ch (chanrecv)
    Note over CH: Returns "a", copies "c" to buf
    CH->>G1: goready(G1) — wake from sendq
    Note over CH: buf: ["c", "b"], recvx=1

The channel operations have three interesting fast paths:

  1. Direct send: When a receiver is already waiting in recvq, the sender copies the value directly to the receiver's stack (bypassing the buffer entirely) and wakes the receiver with goready. This avoids two copies through the buffer.

  2. Buffered send/receive: When the buffer has space (or data), the operation completes without blocking — just a copy to/from the circular buffer under the lock.

  3. Blocking: When neither fast path applies, the goroutine creates a sudog, enqueues itself in the appropriate wait queue, and calls gopark to deschedule. As we saw in Article 4, gopark integrates with the scheduler so the goroutine can wait without consuming an OS thread.

The invariants documented at the top of chan.go are worth studying:

src/runtime/chan.go#L9-L18

For buffered channels: if there are items in the buffer (qcount > 0), the receive queue must be empty. And if there's buffer space (qcount < dataqsiz), the send queue must be empty. These invariants simplify the implementation because you never have waiters and buffer space simultaneously.

Tip: The debugChan constant at line 31 can be set to true during development to enable verbose channel operation logging. The runtime has similar debug constants for most subsystems.

Select Statement Implementation

The select statement is compiled into calls to runtime.selectgo:

src/runtime/select.go#L17-L43

Each case is represented by an scase struct containing the channel and a pointer to the data element. The implementation is careful about lock ordering — when a select involves multiple channels, all of them are held locked at once, and they must be acquired in a consistent order to prevent deadlock:

func sellock(scases []scase, lockorder []uint16) {
    var c *hchan
    for _, o := range lockorder {
        c0 := scases[o].c
        if c0 != c {
            c = c0
            lock(&c.lock)
        }
    }
}

The lockorder slice sorts channels by address, ensuring a consistent global lock order. Cases are evaluated in a randomized order (using a pollorder slice) to prevent starvation — without randomization, the first case would be systematically favored.

flowchart TD
    A["select statement<br/>(N cases)"] --> B["Shuffle pollorder<br/>(random evaluation)"]
    B --> C["Sort lockorder<br/>(by channel address)"]
    C --> D["Lock all channels"]
    D --> E{"Any case ready?"}
    E -->|Yes| F["Execute that case,<br/>unlock all"]
    E -->|No| G["Create sudog for each case"]
    G --> H["Enqueue in all channel wait queues"]
    H --> I["gopark (sleep)"]
    I --> J["Woken by some channel"]
    J --> K["Dequeue from all other channels"]
    K --> F

The Network Poller

Go's network I/O appears blocking to the goroutine but is actually multiplexed onto non-blocking I/O under the hood. The network poller is the bridge between the two worlds.

The platform-independent interface is defined in netpoll.go:

src/runtime/netpoll.go#L15-L41

Each platform must implement: netpollinit(), netpollopen(fd, pd), netpollclose(fd), netpoll(delta), and netpollBreak(). The pollDesc structure tracks the state of each file descriptor:

src/runtime/netpoll.go#L51-L80

Each pollDesc contains two semaphores (rg and wg) for read and write operations. These semaphores use goroutine pointers as state: pdNil (idle), pdWait (preparing to park), pdReady (I/O ready), or a *g pointer (goroutine parked and waiting).

On Linux, the implementation uses epoll:

src/runtime/netpoll_epoll.go#L21-L40

graph TD
    subgraph "User Code"
        A["conn.Read()"]
    end
    subgraph "net package"
        B["pollDesc.waitRead()"]
    end
    subgraph "Runtime"
        C["runtime_pollWait"]
        D["gopark on pollDesc.rg"]
    end
    subgraph "Scheduler"
        E["findRunnable calls netpoll"]
        F["epoll_wait returns ready fds"]
        G["goready parked goroutines"]
    end

    A --> B --> C --> D
    E --> F --> G
    G -.->|"wake"| D

The integration with the scheduler (from Article 4) is elegant: findRunnable calls netpoll(0) (non-blocking) when looking for work. If a thread is about to park with no work, it calls netpoll(delta) with a timeout to wait for I/O. The sysmon thread also periodically polls to ensure no I/O events are missed.

Runtime Synchronization Primitives

The runtime builds its own synchronization hierarchy, documented in HACKING.md:

src/runtime/HACKING.md#L139-L179

Primitive        Blocks G   Blocks M   Blocks P   Use Case
mutex            Yes        Yes        Yes        Protecting shared runtime state
note             Yes        Yes        Yes/No     One-shot notifications
gopark/goready   Yes        No         No         Channel ops, netpoll, timers

The runtime mutex is the lowest-level lock. On Linux, it's implemented using futex:

src/runtime/lock_futex.go#L1-L53

This is not sync.Mutex — it's a runtime-internal lock that blocks the OS thread. Using it blocks both the goroutine and the thread, which is why it's reserved for short critical sections in the runtime's lowest levels.

The note primitive provides one-shot notification with futex:

func notewakeup(n *note) {
    old := atomic.Xchg(key32(&n.key), 1)
    if old != 0 {
        throw("notewakeup - double wakeup")
    }
    futexwakeup(key32(&n.key), 1)
}

The semaphore implementation in sema.go is what sync.Mutex actually uses:

src/runtime/sema.go#L1-L49

It uses a balanced tree of sudogs (the same structure used by channels) hashed into a fixed table of 251 entries. This design avoids allocating per-mutex kernel resources while providing O(log n) lookup for waiters on distinct addresses.

Linkname and Compiler Directives

The runtime lives in a privileged position — it needs to expose functions to other packages without making them part of the public API. The //go:linkname directive enables this:

src/runtime/HACKING.md#L277-L356

Three forms exist:

  • Push linkname: Give a local definition a symbol name in another package
  • Pull linkname: Reference a symbol defined in another package
  • Export linkname: Mark a symbol as available for linkname by other packages

For example, runtime.main accesses the user's main.main via:

//go:linkname main_main main.main
func main_main()

The runtime also uses compiler directives that are unavailable to normal Go code:

src/runtime/HACKING.md#L424-L488

  • //go:systemstack — Function must run on the system stack (g0)
  • //go:nowritebarrier — Assert no write barriers in this function
  • //go:nowritebarrierrec — Assert no write barriers in this function or any function it calls (recursively)
  • //go:nosplit — Don't insert stack growth check (function must fit in current stack)

These directives are essential for the runtime's correctness. For example, code that runs without a P (during scheduler transitions) must not trigger write barriers, because write barriers require a P. The nowritebarrierrec directive enforces this at compile time across the entire call graph.

Tip: When reading runtime code, pay attention to //go:nosplit annotations. They indicate functions that cannot grow the stack and therefore have strict size constraints. If you see //go:systemstack combined with //go:nosplit, the function runs on the fixed-size system stack and must be very careful about stack usage.

The Complete Picture

Over these six articles, we've traced Go from its repository structure and bootstrap process, through the go command's build orchestration, the compiler's SSA pipeline, the runtime's assembly bootstrap and G-M-P scheduler, memory allocation and garbage collection, and finally the channel, networking, and synchronization infrastructure.

The recurring design themes are worth calling out:

  • Layered dispatch: Thin entry points delegate to architecture-specific implementations (compiler, linker, runtime entry, netpoll)
  • Lock-free fast paths: Per-P mcaches for allocation, per-P run queues for scheduling, direct sends for channels
  • Declarative constraints: SSA pass ordering, API compatibility files, lock rankings
  • Cooperative integration: The scheduler, GC, netpoll, and channel operations all coordinate through gopark/goready rather than separate blocking mechanisms

The Go runtime is a cohesive system where every piece — from the first assembly instruction to the garbage collector's write barrier — is designed to work together. Understanding these internals doesn't just satisfy curiosity; it makes you a better Go programmer, giving you the mental model to reason about performance, debug mysterious behavior, and write code that works with the runtime rather than against it.