I/O in Node.js: Streams, Handles, and the Event Loop

Advanced

Prerequisites

  • Article 1: architecture-overview
  • Article 3: cpp-object-model-and-bindings (BaseObject/Wrap hierarchy)
  • Understanding of libuv event loop model (handles vs requests, uv_run phases)
  • Familiarity with Node.js streams API from a user perspective

Node.js's raison d'être is non-blocking I/O. Every network connection, file operation, timer, and child process ultimately flows through the same machinery: libuv handles and requests on the C++ side, streams and event emitters on the JavaScript side, connected by the Wrap hierarchy we explored in Article 3. This article shows how these pieces fit together in practice — from a TCP connection's lifecycle to the ingenious timer system to the microtask queue that powers process.nextTick().

libuv Integration: Handles vs Requests

As we established in Article 3, libuv has two fundamental abstractions that Node.js wraps:

Handles (uv_handle_t) are long-lived objects that can generate multiple events over their lifetime. TCP servers, TCP connections, timers, file system watchers, and signal handlers are all handles. They keep the event loop alive when referenced.

Requests (uv_req_t) are one-shot operations. A file read, a DNS lookup, a connect attempt — each creates a request, dispatches it, and receives a single callback when complete.

graph TD
    subgraph "Handles — Long-lived"
        TCP["uv_tcp_t<br/>TCP socket"]
        TIMER["uv_timer_t<br/>Timer"]
        PIPE["uv_pipe_t<br/>Unix pipe / Windows named pipe"]
        FSE["uv_fs_event_t<br/>File system watcher"]
        SIGNAL["uv_signal_t<br/>Signal handler"]
        UDP["uv_udp_t<br/>UDP socket"]
    end
    
    subgraph "Requests — One-shot"
        FSREQ["uv_fs_t<br/>File system operation"]
        CONN["uv_connect_t<br/>Connection attempt"]
        WRITE["uv_write_t<br/>Stream write"]
        DNS["uv_getaddrinfo_t<br/>DNS lookup"]
        WORK["uv_work_t<br/>Thread pool work"]
    end
    
    subgraph "Event Loop"
        LOOP["uv_run()<br/>Process events"]
    end
    
    TCP --> LOOP
    TIMER --> LOOP
    FSREQ --> LOOP
    CONN --> LOOP

The event loop in SpinEventLoopInternal() calls uv_run(UV_RUN_DEFAULT), which processes all pending I/O events. In UV_RUN_DEFAULT mode, uv_run() keeps iterating (blocking in the poll phase when there is nothing else to do) until no active handles or requests remain. Between uv_run() iterations, platform->DrainTasks(isolate) processes V8 background tasks such as optimized code compilation and garbage collection finalization.
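
The handle/request split is observable from plain JavaScript. A small sketch: a net.Server wraps a long-lived uv_tcp_t handle that keeps the loop alive until it is closed or unref()'d, while fs.readFile() dispatches a one-shot uv_fs_t request that holds the loop open only until its single callback fires.

const net = require('net');
const fs = require('fs');

// Handle: the server wraps a uv_tcp_t and can emit events indefinitely.
const server = net.createServer().listen(0, () => {
  console.log('server listening; this handle keeps the loop alive');
  server.unref(); // tell libuv this handle should no longer hold the loop open
});

// Request: one uv_fs_t dispatched to the thread pool, one callback, done.
fs.readFile(__filename, (err, data) => {
  if (err) throw err;
  console.log(`read ${data.length} bytes; the request is already gone`);
});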

The Wrap Hierarchy in Practice: TCP Connection Lifecycle

Let's trace a real TCP connection to see how the C++ wrap hierarchy (from Article 3) operates. When you call net.createServer() and a client connects:

sequenceDiagram
    participant NET as lib/net.js
    participant TW as TCPWrap (C++)
    participant CW as ConnectionWrap
    participant LSW as LibuvStreamWrap
    participant HW as HandleWrap
    participant UV as libuv

    Note over NET: server.listen(port)
    NET->>TW: new TCP(TCPConstants.SERVER)
    TW->>HW: HandleWrap(env, object, &handle_)
    HW->>UV: uv_tcp_init(loop, &handle_)
    NET->>TW: bind(address, port)
    NET->>TW: listen(backlog)
    TW->>UV: uv_listen(&handle_, backlog, OnConnection)
    
    Note over UV: Client connects
    UV->>CW: OnConnection(handle, status)
    CW->>TW: TCPWrap::Instantiate(env, parent, SOCKET)
    CW->>UV: uv_accept(server_handle, &client_handle)
    CW->>NET: MakeCallback(onconnection, client_wrap)
    
    Note over NET: Data flows
    NET->>LSW: ReadStart()
    LSW->>UV: uv_read_start(handle, OnAlloc, OnRead)
    UV->>LSW: OnRead(handle, nread, buf)
    LSW->>NET: MakeCallback(onread, buffer)

TCPWrap inherits from ConnectionWrap<TCPWrap, uv_tcp_t>, which inherits from LibuvStreamWrap, then HandleWrap, then AsyncWrap, then BaseObject. Each layer adds functionality:

  • BaseObject: Links the C++ object to the JavaScript socket object
  • AsyncWrap: Provides async_id for async_hooks tracking
  • HandleWrap: Manages the libuv handle lifecycle (ref/unref/close)
  • LibuvStreamWrap: Implements ReadStart()/ReadStop() and write operations
  • ConnectionWrap: Handles OnConnection() and AfterConnect() callbacks
  • TCPWrap: TCP-specific methods like bind(), listen(), connect()

StreamBase is worth noting separately — it's an abstract interface that LibuvStreamWrap implements, providing a unified stream API that JavaScript can call. Both libuv streams and TLS streams implement StreamBase, which is why tls.TLSSocket can transparently replace a plain net.Socket.
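
You can peek at this hierarchy from JavaScript through the internal _handle property. It is undocumented and can change between Node.js versions, so treat the following as a diagnostic sketch only:

const net = require('net');

const server = net.createServer((socket) => {
  // The C++ wrap behind the socket is a TCPWrap instance, exposed to
  // JavaScript through the TCP constructor from the tcp_wrap binding.
  console.log(socket._handle.constructor.name); // 'TCP'
  socket.end();
  server.close();
});

server.listen(0, () => {
  net.connect(server.address().port).unref();
});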

JavaScript Stream Architecture

On the JavaScript side, Node.js streams are state machines built on EventEmitter. The four stream types — Readable, Writable, Duplex, and Transform — live in lib/internal/streams/.

stateDiagram-v2
    [*] --> Flowing: pipe() or resume()
    [*] --> Paused: Initial state
    Paused --> Flowing: resume() / pipe() / 'data' listener
    Flowing --> Paused: pause()
    Flowing --> Ended: push(null)
    Paused --> Ended: push(null) + drain
    Ended --> [*]
    
    state Flowing {
        [*] --> Reading
        Reading --> Buffering: _read() returns data
        Buffering --> Reading: Data consumed below hwm
        Buffering --> Backpressure: Buffer > highWaterMark
        Backpressure --> Reading: Data consumed
    }

Readable streams operate in two modes: flowing (data is pushed to consumers automatically) and paused (data must be pulled with read()). The highWaterMark controls buffering: when the internal buffer exceeds this threshold, the stream signals backpressure by returning false from push().
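
A minimal Readable makes the contract concrete. This sketch uses a deliberately tiny highWaterMark so backpressure actually triggers: it pushes until push() returns false, then waits for the next _read() call.

const { Readable } = require('stream');

let n = 0;
const source = new Readable({
  highWaterMark: 16, // bytes; the default for byte streams is 16 KiB
  read() {
    while (n < 100) {
      // push() returns false once the internal buffer reaches highWaterMark.
      if (!this.push(`chunk ${n++}\n`)) return; // stop; _read() fires again on drain
    }
    this.push(null); // signal end-of-stream
  },
});

source.on('data', (chunk) => process.stdout.write(chunk));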

Writable streams have a complementary state machine. The critical method is write(), which returns false when the internal buffer is full — the caller should wait for the 'drain' event before writing more data.
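
The canonical pattern, sketched below with an arbitrary output path, stops writing when write() returns false and resumes on 'drain':

const fs = require('fs');

const out = fs.createWriteStream('output.log');
let i = 0;

function writeChunks() {
  while (i < 1e6) {
    if (!out.write(`line ${i++}\n`)) {
      // Buffer is above highWaterMark: pause until it drains.
      out.once('drain', writeChunks);
      return;
    }
  }
  out.end();
}

writeChunks();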

The pipeline() utility in lib/internal/streams/pipeline.js handles the complex error propagation and cleanup when chaining streams, making it the recommended way to connect streams rather than .pipe().

Tip: Always use pipeline() instead of .pipe() for production code. pipeline() properly handles errors and cleanup across the entire chain, while .pipe() famously does not: when one stream errors, the others are left open, leaking resources.
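
A typical use, sketched with hypothetical file names:

const { pipeline } = require('stream');
const fs = require('fs');
const zlib = require('zlib');

pipeline(
  fs.createReadStream('access.log'),      // hypothetical input
  zlib.createGzip(),
  fs.createWriteStream('access.log.gz'),
  (err) => {
    // Called exactly once: with null on success, or with the first error
    // from anywhere in the chain, after every stream has been destroyed.
    if (err) console.error('pipeline failed:', err);
  }
);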

The Timer System: Linked Lists and a Single libuv Timer

The timer implementation in lib/internal/timers.js has one of the best ASCII art comments in the codebase. The design is ingenious: rather than creating a libuv timer for each setTimeout() call (which would be expensive with thousands of active timers), Node.js groups timers by duration.

graph TD
    subgraph "Timer Architecture"
        MAP["PriorityQueue + Object Map<br/>Keys: durations in ms"]
        MAP --> L40["TimersList {duration: 40ms}"]
        MAP --> L320["TimersList {duration: 320ms}"]
        MAP --> L1000["TimersList {duration: 1000ms}"]
        
        L40 --> T1["Timer A<br/>_onTimeout: cb1"]
        T1 --> T2["Timer B<br/>_onTimeout: cb2"]
        T2 --> T3["Timer C<br/>_onTimeout: cb3"]
        
        L1000 --> T4["Timer D<br/>_onTimeout: cb4"]
        T4 --> T5["Timer E<br/>_onTimeout: cb5"]
    end
    
    UV_TIMER["Single libuv timer<br/>Set to earliest expiry"] --> MAP

Each duration bucket is a doubly-linked list (TimersList). Adding a timer is O(1) — just append to the list for that duration. Removing is O(1) — unlink from the doubly-linked list. When a timer fires, only the head of the relevant list needs checking, because all timers in the list share the same duration and were inserted in chronological order.

A PriorityQueue (binary heap) tracks which duration bucket expires next. A single libuv uv_timer_t is set to the earliest expiry time. When it fires, Node.js processes all expired timers across all duration buckets, then resets the libuv timer to the next expiry.

This design means Node.js can efficiently manage hundreds of thousands of active timers — a common scenario in HTTP servers where every connection has an idle timeout.
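
A drastically simplified model of the design (not Node's actual code, which lives in lib/internal/timers.js and adds linked lists, refcounting, and async_hooks integration) looks like this:

// One bucket per duration; within a bucket, timers are already sorted
// because they share a duration and are appended in insertion order.
const buckets = new Map(); // duration (ms) -> array of { expiry, cb }

function schedule(cb, duration) {
  let list = buckets.get(duration);
  if (!list) buckets.set(duration, (list = []));
  list.push({ expiry: Date.now() + duration, cb }); // O(1) append
}

function runExpired(now = Date.now()) {
  for (const list of buckets.values()) {
    // Only heads need checking: the first unexpired timer ends the scan.
    while (list.length && list[0].expiry <= now) list.shift().cb();
  }
}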

Microtasks, nextTick, and setImmediate

The relationship between process.nextTick(), V8 microtasks (Promises), and setImmediate() is one of the most asked-about aspects of Node.js. They execute at different points in the event loop:

flowchart TD
    UV["uv_run() iteration"] --> TIMERS["1. Timers phase<br/>setTimeout / setInterval"]
    TIMERS --> NT1["⚡ nextTick queue + microtasks"]
    NT1 --> PENDING["2. Pending I/O callbacks"]
    PENDING --> NT2["⚡ nextTick queue + microtasks"]
    NT2 --> POLL["3. Poll phase<br/>I/O events"]
    POLL --> NT3["⚡ nextTick queue + microtasks"]
    NT3 --> CHECK["4. Check phase<br/>setImmediate callbacks"]
    CHECK --> NT4["⚡ nextTick queue + microtasks"]
    NT4 --> CLOSE["5. Close callbacks"]
    CLOSE --> NT5["⚡ nextTick queue + microtasks"]
    NT5 --> UV

process.nextTick() uses a FixedQueue plus the TickInfo shared state (an AliasedUint8Array defined in env.h that both C++ and JavaScript can read without crossing the boundary). Its kHasTickScheduled flag tells the C++ layer whether the nextTick queue needs draining.

The critical insight is that the nextTick queue and microtasks are drained after every JavaScript callback the event loop invokes (Node wraps each call in an InternalCallbackScope that performs the drain), which is even more frequent than the per-phase checkpoints in the diagram suggest. This means process.nextTick() callbacks execute before any further I/O, making them a sharp tool that can starve I/O if used carelessly.

setImmediate() runs in the "check" phase, which is after the "poll" phase. This means setImmediate() callbacks execute after I/O events have been processed, making it the right choice for deferring work without starving I/O.
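
The ordering is easiest to see from inside an I/O callback, where the phases are deterministic (in the top-level script, the relative order of setTimeout(..., 0) and setImmediate() is famously nondeterministic):

const fs = require('fs');

fs.readFile(__filename, () => {           // we are now in the poll phase
  setTimeout(() => console.log('timeout'), 0);
  setImmediate(() => console.log('immediate'));
  Promise.resolve().then(() => console.log('microtask'));
  process.nextTick(() => console.log('nextTick'));
});
// Prints: nextTick, microtask, immediate, timeout.
// The nextTick queue drains before microtasks; the check phase
// (setImmediate) directly follows poll, so it beats the next timers phase.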

async_hooks and AsyncWrap Tracking

Every async operation in Node.js flows through AsyncWrap (from Article 3), which enables the async_hooks API. The tracking works through four core lifecycle events (a fifth hook, promiseResolve, exists specifically for promises):

sequenceDiagram
    participant AH as async_hooks
    participant AW as AsyncWrap (C++)
    participant UV as libuv
    
    Note over AW: Creating a new TCP connection
    AW->>AH: init(asyncId, type, triggerAsyncId, resource)
    Note over AH: Track: "TCPWrap #7 triggered by #3"
    
    Note over UV: Connection callback fires
    AW->>AH: before(asyncId)
    Note over AH: Set execution context to #7
    AW->>AW: MakeCallback(onconnection)
    AW->>AH: after(asyncId)
    Note over AH: Restore previous context
    
    Note over AW: Socket closed
    AW->>AH: destroy(asyncId)
    Note over AH: Cleanup tracking for #7

The provider types are defined in src/async_wrap.h using a macro that lists every async resource type: TCPWRAP, FSREQCALLBACK, GETADDRINFOREQWRAP, HTTP2SESSION, and dozens more. Each type gets a unique enum value that async_hooks consumers can use to filter events.
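
A minimal hook makes the lifecycle visible. Note the synchronous fs.writeSync(1, ...) in place of console.log: console.log is itself asynchronous and would re-enter the hooks, recursing forever.

const async_hooks = require('async_hooks');
const fs = require('fs');

async_hooks.createHook({
  init(asyncId, type, triggerAsyncId) {
    fs.writeSync(1, `init ${type} #${asyncId} (trigger #${triggerAsyncId})\n`);
  },
  before(asyncId) { fs.writeSync(1, `before #${asyncId}\n`); },
  after(asyncId) { fs.writeSync(1, `after #${asyncId}\n`); },
  destroy(asyncId) { fs.writeSync(1, `destroy #${asyncId}\n`); },
}).enable();

setTimeout(() => {}, 10); // logs init Timeout #N, then before/after/destroy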

The executionAsyncId() and triggerAsyncId() functions expose the async context chain, enabling tools like AsyncLocalStorage to propagate request-scoped data through async boundaries without explicit parameter passing. AsyncLocalStorage is built on the same AsyncWrap infrastructure: its store travels along the resource chain the init hook observes, so every async resource created within als.run() inherits the active store.
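
A sketch of the pattern (the request-id scheme is invented for illustration):

const { AsyncLocalStorage } = require('async_hooks');
const http = require('http');

const als = new AsyncLocalStorage();
let nextId = 0;

http.createServer((req, res) => {
  als.run({ requestId: ++nextId }, () => {
    setTimeout(() => {
      // Still the same request's store, across the async boundary.
      res.end(`handled request #${als.getStore().requestId}\n`);
    }, 10);
  });
}).listen(0);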

Tip: async_hooks has measurable overhead. In production, prefer AsyncLocalStorage (which optimizes the common case) over raw async_hooks. If you need diagnostic hooks, consider the diagnostics_channel API instead, which has lower overhead for pub/sub style instrumentation.

What's Next

We've now covered the complete I/O path: from libuv events through C++ wraps to JavaScript streams and back. In the final article of this series, we'll explore Node.js's cross-cutting concerns — the permission model, the error system, Web Platform API integration, V8 snapshots, single executable applications, and the built-in test runner.