Read OSS

How Node.js Loads Code: CJS, ESM, and the Module Pipeline

Advanced

Prerequisites

  • Article 1: architecture-overview
  • Article 2: startup-and-bootstrap
  • Article 3: cpp-object-model-and-bindings
  • Understanding of CommonJS require() and ES Module import semantics

Node.js has not one but two complete module systems, plus an internal module system used exclusively by the runtime's own code. The CommonJS loader has been shipping since Node.js 0.1. The ES module loader arrived years later with fundamentally different semantics. Making them coexist — and interoperate — is one of the most complex engineering challenges in the codebase.

This article traces how code gets loaded in Node.js: from the defensive primordials pattern that protects internal modules, through the CommonJS and ESM loaders, to the customization hooks and the built-in TypeScript type stripping that sits alongside them.

Primordials: Defending Against Prototype Pollution

Before any module can load, Node.js needs to protect its internal code from user-land monkey-patching. If someone does Array.prototype.push = () => { throw new Error('gotcha') }, that shouldn't break fs.readFile(). The solution is primordials.js (lib/internal/per_context/primordials.js), which runs before any user code and captures references to the JavaScript built-ins and their methods in a frozen primordials object.

The approach is methodical: for every built-in prototype method, primordials creates an "uncurried" version using Function.prototype.call.bind():

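// Instead of: array.push(item)                          (vulnerable to monkey-patching)
// internal code uses: ArrayPrototypePush(array, item)   (safe)
const { bind, call } = Function.prototype;
const uncurryThis = bind.bind(call);   // uncurryThis(fn) behaves like call.bind(fn)
const ArrayPrototypePush = uncurryThis(Array.prototype.push);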
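To see why the uncurried form matters, here is a minimal sketch that captures two methods the way primordials does, monkey-patches Array.prototype.push afterwards, and compares a helper that uses the captured method with one that calls .push() directly. The captured object and the helper names are illustrative, not taken from Node's source.

```javascript
'use strict';
// Minimal sketch of the primordials idea, not Node's actual primordials.js.
// The names mirror Node's convention but live in our own frozen object.
const { bind, call } = Function.prototype;
const uncurryThis = bind.bind(call);

const captured = Object.freeze({
  ArrayPrototypePush: uncurryThis(Array.prototype.push),
  ArrayPrototypeJoin: uncurryThis(Array.prototype.join),
});

// "Internal" helper written with the captured methods.
function safeUpperJoin(items) {
  const out = [];
  for (let i = 0; i < items.length; i++) {
    captured.ArrayPrototypePush(out, String(items[i]).toUpperCase());
  }
  return captured.ArrayPrototypeJoin(out, ',');
}

// Same helper written the naive way: methods are looked up at call time.
function naiveUpperJoin(items) {
  const out = [];
  for (let i = 0; i < items.length; i++) {
    out.push(String(items[i]).toUpperCase());
  }
  return out.join(',');
}

const originalPush = Array.prototype.push;
let naiveError = null;
let safeResult = null;
Array.prototype.push = () => { throw new Error('gotcha'); }; // user-land monkey-patch
try {
  try {
    naiveUpperJoin(['a', 'b']);
  } catch (err) {
    naiveError = err.message;
  }
  safeResult = safeUpperJoin(['a', 'b']);
} finally {
  Array.prototype.push = originalPush; // undo the patch for the rest of the program
}

console.log(naiveError); // gotcha
console.log(safeResult); // A,B
```

The captured version keeps working because uncurryThis binds the original push function at capture time; later assignments to Array.prototype.push only affect code that looks the method up on every call.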

This means every internal module destructures the built-ins it needs from primordials at the top of the file. Look at the first lines of lib/internal/modules/cjs/loader.js:

const {
  ArrayIsArray,
  ArrayPrototypeFilter,
  ArrayPrototypeIncludes,
  ArrayPrototypeIndexOf,
  ArrayPrototypeJoin,
  // ... dozens more
} = primordials;

The performance tradeoff is real. As the source comment notes: "Use of primordials have sometimes a dramatic impact on performance, please benchmark all changes made in performance-sensitive areas." Calling ArrayPrototypePush(arr, item) can be slower than arr.push(item) in hot paths, because V8 may not optimize a call routed through a bound Function.prototype.call as aggressively as a direct method call. The robustness guarantee is nonetheless considered worth the cost.

Tip: When writing or reviewing patches to internal modules, always use primordials for builtin methods. The linter enforces this — node-core/prefer-primordials will flag direct prototype method calls.

The Internal Module System (BuiltinModule)

As we covered in Article 3, the C++ BuiltinLoader compiles embedded JavaScript at runtime. The JavaScript-side orchestration happens in lib/internal/bootstrap/realm.js, which defines the BuiltinModule class.

flowchart TD
    REQ["require('internal/fs/utils')"] --> BM["BuiltinModule.require()"]
    BM --> CACHED{"Already compiled<br/>and cached?"}
    CACHED -->|Yes| RET["Return cached exports"]
    CACHED -->|No| COMPILE["BuiltinLoader::CompileAndCall()"]
    COMPILE --> WRAP["Wrap source in function:<br/>(exports, require, module,<br/>process, internalBinding,<br/>primordials)"]
    WRAP --> V8["V8 ScriptCompiler<br/>Compile + Execute"]
    V8 --> CACHE["Cache in module registry"]
    CACHE --> RET

BuiltinModule provides a require() function to internal modules that's distinct from the public require(). Internal modules can require each other using paths like 'internal/fs/utils', and they automatically get access to internalBinding() and primordials — things that are invisible to user-land modules.
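The split is visible from user land. The short sketch below uses the public module.isBuiltin() helper and an ordinary require() to show that internal module ids and the extra wrapper parameters stay out of reach. It assumes Node.js was started without the --expose-internals debugging flag, which changes this behavior.

```javascript
'use strict';
const { isBuiltin } = require('node:module');

// Public builtins are visible to user code; internal modules are not.
const fsIsBuiltin = isBuiltin('node:fs');
const internalIsBuiltin = isBuiltin('internal/fs/utils');

// The public require() cannot reach internal modules (assuming no
// --expose-internals), so resolution falls through to a node_modules search.
let internalRequireCode = null;
try {
  require('internal/fs/utils');
} catch (err) {
  internalRequireCode = err.code;
}

console.log(fsIsBuiltin);         // true
console.log(internalIsBuiltin);   // false
console.log(internalRequireCode); // MODULE_NOT_FOUND
// internalBinding and primordials are wrapper parameters for builtins only.
console.log(typeof internalBinding, typeof primordials); // undefined undefined
```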

The module load list is tracked by process.moduleLoadList, which records every binding and module loaded during the process lifetime, in order. This is invaluable for debugging startup performance.
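A quick way to look at the list yourself is sketched below. Treat process.moduleLoadList as an undocumented implementation detail: the 'NativeModule ' and 'Internal Binding ' prefixes used for filtering reflect current releases and may change.

```javascript
'use strict';
// process.moduleLoadList is not a documented, stable API; the entry format
// ('NativeModule fs', 'Internal Binding fs', ...) is an implementation detail.
const before = process.moduleLoadList.length;
require('node:zlib'); // may or may not already have been loaded at startup
const after = process.moduleLoadList;

const builtins = after.filter((entry) => entry.startsWith('NativeModule '));
const bindings = after.filter((entry) => entry.startsWith('Internal Binding '));

console.log(`${after.length} entries (${builtins.length} modules, ${bindings.length} bindings)`);
console.log('first five:', after.slice(0, 5));
console.log('added by require(zlib):', after.slice(before));
```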

CommonJS Loader Deep Dive

The CommonJS loader lives in lib/internal/modules/cjs/loader.js — a 2,158-line file that's been evolving since Node.js's earliest days.

The entry point is Module._load(), which implements the core require algorithm:

flowchart TD
    LOAD["Module._load(request, parent)"] --> CACHE_CHECK{"Fast path:<br/>relativeResolveCache hit?"}
    CACHE_CHECK -->|Yes| MODULE_CACHE{"Module._cache[filename]?"}
    CACHE_CHECK -->|No| RESOLVE["resolveForCJSWithHooks()"]
    MODULE_CACHE -->|Yes, loaded| RETURN_EXPORTS["Return cached exports"]
    MODULE_CACHE -->|Yes, loading| CIRCULAR["getExportsForCircularRequire()"]
    MODULE_CACHE -->|No| NEW_MODULE["new Module(filename)"]
    RESOLVE --> HOOKS{"Custom resolve hooks?"}
    HOOKS -->|Yes| CUSTOM["Run custom resolver"]
    HOOKS -->|No| RESOLVE_FN["Module._resolveFilename()"]
    RESOLVE_FN --> BUILTIN{"Is builtin module?"}
    BUILTIN -->|Yes| LOAD_BUILTIN["Load from BuiltinModule"]
    BUILTIN -->|No| FIND_PATH["Module._findPath()<br/>Search node_modules tree"]
    FIND_PATH --> NEW_MODULE
    NEW_MODULE --> COMPILE["module._compile(content, filename)"]
    COMPILE --> RETURN_EXPORTS
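The CIRCULAR branch is easy to observe with real files. This sketch writes two hypothetical modules, a.cjs and b.cjs, that require each other into a temporary directory, then checks that b received a's partial exports and that later require() calls are answered from require.cache (the public alias of Module._cache).

```javascript
'use strict';
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// realpathSync keeps our paths identical to the cache keys even when the
// temp directory sits behind a symlink (as on macOS).
const dir = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), 'cjs-cycle-')));
const aPath = path.join(dir, 'a.cjs');
const bPath = path.join(dir, 'b.cjs');
fs.writeFileSync(aPath, [
  'exports.done = false;',
  "const b = require('./b.cjs');",
  'exports.bWasDone = b.done;',
  'exports.done = true;',
].join('\n'));
fs.writeFileSync(bPath, [
  'exports.done = false;',
  "const a = require('./a.cjs'); // a is mid-load: these are its partial exports",
  'exports.aWasDone = a.done;',
  'exports.done = true;',
].join('\n'));

const a = require(aPath);
const b = require(bPath); // already loaded while a ran, so served from the cache
const cachedAgain = require(aPath) === a;
const inCache = aPath in require.cache;
fs.rmSync(dir, { recursive: true, force: true });

console.log(a.bWasDone, b.aWasDone); // true false
console.log(cachedAgain, inCache);   // true true
```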

Module._resolveFilename() implements the resolution algorithm: check whether the request names a builtin; otherwise walk up the node_modules tree and consult the main and exports fields of package.json. The relative resolve cache, keyed by relResolveCacheIdentifier = parent.path + '\x00' + request, lets a repeated require() of the same specifier from the same directory skip resolution entirely.
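The directory list that this walk visits can be inspected through the public require.resolve.paths() helper, sketched below. The package name is a placeholder that is not expected to exist.

```javascript
'use strict';
// require.resolve.paths() lists the directories the CJS resolver would search
// for a bare specifier, nearest node_modules first.
const searchDirs = require.resolve.paths('some-hypothetical-package');
console.log(searchDirs.slice(0, 3));

// Builtins short-circuit: no filesystem walk at all.
const builtinDirs = require.resolve.paths('fs');
console.log(builtinDirs);                // null
console.log(require.resolve('node:fs')); // node:fs
```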

Module.prototype._compile() is where the CommonJS wrapper magic happens. Every CJS module gets wrapped in a function:

(function(exports, require, module, __filename, __dirname) {
  // Your module code here
});

This is why exports, require, module, __filename, and __dirname are available in every CommonJS file without explicit import — they're function parameters, not globals.
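The wrapper idea can be reproduced with vm.compileFunction(), which compiles a string as the body of a function with named parameters. The helper below (evaluateAsCommonJS is a hypothetical name) is a minimal sketch that simply passes the host file's own require through; Node's real Module.prototype._compile does considerably more work than this.

```javascript
'use strict';
const path = require('node:path');
const vm = require('node:vm');

// Hypothetical helper: compile CommonJS-style source as the body of a function
// whose parameters are the five wrapper names, then call it.
function evaluateAsCommonJS(source, filename) {
  const mod = { exports: {} };
  const wrapper = vm.compileFunction(
    source,
    ['exports', 'require', 'module', '__filename', '__dirname'],
    { filename },
  );
  // At the top level of a CJS file, `this` is module.exports; mirror that.
  wrapper.call(mod.exports, mod.exports, require, mod, filename, path.dirname(filename));
  return mod.exports;
}

const lib = evaluateAsCommonJS(
  'exports.dir = __dirname; module.exports.add = (a, b) => a + b;',
  '/virtual/lib/math.js',
);
console.log(lib.add(2, 3)); // 5
console.log(lib.dir);       // /virtual/lib (on POSIX)
// The wrapper names are parameters, not globals:
console.log(typeof globalThis.__dirname); // undefined
```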

ESM Loader Architecture

The ESM loader is fundamentally different from CommonJS. It's asynchronous, supports import attributes (for example, with { type: 'json' }), has a phase-based lifecycle, and delegates to V8's native module API through the ModuleWrap C++ binding.

The orchestrator is ModuleLoader in lib/internal/modules/esm/loader.js. It manages the resolve → load → translate → link → instantiate → evaluate pipeline.

sequenceDiagram
    participant USER as import 'specifier'
    participant ML as ModuleLoader
    participant MJ as ModuleJob
    participant TR as Translators
    participant MW as ModuleWrap (C++)
    participant V8 as V8 Module API

    USER->>ML: import('specifier')
    ML->>ML: resolve(specifier) → URL
    ML->>ML: load(URL) → source + format
    ML->>TR: translate(source, format)
    TR->>MW: new ModuleWrap(source, url)
    MW->>V8: v8::ScriptCompiler::CompileModule()
    ML->>MJ: new ModuleJob(loader, url, moduleWrap)
    MJ->>MJ: link() — resolve all dependencies
    MJ->>MW: instantiate()
    MW->>V8: v8::Module::InstantiateModule()
    MJ->>MW: evaluate()
    MW->>V8: v8::Module::Evaluate()
    V8-->>USER: module namespace

ModuleJob represents a single module going through its lifecycle. The constructor immediately starts linking — resolving all import statements in the module to create ModuleJob instances for dependencies, forming a dependency graph.
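The deduplication that turns those jobs into a graph rather than a tree can be sketched synchronously. SketchLoader, SketchModuleJob, and the hard-coded graph map below are hypothetical stand-ins for ModuleLoader, ModuleJob, and the real resolve/load/parse steps, which in Node are asynchronous. The key idea is that a job registers itself under its URL before linking its dependencies, so shared and cyclic imports reuse the same job instead of recursing forever.

```javascript
'use strict';
// `graph` stands in for resolve + load + parse: URL -> imported URLs.
const graph = new Map([
  ['file:///app/main.mjs', ['file:///app/a.mjs', 'file:///app/b.mjs']],
  ['file:///app/a.mjs', ['file:///app/shared.mjs']],
  ['file:///app/b.mjs', ['file:///app/shared.mjs', 'file:///app/a.mjs']],
  ['file:///app/shared.mjs', ['file:///app/main.mjs']], // cycle back to main
]);

class SketchModuleJob {
  constructor(loader, url) {
    this.url = url;
    loader.jobs.set(url, this); // register first so cycles terminate
    // "Linking": create or reuse a job for every dependency right away.
    this.dependencies = graph.get(url).map((dep) => loader.getJob(dep));
  }
}

class SketchLoader {
  constructor() {
    this.jobs = new Map(); // one job per URL
  }

  getJob(url) {
    return this.jobs.get(url) ?? new SketchModuleJob(this, url);
  }
}

const loader = new SketchLoader();
const root = loader.getJob('file:///app/main.mjs');
console.log(loader.jobs.size);                      // 4
console.log(root.dependencies.map((job) => job.url));
```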

The translators module maps file formats to translation strategies. JavaScript files are compiled as ES modules via ModuleWrap. JSON files are wrapped in export default. WebAssembly .wasm files are compiled via V8's WebAssembly API. .node native addons are loaded via process.dlopen(). And CJS files go through a special CJS-to-ESM translator.
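The format-to-strategy mapping is essentially a dispatch table. The following is a heavily simplified sketch: translate() and the translators map are hypothetical, and they return plain frozen namespace-like objects rather than ModuleWrap instances. Only JSON and CommonJS are covered, and the CommonJS branch exposes only a default export (Node additionally derives named exports with the vendored cjs-module-lexer).

```javascript
'use strict';
const vm = require('node:vm');

// Hypothetical translator table keyed by format name.
const translators = new Map([
  ['json', (source) => Object.freeze({ default: JSON.parse(source) })],
  ['commonjs', (source, url) => {
    const mod = { exports: {} };
    const fn = vm.compileFunction(source, ['exports', 'module'], { filename: url });
    fn.call(mod.exports, mod.exports, mod);
    return Object.freeze({ default: mod.exports });
  }],
]);

function translate(format, source, url) {
  const translator = translators.get(format);
  if (translator === undefined) {
    throw new TypeError(`Unknown module format "${format}" for ${url}`);
  }
  return translator(source, url);
}

const json = translate('json', '{"name":"demo"}', 'file:///pkg/data.json');
const cjs = translate('commonjs', 'module.exports = { answer: 42 };', 'file:///pkg/a.cjs');
console.log(json.default.name);  // demo
console.log(cjs.default.answer); // 42
```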

CJS↔ESM Interoperability

The interop between the two module systems is one of the trickiest parts of Node.js. The fundamental tension: CJS require() is synchronous, while ESM import is asynchronous.

flowchart LR
    subgraph "ESM importing CJS"
        ESM1["import cjs from 'pkg'"] --> TRANSLATE["CJS translator<br/>Executes CJS module<br/>Wraps exports"]
        TRANSLATE --> NS1["Module namespace<br/>default = module.exports"]
    end
    
    subgraph "CJS requiring ESM"
        CJS1["require('esm-pkg')"] --> SYNC{"Graph free of<br/>top-level await?"}
        SYNC -->|Yes| NS2["Evaluate synchronously,<br/>return namespace"]
        SYNC -->|No| ERR["ERR_REQUIRE_ASYNC_MODULE"]
    end

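ESM importing CJS works by executing the CJS module synchronously and exposing module.exports as the default export. Named exports are detected by statically analyzing the CJS source with the vendored cjs-module-lexer, so only export patterns the lexer recognizes (such as exports.name = ...) become named imports.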

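CJS requiring ESM is harder. require() is synchronous, but ESM evaluation can be asynchronous (top-level await). Since v20.19.0 and v22.12.0, Node.js supports require() of ES modules by default as long as the module graph contains no top-level await; if it needs asynchronous evaluation, you get ERR_REQUIRE_ASYNC_MODULE. Older releases throw ERR_REQUIRE_ESM unless --experimental-require-module is passed.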

The ModuleJobSync class handles the synchronous CJS-requires-ESM path, providing a stripped-down version of the full ModuleJob lifecycle that can run without promises.
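Because this behavior has changed across releases, the sketch below probes the running version instead of assuming one. It writes two throwaway modules to a temporary directory, one plain and one using top-level await, and records whether require() loads each or which error code it throws.

```javascript
'use strict';
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Version-dependent expectations: recent releases load sync.mjs and reject
// tla.mjs with ERR_REQUIRE_ASYNC_MODULE; older ones reject both with ERR_REQUIRE_ESM.
function tryRequire(file) {
  try {
    return { outcome: 'loaded', namespace: require(file) };
  } catch (err) {
    return { outcome: err.code ?? String(err) };
  }
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'require-esm-'));
let syncResult;
let tlaResult;
try {
  const syncFile = path.join(dir, 'sync.mjs');
  const tlaFile = path.join(dir, 'tla.mjs');
  fs.writeFileSync(syncFile, 'export const answer = 42;\n');
  fs.writeFileSync(tlaFile, 'await Promise.resolve();\nexport const late = 1;\n');
  syncResult = tryRequire(syncFile);
  tlaResult = tryRequire(tlaFile);
} finally {
  fs.rmSync(dir, { recursive: true, force: true });
}

console.log(syncResult.outcome, syncResult.namespace?.answer);
console.log(tlaResult.outcome);
```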

Module Customization Hooks and TypeScript Support

Node.js supports customizing module resolution and loading through hooks. There are two systems:

  1. Synchronous hooks via lib/internal/modules/customization_hooks.js: resolve and load hooks that run in the main thread (exposed publicly as module.registerHooks() in recent releases).

  2. Asynchronous hooks registered with module.register() or the older --experimental-loader flag. These run on a separate loader thread, isolated from application code, and exchange messages with the main thread.
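Both flavors chain hooks the same way: each hook receives next and decides whether to call it. The sketch below models that for resolve only. composeResolveChain, defaultResolve, and the @app/utils alias are hypothetical; the last-registered-runs-first order follows the LIFO behavior Node documents for chained hooks.

```javascript
'use strict';
// Hypothetical stand-in for the built-in resolver at the end of the chain.
function defaultResolve(specifier, context) {
  return { url: new URL(specifier, context.parentURL).href };
}

// Fold hooks so the last registered hook runs first (LIFO).
function composeResolveChain(registered, finalResolve) {
  let next = finalResolve;
  for (const hook of registered) {
    const inner = next;
    next = (specifier, context) => hook(specifier, context, inner);
  }
  return next;
}

// Rewrites a made-up alias to a relative path, then defers to the chain.
const aliasHook = (specifier, context, next) =>
  specifier === '@app/utils' ? next('./lib/utils.js', context) : next(specifier, context);

// Records every specifier it sees; registered second, so it runs first.
const seen = [];
const loggingHook = (specifier, context, next) => {
  seen.push(specifier);
  return next(specifier, context);
};

const resolve = composeResolveChain([aliasHook, loggingHook], defaultResolve);
const result = resolve('@app/utils', { parentURL: 'file:///app/main.js' });
console.log(result.url); // file:///app/lib/utils.js
console.log(seen);       // [ '@app/utils' ]
```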

flowchart TD
    REGISTER["Hook registration"] --> SYNC{"Sync or async?"}
    SYNC -->|"Sync: module.registerHooks()"| MAIN["Hooks run in main thread<br/>customization_hooks.js"]
    SYNC -->|"Async: module.register()"| WORKER["Hooks run in worker thread<br/>Communicates via MessagePort"]
    
    MAIN --> RESOLVE["resolve(specifier, context, next)"]
    MAIN --> LOAD["load(url, context, next)"]
    WORKER --> RESOLVE2["resolve(specifier, context, next)"]
    WORKER --> LOAD2["load(url, context, next)"]
    
    LOAD --> TS{"TypeScript file?"}
    TS -->|Yes| AMARO["deps/amaro<br/>Strip type annotations"]
    TS -->|No| CONTINUE["Continue normal loading"]
    AMARO --> CONTINUE

TypeScript support lives in the default load step that these hooks wrap. When type stripping is enabled (the --experimental-strip-types flag, on by default since v23.6.0), Node.js uses lib/internal/modules/typescript.js to strip type annotations from .ts, .mts, and .cts files via the amaro dependency (vendored in deps/amaro/). This is type stripping, not full TypeScript compilation: it removes type annotations but doesn't perform type checking or transform TypeScript-specific syntax like enum.
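Recent releases also expose the stripping step directly as module.stripTypeScriptTypes(). Since it is experimental and missing from older versions, the sketch below feature-detects it rather than assuming it exists. In strip mode, annotations are replaced with whitespace so that line and column positions in the output match the input.

```javascript
'use strict';
const nodeModule = require('node:module');

const source = 'const total: number = 1 + 2;\nfunction id<T>(x: T): T { return x; }';
let stripped = null;
if (typeof nodeModule.stripTypeScriptTypes === 'function') {
  stripped = nodeModule.stripTypeScriptTypes(source); // default mode is 'strip'
  console.log(stripped); // same code with the type annotations blanked out
  // The result is plain JavaScript, so it can be evaluated directly.
  console.log(new Function(`${stripped}\nreturn id(total);`)()); // 3
} else {
  console.log('module.stripTypeScriptTypes is not available in this Node.js version');
}
```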

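Tip: If you're building a custom loader (e.g., for a compile-to-JS language), prefer module.register() (typically called from a file preloaded with --import) over the discouraged --experimental-loader flag, or module.registerHooks() if your hooks can run synchronously. Both accept resolve and load hooks; only registerHooks() runs them on the main thread.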

Entry Point Module Resolution

When you run node app.js, how does Node.js decide whether to load it as CJS or ESM? The logic lives in lib/internal/modules/run_main.js:

flowchart TD
    ENTRY["node app.js"] --> RESOLVE["resolveMainPath(main)"]
    RESOLVE --> EXT{".mjs extension?"}
    EXT -->|Yes| ESM["Load as ESM"]
    EXT -->|No| CJS_EXT{".cjs extension?"}
    CJS_EXT -->|Yes| CJS["Load as CJS"]
    CJS_EXT -->|No| WASM{".wasm extension?"}
    WASM -->|Yes| ESM
    WASM -->|No| TS{".mts extension?<br/>(type stripping enabled)"}
    TS -->|Yes| ESM
    TS -->|No| CTS{".cts extension?"}
    CTS -->|Yes| CJS
    CTS -->|No| PKG_TYPE["Check nearest<br/>package.json 'type' field"]
    PKG_TYPE -->|"module"| ESM
    PKG_TYPE -->|"commonjs" or absent| CJS

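The shouldUseESMLoader() function makes this decision. File extension is checked first (.mjs → ESM, .cjs → CJS). If the extension is .js, it falls back to the nearest package.json's type field. The presence of --experimental-loader or --import flags also forces the ESM loader path. Even on the CJS path, recent releases apply syntax detection: a .js file without a type field that fails to compile as CommonJS because it contains ESM syntax is retried as an ES module.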
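The decision order can be restated as a small pure function. This is a hypothetical sketch, not run_main.js: shouldUseESMLoaderSketch takes the nearest package.json type as an argument instead of reading the filesystem, treats --experimental-loader and --import as one boolean that is assumed to be checked before the extension, follows the extension branches drawn in the flowchart, and ignores syntax detection.

```javascript
'use strict';
const path = require('node:path');

// Hypothetical restatement of the entry-point decision; not Node's code.
function shouldUseESMLoaderSketch(mainPath, options = {}) {
  const { packageType, stripTypes = true, hasLoaderOrImportFlags = false } = options;
  if (hasLoaderOrImportFlags) return true;        // --experimental-loader / --import
  const ext = path.extname(mainPath);
  if (ext === '.mjs') return true;
  if (ext === '.cjs') return false;
  if (ext === '.wasm') return true;               // the flowchart's .wasm branch
  if (stripTypes && ext === '.mts') return true;
  if (stripTypes && ext === '.cts') return false;
  return packageType === 'module';                // "commonjs" or absent means CJS
}

const cases = [
  ['app.mjs', {}],
  ['app.cjs', { packageType: 'module' }],
  ['app.js', { packageType: 'module' }],
  ['app.js', {}],
  ['app.cjs', { hasLoaderOrImportFlags: true }],
];
for (const [file, opts] of cases) {
  const loader = shouldUseESMLoaderSketch(file, opts) ? 'ESM' : 'CJS';
  console.log(file, JSON.stringify(opts), '->', loader);
}
```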

run_main_module.js then calls Module.runMain(), which either executes the file directly through the CJS loader or hands off to the ESM loader.

What's Next

We've now covered how code gets into the Node.js process — from C++ bindings through to user modules. In the next article, we'll explore what happens when that code actually does something: the I/O system, streams, timers, and the event loop mechanics that make Node.js async I/O work.