The Install Pipeline — From `yarn install` to `node_modules`
Prerequisites
- Article 1: Architecture Overview and Code Navigation
- Understanding of async/await and Promise-based patterns
- Familiarity with package.json dependency fields
The install command is the beating heart of Yarn. Every yarn install, yarn add, yarn remove, and yarn upgrade ultimately flows through the same pipeline defined in a single 800-line file. This article dissects that pipeline—a clever "steps array" pattern where each phase is an async function that can bail out early, enabling the integrity check that makes subsequent installs near-instantaneous.
The Steps Array Pattern
The Install class constructor is deceptively simple. It creates a resolver, a linker, an integrity checker, and a script runner, all wired to the same Config instance we explored in Article 1.
The real action is in init(), which builds an array of async step functions and executes them sequentially:
const steps: Array<(curr: number, total: number) => Promise<{bailout: boolean} | void>> = [];
Each step receives (currentStep, totalSteps) for progress reporting and may return {bailout: true} to skip all remaining steps. The execution loop at line 728 is elegant:
for (const step of steps) {
  const stepResult = await step(++currentStep, steps.length);
  if (stepResult && stepResult.bailout) {
    return flattenedTopLevelPatterns;
  }
}
This pattern offers two key advantages: steps are defined declaratively (you can count them before executing any), and the bailout mechanism enables the integrity optimization without complex conditional logic.
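The pattern can be reproduced in isolation. The following is a minimal sketch, not Yarn's actual code: `runSteps` and `buildSteps` are hypothetical names, and the `upToDate` flag stands in for a passing integrity check.

```javascript
// Runs steps in order; a step may return {bailout: true} to skip the rest.
async function runSteps(steps) {
  let currentStep = 0;
  for (const step of steps) {
    const stepResult = await step(++currentStep, steps.length);
    if (stepResult && stepResult.bailout) {
      return {executed: currentStep, bailedOut: true};
    }
  }
  return {executed: currentStep, bailedOut: false};
}

// Hypothetical step list: `upToDate` stands in for a passing integrity check.
function buildSteps(log, upToDate) {
  const steps = [];
  steps.push(async (curr, total) => {
    log.push(`[${curr}/${total}] check manifest`);
  });
  steps.push(async (curr, total) => {
    log.push(`[${curr}/${total}] resolve`);
    return upToDate ? {bailout: true} : undefined;
  });
  steps.push(async (curr, total) => {
    log.push(`[${curr}/${total}] fetch`);
  });
  return steps;
}

// Usage: with upToDate = true, the fetch step never runs.
const demoLog = [];
runSteps(buildSteps(demoLog, true)).then(result => {
  console.log(result, demoLog);
});
```

Because the array is fully built before the loop starts, `steps.length` is known up front, which is what lets each step report its position out of the total.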
flowchart TD
A["init()"] --> B["Step 1: Check manifest compatibility"]
B --> C["Step 2: Resolve packages"]
C -->|bailout?| Z["Return early ✓"]
C --> D["Step 3: Fetch packages"]
D --> E["Step 4: Link dependencies"]
E --> F["Step 5: Generate PnP map (if enabled)"]
F --> G["Step 6: Build (run lifecycle scripts)"]
G --> H["Step 7: Save HAR (if --har)"]
H --> I["Step 8: Autoclean (if .yarnclean)"]
I --> J["Save lockfile + integrity"]
Manifest Checking and Dependency Gathering
Before any step runs, init() calls fetchRequestFromCwd() to read the project's package.json and build the list of dependency patterns to resolve.
This method does more than parse a single file. For workspace projects, it:
- Reads the root manifest and iterates each registry's filename (package.json for npm).
- Collects dependencies, devDependencies, and optionalDependencies via the pushDeps helper (line 295).
- Resolves all workspace packages via config.resolveWorkspaces().
- Creates a virtual manifest: a synthetic package.json named workspace-aggregator-<uuid> that depends on every workspace package (line 369). This ensures the resolver and hoister treat all workspaces as a single dependency graph.
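To make the virtual manifest concrete, here is a rough sketch of how such an aggregator could be assembled. The helper name `createAggregatorManifest`, the shape of the `workspaces` argument, and the placeholder `version` are assumptions for illustration; `crypto.randomUUID()` is a stand-in for whatever UUID source Yarn uses.

```javascript
const crypto = require('crypto');

// Hypothetical helper: builds a synthetic manifest that depends on every workspace.
// `workspaces` is assumed to map a workspace name to an object with a `version` field.
function createAggregatorManifest(workspaces) {
  const dependencies = {};
  for (const name of Object.keys(workspaces)) {
    dependencies[name] = workspaces[name].version;
  }
  return {
    name: `workspace-aggregator-${crypto.randomUUID()}`,
    version: '1.0.0', // placeholder value, an assumption of this sketch
    dependencies,
  };
}

const aggregator = createAggregatorManifest({
  'pkg-a': {version: '1.2.0'},
  'pkg-b': {version: '0.3.1'},
});
console.log(aggregator.dependencies);
```

Since the aggregator depends on every workspace, resolving it pulls the whole workspace set into one graph, which is the property the hoister relies on.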
sequenceDiagram
participant Init as init()
participant FRFC as fetchRequestFromCwd()
participant FS as File System
participant WS as Workspace Resolution
Init->>FRFC: Collect dependencies
FRFC->>FS: Read package.json
FS-->>FRFC: manifest JSON
FRFC->>FRFC: pushDeps(dependencies)
FRFC->>FRFC: pushDeps(devDependencies)
FRFC->>FRFC: pushDeps(optionalDependencies)
FRFC->>WS: resolveWorkspaces()
WS-->>FRFC: workspace map
FRFC->>FRFC: Create virtual aggregator manifest
FRFC-->>Init: {requests, patterns, manifest, workspaceLayout}
The output is an InstallCwdRequest containing requests (the patterns to resolve), patterns (all pattern strings), usedPatterns (patterns actually referenced by non-dev dependencies in production mode), and ignorePatterns (unused patterns kept only for deterministic hoisting).
Tip: When debugging why a dependency isn't being installed, check whether pushDeps is categorizing it correctly. In production mode (--production), devDependencies are pushed as ignorePatterns rather than usedPatterns (line 338).
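The categorization can be sketched as follows. This is a simplified illustration rather than the real pushDeps: the `gatherPatterns` wrapper and the `isUsed` parameter are hypothetical, and only the three dependency fields mentioned above are handled.

```javascript
// Sketch of dependency gathering. Field names follow package.json;
// the function names and parameters are illustrative assumptions.
function gatherPatterns(manifest, {production = false} = {}) {
  const patterns = [];
  const usedPatterns = [];
  const ignorePatterns = [];

  // `isUsed` decides whether a dependency group counts as used or ignored.
  const pushDeps = (depType, isUsed) => {
    const deps = manifest[depType] || {};
    for (const name of Object.keys(deps)) {
      const pattern = `${name}@${deps[name]}`;
      patterns.push(pattern);
      (isUsed ? usedPatterns : ignorePatterns).push(pattern);
    }
  };

  pushDeps('dependencies', true);
  pushDeps('devDependencies', !production);
  pushDeps('optionalDependencies', true);
  return {patterns, usedPatterns, ignorePatterns};
}

const gathered = gatherPatterns(
  {dependencies: {lodash: '^4.17.21'}, devDependencies: {jest: '^29.0.0'}},
  {production: true},
);
console.log(gathered.ignorePatterns);
```

Note that an ignored pattern still appears in `patterns`, which matches the description of ignored patterns being kept for deterministic hoisting.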
The Integrity Bailout
The most important performance optimization in Yarn is the integrity check. On a project with thousands of dependencies, a fresh yarn install might take 30 seconds—but a repeat install with nothing changed returns in under a second.
The bailout() method is called after the resolve step. If the integrity check passes, it returns true and the pipeline skips fetch, link, and build entirely.
The IntegrityChecker writes a .yarn-integrity file into node_modules/ containing:
flowchart TD
A[".yarn-integrity"] --> B["systemParams<br/>(OS, arch, Node version)"]
A --> C["flags<br/>(flat, checkFiles, etc.)"]
A --> D["topLevelPatterns<br/>(all resolved patterns)"]
A --> E["lockfileEntries<br/>(pattern → version map)"]
A --> F["files<br/>(optional: all file paths)"]
A --> G["artifacts<br/>(build output tracking)"]
The check compares the stored integrity data against the current state. Several conditions cause a mismatch:
- System params changed (different OS, Node version, or CPU arch)
- Flags changed (e.g., switching from --flat to non-flat)
- Patterns changed (dependencies added or removed)
- Lockfile entries changed (versions updated)
- Module folders are missing
- --check-files is enabled and files don't match the stored list
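A simplified version of the first four comparisons might look like the sketch below. The record shape mirrors the field names in the diagram, but `findIntegrityMismatch`, the returned reason strings, and the assumption that lists are already sorted are all illustrative; the folder and file checks are omitted because they need file system access.

```javascript
// Compares two arrays element by element (assumes both are already sorted).
const sameList = (a, b) => a.length === b.length && a.every((v, i) => v === b[i]);

// Hypothetical comparison of a stored integrity record against the current state.
// Returns a reason string for the first mismatch found, or null when everything matches.
function findIntegrityMismatch(stored, current) {
  if (stored.systemParams !== current.systemParams) return 'system-params';
  if (!sameList(stored.flags, current.flags)) return 'flags';
  if (!sameList(stored.topLevelPatterns, current.topLevelPatterns)) return 'patterns';

  const storedKeys = Object.keys(stored.lockfileEntries);
  const currentKeys = Object.keys(current.lockfileEntries);
  if (storedKeys.length !== currentKeys.length) return 'lockfile';
  for (const key of currentKeys) {
    if (stored.lockfileEntries[key] !== current.lockfileEntries[key]) return 'lockfile';
  }
  return null;
}

const baseline = {
  systemParams: 'linux-x64-108',
  flags: [],
  topLevelPatterns: ['lodash@^4.17.21'],
  lockfileEntries: {'lodash@^4.17.21': '4.17.21'},
};
console.log(findIntegrityMismatch(baseline, {...baseline, flags: ['flat']}));
```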
The bailout is disabled in three cases: when running --audit, when PnP is enabled (PnP is fast enough that checking isn't worth it), and when --force or --skip-integrity-check is passed.
When the integrity file itself is missing but a lockfile exists (line 477), Yarn doesn't bail out—but it sets scripts.setForce(true) to ensure all lifecycle scripts re-run, since it can't know what state node_modules is in.
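Putting these rules together, the decision could be modeled roughly as below. `decideBailout` and its input fields are hypothetical names; the ordering of the checks is an assumption of this sketch, and the real method performs additional checks not covered here.

```javascript
// Hypothetical decision helper mirroring the rules described above.
// Returns whether to bail out and whether lifecycle scripts must be forced.
function decideBailout({flags, pnpEnabled, integrityFileExists, lockfileExists, integrityMatches}) {
  const result = {bailout: false, forceScripts: false};

  // Cases in which the bailout is disabled outright.
  if (flags.audit || pnpEnabled || flags.force || flags.skipIntegrityCheck) {
    return result;
  }

  // No integrity file: continue the pipeline, and re-run scripts if a lockfile exists.
  if (!integrityFileExists) {
    result.forceScripts = lockfileExists;
    return result;
  }

  result.bailout = integrityMatches;
  return result;
}

console.log(
  decideBailout({
    flags: {},
    pnpEnabled: false,
    integrityFileExists: false,
    lockfileExists: true,
    integrityMatches: false,
  }),
);
```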
Steps 2-6: Resolve, Fetch, Link, PnP, Build
With the patterns collected and integrity checked, the remaining steps execute the core pipeline. Each step is wrapped in callThroughHook(), which we'll cover shortly.
Resolve (line 595): The PackageResolver takes the dependency request patterns and determines exactly which version of each package to install. This involves checking the lockfile, querying registries, and resolving exotic specifiers (git URLs, file paths). We'll cover this in depth in Article 3.
Fetch (line 638): The PackageFetcher downloads every resolved package into the global cache. It checks the cache first, validates integrity hashes, and falls back to network requests. Article 4 covers the fetcher strategies.
Link (line 648): The PackageLinker takes the resolved, fetched packages and constructs the node_modules directory. This involves the hoisting algorithm that flattens the dependency tree. Article 5 covers hoisting in detail.
PnP (line 665, conditional): If Plug'n'Play is enabled, generatePnpMap() creates a .pnp.js file instead of (or in addition to) node_modules.
Build (line 690): PackageInstallScripts runs preinstall, install, and postinstall lifecycle scripts for packages that define them.
After all steps complete, saveLockfileAndIntegrity() writes the updated yarn.lock and .yarn-integrity files.
Command Inheritance: Add, Remove, and Upgrade
One of Yarn's cleanest design decisions is that Add, Remove, and Upgrade are subclasses of Install. They don't duplicate the pipeline—they override specific hooks.
The Add class extends Install and overrides three methods:
classDiagram
class Install {
+prepareRequests(requests): requests
+preparePatterns(patterns): patterns
+prepareManifests(): manifests
+init(): patterns
+fetchRequestFromCwd(): InstallCwdRequest
+bailout(): boolean
}
class Add {
+prepareRequests(requests): requests + new packages
+preparePatterns(patterns): patterns + resolved versions
+prepareManifests(): manifests + new entries
}
class Remove {
+... overrides to strip packages
}
Install <|-- Add
Install <|-- Remove
prepareRequests() (line 50): Appends the user's yarn add <package> arguments as additional dependency request patterns. The base Install class returns requests unchanged.
preparePatterns() (line 95): After resolution, converts the raw patterns (e.g., lodash) into versioned patterns (e.g., lodash@^4.17.21) by looking up the resolved version and applying the configured save prefix (^, ~, or exact).
prepareManifests(): Updates the root package.json in memory to include the new dependencies under the correct field (dependencies, devDependencies, optionalDependencies, or peerDependencies), determined by the flagToOrigin computed in the constructor (line 32).
This inheritance pattern means every yarn add operation gets the full install pipeline—resolution, fetching, linking, building, integrity saving—without any code duplication. The --dev, --optional, and --peer flags simply change which dependency field the new package is written to.
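The override mechanism can be illustrated with stripped-down stand-ins for the two classes. This is a sketch under heavy simplification: resolution is replaced by a hypothetical `resolvedVersions` lookup table, the constructor signatures are invented for the example, and prepareManifests() is left out.

```javascript
// Simplified stand-in for Install: the hooks return their input unchanged.
class Install {
  constructor(manifest) {
    this.manifest = manifest;
  }
  prepareRequests(requests) {
    return requests;
  }
  preparePatterns(patterns) {
    return patterns;
  }
  init() {
    const deps = this.manifest.dependencies || {};
    const fromManifest = Object.keys(deps).map(name => ({pattern: `${name}@${deps[name]}`}));
    const requests = this.prepareRequests(fromManifest);
    return this.preparePatterns(requests.map(request => request.pattern));
  }
}

// Simplified stand-in for Add: only the hooks change, init() is inherited.
class Add extends Install {
  constructor(manifest, args, resolvedVersions, savePrefix = '^') {
    super(manifest);
    this.args = args; // e.g. ['lodash'] from `yarn add lodash`
    this.resolvedVersions = resolvedVersions; // hypothetical stand-in for the resolver
    this.savePrefix = savePrefix;
  }
  prepareRequests(requests) {
    return requests.concat(this.args.map(pattern => ({pattern})));
  }
  preparePatterns(patterns) {
    return patterns.map(pattern =>
      this.args.includes(pattern)
        ? `${pattern}@${this.savePrefix}${this.resolvedVersions[pattern]}`
        : pattern,
    );
  }
}

const add = new Add({dependencies: {'left-pad': '^1.3.0'}}, ['lodash'], {lodash: '4.17.21'});
console.log(add.init());
```

The point of the design is visible in `Add`: it never touches `init()`, so any change to the shared pipeline automatically applies to every subclass.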
Experimental Hooks
Yarn v1 includes an undocumented extensibility mechanism: the callThroughHook() function. Every pipeline step is wrapped in a hook call:
steps.push((curr, total) =>
  callThroughHook('resolveStep', async () => {
    // ... resolve logic
  }),
);
The implementation checks for a global.experimentalYarnHooks object. If a hook function exists for the given type, it receives the original function and an optional context, allowing external code to intercept, wrap, or replace pipeline behavior:
const hook = global[YARN_HOOKS_KEY][type];
if (!hook) return fn();
return hook(fn, context);
Seven hook points are defined: resolveStep, fetchStep, linkStep, buildStep, pnpStep, auditStep, and runScript. While never officially documented, this mechanism was used by some tooling to instrument or modify Yarn's behavior at runtime.
Tip: To use experimental hooks, create a script that sets global.experimentalYarnHooks before requiring Yarn's CLI. For example, you could wrap fetchStep to log how long the fetch phase takes.
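As a self-contained illustration of that idea, the sketch below pairs a stand-in `callThroughHook` (modeled on the three-line excerpt above, with an extra guard for a missing hooks object) with a timing hook for fetchStep. The `timings` array is a hypothetical addition, and the fake fetch function only simulates a pipeline step.

```javascript
const YARN_HOOKS_KEY = 'experimentalYarnHooks';

// Stand-in for the function described above, not a copy of Yarn's source.
function callThroughHook(type, fn, context) {
  const hooks = global[YARN_HOOKS_KEY];
  if (typeof hooks !== 'object' || !hooks) {
    return fn();
  }
  const hook = hooks[type];
  if (!hook) {
    return fn();
  }
  return hook(fn, context);
}

// Hypothetical hook: records how long the wrapped step takes.
const timings = [];
global[YARN_HOOKS_KEY] = {
  fetchStep: async fn => {
    const start = Date.now();
    try {
      return await fn();
    } finally {
      timings.push({type: 'fetchStep', ms: Date.now() - start});
    }
  },
};

// Usage: the fake step below stands in for the real fetch logic.
callThroughHook('fetchStep', async () => 'fetched').then(value => {
  console.log(value, timings.length);
});
```

Steps without a registered hook, such as linkStep here, run unmodified, so a hook script only pays for the phases it chooses to wrap.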
What's Next
We've now traced the entire install pipeline from init() through the steps array to the final lockfile save. But we glossed over the two most complex phases: resolution and linking. In the next article, we'll dive deep into how the resolver decides which version of every package to install—covering the three resolver categories, the lockfile-first strategy, the custom lockfile parser, and the BlockingQueue that controls concurrency.