From JVM Launch to Cluster Join: The Elasticsearch Startup Sequence

Advanced

Prerequisites

  • Article 1: Architecture Overview
  • Java module system (JPMS) basics
  • Dependency injection concepts

Starting an Elasticsearch node is deceptively complex. What appears to be a simple java -jar invocation triggers a precisely choreographed startup sequence that must initialize logging before anything else, lock memory and install security sandboxes before any threads spawn, wire together dozens of interdependent services in the correct order, and — critically — start accepting HTTP traffic only after the node is fully ready. One misstep in this ordering and the node either crashes or accepts requests it can't properly handle.

This article traces the complete path from Elasticsearch.main() through the final HttpServerTransport.start().

Three-Phase Bootstrap in Elasticsearch.java

The entry point is Elasticsearch.main(), which dispatches to exactly three phases:

public static void main(final String[] args) {
    Bootstrap bootstrap = initPhase1();
    assert bootstrap != null;
    try {
        initPhase2(bootstrap);
        initPhase3(bootstrap);
    } catch (NodeValidationException e) {
        bootstrap.exitWithNodeValidationException(e);
    } catch (Throwable t) {
        bootstrap.exitWithUnknownException(t);
    }
}

sequenceDiagram
    participant JVM
    participant Phase1
    participant Phase2
    participant Phase3

    JVM->>Phase1: initPhase1()
    Note over Phase1: Static init, read CLI args,<br/>configure logging LAST
    Phase1-->>JVM: Bootstrap object

    JVM->>Phase2: initPhase2(bootstrap)
    Note over Phase2: Log system info, PID file,<br/>native access, JarHell,<br/>plugin loading, entitlements

    JVM->>Phase3: initPhase3(bootstrap)
    Note over Phase3: Construct Node,<br/>start Node,<br/>signal readiness

Phase 1: Static Init and Logging

initPhase1() does the absolute minimum: initializes security properties, reads ServerArgs from stdin (the CLI launcher process pipes them in), creates a basic Environment, and configures logging. The source code contains an emphatic comment:

// DO NOT MOVE THIS
// Logging must remain the last step of phase 1.

This constraint exists because any initialization step that needs logging must happen in Phase 2, after logging is configured. Phase 1 writes exceptions directly to stderr because the logging framework isn't ready yet.

Phase 2: Security and Native Initialization

initPhase2() is the most complex bootstrap phase. It handles:

  1. System info logging — JVM version, OS, build hash
  2. PID file creation with a shutdown hook for cleanup
  3. Uncaught exception handler registration
  4. Native controller spawning (for ML or other native processes)
  5. Native access initialization — memory locking (mlockall), system call filters, coredump filter configuration
  6. JarHell check — scans for duplicate classes on the classpath
  7. Plugin loading — loads module and plugin bundles, creates JPMS module layers
  8. Entitlement bootstrap — the new security system replacing SecurityManager, with a self-test that verifies process creation is properly blocked

Tip: The entitlement system (EntitlementBootstrap) is Elasticsearch's replacement for the deprecated Java SecurityManager. It uses bytecode instrumentation to intercept sensitive operations. After bootstrapping, a self-test at line 274 attempts to start a process and verifies that it's properly denied.
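The deny-then-verify pattern behind that self-test can be sketched in a few lines. The names below are illustrative, not the actual EntitlementBootstrap API: a real implementation injects the check via bytecode instrumentation, while this sketch simulates it with an explicit flag.

```java
// Hypothetical sketch of the entitlement self-test pattern: after locking down,
// deliberately attempt a forbidden operation and confirm it is denied.
public final class EntitlementSelfTest {

    /** Stand-in for the instrumented check the real system runs before ProcessBuilder.start(). */
    static void checkProcessCreationAllowed(boolean entitled) {
        if (!entitled) {
            throw new SecurityException("process creation denied");
        }
    }

    /** Returns true only if the guarded operation is properly blocked. */
    public static boolean selfTest() {
        try {
            checkProcessCreationAllowed(false); // simulate an unentitled caller
            return false; // the operation was NOT blocked: the sandbox is broken
        } catch (SecurityException expected) {
            return true;  // blocked as required
        }
    }
}
```

The key idea is that a sandbox that silently fails to install is worse than a crash, so the bootstrap treats "forbidden operation succeeded" as a fatal error.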

Phase 3: Node Construction and Startup

initPhase3() first verifies Lucene version compatibility, then constructs the Node, starts it, and signals readiness to the parent CLI process. The readiness signal is the last thing that happens — another "DO NOT MOVE THIS" comment guards this ordering:

// DO NOT MOVE THIS
// Signaling readiness to accept requests must remain the last step

NodeConstruction: The 1,888-Line Orchestration

The Node constructor delegates the heavy lifting to NodeConstruction.prepareConstruction(). This method is the largest single orchestration point in the codebase — wiring together dozens of services in strict dependency order.

flowchart TD
    START[prepareConstruction] --> ENV[createEnvironment<br/>Plugin loading, settings merge]
    ENV --> TEL[TelemetryProvider]
    TEL --> TP[ThreadPool creation]
    TP --> SM[SettingsModule validation]
    SM --> PR[ProjectResolver<br/>single vs multi-project]
    PR --> SEARCH[SearchModule]
    SEARCH --> REG[Client & Registries<br/>NamedWriteableRegistry, XContentRegistry]
    REG --> SCRIPT[ScriptService]
    SCRIPT --> ANALYSIS[AnalysisRegistry]
    ANALYSIS --> CONSTRUCT[construct<br/>ClusterService, IngestService,<br/>IndicesService, TransportService,<br/>ActionModule, and more]
    CONSTRUCT --> GUICE[Guice binding & Injector creation]

The construction order is dictated by dependencies. For example, ThreadPool must exist before SettingsModule (because settings validation uses thread context), and SearchModule must exist before ActionModule (because action registration depends on search capabilities).

Let's look at the top of prepareConstruction:

static NodeConstruction prepareConstruction(
    Environment initialEnvironment,
    PluginsLoader pluginsLoader,
    NodeServiceProvider serviceProvider,
    boolean forbidPrivateIndexSettings
) {
    List<Closeable> closeables = new ArrayList<>();
    try {
        NodeConstruction constructor = new NodeConstruction(closeables);
        Settings settings = constructor.createEnvironment(initialEnvironment, serviceProvider, pluginsLoader);
        // ...

The closeables list is a cleanup mechanism — if construction fails partway through, all already-created resources are properly closed. This is critical for a system that opens file handles, thread pools, and network connections during initialization.
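The cleanup mechanism can be sketched with plain `java.io.Closeable` — a simplified stand-in for what NodeConstruction does, not its actual code. The essential details are registering each resource as it is created and closing in reverse order on failure, so dependents are closed before their dependencies:

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public final class CleanupSketch {

    /** Closes resources in reverse creation order, suppressing individual close failures. */
    public static void closeAll(List<Closeable> closeables) {
        List<Closeable> reversed = new ArrayList<>(closeables);
        Collections.reverse(reversed);
        for (Closeable c : reversed) {
            try {
                c.close();
            } catch (IOException e) {
                // best-effort cleanup: keep closing the remaining resources
            }
        }
    }

    /** Shape of the construction pattern: register as you create, close everything on failure. */
    public static Object construct(List<Closeable> closeables) {
        try {
            // ... create resources, adding each to `closeables` as it comes to life ...
            return new Object();
        } catch (Throwable t) {
            closeAll(closeables); // partial construction: release everything created so far
            throw t;
        }
    }
}
```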

The Environment and Plugin Loading Phase

createEnvironment() creates the PluginsService, merges plugin-provided settings with user settings, and builds the final Environment. This is where the PluginsService — the runtime container for all plugins — comes to life.

The ProjectResolver initialization at line 325 is noteworthy — it's part of the emerging multi-project architecture for serverless deployments. In the default single-project mode, it resolves to ProjectResolverFactory.DEFAULT.

Plugin Loading via Module Layers

Elasticsearch uses the Java Platform Module System (JPMS) to isolate plugins. Each plugin gets its own module layer and class loader, created during Phase 2 by PluginsLoader. This provides several guarantees:

  • Plugins cannot access each other's internals
  • Different plugins can depend on different versions of the same library
  • Plugin code can be selectively granted entitlements (permissions)

graph TD
    BOOT[Boot Module Layer<br/>JDK modules] --> SERVER[Server Module Layer<br/>Elasticsearch core]
    SERVER --> MOD1[Module Layer: lang-painless]
    SERVER --> MOD2[Module Layer: repository-s3]
    SERVER --> MOD3[Module Layer: x-pack-security]
    SERVER --> MODN[Module Layer: ...]
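The layering above maps directly onto standard JPMS APIs. The sketch below shows the general mechanism — a child layer with its own class loader, parented to the server layer — using hypothetical names; Elasticsearch's actual PluginsLoader adds more machinery (qualified exports, stable plugin handling, and so on):

```java
import java.lang.module.Configuration;
import java.lang.module.ModuleFinder;
import java.nio.file.Path;
import java.util.Set;

public final class PluginLayerSketch {

    /**
     * Creates an isolated module layer for one plugin directory.
     * One dedicated loader per layer means plugins cannot see each other's classes.
     */
    public static ModuleLayer loadPluginLayer(ModuleLayer serverLayer, Path pluginDir, Set<String> rootModules) {
        ModuleFinder finder = ModuleFinder.of(pluginDir);
        // Resolve the plugin's modules against the parent (server) configuration.
        Configuration cf = serverLayer.configuration().resolve(finder, ModuleFinder.of(), rootModules);
        // A fresh ClassLoader per layer is what provides the isolation guarantees.
        return serverLayer.defineModulesWithOneLoader(cf, ClassLoader.getSystemClassLoader());
    }
}
```

Because each plugin's layer resolves independently against the server layer, two plugins can carry conflicting versions of the same library without a JarHell violation at runtime.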

The Plugin.PluginServices interface serves as a dependency injection bridge — plugins receive a PluginServices instance during createComponents() that gives them access to Client, ClusterService, ThreadPool, and other core services without requiring Guice bindings.
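The bridge pattern is simple enough to sketch. The member names below are assumptions for illustration — the real Plugin.PluginServices interface exposes more services — but the shape is the same: the node constructs one services object and hands it to every plugin, so plugins never reach into Guice:

```java
import java.util.List;

public final class PluginBridgeSketch {

    // Stand-ins for the core services; the real interface exposes Client,
    // ClusterService, ThreadPool, and more.
    interface ThreadPoolLike {}
    interface ClusterServiceLike {}

    /** Bundles the core services the node hands to each plugin. */
    record Services(ThreadPoolLike threadPool, ClusterServiceLike clusterService) {}

    /** Plugins extend this and receive Services during component creation. */
    abstract static class Plugin {
        /** Override to build components; the default contributes nothing. */
        List<Object> createComponents(Services services) {
            return List.of();
        }
    }
}
```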

Node.start(): Service Startup Ordering

After construction, Node.start() starts services in a carefully specified order. This sequence can be grouped into five tiers:

sequenceDiagram
    participant Node
    participant Tier1 as Tier 1: Foundations
    participant Tier2 as Tier 2: Cluster
    participant Tier3 as Tier 3: Coordination
    participant Tier4 as Tier 4: Join
    participant Tier5 as Tier 5: HTTP (LAST)

    Node->>Tier1: Plugin lifecycle, IndicesService,<br/>SnapshotsService, SearchService,<br/>FsHealthService, NodeMetrics
    Node->>Tier2: ClusterService, NodeConnectionsService,<br/>GatewayService
    Node->>Tier3: TransportService.start(),<br/>Coordinator.start(),<br/>ClusterService.start()
    Node->>Tier4: coordinator.startInitialJoin()<br/>Wait for master (with timeout)
    Node->>Tier5: HttpServerTransport.start()<br/>ReadinessService.start()

The first tier starts foundational services — IndicesService (index management), SnapshotsService, SearchService, and monitoring components. These don't depend on cluster state.

The second tier starts cluster-aware services. NodeConnectionsService manages persistent connections to other nodes. GatewayService handles metadata recovery from disk.

The third tier is where the node joins the cluster. TransportService.start() binds the transport port and makes the node reachable. Then Coordinator.start() and ClusterService.start() bring up the consensus layer. The coordinator begins its initial join process.

At lines 378-423, there's a blocking wait with a timeout for the initial cluster state. If the node can't find a master within the configured timeout, it logs a warning with a troubleshooting reference link.

Finally, at line 428 — after a prominent DO NOT ADD NEW START CALLS BELOW HERE comment — HttpServerTransport.start() opens the HTTP port. This deliberate ordering ensures that no HTTP request can arrive before the node is fully operational.

Tip: If you're debugging startup issues, the tier ordering tells you which services might not be initialized yet. A failure in the transport tier means HTTP never started — the node won't even be reachable for debugging via REST APIs.

ThreadPool Design and Named Pools

The ThreadPool is created early in NodeConstruction because nearly every other service depends on it. Elasticsearch defines approximately 20 named pools, each tuned for a specific workload:

Pool Name             Type            Purpose
GENERIC               Scaling         Catch-all for recovery and misc tasks; very high max size
CLUSTER_COORDINATION  Fixed(1)        Single-threaded to avoid contention on Coordinator#mutex
SEARCH                Fixed(cpus)     Query-phase execution
WRITE                 Fixed(cpus)     Indexing operations
GET                   Fixed(cpus)     Real-time get operations
MANAGEMENT            Scaling(small)  Stats collection and admin tasks
SNAPSHOT              Scaling         Snapshot/restore operations
FLUSH                 Scaling         Lucene flush and translog operations

The CLUSTER_COORDINATION pool's default size of 1 is a deliberate design choice — the Coordinator class uses a single mutex to protect all coordination state, and running coordination work on multiple threads would create contention without improving throughput. We'll explore this mutex pattern in detail in the next article.
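The two pool shapes in the table can be approximated with plain `java.util.concurrent`. This is a simplification — Elasticsearch's scaling pools use a custom queue that forces the pool to grow before queueing — but it captures the fixed-vs-scaling distinction; the sizes here are illustrative, not ES defaults:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public final class PoolShapes {

    /** Fixed(n): bounded parallelism for SEARCH/WRITE-style workloads. */
    public static ExecutorService fixedPool(int threads) {
        return Executors.newFixedThreadPool(threads);
    }

    /** Scaling: grows from a small core up to a high max, shedding idle threads. */
    public static ThreadPoolExecutor scalingPool(int core, int max, long keepAliveSeconds) {
        // SynchronousQueue hands tasks off directly, so the pool spawns threads
        // up to max rather than queueing (ES uses a custom queue to avoid
        // rejections at max; omitted here for brevity).
        return new ThreadPoolExecutor(core, max, keepAliveSeconds, TimeUnit.SECONDS,
            new SynchronousQueue<>());
    }
}
```

A Fixed(1) pool like CLUSTER_COORDINATION is simply `fixedPool(1)`: serializing all work onto one thread makes the mutex-protected coordination state trivially uncontended.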

Where to Go Next

Now that we understand how a single node boots up and starts its services, the next article, Part 3: Cluster Coordination, explores how multiple nodes discover each other, elect a master, and maintain a consistent view of cluster state through Elasticsearch's Raft-like consensus protocol.