Read OSS

Container Lifecycle: From `container run` to Exit

Advanced

Prerequisites

  • Article 1: Architecture and Navigation Guide
  • Article 2: The XPC Communication Layer

In the previous articles we mapped the architecture and dissected the XPC communication layer. Now it's time to watch everything work together. This article traces the complete path of a container run command — from the moment you type it to the moment the container process exits and you get your shell prompt back.

This is where the four-layer architecture, the Service/Harness pattern, the two-server endpoint handshake, and the file-descriptor passing all converge into a single, coordinated flow.

CLI Parsing and ContainerConfiguration

Everything begins in ContainerRun.swift. The command uses Swift Argument Parser's @OptionGroup pattern to organize flags into logical groups: process options (tty, interactive), resource options (CPUs, memory, storage), management options (name, detach, auto-remove), and registry options.
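A minimal sketch of that layout, assuming hypothetical flag names that mirror the groups listed above (the real ContainerRun.swift defines far more options):

```swift
import ArgumentParser

// Hypothetical, trimmed-down flag groups mirroring the groups named above.
struct ProcessFlags: ParsableArguments {
    @Flag(name: .shortAndLong) var tty = false
    @Flag(name: .shortAndLong) var interactive = false
}

struct ManagementFlags: ParsableArguments {
    @Option var name: String?
    @Flag var detach = false
    @Flag var rm = false
}

struct Run: AsyncParsableCommand {
    @OptionGroup var process: ProcessFlags
    @OptionGroup var management: ManagementFlags

    @Argument var image: String
    @Argument var arguments: [String] = []

    mutating func run() async throws {
        // The flag groups are folded into a single configuration value here.
    }
}
```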

flowchart TD
    A["container run --name web -p 8080:80 nginx"] --> B[Parse Flags]
    B --> C[Generate Container ID]
    C --> D[Check for Existing Container]
    D --> E["Utility.containerConfigFromFlags()"]
    E --> F[ContainerConfiguration]
    F --> G[ContainerClient.create]
    G --> H[ContainerClient.bootstrap]

The flag groups are combined by Utility.containerConfigFromFlags() into a ContainerConfiguration — the central data type that describes everything about a container. This struct is Codable and travels across process boundaries as JSON embedded in XPC messages.

The configuration captures the complete container spec:

  • id: Unique container identifier
  • image: OCI image reference
  • mounts: Host-to-container filesystem mounts
  • publishedPorts: Port mappings (host:container)
  • networks: Network attachment configurations
  • resources: CPU count, memory (default 1 GiB), storage quota
  • rosetta: Enable x86-64 translation
  • ssh: Forward SSH agent socket
  • readOnly: Mount rootfs read-only
  • runtimeHandler: Which runtime plugin to use (default: container-runtime-linux)
  • initProcess: The process to run inside the container

Tip: The runtimeHandler field defaults to "container-runtime-linux" but is configurable — this is how the plugin system allows alternative runtimes.
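A minimal sketch of a Codable type with these fields, assuming simplified property types as a stand-in for the real ContainerConfiguration, which nests richer types for mounts, networks, and the init process:

```swift
import Foundation

// Illustrative only: field names follow the list above, property types are
// simplified stand-ins for the real nested types.
struct RunConfiguration: Codable {
    struct Resources: Codable {
        var cpus: Int
        var memoryInBytes: UInt64        // 1 GiB by default
        var storageInBytes: UInt64?
    }

    struct InitProcess: Codable {
        var executable: String
        var arguments: [String]
        var terminal: Bool
    }

    var id: String
    var image: String                    // OCI image reference
    var mounts: [String]                 // host-to-container mounts, simplified
    var publishedPorts: [String]         // "8080:80"-style mappings, simplified
    var networks: [String]
    var resources: Resources
    var rosetta: Bool
    var ssh: Bool
    var readOnly: Bool
    var runtimeHandler: String = "container-runtime-linux"
    var initProcess: InitProcess
}

// Because the type is Codable, it can be JSON-encoded and embedded in an
// XPC message to cross process boundaries.
```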

ContainerClient: Creating and Bootstrapping via XPC

With the configuration built, the CLI uses ContainerClient to make two XPC calls: create() and bootstrap().

The create() call at lines 48-76 JSON-encodes the ContainerConfiguration, the kernel information, and creation options, stuffs them into an XPCMessage with route .containerCreate, and sends it to the API server.

The bootstrap() call at lines 116-146 is more interesting. It packs the stdio file handles (stdin, stdout, stderr pipes) directly into the XPC message. These file descriptors will travel from the CLI process, through the API server, and into the container runtime — crossing two process boundaries via XPC's kernel-level fd passing.
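Under the project's XPCMessage wrapper sits the plain XPC C API. A hedged sketch of what packing such a request looks like at that level; the route and key names here are assumptions:

```swift
import Foundation
import XPC

// Hypothetical sketch: the real code goes through the project's XPCMessage
// type rather than calling the C API directly.
func makeBootstrapMessage(
    id: String,
    configuration: Data,            // JSON-encoded container configuration
    stdin: FileHandle, stdout: FileHandle, stderr: FileHandle
) -> xpc_object_t {
    let message = xpc_dictionary_create(nil, nil, 0)
    xpc_dictionary_set_string(message, "route", "containerBootstrap")   // assumed key/value
    xpc_dictionary_set_string(message, "id", id)
    configuration.withUnsafeBytes { bytes in
        if let base = bytes.baseAddress {
            xpc_dictionary_set_data(message, "configuration", base, bytes.count)
        }
    }
    // The kernel duplicates these descriptors into the receiving process, so
    // the runtime ends up writing straight into the CLI's pipes.
    xpc_dictionary_set_fd(message, "stdin", stdin.fileDescriptor)
    xpc_dictionary_set_fd(message, "stdout", stdout.fileDescriptor)
    xpc_dictionary_set_fd(message, "stderr", stderr.fileDescriptor)
    return message
}
```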

sequenceDiagram
    participant CLI as container CLI
    participant API as container-apiserver
    participant LD as launchd
    participant RT as container-runtime-linux

    CLI->>API: containerCreate(config, kernel)
    API->>API: Persist ContainerSnapshot
    API->>API: Find runtime plugin
    API->>LD: bootstrap plist for runtime
    LD->>RT: Launch process
    API-->>CLI: OK

    CLI->>API: containerBootstrap(id, stdio fds)
    API->>RT: createEndpoint (public Mach service)
    RT-->>API: XPC endpoint
    API->>RT: bootstrap(stdio fds, attachments)
    RT->>RT: Boot Linux VM
    RT-->>API: OK
    API-->>CLI: OK + ClientProcess handle

ContainersService: Plugin Registration and Sandbox Setup

On the server side, ContainersService is the actor that manages all container state. When it receives a create request, it:

  1. Deserializes the ContainerConfiguration from the XPC message
  2. Creates a ContainerSnapshot — the persistent state record
  3. Persists it to disk via FilesystemEntityStore
  4. Finds the appropriate runtime plugin using pluginLoader
  5. Registers the runtime plugin with launchd via pluginLoader.registerWithLaunchd()

The persistence layer uses FilesystemEntityStore — an actor that writes JSON files to disk and maintains an in-memory index. Each container gets its own directory under <appRoot>/containers/<id>/, containing an entity.json file with the serialized snapshot.
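A hedged sketch of that pattern, a generic actor that writes one entity.json per ID and keeps an in-memory index; apart from the entity.json convention mentioned above, the names here are assumptions:

```swift
import Foundation

// Simplified stand-in for FilesystemEntityStore: one directory per entity,
// each holding an entity.json, plus an in-memory index for fast lookups.
actor EntityStore<T: Codable & Sendable> {
    private let root: URL
    private var index: [String: T] = [:]

    init(root: URL) {
        self.root = root
    }

    func put(id: String, _ entity: T) throws {
        let dir = root.appendingPathComponent(id, isDirectory: true)
        try FileManager.default.createDirectory(at: dir, withIntermediateDirectories: true)
        let data = try JSONEncoder().encode(entity)
        try data.write(to: dir.appendingPathComponent("entity.json"), options: .atomic)
        index[id] = entity
    }

    func get(id: String) -> T? { index[id] }

    func delete(id: String) throws {
        try FileManager.default.removeItem(at: root.appendingPathComponent(id, isDirectory: true))
        index[id] = nil
    }
}
```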

The ContainersService maintains an in-memory dictionary of ContainerState structs, each holding the snapshot, the SandboxClient (once connected), and allocated network attachments.

When bootstrap is called, the ContainersService performs the endpoint handshake described in Article 2 — connecting to the runtime's public Mach service, obtaining an anonymous endpoint, and establishing a direct connection.

The SandboxClient Endpoint Handshake

The SandboxClient.create() static method implements the two-server handshake in practice (a sketch follows the list below):

  1. Construct the Mach service label: com.apple.container.runtime.container-runtime-linux.{uuid}
  2. Create an XPCClient connected to that service
  3. Send a createEndpoint request
  4. Extract the xpc_endpoint_t from the response
  5. Call xpc_connection_create_from_endpoint to get a direct connection
  6. Return a SandboxClient backed by the direct connection
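A hedged sketch of those six steps against the raw XPC C API; the request and reply key names are assumptions, and the real code goes through the project's XPCClient wrapper:

```swift
import XPC

// Hypothetical sketch of the handshake; error handling trimmed for brevity.
func connectToRuntime(uuid: String) -> xpc_connection_t? {
    // 1-2. Connect to the runtime's public Mach service.
    let label = "com.apple.container.runtime.container-runtime-linux.\(uuid)"
    let service = xpc_connection_create_mach_service(label, nil, 0)
    xpc_connection_set_event_handler(service) { _ in }     // must be set before activation
    xpc_connection_activate(service)

    // 3. Ask for an anonymous endpoint ("createEndpoint" route name assumed).
    let request = xpc_dictionary_create(nil, nil, 0)
    xpc_dictionary_set_string(request, "route", "createEndpoint")
    let reply = xpc_connection_send_message_with_reply_sync(service, request)

    // 4. Extract the endpoint object from the reply.
    guard let endpoint = xpc_dictionary_get_value(reply, "endpoint") as? xpc_endpoint_t else {
        return nil
    }

    // 5-6. Turn the endpoint into a direct connection; every later call
    //      (bootstrap, createProcess, start, wait, ...) uses this connection.
    let direct = xpc_connection_create_from_endpoint(endpoint)
    xpc_connection_set_event_handler(direct) { _ in }
    xpc_connection_activate(direct)
    return direct
}
```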

From this point on, all communication with the runtime bypasses the public Mach service entirely. The bootstrap, createProcess, start, wait, and other operations all flow through the anonymous connection.

SandboxService: VM Creation and Linux Boot

Inside the runtime helper, SandboxService is the actor that manages the VM lifecycle. Its bootstrap method at lines 126-179 is where the Linux VM actually starts.

The bootstrap sequence:

  1. Create a container bundle on disk if it doesn't exist
  2. Load the container configuration and kernel from the bundle
  3. Configure kernel arguments (including security modules: lsm=lockdown,capability,landlock,yama,apparmor)
  4. Create a VZVirtualMachineManager from the containerization library
  5. Extract allocated network attachments from the XPC message
  6. Dynamically configure DNS nameservers if not explicitly set
  7. Select the network interface strategy based on macOS version
  8. Boot the VM

flowchart TD
    A[bootstrap message] --> B[Load config from bundle]
    B --> C[Configure kernel args]
    C --> D[Create VZVirtualMachineManager]
    D --> E[Extract network attachments]
    E --> F{macOS version?}
    F -->|"macOS 26+"| G[NonisolatedInterfaceStrategy]
    F -->|"macOS 15"| H[IsolatedInterfaceStrategy]
    G --> I[Attach interfaces to VM]
    H --> I
    I --> J[Boot Linux VM]
    J --> K[Start guest agent]
    K --> L[Create init process]

The network interface strategy selection at RuntimeLinuxHelper+Start.swift#L67-L72 uses #available(macOS 26, *) guards to choose between NonisolatedInterfaceStrategy (macOS 26+, which supports full container-to-container networking) and IsolatedInterfaceStrategy (macOS 15, where containers are network-isolated from each other).
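The check itself is ordinary availability gating. A minimal sketch, assuming a small strategy protocol; only the two strategy names come from the source:

```swift
// Illustrative strategy stubs; the real types live in the runtime helper.
protocol InterfaceStrategy {
    func attachInterfaces() throws
}
struct NonisolatedInterfaceStrategy: InterfaceStrategy { func attachInterfaces() throws {} }
struct IsolatedInterfaceStrategy: InterfaceStrategy { func attachInterfaces() throws {} }

func makeInterfaceStrategy() -> InterfaceStrategy {
    if #available(macOS 26, *) {
        return NonisolatedInterfaceStrategy()   // full container-to-container networking
    } else {
        return IsolatedInterfaceStrategy()      // macOS 15: containers are network-isolated
    }
}
```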

ProcessIO: Stdio, Signals, and Terminal Resize

Back on the CLI side, ProcessIO manages the connection between the user's terminal and the container's stdio streams. Its create method at lines 46-84 sets up the I/O pipeline differently based on the mode:

Interactive TTY mode (--tty --interactive): The terminal is put into raw mode via Terminal.setraw(). Stdin is read using non-blocking I/O with a readabilityHandler callback. Stdout from the container is piped directly to the user's stdout. Stderr is merged into stdout (as is standard for TTY mode).

Non-TTY mode: Stdout and stderr get separate pipes with independent readabilityHandler callbacks. An IoTracker coordinates stream completion — it uses an AsyncStream<Void> to signal when each output stream has finished (received empty data indicating EOF).

Detached mode (--detach): No output pipes are created. The container ID is printed and the CLI exits immediately.
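A hedged sketch of the completion tracking described for non-TTY mode; the real IoTracker's API may differ, but the AsyncStream<Void> idea is the same:

```swift
// Simplified stand-in for IoTracker: each output stream signals once when it
// hits EOF, and waitForCompletion() returns only after all of them have.
final class StreamTracker: @unchecked Sendable {
    private let expected: Int
    private let stream: AsyncStream<Void>
    private let continuation: AsyncStream<Void>.Continuation

    init(streams expected: Int) {
        self.expected = expected
        (self.stream, self.continuation) = AsyncStream.makeStream(of: Void.self)
    }

    // Called from a readabilityHandler when it reads empty data (EOF).
    func markComplete() {
        continuation.yield()
    }

    func waitForCompletion() async {
        var finished = 0
        for await _ in stream {
            finished += 1
            if finished == expected { return }
        }
    }
}
```

With two output pipes, the CLI would call markComplete() from each readabilityHandler on EOF and await waitForCompletion() before fetching the exit code.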

flowchart LR
    subgraph CLI Process
        STDIN["Host stdin"]
        STDOUT["Host stdout"]
        STDERR["Host stderr"]
    end
    subgraph "Pipes (via XPC)"
        P1["stdin pipe"]
        P2["stdout pipe"]
        P3["stderr pipe"]
    end
    subgraph Runtime Process
        VM["Container VM"]
    end

    STDIN -->|readabilityHandler| P1
    P1 -->|fd passed via XPC| VM
    VM -->|fd passed via XPC| P2
    P2 -->|readabilityHandler| STDOUT
    VM -->|fd passed via XPC| P3
    P3 -->|readabilityHandler| STDERR

Signal handling is centralized. ProcessIO registers handlers for SIGTERM, SIGINT, SIGUSR1, SIGUSR2, and SIGWINCH. For TTY sessions, SIGWINCH (terminal resize) triggers a resize command sent to the container via XPC. For non-TTY sessions, a SignalThreshold counter allows the user to force-exit after three consecutive SIGINT/SIGTERM signals.
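A hedged sketch of that wiring with DispatchSource; the resize callback and the threshold of three follow the description above, everything else is illustrative (SIGUSR1/SIGUSR2 handling is omitted for brevity):

```swift
import Darwin
import Dispatch

// Illustrative signal wiring: SIGWINCH forwards the new terminal size,
// repeated SIGINT/SIGTERM force an exit after a threshold.
final class SignalHandling {
    private var sources: [DispatchSourceSignal] = []
    private var interruptCount = 0

    func install(tty: Bool, sendResize: @escaping () -> Void) {
        for sig in [SIGTERM, SIGINT, SIGWINCH] {
            signal(sig, SIG_IGN)            // let the DispatchSource own the signal
            let source = DispatchSource.makeSignalSource(signal: sig, queue: .main)
            source.setEventHandler { [weak self] in
                guard let self else { return }
                switch sig {
                case SIGWINCH where tty:
                    sendResize()            // forward the new window size over XPC
                case SIGINT, SIGTERM:
                    self.interruptCount += 1
                    if self.interruptCount >= 3 {   // SignalThreshold-style escape hatch
                        exit(1)
                    }
                default:
                    break
                }
            }
            source.resume()
            sources.append(source)
        }
    }
}
```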

Tip: The non-blocking stdin trick using OSFile.makeNonBlocking() is essential. Without it, a blocking read on stdin would prevent the process from responding to signals or detecting when the container has exited.
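For reference, a makeNonBlocking-style helper is typically just a couple of fcntl calls; a minimal sketch (the real OSFile implementation may differ):

```swift
import Darwin

// Switch a descriptor to non-blocking mode so reads on stdin return
// immediately instead of stalling signal and exit handling.
func makeNonBlocking(_ fd: Int32) {
    let flags = fcntl(fd, F_GETFL, 0)
    guard flags != -1 else { return }
    _ = fcntl(fd, F_SETFL, flags | O_NONBLOCK)
}
```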

Container States and Exit

Containers move through a simple state machine:

stateDiagram-v2
    [*] --> created: create()
    created --> running: bootstrap()
    running --> stopped: exit / stop / kill
    stopped --> [*]: delete()
    created --> [*]: delete()

The ContainersService tracks these states in the ContainerSnapshot persisted to disk. When the CLI calls bootstrap, the service updates the snapshot to running. When the container's init process exits, the runtime notifies via the exit monitor, and the state moves to stopped.

The exit flow is instructive. After bootstrap returns, the CLI's ContainerRun.run() method at line 165 calls io.handleProcess(process:log:), which starts the container's init process and waits for it to exit. The wait is implemented via the containerWait XPC route, which blocks until the runtime reports an exit code.

The exit code from the container process is propagated all the way back to the CLI, which throws it as an ArgumentParser.ExitCode at line 173. If the container exits with code 0, the CLI exits with 0. If it exits with 1, the CLI exits with 1. Clean and transparent.
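A hedged sketch of that propagation; the wait closure stands in for the containerWait round-trip:

```swift
import ArgumentParser

// Illustrative: block until the runtime reports the init process's exit
// status, then surface a non-zero status as the CLI's own exit code.
func propagateExit(wait: () async throws -> Int32) async throws {
    let status = try await wait()      // containerWait XPC route under the hood
    guard status == 0 else {
        throw ExitCode(status)         // ArgumentParser maps this to the process exit code
    }
}
```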

If the --remove flag was set, the container is automatically deleted after exit. If an error occurs during the run, the CLI attempts to clean up by calling client.delete(id:) in the catch block at line 167.

What's Next

We've now traced the full lifecycle of a container, but we glossed over a critical subsystem: networking. How does the container get an IP address? How can containers resolve each other by hostname? Why does the project include its own DNS server? The next article dives into the networking stack — from virtual network creation to IP allocation to a surprisingly specific musl libc compatibility workaround in the DNS handler.