The Line Breaking Engine: Fast-Path Arithmetic and Browser Parity

Advanced

Prerequisites

  • Article 1: Architecture and the Two-Phase Model
  • Article 2: Text Analysis Pipeline
  • CSS line-breaking behavior (overflow-wrap, trailing whitespace)

The first two articles covered the "slow" half of Pretext — the analysis and measurement work that prepare() performs once. Now we enter the "fast" half: the line-breaking engine in line-break.ts that executes during every layout() call. This is the code path that achieves ~0.0002ms per text block, and its design choices reflect an obsession with keeping the hot path tight.

The engine implements CSS white-space: normal + overflow-wrap: break-word line-breaking semantics: break before any non-space segment that would overflow, hang trailing whitespace past the line edge, and fall back to grapheme-level breaking for words wider than the container. It also handles the full complexity of soft hyphens, tabs, hard breaks, and browser-specific behavioral differences.

Simple vs Full Path Dispatch

The entry point for the hot path is countPreparedLines(), which dispatches between two walkers based on a flag set at prepare time:

src/line-break.ts#L166-L171

export function countPreparedLines(prepared, maxWidth) {
  if (prepared.simpleLineWalkFastPath) {
    return countPreparedLinesSimple(prepared, maxWidth)
  }
  return walkPreparedLines(prepared, maxWidth)
}

The simpleLineWalkFastPath flag is true when the text contains only text, space, and zero-width-break segments — no hard breaks, soft hyphens, tabs, glue, or preserved spaces. As we saw in Part 2, this flag is flipped to false during measurement whenever a non-simple segment kind is encountered.

flowchart TD
    A["countPreparedLines(prepared, maxWidth)"] --> B{simpleLineWalkFastPath?}
    B -->|true| C["walkPreparedLinesSimple()<br/>Handles: text, space, ZWSP"]
    B -->|false| D["walkPreparedLines()<br/>Handles: all 8 segment kinds"]
    C --> E[Return line count]
    D --> E

This matters more than it might seem. The simple walker avoids chunk iteration, soft-hyphen logic, tab stop calculation, and the lineEndFitAdvance vs lineEndPaintAdvance distinction entirely. For the majority of text in a typical application (English prose, chat messages, social posts), the simple path is the one that runs.
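As a rough illustration of how such a flag could be derived, here is a minimal sketch. The type and function names are hypothetical, and it computes the answer in one pass over the segment kinds rather than flipping a flag during measurement as the library does.

```typescript
// Hypothetical names; the eight kinds mirror the ones listed in this article.
type SegmentKindSketch =
  | 'text' | 'space' | 'zero-width-break'
  | 'hard-break' | 'soft-hyphen' | 'tab' | 'glue' | 'preserved-space'

// The only kinds the simple walker is described as handling.
const SIMPLE_KINDS: ReadonlySet<SegmentKindSketch> =
  new Set<SegmentKindSketch>(['text', 'space', 'zero-width-break'])

// True when every segment is simple, i.e. the fast path would be safe.
function canUseSimpleWalk(kinds: readonly SegmentKindSketch[]): boolean {
  return kinds.every((kind) => SIMPLE_KINDS.has(kind))
}
```

A single tab or soft hyphen anywhere in the text is enough to send the whole block down the full path under this scheme.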

The Simple Line Walker

walkPreparedLinesSimple() is the core hot-path loop. Let's trace its logic:

src/line-break.ts#L177-L351

The walker maintains a small set of mutable state variables — all primitive numbers and booleans:

let lineCount = 0
let lineW = 0                    // Accumulated width of current line
let hasContent = false            // Whether the current line has any content
let pendingBreakSegmentIndex = -1 // Where to break if overflow occurs
let pendingBreakPaintWidth = 0    // Width to report for the pending break point

The main loop iterates segment indices and handles three cases for each segment:

Case 1: Starting a new line. Skip leading spaces and zero-width breaks, then start the line at the first non-space segment. If that first segment is wider than maxWidth and has breakableWidths, break it at grapheme level.

Case 2: Segment fits. Add the segment's width to lineW, advance the end cursor, and if the segment is a break opportunity (space or ZWSP), record it as a pendingBreak.

Case 3: Segment overflows. This is where the interesting logic lives. The walker must decide where to break:

stateDiagram-v2
    state "Segment overflows (newW > maxWidth + ε)" as Overflow
    state "Is current segment breakable?" as CanBreak
    state "Pending break exists?" as HasPending
    state "Segment wider than maxWidth with grapheme widths?" as WideBreakable

    Overflow --> CanBreak
    CanBreak --> EmitWithTrailing: yes (space/ZWSP)
    CanBreak --> HasPending: no
    HasPending --> EmitAtPending: yes
    HasPending --> WideBreakable: no
    WideBreakable --> GraphemeBreak: yes
    WideBreakable --> EmitBeforeCurrent: no

    EmitWithTrailing: Append segment, emit line\nwithout trailing space width
    EmitAtPending: Emit line at pending break point
    GraphemeBreak: Emit current line, break\nsegment at grapheme boundaries
    EmitBeforeCurrent: Emit line, retry\ncurrent segment on next line

The trailing whitespace handling is subtle and CSS-correct: when a space segment causes overflow, it's appended to the current line but the emitted line width excludes the space width. This is the "hanging whitespace" behavior — trailing spaces hang past the line edge without triggering breaks.

The pending break mechanism handles the case where one or more text segments run past the last break opportunity. If adding segment i would push lineW past maxWidth (plus the epsilon), but a space at segment j < i was the last break opportunity, the walker emits the line through segment j and retries from j+1.
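The three cases can be sketched as a small counting loop. This is a simplified illustration, not the code in line-break.ts: the segment shape and function name are assumptions, grapheme-level breaking of over-wide words is omitted, and pendingBreakSegmentIndex here stores the index at which the next line would resume.

```typescript
// Hypothetical segment shape for illustration only.
type SimpleKind = 'text' | 'space' | 'zero-width-break'
interface SimpleSeg { kind: SimpleKind; width: number }

function countLinesSimpleSketch(
  segs: readonly SimpleSeg[],
  maxWidth: number,
  epsilon = 0.005,
): number {
  let lineCount = 0
  let lineW = 0
  let hasContent = false
  let pendingBreakSegmentIndex = -1 // where the next line resumes, or -1

  let i = 0
  while (i < segs.length) {
    const seg = segs[i]
    if (seg === undefined) break
    const isBreakOpportunity = seg.kind !== 'text'

    if (!hasContent) {
      // Case 1: new line. Skip leading spaces/ZWSP, start at first text segment.
      if (isBreakOpportunity) { i++; continue }
      lineW = seg.width
      hasContent = true
      i++
      continue
    }

    const newW = lineW + seg.width
    if (newW <= maxWidth + epsilon) {
      // Case 2: segment fits. Remember break opportunities as we pass them.
      lineW = newW
      if (isBreakOpportunity) pendingBreakSegmentIndex = i + 1
      i++
      continue
    }

    // Case 3: overflow. Emit the current line and pick the resume point.
    lineCount++
    if (isBreakOpportunity) i++ // trailing space hangs; consume it
    else if (pendingBreakSegmentIndex >= 0) i = pendingBreakSegmentIndex
    // otherwise retry the current segment on the next line (i unchanged)
    lineW = 0
    hasContent = false
    pendingBreakSegmentIndex = -1
  }
  return hasContent ? lineCount + 1 : lineCount
}
```

Note that a space which overflows never starts a line of its own: it is consumed with the line it trails, which is the hanging behavior described above.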

The Full Line Walker: Chunks, Soft-Hyphens, and Tabs

When simpleLineWalkFastPath is false, the full walker walkPreparedLines() engages. It has the same core structure but adds three major capabilities:

src/line-break.ts#L353-L648

Chunk-based iteration: Hard breaks divide the text into chunks (established during analysis). The full walker iterates chunks in its outer loop and segments within each chunk in its inner loop. Empty chunks (two consecutive hard breaks) emit an empty line immediately.

src/line-break.ts#L539-L544
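The chunk idea can be illustrated with a short sketch. All names here are hypothetical, the chunks are computed on the fly rather than during analysis, and a trailing hard break is assumed not to open an extra empty chunk.

```typescript
// Half-open range [start, end) of segment indices, excluding the hard break itself.
interface ChunkSketch { start: number; end: number }

function splitIntoChunksSketch(kinds: readonly string[]): ChunkSketch[] {
  const chunks: ChunkSketch[] = []
  let start = 0
  for (let i = 0; i < kinds.length; i++) {
    if (kinds[i] === 'hard-break') {
      chunks.push({ start, end: i })
      start = i + 1
    }
  }
  // Assumption: only push a final chunk if segments remain after the last hard break.
  if (start < kinds.length) chunks.push({ start, end: kinds.length })
  return chunks
}

// Outer loop over chunks; the per-chunk line walker is passed in as a callback.
function countLinesByChunkSketch(
  kinds: readonly string[],
  countChunkLines: (chunk: ChunkSketch) => number,
): number {
  let total = 0
  for (const chunk of splitIntoChunksSketch(kinds)) {
    // An empty chunk (two consecutive hard breaks) is one empty line.
    total += chunk.start === chunk.end ? 1 : countChunkLines(chunk)
  }
  return total
}
```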

Soft-hyphen logic: Soft hyphens are invisible by default but create break opportunities. When the walker encounters one, it records a pending break whose width includes discretionaryHyphenWidth (the width of the visible "-" drawn when the break is taken). If overflow occurs and the pending break is a soft hyphen, the walker first tries continueSoftHyphenBreakableSegment(), which fits as many graphemes of the next word as it can before the hyphen plus the remaining graphemes would overflow. Safari has a further specialization: preferEarlySoftHyphenBreak makes the walker take the soft-hyphen break early rather than trying to fit more graphemes.

src/line-break.ts#L489-L525
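One plausible reading of the grapheme-fitting step is sketched below. It is an assumption about the technique, not a copy of continueSoftHyphenBreakableSegment(): the function name, parameters, and return shape are hypothetical. The idea is to keep adding graphemes while there is still room for a visible hyphen after them, and to drop the hyphen requirement only if the entire word fits.

```typescript
interface SoftHyphenFit { fitCount: number; fittedWidth: number }

function fitBeforeSoftHyphenSketch(
  graphemeWidths: readonly number[],
  lineW: number,        // width already on the line
  maxWidth: number,
  hyphenWidth: number,  // stands in for discretionaryHyphenWidth
  epsilon = 0.005,
): SoftHyphenFit {
  let fitCount = 0
  let fittedWidth = lineW
  for (const w of graphemeWidths) {
    const nextWidth = fittedWidth + w
    // A partial fit must leave room for the hyphen; a complete fit need not.
    const isLast = fitCount + 1 === graphemeWidths.length
    const required = isLast ? nextWidth : nextWidth + hyphenWidth
    if (required > maxWidth + epsilon) break
    fittedWidth = nextWidth
    fitCount++
  }
  return { fitCount, fittedWidth }
}
```

Under this reading, Safari's preferEarlySoftHyphenBreak would correspond to skipping this step and breaking at the recorded soft hyphen immediately.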

Tab stops: Tab segments have a dynamic width computed by getTabAdvance(), which calculates the distance to the next tab stop based on the current line position. Tab stops are spaced at 8 × spaceWidth intervals.

src/line-break.ts#L61-L67
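The tab-stop arithmetic can be sketched as follows. This is an illustrative stand-in for getTabAdvance(), with a hypothetical name and signature; it assumes that a tab sitting exactly on a stop advances to the following stop.

```typescript
// Distance from the current line position to the next tab stop,
// with stops every 8 * spaceWidth pixels.
function tabAdvanceSketch(lineW: number, spaceWidth: number): number {
  const tabStop = 8 * spaceWidth
  if (tabStop <= 0) return 0 // degenerate font metrics: no advance
  const nextStop = (Math.floor(lineW / tabStop) + 1) * tabStop
  return nextStop - lineW
}
```

Because the advance depends on lineW, a tab's width cannot be measured once at prepare time, which is one reason tabs force the full walker.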

Fit-Advance vs Paint-Advance: The Subtle Width Distinction

One of the most nuanced aspects of the engine is the split between lineEndFitAdvance and lineEndPaintAdvance. Each segment carries both values, and they mean different things:

lineEndFitAdvance: The width contribution used for line-fit decisions. When deciding whether the line overflows, this is what matters. For trailing spaces, it's zero — spaces don't contribute to fitting.

lineEndPaintAdvance: The width reported to the caller as the visible line width. For collapsible spaces, it's zero (they're invisible at line end). For preserved spaces in pre-wrap, it's the actual space width (they remain visible).

src/layout.ts#L322-L329

const lineEndFitAdvance =
  segKind === 'space' || segKind === 'preserved-space' || segKind === 'zero-width-break'
    ? 0  // Don't count for fitting
    : w
const lineEndPaintAdvance =
  segKind === 'space' || segKind === 'zero-width-break'
    ? 0  // Invisible at line end
    : w  // preserved-space: still visible

This distinction is critical for the pending break mechanism in the full walker. When a pending break is recorded, both the fit width and paint width at that break point are stored:

pendingBreakFitWidth = lineW - segmentWidth + fitAdvance
pendingBreakPaintWidth = lineW - segmentWidth + paintAdvance

The fit width is compared against maxWidth + lineFitEpsilon to decide whether the break is valid. The paint width is what gets reported in the emitted line. This matches how CSS renders trailing whitespace — it hangs past the container edge without affecting layout.
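To make the two formulas concrete, here is a small worked sketch. The function names and the restricted kind union are hypothetical; the advance rules are the ones shown in the layout.ts excerpt above, and lineW is taken to already include the segment being recorded.

```typescript
type EndKind = 'text' | 'space' | 'preserved-space' | 'zero-width-break'

// Same rules as the excerpt: what a segment contributes if the line ends on it.
function lineEndAdvancesSketch(kind: EndKind, w: number): { fit: number; paint: number } {
  const fit =
    kind === 'space' || kind === 'preserved-space' || kind === 'zero-width-break' ? 0 : w
  const paint = kind === 'space' || kind === 'zero-width-break' ? 0 : w
  return { fit, paint }
}

// Swap the segment's full width for its line-end advances.
function pendingBreakWidthsSketch(lineW: number, kind: EndKind, w: number) {
  const { fit, paint } = lineEndAdvancesSketch(kind, w)
  return { fitWidth: lineW - w + fit, paintWidth: lineW - w + paint }
}
```

For a 50px word followed by a 5px space (lineW = 55), a collapsible space gives fitWidth 50 and paintWidth 50, while a preserved space gives fitWidth 50 and paintWidth 55. In both cases the break is valid in a 52px container even though lineW itself exceeds it.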

Tip: If you're implementing custom rendering using layoutWithLines(), the line.width value is the paint width, not the container width. A line ending with a trailing space might have a paint width smaller than the max width, even though the space visually hangs past the edge.

layoutNextLine(): Iterator-Style Variable-Width Layout

While layout() and layoutWithLines() use a fixed maxWidth for all lines, layoutNextLine() enables line-by-line iteration where each line can have a different width. This powers variable-width layouts like text flowing around obstacles:

src/line-break.ts#L651-L662

sequenceDiagram
    participant App as Application
    participant LNL as layoutNextLine()
    participant State as Cursor State

    App->>LNL: start={seg:0, grapheme:0}, width=400
    LNL->>State: Normalize start, walk one line
    LNL-->>App: {text: "Hello world", end: {seg:3, grapheme:0}}

    App->>LNL: start={seg:3, grapheme:0}, width=250
    LNL->>State: Resume from cursor, walk one line
    LNL-->>App: {text: "foo bar", end: {seg:6, grapheme:0}}

    App->>LNL: start={seg:6, grapheme:0}, width=400
    LNL->>State: Resume, walk one line
    LNL-->>App: null (end of text)

The key design is cursor hand-off: each call returns a LayoutLine whose end cursor is passed as the start of the next call. This enables multi-column layouts (left column consumes text, right column resumes from the cursor), obstacle routing (width changes per line), and progressive rendering.

The internal implementation layoutNextLineRange() normalizes the start cursor (skipping leading spaces), finds the containing chunk, and runs a single-pass version of the full walker that stops after one line instead of continuing through the whole text.
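The cursor hand-off pattern can be shown with a toy stand-in. Nothing below is the Pretext API: nextLineSketch, its cursor shape, and the fixed 8px-per-character measurement are all assumptions made so the example is self-contained. Only the calling pattern, where each line's end cursor becomes the next call's start, mirrors the description above.

```typescript
interface CursorSketch { seg: number }
interface LineSketch { text: string; end: CursorSketch }

// Toy line breaker: words are segments, every character is 8px wide.
function nextLineSketch(
  words: readonly string[],
  start: CursorSketch,
  maxWidth: number,
): LineSketch | null {
  if (start.seg >= words.length) return null // end of text
  const CHAR_W = 8
  let i = start.seg
  let text = words[i] ?? ''
  i++
  while (i < words.length) {
    const candidate = text + ' ' + words[i]
    if (candidate.length * CHAR_W > maxWidth) break
    text = candidate
    i++
  }
  return { text, end: { seg: i } }
}

// Drive the iterator with a different width for each line.
function flowSketch(
  words: readonly string[],
  widthForLine: (lineIndex: number) => number,
): string[] {
  const lines: string[] = []
  let cursor: CursorSketch = { seg: 0 }
  for (let n = 0; ; n++) {
    const line = nextLineSketch(words, cursor, widthForLine(n))
    if (line === null) break
    lines.push(line.text)
    cursor = line.end // hand-off: end of this line is the start of the next
  }
  return lines
}
```

A caller flowing text around an obstacle would supply a widthForLine callback that returns a narrower width for the lines the obstacle overlaps.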

Browser Shims: lineFitEpsilon and Engine-Specific Behavior

The line walker uses a lineFitEpsilon tolerance when comparing accumulated width against maxWidth:

if (newW > maxWidth + lineFitEpsilon) { ... }

This epsilon compensates for floating-point arithmetic differences between Pretext's sum-of-segment-widths approach and the browser's native layout engine. The value varies by browser:

| Browser engine | lineFitEpsilon | carryCJKAfterClosingQuote | preferPrefixWidthsForBreakableRuns | preferEarlySoftHyphenBreak |
| --- | --- | --- | --- | --- |
| Chromium | 0.005 | true | false | false |
| Gecko (Firefox) | 0.005 | false | false | false |
| WebKit (Safari) | 1/64 (≈0.0156) | false | true | true |
| Server (no navigator) | 0.005 | false | false | false |

src/measurement.ts#L65-L101

Safari's larger epsilon reflects its use of fixed-point arithmetic (1/64 precision) internally. The carryCJKAfterClosingQuote flag triggers the Chromium-specific merge where a CJK character after a closing quote stays on the same line. preferPrefixWidthsForBreakableRuns enables Safari's alternative measurement approach for sub-word breaking, where cumulative prefix widths are more accurate than summing individual grapheme widths. preferEarlySoftHyphenBreak matches Safari's tendency to break at soft hyphens earlier rather than fitting more graphemes.

Tip: If your line counts don't match the browser's native layout in edge cases, the lineFitEpsilon is usually the first thing to check. It's a tuning parameter that balances false positives (breaking too early) against false negatives (not breaking when the browser would).
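A tiny example shows why a tolerance is needed at all, and how the choice of value changes the outcome. The helper name is hypothetical and the widths are contrived to expose ordinary floating-point rounding; real segment widths come from measurement.

```typescript
// Sum segment widths the way a line walker would, then compare with tolerance.
function fitsWithinSketch(
  widths: readonly number[],
  maxWidth: number,
  epsilon: number,
): boolean {
  let lineW = 0
  for (const w of widths) lineW += w
  return lineW <= maxWidth + epsilon
}

// 0.1 + 0.2 is slightly more than 0.3 in IEEE 754 doubles.
const strict = fitsWithinSketch([0.1, 0.2], 0.3, 0)        // false: spurious overflow
const tolerant = fitsWithinSketch([0.1, 0.2], 0.3, 0.005)  // true: rounding absorbed

// A line about 0.01px too wide breaks with 0.005 but fits with 1/64.
const smallEps = fitsWithinSketch([0.1, 0.2, 0.01], 0.3, 0.005)  // false
const largeEps = fitsWithinSketch([0.1, 0.2, 0.01], 0.3, 1 / 64) // true
```

The last two results illustrate the trade-off: a larger epsilon tolerates more accumulated error but also lets slightly over-wide lines through.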

Looking Ahead

We've now walked through the complete line-breaking engine: from the simple/full dispatch, through the pending-break mechanism and trailing whitespace handling, to the subtle fit/paint distinction and browser-specific tuning. This engine consumes the parallel arrays built by the analysis and measurement pipeline from Part 2, and produces line counts in a fraction of a microsecond per text block.

In Part 4, we'll zoom out from the line walker to examine the cross-browser and internationalization challenges: how engine profiles are detected, how emoji width correction compensates for Canvas/DOM discrepancies, how the segment metric cache is structured, and how the bidi implementation from pdf.js handles mixed RTL/LTR text.