Controlling Three Browsers: The Browser Abstraction Layer

Advanced

Prerequisites

  • Articles 1-2: Architecture and Protocol Layer
  • Familiarity with Chrome DevTools Protocol (CDP) concepts
  • Understanding of abstract class patterns in TypeScript

In Articles 1 and 2, we traced how a user's API call becomes a protocol message that reaches the server. But what happens when that message arrives? The server must translate it into browser-specific commands for Chromium, Firefox, or WebKit — three browsers with fundamentally different remote control protocols. This article explores Playwright's browser abstraction layer: the class hierarchy, the PageDelegate interface at its heart, Chromium's CDP integration, emerging BiDi support, the instrumentation system, and the browser registry.

The Server-Side Class Hierarchy

Playwright's server organizes browser control into a clean hierarchy:

BrowserType → Browser → BrowserContext → Page → Frame

Each level has an abstract base class in packages/playwright-core/src/server/ and browser-specific implementations prefixed with CR (Chromium), FF (Firefox), or WK (WebKit).

The BrowserType base class at packages/playwright-core/src/server/browserType.ts#L48-L56 is the starting point for all browser interactions:

export abstract class BrowserType extends SdkObject {
  private _name: BrowserName;
  constructor(parent: SdkObject, browserName: BrowserName) {
    super(parent, 'browser-type');
    this.attribution.browserType = this;
    this._name = browserName;
  }
  // ... remaining members elided
}

The launch() method at packages/playwright-core/src/server/browserType.ts#L66-L72 validates options, checks for Selenium hub overrides, and delegates to browser-specific launch logic.
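
The validate-then-delegate shape of launch() can be illustrated with a minimal template-method sketch. Every name here (SketchBrowserType, SketchLaunchOptions, doLaunch) is hypothetical and chosen for illustration; the real method also deals with progress tracking, Selenium hub overrides, and process management, none of which is modeled.

```typescript
// Hypothetical option bag for illustration; not Playwright's real launch options.
interface SketchLaunchOptions {
  headless?: boolean;
  timeout?: number;
}

abstract class SketchBrowserType {
  constructor(readonly name: string) {}

  // Shared entry point: validate and normalize once, then hand off to the subclass.
  launch(options: SketchLaunchOptions = {}): string {
    if (options.timeout !== undefined && options.timeout < 0)
      throw new Error('timeout must be non-negative');
    const normalized: Required<SketchLaunchOptions> = {
      headless: options.headless ?? true,
      timeout: options.timeout ?? 30000,
    };
    return this.doLaunch(normalized);
  }

  // Browser-specific step that every concrete subclass must supply.
  protected abstract doLaunch(options: Required<SketchLaunchOptions>): string;
}

class SketchChromium extends SketchBrowserType {
  constructor() { super('chromium'); }
  protected doLaunch(options: Required<SketchLaunchOptions>): string {
    return `${this.name}: spawn process, connect over CDP (headless=${options.headless})`;
  }
}

class SketchWebKit extends SketchBrowserType {
  constructor() { super('webkit'); }
  protected doLaunch(options: Required<SketchLaunchOptions>): string {
    return `${this.name}: spawn process, connect over its own protocol (headless=${options.headless})`;
  }
}

const launched = [new SketchChromium(), new SketchWebKit()].map(type => type.launch({ headless: false }));
console.log(launched.join('\n'));
```

Callers only ever see launch(); the option checks live in one place, and each browser contributes just the step that differs.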

classDiagram
    class BrowserType {
        <<abstract>>
        +launch()
        +launchPersistentContext()
        +executablePath()
    }
    class Chromium {
        -_bidiChromium: BrowserType
    }
    class Firefox
    class WebKit
    
    class Browser {
        <<abstract>>
        +newContext()
        +close()
    }
    class CRBrowser {
        +_connection: CRConnection
        +_session: CRSession
    }

    class Page {
        +_delegate: PageDelegate
        +click()
        +fill()
    }
    
    BrowserType <|-- Chromium
    BrowserType <|-- Firefox
    BrowserType <|-- WebKit
    Browser <|-- CRBrowser
    Browser *-- BrowserContext
    BrowserContext *-- Page

The server-side Playwright root at packages/playwright-core/src/server/playwright.ts#L50-L66 instantiates all browser types:

this.chromium = new Chromium(this, new BidiChromium(this));
this.firefox = new Firefox(this, new BidiFirefox(this));
this.webkit = new WebKit(this);

Note how BidiChromium and BidiFirefox are passed as constructor arguments to the traditional implementations — Playwright is gradually adding WebDriver BiDi alongside its existing CDP and custom protocol paths.

PageDelegate: The Abstraction Point

The most important interface in Playwright's browser abstraction is PageDelegate, defined in packages/playwright-core/src/server/page.ts#L55-L105. This is where the generic Page class delegates to browser-specific implementations:

export interface PageDelegate {
  readonly rawMouse: input.RawMouse;
  readonly rawKeyboard: input.RawKeyboard;
  readonly rawTouchscreen: input.RawTouchscreen;
  reload(): Promise<void>;
  goBack(): Promise<boolean>;
  goForward(): Promise<boolean>;
  navigateFrame(frame: Frame, url: string, referrer: string | undefined): Promise<GotoResult>;
  takeScreenshot(progress: Progress, format: string, ...): Promise<Buffer>;
  // ... 20+ more methods
}

Every browser-specific page class (CRPage, FFPage, WKPage) implements this interface. The base Page class holds a _delegate: PageDelegate reference and calls through it for all browser-specific operations.

Several members exist only to absorb one browser's quirk, as their comments show:

// Work around WebKit's raf issues on Windows.
rafCountForStablePosition(): number;
// Work around Chrome's non-associated input and protocol.
inputActionEpilogue(): Promise<void>;
// Work around for asynchronously dispatched CSP errors in Firefox.
readonly cspErrorsAsynchronousForInlineScripts?: boolean;

Each browser has its own idiosyncrasies, and PageDelegate provides hooks for each one rather than littering conditionals throughout the codebase.
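
As a rough sketch of the hook idea, the generic page below never asks which browser it is driving; it only asks its delegate. SketchPageDelegate and SketchPage are hypothetical, heavily trimmed stand-ins for the real types, and the frame counts and strings are illustrative values rather than figures taken from the source.

```typescript
// Hypothetical, trimmed-down delegate; the real PageDelegate has many more members.
interface SketchPageDelegate {
  reload(): string;
  // Quirk hook: how many animation frames to wait before a position counts as stable.
  rafCountForStablePosition(): number;
}

// Generic page logic: no per-browser conditionals, only calls through the delegate.
class SketchPage {
  constructor(private readonly delegate: SketchPageDelegate) {}

  reload(): string {
    return this.delegate.reload();
  }

  framesToWaitBeforeClick(): number {
    return this.delegate.rafCountForStablePosition();
  }
}

// Two illustrative delegates. The numbers are assumptions made for this sketch.
const chromiumLikeDelegate: SketchPageDelegate = {
  reload: () => 'reload sent over a CDP session',
  rafCountForStablePosition: () => 1,
};

const webkitOnWindowsLikeDelegate: SketchPageDelegate = {
  reload: () => 'reload sent over a WebKit session',
  rafCountForStablePosition: () => 5,
};

const chromiumPage = new SketchPage(chromiumLikeDelegate);
const webkitPage = new SketchPage(webkitOnWindowsLikeDelegate);
console.log(chromiumPage.framesToWaitBeforeClick(), webkitPage.framesToWaitBeforeClick());
```

Supporting another engine in this shape means writing one more delegate; SketchPage itself does not change.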

classDiagram
    class PageDelegate {
        <<interface>>
        +rawMouse: RawMouse
        +rawKeyboard: RawKeyboard
        +reload()
        +goBack()
        +navigateFrame()
        +takeScreenshot()
        +rafCountForStablePosition()
        +inputActionEpilogue()
    }
    class CRPage {
        -_client: CRSession
        +reload()
        +navigateFrame()
    }
    class FFPage {
        -_session: FFSession
        +reload()
        +navigateFrame()
    }
    class WKPage {
        -_session: WKSession
        +reload()
        +navigateFrame()
    }
    
    PageDelegate <|.. CRPage
    PageDelegate <|.. FFPage
    PageDelegate <|.. WKPage

Tip: If you're debugging a browser-specific issue, search for the relevant PageDelegate method in crPage.ts, ffPage.ts, or wkPage.ts to see how each browser handles the operation differently.

Chromium: CDP and Session Multiplexing

Chromium is controlled via the Chrome DevTools Protocol (CDP). The Chromium class at packages/playwright-core/src/server/chromium/chromium.ts#L54-L60 extends BrowserType and handles process launching and CDP connection setup.

CRBrowser at packages/playwright-core/src/server/chromium/crBrowser.ts#L43-L57 manages the CDP connection with session multiplexing:

export class CRBrowser extends Browser {
  readonly _connection: CRConnection;
  _session: CRSession;
  readonly _contexts = new Map<string, CRBrowserContext>();
  _crPages = new Map<string, CRPage>();
  _serviceWorkers = new Map<string, CRServiceWorker>();
  // ... remaining members elided
}

Each browser tab gets its own CDP session, allowing concurrent communication with multiple pages. The CRConnection multiplexes these sessions over a single WebSocket transport.
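
A minimal sketch of the multiplexing idea follows. It relies on the fact that CDP messages addressed to an attached target carry a sessionId field alongside id, method, and params; everything else (SketchConnection, SketchSession, the callback-based send) is a hypothetical simplification, not the CRConnection API.

```typescript
// Wire shape used by this sketch: messages for a child session carry a sessionId.
interface SketchMessage {
  id?: number;
  method?: string;
  params?: unknown;
  result?: unknown;
  sessionId?: string;
}

class SketchSession {
  readonly events: SketchMessage[] = [];
  private readonly callbacks = new Map<number, (result: unknown) => void>();

  constructor(private readonly connection: SketchConnection, readonly sessionId: string) {}

  send(method: string, params: unknown, onResult: (result: unknown) => void): void {
    const id = this.connection.nextId();
    this.callbacks.set(id, onResult);
    const message: SketchMessage = { id, method, params };
    // The root session (empty id) sends untagged messages; child sessions tag theirs.
    if (this.sessionId)
      message.sessionId = this.sessionId;
    this.connection.sendRaw(message);
  }

  handle(message: SketchMessage): void {
    const callback = message.id === undefined ? undefined : this.callbacks.get(message.id);
    if (callback) {
      this.callbacks.delete(message.id!);
      callback(message.result); // response to an earlier command
    } else {
      this.events.push(message); // unsolicited protocol event
    }
  }
}

class SketchConnection {
  private lastId = 0;
  private readonly sessions = new Map<string, SketchSession>();
  readonly rootSession: SketchSession;

  constructor(private readonly transportSend: (json: string) => void) {
    this.rootSession = this.createSession('');
  }

  nextId(): number { return ++this.lastId; }

  createSession(sessionId: string): SketchSession {
    const session = new SketchSession(this, sessionId);
    this.sessions.set(sessionId, session);
    return session;
  }

  sendRaw(message: SketchMessage): void {
    this.transportSend(JSON.stringify(message));
  }

  // Called for every frame arriving on the single shared transport.
  dispatch(json: string): void {
    const message = JSON.parse(json) as SketchMessage;
    const session = this.sessions.get(message.sessionId ?? '');
    if (session)
      session.handle(message);
  }
}

const wire: string[] = []; // stands in for the WebSocket
const connection = new SketchConnection(json => { wire.push(json); });
const tab1 = connection.createSession('SESSION-1');
const tab2 = connection.createSession('SESSION-2');
const results: Record<string, unknown> = {};
tab1.send('Page.navigate', { url: 'https://example.com' }, r => { results['tab1'] = r; });
tab2.send('Runtime.evaluate', { expression: '1 + 1' }, r => { results['tab2'] = r; });
// Replies may arrive in any order; sessionId routes each one to its owner.
connection.dispatch(JSON.stringify({ id: 2, sessionId: 'SESSION-2', result: { value: 2 } }));
connection.dispatch(JSON.stringify({ id: 1, sessionId: 'SESSION-1', result: { frameId: 'F1' } }));
```

Both tabs share one transport and one id counter, yet neither sees the other's responses or events.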

sequenceDiagram
    participant PW as Playwright Server
    participant Conn as CRConnection
    participant Root as Root CDP Session
    participant S1 as Session (Tab 1)
    participant S2 as Session (Tab 2)

    PW->>Conn: Connect via WebSocket
    Conn->>Root: Target.getTargets()
    Root-->>Conn: [page1, page2]
    Conn->>S1: Target.attachToTarget(page1)
    Conn->>S2: Target.attachToTarget(page2)
    PW->>S1: Page.navigate(url)
    PW->>S2: Runtime.evaluate(expr)

The static CRBrowser.connect() method at packages/playwright-core/src/server/chromium/crBrowser.ts#L59-L69 creates the connection and attaches to existing targets. Playwright's high-level API wraps CDP rather than passing protocol calls straight through, which lets it normalize behavior across browsers and add features like auto-waiting that CDP doesn't provide natively.

BiDi: The Emerging Alternative

Playwright is gradually adding support for the WebDriver BiDi protocol, which aims to standardize browser automation across vendors (like CDP, but for all browsers). The BiDi implementation lives in packages/playwright-core/src/server/bidi/:

File                 Purpose
bidiChromium.ts      BiDi-over-CDP for Chromium
bidiFirefox.ts       Native BiDi for Firefox
bidiBrowser.ts       Shared BiDi browser logic
bidiPage.ts          BiDi page implementation
bidiConnection.ts    BiDi WebSocket protocol
bidiOverCdp.ts       BiDi tunneled through CDP

BiDi support is passed as a constructor parameter to the traditional browser types, as we saw earlier. This design lets users opt into BiDi per-launch rather than replacing the existing protocol wholesale.
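
One way to picture per-launch opt-in is a traditional launcher that holds its BiDi counterpart and forwards to it on request. This is only a sketch: the `protocol` option and all class names are hypothetical, and the article does not show how the real classes decide which path to take.

```typescript
// Hypothetical launch option used only by this sketch.
interface SketchOptions {
  protocol?: 'default' | 'bidi';
}

interface SketchLauncher {
  launch(options?: SketchOptions): string;
}

class SketchBidiChromium implements SketchLauncher {
  launch(): string {
    return 'chromium via WebDriver BiDi';
  }
}

class SketchCdpChromium implements SketchLauncher {
  // The BiDi variant is injected, mirroring `new Chromium(this, new BidiChromium(this))`.
  constructor(private readonly bidi: SketchLauncher) {}

  launch(options: SketchOptions = {}): string {
    if (options.protocol === 'bidi')
      return this.bidi.launch(options); // opt-in path
    return 'chromium via CDP'; // existing default path stays untouched
  }
}

const sketchChromium = new SketchCdpChromium(new SketchBidiChromium());
console.log(sketchChromium.launch());
console.log(sketchChromium.launch({ protocol: 'bidi' }));
```

Because the default branch is unchanged, existing callers keep their behavior while the newer protocol matures behind the same entry point.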

The distinction between BidiChromium (BiDi tunneled through CDP) and BidiFirefox (native BiDi) reflects the current state of the ecosystem: Chrome implements BiDi on top of CDP, while Firefox implements it natively.

SdkObject, Instrumentation, and ProgressController

Every server-side object inherits from SdkObject, defined in packages/playwright-core/src/server/instrumentation.ts#L47-L71:

export class SdkObject extends EventEmitter {
  guid: string;
  attribution: Attribution;
  instrumentation: Instrumentation;
  // ... remaining members elided
}

SdkObject provides three things: a globally unique ID (guid), an attribution chain (which playwright → browserType → browser → context → page → frame this object belongs to), and access to the Instrumentation interface.
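
The attribution chain can be sketched as each object copying its parent's record and then adding itself at its own level. The class, the SketchKind union, and the counter-based guid format below are hypothetical simplifications, not the real SdkObject.

```typescript
type SketchKind = 'browserType' | 'browser' | 'context' | 'page';

// Hypothetical attribution record: one optional slot per level of the hierarchy.
interface SketchAttribution {
  browserType?: SketchSdkObject;
  browser?: SketchSdkObject;
  context?: SketchSdkObject;
  page?: SketchSdkObject;
}

let sketchGuidCounter = 0;

class SketchSdkObject {
  readonly guid: string;
  readonly attribution: SketchAttribution;

  constructor(parent: SketchSdkObject | undefined, readonly kind: SketchKind) {
    // Counter-based id for the sketch only; the real guid format is not modeled here.
    this.guid = `${kind}@${++sketchGuidCounter}`;
    // Inherit everything the parent already knows, then record this object.
    this.attribution = { ...parent?.attribution };
    this.attribution[kind] = this;
  }
}

const sketchBrowserType = new SketchSdkObject(undefined, 'browserType');
const sketchBrowser = new SketchSdkObject(sketchBrowserType, 'browser');
const sketchContext = new SketchSdkObject(sketchBrowser, 'context');
const sketchPage = new SketchSdkObject(sketchContext, 'page');

// From a page, every ancestor is reachable without walking parent pointers.
console.log(sketchPage.attribution.context?.guid, sketchPage.attribution.browserType?.guid);
```

Copying rather than sharing the record means a child's entry never leaks upward: the browser's attribution still has no page.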

The Instrumentation interface at packages/playwright-core/src/server/instrumentation.ts#L80-L93 defines lifecycle hooks that enable tracing, debugging, and the recorder:

export interface Instrumentation {
  onBeforeCall(sdkObject: SdkObject, metadata: CallMetadata): Promise<void>;
  onBeforeInputAction(sdkObject: SdkObject, metadata: CallMetadata): Promise<void>;
  onAfterCall(sdkObject: SdkObject, metadata: CallMetadata): Promise<void>;
  onPageOpen(page: Page): void;
  onPageClose(page: Page): void;
  onBrowserOpen(browser: Browser): void;
  onBrowserClose(browser: Browser): void;
  onDialog(dialog: Dialog): void;
  onDownload(page: Page, download: Download): void;
}

The implementation at packages/playwright-core/src/server/instrumentation.ts#L108-L128 uses a Proxy to dynamically dispatch to registered listeners, scoped by BrowserContext. This means the trace recorder only receives events from the context it's attached to.
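
A simplified, synchronous sketch of Proxy-based dispatch with context scoping is shown below. The hook names match the interface above, but SketchHub, SketchListener, and the string metadata are hypothetical; the real hooks are asynchronous and take richer arguments.

```typescript
interface SketchContext { name: string; }
interface SketchSdkObject { attribution: { context?: SketchContext }; }

interface SketchInstrumentation {
  onBeforeCall(sdkObject: SketchSdkObject, metadata: string): void;
  onAfterCall(sdkObject: SketchSdkObject, metadata: string): void;
}

// Listeners may implement any subset of the hooks.
type SketchListener = Partial<SketchInstrumentation>;

interface SketchHub extends SketchInstrumentation {
  // A null context means "listen to everything".
  addListener(listener: SketchListener, context: SketchContext | null): void;
}

function createSketchInstrumentation(): SketchHub {
  const listeners = new Map<SketchListener, SketchContext | null>();
  return new Proxy({} as SketchHub, {
    get(_target, prop) {
      if (prop === 'addListener') {
        return (listener: SketchListener, context: SketchContext | null) => {
          listeners.set(listener, context);
        };
      }
      if (typeof prop !== 'string' || !prop.startsWith('on'))
        return undefined;
      // Any "on*" property is synthesized as a fan-out to the matching listeners.
      return (sdkObject: SketchSdkObject, ...rest: unknown[]) => {
        listeners.forEach((context, listener) => {
          if (context && sdkObject.attribution.context !== context)
            return; // scoped listener, different context: skip
          const hook = (listener as any)[prop];
          if (typeof hook === 'function')
            hook.call(listener, sdkObject, ...rest);
        });
      };
    },
  });
}

const contextA: SketchContext = { name: 'A' };
const contextB: SketchContext = { name: 'B' };
const hub = createSketchInstrumentation();
const seenByTracerA: string[] = [];
const seenGlobally: string[] = [];
hub.addListener({ onBeforeCall: (_obj, metadata) => { seenByTracerA.push(metadata); } }, contextA);
hub.addListener({ onBeforeCall: (_obj, metadata) => { seenGlobally.push(metadata); } }, null);

hub.onBeforeCall({ attribution: { context: contextA } }, 'click');
hub.onBeforeCall({ attribution: { context: contextB } }, 'fill');
hub.onAfterCall({ attribution: { context: contextA } }, 'click'); // no listener implements it
```

The Proxy avoids writing one forwarding method per hook: adding a hook to the interface requires no change to the dispatcher.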

ProgressController at packages/playwright-core/src/server/progress.ts#L26-L56 wraps every user-facing operation with timeout management. As we saw in the Dispatcher._runCommand() method in Article 2, every protocol command gets its own ProgressController:

export class ProgressController {
  readonly metadata: CallMetadata;
  private _forceAbortPromise = new ManualPromise<any>();
  
  async run<T>(task: (progress: Progress) => Promise<T>, timeout?: number): Promise<T> {
    const deadline = timeout ? monotonicTime() + timeout : 0;
    // ... timeout and abort management
  }
}

flowchart TD
    A["User calls page.click()"] --> B["Dispatcher._runCommand()"]
    B --> C["ProgressController.createForSdkObject()"]
    C --> D["controller.run(task, timeout)"]
    D --> E{"Timeout reached?"}
    E -->|No| F["Execute task"]
    E -->|Yes| G["Throw TimeoutError"]
    F --> H{"Abort signal?"}
    H -->|No| I["Return result"]
    H -->|Yes| J["Reject with error"]

Tip: When debugging timeout issues in Playwright, the Progress object passed to server-side methods has a log() function. These logs appear in the error's "Call log" section, showing exactly what the server was doing when the timeout hit.
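
To make the relationship between the deadline and the call log concrete, here is a minimal sketch. runWithProgress, SketchProgress, and SketchTimeoutError are hypothetical names, the error text only approximates what Playwright prints, and the sketch stops waiting on timeout without cancelling the underlying task, which the real controller handles more carefully.

```typescript
class SketchTimeoutError extends Error {}

interface SketchProgress {
  log(message: string): void;
}

async function runWithProgress<T>(
  task: (progress: SketchProgress) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const logs: string[] = [];
  const progress: SketchProgress = { log: message => { logs.push(message); } };
  let timer: ReturnType<typeof setTimeout> | undefined;

  const deadline = new Promise<never>((_resolve, reject) => {
    // A timeout of 0 is treated as "no deadline", like the `timeout ? ... : 0` check above.
    if (timeoutMs > 0) {
      timer = setTimeout(() => {
        const callLog = logs.map(line => `  - ${line}`).join('\n');
        reject(new SketchTimeoutError(`Timeout ${timeoutMs}ms exceeded.\nCall log:\n${callLog}`));
      }, timeoutMs);
    }
  });

  try {
    // Whichever settles first wins: the task's result or the deadline's rejection.
    return await Promise.race([task(progress), deadline]);
  } finally {
    if (timer !== undefined)
      clearTimeout(timer);
  }
}

// Usage: a task that logs what it is doing and then never finishes.
async function sketchDemo(): Promise<string> {
  try {
    await runWithProgress<void>(progress => {
      progress.log('waiting for element to be visible');
      return new Promise<void>(() => {});
    }, 20);
    return 'unexpectedly finished';
  } catch (error) {
    return error instanceof Error ? error.message : String(error);
  }
}
```

The log lines are captured per call, so the error message describes only the operation that timed out.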

Browser Registry and Custom Builds

The browser registry at packages/playwright-core/src/server/registry/index.ts#L1-L48 manages browser binary downloads and executable paths. It handles:

  • Resolving the correct executable for the current platform
  • Downloading browsers from CDN mirrors (with fallbacks)
  • Managing custom browser builds

Playwright maintains multiple CDN mirrors:

const PLAYWRIGHT_CDN_MIRRORS = [
  'https://cdn.playwright.dev/dbazure/download/playwright',
  'https://playwright.download.prss.microsoft.com/dbazure/download/playwright',
  'https://cdn.playwright.dev',
];
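
The fallback behavior amounts to trying each mirror in order until one succeeds. The sketch below takes the fetcher as a parameter so that it runs without network access; downloadWithFallback, SketchFetcher, and the relative path used in the usage note are hypothetical and do not reflect the registry's real code or URL layout.

```typescript
// Hypothetical fetcher signature: resolves with the payload or rejects on failure.
type SketchFetcher = (url: string) => Promise<Uint8Array>;

interface SketchDownload {
  url: string;
  bytes: Uint8Array;
}

async function downloadWithFallback(
  mirrors: readonly string[],
  relativePath: string,
  fetcher: SketchFetcher,
): Promise<SketchDownload> {
  const failures: string[] = [];
  for (const mirror of mirrors) {
    const url = `${mirror}/${relativePath}`;
    try {
      // First mirror that answers wins; later mirrors are never contacted.
      return { url, bytes: await fetcher(url) };
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      failures.push(`${url}: ${reason}`);
    }
  }
  // Report every attempt so the user can see which hosts were tried.
  throw new Error(`All mirrors failed:\n${failures.join('\n')}`);
}

// Fake fetcher for the sketch: the first host is "down", the second one works.
const sketchFetcher: SketchFetcher = async url => {
  if (url.startsWith('https://mirror-a.example'))
    throw new Error('connection refused');
  return new Uint8Array([1, 2, 3]);
};
```

Calling `downloadWithFallback(['https://mirror-a.example', 'https://mirror-b.example'], 'builds/example.zip', sketchFetcher)` resolves with the URL on the second host, since the first attempt is recorded as a failure and skipped.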

An important aspect of Playwright's approach is that Firefox and WebKit require custom-patched builds maintained in the browser_patches/ directory. Unlike Puppeteer, which drives stock Chrome and Firefox releases, Playwright patches Firefox and WebKit to expose the automation APIs it needs. This is why playwright install downloads specific browser versions: the Firefox and WebKit binaries are custom builds, not stock releases.

What's Next

Now that we understand how the server abstracts over three browsers, the next article goes inside the browser page itself. We'll explore the injected script architecture, the selector engine system, how Locators compose selector strings lazily, and the auto-waiting retry logic that makes Playwright's interactions reliable.