Architecture Overview and Navigating the curl Codebase

Intermediate

Prerequisites

  • Basic C programming (structs, pointers, function pointers)
  • Familiarity with HTTP and general networking concepts
  • Understanding of what curl does at the user level

curl is one of the most ubiquitous pieces of software ever written: installed on billions of devices, embedded in cars, gaming consoles, and virtually every Linux distribution. Yet for all its reach, relatively few developers have actually read the source code. That's a shame, because curl's internals are a masterclass in C software architecture, with clean abstraction boundaries, composable I/O layers, and a state machine that has been refined over 28 years of production use.

This article is the first in a six-part deep dive. We'll build the mental model you need to navigate the ~130 source files in lib/, understand the three architectural pillars, and orient yourself in a codebase that supports 28+ protocols while remaining remarkably coherent.

The Dual-Product Architecture: CLI Tool vs. Library

curl ships two distinct products from a single repository. The first is the curl command-line tool, the thing you invoke with curl https://example.com. The second is libcurl, a C library that any application can link against to transfer data programmatically. This distinction is the single most important architectural fact about the project.

The boundary between them is defined by the public API headers in include/curl/:

flowchart LR
    subgraph "CLI Tool (src/)"
        A[tool_main.c] --> B[tool_operate.c]
        B --> C[tool_getparam.c]
    end
    subgraph "Public API (include/curl/)"
        D[curl.h]
        E[easy.h]
        F[multi.h]
    end
    subgraph "libcurl (lib/)"
        G[easy.c]
        H[multi.c]
        I[url.c]
        J[transfer.c]
    end
    B --> E
    B --> F
    G --> H
    H --> I
    I --> J

The easy API in include/curl/easy.h declares the functions most developers know: curl_easy_init(), curl_easy_setopt(), curl_easy_perform(), and curl_easy_cleanup(). The multi API in include/curl/multi.h adds concurrent transfer support.

The CLI tool is a consumer of libcurl. It lives entirely in src/ and, apart from the shared lib/curlx/ helpers, talks to the library only through the public API. Every transfer the curl command runs, however many flags such as --header shape it, ultimately boils down to curl_easy_setopt() calls and a curl_easy_perform() or curl_multi_perform() invocation.

Tip: If you're contributing a new feature to curl, your first question should be: "Does this belong in libcurl (available to all consumers) or in the CLI tool (only for command-line users)?" Getting this wrong creates coupling that the maintainers will ask you to untangle.

Directory Structure and File Naming Conventions

The repository layout reflects the dual-product split cleanly:

| Directory | Purpose |
| --- | --- |
| src/ | CLI tool implementation (~80 files, all prefixed tool_) |
| lib/ | libcurl implementation (~130 core files) |
| lib/curlx/ | Shared utility functions used by both tool and library |
| lib/vtls/ | TLS backend abstraction (OpenSSL, wolfSSL, rustls, etc.) |
| lib/vquic/ | QUIC backend abstraction |
| lib/vssh/ | SSH backend abstraction |
| lib/vauth/ | Authentication mechanism abstraction |
| include/curl/ | Public API headers |
| tests/ | Test suite (~1978 test cases) |
| docs/ | Documentation including internals docs |

The v* directories follow a consistent pattern: each contains a vtable-based abstraction layer that insulates the core library from backend-specific implementations. We'll explore the TLS vtable in detail in Part 4.

Naming Conventions

curl's naming conventions are documented in docs/INTERNALS.md and enforced consistently:

| Prefix | Visibility | Example |
| --- | --- | --- |
| curl_ | Public API, exported | curl_easy_perform() |
| Curl_ | Internal, used across multiple .c files | Curl_sendrecv() |
| curlx_ | Utility functions in lib/curlx/ | curlx_safefree() |
| static | File-local, no prefix needed | static easy_perform() |

File prefixes follow their own system. In lib/, files prefixed with cf- implement connection filters (cf-socket.c, cf-dns.c, cf-ip-happy.c). In src/, every file starts with tool_ (tool_main.c, tool_operate.c, tool_getparam.c). The lib/curlx/ directory contains utility code (dynamic buffers, base64 encoding, string parsing) shared between the library and the CLI tool.
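To make the symbol prefix rules concrete, here is a minimal sketch that classifies a name by its prefix. The enum and the function toy_classify() are hypothetical helpers written for this article, not part of curl; only the prefixes themselves come from the table above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical categories mirroring the prefix table. */
enum toy_visibility {
  TOY_PUBLIC,    /* curl_  : exported public API */
  TOY_INTERNAL,  /* Curl_  : shared across libcurl source files */
  TOY_UTILITY,   /* curlx_ : helpers from lib/curlx/ */
  TOY_UNKNOWN    /* anything else, e.g. a file-local static function */
};

/* Classify a symbol name by its prefix. The comparison is
   case-sensitive, which is what keeps curl_ and Curl_ apart. */
enum toy_visibility toy_classify(const char *name)
{
  assert(name != NULL);
  if(strncmp(name, "curlx_", 6) == 0)
    return TOY_UTILITY;
  if(strncmp(name, "curl_", 5) == 0)
    return TOY_PUBLIC;
  if(strncmp(name, "Curl_", 5) == 0)
    return TOY_INTERNAL;
  return TOY_UNKNOWN;
}
```

When reading the source, the same check works by eye: a capital C means "internal, do not call from outside lib/", and no prefix at all usually means the function is private to the file you are in.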

Core Data Structures: Curl_easy, connectdata, and Curl_multi

Three structs form the backbone of libcurl. Understanding their relationships is essential before reading any transfer-related code.

classDiagram
    class Curl_multi {
        +uint32_tbl xfers
        +cpool cpool
        +Curl_dnscache dnscache
        +Curl_ssl_scache *ssl_scache
        +Curl_tree *timetree
        +max_total_connections
        +max_host_connections
    }
    class Curl_easy {
        +uint32_t magic
        +curl_off_t id
        +CURLMstate mstate
        +connectdata *conn
        +Curl_multi *multi
        +SingleRequest req
        +UserDefined set
        +UrlState state
    }
    class connectdata {
        +curl_off_t connection_id
        +char *destination
        +Curl_cfilter *cfilter[2]
        +curl_socket_t sock[2]
        +Curl_scheme *scheme
        +ConnectBits bits
    }
    Curl_multi "1" --> "*" Curl_easy : manages
    Curl_easy "1" --> "0..1" connectdata : uses
    Curl_multi "1" --> "1" cpool : owns
    cpool "1" --> "*" connectdata : pools

Curl_easy — The Per-Transfer Handle

struct Curl_easy is what the public API calls a CURL * handle. It holds everything about a single transfer: configuration options (set), runtime state (state), request-specific data (req), progress info, cookies, and more. Crucially, it contains an mstate field of type CURLMstate, which records the transfer's current position in the state machine. Every easy handle is, at any given moment, in exactly one state.

The handle also holds a pointer to its current connectdata (conn) and to the Curl_multi it belongs to. A single easy handle can only be part of one multi at a time.
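The ownership rule can be sketched with heavily reduced stand-ins for the real structs. Everything prefixed toy_ below is hypothetical; the real structures carry far more fields, and only the conn and multi back-pointers are modeled on the description above.

```c
#include <assert.h>
#include <stddef.h>

struct toy_multi;                 /* forward declaration */

struct toy_conn {
  const char *destination;        /* e.g. "example.com:443" */
};

struct toy_easy {
  int state;                      /* stand-in for the mstate field */
  struct toy_conn *conn;          /* current connection, or NULL */
  struct toy_multi *multi;        /* owning multi, or NULL */
};

struct toy_multi {
  size_t num_easy;                /* number of attached transfers */
};

/* Attach a transfer to a multi. Returns 0 on success, -1 if the
   handle already belongs to a multi (one multi at a time). */
int toy_multi_add(struct toy_multi *m, struct toy_easy *e)
{
  assert(m != NULL && e != NULL);
  if(e->multi)
    return -1;
  e->multi = m;
  m->num_easy++;
  return 0;
}

/* Detach a transfer so that it can join another multi later. */
int toy_multi_remove(struct toy_multi *m, struct toy_easy *e)
{
  assert(m != NULL && e != NULL);
  if(e->multi != m)
    return -1;
  e->multi = NULL;
  m->num_easy--;
  return 0;
}
```

The back-pointer is what makes the "one multi at a time" check cheap: adding a handle only has to look at a single field. In the public API, the comparable failure is reported by curl_multi_add_handle() as CURLM_ADDED_ALREADY.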

connectdata — The Reusable Connection

struct connectdata represents a network connection that can be reused across multiple transfers. It stores the hostname, socket pair, protocol handler reference (scheme), and — critically — two connection filter chains (cfilter[2]). The dual filter chains exist because protocols like FTP use separate control and data connections.

Connections outlive individual transfers. After a transfer completes, its connection goes back to the connection pool (cpool) in the multi handle, ready to be reused by the next transfer to the same destination.
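The reuse idea can be illustrated with a toy pool. This is a sketch under simplifying assumptions: a fixed-size array, matching on an exact destination string, and hypothetical toy_ names throughout. The real pool has to compare many more properties before it considers a connection reusable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define TOY_POOL_SIZE 4

struct toy_conn {
  char destination[64];   /* e.g. "example.com:443" */
  int in_use;             /* 1 while a transfer holds it */
  int valid;              /* 1 once the slot holds a connection */
};

struct toy_pool {
  struct toy_conn slots[TOY_POOL_SIZE];
  int created;            /* how many connections were opened */
};

/* Return an idle connection to the destination if one is pooled,
   otherwise open a new one in a free slot. NULL if the pool is full. */
struct toy_conn *toy_pool_get(struct toy_pool *p, const char *dest)
{
  size_t i;
  assert(p != NULL && dest != NULL);
  for(i = 0; i < TOY_POOL_SIZE; i++) {
    struct toy_conn *c = &p->slots[i];
    if(c->valid && !c->in_use && strcmp(c->destination, dest) == 0) {
      c->in_use = 1;      /* reuse the pooled connection */
      return c;
    }
  }
  for(i = 0; i < TOY_POOL_SIZE; i++) {
    struct toy_conn *c = &p->slots[i];
    if(!c->valid) {
      snprintf(c->destination, sizeof(c->destination), "%s", dest);
      c->valid = 1;
      c->in_use = 1;
      p->created++;       /* a new connection had to be opened */
      return c;
    }
  }
  return NULL;
}

/* A finished transfer hands its connection back; it stays open. */
void toy_pool_release(struct toy_conn *c)
{
  assert(c != NULL);
  c->in_use = 0;
}
```

Requesting the same destination twice, with a release in between, yields the same toy_conn and leaves created at 1. That counter is the whole point of pooling: the second transfer skips connection setup entirely.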

Curl_multi — The Orchestrator

struct Curl_multi is the container that drives everything. It maintains sets of transfer IDs categorized by their processing state: process, dirty, pending, and msgsent. It owns the connection pool, the DNS cache, the TLS session cache, and the timer tree used for scheduling.

The multi handle also manages concurrency limits via max_host_connections and max_total_connections, and provides the wakeup mechanism (using eventfd/pipe/socketpair) that allows external threads to wake a sleeping curl_multi_poll().
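The pipe variant of such a wakeup mechanism is the classic self-pipe pattern, sketched below for POSIX systems. This illustrates the general technique only and is not curl's implementation; the toy_waker type and its functions are hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

struct toy_waker {
  int fds[2];   /* fds[0]: read end polled by the loop, fds[1]: write end */
};

/* Create the pipe. Returns 0 on success, -1 on error. */
int toy_waker_init(struct toy_waker *w)
{
  assert(w != NULL);
  return pipe(w->fds);
}

/* Called from another thread: make the poller's read end readable. */
int toy_waker_wake(struct toy_waker *w)
{
  const char byte = 1;
  return write(w->fds[1], &byte, 1) == 1 ? 0 : -1;
}

/* Wait up to timeout_ms. Returns 1 if woken, 0 on timeout, -1 on error. */
int toy_waker_wait(struct toy_waker *w, int timeout_ms)
{
  struct pollfd pfd;
  char buf[16];
  int rc;

  pfd.fd = w->fds[0];
  pfd.events = POLLIN;
  pfd.revents = 0;
  rc = poll(&pfd, 1, timeout_ms);
  if(rc <= 0)
    return rc;
  /* consume pending wakeup bytes so the next wait blocks again */
  if(read(w->fds[0], buf, sizeof(buf)) < 0)
    return -1;
  return 1;
}

void toy_waker_close(struct toy_waker *w)
{
  close(w->fds[0]);
  close(w->fds[1]);
}
```

A real event loop would add the read end to the same poll set as its transfer sockets, so a single blocking call covers both network activity and wakeup requests. In the public API, the thread-side call that corresponds to toy_waker_wake() is curl_multi_wakeup().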

The Three Pillars: A 10,000-Foot View

With the data structures in place, let's zoom out and look at the three architectural pillars that make libcurl work:

flowchart TD
    subgraph "Pillar 1: Multi State Machine"
        SM[CURLMstate enum<br/>16 states from INIT to COMPLETED]
        RS[multi_runsingle<br/>drives transitions]
        SM --> RS
    end
    subgraph "Pillar 2: Connection Filters"
        CF[Curl_cfilter chain<br/>Composable I/O stack]
        VT[Curl_cftype vtable<br/>connect, send, recv, close]
        CF --> VT
    end
    subgraph "Pillar 3: Protocol Handlers"
        PH[Curl_protocol vtable<br/>do_it, done, write_resp]
        SC[Curl_scheme<br/>Static metadata]
        PH --> SC
    end
    RS -->|"uses connections via"| CF
    RS -->|"invokes protocol via"| PH

Pillar 1: The Multi State Machine. Every transfer — whether using the simple curl_easy_perform() or the concurrent curl_multi_perform() — progresses through a state machine with 16 states defined by the CURLMstate enum. The central function multi_runsingle() in lib/multi.c contains a massive switch statement that handles each state, calling into connection setup, protocol handlers, and the transfer engine. This is the execution engine of libcurl — there is only one, and we'll dissect it in Part 2.

Pillar 2: Connection Filters. Introduced in curl v7.87.0, connection filters replaced a monolithic connection setup with a composable, stackable chain. Each filter — TCP socket, Happy Eyeballs, SOCKS proxy, TLS encryption, HTTP/2 multiplexing — implements the same Curl_cftype vtable with do_connect, do_send, do_recv, and other operations. The setup filter progressively builds the chain based on the connection's requirements. Part 3 covers this in depth.
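A stripped-down model of a filter chain shows why the design composes so well. The sketch assumes a vtable with a single do_send operation and two made-up filters: a "socket" that appends to a buffer, and an upper filter that uppercases data as a stand-in for a transformation such as encryption. All toy_ names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct toy_filter;

/* Hypothetical vtable: one operation instead of the full set. */
struct toy_cftype {
  const char *name;
  size_t (*do_send)(struct toy_filter *cf, const char *buf, size_t len);
};

struct toy_filter {
  const struct toy_cftype *cft;
  struct toy_filter *next;   /* filter below this one, NULL at bottom */
  char wire[64];             /* used by the bottom filter only */
  size_t wire_len;
};

/* Bottom filter: stands in for the socket by appending to a buffer. */
static size_t sock_send(struct toy_filter *cf, const char *buf, size_t len)
{
  size_t room = sizeof(cf->wire) - cf->wire_len;
  if(len > room)
    len = room;
  memcpy(cf->wire + cf->wire_len, buf, len);
  cf->wire_len += len;
  return len;
}

/* Upper filter: transforms data, then passes it to the filter below. */
static size_t upper_send(struct toy_filter *cf, const char *buf, size_t len)
{
  char tmp[64];
  size_t i;
  if(len > sizeof(tmp))
    len = sizeof(tmp);
  for(i = 0; i < len; i++)
    tmp[i] = (char)toupper((unsigned char)buf[i]);
  return cf->next->cft->do_send(cf->next, tmp, len);
}

const struct toy_cftype toy_cft_socket = { "SOCKET", sock_send };
const struct toy_cftype toy_cft_upper = { "UPPER", upper_send };

/* Callers only ever talk to the top of the chain. */
size_t toy_chain_send(struct toy_filter *top, const char *buf, size_t len)
{
  assert(top != NULL && top->cft != NULL);
  return top->cft->do_send(top, buf, len);
}
```

Stacking a toy_cft_upper filter on top of a toy_cft_socket filter and sending "get /" leaves "GET /" in the bottom filter's wire buffer. The caller never learns how many layers sit underneath, which is the property that lets a chain grow a proxy or TLS layer without changes to the code that sends data.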

Pillar 3: Protocol Handlers. curl supports 28+ protocols through a two-tier design. Curl_scheme provides static metadata (scheme name, default port, capability flags) while Curl_protocol provides the behavioral vtable with do_it, done, do_more, write_resp, and more. Part 5 explores how HTTP, FTP, and other protocols implement this interface.
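The two-tier split can be sketched as a metadata table whose entries point at a behavioral vtable. The struct layouts and names below are hypothetical simplifications; in particular, this sketch matches scheme names with a plain case-sensitive strcmp(), and letting two schemes share one vtable is an assumption of the sketch, not a statement about curl's tables.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_xfer {
  int requests_sent;
  int finished;
};

/* Tier 2: behavior, a vtable of function pointers. */
struct toy_protocol {
  int (*do_it)(struct toy_xfer *x);
  int (*done)(struct toy_xfer *x);
};

/* Tier 1: static metadata that points at the behavior. */
struct toy_scheme {
  const char *name;
  int default_port;
  const struct toy_protocol *run;
};

static int http_do(struct toy_xfer *x) { x->requests_sent++; return 0; }
static int http_done(struct toy_xfer *x) { x->finished = 1; return 0; }

static const struct toy_protocol toy_http = { http_do, http_done };

/* In this sketch, two schemes share one behavioral vtable. */
static const struct toy_scheme toy_schemes[] = {
  { "http", 80, &toy_http },
  { "https", 443, &toy_http }
};

/* Look up a scheme by name; NULL if unsupported. */
const struct toy_scheme *toy_scheme_find(const char *name)
{
  size_t i;
  assert(name != NULL);
  for(i = 0; i < sizeof(toy_schemes) / sizeof(toy_schemes[0]); i++) {
    if(strcmp(toy_schemes[i].name, name) == 0)
      return &toy_schemes[i];
  }
  return NULL;
}
```

Separating the tiers keeps cheap questions ("what is the default port for this scheme?") answerable from constant data, while the function pointers are only followed once a transfer actually runs.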

How the Pieces Connect: A Transfer in 30 Seconds

Here's a simplified sequence of what happens when you call curl_easy_perform(handle):

sequenceDiagram
    participant App as Application
    participant Easy as curl_easy_perform
    participant Multi as Curl_multi
    participant SM as multi_runsingle
    participant CF as Connection Filters
    participant Proto as Protocol Handler

    App->>Easy: curl_easy_perform(handle)
    Easy->>Multi: create hidden multi handle
    Easy->>Multi: curl_multi_add_handle()
    loop until transfer complete
        Easy->>Multi: curl_multi_perform()
        Multi->>SM: multi_runsingle(data)
        SM->>CF: Curl_conn_connect() [CONNECTING]
        SM->>Proto: do_it() [DO state]
        SM->>Proto: Curl_sendrecv() [PERFORMING]
    end
    Easy->>Multi: curl_multi_remove_handle()
    Easy-->>App: CURLcode result

The easy API secretly creates a multi handle and runs a blocking loop. This design means there is exactly one execution engine in all of libcurl — the multi state machine. We'll prove this by reading the actual code in Part 2.
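The wrapper pattern is small enough to model in a few lines. The following is a sketch of the idea using hypothetical toy_ types, with "work" reduced to a counter; it is not the code in lib/easy.c.

```c
#include <assert.h>
#include <stddef.h>

struct toy_multi;

struct toy_easy {
  int steps_left;            /* pretend work remaining */
  struct toy_multi *multi;   /* NULL when not attached */
};

struct toy_multi {
  struct toy_easy *xfer;     /* this sketch tracks a single transfer */
};

/* Non-blocking engine: do one unit of work, report transfers running. */
int toy_multi_perform(struct toy_multi *m, int *running)
{
  assert(m != NULL && running != NULL);
  if(m->xfer && m->xfer->steps_left > 0)
    m->xfer->steps_left--;
  *running = (m->xfer && m->xfer->steps_left > 0) ? 1 : 0;
  return 0;
}

/* Blocking convenience call built on the engine above. */
int toy_easy_perform(struct toy_easy *e)
{
  struct toy_multi hidden = { NULL };   /* private multi */
  int running = 1;
  int rc = 0;

  assert(e != NULL);
  if(e->multi)
    return -1;                          /* already driven elsewhere */

  hidden.xfer = e;                      /* "add handle" */
  e->multi = &hidden;

  while(running && rc == 0)
    rc = toy_multi_perform(&hidden, &running);
  /* a real loop would wait for socket activity between iterations */

  e->multi = NULL;                      /* "remove handle" */
  return rc;
}
```

The blocking call contains no transfer logic of its own. It only adds, loops, and removes, so any fix to the engine benefits both entry points at once.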

Orientation Guide: Where to Find Things

Here's a quick reference for locating common functionality:

| What you're looking for | Where to find it |
| --- | --- |
| Public API types and enums | include/curl/curl.h |
| Easy API entry points | lib/easy.c |
| Multi state machine | lib/multi.c + lib/multihandle.h |
| URL parsing and connection setup | lib/url.c |
| Data transfer loop | lib/transfer.c |
| Connection filter framework | lib/cfilters.h + lib/cfilters.c |
| Connection filter: TCP | lib/cf-socket.c |
| Connection filter: Happy Eyeballs | lib/cf-ip-happy.c |
| Connection filter: TLS | lib/vtls/vtls.c |
| Connection filter: setup orchestrator | lib/connect.c |
| Protocol handler interface | lib/protocol.h + lib/protocol.c |
| HTTP protocol | lib/http.c |
| FTP protocol | lib/ftp.c |
| Connection pool | lib/conncache.h + lib/conncache.c |
| DNS resolution | lib/hostip.h, lib/asyn.h |
| CLI tool entry point | src/tool_main.c |
| CLI option parsing | src/tool_getparam.c |

Tip: When navigating the source, start from the state machine in lib/multi.c. Every interesting code path in libcurl is reachable from the multi_runsingle() switch statement.

What's Next

In Part 2, we'll crack open the multi state machine and trace a transfer's complete lifecycle through all 16 states — from MSTATE_INIT through DNS resolution, TCP connect, TLS handshake, protocol negotiation, data transfer, and finally MSTATE_COMPLETED. We'll start with the surprising revelation that curl_easy_perform() is just a thin wrapper around the multi API, proving that every path through libcurl converges on a single execution engine.