Architecture Overview and Navigating the curl Codebase
Prerequisites
- Basic C programming (structs, pointers, function pointers)
- Familiarity with HTTP and general networking concepts
- Understanding of what curl does at the user level
curl is one of the most ubiquitous pieces of software ever written: it is installed on billions of devices, embedded in cars and gaming consoles, and shipped with virtually every Linux distribution. Yet for all its reach, relatively few developers have actually read the source code. That's a shame, because curl's internals reveal a masterclass in C software architecture: clean abstraction boundaries, composable I/O layers, and a state machine that has been refined over 28 years of production use.
This article is the first in a six-part deep dive. We'll build the mental model you need to navigate the ~130 source files in lib/, understand the three architectural pillars, and orient yourself in a codebase that supports 28+ protocols while remaining remarkably coherent.
The Dual-Product Architecture: Command-Line Tool vs. Library
curl ships two distinct products from a single repository. The first is the curl command-line tool, the thing you invoke with curl https://example.com. The second is libcurl, a C library that any application can link against to transfer data programmatically. This distinction is the single most important architectural fact about the project.
The boundary between them is defined by the public API headers in include/curl/:
```mermaid
flowchart LR
    subgraph "CLI Tool (src/)"
        A[tool_main.c] --> B[tool_operate.c]
        B --> C[tool_getparam.c]
    end
    subgraph "Public API (include/curl/)"
        D[curl.h]
        E[easy.h]
        F[multi.h]
    end
    subgraph "libcurl (lib/)"
        G[easy.c]
        H[multi.c]
        I[url.c]
        J[transfer.c]
    end
    B --> E
    B --> F
    G --> H
    H --> I
    I --> J
```
The easy API in include/curl/easy.h declares the functions most developers know: curl_easy_init(), curl_easy_setopt(), curl_easy_perform(), and curl_easy_cleanup(). The multi API in include/curl/multi.h adds concurrent transfer support.
The CLI tool is a consumer of libcurl. It lives entirely in src/ and never touches internal library headers directly. Every single thing the curl command does — from parsing --header flags to writing output files — ultimately boils down to curl_easy_setopt() calls and a curl_easy_perform() or curl_multi_perform() invocation.
Tip: If you're contributing a new feature to curl, your first question should be: "Does this belong in libcurl (available to all consumers) or in the CLI tool (only for command-line users)?" Getting this wrong creates coupling that the maintainers will ask you to untangle.
Directory Structure and File Naming Conventions
The repository layout reflects the dual-product split cleanly:
| Directory | Purpose |
|---|---|
| src/ | CLI tool implementation (~80 files, all prefixed tool_) |
| lib/ | libcurl implementation (~130 core files) |
| lib/curlx/ | Shared utility functions used by both tool and library |
| lib/vtls/ | TLS backend abstraction (OpenSSL, wolfSSL, rustls, etc.) |
| lib/vquic/ | QUIC backend abstraction |
| lib/vssh/ | SSH backend abstraction |
| lib/vauth/ | Authentication mechanism abstraction |
| include/curl/ | Public API headers |
| tests/ | Test suite (~1978 test cases) |
| docs/ | Documentation including internals docs |
The v* directories follow a consistent pattern: each contains a vtable-based abstraction layer that insulates the core library from backend-specific implementations. We'll explore the TLS vtable in detail in Part 4.
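To make the vtable idea concrete, here is a minimal sketch of the pattern in plain C. It is not curl's code: struct toy_backend, the two backends, and every function name are hypothetical, chosen only to show how core code can pick a backend by name and then talk to it exclusively through function pointers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical vtable: one table of function pointers per backend. */
struct toy_backend {
  const char *name;
  int (*handshake)(const char *host); /* returns 0 on success */
};

/* Two stand-in backends. Real ones would wrap different libraries. */
static int alpha_handshake(const char *host)
{
  return (host && *host) ? 0 : -1;
}

static int beta_handshake(const char *host)
{
  return (host && *host) ? 0 : -1;
}

static const struct toy_backend toy_alpha = { "alpha", alpha_handshake };
static const struct toy_backend toy_beta = { "beta", beta_handshake };

static const struct toy_backend *const toy_backends[] = {
  &toy_alpha, &toy_beta
};

/* Select a backend by name; NULL if it is not compiled in. */
static const struct toy_backend *toy_backend_find(const char *name)
{
  size_t i;
  for(i = 0; i < sizeof(toy_backends) / sizeof(toy_backends[0]); i++) {
    if(!strcmp(toy_backends[i]->name, name))
      return toy_backends[i];
  }
  return NULL;
}

/* Core code never names a backend function, it only uses the vtable. */
static int toy_connect(const struct toy_backend *b, const char *host)
{
  return b ? b->handshake(host) : -1;
}
```

Calling toy_connect(toy_backend_find("beta"), "example.com") goes through the beta backend without the caller knowing which implementation sits behind the pointer, which is the insulation the v* directories aim for.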
Naming Conventions
curl's naming conventions are documented in docs/INTERNALS.md and enforced consistently:
| Prefix | Visibility | Example |
|---|---|---|
| curl_ | Public API, exported | curl_easy_perform() |
| Curl_ | Internal, used across multiple .c files | Curl_sendrecv() |
| curlx_ | Utility functions in lib/curlx/ | curlx_safefree() |
| static | File-local, no prefix needed | static easy_perform() |
File prefixes follow their own system. In lib/, files prefixed with cf- implement connection filters (cf-socket.c, cf-dns.c, cf-ip-happy.c). In src/, every file starts with tool_ (tool_main.c, tool_operate.c, tool_getparam.c). The lib/curlx/ directory contains utility code (dynamic buffers, base64 encoding, string parsing) shared between the library and the CLI tool.
Core Data Structures: Curl_easy, connectdata, and Curl_multi
Three structs form the backbone of libcurl. Understanding their relationships is essential before reading any transfer-related code.
```mermaid
classDiagram
    class Curl_multi {
        +uint32_tbl xfers
        +cpool cpool
        +Curl_dnscache dnscache
        +Curl_ssl_scache *ssl_scache
        +Curl_tree *timetree
        +max_total_connections
        +max_host_connections
    }
    class Curl_easy {
        +uint32_t magic
        +curl_off_t id
        +CURLMstate mstate
        +connectdata *conn
        +Curl_multi *multi
        +SingleRequest req
        +UserDefined set
        +UrlState state
    }
    class connectdata {
        +curl_off_t connection_id
        +char *destination
        +Curl_cfilter *cfilter[2]
        +curl_socket_t sock[2]
        +Curl_scheme *scheme
        +ConnectBits bits
    }
    Curl_multi "1" --> "*" Curl_easy : manages
    Curl_easy "1" --> "0..1" connectdata : uses
    Curl_multi "1" --> "1" cpool : owns
    cpool "1" --> "*" connectdata : pools
```
Curl_easy — The Per-Transfer Handle
struct Curl_easy is what the public API calls a CURL * handle. It holds everything about a single transfer: configuration options (set), runtime state (state), request-specific data (req), progress info, cookies, and more. Crucially, it contains an mstate field of type CURLMstate, which records the transfer's current position in the state machine. Every easy handle is, at any given moment, in exactly one state.
The handle also holds a pointer to its current connectdata (conn) and to the Curl_multi it belongs to. A single easy handle can only be part of one multi at a time.
connectdata — The Reusable Connection
struct connectdata represents a network connection that can be reused across multiple transfers. It stores the hostname, socket pair, protocol handler reference (scheme), and — critically — two connection filter chains (cfilter[2]). The dual filter chains exist because protocols like FTP use separate control and data connections.
Connections outlive individual transfers. After a transfer completes, its connection goes back to the connection pool (cpool) in the multi handle, ready to be reused by the next transfer to the same destination.
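The reuse behavior described above can be sketched with a tiny fixed-size pool. This is an illustration only: struct toy_conn, struct toy_pool, and the matching rule (an exact string match on the destination) are assumptions made for the example, not how curl's cpool decides whether a connection is reusable.

```c
#include <stddef.h>
#include <string.h>

#define TOY_POOL_MAX 4

/* Hypothetical pooled connection, identified by "host:port". */
struct toy_conn {
  char destination[64];
  int alive;  /* slot holds an open connection */
  int in_use; /* currently attached to a transfer */
};

struct toy_pool {
  struct toy_conn conns[TOY_POOL_MAX];
};

/* Return an idle connection to dest if one exists, otherwise open a
   new one in a free slot. *reused reports which path was taken.
   Returns NULL when the pool has no free slot. */
static struct toy_conn *toy_pool_acquire(struct toy_pool *p,
                                         const char *dest, int *reused)
{
  size_t i;
  struct toy_conn *free_slot = NULL;

  for(i = 0; i < TOY_POOL_MAX; i++) {
    struct toy_conn *c = &p->conns[i];
    if(c->alive && !c->in_use && !strcmp(c->destination, dest)) {
      c->in_use = 1;
      *reused = 1;
      return c;
    }
    if(!c->alive && !free_slot)
      free_slot = c;
  }

  *reused = 0;
  if(!free_slot)
    return NULL;
  strncpy(free_slot->destination, dest,
          sizeof(free_slot->destination) - 1);
  free_slot->destination[sizeof(free_slot->destination) - 1] = '\0';
  free_slot->alive = 1;
  free_slot->in_use = 1;
  return free_slot;
}

/* The transfer is done: the connection stays open and returns to
   the pool. */
static void toy_pool_release(struct toy_conn *c)
{
  c->in_use = 0;
}
```

Acquiring, releasing, and acquiring again for the same destination hands back the same toy_conn, while a second concurrent acquire gets a fresh slot because the first connection is still busy.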
Curl_multi — The Orchestrator
struct Curl_multi is the container that drives everything. It maintains sets of transfer IDs categorized by their processing state: process, dirty, pending, and msgsent. It owns the connection pool, the DNS cache, the TLS session cache, and the timer tree used for scheduling.
The multi handle also manages concurrency limits via max_host_connections and max_total_connections, and provides the wakeup mechanism (using eventfd/pipe/socketpair) that allows external threads to wake a sleeping curl_multi_poll().
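The pipe variant of that wakeup mechanism is the classic self-pipe technique, sketched below with plain POSIX calls (pipe, poll, read, write). The toy_waker type and its functions are hypothetical and heavily simplified; in libcurl the public entry point for this feature is curl_multi_wakeup(), and the internal implementation differs.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical waker: fds[0] is the read end, fds[1] the write end. */
struct toy_waker {
  int fds[2];
};

static int toy_waker_init(struct toy_waker *w)
{
  return pipe(w->fds);
}

/* May be called from another thread: one byte makes the read end
   readable, which ends a poll() that is sleeping on it. */
static int toy_wakeup(struct toy_waker *w)
{
  const char byte = 1;
  return (write(w->fds[1], &byte, 1) == 1) ? 0 : -1;
}

/* Sleep up to timeout_ms. Returns 1 if woken, 0 on timeout, -1 on
   error. A real poll loop would also watch the transfer sockets. */
static int toy_wait(struct toy_waker *w, int timeout_ms)
{
  struct pollfd pfd;
  char buf[64];
  int rc;

  pfd.fd = w->fds[0];
  pfd.events = POLLIN;
  pfd.revents = 0;

  rc = poll(&pfd, 1, timeout_ms);
  if(rc <= 0)
    return rc;
  /* Consume pending wakeup bytes so the next wait can sleep again. */
  if(read(w->fds[0], buf, sizeof(buf)) < 0)
    return -1;
  return 1;
}

static void toy_waker_close(struct toy_waker *w)
{
  close(w->fds[0]);
  close(w->fds[1]);
}
```

Because the wakeup is just a byte on a file descriptor, it composes with whatever other descriptors the poll loop is already watching.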
The Three Pillars: A 10,000-Foot View
With the data structures in place, let's zoom out and look at the three architectural pillars that make libcurl work:
```mermaid
flowchart TD
    subgraph "Pillar 1: Multi State Machine"
        SM[CURLMstate enum<br/>16 states from INIT to COMPLETED]
        RS[multi_runsingle<br/>drives transitions]
        SM --> RS
    end
    subgraph "Pillar 2: Connection Filters"
        CF[Curl_cfilter chain<br/>Composable I/O stack]
        VT[Curl_cftype vtable<br/>connect, send, recv, close]
        CF --> VT
    end
    subgraph "Pillar 3: Protocol Handlers"
        PH[Curl_protocol vtable<br/>do_it, done, write_resp]
        SC[Curl_scheme<br/>Static metadata]
        PH --> SC
    end
    RS -->|"uses connections via"| CF
    RS -->|"invokes protocol via"| PH
```
Pillar 1: The Multi State Machine. Every transfer — whether using the simple curl_easy_perform() or the concurrent curl_multi_perform() — progresses through a state machine with 16 states defined by the CURLMstate enum. The central function multi_runsingle() in lib/multi.c contains a massive switch statement that handles each state, calling into connection setup, protocol handlers, and the transfer engine. This is the execution engine of libcurl — there is only one, and we'll dissect it in Part 2.
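As a rough picture of what "one switch statement drives every transfer" means, here is a toy state machine with six states instead of sixteen. The state names, struct toy_transfer, and the 512-byte step size are all invented for illustration; only the overall shape (a switch on the current state, at most one transition per call, stay in place when not ready) mirrors the description above.

```c
#include <stddef.h>

/* Hypothetical, much-reduced set of transfer states. */
typedef enum {
  TOY_INIT,
  TOY_CONNECTING,
  TOY_DO,
  TOY_PERFORMING,
  TOY_DONE,
  TOY_COMPLETED
} toy_state;

struct toy_transfer {
  toy_state state;
  int connect_ticks; /* calls needed before the connect "finishes" */
  size_t bytes_left; /* payload still to move */
  int steps;         /* number of times the machine was driven */
};

/* Drive the transfer by at most one transition.
   Returns 1 while there is more work to do, 0 once completed. */
static int toy_runsingle(struct toy_transfer *t)
{
  t->steps++;
  switch(t->state) {
  case TOY_INIT:
    t->state = TOY_CONNECTING;
    break;
  case TOY_CONNECTING:
    if(t->connect_ticks > 0)
      t->connect_ticks--; /* not connected yet: stay in this state */
    else
      t->state = TOY_DO;
    break;
  case TOY_DO:
    t->state = TOY_PERFORMING; /* request "sent" */
    break;
  case TOY_PERFORMING:
    if(t->bytes_left > 512)
      t->bytes_left -= 512; /* move one chunk, stay in this state */
    else {
      t->bytes_left = 0;
      t->state = TOY_DONE;
    }
    break;
  case TOY_DONE:
    t->state = TOY_COMPLETED;
    break;
  case TOY_COMPLETED:
    break;
  }
  return t->state != TOY_COMPLETED;
}
```

A caller simply invokes toy_runsingle() repeatedly until it returns 0. The non-blocking flavor comes from states such as TOY_CONNECTING returning without a transition when there is nothing to do yet.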
Pillar 2: Connection Filters. Introduced in curl v7.87.0, connection filters replaced a monolithic connection setup with a composable, stackable chain. Each filter — TCP socket, Happy Eyeballs, SOCKS proxy, TLS encryption, HTTP/2 multiplexing — implements the same Curl_cftype vtable with do_connect, do_send, do_recv, and other operations. The setup filter progressively builds the chain based on the connection's requirements. Part 3 covers this in depth.
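The stacking idea can be sketched in a few lines of C. Everything below is hypothetical (struct toy_cftype, struct toy_filter, the XOR "encryption" stand-in, the in-memory "socket"); it borrows only the do_send name and the notion that every filter exposes the same vtable and forwards to the filter beneath it.

```c
#include <stddef.h>
#include <string.h>

struct toy_filter;

/* Hypothetical filter vtable with a single operation. */
struct toy_cftype {
  const char *name;
  /* returns the number of bytes accepted, or -1 on error */
  long (*do_send)(struct toy_filter *cf, const char *buf, size_t len);
};

struct toy_filter {
  const struct toy_cftype *cft;
  struct toy_filter *next; /* filter below this one, NULL at bottom */
  void *ctx;               /* per-filter private data */
};

/* Bottom filter: a fake socket that appends bytes to a buffer. */
struct toy_wire {
  char data[128];
  size_t len;
};

static long sock_send(struct toy_filter *cf, const char *buf, size_t len)
{
  struct toy_wire *w = cf->ctx;
  if(len > sizeof(w->data) - w->len)
    return -1;
  memcpy(w->data + w->len, buf, len);
  w->len += len;
  return (long)len;
}

/* Upper filter: transforms the data (XOR as a stand-in for
   encryption), then hands it to the next filter in the chain. */
static long xor_send(struct toy_filter *cf, const char *buf, size_t len)
{
  char tmp[128];
  size_t i;
  if(!cf->next || len > sizeof(tmp))
    return -1;
  for(i = 0; i < len; i++)
    tmp[i] = (char)(buf[i] ^ 0x20);
  return cf->next->cft->do_send(cf->next, tmp, len);
}

static const struct toy_cftype toy_cft_socket = { "socket", sock_send };
static const struct toy_cftype toy_cft_xor = { "xor", xor_send };

/* Callers only ever talk to the top of the chain. */
static long toy_conn_send(struct toy_filter *top, const char *buf,
                          size_t len)
{
  return top->cft->do_send(top, buf, len);
}
```

With the chain xor -> socket, sending "abc" leaves "ABC" on the fake wire (XOR with 0x20 flips ASCII letter case); sending through the socket filter alone leaves the bytes untouched. Adding or removing a layer changes the behavior without touching the caller.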
Pillar 3: Protocol Handlers. curl supports 28+ protocols through a two-tier design. Curl_scheme provides static metadata (scheme name, default port, capability flags) while Curl_protocol provides the behavioral vtable with do_it, done, do_more, write_resp, and more. Part 5 explores how HTTP, FTP, and other protocols implement this interface.
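A minimal sketch of the two-tier split follows. The struct layouts, the flag, and the way a scheme entry points at its behavior table are assumptions made for this example rather than curl's actual definitions; the default ports (80 and 443) are the standard ones for HTTP and HTTPS.

```c
#include <stddef.h>
#include <string.h>

/* Tier 2 (behavior): hypothetical vtable of protocol operations. */
struct toy_protocol {
  int (*do_it)(int *calls);
  int (*done)(int *calls);
};

#define TOY_FLAG_SSL (1u << 0)

/* Tier 1 (static metadata): scheme name, default port, flags.
   The pointer to the behavior table is an assumption of this sketch. */
struct toy_scheme {
  const char *name;
  int default_port;
  unsigned int flags;
  const struct toy_protocol *run;
};

static int http_do(int *calls)
{
  (*calls)++;
  return 0;
}

static int http_done(int *calls)
{
  (*calls)++;
  return 0;
}

static const struct toy_protocol toy_http_proto = { http_do, http_done };

/* Two schemes share one behavior table and differ only in metadata. */
static const struct toy_scheme toy_schemes[] = {
  { "http", 80, 0, &toy_http_proto },
  { "https", 443, TOY_FLAG_SSL, &toy_http_proto }
};

static const struct toy_scheme *toy_scheme_find(const char *name)
{
  size_t i;
  for(i = 0; i < sizeof(toy_schemes) / sizeof(toy_schemes[0]); i++) {
    if(!strcmp(toy_schemes[i].name, name))
      return &toy_schemes[i];
  }
  return NULL;
}

/* The engine calls through the vtable without knowing the protocol. */
static int toy_run(const struct toy_scheme *s, int *calls)
{
  int rc = s->run->do_it(calls);
  if(!rc)
    rc = s->run->done(calls);
  return rc;
}
```

Separating the tiers this way lets cheap questions ("what is the default port?", "does this scheme need TLS?") be answered from static data, while the engine reaches the behavior only through function pointers.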
How the Pieces Connect: A Transfer in 30 Seconds
Here's a simplified sequence of what happens when you call curl_easy_perform(handle):
```mermaid
sequenceDiagram
    participant App as Application
    participant Easy as curl_easy_perform
    participant Multi as Curl_multi
    participant SM as multi_runsingle
    participant CF as Connection Filters
    participant Proto as Protocol Handler
    App->>Easy: curl_easy_perform(handle)
    Easy->>Multi: create hidden multi handle
    Easy->>Multi: curl_multi_add_handle()
    loop until transfer complete
        Easy->>Multi: curl_multi_perform()
        Multi->>SM: multi_runsingle(data)
        SM->>CF: Curl_conn_connect() [CONNECTING]
        SM->>Proto: do_it() [DO state]
        SM->>Proto: Curl_sendrecv() [PERFORMING]
    end
    Easy->>Multi: curl_multi_remove_handle()
    Easy-->>App: CURLcode result
```
The easy API secretly creates a multi handle and runs a blocking loop. This design means there is exactly one execution engine in all of libcurl — the multi state machine. We'll prove this by reading the actual code in Part 2.
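Until then, the shape of that wrapper can be modeled in a few lines. The sketch below is a toy, not libcurl's implementation: struct toy_easy, struct toy_multi, and the "work_left" counter are invented, and the real loop waits for socket activity instead of spinning. It also models the rule that an easy handle belongs to at most one multi at a time.

```c
#include <stddef.h>

#define TOY_MAX_XFERS 8

struct toy_multi;

/* Hypothetical transfer: a counter stands in for real work. */
struct toy_easy {
  int work_left;
  int result;
  struct toy_multi *multi; /* owner, NULL when not attached */
};

struct toy_multi {
  struct toy_easy *xfers[TOY_MAX_XFERS];
  size_t count;
};

/* Attach a transfer; fails if it already belongs to a multi. */
static int toy_multi_add(struct toy_multi *m, struct toy_easy *e)
{
  if(e->multi || m->count == TOY_MAX_XFERS)
    return -1;
  m->xfers[m->count++] = e;
  e->multi = m;
  return 0;
}

static void toy_multi_remove(struct toy_multi *m, struct toy_easy *e)
{
  size_t i;
  for(i = 0; i < m->count; i++) {
    if(m->xfers[i] == e) {
      m->xfers[i] = m->xfers[--m->count];
      e->multi = NULL;
      return;
    }
  }
}

/* Drive every attached transfer one step.
   Returns how many transfers still have work left. */
static int toy_multi_perform(struct toy_multi *m)
{
  size_t i;
  int running = 0;
  for(i = 0; i < m->count; i++) {
    struct toy_easy *e = m->xfers[i];
    if(e->work_left > 0)
      e->work_left--;
    if(e->work_left > 0)
      running++;
  }
  return running;
}

/* The blocking call: a private multi plus a loop around perform. */
static int toy_easy_perform(struct toy_easy *e)
{
  struct toy_multi hidden = { { NULL }, 0 };
  if(toy_multi_add(&hidden, e))
    return -1;
  while(toy_multi_perform(&hidden) > 0) {
    /* a real implementation would wait for socket activity here */
  }
  toy_multi_remove(&hidden, e);
  return e->result;
}
```

The blocking and the concurrent paths share toy_multi_perform(); the "easy" call adds nothing but a private container and a loop, which is the property the article attributes to libcurl.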
Orientation Guide: Where to Find Things
Here's a quick reference for locating common functionality:
| What you're looking for | Where to find it |
|---|---|
| Public API types and enums | include/curl/curl.h |
| Easy API entry points | lib/easy.c |
| Multi state machine | lib/multi.c + lib/multihandle.h |
| URL parsing and connection setup | lib/url.c |
| Data transfer loop | lib/transfer.c |
| Connection filter framework | lib/cfilters.h + lib/cfilters.c |
| Connection filter: TCP | lib/cf-socket.c |
| Connection filter: Happy Eyeballs | lib/cf-ip-happy.c |
| Connection filter: TLS | lib/vtls/vtls.c |
| Connection filter: setup orchestrator | lib/connect.c |
| Protocol handler interface | lib/protocol.h + lib/protocol.c |
| HTTP protocol | lib/http.c |
| FTP protocol | lib/ftp.c |
| Connection pool | lib/conncache.h + lib/conncache.c |
| DNS resolution | lib/hostip.h, lib/asyn.h |
| CLI tool entry point | src/tool_main.c |
| CLI option parsing | src/tool_getparam.c |
Tip: When navigating the source, start from the state machine in lib/multi.c. Every interesting code path in libcurl is reachable from the multi_runsingle() switch statement.
What's Next
In Part 2, we'll crack open the multi state machine and trace a transfer's complete lifecycle through all 16 states — from MSTATE_INIT through DNS resolution, TCP connect, TLS handshake, protocol negotiation, data transfer, and finally MSTATE_COMPLETED. We'll start with the surprising revelation that curl_easy_perform() is just a thin wrapper around the multi API, proving that every path through libcurl converges on a single execution engine.