
hpke-ng: Faster, Smaller, Harder HPKE for Rust

· 21 min read · #HPKE #Rust #Post-Quantum

Today we’re releasing hpke-ng, a clean-slate Rust implementation of HPKE (RFC 9180). It is published under Apache-2.0 OR MIT, and you can install it now with cargo add hpke-ng.

Across 62 head-to-head benchmarks against hpke-rs — currently the most widely deployed Rust HPKE library — hpke-ng wins 43, ties 14, and loses 5. The wins concentrate in five places. First, the post-quantum decap path: 53–55% on ML-KEM-768 and ML-KEM-1024, the largest deltas in the entire dataset, plus a 38% win on X-Wing decap from the same caching trick. Second, the post-quantum encap path: 30–37% on ML-KEM, 14% on X-Wing. Third, the classical KEM decap path: 41% on X25519, where caching the recipient's public-key bytes alongside the secret eliminates a redundant base-point scalar multiplication on every call. Fourth, the single-shot open path: 23–35% across payload sizes. Fifth, the post-quantum and ChaCha20 setup paths: 45–50% on ML-KEM receivers, 31% on X25519 receivers. Every PQ encap and decap row is a clean win.

cargo bench · comparative · 62 head-to-head benchmarks

  hpke-ng wins  43
  tied          14
  hpke-rs wins   5

A win or loss is a delta greater than 3%; ties fall within criterion's ±3% noise band.

Both libraries call the same RustCrypto primitive crates underneath, so the cryptography itself is identical. Both pass the full RFC 9180 known-answer-test set. Both produce byte-identical wire output to each other on every ciphersuite we differentially tested. The wins are gains in framing, monomorphization, allocation behaviour, and dispatch — not in the underlying crypto math. That is the point: the math is a solved problem, and the surrounding library is where the engineering still has slack. The 14 ties tell you something too — for the AEAD-bound rows and for KEM operations where both libraries converge to the speed of the underlying primitive (single-key generation, ML-KEM IKM expansion, X-Wing key derivation), hpke-ng matches that ceiling without overhead.

Why we built another HPKE library

Earlier this year we found and reported two security bugs in hpke-rs. The first was a missing RFC 9180 §7.1.4 zero-shared-secret check: a low-order or identity public key forces the X25519 shared secret to all zeros, after which the rest of the key schedule becomes deterministic and predictable to anyone who knows the static recipient key. The fix is a single comparison against zero, which RFC 9180 explicitly requires; it was missing. The second was a u32 sequence-counter that silently wrapped in release builds, reusing nonces after 2³² messages. Nonce reuse in AEAD is catastrophic — for AES-GCM it leaks the authentication key; for ChaCha20-Poly1305 it leaks plaintext via XOR — and the wrap was happening below the type system, in a release build, with no diagnostic. Both are now fixed. We documented the broader pattern of bugs we kept finding in libraries marketed under “high assurance” branding in February.
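The wrap bug is easy to reproduce in miniature. Here is a hedged sketch of the bug class — toy types, not hpke-rs's actual internals — contrasting a counter that wraps silently in release builds with one that surfaces exhaustion as a typed error:

```rust
// Sketch only: illustrates the bug class, not hpke-rs's real code.
// A u32 counter wraps to 0 after 2^32 increments; in release builds this
// is silent, so nonce = base_nonce XOR seq repeats and the AEAD nonce is
// reused. A checked counter surfaces exhaustion as an error instead.

#[derive(Debug, PartialEq)]
enum SealError {
    MessageLimitReached,
}

struct WrappingCtx {
    seq: u32,
}

impl WrappingCtx {
    // Mimics release-mode arithmetic: wraps around with no diagnostic.
    fn next_seq(&mut self) -> u32 {
        let s = self.seq;
        self.seq = self.seq.wrapping_add(1);
        s
    }
}

struct CheckedCtx {
    seq: u64,
}

impl CheckedCtx {
    // Refuses to hand out a sequence number once the counter is exhausted,
    // making nonce reuse via wraparound structurally impossible.
    fn next_seq(&mut self) -> Result<u64, SealError> {
        let s = self.seq;
        self.seq = self.seq.checked_add(1).ok_or(SealError::MessageLimitReached)?;
        Ok(s)
    }
}

fn main() {
    // Fast-forward to the edge instead of looping 2^32 times.
    let mut bad = WrappingCtx { seq: u32::MAX };
    bad.next_seq();
    assert_eq!(bad.seq, 0); // silently back at seq 0: the next nonce repeats

    let mut good = CheckedCtx { seq: u64::MAX };
    assert_eq!(good.next_seq(), Err(SealError::MessageLimitReached));
}
```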

That experience — together with the day-to-day frictions of integrating HPKE through a library that wasn’t built to make those bug classes structurally impossible — is what got us thinking about a rewrite. Three frictions in particular kept showing up.

The provider abstraction. hpke-rs is structured as a generic library over an HpkeCrypto trait, with two backend implementations — RustCrypto and libcrux — shipped as separate crates. The abstraction is real engineering and serves a real purpose: it lets a deployment swap the underlying crypto stack without touching call sites. The cost is that every primitive call goes through a trait dispatch, every Hpke value carries a 320-byte instance struct (most of it a 256-byte ChaCha20 PRNG state), and the workspace is four crates instead of one.

The struct-owned PRNG. Hpke<Crypto>::new constructs and stores a PRNG. That PRNG is reused across operations, which is fine functionally but creates a subtle aliasing hazard: cloning an Hpke does not clone the PRNG state — per the rustdoc — so a careless clone can reset randomness in ways that aren’t visible at the call site. This is the kind of footgun whose damage is invisible until the day it isn’t.

Option<&[u8]> for required-by-mode arguments. hpke.seal(&pk, info, aad, pt, None, None, None) is the canonical Base-mode call. The three Nones are the PSK, PSK ID, and sender static key — all required in the Auth or AuthPsk modes. The single seal method accepts every mode by making mode-specific arguments optional, which means the type system can’t tell you that you’ve built a Base-mode call with a PSK supplied.

These are not catastrophic problems. They’re the kind of small persistent costs you stop noticing until you build the alternative.

The shape change: enum dispatch becomes type-state

The single biggest design difference between hpke-ng and hpke-rs is what carries the ciphersuite. In hpke-rs, the ciphersuite is four runtime enums (Mode, KemAlgorithm, KdfAlgorithm, AeadAlgorithm) constructed at Hpke::new time. In hpke-ng, the ciphersuite is the type itself — Hpke<DhKemX25519HkdfSha256, HkdfSha256, ChaCha20Poly1305> — and the struct body is PhantomData<(K, F, A)>. Zero bytes at runtime; everything resolved at the call site by the compiler.

canonical seal · "encrypt one message"
hpke-rs · 7 args · 3× Option<&[u8]>
let mut hpke = Hpke::<HpkeRustCrypto>::new(
    Mode::Base,
    KemAlgorithm::DhKem25519,
    KdfAlgorithm::HkdfSha256,
    AeadAlgorithm::ChaCha20Poly1305,
);
let kp = hpke.generate_key_pair()?;
let (sk, pk) = kp.into_keys();
let (enc, ct) = hpke.seal(
    &pk, info, aad, pt,
    None, None, None, // psk, psk_id, sk_s
)?;
hpke-ng · 5 args · zero placeholders
type Suite = Hpke<
    DhKemX25519HkdfSha256,
    HkdfSha256,
    ChaCha20Poly1305,
>;
let mut os = OsRng;
let mut rng = os.unwrap_mut();
let (sk, pk) = DhKemX25519HkdfSha256::generate(&mut rng)?;
let (enc, ct) = Suite::seal_base(
    &mut rng, &pk, info, aad, pt,
)?;

The visible consequence is the call site. The invisible consequence is what the compiler can rule out:

Operation                                    | hpke-rs                                | hpke-ng
Hpke::<XWingDraft06, _, _>::seal_auth(...)   | runtime Error::UnsupportedKemOperation | compile: trait bound K: AuthKem not satisfied
Hpke::<_, _, ExportOnly>::seal_base(...)     | runtime HpkeError::InvalidConfig       | compile: no seal_base on ExportOnly
Wrong KemAlgorithm for a private key         | runtime mismatch error at setup_*      | compile: key types are KEM-tagged
Base-mode call with Some(psk) argument       | runtime HpkeError::UnnecessaryPsk      | compile: seal_base has no PSK parameter

Each row is a runtime error in hpke-rs and a compiler diagnostic in hpke-ng. The X-Wing/seal_auth line is the cleanest example: X-Wing is a KEM, but it isn’t a Diffie-Hellman KEM, so it has no notion of authenticated encapsulation. In hpke-rs, calling seal_auth on an X-Wing-configured Hpke returns Error::UnsupportedKemOperation at runtime. In hpke-ng, the trait bound on seal_auth requires K: AuthKem, and XWingDraft06 does not implement AuthKem — so the call doesn’t compile. The same shape applies to the other rows.

This isn’t theoretical. We’ve seen each of those four shapes in production code reviews — typically as a stale match arm catching the runtime error and turning it into a generic 500. Surfacing them as compile errors deletes a class of code path entirely.
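The mechanism behind those compile errors is ordinary Rust: a marker trait gates the method, so an unsupported operation has no callable function to misuse. A minimal sketch of the pattern, with toy types standing in for hpke-ng's real API:

```rust
use std::marker::PhantomData;

// Toy KEM types standing in for the real ones.
struct DhKemX25519;
struct XWing;

trait Kem {}
impl Kem for DhKemX25519 {}
impl Kem for XWing {}

// Marker trait: only DH-based KEMs have a notion of authenticated
// encapsulation, so only they implement it.
trait AuthKem: Kem {}
impl AuthKem for DhKemX25519 {}
// No `impl AuthKem for XWing`: X-Wing has no sender-static DH step.

struct Hpke<K: Kem>(PhantomData<K>);

impl<K: Kem> Hpke<K> {
    // Available for every KEM.
    fn seal_base() -> &'static str {
        "base-mode seal"
    }
}

impl<K: AuthKem> Hpke<K> {
    // Only exists when K: AuthKem. `Hpke::<XWing>::seal_auth()` is not a
    // runtime error; the method simply does not exist, so it won't compile.
    fn seal_auth() -> &'static str {
        "auth-mode seal"
    }
}

fn main() {
    assert_eq!(Hpke::<DhKemX25519>::seal_auth(), "auth-mode seal");
    assert_eq!(Hpke::<XWing>::seal_base(), "base-mode seal");
    // Hpke::<XWing>::seal_auth(); // compile error: `XWing: AuthKem` not satisfied
}
```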

Feature parity

Before going further: the obvious skeptical question is what got cut? The answer is one thing, deliberately, and it’s the thing that buys the wins everywhere else.

DH KEMs (X25519, X448, P-256, P-384, P-521, secp256k1): hpke-rs ✓ all 6 · hpke-ng ✓ all 6
Post-quantum KEMs (X-Wing draft-06, ML-KEM-768, ML-KEM-1024): hpke-rs ✓ all 3 (experimental, upstream-flagged unstable) · hpke-ng ✓ all 3 (pq feature)
KDFs (HKDF-SHA-256 / -384 / -512): hpke-rs ✓ all 3 · hpke-ng ✓ all 3
AEADs (AES-128-GCM, AES-256-GCM, ChaCha20-Poly1305, ExportOnly): hpke-rs ✓ all 4 · hpke-ng ✓ all 4
Modes (Base, Psk, Auth, AuthPsk): hpke-rs ✓ all 4 · hpke-ng ✓ all 4
RFC 9180 KAT conformance (full vendored vector set): hpke-rs ✓ pass · hpke-ng ✓ pass
Typed HpkeError variants: hpke-rs 11 · hpke-ng 14
Compile-time-rejected operation classes: hpke-rs 0 · hpke-ng 4
Pluggable crypto provider (RustCrypto / libcrux backends): hpke-rs ✓ both · hpke-ng RustCrypto only

hpke-ng supports every ciphersuite hpke-rs's RustCrypto provider supports — six DH-based KEMs, three post-quantum KEMs, three KDFs, four AEADs (including ExportOnly), all four HPKE modes — and the full RFC 9180 KAT vector set passes against both libraries. There are 14 typed HpkeError variants instead of 11, and four classes of operation that are compile errors instead of runtime errors. On every dimension that affects what your application can express, hpke-ng is a superset.

The one thing hpke-rs has that hpke-ng deliberately doesn’t: a pluggable crypto provider. hpke-rs ships an HpkeCrypto trait with two backend implementations, RustCrypto and libcrux; a deployment can swap one for the other at compile time. hpke-ng ships one provider, RustCrypto, and removes the abstraction. That’s the load-bearing tradeoff. It’s why Hpke<...> is zero-sized, why Context::seal is monomorphized rather than dispatched, why the workspace is one crate, and why the canonical call site is five arguments instead of seven.

The libcrux half of that tradeoff is, in our view, not a feature worth chasing. Our February audit of Cryspen’s “formally verified” libraries found undisclosed silent cryptographic failures in libcrux-ml-dsa (platform-dependent SHA-3 corruption that was patched without a public advisory), entropy-reducing pre-hash clamping in Ed25519, and a denial-of-service panic in libcrux-psq’s AES-GCM decryption path. Cryspen’s public response acknowledged the findings without retracting the “highest level of assurance” marketing language attached to the library; our follow-up analysis and an independent paper have since surfaced further specification deviations in libcrux-ml-dsa. The “formally verified” framing that draws users to libcrux does not, on the evidence we have assembled, describe the engineering you are getting. If you’re using RustCrypto anyway — which is what hpke-ng does by default and what most hpke-rs deployments do in practice — the provider abstraction is paying rent for a backend you would be wise to avoid.

Speed

The full benchmark protocol is in cargo bench --features comparative in the hpke-ng repository: criterion harness, sample size 40–60, 2–3 second measurement window, RUSTFLAGS="-C target-cpu=native", lto = "thin", codegen-units = 1. Apple Silicon M-series, macOS. The numbers below are the median across two independent bench runs. The post-quantum rows enable hpke-rs-rust-crypto’s experimental feature flag — without it, hpke-rs’s PQ KEMs return UnsupportedKemOperation at runtime; the upstream comment on the gate is “broken and pre-releases. Disabling them until they are stable.”

The wins concentrate in four places: the post-quantum KEM path (the largest deltas in the entire dataset, peaking at −55% on ML-KEM decap), the classical KEM encap/decap path, the single-shot open path, and the ChaCha20 setup paths. These are the rows where hpke-rs’s per-call overhead — trait dispatch, allocator pressure, an enum match per primitive, a from-seed key reconstruction on every PQ decap — adds measurable framing cost on top of the underlying crypto. Start with the KEM operations on X25519:

KEM operations · X25519

  operation   | hpke-rs  | hpke-ng  | delta
  generate    |  7.69 µs |  8.94 µs | +16%
  derive_key  |  8.79 µs |  9.39 µs |  +7%
  encap       | 38.00 µs | 34.33 µs | −10%
  decap       | 36.73 µs | 21.47 µs | −41%

Lower is faster.

The decap row tells the load-bearing story: hpke-ng caches the recipient’s serialized public-key bytes alongside the secret, so decap no longer pays a base-point scalar multiplication just to rebuild the recipient-PK piece of kem_context on every call. hpke-rs runs that scalar mult per decap. Encap is 10% faster from the same monomorphization trick — the LabeledExtract + LabeledExpand chain wrapping the raw Diffie-Hellman compiles to direct calls, where hpke-rs routes each through its enum-dispatched provider trait. The generate and derive_key_pair rows go the other way (+16% and +7%) because the public-key bytes are now computed and stored at construction time — a one-time cost paid for the per-call decap savings, on a one-call-per-keypair operation that doesn’t appear on any hot path. The same scalar-mult-saving trick lands much harder on the post-quantum decap rows we’ll see shortly.
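The caching pattern behind that decap win is simple enough to sketch. The code below is illustrative only: `derive_public` is a cheap stand-in for the real base-point scalar multiplication, and the struct names are hypothetical, not hpke-ng's actual types. The structural point is to pay the derivation once at construction rather than once per decap:

```rust
// Sketch of the pk-bytes caching pattern. `derive_public` stands in for
// the expensive X25519 base-point scalar multiplication; any deterministic
// function of the secret illustrates the shape.

fn derive_public(secret: &[u8; 32]) -> [u8; 32] {
    let mut pk = [0u8; 32];
    for (i, b) in secret.iter().enumerate() {
        pk[i] = b.rotate_left(3) ^ 0x5a;
    }
    pk
}

struct PrivateKey {
    secret: [u8; 32],
    // Cached at construction so `decap` can build kem_context without
    // re-deriving the recipient public key on every call.
    pk_bytes: [u8; 32],
}

impl PrivateKey {
    fn new(secret: [u8; 32]) -> Self {
        // One-time cost at keygen (the +16% on generate above), repaid on
        // every subsequent decap (the −41%).
        let pk_bytes = derive_public(&secret);
        Self { secret, pk_bytes }
    }

    // RFC 9180 DHKEM: kem_context = enc || pkRm. No scalar mult here anymore.
    fn kem_context(&self, enc: &[u8]) -> Vec<u8> {
        let mut ctx = Vec::with_capacity(enc.len() + 32);
        ctx.extend_from_slice(enc);
        ctx.extend_from_slice(&self.pk_bytes);
        ctx
    }
}

fn main() {
    let sk = PrivateKey::new([7u8; 32]);
    let ctx = sk.kem_context(&[1, 2, 3]);
    assert_eq!(ctx.len(), 35);
    assert_eq!(&ctx[..3], &[1, 2, 3]);
    // The cache always agrees with a fresh derivation.
    assert_eq!(sk.pk_bytes, derive_public(&sk.secret));
}
```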

Setup is the combined fixed cost paid every time a sender or receiver context is constructed: KEM op + key schedule + Context allocation. The wins here are consistent over ChaCha20:

Setup paths · sender / receiver / PSK

  suite           | operation       | hpke-rs   | hpke-ng   | delta
  X25519+ChaCha20 | sender (Base)   |  38.62 µs |  34.17 µs | −12%
  X25519+ChaCha20 | receiver (Base) |  36.84 µs |  25.45 µs | −31%
  X25519+ChaCha20 | sender (PSK)    |  38.14 µs |  33.86 µs | −11%
  X25519+AES-128  | sender (Base)   |  40.29 µs |  34.35 µs | −15%
  P-256+AES-128   | sender (Base)   | 156.57 µs | 145.59 µs |  −7%
  K256+ChaCha20   | sender (Base)   |  54.32 µs |  47.67 µs | −12%

Receiver-side wins are larger than sender-side because both `decap` and the post-decap key schedule benefit from the cached public-key bytes.

The post-quantum KEMs add a structural dimension to the comparison that the classical KEMs don’t have. hpke-ng stores MlKem* private keys as the 64-byte (d, z) seed plus the materialized FIPS 203 expanded decapsulation key — built once at construction, kept in the PrivateKey wrapper, reused on every decap. hpke-rs stores only the seed and rebuilds the expanded key on every setup_receiver call by re-running FIPS 203 KeyGen_internal. That single design choice is the load-bearing reason ML-KEM decap shows up at −54% to −55% — the largest single deltas in the dataset. ML-KEM-768 first:

KEM operations · ML-KEM-768

  operation   | hpke-rs  | hpke-ng  | delta
  generate    | 17.18 µs | 17.62 µs |  +3%
  derive_key  | 14.49 µs | 14.17 µs |  −2%
  encap       | 23.54 µs | 14.84 µs | −37%
  decap       | 38.86 µs | 17.53 µs | −55%

decap (−55%) is the largest delta in the suite, encap (−37%) close behind — hpke-ng caches the expanded decapsulation key in the private key and the parsed encapsulation key in the public key; hpke-rs rebuilds both from raw bytes on every call.

ML-KEM-1024 — the higher-security parameter set — shows the same shape, larger absolute numbers, same architectural delta in the same place:

KEM operations · ML-KEM-1024

  operation   | hpke-rs  | hpke-ng  | delta
  generate    | 27.72 µs | 28.00 µs |  +1%
  derive_key  | 23.19 µs | 24.06 µs |  +4%
  encap       | 33.34 µs | 23.46 µs | −30%
  decap       | 58.27 µs | 27.24 µs | −53%

Same pattern as ML-KEM-768 at a higher security level: encap −30%, decap −53%. The +4% on `derive_key` is the cost of cloning the larger ML-KEM-1024 expanded encapsulation key into the public-key wrapper at construction.

X-Wing — the X25519 + ML-KEM-768 hybrid — picks up the same DecapsulationKey-caching trick we applied to ML-KEM, plus the parsed EncapsulationKey cache on the encap side. The decap delta lands close to the ML-KEM rows:

KEM operations · X-Wing draft-06

  operation   | hpke-rs   | hpke-ng  | delta
  generate    |  38.94 µs | 38.52 µs |  −1%
  derive_key  |  35.81 µs | 37.60 µs |  +5%
  encap       |  64.94 µs | 56.06 µs | −14%
  decap       | 115.56 µs | 72.25 µs | −38%

decap −38%: the construction-side `expand_key` that `DecapsulationKey::from(seed)` runs (SHAKE-256 + ML-KEM-768 keygen) is now amortized to once per private key. The +5% on `derive_key` is the cost of cloning the parsed `EncapsulationKey` into the public-key wrapper at construction.

The same pattern propagates to setup paths. setup_sender is dominated by encap; setup_receiver is dominated by decap; the ML-KEM rows widen accordingly:

Setup paths · post-quantum (HKDF-SHA-256 + ChaCha20-Poly1305)

  suite       | operation       | hpke-rs   | hpke-ng  | delta
  X-Wing      | sender (Base)   |  65.30 µs | 60.14 µs |  −8%
  X-Wing      | receiver (Base) | 116.79 µs | 77.45 µs | −34%
  ML-KEM-768  | sender (Base)   |  23.77 µs | 18.91 µs | −20%
  ML-KEM-768  | receiver (Base) |  39.80 µs | 21.86 µs | −45%
  ML-KEM-1024 | sender (Base)   |  33.82 µs | 27.85 µs | −18%
  ML-KEM-1024 | receiver (Base) |  62.17 µs | 31.33 µs | −50%

18 post-quantum head-to-heads across KEM ops + setup: 12 wins, 4 ties, 2 losses (both on `derive_key_pair`, where caching the expanded keys at construction is paid back on every subsequent encap/decap).

The single-shot open path — setup_receiver + Context::open for one message — is where hpke-ng wins most consistently across payload size. Six rows over four orders of magnitude, every row is hpke-ng:

Single-shot open · X25519 + HKDF-SHA-256 + ChaCha20-Poly1305

  payload | hpke-rs   | hpke-ng   | delta
  64 B    |  40.30 µs |  26.07 µs | −35%
  256 B   |  38.67 µs |  26.32 µs | −32%
  1 KiB   |  39.23 µs |  27.77 µs | −29%
  4 KiB   |  44.96 µs |  31.88 µs | −29%
  16 KiB  |  65.58 µs |  50.61 µs | −23%
  64 KiB  | 137.07 µs | 127.55 µs |  −7%

Lower is faster. Open inherits the full setup-receiver win — including the cached `pk_bytes` shaving a scalar mult off every `decap` — so the small-payload regime where setup dominates picks up the largest deltas.

The AES-128-GCM seal sweep shows hpke-ng 6–12% ahead from 64 B through 16 KiB — the cached AES cipher state (round keys plus the GHash precomputed table, built once at key-schedule time) eliminates the per-call expansion that hpke-rs runs on every seal — then converges to tied at 64 KiB as the AEAD primitive’s bulk-encryption cost dominates:

Single-shot seal · X25519 + HKDF-SHA-256 + AES-128-GCM

  payload | hpke-rs   | hpke-ng   | delta
  64 B    |  39.73 µs |  35.38 µs | −11%
  256 B   |  41.03 µs |  36.33 µs | −12%
  1 KiB   |  42.94 µs |  38.56 µs | −10%
  4 KiB   |  53.94 µs |  49.64 µs |  −8%
  16 KiB  |  99.41 µs |  93.05 µs |  −6%
  64 KiB  | 274.23 µs | 280.15 µs |  +2%

At 64 KiB the AES-NI bulk encryption cost dominates and the framing delta vanishes.

The X25519+ChaCha20 seal sweep — the same shape but with a different AEAD — now lands 7–18% in hpke-ng’s favour from 16 B through 64 KiB, then converges to tied at 256 KiB where the AEAD primitive both libraries call identically dominates the wall time. The most diagnostic single benchmark in the suite is post-setup Context::seal at the two ends of that spectrum:

Context::seal · 64 B (framing-dominant regime)
  hpke-rs 260 ns · hpke-ng 221 ns · −15% (framing only, same crypto)

Context::seal · 16 KiB (primitive-dominant regime)
  hpke-rs 24.60 µs · hpke-ng 24.46 µs · tied (at the primitive's ceiling)

This is the chart we kept coming back to during development. At 64 bytes, where framing dominates, hpke-ng is 15% faster: 221 ns versus 260 ns. The framing path inside hpke-ng’s Context::seal is a fixed-size 12-byte stack array for the nonce, an XOR loop, and a direct AEAD call — no allocations, and the AEAD cipher state is built once at key-schedule time and reused. hpke-rs allocates a fresh Vec<u8> per nonce computation and reconstructs cipher state from raw key bytes on every call.

At 16 KiB, where the AEAD primitive dominates, both libraries converge to identical wall time. They share the same primitive crate, so this is the ceiling: the wall-clock cost of ChaCha20Poly1305::encrypt_in_place_detached on this hardware, which neither library can dip below because both are calling the same code. hpke-ng matches it exactly — there’s no overhead left to remove, and the framing is as thin as the standard allows.
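The nonce path itself is small enough to show. RFC 9180 §5.2 defines the per-message nonce as the base nonce XORed with the big-endian sequence number; a stack-only sketch of that computation (the surrounding AEAD call is omitted):

```rust
// RFC 9180 §5.2: ComputeNonce(seq) = base_nonce XOR I2OSP(seq, Nn).
// Fixed-size stack array, one XOR loop, no heap allocation.

const NONCE_LEN: usize = 12; // Nn for AES-GCM and ChaCha20-Poly1305

fn compute_nonce(base_nonce: &[u8; NONCE_LEN], seq: u64) -> [u8; NONCE_LEN] {
    let mut nonce = *base_nonce;
    // Big-endian encoding of seq, right-aligned in the 12-byte nonce
    // (the upper 4 bytes of the nonce are XORed with zero).
    let seq_bytes = seq.to_be_bytes();
    for (n, s) in nonce[NONCE_LEN - 8..].iter_mut().zip(seq_bytes) {
        *n ^= s;
    }
    nonce
}

fn main() {
    let base = [0xab; NONCE_LEN];
    // seq = 0 leaves the base nonce untouched.
    assert_eq!(compute_nonce(&base, 0), base);
    // Distinct sequence numbers must never collide.
    assert_ne!(compute_nonce(&base, 1), compute_nonce(&base, 2));
    // seq = 1 flips exactly the low bit of the last byte.
    let n1 = compute_nonce(&base, 1);
    assert_eq!(n1[NONCE_LEN - 1], 0xab ^ 0x01);
}
```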

Memory

Memory has two halves: the configuration and per-context state — where hpke-ng is uniformly smaller — and the per-key footprint, where the post-quantum KEMs introduce a deliberate tradeoff in the other direction.

sizeof(Hpke<K, F, A>): hpke-rs 344 B · hpke-ng 0 B (PhantomData) · −344 B (zero-sized)
sizeof(Context<_, _, ChaCha20Poly1305>): hpke-rs 400 B · hpke-ng 88 B · −312 B (−78%)
sizeof(Context<_, _, Aes128Gcm>): hpke-rs 400 B · hpke-ng 792 B · +392 B (cached AES round keys + GHash)
sizeof(Context<_, _, Aes256Gcm>): hpke-rs 400 B · hpke-ng 1,048 B · +648 B (cached AES round keys + GHash)

hpke-ng::Hpke<K, F, A> is PhantomData<(K, F, A)>. There is no runtime presence; it costs zero bytes and cargo expand confirms the compiler optimizes it out completely. hpke-rs::Hpke<Crypto> carries a 256-byte ChaCha20 PRNG plus four enum discriminants and padding (344 bytes measured at the time of writing).
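The zero-byte claim is checkable with std::mem::size_of on a toy version of the same shape (stand-in type names, not hpke-ng's real ones):

```rust
use std::marker::PhantomData;

// Toy stand-ins for the suite's type parameters.
struct DhKemX25519HkdfSha256;
struct HkdfSha256;
struct ChaCha20Poly1305;

// Same shape as hpke-ng's suite type: the ciphersuite lives entirely in
// the type parameters, so the struct body is PhantomData and a value of
// this type occupies zero bytes at runtime.
struct Hpke<K, F, A>(PhantomData<(K, F, A)>);

fn main() {
    assert_eq!(
        std::mem::size_of::<Hpke<DhKemX25519HkdfSha256, HkdfSha256, ChaCha20Poly1305>>(),
        0
    );
    // PhantomData also keeps alignment at 1, so the type imposes no
    // layout cost when embedded in other structs.
    assert_eq!(
        std::mem::align_of::<Hpke<DhKemX25519HkdfSha256, HkdfSha256, ChaCha20Poly1305>>(),
        1
    );
}
```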

Context is where hpke-ng’s per-AEAD specialization shows up. With ChaCha20Poly1305 the cipher state is just the 32-byte key, so Context is 88 bytes — under a quarter of hpke-rs’s 400, since hpke-rs’s Context carries a per-instance PRNG plus trait-object overhead that hpke-ng’s monomorphized design doesn’t need. With AES-GCM the trade goes the other way: hpke-ng caches the expanded round keys plus the precomputed GHash table inline so that Context::seal doesn’t pay key-schedule expansion on every call. That’s the load-bearing reason AES-128 single-shot seal is 6–12% faster across small and mid payloads. Streaming AES applications get the throughput; ChaCha20 deployments stay at the small footprint.

Practical impact: an application that holds a thousand long-lived ChaCha20 contexts — a server with persistent client sessions, a relay, MLS group state — saves ~310 KB of resident memory over hpke-rs. An AES-128-GCM deployment with the same shape pays ~390 KB of additional context state for the per-call seal-side speedup. Whether that’s a good trade is application-specific, and it’s now an explicit choice the type system makes visible.

The post-quantum and PK-cache tradeoff: hpke-ng private keys are larger across the board

Per-key footprint is where the speed/memory trade lands explicitly. Every KEM private key in hpke-ng now caches material that hpke-rs reconstructs from raw bytes on demand: the recipient’s serialized public key for the DH-KEMs (so decap doesn’t recompute it via base-point scalar multiplication), the expanded x_wing::DecapsulationKey for X-Wing (so decap doesn’t re-run SHAKE-256 + ML-KEM-768 keygen), and the materialized FIPS 203 decapsulation key for ML-KEM (same trick at the parameter-set boundary). The size impact is uniform and easy to reason about:

KEM private key footprint · stack + heap, in bytes

  KEM         | hpke-rs | hpke-ng | delta
  X25519      |    56 B |    88 B |    +32 B
  P-256       |    56 B |   121 B |    +65 B
  X-Wing      |    56 B | 1,698 B | +1,642 B
  ML-KEM-768  |    88 B | 3,266 B | +3,178 B
  ML-KEM-1024 |    88 B | 4,290 B | +4,202 B

Every row spends extra memory at private-key construction in exchange for not paying the same work on every subsequent `decap`. That is the explicit memory cost of the −38% to −55% decap deltas.

So the trade is concrete: an extra 32–65 B per DH private key, ~1.7 KB extra per X-Wing private key, ~3.2 KB extra per ML-KEM-768 private key, and ~4.2 KB extra per ML-KEM-1024 private key. In exchange the recipient skips a base-point scalar mult per DH decap, a SHAKE-256 + ML-KEM-768 keygen per X-Wing decap, and a FIPS 203 KeyGen_internal per ML-KEM decap. For a server pinning a few thousand long-lived receiver keys this is good arithmetic in essentially every setting we can think of, a possibly-bad trade on a microcontroller pinning very few keys, and we’d rather you have the numbers than guess.

Public keys grow in the same direction on the post-quantum side — hpke-ng now caches the parsed EncapsulationKey alongside the wire bytes so encap doesn’t re-decode the 1,184/1,568-byte payload on every call. X-Wing public keys are 1,656 B, ML-KEM-768 public keys are 1,624 B, ML-KEM-1024 public keys are 2,136 B (vs hpke-rs’s Vec<u8> of just the wire bytes). DH public keys are unchanged: 32 B for X25519, 96 B for P-256 (uncompressed encoded point form).

Smaller

Project surface area

  metric                              | hpke-rs | hpke-ng | delta
  End-user binary (stripped, release) |  561 KB |  392 KB | −30%
  Total project code (cloc)           |   5,631 |   4,817 | −14%
  Library source (cloc)               |   2,623 |   2,426 |  −8%
  Test code (cloc)                    |   2,230 |   1,124 | −50%
  Bench code (cloc)                   |     759 |   1,179 | +55%
  Crates in workspace                 |       4 |       1 | −75%
  User-facing Cargo features          |      11 |       7 | −36%

Expanding on the above:

End-user binary, −30%. A minimal application — generate a key, seal a message, open it back, ten lines of Rust — compiled with RUSTFLAGS="-C target-cpu=native", lto="thin", codegen-units=1, strip="symbols". hpke-ng comes in at 392 KB; hpke-rs at 561 KB. 168 KB is not nothing on embedded targets, in WASM bundles, or in CDN-served binaries.

Library code, −8%. hpke-ng is 197 lines smaller than hpke-rs at the library level (2,426 vs 2,623), while implementing strictly more — the full HPKE surface plus the optional post-quantum suite (X-Wing draft-06, ML-KEM-768, ML-KEM-1024) that hpke-rs reaches only via experimental feature flags. The type-state design earns its keep here: ciphersuite selection lives entirely in the type system, so there is no provider trait, no per-primitive enum dispatch, and no glue between the two. Inline #[cfg(test)] modules are kept tight — anything covered by tests/roundtrip.rs’s 59-cell macro matrix is deleted from src/.

Test code, −50%. hpke-rs’s test suite is 167 tests in 2,230 lines; hpke-ng’s is 128 tests in 1,124 lines, with deeper coverage on roundtrips (59 macro-generated (mode, KEM, KDF, AEAD) combinations vs hpke-rs’s 17 hand-written cases). The reduction is structural — the type system carries information that would otherwise be repeated test setup, and the roundtrip! macro generates one test per supported configuration from a single declaration.

Bench code, +55%. This is the one row that runs against the section’s grain, and it’s deliberate. hpke-ng’s bench harness has grown to 1,179 lines while hpke-rs’s stays at 759 — the extra 420 lines are coverage, not waste. A single cargo bench --features comparative reproduces every head-to-head number in this post, against a real hpke-rs install pulled in as a dev-dependency, including the full post-quantum suite (X-Wing, ML-KEM-768, ML-KEM-1024) that hpke-rs ships only behind an experimental feature flag. hpke-rs distributes its bench code across twelve per-provider files; hpke-ng keeps the comparative numbers in one place, which is what made the 62-row table at the top of this post possible at all.

The one place this chart doesn’t reach is the fuzz harnesses. hpke-ng spends substantially more code on cargo-fuzz targets than hpke-rs does — deliberately so. That’s the next section.

Harder

cargo test · --features pq,kat-internals,differential · 128 / 128 passing

  unit (lib)               46/46
  RFC 9180 KAT             13/13
  roundtrip matrix         59/59
  differential vs hpke-rs   8/8
  doctests                  2/2

Total wall time: 1.9 s, roughly 37× faster than hpke-rs's KAT runner. cargo-fuzz targets: 4 in hpke-ng vs 1 in hpke-rs, all 4 running clean.

The 1.9-second figure is the headline number. The full hpke-ng test matrix — 128 tests across library unit tests, RFC 9180 KAT, generative roundtrips across every ciphersuite × mode combination, and byte-by-byte differential vs hpke-rs — runs in about 1.9 seconds. The roundtrip layer alone is 59 macro-generated tests covering every supported (mode, KEM, KDF, AEAD) combination including all four post-quantum and X-Wing/ML-KEM rows. hpke-rs’s KAT runner is structured as a single test that iterates 144 vectors sequentially and takes about 70 seconds. Same vectors, same coverage, structured to take advantage of cargo test’s thread pool.

For day-to-day development the headline number understates the impact. A 70-second feedback loop is one you avoid running until you’re “done”; a 1.9-second feedback loop is one you run after every save.
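The structural difference is easy to sketch: instead of one test function iterating every vector sequentially, fan the vectors out so independent checks run concurrently. The toy below uses std::thread directly to make the point runnable; cargo test achieves the same effect by scheduling one test per vector on its thread pool. `check_vector` is a placeholder invariant, not a real KAT comparison:

```rust
use std::thread;

// Stand-in for checking one RFC 9180 known-answer vector.
fn check_vector(v: u64) -> bool {
    // Placeholder invariant in place of the real byte-equality checks.
    v.wrapping_mul(3).wrapping_sub(v.wrapping_mul(2)) == v
}

// The sequential shape: one loop over every vector, no parallelism.
fn run_sequential(vectors: &[u64]) -> usize {
    vectors.iter().filter(|&&v| check_vector(v)).count()
}

// The parallel shape: partition the vectors across worker threads and
// sum the per-chunk pass counts. Coverage is identical.
fn run_parallel(vectors: &[u64], threads: usize) -> usize {
    thread::scope(|s| {
        let chunk = vectors.len().div_ceil(threads);
        let handles: Vec<_> = vectors
            .chunks(chunk)
            .map(|part| s.spawn(move || part.iter().filter(|&&v| check_vector(v)).count()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let vectors: Vec<u64> = (0..144).collect();
    // Same coverage either way; the parallel version just finishes sooner
    // when each individual check is expensive.
    assert_eq!(run_sequential(&vectors), 144);
    assert_eq!(run_parallel(&vectors, 8), 144);
}
```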

The fuzz layer is where hpke-ng makes its biggest investment in lines of code. hpke-rs ships one cargo-fuzz target — a seal/open harness for one ciphersuite. hpke-ng ships four:

  • pk_from_bytes — fuzzes public-key parsing for all 9 KEMs (X25519, X448, P-256, P-384, P-521, secp256k1, X-Wing, ML-KEM-768, ML-KEM-1024).
  • enc_from_bytes — fuzzes encapsulated-key parsing for all 9 KEMs.
  • key_schedule — fuzzes the internal key schedule with arbitrary mode bytes (including invalid 0x04..=0xFF mode values), arbitrary PSK / PSK-ID combinations, and arbitrary shared secrets.
  • open — fuzzes Hpke::open_base with arbitrary [encap || ciphertext] byte splits against a fixed receiver keypair.

The shared invariant across all four is that panics are bugs. Authentication failures, decode errors, length mismatches — all expected outcomes that the harness considers a successful run. A panic, a misaligned-pointer fault, or a debug-assertion failure under cargo-fuzz’s instrumentation is a finding. As of release, all four targets run clean.

There are also several structural footgun-prevention details that don’t show up in the fuzz output but are worth listing.

Context is not Clone. Cloning an HPKE context lets two callers reuse the same (key, base_nonce, seq) triple and produce a nonce-reuse bug — the kind of bug that’s invisible until it isn’t and unrecoverable when it surfaces. hpke-ng’s Context deliberately doesn’t implement Clone; cloning is a compile error.

Context::seal refuses to encrypt at seq == u64::MAX. A pre-check, before nonce computation. This makes nonce-reuse via counter wraparound structurally impossible regardless of how the caller handles a MessageLimitReached error.

All-zero shared-secret rejection (RFC 9180 §7.1.4) uses subtle::ConstantTimeEq for X25519 and X448. An attacker who supplies a small-order point and watches for timing variance is one of the more subtle attacks on DHKEM; the constant-time comparison closes it.
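The constant-time shape matters because an early-exit comparison leaks, through timing, how many leading bytes were zero. hpke-ng uses subtle::ConstantTimeEq; the dependency-free sketch below shows the same idea by accumulating over every byte before deciding. Function names are illustrative, not hpke-ng's API:

```rust
// Sketch of a constant-time all-zero check (the real code uses
// subtle::ConstantTimeEq). OR every byte into an accumulator so the
// running time does not depend on where the first nonzero byte sits.
fn is_all_zero_ct(bytes: &[u8]) -> bool {
    let mut acc = 0u8;
    for &b in bytes {
        acc |= b;
    }
    acc == 0
}

// RFC 9180 §7.1.4: reject the shared secret if it is all zeros, which is
// what a low-order or identity public key forces on X25519/X448.
fn validate_shared_secret(ss: &[u8; 32]) -> Result<(), &'static str> {
    if is_all_zero_ct(ss) {
        Err("zero shared secret: low-order or identity peer key")
    } else {
        Ok(())
    }
}

fn main() {
    assert!(validate_shared_secret(&[0u8; 32]).is_err());
    let mut ss = [0u8; 32];
    ss[31] = 1; // a nonzero byte anywhere makes the secret acceptable
    assert!(validate_shared_secret(&ss).is_ok());
}
```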

AEAD nonce length is enforced at compile time. Context::compute_nonce uses a const assertion that the AEAD’s NONCE_LEN is between 8 and 12 bytes. Any AEAD that violates this is a compile error at the call site that uses it.
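A const assertion on a generic associated const is a standard Rust pattern; here is a hedged sketch of it with a toy Aead trait (not hpke-ng's real one). Referencing the const forces evaluation during monomorphization, so an out-of-range nonce length fails the build at the call site rather than panicking at runtime:

```rust
// Sketch of a compile-time bound on an associated const. The trait and
// helper names are illustrative only.

trait Aead {
    const NONCE_LEN: usize;
}

struct ChaCha20Poly1305;
impl Aead for ChaCha20Poly1305 {
    const NONCE_LEN: usize = 12;
}

struct NonceLenOk<A: Aead>(std::marker::PhantomData<A>);
impl<A: Aead> NonceLenOk<A> {
    // Evaluated lazily, per monomorphization, when referenced below.
    const CHECK: () = assert!(
        A::NONCE_LEN >= 8 && A::NONCE_LEN <= 12,
        "AEAD nonce length out of supported range"
    );
}

fn compute_nonce_len<A: Aead>() -> usize {
    let () = NonceLenOk::<A>::CHECK; // forces the compile-time assertion
    A::NONCE_LEN
}

fn main() {
    assert_eq!(compute_nonce_len::<ChaCha20Poly1305>(), 12);
    // An AEAD declaring NONCE_LEN = 16 would make this function fail to
    // compile at its call site, not fail at runtime.
}
```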

Interop

We tested interop two ways. The first is RFC 9180 known-answer tests: both libraries are run against the same vendored test vector JSON (8 MB, derived from RFC 9180’s own test vector tooling) and required to produce byte-equal key, base_nonce, exporter_secret, decrypted ciphertexts, and exported values for every vector. The second is byte-by-byte differential testing: a deterministic ChaCha20Rng feeds identical inputs to both libraries; hpke-ng plays sender, hpke-rs plays receiver, and every byte that crosses the wire is asserted equal. Roughly 600 byte-equality assertions per CI run, all passing.

Ciphersuite                                   | RFC 9180 KAT                  | Differential vs hpke-rs
DHKEM(X25519, SHA-256) × ChaCha20-Poly1305    | ✓ Base / Psk / Auth / AuthPsk | ✓ Base + Psk
DHKEM(X25519, SHA-256) × AES-128-GCM          | ✓ Base / Psk                  | ✓ Base
DHKEM(X25519, SHA-256) × AES-256-GCM          | ✓ Base / Psk                  | ✓ Base
DHKEM(X25519, SHA-256) × ExportOnly           | ✓ Base / Psk                  | — no AEAD
DHKEM(P-256, SHA-256) × ChaCha20-Poly1305     | ✓ Base / Psk                  | ✓ via KAT
DHKEM(P-256, SHA-256) × AES-128-GCM           | ✓ Base / Psk / Auth / AuthPsk | ✓ Base + Psk
DHKEM(P-521, SHA-512) × AES-256-GCM           | ✓ Base / Psk / Auth / AuthPsk | — hpke-rs/RustCrypto unsupported
DHKEM(secp256k1, SHA-256) × ChaCha20-Poly1305 | ✓ Base / Psk                  | ✓ via KAT
DHKEM(X448, SHA-512) × ChaCha20-Poly1305      | ✓ Base / Psk                  | — hpke-rs/RustCrypto unsupported
ML-KEM-768 × ChaCha20-Poly1305                | — no RFC vectors              | — seed-derivation differs by design
X-Wing draft-06 × ChaCha20-Poly1305           | — no RFC vectors              | — seed-derivation differs by design

Some gaps worth covering, for full disclosure:

Auth and AuthPsk differential. hpke-rs’s seed() injects raw bytes for the base ephemeral; for Auth modes there is also a sender static keypair derived earlier, before any seed-injection happens. Aligning the two libraries’ state for byte-by-byte Auth-mode differential testing would require a deeper hpke-rs API hook than hpke-test-prng exposes. Auth/AuthPsk-mode interop is verified at the KAT layer instead — both libraries pass the X25519+ChaCha20 Auth/AuthPsk vectors, the P-256+AES-128 vectors, and the P-521+AES-256 vectors.

Post-quantum differential. hpke-rs’s X-Wing and ML-KEM implementations use different SHAKE-256 seeding from hpke-ng’s RFC 9180 §7.1.3-compliant derive_key_pair construction, so the libraries produce different ephemeral keys from the same IKM bytes. The encap wire format itself is determined by the underlying KEM crate (which both use), so they should agree at the wire level — but we haven’t tested it in this repository, and we’d rather call that out than pretend otherwise.

Migrate today

Both libraries pass the same KATs against the same primitive crates. If your code uses HPKE through a small wrapper — which most production HPKE code does — switching is mechanical:

// hpke-rs
let mut hpke = Hpke::<HpkeRustCrypto>::new(
    Mode::Base,
    KemAlgorithm::DhKem25519,
    KdfAlgorithm::HkdfSha256,
    AeadAlgorithm::ChaCha20Poly1305,
);
let kp = hpke.generate_key_pair()?;
let (sk, pk) = kp.into_keys();
let (enc, ct) = hpke.seal(&pk, info, aad, pt, None, None, None)?;

// hpke-ng
type Suite = Hpke<DhKemX25519HkdfSha256, HkdfSha256, ChaCha20Poly1305>;
let mut os = OsRng;
let mut rng = os.unwrap_mut();
let (sk, pk) = DhKemX25519HkdfSha256::generate(&mut rng)?;
let (enc, ct) = Suite::seal_base(&mut rng, &pk, info, aad, pt)?;

The most common migration shape: define a type Suite = Hpke<…, …, …> alias once, change hpke.seal calls to Suite::seal_base (or seal_psk / seal_auth / seal_auth_psk per mode), thread an &mut rng through the call sites that need encap entropy, drop the Option placeholders.

Get it

[dependencies]
hpke-ng = "0.1.0-rc.3"

If you find a row in our benchmark suite that’s wrong, an interop gap we haven’t documented, or a footgun we missed, file an issue. We’d rather know.
