Reproducible Builds

Reproducible builds address software provenance by making the build itself deterministic. A build is reproducible when the same source code, dependencies, toolchain, build instructions, and environment produce bit-for-bit identical output across separate machines. The goal is to make the relationship between declared inputs and output bytes behave like a stable one-to-one mapping for verification. A given input set should always produce the same artifact, and a matching artifact digest should identify the input set that produced it. Under that condition, a consumer does not need to trust the original build machine or its operator. The consumer, or some independent rebuilder, can rebuild the artifact and compare digests. If the locally produced digest matches the distributed artifact digest, the artifact corresponds to the published inputs.

                         Declared build inputs
                  source, deps, toolchain, environment
                                     │
                     ┌───────────────┴───────────────┐
                     │                               │
                     ▼                               ▼
          ┌──────────────────────┐        ┌──────────────────────┐
          │  Provider build      │        │  Independent         │
          │  original system     │        │  rebuild             │
          └───────────┬──────────┘        └───────────┬──────────┘
                      │                               │
                      ▼                               ▼
          ┌──────────────────────┐        ┌──────────────────────┐
          │  Artifact digest A   │        │  Artifact digest B   │
          └───────────┬──────────┘        └───────────┬──────────┘
                      │                               │
                      └───────────────┬───────────────┘
                                      ▼
                             ┌────────────────────┐
                             │   Accept if A = B  │
                             └────────────────────┘

Reproducible builds verify provenance by independent reconstruction. The original build and the rebuild must converge on the same output digest from the same declared inputs.
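
The comparison step in the figure can be sketched in a few lines of Python. This is a minimal illustration, not a complete verifier: the helper names `sha256_file` and `accept_artifact` are hypothetical, SHA-256 is an assumed digest algorithm, and the rebuild itself is assumed to have already happened from the published inputs.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def accept_artifact(distributed_path, rebuilt_path):
    """Accept only when digest A (provider build) equals digest B (rebuild)."""
    return sha256_file(distributed_path) == sha256_file(rebuilt_path)
```

A consumer would call `accept_artifact` on the downloaded artifact and on the output of their own rebuild. A single differing byte in either file makes the check fail.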

Several projects work to make this practical across the open source ecosystem. One is the Reproducible Builds project, which supports reproducibility efforts in many parts of that ecosystem. Lamb and Zacchiroli survey the state of reproducibility across the Debian distribution and discuss its role in supply-chain integrity, including resistance to build-time tampering, independent verification of binaries, and reduced trust in build infrastructure.

The limitation is practical rather than conceptual. Many production builds are not deterministic. Compilers embed timestamps, parallel builds produce outputs in varying order, linkers record full file paths, archive tools preserve inconsistent ordering, and generated files may depend on host state. Achieving bit-for-bit identical output requires controlling these factors across the entire dependency tree. A single non-reproducible component anywhere in the chain breaks the guarantee.
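
As one illustration of what controlling these factors involves, the sketch below packs files into a tar archive whose bytes depend only on the file names and contents. It fixes the member order, the timestamp, and the ownership fields, which are typical sources of archive non-determinism. The function name `deterministic_tar` is hypothetical. Reading the timestamp from `SOURCE_DATE_EPOCH` follows the convention published by the Reproducible Builds project, and falling back to 0 when it is unset is an assumption of this sketch.

```python
import io
import os
import tarfile


def deterministic_tar(files, mtime=None):
    """Pack {name: bytes} into tar bytes that depend only on the inputs."""
    if mtime is None:
        # Take the timestamp from the environment, never from the wall clock.
        mtime = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))
    buf = io.BytesIO()
    # USTAR uses fixed-size headers and no pax extended records.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):           # fixed member order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime               # fixed timestamp
            info.mode = 0o644                # fixed permissions
            info.uid = info.gid = 0          # no host user or group IDs
            info.uname = info.gname = ""     # no host user or group names
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Two calls with the same mapping produce identical bytes regardless of dictionary insertion order. A real build needs the same discipline in every tool in the chain, which is where the practical difficulty lies.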

Mapped against the constraints in the problem statement, reproducible builds run into trouble on constraint 5 (toolchain coverage): achieving bit-for-bit determinism across an arbitrary set of languages, package managers, and toolchains is hard.

Nix and Reproducible Build Environments

Nix is a package manager and build system that describes software builds as pure functions of their inputs, captured in files called derivations. Rather than requiring every build output to be bit-for-bit reproducible, Nix makes the build environment itself reproducible. A derivation, together with a flake lock file where flakes are used, identifies the source, dependencies, toolchain, and build recipe used to construct an output, and Nix uses those identifiers to build the same software in the same way on any machine. This is a substantial improvement over package managers where dependency resolution can vary across machines or over time.

The distinction matters. A reproducible build environment does not by itself imply reproducible build output. The same pinned source and compiler can still produce different artifacts if the compiler, linker, archive tooling, generated files, or build scripts contain non-determinism. Nix can support reproducible builds when the underlying toolchain is deterministic, but the reproducibility property comes from the combination of a controlled environment and deterministic build behavior, not from environment pinning alone.
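
The gap between pinned inputs and identical output can be shown with a toy model. The sketch below is not how Nix computes derivation hashes or store paths. `input_id` is a hypothetical stand-in for an input-derived identifier, and `build` is a deliberately non-deterministic build step that leaks a host timestamp into its output.

```python
import hashlib
import json


def input_id(inputs):
    """Digest of the declared inputs (toy stand-in for an input-derived ID)."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def build(inputs, host_clock):
    """Toy build step that embeds host state (a timestamp) in its output."""
    return f"{inputs['source']}|built-at={host_clock}".encode()


# Hypothetical pinned inputs, identical on both machines.
pinned = {"source": "app-1.0", "compiler": "cc-13.2", "recipe": "make all"}

# Same pinned inputs, two builds one minute apart.
out_a = hashlib.sha256(build(pinned, host_clock=1700000000)).hexdigest()
out_b = hashlib.sha256(build(pinned, host_clock=1700000060)).hexdigest()
```

Both builds share one `input_id(pinned)`, yet `out_a` and `out_b` differ. Pinning the environment fixes the left side of the mapping only, and the output digest stabilizes once the build step itself stops reading host state.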

Nix also leaves a trust question at the build boundary. A derivation hash proves consistency with a particular Nix evaluation, but it does not prove that the evaluator, builder, or local toolchain executed honestly. A compromised build host can still alter the build process or output after evaluation. Nix therefore provides strong input and environment control, and can be part of a provenance strategy, but it does not by itself provide hardware-rooted evidence that a particular artifact was produced by a particular build execution.

Attestable Builds

An attestable build is one that runs inside a Trusted Execution Environment and emits hardware-signed evidence about the environment, inputs, and outputs observed during the build. Verification then becomes a check of that evidence (signature, environment measurement, and digest comparisons) rather than a re-execution of the build itself. The trust anchor moves from the build infrastructure and its operators to the TEE hardware and its attestation chain.

Two prior systems instantiate this idea concretely. TEE Compile, developed by Automata Network, runs a project's build inside an AWS Nitro Enclave. A worker process inside the enclave fetches dependencies through the host, builds the project, and emits a tuple of artifact bytes, an input/output hash report, and a Nitro attestation document whose PCR0 covers the enclave image. Verification compares input and output hashes against the report and checks that the enclave image is one the verifier trusts. The trust root is the AWS Nitro Attestation PKI rather than a CPU vendor's root, and Nitro Enclaves are constrained micro-VMs rather than general-purpose confidential VMs. How a verifier independently obtains a trustworthy reference measurement for the enclave image itself is not specified.
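
A verifier for this kind of evidence might look roughly like the sketch below. It is illustrative only and does not reproduce TEE Compile's actual report format: the `BuildReport` fields and the use of SHA-256 are assumptions. It also assumes the attestation document's signature chain has already been validated, and that the verifier obtained its set of trusted PCR0 values through some channel this sketch does not cover.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildReport:
    """Hypothetical fields taken from the hash report and attestation."""
    input_sha256: str
    output_sha256: str
    pcr0: str  # enclave image measurement from the attestation document


def verify_report(report, source_bytes, artifact_bytes, trusted_pcr0):
    """Return the list of failed checks; an empty list means accept.

    Signature verification of the attestation document is assumed to have
    been done before this function is called.
    """
    failures = []
    if report.pcr0 not in trusted_pcr0:
        failures.append("untrusted enclave image")
    if hashlib.sha256(source_bytes).hexdigest() != report.input_sha256:
        failures.append("input hash mismatch")
    if hashlib.sha256(artifact_bytes).hexdigest() != report.output_sha256:
        failures.append("output hash mismatch")
    return failures
```

The `trusted_pcr0` parameter makes the open question visible: every check depends on a reference measurement that must come from somewhere the verifier already trusts.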

Hugenroth, Lins, Mayrhofer, and Beresford generalize the approach to AMD SEV-SNP, Intel TDX, and AWS Nitro Enclaves alike. Their design treats the host and the build process as untrusted, and introduces an integrity-protected Enclave Client inside the TEE. That client records the repository snapshot hash before the untrusted build starts, runs the build inside an inner sandbox built on containerd and gVisor, records the produced artifact hash, and requests a TEE attestation covering the launch measurement, source snapshot hash, and artifact hash. The sandbox matters because the build process itself may execute arbitrary project code: the TEE protects the Enclave Client from the host, and the sandbox protects the Enclave Client from the build it is observing. The authors acknowledge the bootstrap problem: a verifier needs to know the expected launch measurement of a genuine Enclave Client image. They recommend that the very first such image be produced with reproducible builds, but defer that step to future work.
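
The ordering of those steps can be sketched as follows. This is not the authors' implementation. `observe_build` is a hypothetical name, the build and attestation steps are passed in as callables so that no real sandbox or TEE interface is implied, and combining the two hashes into one value for the attestation request is an assumption of this sketch.

```python
import hashlib
from typing import Callable


def observe_build(
    snapshot: bytes,
    untrusted_build: Callable[[bytes], bytes],
    request_attestation: Callable[[bytes], dict],
) -> dict:
    """Record hashes around an untrusted build, then request attestation."""
    # 1. Hash the source snapshot BEFORE any untrusted code runs.
    source_hash = hashlib.sha256(snapshot).digest()
    # 2. Run the build; in the described design this is the sandboxed step.
    artifact = untrusted_build(snapshot)
    # 3. Hash whatever the build produced.
    artifact_hash = hashlib.sha256(artifact).digest()
    # 4. Bind both hashes into the attestation request. The launch
    #    measurement is supplied by the TEE itself, not by this code.
    user_data = hashlib.sha256(source_hash + artifact_hash).digest()
    return {
        "artifact": artifact,
        "source_sha256": source_hash.hex(),
        "artifact_sha256": artifact_hash.hex(),
        "attestation": request_attestation(user_data),
    }
```

Recording the source hash first is the important design choice: once project code has run, anything it could have touched is no longer a trustworthy record of what was built.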

This line of work changes the verification question. A reproducible build asks whether a second build produces the same bytes. An attestable build asks whether a particular artifact is accompanied by hardware-rooted evidence that a measured build environment observed particular inputs and produced particular output bytes. Both approaches try to close the source-to-binary gap, but they make different tradeoffs. Reproducible builds minimize trust by requiring deterministic reconstruction, while attestable builds reduce reconstruction cost by shifting trust to TEE hardware and its attestation chain.

Kettle builds on this prior work in three specific ways. First, the trust root is the CPU vendor's attestation chain (the VCEK chain for AMD SEV-SNP, the PCK chain for Intel TDX) rather than an operator-controlled PKI, so the signing identity behind every Kettle build is the same one that signs the underlying TEE attestation. Second, Kettle commits an explicit Merkle-rooted input manifest into a SLSA Provenance v1.2 document carried in a canonical in-toto Statement (see Provenance Format and Standards) and binds the document to hardware via the report-data field, so the evidence format is one that existing supply-chain tooling can already consume. Third, Kettle closes the bootstrap loop that prior work defers: each Kettle release is itself reproducibly built using the Stage^x deterministic toolchain, and the CVM image is reproducibly assembled from that Kettle binary and a published recipe (see TEE Environment Setup, Measurement Allow-Lists, and Kettle). A verifier rebuilds the image from public source, confirms that the resulting launch measurement matches the value published with the release, and adds the measurement to their allow-list. The trust question thereby reduces to source inspection plus the verifier's own toolchain, rather than trust in any image distributor. Kettle also offers an optional pre-attested confidential build flow (see Confidential Builds) for sensitive source, which has no equivalent in either system above. The longer-term goal of layering an inner sandbox on top of CVM isolation, in the spirit of Hugenroth et al., is described under Open Directions.
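
The second point, a Merkle-rooted manifest bound to hardware through the report-data field, can be illustrated with a short sketch. This is not Kettle's actual construction, which is specified in the sections referenced above. The tree layout (domain-separated leaf and node hashes, with an odd node carried up unchanged), the use of sorted-key JSON as a stand-in for a canonical encoding, and the choice of SHA-512 are all assumptions. SHA-512 is used here because its 64-byte output fills the 64-byte report-data field of an SEV-SNP or TDX report exactly.

```python
import hashlib
import json


def merkle_root(leaves):
    """Merkle root over byte-string leaves with leaf/node domain separation."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    # Leaf hashes are prefixed with 0x00, interior nodes with 0x01.
    level = [hashlib.sha256(b"\x00" + leaf).digest() for leaf in leaves]
    while len(level) > 1:
        nxt = [
            hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            nxt.append(level[-1])  # odd node is carried up unchanged
        level = nxt
    return level[0]


def report_data_for(statement):
    """64-byte digest of a statement, sized for the TEE report-data field."""
    # Sorted-key compact JSON stands in for a real canonical encoding.
    canonical = json.dumps(statement, sort_keys=True, separators=(",", ":"))
    return hashlib.sha512(canonical.encode()).digest()
```

A verifier holding the statement recomputes `report_data_for(statement)` and compares it with the value inside the hardware-signed report. If the manifest root recorded in the statement also matches a root recomputed from the listed inputs, the inputs, the provenance document, and the attestation are tied together.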

References

Reproducible Builds. Reproducible Builds project.
Lamb, Chris, and Stefano Zacchiroli. Reproducible Builds: Increasing the Integrity of Software Supply Chains. IEEE Software 39(2):62–70, 2022. Preprint: https://arxiv.org/abs/2104.06020.
Dolstra, Eelco, Merijn de Jonge, and Eelco Visser. Nix: A Safe and Policy-Free System for Software Deployment. In Proceedings of the 18th USENIX Conference on System Administration (LISA '04), pp. 79–92, 2004.
Automata Network. Creating Attestable Builds with AWS Nitro Enclaves. AWS Builder Center, September 2024.
Hugenroth, Daniel, Mario Lins, René Mayrhofer, and Alastair R. Beresford. Attestable Builds: Compiling Verifiable Binaries on Untrusted Systems using Trusted Execution Environments. In Proceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security (CCS '25), pp. 4514–4528, 2025. Preprint: https://arxiv.org/abs/2505.02521.
Stage^x. A container-native, full-source bootstrapped, reproducible toolchain. Source at https://codeberg.org/stagex/stagex.
Lunal Dev. Kettle: attested builds for verifiable software provenance.