    apt-get install gobby-infinote
    gobby -c gobby.debian.org -n debconf13/bof/reproducible-builds

Byte-for-byte identical reproducible builds?
============================================

BoF at DebConf13 / Vaumarcus, Switzerland; chair: Lunar

Abstract:

The Bitcoin client and the upcoming Tor Browser Bundle 3.0 series are using a build system that produces “deterministic builds” — packages which are byte-for-byte identical no matter who actually builds them, or what hardware they use. The idea is that current popular software development practices simply cannot survive targeted attacks of the scale and scope that we are seeing today. With “deterministic builds”, any individual can use an anonymity network to download publicly signed and audited source code and reproduce the builds exactly, without being subject to such targeted attacks. If they notice any differences, they can alert the public builders/signers, hopefully anonymously.

Are such ideas applicable to Debian? To what extent? What would be the first stones to pave the way toward reproducible builds of Debian packages?

Foreword
--------

Huge, huge thanks to Asheesh for helping me prepare this BoF.

Agenda
------

“Good news everyone! We are going to get pwned!” — Professor Farnsworth

1. Go around: why do you care? (5-10 min.)
2. Mike Perry's work on the Tor Browser Bundle (5 min.)
3. Asheesh's experiments (5 min.)
4. On the technical side, there are two aspects to the problem:
   a. at the package level: how do we guarantee that, given the same source package and the same build environment, we get the same binary results? (5-10 min.)
   b. at the archive level: how do we record the build environment of a package (and enable its reproduction at a later time)? (5 min.)
5. What's next? (15 min.)

Experience from making the Tor Browser Bundle builds reproducible
-----------------------------------------------------------------

Mike Perry worked on making the Tor Browser Bundle builds reproducible.
That's hard work: Tor Browser is based on Firefox (huge code base) and is built for Linux, Mac OS X and Windows.

- How:
  - Uses Gitian from Bitcoin
    - Thin layer around Ubuntu virtualization tools
    - Spins up an Ubuntu VM with fixed hostname, username, path, and fake timestamps (via faketime)
    - List packages and architecture
    - Runs a bash script you specify
  - Cross compiles for Windows (mingw-w64) and Mac (toolchain4)
  - Took about 3-4 days per OS to write a working descriptor set for Tor, Firefox and bundling/localization
  - 2 weeks after starting, I was producing matching repeat builds on my own hardware
- Issues:
  - FIPS-140 mode has non-deterministic sigs on Linux
  - Millisecond timestamps encoded by Firefox
  - Mystery 3 bytes of randomness on Windows. Bitstomped
- 6 more weeks of work to get the builds to match externally
  - Filesystem reordering
    - Affects Zip, Tar, .a, and even aspects of Firefox scripts
    - Created wrappers for archives
    - Firefox ordering enforced via sorting inputs in Firefox scripts
  - Localization LC_ALL leaks
    - Alters sort order
  - Permissions differences
    - Even though I set umask...

To sum it up: the key variables that need to be controlled are the hostname, username, build path, OS locale, uname output, toolchain version, and time. We can either make everything deterministic, or record them on the first build and then replay them on subsequent builds.

Results from Asheesh's experiments
----------------------------------

Asheesh jumped on the idea and played with the hello package, rebuilding it using faketime on top of fakeroot.

* When you rebuild that way, the data.tar.gz of the built Debian package has the same contents.
* Same with control.tar.gz.

  However, neither data.tar.gz nor control.tar.gz matches byte-for-byte between the two builds. This is because of a semi-bug in dpkg; we need to convince dpkg to fix the 'not calling gzip -n' issue.

* ELF binaries like /usr/bin/hello in the "hello" package contain *no* timestamp that needs to be stripped.
* gzip files need '-n' to be passed to gzip to avoid embedding a timestamp.
* xz and bzip2 don't have this problem. I'm too pressed for time to write a test script, but I did test it.
* dedup.debian.net can be used to detect duplicates, especially if we hack it to detect files that change between uploads of a package, rather than just between packages.
  - future work: ssdeep hashes, which could be useful for finding files that should be duplicates but aren't

NOTE that this might instead be because of the *timestamps* of files within control.tar.gz and data.tar.gz... testing that theory...

I have not finished testing this theory, sadly, but here is a shell script I use to set up a lab: http://rose.makesad.us/~paulproteus/tmp/extract_both.sh

- please provide an index for the PTS :)

Package level issues
--------------------

### time

* Remove/strip the timestamps for build results.
* Use faketime (reports faked system time to programs). Time could be automatically set to the time of the last debian/changelog entry.
* Base timestamps on timestamps of the source code, which should be unchanged.
* Record time on first build and replay it later (see below).

(In most cases, recording the time of the build is actually wrong. For documentation, what matters is the time of the last change in the source package, not the time of the build itself.)

### Build path

* Debian buildds use per-build temporary path names, so that any paths accidentally embedded in binaries do not exist on end-user systems (potential security issue).
* Stripping the path with debugedit (???)
* Correct solution: patch out where the path appears -> use paths relative to the builddir instead of having a common build directory for everyone. (Because having encoded paths can hide real bugs, anyway.)

### OS locale

* Use LANG=C.UTF-8? -> LC_ALL=C.UTF-8
* Let's make dpkg-buildpackage export this value (or another wrapper?
  because, per policy, dpkg-buildpackage is not the canonical way to build all packages; but debian/rules is painful).
  Let's make this an option, so that users see translated messages and the buildds all build with English.
* Change the policy to make dpkg-buildpackage the canonical way to build packages.

### hostname, uname output, username

liblietome?

But the kernel version is part of the build environment, so we might need to record it somewhere else. Are the kernels used on buildds always available? Or are some using non-standard kernels?

### toolchain version

* part of the system state and build info

### file ordering issues

Need to patch the build systems to add proper `sort` calls.

### Randomisation

* Define seed?
* ASLR?

### pid numbers

Let's patch that out if needed.

### Other issues?

Archive level issues
--------------------

Not all packages are built on the buildds, so the build environment isn't going to be the same (for now).

.changes files are not currently kept, except on mailing lists. We want .changes files: they are signed by the maintainer. If we keep .changes files, we can add an `XC-Built-Environment` field.
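The field value itself is easy to produce mechanically. Below is a minimal Python sketch, assuming the installed-package list is already available as `name version` lines (for example from `dpkg-query -W -f='${Package} ${Version}\n'`; the BoF did not decide how to collect it). The function names `parse_dpkg_query` and `format_built_environment` are hypothetical. Entries are sorted with Python's code-point string ordering, which does not depend on `LC_ALL`, so the same package set always yields the same text. Continuation lines start with a single space, as in other multi-line .changes fields.

```python
r"""Hypothetical helper: format a Built-Environment field for a .changes file.

Assumes the input is 'name version' lines, e.g. from
    dpkg-query -W -f='${Package} ${Version}\n'
This is an illustrative sketch, not an agreed-upon Debian tool.
"""
from typing import Iterable, List, Tuple


def parse_dpkg_query(output: str) -> List[Tuple[str, str]]:
    """Turn 'name version' lines into (name, version) pairs."""
    pairs: List[Tuple[str, str]] = []
    for line in output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        pairs.append((parts[0], parts[1].strip()))
    return pairs


def format_built_environment(pairs: Iterable[Tuple[str, str]]) -> str:
    """Render pairs as a folded field, one relation per continuation line.

    sorted() compares strings by code point, so the result does not
    depend on the builder's locale (unlike `sort` under a leaked LC_ALL).
    """
    entries = [f"{name} (= {version})" for name, version in sorted(pairs)]
    return "Built-Environment:\n" + ",\n".join(" " + e for e in entries)


if __name__ == "__main__":
    sample = "make 3.81-8.2\nbash 4.2+dfsg-1\n"
    print(format_built_environment(parse_dpkg_query(sample)))
```

Fed the two lines `make 3.81-8.2` and `bash 4.2+dfsg-1`, it prints a `Built-Environment:` line followed by ` bash (= 4.2+dfsg-1),` and ` make (= 3.81-8.2)`, regardless of input order.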
Such a field would add to the .changes files something like:

    Built-Environment: apt (= 0.9.9.4),
     aptitude (= 0.6.8.2-1),
     aptitude-common (= 0.6.8.2-1),
     base-files (= 7.2),
     base-passwd (= 3.5.26),
     bash (= 4.2+dfsg-1),
     binutils (= 2.23.52.20130727-1),
     bsdutils (= 1:2.20.1-5.5),
     build-essential (= 11.6),
     bzip2 (= 1.0.6-4),
     ccache (= 3.1.9-1),
     coreutils (= 8.21-1),
     cpp (= 4:4.8.1-2),
     cpp-4.6 (= 4.6.4-4),
     cpp-4.7 (= 4.7.3-6),
     cpp-4.8 (= 4.8.1-8),
     dash (= 0.5.7-3),
     debconf (= 1.5.50),
     debconf-i18n (= 1.5.50),
     debian-archive-keyring (= 2012.4),
     debianutils (= 4.4),
     diffutils (= 1:3.2-8),
     dpkg (= 1.17.1),
     dpkg-dev (= 1.17.1),
     e2fslibs (= 1.42.8-1),
     e2fsprogs (= 1.42.8-1),
     fakeroot (= 1.19-2),
     findutils (= 4.4.2-6),
     g++ (= 4:4.8.1-2),
     g++-4.6 (= 4.6.4-4),
     g++-4.8 (= 4.8.1-8),
     gcc (= 4:4.8.1-2),
     gcc-4.4-base (= 4.4.7-4),
     gcc-4.5-base (= 4.5.4-1),
     gcc-4.6 (= 4.6.4-4),
     gcc-4.6-base (= 4.6.4-4),
     gcc-4.7 (= 4.7.3-6),
     gcc-4.7-base (= 4.7.3-6),
     gcc-4.8 (= 4.8.1-8),
     gcc-4.8-base (= 4.8.1-8),
     gnupg (= 1.4.14-1),
     gpgv (= 1.4.14-1),
     grep (= 2.14-2),
     gzip (= 1.6-1),
     hostname (= 3.13),
     initscripts (= 2.88dsf-43),
     insserv (= 1.14.0-5),
     less (= 458-2),
     libacl1 (= 2.2.52-1),
     libapt-pkg4.12 (= 0.9.9.4),
     libasan0 (= 4.8.1-8),
     libatomic1 (= 4.8.1-8),
     libattr1 (= 1:2.4.47-1),
     libblkid1 (= 2.20.1-5.5),
     libboost-iostreams1.49.0 (= 1.49.0-4),
     libbz2-1.0 (= 1.0.6-4),
     libc-bin (= 2.17-92),
     libc-dev-bin (= 2.17-92),
     libc6 (= 2.17-92),
     libc6-dev (= 2.17-92),
     libcap2 (= 1:2.22-1.2),
     libclass-isa-perl (= 0.36-5),
     libcloog-isl4 (= 0.18.0-2),
     libcloog-ppl1 (= 0.16.1-3),
     libcomerr2 (= 1.42.8-1),
     libcwidget3 (= 0.5.16-3.4),
     libdb5.1 (= 5.1.29-6),
     libdpkg-perl (= 1.17.1),
     libept1.4.12 (= 1.0.9),
     libfile-fcntllock-perl (= 0.14-2),
     libgcc-4.7-dev (= 4.7.3-6),
     libgcc-4.8-dev (= 4.8.1-8),
     libgcc1 (= 1:4.8.1-8),
     libgdbm3 (= 1.8.3-12),
     libgmp10 (= 2:5.1.2+dfsg-2),
     libgmpxx4ldbl (= 2:5.1.2+dfsg-2),
     libgomp1 (= 4.8.1-8),
     libgpm2 (= 1.20.4-6.1),
     libisl10 (= 0.11.2-1),
     libitm1 (= 4.8.1-8),
     liblocale-gettext-perl (= 1.05-7+b1),
     liblzma5 (= 5.1.1alpha+20120614-2),
     libmount1 (= 2.20.1-5.5),
     libmpc2 (= 0.9-4),
     libmpc3 (= 1.0.1-1),
     libmpfr4 (= 3.1.1-1),
     libncurses5 (= 5.9+20130608-1),
     libncursesw5 (= 5.9+20130608-1),
     libpam-modules (= 1.1.3-9),
     libpam-modules-bin (= 1.1.3-9),
     libpam-runtime (= 1.1.3-9),
     libpam0g (= 1.1.3-9),
     libpcre3 (= 1:8.31-2),
     libppl-c4 (= 1:1.0-7),
     libppl12 (= 1:1.0-7),
     libquadmath0 (= 4.8.1-8),
     libreadline6 (= 6.2+dfsg-0.1),
     libselinux1 (= 2.1.13-2),
     libsemanage-common (= 2.1.10-2),
     libsemanage1 (= 2.1.10-2),
     libsepol1 (= 2.1.9-2),
     libsigc++-2.0-0c2a (= 2.2.10-0.2),
     libslang2 (= 2.2.4-15),
     libsqlite3-0 (= 3.7.17-1),
     libss2 (= 1.42.8-1),
     libstdc++-4.8-dev (= 4.8.1-8),
     libstdc++6 (= 4.8.1-8),
     libstdc++6-4.6-dev (= 4.6.4-4),
     libswitch-perl (= 2.16-2),
     libtext-charwidth-perl (= 0.04-7+b1),
     libtext-iconv-perl (= 1.7-5),
     libtext-wrapi18n-perl (= 0.06-7),
     libtimedate-perl (= 1.2000-1),
     libtinfo5 (= 5.9+20130608-1),
     libtsan0 (= 4.8.1-8),
     libusb-0.1-4 (= 2:0.1.12-23.2),
     libustr-1.0-1 (= 1.0.4-3),
     libuuid1 (= 2.20.1-5.5),
     libxapian22 (= 1.2.15-2),
     linux-libc-dev (= 3.10.3-1),
     login (= 1:4.1.5.1-1),
     lsb-base (= 4.1+Debian12),
     make (= 3.81-8.2),
     mawk (= 1.3.3-17),
     mount (= 2.20.1-5.5),
     multiarch-support (= 2.17-92),
     ncurses-base (= 5.9+20130608-1),
     ncurses-bin (= 5.9+20130608-1),
     passwd (= 1:4.1.5.1-1),
     patch (= 2.7.1-3),
     perl (= 5.14.2-21),
     perl-base (= 5.14.2-21),
     perl-modules (= 5.14.2-21),
     readline-common (= 6.2+dfsg-0.1),
     screen (= 4.1.0~20120320gitdb59704-9),
     sed (= 4.2.2-2),
     sensible-utils (= 0.0.9),
     sysv-rc (= 2.88dsf-43),
     sysvinit (= 2.88dsf-43),
     sysvinit-utils (= 2.88dsf-43),
     tar (= 1.26+dfsg-6),
     tzdata (= 2013d-1),
     ucf (= 3.0027+nmu1),
     util-linux (= 2.20.1-5.5),
     vim (= 2:7.3.923-3),
     vim-common (= 2:7.3.923-3),
     vim-runtime (= 2:7.3.923-3),
     xz-utils (= 5.1.1alpha+20120614-2),
     zlib1g (= 1:1.2.8.dfsg-1)

(Example naively generated by taking all packages installed by pbuilder when building the `hello` package.)

* Do we want to trim this list? How?
  -> use the access time of files in the various packages to determine what was used or not (or another mechanism to be notified of the packages that matter)
* Do we want to include the arch (e.g. `:amd64`) in there? Yes - multiarch means we can have cross-arch deps (but not yet - britney needs work)

Then, the good news: snapshot.debian.org keeps binary packages! But not .changes.

    make (= 3.81-8.2) => http://snapshot.debian.org/package/make-dfsg/3.81-8.2/#make_3.81-8.2

Is there an easy way to script installing a specific set of binary packages from snapshot? Yes - use a specific date in your sources.list:

    deb http://snapshot.debian.org/archive/debian/20091004T111800Z/ lenny main
    deb-src http://snapshot.debian.org/archive/debian/20091004T111800Z/ lenny main
    deb http://snapshot.debian.org/archive/debian-security/20091004T121501Z/ lenny/updates main
    deb-src http://snapshot.debian.org/archive/debian-security/20091004T121501Z/ lenny/updates main

What's next?
------------

* Do we have a “Champion”?… looks like not. :(
* Fill up a page on the wiki
* Who wants to have their package build reproducibly?
  - Asheesh: alpine
  - Lunar: haveged
  - pabs: iotop (python based)
  - joeyh: debhelper :D
  - lindi: magit
* [Asheesh] Convince dpkg to fix the 'not calling gzip -n' issue.
* Another change needed in dpkg: tar --numeric-owner --owner=0
* [Asheesh, Helmut] Attempt to code a downstream version of dedup.debian.net that lets us detect when files change between uploads of a package, and then run it on the archive.
* Automated archive-wide testing of this issue and export to the PTS
* [rbalint, lindi] libfaketime updates?
  - advancing time in faketime with each time() call: https://github.com/wolfcw/libfaketime/pull/20
  - [rbalint] replaying timestamps needs bigger changes in faketime, I'm working on those
* [fil] talk to Ganeff about keeping .changes
  - hash chain from the Release files needed
* Script to transform the "Built-Environment" list into links to files in the snapshot archives.
* pbuilder-like script that installs all the packages in a chroot and rebuilds the package there.
* How about a sprint‽ Yes! Together with Multi-Arch friends? Sponsorship from ARM?

Other ideas:

* Research other distros (NixOS?)
* Research https://build.opensuse.org/package/show/openSUSE:Factory/build-compare
* Deterministic virtual machines
  - "ReVirt: Enabling Intrusion Analysis through Virtual-Machine Logging and Replay" http://www.eecs.umich.edu/virtual/papers/dunlap02.pdf (HTTP 403 currently :-()
  - "Debugging operating systems with time-traveling virtual machines" http://www.eecs.umich.edu/virtual/papers/king05_1.pdf (HTTP 403 currently :-()
  - "A Particular Bug Trap: Execution Replay Using Virtual Machines" http://arxiv.org/pdf/cs.DC/0310030
  - "ReTrace: Collecting Execution Trace with Virtual Machine Deterministic Replay"
  - "Execution Replay for Multiprocessor Virtual Machines" http://www.eecs.umich.edu/~pmchen/papers/dunlap08.slides.ppt

More post-BoF experiments
-------------------------

    diff --git a/debian/control b/debian/control
    index 1ef9ccd..50b5221 100644
    --- a/debian/control
    +++ b/debian/control
    @@ -7,6 +7,7 @@ Standards-Version: 3.9.4
     Homepage: http://www.issihosts.com/haveged/
     Vcs-Git: git://git.debian.org/git/collab-maint/haveged.git
     Vcs-Browser: http://git.debian.org/?p=collab-maint/haveged.git
    +XC-Build-Environment: ${misc:Build-Environment}
     
     Package: haveged
     Architecture: linux-any
    diff --git a/debian/rules b/debian/rules
    index 04d6fcc..cb2cdf3 100755
    --- a/debian/rules
    +++ b/debian/rules
    @@ -15,3 +15,10 @@ override_dh_auto_configure:
     
     override_dh_strip:
     	dh_strip --dbg-package=libhavege1-dbg
    +
    +override_dh_gencontrol:
    +	COLUMNS=999 dpkg -l | awk ' \
    +		BEGIN { printf "misc:Build-Environment=" } \
    +		/^ii/ { ORS=", "; print $$2 " (= " $$3 ")" }' | \
    +		sed -e 's/, $$//' >> debian/substvars
    +	dh_gencontrol

This does not work, as `dpkg-genchanges` does not substitute the variable before adding the field to the .changes file!
:( — Lunar

But it is a trivial patch against dpkg:

    diff --git a/scripts/dpkg-genchanges.pl b/scripts/dpkg-genchanges.pl
    index 0b004c7..13cedd6 100755
    --- a/scripts/dpkg-genchanges.pl
    +++ b/scripts/dpkg-genchanges.pl
    @@ -516,4 +516,5 @@ for my $f (keys %remove) {
         delete $fields->{$f};
     }
     
    -$fields->output(\*STDOUT); # Note: no substitution of variables
    +$fields->apply_substvars($substvars);
    +$fields->output(\*STDOUT);
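The patch makes `dpkg-genchanges` expand `${...}` substitution variables in the .changes fields, as `dpkg-gencontrol` already does for binary package control files. A minimal Python model of that step follows. It is a simplified, hypothetical sketch: the helper names `parse_substvars` and `apply_substvars` are illustrative (the latter only echoes the Perl method called in the patch), and it assumes a substvars file with one `name=value` per line, `#` comments and no recursive expansion. dpkg's own implementation handles more cases than this.

```python
"""Simplified model of applying substvars to .changes fields.

Hypothetical sketch, not dpkg's implementation. Assumes one 'name=value'
per line, '#' comments, and no recursive expansion.
"""
import re
from typing import Dict

# Matches ${name}; the name may contain ':' as in misc:Build-Environment.
_SUBSTVAR = re.compile(r"\$\{([^{}]+)\}")


def parse_substvars(text: str) -> Dict[str, str]:
    """Read 'name=value' lines, splitting on the first '=' only."""
    variables: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            variables[name.strip()] = value.strip()
    return variables


def apply_substvars(fields: Dict[str, str], variables: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of fields with known ${name} references expanded.

    Unknown variables are left untouched in this sketch.
    """
    def expand(match: "re.Match[str]") -> str:
        return variables.get(match.group(1), match.group(0))

    return {key: _SUBSTVAR.sub(expand, value) for key, value in fields.items()}


if __name__ == "__main__":
    substvars = "misc:Build-Environment=bash (= 4.2+dfsg-1), make (= 3.81-8.2)\n"
    # In the generated .changes, the XC- prefix from debian/control is dropped.
    changes = {"Source": "haveged", "Build-Environment": "${misc:Build-Environment}"}
    for key, value in apply_substvars(changes, parse_substvars(substvars)).items():
        print(f"{key}: {value}")
```

Splitting only on the first `=` matters here, because the value written by the haveged `override_dh_gencontrol` rule itself contains `(= ...)` relations.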