apt-get install gobby-infinote
gobby -c gobby.debian.org -n
debconf13/bof/reproducible-builds

Byte-for-byte identical reproducible builds?
============================================

BoF at DebConf13 / Vaumarcus, Switzerland; chair: Lunar

Abstract:

    The Bitcoin client and the upcoming Tor Browser Bundle 3.0 series
    are using a build system that produces “deterministic builds” —
    packages which are byte-for-byte identical no matter who actually
    builds them, or what hardware they use. The idea is that current
    popular software development practices simply cannot survive
    targeted attacks of the scale and scope that we are seeing today.
    With “deterministic builds”, any individual can use an anonymity
    network to download publicly signed and audited source code and
    reproduce the builds exactly, without being subject to such
    targeted attacks. If they notice any differences, they can alert
    the public builders/signers, hopefully anonymously.

    Are such ideas applicable to Debian? To what extent? What would be
    the first stones to pave the way toward reproducible builds of
    Debian packages?

Foreword
--------

Huge, huge thanks to Asheesh for helping me prepare this BoF.

Agenda
------

    “Good news everyone! We are going to get pwned!”
                                         — Professor Farnsworth

1. Go around: why do you care? (5-10 min.)
2. Mike Perry's work on the Tor Browser Bundle (5 min.)
3. Asheesh's experiments (5 min.)
4. On the technical side, there are two aspects to the problem:
   a. at the package level: How do we guarantee that given the same
      source package and the same build environment, we get the same
      binary results? (5-10 min.)
   b. at the archive level: How to record the build environment of a
      package (and enable its reproduction at a later time)?
      (5 min.)
5. What's next? (15 min.)

Experience from making the Tor Browser Bundle builds reproducible
-----------------------------------------------------------------

Mike Perry worked on making the Tor Browser Bundle builds
reproducible. That's hard work: Tor Browser is based on Firefox
(huge code base) and is built for Linux, Mac OS X and Windows.

  - How:
    - Uses Gitian from Bitcoin
      - Thin layer around Ubuntu virtualization tools
      - Spins up an Ubuntu VM with fixed hostname, username,
        path, and fake timestamps (via faketime)
      - List packages and architecture
      - Runs a bash script you specify
      - Cross compiles for Windows (mingw-w64) and Mac (toolchain4)
    - Took about 3-4 days per OS to write a working descriptor set
      for Tor, Firefox and bundling/localization
    - 2 weeks after starting, I was producing matching repeat builds
      on my own hardware
      - Issues:
        - FIPS-140 mode has non-deterministic sigs on Linux
        - Millisecond timestamps encoded by Firefox
        - Mystery 3 bytes of randomness on Windows. Bitstomped
    - 6 more weeks of work to get the builds to match externally
      - Filesystem reordering
        - Affects Zip, Tar, .a, and even aspects of Firefox scripts
          - created wrappers for archives
          - Firefox ordering enforced via sorting inputs in Firefox scripts
      - Localization LC_ALL leaks
        - Alters sort order
      - Permissions differences
        - Even though I set umask...

To sum it up: the key things that need to be controlled are the hostname,
username, build path, OS locale, uname output, toolchain version,
and time. We can either make everything deterministic, or record these
values on the first build and then replay them on subsequent builds.
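
The archive-related issues listed above (filesystem ordering, permissions,
timestamps) can be illustrated with a minimal Python sketch. This is not the
wrapper Mike Perry wrote; the function name `deterministic_tar` and the fixed
values are assumptions chosen only for this example.

```python
import io
import tarfile

def deterministic_tar(files, mtime=0):
    """Build an uncompressed tar archive whose bytes depend only on `files`.

    `files` maps archive member names to their content (bytes).
    """
    buf = io.BytesIO()
    # Fix the archive format explicitly so that the default of the running
    # Python version does not leak into the output.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name in sorted(files):          # stable order, not directory order
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = mtime              # fixed timestamp
            info.mode = 0o644               # fixed permissions, independent of umask
            info.uid = info.gid = 0         # numeric owner 0 ...
            info.uname = info.gname = ""    # ... and no user or group names
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()

# The insertion order of the input no longer matters.
one = deterministic_tar({"b.txt": b"2", "a.txt": b"1"})
two = deterministic_tar({"a.txt": b"1", "b.txt": b"2"})
print(one == two)
```

Every field that would normally come from the build machine (member order,
mtime, mode, owner) is set explicitly, which is the "make everything
deterministic" option rather than the "record and replay" one.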

Results from Asheesh's experiments
----------------------------------

Asheesh jumped on the idea and played with the hello package.
Rebuilt using faketime on top of fakeroot.

* When you rebuild that way, the data.tar.gz of the built Debian
  package has the same contents
* Same with control.tar.gz

However, neither data.tar.gz nor control.tar.gz is byte-for-byte identical
between the two builds. This is because of a semi-bug in dpkg: it does not
call gzip with '-n'. We need to convince the dpkg maintainers to fix that.
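
The effect of the gzip timestamp can be reproduced with Python's standard
`gzip` module. This is only a sketch of the header behaviour, not of what
dpkg does internally; the helper name `gzip_bytes` and the timestamps are
made up for the example.

```python
import gzip
import io

def gzip_bytes(data, mtime):
    """Compress `data` in memory, storing `mtime` in the gzip header."""
    buf = io.BytesIO()
    # filename="" keeps the original file name out of the header;
    # mtime=0 leaves the timestamp field empty, much like `gzip -n`.
    with gzip.GzipFile(filename="", mode="wb", fileobj=buf, mtime=mtime) as gz:
        gz.write(data)
    return buf.getvalue()

payload = b"hello, world\n"
first = gzip_bytes(payload, mtime=1376000000)
second = gzip_bytes(payload, mtime=1376000060)   # "built" one minute later

# Same input, different output: bytes 4..7 of the header hold the timestamp,
# everything before and after them is identical.
print(first == second)
print(first[:4] == second[:4], first[8:] == second[8:])

# With the timestamp zeroed, two runs give identical results.
print(gzip_bytes(payload, mtime=0) == gzip_bytes(payload, mtime=0))
```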

* ELF binaries like /usr/bin/hello in the "hello" package
  contain *no* timestamp that needs to be stripped.
* gzip files need '-n' to be passed to gzip for avoiding embedding a
  timestamp.
* xz and bzip2 don't have this problem. I'm too pressed for time to
  write a test script, but I did test it.
* dedup.debian.net can be used to detect duplicates, especially if
  we hack it to detect files that change between uploads of a package,
  rather than just between packages.
  - future work: ssdeep hashes, which could be useful for finding files
    that should be duplicates but aren't

NOTE that the mismatch might instead be caused by the *timestamps* of the
files within control.tar.gz and data.tar.gz. I have not finished testing
this theory, sadly, but here is a shell script I use to set up a lab:

http://rose.makesad.us/~paulproteus/tmp/extract_both.sh
 - please provide an index for the PTS :)

Package level issues
--------------------

### time

 * Remove/strip the timestamps for build results.
 * Use faketime (reports faked system time to programs). Time could be
   automatically set to the time of the last debian/changelog entry.
 * Base timestamps on timestamps of the source code, which should be unchanged
 * Record timestamps on first build and replay them later (see below).

(In most cases, recording the time of the build is actually
wrong. For documentation, what matters is the time of the last
change in the source package and not the time of the build
itself.)
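
The idea of taking the time from the last debian/changelog entry can be
sketched as follows. The changelog text is a made-up sample and the function
name `last_change_time` is hypothetical; a real tool would more likely rely
on an existing changelog parser.

```python
import re
from email.utils import parsedate_to_datetime

# Trailer line of a changelog entry: " -- Name <email>  date"
TRAILER = re.compile(r"^ -- .* <[^>]*>  (.+)$")

def last_change_time(changelog_text):
    """Return the date of the most recent changelog entry as a datetime."""
    for line in changelog_text.splitlines():
        match = TRAILER.match(line)
        if match:
            # The newest entry comes first, so the first trailer wins.
            return parsedate_to_datetime(match.group(1))
    raise ValueError("no changelog trailer line found")

SAMPLE = """\
hello (2.8-4) unstable; urgency=low

  * Hypothetical entry used only for this example.

 -- Jane Doe <jane@example.org>  Mon, 12 Aug 2013 10:30:00 +0200

hello (2.8-3) unstable; urgency=low

  * Older hypothetical entry.

 -- Jane Doe <jane@example.org>  Sat, 01 Jun 2013 09:00:00 +0200
"""

when = last_change_time(SAMPLE)
print(when.isoformat())
print(int(when.timestamp()))   # seconds since the epoch, e.g. for faketime
```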

### Build path

 * Debian buildds use per-build temporary path names, so that any paths
   accidentally embedded in binaries do not exist on end-user systems
   (potential security issue).
 * Stripping the path with debugedit (???)
 * Correct solution: patch out the places where the path appears and use
   paths relative to the builddir, instead of having a common build
   directory for everyone. (Encoded paths can hide real bugs anyway.)

### OS locale

 * Use LANG=C.UTF-8? -> LC_ALL=C.UTF-8
 * Let's make dpkg-buildpackage export this value
   (or another wrapper? because dpkg-buildpackage is not
   the policy canonical way to build all packages;
   but debian/rules is painful).
   Let's make this an option so that users see translated messages
   and the buildds all build with English.
 * Change the policy to make dpkg-buildpackage the canonical
   way to build packages.
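
What "export this value" could look like in a build wrapper is sketched
below. The function name `build_environment` is hypothetical and nothing here
is existing dpkg behaviour; it only shows that LC_ALL has to replace every
inherited locale variable, not sit next to them.

```python
import os

def build_environment(base=None, locale="C.UTF-8"):
    """Return a copy of `base` with every locale setting replaced by LC_ALL."""
    env = dict(os.environ if base is None else base)
    # Drop LANG, LANGUAGE and all LC_* variables inherited from the user.
    for name in list(env):
        if name in ("LANG", "LANGUAGE") or name.startswith("LC_"):
            del env[name]
    # LC_ALL overrides every other locale category.
    env["LC_ALL"] = locale
    return env

user_env = {"LANG": "fr_CH.UTF-8", "LC_COLLATE": "de_CH.UTF-8", "PATH": "/usr/bin:/bin"}
print(build_environment(user_env))
```

The returned dictionary could be passed as the `env=` argument of
`subprocess.run()` when starting the build, so the caller's own locale stays
untouched.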

### hostname, uname output, username

liblietome?

But kernel version is part of the build environment, so
we might need to record that somewhere else. Are the kernels used on
buildds always available? Or are some using non-standard kernels?

### toolchain version

 * part of the system state and build info

### file ordering issues

Need to patch the build systems to add proper `sort` calls.
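
For build scripts written in Python, the equivalent of adding a `sort` call
is to never rely on the order in which the filesystem returns directory
entries. A minimal sketch, with a hypothetical helper name:

```python
import os

def sorted_file_list(root):
    """Return all file paths under `root`, relative to it, in a stable order."""
    result = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            result.append(os.path.relpath(full, root))
    # Plain string comparison works on code points, so the result does not
    # depend on the locale or on the order the filesystem returned entries in.
    return sorted(result)
```

Feeding such a list to the archiver, instead of letting it walk the
directory itself, removes one source of differences between build machines.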

### Randomisation

 * Define seed?
 * ASLR?

### pid numbers

Let's patch that out if needed.

### Other issues?


Archive level issues
--------------------

Not all packages are built on the buildds so the build environment isn't going to be the same (for now).

.changes files are not currently kept, except on mailing lists.

We want .changes files: they are signed by the maintainer.

If we keep .changes files, we can add an `XC-Built-Environment` field.
It would add to the .changes files something like:

Built-Environment:
 apt (= 0.9.9.4), aptitude (= 0.6.8.2-1), aptitude-common (= 0.6.8.2-1),
 base-files (= 7.2), base-passwd (= 3.5.26), bash (= 4.2+dfsg-1),
 binutils (= 2.23.52.20130727-1), bsdutils (= 1:2.20.1-5.5),
 build-essential (= 11.6), bzip2 (= 1.0.6-4), ccache (= 3.1.9-1),
 coreutils (= 8.21-1), cpp (= 4:4.8.1-2), cpp-4.6 (= 4.6.4-4),
 cpp-4.7 (= 4.7.3-6), cpp-4.8 (= 4.8.1-8), dash (= 0.5.7-3),
 debconf (= 1.5.50), debconf-i18n (= 1.5.50),
 debian-archive-keyring (= 2012.4), debianutils (= 4.4),
 diffutils (= 1:3.2-8), dpkg (= 1.17.1), dpkg-dev (= 1.17.1),
 e2fslibs (= 1.42.8-1), e2fsprogs (= 1.42.8-1), fakeroot (= 1.19-2),
 findutils (= 4.4.2-6), g++ (= 4:4.8.1-2), g++-4.6 (= 4.6.4-4),
 g++-4.8 (= 4.8.1-8), gcc (= 4:4.8.1-2), gcc-4.4-base (= 4.4.7-4),
 gcc-4.5-base (= 4.5.4-1), gcc-4.6 (= 4.6.4-4), gcc-4.6-base (= 4.6.4-4),
 gcc-4.7 (= 4.7.3-6), gcc-4.7-base (= 4.7.3-6), gcc-4.8 (= 4.8.1-8),
 gcc-4.8-base (= 4.8.1-8), gnupg (= 1.4.14-1), gpgv (= 1.4.14-1),
 grep (= 2.14-2), gzip (= 1.6-1), hostname (= 3.13),
 initscripts (= 2.88dsf-43), insserv (= 1.14.0-5), less (= 458-2),
 libacl1 (= 2.2.52-1), libapt-pkg4.12 (= 0.9.9.4), libasan0 (= 4.8.1-8),
 libatomic1 (= 4.8.1-8), libattr1 (= 1:2.4.47-1), libblkid1 (= 2.20.1-5.5),
 libboost-iostreams1.49.0 (= 1.49.0-4), libbz2-1.0 (= 1.0.6-4),
 libc-bin (= 2.17-92), libc-dev-bin (= 2.17-92), libc6 (= 2.17-92),
 libc6-dev (= 2.17-92), libcap2 (= 1:2.22-1.2),
 libclass-isa-perl (= 0.36-5), libcloog-isl4 (= 0.18.0-2),
 libcloog-ppl1 (= 0.16.1-3), libcomerr2 (= 1.42.8-1),
 libcwidget3 (= 0.5.16-3.4), libdb5.1 (= 5.1.29-6), libdpkg-perl (= 1.17.1),
 libept1.4.12 (= 1.0.9), libfile-fcntllock-perl (= 0.14-2),
 libgcc-4.7-dev (= 4.7.3-6), libgcc-4.8-dev (= 4.8.1-8),
 libgcc1 (= 1:4.8.1-8), libgdbm3 (= 1.8.3-12), libgmp10 (= 2:5.1.2+dfsg-2),
 libgmpxx4ldbl (= 2:5.1.2+dfsg-2), libgomp1 (= 4.8.1-8),
 libgpm2 (= 1.20.4-6.1), libisl10 (= 0.11.2-1), libitm1 (= 4.8.1-8),
 liblocale-gettext-perl (= 1.05-7+b1), liblzma5 (= 5.1.1alpha+20120614-2),
 libmount1 (= 2.20.1-5.5), libmpc2 (= 0.9-4), libmpc3 (= 1.0.1-1),
 libmpfr4 (= 3.1.1-1), libncurses5 (= 5.9+20130608-1),
 libncursesw5 (= 5.9+20130608-1), libpam-modules (= 1.1.3-9),
 libpam-modules-bin (= 1.1.3-9), libpam-runtime (= 1.1.3-9),
 libpam0g (= 1.1.3-9), libpcre3 (= 1:8.31-2), libppl-c4 (= 1:1.0-7),
 libppl12 (= 1:1.0-7), libquadmath0 (= 4.8.1-8),
 libreadline6 (= 6.2+dfsg-0.1), libselinux1 (= 2.1.13-2),
 libsemanage-common (= 2.1.10-2), libsemanage1 (= 2.1.10-2),
 libsepol1 (= 2.1.9-2), libsigc++-2.0-0c2a (= 2.2.10-0.2),
 libslang2 (= 2.2.4-15), libsqlite3-0 (= 3.7.17-1),
 libss2 (= 1.42.8-1), libstdc++-4.8-dev (= 4.8.1-8),
 libstdc++6 (= 4.8.1-8), libstdc++6-4.6-dev (= 4.6.4-4),
 libswitch-perl (= 2.16-2), libtext-charwidth-perl (= 0.04-7+b1),
 libtext-iconv-perl (= 1.7-5), libtext-wrapi18n-perl (= 0.06-7),
 libtimedate-perl (= 1.2000-1), libtinfo5 (= 5.9+20130608-1),
 libtsan0 (= 4.8.1-8), libusb-0.1-4 (= 2:0.1.12-23.2),
 libustr-1.0-1 (= 1.0.4-3), libuuid1 (= 2.20.1-5.5),
 libxapian22 (= 1.2.15-2), linux-libc-dev (= 3.10.3-1),
 login (= 1:4.1.5.1-1), lsb-base (= 4.1+Debian12),
 make (= 3.81-8.2), mawk (= 1.3.3-17), mount (= 2.20.1-5.5),
 multiarch-support (= 2.17-92), ncurses-base (= 5.9+20130608-1),
 ncurses-bin (= 5.9+20130608-1), passwd (= 1:4.1.5.1-1), patch (= 2.7.1-3),
 perl (= 5.14.2-21),
 perl-base (= 5.14.2-21), perl-modules (= 5.14.2-21),
 readline-common (= 6.2+dfsg-0.1), screen (= 4.1.0~20120320gitdb59704-9),
 sed (= 4.2.2-2), sensible-utils (= 0.0.9), sysv-rc (= 2.88dsf-43),
 sysvinit (= 2.88dsf-43), sysvinit-utils (= 2.88dsf-43),
 tar (= 1.26+dfsg-6), tzdata (= 2013d-1), ucf (= 3.0027+nmu1),
 util-linux (= 2.20.1-5.5), vim (= 2:7.3.923-3), vim-common (= 2:7.3.923-3),
 vim-runtime (= 2:7.3.923-3), xz-utils (= 5.1.1alpha+20120614-2),
 zlib1g (= 1:1.2.8.dfsg-1)

   (Example naively generated by taking all packages installed
    by pbuilder when building the `hello` package.)

 * Do we want to trim this list? How?
    -> use the access time to files in the various packages
       to determine what was used or not (or another mechanism
       to be notified of packages that matters)
 * Do we want to include arch (e.g. `:amd64`) in there? Yes: multiarch means
   we can have cross-arch deps (but not yet, britney needs work)
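
A field like the example above is easy to process by machine. The sketch
below parses it and turns it into `name=version` arguments as accepted by
`apt-get install`. The function names are hypothetical, and the optional
`:arch` suffix is an assumption about how the architecture might be encoded.

```python
import re

# One entry looks like "libgcc1 (= 1:4.8.1-8)", optionally "name:arch (= version)".
ENTRY = re.compile(r"^(?P<name>[a-z0-9][a-z0-9+.-]*)(?::(?P<arch>[a-z0-9-]+))?"
                   r" \(= (?P<version>[^)\s]+)\)$")

def parse_built_environment(value):
    """Split a Built-Environment field value into (name, arch, version) tuples."""
    entries = []
    for item in value.split(","):
        item = " ".join(item.split())   # collapse folded lines and extra spaces
        if not item:
            continue
        match = ENTRY.match(item)
        if match is None:
            raise ValueError("cannot parse entry: %r" % item)
        entries.append((match["name"], match["arch"], match["version"]))
    return entries

def apt_install_arguments(entries):
    """Turn parsed entries into `name=version` arguments for apt-get install."""
    args = []
    for name, arch, version in entries:
        target = name if arch is None else "%s:%s" % (name, arch)
        args.append("%s=%s" % (target, version))
    return args

FIELD = """
 make (= 3.81-8.2), libgcc1 (= 1:4.8.1-8),
 libsigc++-2.0-0c2a (= 2.2.10-0.2)
"""
print(apt_install_arguments(parse_built_environment(FIELD)))
```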

Then, the good news: snapshot.debian.org keeps binary packages! But not .changes files.

make (= 3.81-8.2)
  => http://snapshot.debian.org/package/make-dfsg/3.81-8.2/#make_3.81-8.2

Is there an easy way to script installing a specific set of
binary packages from snapshot? Yes - use a specific date in your sources.list:

deb     http://snapshot.debian.org/archive/debian/20091004T111800Z/ lenny main
deb-src http://snapshot.debian.org/archive/debian/20091004T111800Z/ lenny main
deb     http://snapshot.debian.org/archive/debian-security/20091004T121501Z/ lenny/updates main
deb-src http://snapshot.debian.org/archive/debian-security/20091004T121501Z/ lenny/updates main
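
Generating such lines for a given point in time is straightforward. The
following sketch only covers the main archive and follows the URL layout
shown above; the function name `snapshot_sources` is hypothetical.

```python
from datetime import datetime, timezone

SNAPSHOT = "http://snapshot.debian.org/archive"

def snapshot_sources(when, suite, components="main"):
    """Return sources.list lines pinned to the snapshot taken at `when`.

    `when` should be a timezone-aware datetime; it is converted to UTC.
    """
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    lines = []
    for kind in ("deb", "deb-src"):
        lines.append("%s %s/debian/%s/ %s %s"
                     % (kind, SNAPSHOT, stamp, suite, components))
    return lines

for line in snapshot_sources(datetime(2009, 10, 4, 11, 18, 0, tzinfo=timezone.utc), "lenny"):
    print(line)
```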

What's next?
------------

 * Do we have a “Champion”?… looks like not. :(
 * Fill up a page on the wiki
 * Who wants to make their package build reproducible?
   - Asheesh: alpine
   - Lunar: haveged
   - pabs: iotop (python based)
   - joeyh: debhelper :D
   - lindi: magit
 * [Asheesh] Convince dpkg to fix the 'not calling gzip -n' issue.
 * Another change needed in dpkg: tar --numeric-owner --owner=0
 * [Asheesh, Helmut] Attempt to code a downstream version of dedup.debian.net
   that lets us detect when files change between uploads of a package,
   and then run it on the archive.
 * Automated archive-wide testing of this issue and export to the PTS
 * [rbalint, lindi] libfaketime updates?
   advancing time in faketime with each time() call: https://github.com/wolfcw/libfaketime/pull/20
   [rbalint] replaying timestamp needs bigger changes in faketime, I'm working on those
 * [fil] talk to Ganneff about keeping .changes - hash chain from the Release files needed
 * Script to transform the "Built-Environment" list to
   links to files in the snapshot archives.
 * pbuilder-like script that installs all the packages in a
   chroot and rebuilds the package there.
 * How about a sprint‽ Yes!
   Together with Multi-Arch friends? Sponsorship from ARM?

Other ideas:

 * Research other distros (NixOS?)
 * Research
   https://build.opensuse.org/package/show/openSUSE:Factory/build-compare
 * Deterministic virtual machines
   "ReVirt: Enabling Intrusion Analysis through Virtual-Machine Logging and Replay" http://www.eecs.umich.edu/virtual/papers/dunlap02.pdf (HTTP 403 currently :-()
   "Debugging operating systems with time-traveling virtual machines" http://www.eecs.umich.edu/virtual/papers/king05_1.pdf (HTTP 403 currently :-()
   "A Particular Bug Trap: Execution Replay Using Virtual Machines" http://arxiv.org/pdf/cs.DC/0310030
   "ReTrace: Collecting Execution Trace with Virtual Machine Deterministic Replay"
   "Execution Replay for Multiprocessor Virtual Machines" http://www.eecs.umich.edu/~pmchen/papers/dunlap08.slides.ppt



More post-BoF experiments
-------------------------

diff --git a/debian/control b/debian/control
index 1ef9ccd..50b5221 100644
--- a/debian/control
+++ b/debian/control
@@ -7,6 +7,7 @@ Standards-Version: 3.9.4
 Homepage: http://www.issihosts.com/haveged/
 Vcs-Git: git://git.debian.org/git/collab-maint/haveged.git
 Vcs-Browser: http://git.debian.org/?p=collab-maint/haveged.git
+XC-Build-Environment: ${misc:Build-Environment}
 
 Package: haveged
 Architecture: linux-any
diff --git a/debian/rules b/debian/rules
index 04d6fcc..cb2cdf3 100755
--- a/debian/rules
+++ b/debian/rules
@@ -15,3 +15,10 @@ override_dh_auto_configure:
 
 override_dh_strip:
        dh_strip --dbg-package=libhavege1-dbg
+
+override_dh_gencontrol:
+       COLUMNS=999 dpkg -l | awk ' \
+                       BEGIN { printf "misc:Build-Environment=" } \
+                       /^ii/ { ORS=", "; print $$2 " (= " $$3 ")" }' | \
+               sed -e 's/, $$//' >> debian/substvars
+       dh_gencontrol


This does not work as `dpkg-genchanges` does not substitute
the variable before adding the field to the .changes file! :(
  — Lunar

But it is a trivial patch against dpkg:

diff --git a/scripts/dpkg-genchanges.pl b/scripts/dpkg-genchanges.pl
index 0b004c7..13cedd6 100755
--- a/scripts/dpkg-genchanges.pl
+++ b/scripts/dpkg-genchanges.pl
@@ -516,4 +516,5 @@ for my $f (keys %remove) {
     delete $fields->{$f};
 }
 
-$fields->output(\*STDOUT); # Note: no substitution of variables
+$fields->apply_substvars($substvars);
+$fields->output(\*STDOUT);


