System testing
==============

system-testing@flarn.net
system-testing-subscribe@flarn.net
http://vlists.pepperfish.net/cgi-bin/mailman/listinfo/system-testing-flarn.net

Goals of conversation
---------------------

* Looking to come up with a standard set of tools for system testing
* Looking to come up with shareable tests where possible
* Slide titles so Lars and I don't look like morons and have to sing
  during our talk slot on Saturday. (SRSLY you don't want me to sing)
* Identify backing track to be played on PA (The Chicken Song and
  Mah Nah Mah Nah)

Examples of where system testing is needed
------------------------------------------

* distbuild clusters, for development
* mail clusters, for sysadmins
* hostapd deployment, can't be mocked
* is Debian testing working in supported setups? including with
  upgrades from the previous stable, after specific apps have actually
  been used

Real customer project
---------------------

* Customer project consists of a few systems, with various hardware
  attached, including a remote control.
* Automated testing.
  - mostly drive the remote control unit using an Android testing
    framework by sending key events, and observe results by inspecting
    Android events, while staying as black-box as possible: do the
    correct screens appear? are there suspicious messages in log files?
    is audio playing back correctly (via headphone socket to line-in on
    a test PC)? is the wifi ESSID visible? plus maybe more.
  - various tests use those inputs to exercise various behaviours in
    various scenarios: does audio play from DVD vs USB vs SD, etc.
  - automated installation:
    - set up wifi to transfer files
    - "a lot of messing about with serial terminals", using expect
    - expect doesn't work with serial ports, so they have a small
      custom tool instead (see the first sketch after this section)
    - all tools built from scratch, but kept to a minimum
  - run tests, gather results, put them in the place with the build
  - tests triggered automatically by creating a new directory with a
    new image (see the second sketch after this section)
  - another script gathers results and produces an HTML summary
* Biggest problem: unreliability of hardware
  - setting up the wifi link
  - automating the serial link
  - try to automatically notice when things are just failing, rather
    than tests failing, and then rerun the test
* We may need to worry about having many test-controlling machines
  connected to the devices being tested.
  - performance
  - identification of devices (some devices have the same serial
    number)
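
As an illustration of the kind of small, custom expect-style serial
tool described above, here is a minimal sketch assuming the pyserial
library. The device path, prompts and commands are invented for the
example; this is not the project's actual tool::

    import re
    import time

    import serial  # pyserial


    class SerialExpect:
        """Tiny expect-like wrapper around a serial port."""

        def __init__(self, device, baudrate=115200, timeout=1.0):
            self.port = serial.Serial(device, baudrate=baudrate,
                                      timeout=timeout)
            self.buffer = b''

        def expect(self, pattern, timeout=30.0):
            """Wait until the output matches a regex, or raise on timeout."""
            regex = re.compile(pattern)
            deadline = time.time() + timeout
            while time.time() < deadline:
                self.buffer += self.port.read(4096)
                match = regex.search(self.buffer.decode('utf-8', 'replace'))
                if match:
                    self.buffer = b''
                    return match
            raise TimeoutError('timed out waiting for %r' % pattern)

        def sendline(self, line):
            """Send a command followed by a newline."""
            self.port.write(line.encode('utf-8') + b'\n')


    # Hypothetical usage: log in on the serial console, start an install.
    console = SerialExpect('/dev/ttyUSB0')
    console.expect(r'login:')
    console.sendline('root')
    console.expect(r'#')
    console.sendline('/usr/bin/start-install')
    console.expect(r'installation complete', timeout=600)
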
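The "new directory with a new image triggers a test run" mechanism
could look roughly like the polling loop below; the directory layout
and the run-tests command are assumptions for illustration, not the
project's actual scripts::

    import os
    import subprocess
    import time

    INCOMING = '/srv/images/incoming'   # assumed drop directory for images
    RESULTS = '/srv/images/results'     # assumed location for test results

    seen = set()

    while True:
        for name in sorted(os.listdir(INCOMING)):
            image_dir = os.path.join(INCOMING, name)
            if name in seen or not os.path.isdir(image_dir):
                continue
            seen.add(name)
            result_dir = os.path.join(RESULTS, name)
            os.makedirs(result_dir, exist_ok=True)
            # Hypothetical test driver; results end up next to the build.
            subprocess.run(['run-tests', '--image-dir', image_dir,
                            '--results-dir', result_dir])
        time.sleep(30)   # poll for new image directories every 30 seconds
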
Use Cases
---------

Physical:

* Developer with a devboard on their desk, running tests against the
  devboard
* CI testing the same devboard
* Developer with a consumer electronics product on their desk
  - devboard might have Ethernet on it, whereas a mobile phone doesn't
* CI testing consumer electronics devices
* Developer testing a server appliance
* CI testing a server appliance
* Each of the above, but with multiple devices or appliances
  - running an isolated test vs running a test in a controlled
    environment
* Sysadmin testing a combination of services in pre-production
* Sysadmin testing a production environment

Other:

* Functionality
* Boundary testing
* Performance
* Stress testing
* Security testing
  - e.g., fuzzing

Some requirements from Baserock
-------------------------------

- configured and triggered entirely in git
  - including the test environment
- able to test a multi-system deployment
- able to test unmodified production systems
- able to modify systems for testing
- able to test the hardware: power, audio I/O, video I/O, etc.
- able to use hardware to test the product: switchable power supply,
  robot fingers, video camera, etc.
- recoverable test design: can deal with random glitches at the
  hardware level by restarting and resetting to known sequence points
  in the test (SBQS restart)
- tests and test frameworks need to be reliable in general
- able to cope with many-to-many relationships between tests and
  systems (software and hardware platforms)
- multiple repositories of tests that can be combined into a test
  suite, such that the repos have different security domains
- result aggregation needs to be screenable
- intermediate results, for tests in progress, need to be viewable
- able to schedule sets of tests differently: every commit, once a
  day, etc.
- able to schedule tests based on what was changed: schedule all audio
  tests when audio code got changed, for example; this might mean
  exposing enough info (e.g., the change set) to the test environment
  to decide what needs to be run (see the sketch after this list)
- able to schedule tests based on how long they take to run
  (incl. setup/cleanup)
- able to group tests by provisioning requirements?
- priority queue
- interruptible
- dynamic test hardware pools
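
A hedged sketch of what change-based scheduling could look like,
assuming the test environment is handed the list of changed paths
(here taken from ``git diff --name-only``). The path patterns and
group names are invented for illustration and are not an actual
Baserock layout::

    import fnmatch
    import subprocess

    # Hypothetical mapping from source areas to test groups.
    TEST_GROUPS = {
        'audio': ['src/audio/*', 'drivers/sound/*'],
        'network': ['src/net/*', 'configs/wifi/*'],
        'smoke': ['*'],   # always run a basic smoke group
    }


    def changed_paths(old_ref, new_ref):
        """List paths touched between two commits in the current repo."""
        out = subprocess.check_output(
            ['git', 'diff', '--name-only', old_ref, new_ref])
        return out.decode('utf-8').splitlines()


    def groups_to_run(paths):
        """Select the test groups whose patterns match any changed path."""
        selected = set()
        for group, patterns in TEST_GROUPS.items():
            for pattern in patterns:
                if any(fnmatch.fnmatch(path, pattern) for path in paths):
                    selected.add(group)
                    break
        return selected


    if __name__ == '__main__':
        paths = changed_paths('origin/master', 'HEAD')
        print('test groups to schedule:', sorted(groups_to_run(paths)))
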
Some requirements (behaviours) from LAVA
----------------------------------------

- configured and triggered using JSON and YAML, local or in version
  control (git, bzr)
- requires a serial connection, including SOL (serial over LAN)
- multi-node support available, with many-to-many relationships
- auto-login support for unmodified systems
- hardware testing via the LMP stack, in final development: SD, power,
  eth, hdmi, audio, GPIO, sata, USB
  - Linaro Multi Probe, open hardware
- recoverable test design via master images / test deployment
- test definitions collated into single deployments, with or without
  reboots in between
- result aggregation for all test runs immediately available
- filters and image reports based on result bundles, immediately
  updated
- CI integration via Jenkins and other tools
- tools in development to select sets of tests from a single
  repository based on criteria including device availability
- interactive boot command support for kernel and bootloader testing
- Android, Ubuntu/Debian, OpenEmbedded & Fedora OS support
- emphasis on real hardware, but also virtualisation support to run
  test environments on platforms without trashing the base system
- lava-tool support for full command-line submission
- result aggregation can be public, or private to logged-in users or
  groups of users
- test LAVA inside LAVA
- priority manipulation

Limitations in LAVA
-------------------

- frequently used for development of software for new platforms, so
  running on emulators, models or development boards which have
  reliability issues

Developments in LAVA unreleased
-------------------------------

- temporary boards
- devices allocated only to certain groups
- idle-time selection of tests

Tools to look at
================

* autopilot (Ubuntu Touch uses this)
* UTAH project in Ubuntu
  - UTAH has an "everything which happened" YAML output thingy
* National Instruments LabVIEW stuff for hardware (Andy Simpkins)
* TAP for test result aggregation
* piuparts
* autopkgtest (DEP-8) can test reverse dependencies and stipulate the
  dependencies needed for the autopkgtest. An entry in debian/control
  identifies which packages support autopkgtest (see the example at
  the end of these notes).

Other ideas/requirements
========================

- Ability to re-run old test sets (e.g. if there was a bug in the
  suites)
- Support for snapshots of user systems (dirty systems)
- Storage of anonymised snapshots for others to copy and re-test
- Portable suite: in case anonymised data does not show the bug, the
  suite can be relocated to an environment where the data can be used
  intact

Some requirements from conversations at Debian conference
==========================================================

* Able to help test debian-cd
* Easy to get started in a small situation (hacker at home)
* We're not trying to replace manual testing.
* System upgrade testing would be good for the Debian Release Team
* Including "used" systems
* Not just stable->testing, but perhaps oldstable->stable->testing or
  even more
* Using piuparts...
* Maybe we need directed "pseudo-meta-packages" for combining package
  tests.
* Modular
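
For the autopkgtest (DEP-8) item above, a minimal hedged example of
how a package declares its tests: the source stanza in debian/control
carries an entry such as ``XS-Testsuite: autopkgtest`` (later
standardised as ``Testsuite:``), and the tests themselves are listed
in debian/tests/control. The test name and extra dependency below are
invented for illustration; ``@`` stands for the binary packages built
from the source, and the script debian/tests/smoke must exit non-zero
on failure::

    Tests: smoke
    Depends: @, curl
    Restrictions: allow-stderr
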