System testing
==============

system-testing@flarn.net
system-testing-subscribe@flarn.net
http://vlists.pepperfish.net/cgi-bin/mailman/listinfo/system-testing-flarn.net

Goals of conversation
---------------------

* Looking to come up with a standard set of tools for system testing
* Looking to come up with shareable tests where possible
* Slide titles so Lars and I don't look like morons and have to sing
  during our talk slot on Saturday. (SRSLY you don't want me to sing)
* Identify backing track to be played on PA (The Chicken Song and
  Mah Nah Mah Nah)

Examples of where system testing is needed
------------------------------------------

* distbuild clusters, for development
* mail clusters, for sysadmins
* hostapd deployment, can't be mocked
* is Debian testing working in supported setups? including with
  upgrades from the previous stable, after specific apps have actually
  been used

Real customer project
---------------------

* Customer project consists of a few systems, with various hardware
  attached, including a remote control.
* Automated testing.
  - mostly drive the remote control unit using an Android testing
    framework by sending key events, and observe results by inspecting
    Android events, while staying as black-box as possible: do the
    correct screens appear? are there suspicious messages in log files?
    is audio playing back correctly (via headphone socket to line-in on
    a test PC)? is the wifi ESSID visible? plus maybe more.
  - various tests use those inputs to exercise various behaviours in
    various scenarios: does audio play from DVD vs USB vs SD, etc.
  - automated installation:
    - set up wifi to transfer files
    - "a lot of messing about with serial terminals", using expect
    - expect doesn't work with serial ports, so they have a small
      custom tool instead (see the first sketch after this section)
    - all tools built from scratch, but kept to a minimum
  - run tests, gather results, put them in the place with the build
  - tests triggered automatically by creating a new directory with a
    new image (see the second sketch after this section)
  - another script gathers results and produces an HTML summary
* Biggest problem: unreliability of hardware
  - setting up the wifi link
  - automating the serial link
  - try to automatically notice when things are just failing, rather
    than tests failing, and then rerun the test
* We may need to worry about having many test-controlling machines
  connected to the devices being tested.
  - performance
  - identification of devices (some devices have the same serial
    number)
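
As an illustration of the kind of small, custom expect-style serial
tool described above, here is a minimal sketch assuming the pyserial
library. The device path, prompts and commands are invented for the
example; this is not the project's actual tool::

    import re
    import time

    import serial  # pyserial


    class SerialExpect:
        """Tiny expect-like wrapper around a serial port."""

        def __init__(self, device, baudrate=115200, timeout=1.0):
            self.port = serial.Serial(device, baudrate=baudrate,
                                      timeout=timeout)
            self.buffer = b''

        def expect(self, pattern, timeout=30.0):
            """Wait until the output matches a regex, or raise on timeout."""
            regex = re.compile(pattern)
            deadline = time.time() + timeout
            while time.time() < deadline:
                self.buffer += self.port.read(4096)
                match = regex.search(self.buffer.decode('utf-8', 'replace'))
                if match:
                    self.buffer = b''
                    return match
            raise TimeoutError('timed out waiting for %r' % pattern)

        def sendline(self, line):
            """Send a command followed by a newline."""
            self.port.write(line.encode('utf-8') + b'\n')


    # Hypothetical usage: log in on the serial console, start an install.
    console = SerialExpect('/dev/ttyUSB0')
    console.expect(r'login:')
    console.sendline('root')
    console.expect(r'#')
    console.sendline('/usr/bin/start-install')
    console.expect(r'installation complete', timeout=600)
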
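The "new directory with a new image triggers a test run" mechanism
could look roughly like the polling loop below; the directory layout
and the run-tests command are assumptions for illustration, not the
project's actual scripts::

    import os
    import subprocess
    import time

    INCOMING = '/srv/images/incoming'   # assumed drop directory for images
    RESULTS = '/srv/images/results'     # assumed location for test results

    seen = set()

    while True:
        for name in sorted(os.listdir(INCOMING)):
            image_dir = os.path.join(INCOMING, name)
            if name in seen or not os.path.isdir(image_dir):
                continue
            seen.add(name)
            result_dir = os.path.join(RESULTS, name)
            os.makedirs(result_dir, exist_ok=True)
            # Hypothetical test driver; results end up next to the build.
            subprocess.run(['run-tests', '--image-dir', image_dir,
                            '--results-dir', result_dir])
        time.sleep(30)   # poll for new image directories every 30 seconds
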
Use Cases
---------

Physical:

* Developer with a devboard on their desk, running tests against the
  devboard
* CI testing the same devboard
* Developer with a consumer electronics product on their desk
  - devboard might have Ethernet on it, whereas a mobile phone doesn't
* CI testing consumer electronics devices
* Developer testing a server appliance
* CI testing a server appliance
* Each of the above, but with multiple devices or appliances
  - running an isolated test vs running a test in a controlled
    environment
* Sysadmin testing a combination of services in pre-production
* Sysadmin testing a production environment

Other:

* Functionality
* Boundary testing
* Performance
* Stress testing
* Security testing
  - e.g., fuzzing

Some requirements from Baserock
-------------------------------

- configured and triggered entirely in git
  - including the test environment
- able to test a multi-system deployment
- able to test unmodified production systems
- able to modify systems for testing
- able to test the hardware: power, audio I/O, video I/O, etc.
- able to use hardware to test the product: switchable power supply,
  robot fingers, video camera, etc.
- recoverable test design: can deal with random glitches at the
  hardware level by restarting and resetting to known sequence points
  in the test (SBQS restart)
- tests and test frameworks need to be reliable in general
- able to cope with many-to-many relationships between tests and
  systems (software and hardware platforms)
- multiple repositories of tests that can be combined into a test
  suite, such that the repos have different security domains
- result aggregation needs to be screenable
- intermediate results, for tests in progress, need to be viewable
- able to schedule sets of tests differently: every commit, once a
  day, etc.
- able to schedule tests based on what was changed: schedule all audio
  tests when audio code got changed, for example; this might mean
  exposing enough info (e.g., the change set) to the test environment
  to decide what needs to be run (see the sketch after this list)
- able to schedule tests based on how long they take to run
  (incl. setup/cleanup)
- able to group tests by provisioning requirements?
- priority queue
- interruptible
- dynamic test hardware pools
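
A hedged sketch of what change-based scheduling could look like,
assuming the test environment is handed the list of changed paths
(here taken from ``git diff --name-only``). The path patterns and
group names are invented for illustration and are not an actual
Baserock layout::

    import fnmatch
    import subprocess

    # Hypothetical mapping from source areas to test groups.
    TEST_GROUPS = {
        'audio': ['src/audio/*', 'drivers/sound/*'],
        'network': ['src/net/*', 'configs/wifi/*'],
        'smoke': ['*'],   # always run a basic smoke group
    }


    def changed_paths(old_ref, new_ref):
        """List paths touched between two commits in the current repo."""
        out = subprocess.check_output(
            ['git', 'diff', '--name-only', old_ref, new_ref])
        return out.decode('utf-8').splitlines()


    def groups_to_run(paths):
        """Select the test groups whose patterns match any changed path."""
        selected = set()
        for group, patterns in TEST_GROUPS.items():
            for pattern in patterns:
                if any(fnmatch.fnmatch(path, pattern) for path in paths):
                    selected.add(group)
                    break
        return selected


    if __name__ == '__main__':
        paths = changed_paths('origin/master', 'HEAD')
        print('test groups to schedule:', sorted(groups_to_run(paths)))
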
Some requirements (behaviours) from LAVA
----------------------------------------

- configured and triggered using JSON and YAML, local or in version
  control (git, bzr)
- requires a serial connection, including SOL (serial over LAN)
- multi-node support available, with many-to-many relationships
- auto-login support for unmodified systems
- hardware testing via the LMP stack, in final development: SD, power,
  eth, hdmi, audio, GPIO, sata, USB
  - Linaro Multi Probe, open hardware
- recoverable test design via master images / test deployment
- test definitions collated into single deployments, with or without
  reboots in between
- result aggregation for all test runs immediately available
- filters and image reports based on result bundles, immediately
  updated
- CI integration via Jenkins and other tools
- tools in development to select sets of tests from a single
  repository based on criteria including device availability
- interactive boot command support for kernel and bootloader testing
- Android, Ubuntu/Debian, OpenEmbedded & Fedora OS support
- emphasis on real hardware, but also virtualisation support to run
  test environments on platforms without trashing the base system
- lava-tool support for full command-line submission
- result aggregation can be public, or private to logged-in users or
  groups of users
- test LAVA inside LAVA
- priority manipulation

Limitations in LAVA
-------------------

- frequently used for development of software for new platforms, so
  running on emulators, models or development boards which have
  reliability issues

Developments in LAVA unreleased
-------------------------------

- temporary boards
- devices allocated only to certain groups
- idle-time selection of tests

Tools to look at
================

* autopilot (Ubuntu Touch uses this)
* UTAH project in Ubuntu
  - UTAH has an "everything which happened" YAML output thingy
* National Instruments LabVIEW stuff for hardware (Andy Simpkins)
* TAP for test result aggregation
* piuparts
* autopkgtest (DEP-8) can test reverse dependencies and stipulate the
  dependencies needed for the autopkgtest. An entry in debian/control
  identifies which packages support autopkgtest (see the example at
  the end of these notes).

Other ideas/requirements
========================

- Ability to re-run old test sets (e.g. if there was a bug in the
  suites)
- Support for snapshots of user systems (dirty systems)
- Storage of anonymised snapshots for others to copy and re-test
- Portable suite: in case anonymised data does not show the bug, the
  suite can be relocated to an environment where the data can be used
  intact

Some requirements from conversations at Debian conference
==========================================================

* Able to help test debian-cd
* Easy to get started in a small situation (hacker at home)
* We're not trying to replace manual testing.
* System upgrade testing would be good for the Debian Release Team
* Including "used" systems
* Not just stable->testing, but perhaps oldstable->stable->testing or
  even more
* Using piuparts...
* Maybe we need directed "pseudo-meta-packages" for combining package
  tests.
* Modular
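
For the autopkgtest (DEP-8) item above, a minimal hedged example of
how a package declares its tests: the source stanza in debian/control
carries an entry such as ``XS-Testsuite: autopkgtest`` (later
standardised as ``Testsuite:``), and the tests themselves are listed
in debian/tests/control. The test name and extra dependency below are
invented for illustration; ``@`` stands for the binary packages built
from the source, and the script debian/tests/smoke must exit non-zero
on failure::

    Tests: smoke
    Depends: @, curl
    Restrictions: allow-stderr
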