Everything else that is not in other documents.

Mon:
08:30 Breakfast
09:15 Introduction
09:20 Going through agenda and sorting
09:45 Status reports

Building and publishing images
We agreed on some things last year; we need to keep working on those:
Account ownership. DDs (or people interested in working on clouds) should be able to have access to the appropriate accounts.
Only a single variant of image, or many (e.g. minimal, full, integrated with vendor stuff)?
Publish as ready-to-use, or also to download (e.g. on get.d.org, cloud.d.o)? Related to this: publishing images to OpenStack. API, hooks.
Supervised or automatic push?

Status on various clouds
We have 4 clouds represented in the room.

Azure
=====
Debian is running fine, growing in usage and popularity.
Support needs addressing. Different policies. Timelines of support?
Cloud kernel - good, interesting, beneficial. Already used for some images on Azure.
Official images are still built using custom scripts, not FAI. The script is derived from the OpenStack script.
SDK, daemons, CLI: own repository. Do we try to integrate it into the Debian repositories? Update speed.
cloud-init - not currently using it, but customers are asking.
Road map to move to a common set of tools (esp. FAI)
* For consistency
* Easy for end users to build custom images based on the official configs
* Should evangelise doing this!
Daily images are removed 14 days after release.

GCE
===
Debian is still the default image.
Still having pain getting the guest code into Debian
* Currently own repository and package for both the guest code (https://github.com/GoogleCloudPlatform/compute-image-packages) and the SDK.
* The SDK will possibly never be part of Debian, but the guest code will be and is in the process of being added as a Debian package (https://salsa.debian.org/debian/google-compute-image-packages)
* The SDK (gcloud) is not needed to use the images and is optional for users.
Don't really care about cloud-init, but some users do.
* Slow (adds an average of 5 seconds to boot time), only boot-time config (not runtime)
* Too many dependencies, hard to maintain.
* Not against a cross-distro, cross-platform tool, but the existing one is not good enough.
UEFI wanted, with Secure Boot.
Still using bootstrap-vz for now, going to move to FAI for buster+. The buster image built with FAI has issues - can we resolve them?
Maintaining the GCE guest software in Debian is a worry.
Security and EOL notices
* Reasonable (6 months?) notice for the EOL of a release
* GCE's concept of deprecating images: the image is not recommended or chosen by default, but users can choose it (by choosing to see deprecated images) and use it with warnings
* Deprecation of weekly images (the same for all clouds) (Azure deletes them after 14 days)
Is Debian an enterprise distro? Companies support (financially) LTS.

AWS
===
Stretch images generated using FAI, regenerated regularly.
Users use the marketplace or other ways (no usage statistics outside the marketplace).
Debian is the default for Kubernetes kops - suggests good quality of images. But it uses Jessie with a backported kernel.
We (most probably) run on all instance types.
Problems with GovCloud: requires paperwork; ownership of the account by Debian should help here.
cloud-init and the AWS CLI are included in the image.
Not using the cloud kernel yet; missing a couple of drivers? See the BTS.
Things are stable, quite OK.
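For illustration, a rough sketch of the FAI-based build that AWS already uses (and that the other providers are moving towards). The class names and hostname here are illustrative only; the real classes live in the debian-cloud-images config space, and the config space location is typically taken from fai.conf / FAI_CONFIG_SRC:

    # sketch only - build a raw disk image from an FAI config space
    # (class names are illustrative, not the actual debian-cloud-images classes)
    sudo fai-diskimage -v -u cloudhost -S 8G \
        -cDEBIAN,STRETCH,AMD64,EC2 \
        stretch-ec2.raw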
Openstack
=========
Still using the old build script, but happy to move to FAI (with help!).
Building regularly for Jessie and Stretch, plus weekly buster builds.
amd64 and arm64.
Kernel usage: currently using the backported kernel; suggestion not to use the cloud kernel. OpenStack requires more drivers.
Many platforms (Xen, KVM, etc.). But not all are supported (only KVM, or more?).
We might not support everything, but users can build (and maintain) their own variants or configurations.
virtio is covered (what does that mean exactly?).
Basic support for arm64.
Still wanting to add ppc64el (and s390x?).
Brazilian university providing access to hardware?

Cloud kernel
============
Idea: small footprint, quick start. Removed many drivers (e.g. PCMCIA, etc.).
Bit of a mismatch of expectations, e.g. removal of filesystems (AFS?).
Official Kconfig. What else is needed or can be removed?
We need to document, especially _why_ something was removed or a configuration change was made.
We cannot have more variants of the kernel. The release/security/kernel teams agreed to one kernel for cloud needs, but not more.
Config for the cloud kernel: https://salsa.debian.org/kernel-team/linux/blob/master/debian/config/amd64/config.cloud-amd64

Workflow of building
====================
qemu-vm project on Salsa.
Salsa with CI functionality: https://salsa.debian.org/waldi/debian-cloud-images/
Runs after each push, so we can see if anything got broken.
Building images: normal jobs are running on GCE, with an option to build things on casulana too.
GitLab CI runner. The runner asks to start CoreOS images.
x86 (only architecture so far).
Docker runs there, and the complete builder runs inside this Docker container.
Takes 50s to bootstrap; one VM (and one Docker container inside it) per job.
How do we upload those images? Images are uploaded as artifacts on Salsa.
CoreOS is not Debian. But there were problems with Docker on Debian (is that still current?).
Docker Machine is used to maintain the VMs.
The Makefile is not used by Salsa, but might be needed to run/build locally.
Needs documentation.
We can run on casulana. Only protected branches are built on casulana.
Building images using UEFI: the Debian GRUB packages only support installing either BIOS boot or UEFI. grub-cloud adds an option to support both.
Many variants: BIOS, UEFI, GPT, MBR, MS-DOS partitions. Support issues for the GRUB maintainers.
EFI hybrid images - use GPT with a protective MBR too. That way we can boot either via BIOS or UEFI.
UEFI is the right way for the future, and then Secure Boot.
GCE has flags that can be added to an image to define features (e.g. UEFI support).
Azure does gen1 or gen2 VM format - gen2 is defined as using UEFI.
AWS doesn't (yet) do UEFI, but we expect it to happen in the future.
Auto-growing images at boot? Not currently working with FAI. Will need extra tweaks to deal with changing the size of a GPT setup.
cloud-init can grow partitions - does it grow filesystems?
systemd script to expand things: needs to be taught about the two partitions needed in a GPT+UEFI context; it already handles the simpler BIOS case.

Locations
=========
Currently only GCE. The casulana build is freshly baked in but not working fully yet.
CI runs as user "cloud"?
casulana is used to build the CD images.
casulana is a build box, not a publishing one. pettersson is used to publish; it'll go away, but not yet.
We need space: 100-200 GB. We have that space.
2nd publish location, but not done yet.
Need for redundancy. Hosting, geography, hardware.
Publishing from casulana to other locations. New jobs. Need for glue to join everything (building, publishing, testing).
Supervision of publishing:
* registering (storing binary artifacts)
* publishing to providers, especially for a release, should require a human in the loop
* publish to all platforms at once
* use the day inside the name (or identifier), not the build number (which might differ)
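For illustration, the kind of per-provider registration step the publish jobs would wrap. Image names, bucket and snapshot IDs below are placeholders, not real artifacts; the actual publishing code lives with the team:

    # GCE: register an uploaded tarball as an image (date-based name, per the point above)
    gcloud compute images create debian-buster-20181015 \
        --family debian-buster \
        --source-uri gs://EXAMPLE-BUCKET/debian-buster-20181015.tar.gz

    # AWS: register an EBS snapshot of the image as an AMI
    aws ec2 register-image \
        --name debian-buster-20181015 \
        --architecture x86_64 --virtualization-type hvm --ena-support \
        --root-device-name /dev/xvda \
        --block-device-mappings 'DeviceName=/dev/xvda,Ebs={SnapshotId=snap-0123456789abcdef0}'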
Providing
New web site.
More priority for downloading; might include an image finder.
With CI we have JSON with metadata, which should help here.
Need for volunteers to deal with it.
Link to the providers' images, e.g. AWS: link to the marketplace, CLI arguments, AMI IDs, etc. Per region!
Ubuntu sends you to the marketplace, but also gives the AMI ID.
Ubuntu does not like their own tool; they prefer SUSE's.
Look at the SUSE "pint" tool, which works with AMI metadata to help find the right image to run etc. - https://github.com/SUSE/Enceladus/tree/master/susePublicCloudInfoClient
OpenStack providers? 18 providers currently. It would be bad to provide links only to the 3 biggest.
Every image has a UUID. Potentially many UUIDs to publish.
Also private OpenStacks. They should download the latest from casulana, but then we need to provide links to those (appropriate) images.
Outreachy/GSoC project? Come back to this later - let's get the images and metadata first!

GOVERNANCE
==========
Next level of relationship between Debian/SPI and the cloud providers.
Delegate to the team instead of going through the leader. Delegates vs. assistants.
More than 1 delegate (preferably 3).
An individual can create a (long-lived) instance and then lose their Debian account. Need a way for DSA (?) to be able to terminate those resources.
Need a specific per-cloud solution for this, e.g. management of private ssh keys.
No long-lived instances (?), or long-lived instances managed by DSA and not by DDs?

Software development lifecycle
==============================

Testing
-------
We have a runner that will test AWS and GCE images.
* Starts an instance, runs a script inside it.
Roadmap:
* Static tests outside the image
* Test basic Debian stuff in the image on casulana, using kvm to run an instance
* Push to the specific provider to test the provider-specific bits
Needs to be integrated into our workflow.
Test framework decoupled from the tests.
Run tests on all combinations of hardware and configuration (e.g. instance types, number of disks, etc.).
Do we need a package? Good idea for people who build their own images, so they can test those images.
Pipelines: run all tests when building a new image. But do we rerun when we push new tests? A pipeline is per project.
Google released their own tests as FLOSS. We could use this, or at least treat it as a template: https://github.com/GoogleCloudPlatform/compute-image-tools
GCE integration tests
* Daisy is a GCE workflow tool - it's an API client, written in Go (https://github.com/GoogleCloudPlatform/compute-image-tools)
* The test framework uses the GKE (Kubernetes) open source system (Prow) and associated modules (https://github.com/kubernetes/test-infra/)
* Examples of tests: test on 2 instances, one sshing to the other, etc.
How complex should it be?

gitlab/salsa
------------
Layout of branches.
Need for review of merge requests
* then we can remove direct access to master or similar branches and require merge requests
Which branch triggers a pipeline run?
Stable images should be buildable from the stable branch.
Conflicts of FAI versions (configuration changes).
The same policy for cloud images as for Debian: stable stays stable.
Forking/branching FAI and the config?
* but how do we merge them then?
* backporting changes to stable
* or cherry-picking security fixes (or similar)
Unstable can be used for experimentation.
OpenStack only builds testing and stable currently (i.e. no sid images).
Current GitLab runners use the same VMs for all builds. Do we need to split them? (A minimal pipeline sketch follows below.)
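A minimal sketch of what a per-branch Salsa pipeline job could look like; the runner tag, class names and job name are illustrative and not the actual debian-cloud-images pipeline:

    # .gitlab-ci.yml (sketch only)
    stages:
      - build

    build-buster-gce-amd64:
      stage: build
      tags:
        - cloud-image-builder          # hypothetical runner tag (GCE or casulana runner)
      script:
        - fai-diskimage -v -u debian -S 8G -cDEBIAN,BUSTER,AMD64,GCE buster-gce-amd64.raw
      artifacts:
        paths:
          - buster-gce-amd64.raw
        expire_in: 14 days
      only:
        - buster                       # named release branches, not "stable"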
Do we need a policy for managing those repositories and changes? Especially for fai and debian-cloud-images (formerly fai-cloud-images).
A bot publishing info about repository changes, including merge requests.
Branch names like stretch, buster (not "stable", as that will change meaning).
Similar to the gbp process (but not the same). No direct commits to the named branches.
Need a cleanup of branches (removal of old ones, clean naming).
We should not have things running in the pipeline without somebody doing code review / reviewing merge requests.
Separate repositories for building images and for publishing them
* the former is (should be) treated as more stable and not changing (especially for stable)
* the latter might need changes, e.g. cloud providers change their API for publishing images and we need to update the code during stable
Two workflows:
* one automatic, for daily images
* another manual, for official images, where we'll publish manually after some review (to the marketplace)
Also removal: both from the marketplace and from storage.
We use 4 TB on Azure.
We have all CD images for all releases since 3.x. No ISOs, but jigdo files for most, so we can rebuild the ISOs when needed. We point to snapshots, so no problem with FTP cleaning.

Tuesday
--------------------------------------------------------------------------------

Image contents
==============

unattended-upgrades
-------------------
Need to discuss this again. We're enabling it in the AWS and GCP images, and people seem happy.
The Debian security team doesn't like this. Need to discuss properly.
waldi thinks that d-i in buster is installing and enabling u-u by default. Need to check.

default user name
-----------------
OpenStack adds a "debian" user; AWS is using "admin".
Do we care about the difference? We're using separate images already.
AWS might have an agent in the future, so maybe having a single image for OpenStack and AWS won't be possible.

emergency service stuff
-----------------------
OpenStack adds emergency.service and rescue.service. What for? Needed?
sulogin on tty0.
We think this is obsoleted by newer systemd stuff.

cloud-initramfs-growroot
------------------------
Used to grow the rootfs - maybe not needed any more.
A growpart script is wanted for GCE and anybody else not using cloud-init; it is now in "cloud-guest-utils".
Partition and filesystem growing is a common problem. waldi to look into this.
Some GPT setups have partition 1 at the end. Previously partition 1 had to be at the beginning to boot from; now that has changed. But there are (maybe?) still some scripts which only grow partition 1 (hardcoded in the source).
cloud-initramfs-growroot might be removed if we check that it's not used anymore.
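For reference, what the growpart path (without cloud-init) boils down to, as a sketch; the device and partition number are illustrative, and as noted above a GPT+UEFI layout may put the root partition at a different index:

    # sketch: grow the root partition and its filesystem at first boot
    # (growpart comes from the cloud-guest-utils package)
    growpart /dev/sda 1        # extend partition 1 into the free space at the end of the disk
    resize2fs /dev/sda1        # grow the ext4 filesystem to fill the enlarged partition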
network bringup
---------------
Different providers have different network startup scripts, mostly very similar.
Use systemd-networkd? IPv6 poses some problems, and there are hacks to fix it, but it adds many seconds to boot time.
Problems between ifup and systemd. systemd might fix it now (for buster), but it'll require switching to systemd-networkd.
It does not fit well for WiFi, and also for cloud - but no hard data, mostly anecdotes from people with problems.
Now desktops depend on network-manager.
We still install ifup/ifdown, but it's disabled.
GCE/Azure: the configs differ only by a blank line.
AWS: configures 8 interfaces; an IPv6 helper script determines whether v6 is available or not.
ifup does not work for dynamic attachment, but dynamic attachment is possible on clouds.
We'll leave the differences for now; we might return to this once there is one good solution.
There will be differences anyway, e.g. configuration of 25/40 Gbps networking for AWS, etc.
All should include /etc/network/interfaces.d/*.
Also all VPCs and point-to-point configuration, so /e/n/i.d/* should be done.
SUSE has cloud-netconfig. It's a framework for configuring this. Should we look into it, or leave it for the future?
Need stable network interface names.
It looks like there are many solutions, each with its own problems and deficiencies.
Noah proposed testing udev rules.

serial-getty@.service
---------------------
zigo uses it to have a serial console in OpenStack.
Now systemd should start a getty on the console automatically, so it might not be needed anymore, but we still need to check. If it does not work, it's a bug which needs to be reported and fixed.
ttyS0 is not always the first console; arm64 has ttyAMA0.

Buster
======
Any changes we need or that will affect us?
systemd-networkd? No.
grub-cloud - all the x86 images should use hybrid images by default now.
We don't need one metapackage (to pull in dependencies), as we control what we install anyway.
Hybrid image? All clouds use (should use) EFI.
u-boot should not be used; usage of GRUB with it is problematic and depends on u-boot details.
Add a grub-cloud-arm64 package with a simple config - or maybe just install grub-efi-arm64?
Cloud kernel image
* only amd64 for now; want an arm64 equivalent too
* needs some work to make it more generic before adding new architectures
* add ppc64el? Do that when somebody cares.
* The cloud kernel has problems on AWS: it does not have drivers for ENA (more performant networking). It should work on smaller instances though, but people might want to use better machine types ;-)
* The cloud kernel might become the default for clouds; might be problematic for OpenStack, but we should at least test it on AWS/Azure/GCE.
* May need to add some cloud-specific drivers?
* OpenStack images are already using this kernel for buster, no complaints so far.
* An OpenStack bare-metal provider would need the generic kernel. Build both when we have automation?
* We might need to add back a small number of hardware drivers, e.g. NVMe for AWS instances (like m5d). Back-porting such drivers to stable kernels?
Python 3
* Remove everything for buster which is not Python 3.
* There is already a lintian check suggesting Python-3-only.
* Python 2 will stay in buster, and will be removed in bullseye.
Buster freeze: Q1 2019
* Should we try to release cloud images on release day? Azure has a long rollout time (up to 1 week). But we should announce that we're releasing.
* Pre-release images, beta during the freeze. We have d-i alpha/beta releases, so we could (should) do the same for cloud images. We already have weekly images, so we could (re-)use them, but we might want to rename them or give them more publicity.
* The release team has release criteria; do we want something similar for clouds? Removal of RC-buggy packages: there are key packages which are not removed; we should add cloud-critical packages to that list. Take the package list from the FAI config space.

Non-x86 ports
=============
arm64 OpenStack.
There is ppc64el at the Brazilian university's OpenStack.
Only OpenStack has non-x86; AWS/Azure/GCP are only doing amd64?
Cross-building: FAI checks if the target is the same as the host, and uses qemu-user-static if they differ. Emulation is slow, but works for now.
casulana uses kvm for x86, or qemu-user-static for other architectures (about 1/20 of the speed).
Summary: we support other architectures, and we'll solve problems as they arrive. The biggest problem might be GRUB/booting.
We won't be adding new architectures unless there are real providers using them.
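For context, a rough sketch of the qemu-user-static emulation that FAI falls back to when host and target architectures differ; paths are illustrative, and on newer systems the binfmt registration can make the copy step unnecessary:

    # build an arm64 tree on an amd64 host via user-mode emulation (sketch)
    apt-get install qemu-user-static binfmt-support debootstrap
    debootstrap --arch=arm64 --foreign buster /srv/arm64-root http://deb.debian.org/debian
    cp /usr/bin/qemu-aarch64-static /srv/arm64-root/usr/bin/   # lets binfmt run arm64 binaries in the chroot
    chroot /srv/arm64-root /debootstrap/debootstrap --second-stage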
Secure boot
===========
All providers want it, as customers start expecting (demanding) it.
It'll be opt-in for customers.
Debian wants to have Secure Boot for buster (release target). Much work was done during DebConf 2018.
The next alpha installer should have it; the software is ready.
Additional kernel modules: GPU drivers require them; others - we don't know.
Still a problem with bootstrapping and adding new keys.
Kernel verification, and then verification of the later stages.
Debian should cope with it; the software should be ready, might require scaling up.
Platforms should have the MS key and that's all (just like desktops).
qemu Secure Boot? It's ready; we just need to add command line arguments with the key details (OVMF).
Non-x86 differs: there is no root CA like MS is for x86. HPE shipped hardware with the ability to add new keys (non-enforcing mode?).

Debian LTS
==========
If there are updated packages, will we build new images?
First: do we want to provide LTS images at all? Cloud instances are usually short-lived. OTOH, kops still uses Jessie (with a custom kernel).
Problem with the environment/community: they use old, outdated images (e.g. Jessie) as a basis.
Also: Debian volunteers do not provide LTS; paid people provide that support.
We should state that we're not providing support exceeding oldstable support.
LTS vs. backports security as best effort.
Users want more than 3 years. It requires effort to keep images supported.
Keep images around for some time, but reduce their discoverability.
We won't provide images for LTS, but we won't prevent users from pointing to the LTS repository if they want to. Those won't be official Debian images, though.

Images including packages from backports
========================================
Mostly the kernel, but in the past this has also included things like ssh.
Most other packages are doable via cloud-init (or similar); the kernel is special as it requires a restart (reboot).
Azure currently uses a backported kernel (kernel from backports). They (Azure or users? Users) want it and use it.
Proposal to do the same for all providers.
Backports (unlike LTS) is an official part of Debian: hosted on our infrastructure, maintained by DDs or DMs, etc.
ssh was used from backports for GCE for performance/cipher reasons.
We should label it, describe/document it, and be OK with it.
The possibility is included in FAI.

Multiple variants of images
===========================
A variation on the previous topic, but also non-free, not-always-needed vendor stuff.
Debian base (minimal) image.
Amazon Linux provides a base image, but it's not very popular (or is it?). What's removed? It has cloud-init, but people want to remove it. There is no official list of removed stuff, or of the difference to the standard image.
We have an FAI class with the differences.
We might want to have cloud-init and some basic agents, but don't include SDKs or cloud CLIs, etc. Be a good citizen (VM).
Putting the SDK (the GCE one) into Debian: the Google people working on it don't care about Debian (are not DDs). GCE puts out a new version every week, and that won't change. Put it into unstable and prevent it from migrating? But then we won't have backports (those have to be releasable).
We should have one config space to build all variants; otherwise there will be divergence.
OTOH we'd be diluting "Debian" by providing images with a non-Debian repository. Also, those repositories might end up containing non-free software.
Canonical was republishing SDKs in their own repository, but it was 6-8 weeks out of date. Users were complaining about missing support for new features, regions, etc.
Use cloud-init to add other repositories? It'll increase boot time.
There is no one clean solution. We (Debian) are not doing a good job maintaining fast-moving software (docker, kubernetes, the GCE SDK, etc.). It's not a one-time effort but a continuous one.
We might include FAI classes for vendor repositories, without building those variants ourselves.
Do we want to provide many variants of images, or just make it easy for people to build their own images? We've chosen FAI to make it easy to build your own variants. We might build some important variants, but not too many.
Having tests will also make sure that users can be assured that their changes did not break anything.
We might revisit this in the future.
The cloud providers should write the FAI configs and include them in the cloud repo. Any ideas for class names? (See the sketch below.)
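As an illustration of what such a provider-contributed class could look like in the FAI config space; the class name, repository line and keyring path are examples, not agreed names:

    # package_config/GCESDK  (hypothetical class name)
    PACKAGES install
    google-cloud-sdk

    # scripts/GCESDK/10-vendor-repo  (simplified; a real config would use the fcopy/ainsl helpers)
    #!/bin/sh
    echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" \
        > "$target/etc/apt/sources.list.d/google-cloud-sdk.list"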
SSL/TLS 1.2?
The recent OpenSSL release disables some ciphers and TLS below 1.2. It might break some software deployed in the cloud.
We shall not prevent users from e.g. enabling old, insecure ciphers, but at the same time we should ship a sensible default configuration.

cloud-init
----------
1 RC bug - the current version in unstable FTBFS; a few other bugs to fix.
Upstream work - development is ongoing.
Is it good for us? Forking / replacing has been attempted by various folks, but never takes off.
A sprint happened recently; notes: https://docs.google.com/document/d/1-gctZNXA9oshxyDsMqoWAw66ll5MBBpvRkdA6wzMYDk/edit
The CLA is no longer a problem, it seems; maybe slow to get patches accepted upstream.

cloud.debian.org
----------------
Put a small page there with links to the built images.
Right now we must use the Salsa artifacts storage, which is not really intuitive; it's also temporary space.
cloud.d.org (an alias of pettersson) can also be used for hosting.

WRAP UP
============================
We have FAI config files for all the providers.
We want Debian accounts, under the SPI umbrella.
We will probably have a delegation from the DPL.
We have Salsa pipelines to build images.
We don't have an image finder.

Timeline: buster. By the freeze (January 2019) we should have:
* accounts
* mirror names
* beta images built with those

Bastian
* mirror
* EFI - Secure Boot
* cloud kernel config cleanup

Luca + Jimmy
* as SPI: Debian accounts and legal agreements with the providers

Luca
* delegation
* authentication, authorisation (not required for buster)
* driving the cloud mirrors conversation through DSA
* send email about reimbursement

Jimmy
* help Luca with auth

Ross
* integrating tests with the build pipeline
* re-org of the test framework

Tomasz
* delegate
* summarize the notes of the sprint
* integration tests
* planning to go through the mirror

Helen
* work on tests - framework, integration etc.
* NM work :-)
* Secure Boot (+ Steve + Lucas)

Martin
* SPI and DSA work, not much free time otherwise
* Debian cloud announcement mailing list

Steve M
* delegation
* images vs. cloud team on casulana
* cloud kernel for arm64
* help zigo run arm64 OpenStack
* arm64 host (not sure here)
* Secure Boot
* organizing monthly meetings

Thomas L
* likely to be occupied (surgery)
* review pull requests, etc.

Thomas G
* OpenStack ppc64el build and test

Noah
* continue maintaining the stretch images for AWS
* developing buster AWS images
* automatic building/registering from casulana
* coordinate creating the new AWS account (including publishing on GovCloud)

Lucas
* Secure Boot
* image finder :-D

David Duncan
* new account structure
* help with mirrors
* help with the rollout of tests on AWS, and try to open source the existing tests
* set up monthly sync with SPI

Steve Z
* accounts for publishing
* user account federation (for DDs etc.)
* EFI help for gen v2 (new HW) - make sure it works on buster
* whitelist waldi to access a gen-2 capable region
* access for the team to test infrastructure (no defined timeline)