Everything else that is not in other documents.

Mon:
08:30 Breakfast
09:15 Introduction
09:20 Going through agenda and sorting
09:45 Status reports

Building and publishing images
We agreed on some things last year; we need to keep working on those:
Account ownership. DDs (or people interested in working on clouds) should be able to have access to the appropriate accounts.
Only a single variant of image, or many (e.g. minimal, full, integrated with vendor stuff)?
Publish as ready-to-use, or also to download (e.g. on get.d.org, cloud.d.o)? Related to this: publishing images to OpenStack. API, hooks.
Supervised or automatic push?

Status on various clouds
We have 4 clouds represented in the room.

Azure
=====
Debian is running fine, growing in usage and popularity.
Support needs addressing. Different policies. Timelines of support?
Cloud kernel - good, interesting, beneficial. Already used for some images on Azure.
Official images are still built using custom scripts, not FAI. The script is derived from the OpenStack script.
SDK, daemons, CLI: own repository. Do we try to integrate it into the Debian repositories? Update speed.
cloud-init - not currently using it, but customers are asking.
Road map to move to a common set of tools (esp. FAI)
* For consistency
* Easy for end users to build custom images based on the official configs
* Should evangelise doing this!
Daily images are removed 14 days after release.

GCE
===
Debian is still the default image.
Still having pain getting the guest code into Debian
* Currently own repository and package for both the guest code (https://github.com/GoogleCloudPlatform/compute-image-packages) and the SDK.
* The SDK will possibly never be part of Debian, but the guest code will be and is in the process of being added as a Debian package (https://salsa.debian.org/debian/google-compute-image-packages)
* The SDK (gcloud) is not needed to use the images and is optional for users.
Don't really care about cloud-init, but some users do.
* Slow (adds an average of 5 seconds to boot time), only boot-time config (not runtime)
* Too many dependencies, hard to maintain.
* Not against a cross-distro, cross-platform tool, but the existing one is not good enough.
UEFI wanted, with Secure Boot.
Still using bootstrap-vz for now, going to move to FAI for buster+. The buster image built with FAI has issues - can we resolve them?
Maintaining the GCE guest software in Debian is a worry.
Security and EOL notices
* Reasonable (6 months?) notice for the EOL of a release
* GCE's concept of deprecating images: the image is not recommended or chosen by default, but users can choose it (by choosing to see deprecated images) and use it with warnings
* Deprecation of weekly images (the same for all clouds) (Azure deletes them after 14 days)
Is Debian an enterprise distro? Companies support (financially) LTS.

AWS
===
Stretch images generated using FAI, regenerated regularly.
Users use the marketplace or other ways (no usage statistics outside the marketplace).
Debian is the default for Kubernetes kops - suggests good quality of images. But it uses Jessie with a backported kernel.
We (most probably) run on all instance types.
Problems with GovCloud: requires paperwork; ownership of the account by Debian should help here.
cloud-init and the AWS CLI are included in the image.
Not using the cloud kernel yet; missing a couple of drivers? See the BTS.
Things are stable, quite OK.
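For illustration, a rough sketch of the FAI-based build that AWS already uses (and that the other providers are moving towards). The class names and hostname here are illustrative only; the real classes live in the debian-cloud-images config space, and the config space location is typically taken from fai.conf / FAI_CONFIG_SRC:

    # sketch only - build a raw disk image from an FAI config space
    # (class names are illustrative, not the actual debian-cloud-images classes)
    sudo fai-diskimage -v -u cloudhost -S 8G \
        -cDEBIAN,STRETCH,AMD64,EC2 \
        stretch-ec2.raw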
Openstack
=========
Still using the old build script, but happy to move to FAI (with help!).
Building regularly for Jessie and Stretch, plus weekly buster builds.
amd64 and arm64.
Kernel usage: currently using the backported kernel; suggestion not to use the cloud kernel. OpenStack requires more drivers.
Many platforms (Xen, KVM, etc.). But not all are supported (only KVM, or more?).
We might not support everything, but users can build (and maintain) their own variants or configurations.
virtio is covered (what does that mean exactly?).
Basic support for arm64.
Still wanting to add ppc64el (and s390x?).
Brazilian university providing access to hardware?

Cloud kernel
============
Idea: small footprint, quick start. Removed many drivers (e.g. PCMCIA, etc.).
Bit of a mismatch of expectations, e.g. removal of filesystems (AFS?).
Official Kconfig. What else is needed or can be removed?
We need to document, especially _why_ something was removed or a configuration change was made.
We cannot have more variants of the kernel. The release/security/kernel teams agreed to one kernel for cloud needs, but not more.
Config for the cloud kernel: https://salsa.debian.org/kernel-team/linux/blob/master/debian/config/amd64/config.cloud-amd64

Workflow of building
====================
qemu-vm project on Salsa.
Salsa with CI functionality: https://salsa.debian.org/waldi/debian-cloud-images/
Runs after each push, so we can see if anything got broken.
Building images: normal jobs are running on GCE, with an option to build things on casulana too.
GitLab CI runner. The runner asks to start CoreOS images.
x86 (only architecture so far).
Docker runs there, and the complete builder runs inside this Docker container.
Takes 50s to bootstrap; one VM (and one Docker container inside it) per job.
How do we upload those images? Images are uploaded as artifacts on Salsa.
CoreOS is not Debian. But there were problems with Docker on Debian (is that still current?).
Docker Machine is used to maintain the VMs.
The Makefile is not used by Salsa, but might be needed to run/build locally.
Needs documentation.
We can run on casulana. Only protected branches are built on casulana.
Building images using UEFI: the Debian GRUB packages only support installing either BIOS boot or UEFI. grub-cloud adds an option to support both.
Many variants: BIOS, UEFI, GPT, MBR, MS-DOS partitions. Support issues for the GRUB maintainers.
EFI hybrid images - use GPT with a protective MBR too. That way we can boot either via BIOS or UEFI.
UEFI is the right way for the future, and then Secure Boot.
GCE has flags that can be added to an image to define features (e.g. UEFI support).
Azure does gen1 or gen2 VM format - gen2 is defined as using UEFI.
AWS doesn't (yet) do UEFI, but we expect it to happen in the future.
Auto-growing images at boot? Not currently working with FAI. Will need extra tweaks to deal with changing the size of a GPT setup.
cloud-init can grow partitions - does it grow filesystems?
systemd script to expand things: needs to be taught about the two partitions needed in a GPT+UEFI context; it already handles the simpler BIOS case.

Locations
=========
Currently only GCE. The casulana build is freshly baked in but not working fully yet.
CI runs as user "cloud"?
casulana is used to build the CD images.
casulana is a build box, not a publishing one. pettersson is used to publish; it'll go away, but not yet.
We need space: 100-200 GB. We have that space.
2nd publish location, but not done yet.
Need for redundancy. Hosting, geography, hardware.
Publishing from casulana to other locations. New jobs. Need for glue to join everything (building, publishing, testing).
Supervision of publishing:
* registering (storing binary artifacts)
* publishing to providers, especially for a release, should require a human in the loop
* publish to all platforms at once
* use the day inside the name (or identifier), not the build number (which might differ)
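For illustration, the kind of per-provider registration step the publish jobs would wrap. Image names, bucket and snapshot IDs below are placeholders, not real artifacts; the actual publishing code lives with the team:

    # GCE: register an uploaded tarball as an image (date-based name, per the point above)
    gcloud compute images create debian-buster-20181015 \
        --family debian-buster \
        --source-uri gs://EXAMPLE-BUCKET/debian-buster-20181015.tar.gz

    # AWS: register an EBS snapshot of the image as an AMI
    aws ec2 register-image \
        --name debian-buster-20181015 \
        --architecture x86_64 --virtualization-type hvm --ena-support \
        --root-device-name /dev/xvda \
        --block-device-mappings 'DeviceName=/dev/xvda,Ebs={SnapshotId=snap-0123456789abcdef0}'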
Providing
New web site.
More priority for downloading; might include an image finder.
With CI we have JSON with metadata, which should help here.
Need for volunteers to deal with it.
Link to the providers' images, e.g. AWS: link to the marketplace, CLI arguments, AMI IDs, etc. Per region!
Ubuntu sends you to the marketplace, but also gives the AMI ID.
Ubuntu does not like their own tool; they prefer SUSE's.
Look at the SUSE "pint" tool, which works with AMI metadata to help find the right image to run etc. - https://github.com/SUSE/Enceladus/tree/master/susePublicCloudInfoClient
OpenStack providers? 18 providers currently. It would be bad to provide links only to the 3 biggest.
Every image has a UUID. Potentially many UUIDs to publish.
Also private OpenStacks. They should download the latest from casulana, but then we need to provide links to those (appropriate) images.
Outreachy/GSoC project? Come back to this later - let's get the images and metadata first!

GOVERNANCE
==========
Next level of relationship between Debian/SPI and the cloud providers.
Delegate to the team instead of going through the leader. Delegates vs. assistants.
More than 1 delegate (preferably 3).
An individual can create a (long-lived) instance and then lose their Debian account. Need a way for DSA (?) to be able to terminate those resources.
Need a specific per-cloud solution for this, e.g. management of private ssh keys.
No long-lived instances (?), or long-lived instances managed by DSA and not by DDs?

Software development lifecycle
==============================

Testing
-------
We have a runner that will test AWS and GCE images.
* Starts an instance, runs a script inside it.
Roadmap:
* Static tests outside the image
* Test basic Debian stuff in the image on casulana, using kvm to run an instance
* Push to the specific provider to test the provider-specific bits
Needs to be integrated into our workflow.
Test framework decoupled from the tests.
Run tests on all combinations of hardware and configuration (e.g. instance types, number of disks, etc.).
Do we need a package? Good idea for people who build their own images, so they can test those images.
Pipelines: run all tests when building a new image. But do we rerun when we push new tests? A pipeline is per project.
Google released their own tests as FLOSS. We could use this, or at least treat it as a template: https://github.com/GoogleCloudPlatform/compute-image-tools
GCE integration tests
* Daisy is a GCE workflow tool - it's an API client, written in Go (https://github.com/GoogleCloudPlatform/compute-image-tools)
* The test framework uses the GKE (Kubernetes) open source system (Prow) and associated modules (https://github.com/kubernetes/test-infra/)
* Examples of tests: test on 2 instances, one sshing to the other, etc.
How complex should it be?

gitlab/salsa
------------
Layout of branches.
Need for review of merge requests
* then we can remove direct access to master or similar branches and require merge requests
Which branch triggers a pipeline run?
Stable images should be buildable from the stable branch.
Conflicts of FAI versions (configuration changes).
The same policy for cloud images as for Debian: stable stays stable.
Forking/branching FAI and the config?
* but how do we merge them then?
* backporting changes to stable
* or cherry-picking security fixes (or similar)
Unstable can be used for experimentation.
OpenStack only builds testing and stable currently (i.e. no sid images).
Current GitLab runners use the same VMs for all builds. Do we need to split them? (A minimal pipeline sketch follows below.)
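A minimal sketch of what a per-branch Salsa pipeline job could look like; the runner tag, class names and job name are illustrative and not the actual debian-cloud-images pipeline:

    # .gitlab-ci.yml (sketch only)
    stages:
      - build

    build-buster-gce-amd64:
      stage: build
      tags:
        - cloud-image-builder          # hypothetical runner tag (GCE or casulana runner)
      script:
        - fai-diskimage -v -u debian -S 8G -cDEBIAN,BUSTER,AMD64,GCE buster-gce-amd64.raw
      artifacts:
        paths:
          - buster-gce-amd64.raw
        expire_in: 14 days
      only:
        - buster                       # named release branches, not "stable"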
Do we need a policy for managing those repositories and changes? Especially for fai and debian-cloud-images (formerly fai-cloud-images).
A bot publishing info about repository changes, including merge requests.
Branch names like stretch, buster (not "stable", as that will change meaning).
Similar to the gbp process (but not the same). No direct commits to the named branches.
Need a cleanup of branches (removal of old ones, clean naming).
We should not have things running in the pipeline without somebody doing code review / reviewing merge requests.
Separate repositories for building images and for publishing them
* the former is (should be) treated as more stable and not changing (especially for stable)
* the latter might need changes, e.g. cloud providers change their API for publishing images and we need to update the code during stable
Two workflows:
* one automatic, for daily images
* another manual, for official images, where we'll publish manually after some review (to the marketplace)
Also removal: both from the marketplace and from storage.
We use 4 TB on Azure.
We have all CD images for all releases since 3.x. No ISOs, but jigdo files for most, so we can rebuild the ISOs when needed. We point to snapshots, so no problem with FTP cleaning.

Tuesday
--------------------------------------------------------------------------------

Image contents
==============

unattended-upgrades
-------------------
Need to discuss this again. We're enabling it in the AWS and GCP images, and people seem happy.
The Debian security team doesn't like this. Need to discuss properly.
waldi thinks that d-i in buster is installing and enabling u-u by default. Need to check.

default user name
-----------------
OpenStack adds a "debian" user; AWS is using "admin".
Do we care about the difference? We're using separate images already.
AWS might have an agent in the future, so maybe having a single image for OpenStack and AWS won't be possible.

emergency service stuff
-----------------------
OpenStack adds emergency.service and rescue.service. What for? Needed?
sulogin on tty0.
We think this is obsoleted by newer systemd stuff.

cloud-initramfs-growroot
------------------------
Used to grow the rootfs - maybe not needed any more.
A growpart script is wanted for GCE and anybody else not using cloud-init; it is now in "cloud-guest-utils".
Partition and filesystem growing is a common problem. waldi to look into this.
Some GPT setups have partition 1 at the end. Previously partition 1 had to be at the beginning to boot from; now that has changed. But there are (maybe?) still some scripts which only grow partition 1 (hardcoded in the source).
cloud-initramfs-growroot might be removed if we check that it's not used anymore.
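For reference, what the growpart path (without cloud-init) boils down to, as a sketch; the device and partition number are illustrative, and as noted above a GPT+UEFI layout may put the root partition at a different index:

    # sketch: grow the root partition and its filesystem at first boot
    # (growpart comes from the cloud-guest-utils package)
    growpart /dev/sda 1        # extend partition 1 into the free space at the end of the disk
    resize2fs /dev/sda1        # grow the ext4 filesystem to fill the enlarged partition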
network bringup
---------------
Different providers have different network startup scripts, mostly very similar.
Use systemd-networkd? IPv6 poses some problems, and there are hacks to fix it, but it adds many seconds to boot time.
Problems between ifup and systemd. systemd might fix it now (for buster), but it'll require switching to systemd-networkd.
It does not fit well for WiFi, and also for cloud - but no hard data, mostly anecdotes from people with problems.
Now desktops depend on network-manager.
We still install ifup/ifdown, but it's disabled.
GCE/Azure: the configs differ only by a blank line.
AWS: configures 8 interfaces; an IPv6 helper script determines whether v6 is available or not.
ifup does not work for dynamic attachment, but dynamic attachment is possible on clouds.
We'll leave the differences for now; we might return to this once there is one good solution.
There will be differences anyway, e.g. configuration of 25/40 Gbps networking for AWS, etc.
All should include /etc/network/interfaces.d/*.
Also all VPCs and point-to-point configuration, so /e/n/i.d/* should be done.
SUSE has cloud-netconfig. It's a framework for configuring this. Should we look into it, or leave it for the future?
Need stable network interface names.
It looks like there are many solutions, each with its own problems and deficiencies.
Noah proposed testing udev rules.

serial-getty@.service
---------------------
zigo uses it to have a serial console in OpenStack.
Now systemd should start a getty on the console automatically, so it might not be needed anymore, but we still need to check. If it does not work, it's a bug which needs to be reported and fixed.
ttyS0 is not always the first console; arm64 has ttyAMA0.

Buster
======
Any changes we need or that will affect us?
systemd-networkd? No.
grub-cloud - all the x86 images should use hybrid images by default now.
We don't need one metapackage (to pull in dependencies), as we control what we install anyway.
Hybrid image? All clouds use (should use) EFI.
u-boot should not be used; usage of GRUB with it is problematic and depends on u-boot details.
Add a grub-cloud-arm64 package with a simple config - or maybe just install grub-efi-arm64?
Cloud kernel image
* only amd64 for now; want an arm64 equivalent too
* needs some work to make it more generic before adding new architectures
* add ppc64el? Do that when somebody cares.
* The cloud kernel has problems on AWS: it does not have drivers for ENA (more performant networking). It should work on smaller instances though, but people might want to use better machine types ;-)
* The cloud kernel might become the default for clouds; might be problematic for OpenStack, but we should at least test it on AWS/Azure/GCE.
* May need to add some cloud-specific drivers?
* OpenStack images are already using this kernel for buster, no complaints so far.
* An OpenStack bare-metal provider would need the generic kernel. Build both when we have automation?
* We might need to add back a small number of hardware drivers, e.g. NVMe for AWS instances (like m5d). Back-porting such drivers to stable kernels?
Python 3
* Remove everything for buster which is not Python 3.
* There is already a lintian check suggesting Python-3-only.
* Python 2 will stay in buster, and will be removed in bullseye.
Buster freeze: Q1 2019
* Should we try to release cloud images on release day? Azure has a long rollout time (up to 1 week). But we should announce that we're releasing.
* Pre-release images, beta during the freeze. We have d-i alpha/beta releases, so we could (should) do the same for cloud images. We already have weekly images, so we could (re-)use them, but we might want to rename them or give them more publicity.
* The release team has release criteria; do we want something similar for clouds? Removal of RC-buggy packages: there are key packages which are not removed; we should add cloud-critical packages to that list. Take the package list from the FAI config space.

Non-x86 ports
=============
arm64 OpenStack.
There is ppc64el at the Brazilian university's OpenStack.
Only OpenStack has non-x86; AWS/Azure/GCP are only doing amd64?
Cross-building: FAI checks if the target is the same as the host, and uses qemu-user-static if they differ. Emulation is slow, but works for now.
casulana uses kvm for x86, or qemu-user-static for other architectures (about 1/20 of the speed).
Summary: we support other architectures, and we'll solve problems as they arrive. The biggest problem might be GRUB/booting.
We won't be adding new architectures unless there are real providers using them.
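For context, a rough sketch of the qemu-user-static emulation that FAI falls back to when host and target architectures differ; paths are illustrative, and on newer systems the binfmt registration can make the copy step unnecessary:

    # build an arm64 tree on an amd64 host via user-mode emulation (sketch)
    apt-get install qemu-user-static binfmt-support debootstrap
    debootstrap --arch=arm64 --foreign buster /srv/arm64-root http://deb.debian.org/debian
    cp /usr/bin/qemu-aarch64-static /srv/arm64-root/usr/bin/   # lets binfmt run arm64 binaries in the chroot
    chroot /srv/arm64-root /debootstrap/debootstrap --second-stage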
Secure boot
===========
All providers want it, as customers start expecting (demanding) it.
It'll be opt-in for customers.
Debian wants to have Secure Boot for buster (release target). Much work was done during DebConf 2018.
The next alpha installer should have it; the software is ready.
Additional kernel modules: GPU drivers require them; others - we don't know.
Still a problem with bootstrapping and adding new keys.
Kernel verification, and then verification of the later stages.
Debian should cope with it; the software should be ready, might require scaling up.
Platforms should have the MS key and that's all (just like desktops).
qemu Secure Boot? It's ready; we just need to add command line arguments with the key details (OVMF).
Non-x86 differs: there is no root CA like MS is for x86. HPE shipped hardware with the ability to add new keys (non-enforcing mode?).

Debian LTS
==========
If there are updated packages, will we build new images?
First: do we want to provide LTS images at all? Cloud instances are usually short-lived. OTOH, kops still uses Jessie (with a custom kernel).
Problem with the environment/community: they use old, outdated images (e.g. Jessie) as a basis.
Also: Debian volunteers do not provide LTS; paid people provide that support.
We should state that we're not providing support exceeding oldstable support.
LTS vs. backports security as best effort.
Users want more than 3 years. It requires effort to keep images supported.
Keep images around for some time, but reduce their discoverability.
We won't provide images for LTS, but we won't prevent users from pointing to the LTS repository if they want to. Those won't be official Debian images, though.

Images including packages from backports
========================================
Mostly the kernel, but in the past this has also included things like ssh.
Most other packages are doable via cloud-init (or similar); the kernel is special as it requires a restart (reboot).
Azure currently uses a backported kernel (kernel from backports). They (Azure or users? Users) want it and use it.
Proposal to do the same for all providers.
Backports (unlike LTS) is an official part of Debian: hosted on our infrastructure, maintained by DDs or DMs, etc.
ssh was used from backports for GCE for performance/cipher reasons.
We should label it, describe/document it, and be OK with it.
The possibility is included in FAI.

Multiple variants of images
===========================
A variation on the previous topic, but also non-free, not-always-needed vendor stuff.
Debian base (minimal) image.
Amazon Linux provides a base image, but it's not very popular (or is it?). What's removed? It has cloud-init, but people want to remove it. There is no official list of removed stuff, or of the difference to the standard image.
We have an FAI class with the differences.
We might want to have cloud-init and some basic agents, but don't include SDKs or cloud CLIs, etc. Be a good citizen (VM).
Putting the SDK (the GCE one) into Debian: the Google people working on it don't care about Debian (are not DDs). GCE puts out a new version every week, and that won't change. Put it into unstable and prevent it from migrating? But then we won't have backports (those have to be releasable).
We should have one config space to build all variants; otherwise there will be divergence.
OTOH we'd be diluting "Debian" by providing images with a non-Debian repository. Also, those repositories might end up containing non-free software.
Canonical was republishing SDKs in their own repository, but it was 6-8 weeks out of date. Users were complaining about missing support for new features, regions, etc.
Use cloud-init to add other repositories? It'll increase boot time.
There is no one clean solution. We (Debian) are not doing a good job maintaining fast-moving software (docker, kubernetes, the GCE SDK, etc.). It's not a one-time effort but a continuous one.
We might include FAI classes for vendor repositories, without building those variants ourselves.
Do we want to provide many variants of images, or just make it easy for people to build their own images? We've chosen FAI to make it easy to build your own variants. We might build some important variants, but not too many.
Having tests will also make sure that users can be assured that their changes did not break anything.
We might revisit this in the future.
The cloud providers should write the FAI configs and include them in the cloud repo. Any ideas for class names? (See the sketch below.)
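As an illustration of what such a provider-contributed class could look like in the FAI config space; the class name, repository line and keyring path are examples, not agreed names:

    # package_config/GCESDK  (hypothetical class name)
    PACKAGES install
    google-cloud-sdk

    # scripts/GCESDK/10-vendor-repo  (simplified; a real config would use the fcopy/ainsl helpers)
    #!/bin/sh
    echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" \
        > "$target/etc/apt/sources.list.d/google-cloud-sdk.list"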
SSL/TLS 1.2?
The recent OpenSSL release disables some ciphers and TLS below 1.2. It might break some software deployed in the cloud.
We shall not prevent users from e.g. enabling old, insecure ciphers, but at the same time we should ship a sensible default configuration.

cloud-init
----------
1 RC bug - the current version in unstable FTBFS; a few other bugs to fix.
Upstream work - development is ongoing.
Is it good for us? Forking / replacing has been attempted by various folks, but never takes off.
A sprint happened recently; notes: https://docs.google.com/document/d/1-gctZNXA9oshxyDsMqoWAw66ll5MBBpvRkdA6wzMYDk/edit
The CLA is no longer a problem, it seems; maybe slow to get patches accepted upstream.

cloud.debian.org
----------------
Put a small page there with links to the built images.
Right now we must use the Salsa artifacts storage, which is not really intuitive; it's also temporary space.
cloud.d.org (an alias of pettersson) can also be used for hosting.

WRAP UP
============================
We have FAI config files for all the providers.
We want Debian accounts, under the SPI umbrella.
We will probably have a delegation from the DPL.
We have Salsa pipelines to build images.
We don't have an image finder.

Timeline: buster. By the freeze (January 2019) we should have:
* accounts
* mirror names
* beta images built with those

Bastian
* mirror
* EFI - Secure Boot
* cloud kernel config cleanup

Luca + Jimmy
* as SPI: Debian accounts and legal agreements with the providers

Luca
* delegation
* authentication, authorisation (not required for buster)
* driving the cloud mirrors conversation through DSA
* send email about reimbursement

Jimmy
* help Luca with auth

Ross
* integrating tests with the build pipeline
* re-org of the test framework

Tomasz
* delegate
* summarize the notes of the sprint
* integration tests
* planning to go through the mirror

Helen
* work on tests - framework, integration etc.
* NM work :-)
* Secure Boot (+ Steve + Lucas)

Martin
* SPI and DSA work, not much free time otherwise
* Debian cloud announcement mailing list

Steve M
* delegation
* images vs. cloud team on casulana
* cloud kernel for arm64
* help zigo run arm64 OpenStack
* arm64 host (not sure here)
* Secure Boot
* organizing monthly meetings

Thomas L
* likely to be occupied (surgery)
* review pull requests, etc.

Thomas G
* OpenStack ppc64el build and test

Noah
* continue maintaining the stretch images for AWS
* developing buster AWS images
* automatic building/registering from casulana
* coordinate creating the new AWS account (including publishing on GovCloud)

Lucas
* Secure Boot
* image finder :-D

David Duncan
* new account structure
* help with mirrors
* help with the rollout of tests on AWS, and try to open source the existing tests
* set up monthly sync with SPI

Steve Z
* accounts for publishing
* user account federation (for DDs etc.)
* EFI help for gen v2 (new HW) - make sure it works on buster
* whitelist waldi to access a gen-2 capable region
* access for the team to test infrastructure (no defined timeline)