Cambridge, MA
Self Introductions of attendees
===============================

Noah Meyerhans - AWS, EC2
David Duncan - AWS
Arthur Diniz - GSoC student, image finder author and maintainer
Zach Marano - Google
Liam Hopkins - Google
Ross Vandegrift - ClearDATA, cloud user
Bastian Blank - credativ, Azure images
Thomas Goirand - OpenStack package maintainer since 2011
Tomasz Rybak - cloud team delegate
Thomas Lange - aka MrFAI
Lucas Kanashiro - DD, GSoC mentor, moved to Canonical
Thomas Stringer
Matt Bearup
Jon Proulx - MIT CSAIL, OpenStack architect/operator

Agenda
======

* Official account status and user management
* Building images (go through all steps from a debian-cloud-images repo commit to publishing it)
  - New team at Salsa - Salsa workflow and CI setup
  - Live testing of images on cloud providers' infrastructure
  - Better directory layout on cdimage.debian.org/images/cloud/
  - Secure Boot class for cloud images. Relationship to CD-builder?
* Image finder: deployment, code and database review
  - Publishing to OpenStack providers
* Package mirrors on cloud provider CDNs
  - "Professionalize" it - CDNs, not only academic institutions
  - in AWS: CloudFront, etc.; currently on an old account

Day 2, 2019-10-15
=================

Overview of CI setup
* 4 different projects
* Main project: the pipeline
  - Source tests: flake8, pytest
  - Then builds images - architectures x providers
  - Only builds unstable by default; 11 images from the matrix above
  - Takes 15-20 minutes
  - .gitlab-ci.yml defines the stages
  - Artifacts expire in 7 days
  - Pipeline can also be run on a branch
  - Separate group for sensitive stuff

Daily images: Debian Cloud Images - Daily
* Almost all errors are ignored here; it tries to produce something even if other artifacts fail
* Runner runs on casulana, gitlab-runner using kvm
* Limit of 8 jobs in parallel; casulana has 384G of RAM, but we use at most 64G
* Need to synchronize with CD image building so we don't use too much RAM and IO at the same time
* Uploads to pettersson using sftp
* Every job produces output
* Builds Stretch through Sid; uploads all to pettersson, all to Azure, and Buster and later to AWS, as older releases use different tooling

Debian Cloud Images - Housekeeping
* For Azure we need to manually "click" "Go Live" as part of publishing; this is the script that does it
* It also cleans up old (daily) images
* Does not clean up on AWS - not implemented yet

Debian Cloud Images - Release
* For publishing "release" images; uses images built by "Daily"
* 2 variables: DIST (only Buster for now) and VERSION (from Daily) - a trigger sketch follows below
* Runs on the master branch
* Pipeline: upload to Azure and AWS, then publish
* qemu is used for arm64, ppc64
* Separate project, as we use this for publishing images and we don't want to allow people to mess with it
* Also need to manage the secrets used to access cloud providers: we need to protect those, as they are used to publish our _official_ images

Daily housekeeping is run from cron, 6h after the daily build; it automatically publishes images
Release - manual click on "Go Live", after receiving the email that the image was published (only for Azure)
(email is sent to Bastian now, maybe it should go to debian-cloud-images@spi?)
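A minimal sketch, assuming the Release pipeline would be triggered through GitLab's pipeline-trigger API on Salsa; the project path, trigger token, and VERSION value are hypothetical placeholders, and in practice the run may simply be started from the Salsa web UI:

    # Sketch: start a Release pipeline run with DIST and VERSION set.
    # Project path, token and VERSION value are hypothetical placeholders.
    import requests

    SALSA_API = "https://salsa.debian.org/api/v4"
    PROJECT = "cloud-team%2Fdebian-cloud-images-release"  # hypothetical path

    resp = requests.post(
        f"{SALSA_API}/projects/{PROJECT}/trigger/pipeline",
        data={
            "token": "<pipeline trigger token>",
            "ref": "master",
            "variables[DIST]": "buster",
            "variables[VERSION]": "<VERSION from the Daily build>",
        },
    )
    resp.raise_for_status()
    print("pipeline:", resp.json()["web_url"])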
When we have live tests, we'll make the pipeline longer:
* upload, mark as private
* run tests
* make public
* publish on marketplace
(for AWS; similar for other providers)

Plan to have a gitlab page with a summary of images which were uploaded (after the pipeline finishes)
Ability to skip the pipeline after a commit

Tests: currently some tests are failing for Stretch
  We should either fix them, or drop Stretch
Pipelines for devel branches run on Salsa, not on casulana (as we don't know what's there)

When we publish an image (on AWS), we should make the snapshot publicly readable.
It'll allow other users to copy images to their own accounts.

Azure - gen 2 images available
* Gen 2 - UEFI, modern basis. Required for Secure Boot
* Daily already supports it
* Problem with (virtual) resource disks - waiting for upstream (the Azure agent) to apply patches

* Agents, and additional features for clouds
  - cloud specific vs. stable release
  - qemu-guest-agent?
* Release cadence vs. vendors' speed

Changelog of waagent: getting into stable takes months - too long for cloud providers' needs
Should we take the package version from unstable? pinning? a PPA? (bikeshed)
backports - still takes some time
fai-config space: we could put packages there (but only for testing, not for release)
Cloud Blend (not a real Blend, but close)
Stable with backports
Backports (with pinning)
  - does not work for oldstable
Up-to-date SDK or up-to-date kernel?
SDK and agents: the SDK to access the cloud, agents for the image to talk to the cloud

proposed-updates: we can access it, test it, and then put packages into updates
*-updates:
  proposed-updates -(ACK by Release Team)-> updates
stable-updates (former volatile): https://wiki.debian.org/StableUpdates
  stable package -> proposed-updates
  on a point release, packages from proposed-updates get copied to updates
  stable-NEW
  see criteria for stable-updates: https://www.debian.org/News/2011/20110215
  7-10 days to go through this path
We need to talk with the Release Team about getting those packages into stable-updates
We'll need to do correct packaging: no vendoring, proper dependencies (and build-deps)
Means more work, and possibly the need to upload not only leaf packages

Azure - in quite an OK state
GCE - old agent, repo at Salsa (https://tracker.debian.org/pkg/google-compute-image-packages)
  new agent is in Go, we'll try to upload it
  serpent will be the initial sponsor and reviewer; Arthur is a member of the Go team and could also help
AWS - needs cloud-init; other agents are not so critical: nice to have, but lower priority
We might use Azure and GCE as test grounds for uploading to stable{-updates}

Photo
https://www.dropbox.com/s/nlqlg8s4v9ud39a/cloud_sprint_group.jpg?dl=0

* Debian cloud kernel bugs and additions
New features, new HW - patches
RTC, new drivers in the cloud kernel
HW support - no problems from the release team; file a bug with severity "important"
Example hardware support update:
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=941291
  https://salsa.debian.org/kernel-team/linux/merge_requests/172
Kernel team - do we need to help them? They need manpower
Kernel - one source tree; different kernels are just different configurations
Drivers, to go into a new version, should already be upstream

Team - delegates, communication

Checksums
* Nice to have; used (as human data) by OpenStack
* Handled automatically when uploading to S3/Glacier
* We should calculate them - but from which file and at which stage?
* We should compute at many stages - raw image, tar (compressed), something else? (e.g. qcow)
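A minimal sketch of computing a checksum at each artifact stage; the artifact names and the choice of SHA-512 are assumptions for illustration, not something decided at the sprint:

    # Sketch: checksum each artifact stage (raw image, compressed tar, qcow2).
    # Artifact names and the SHA-512 choice are illustrative assumptions.
    import hashlib
    from pathlib import Path

    def file_digest(path, algo="sha512", chunk_size=1024 * 1024):
        """Stream the file so large images don't have to fit in memory."""
        h = hashlib.new(algo)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    for artifact in ("disk.raw", "image.tar.xz", "image.qcow2"):
        if Path(artifact).exists():
            print(f"{file_digest(artifact)}  {artifact}")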
Merging
* Thumbs up? A comment is better. If sure - merge it.
* If not 100% sure, then add a comment "Looks OK, but..." and somebody else should look at it.

Day 3, 2019-10-16
=================

Live testing

Vagrant
* Popular, but is slowly being replaced by Docker
* Can be run across different infrastructure - i.e. a universal image, able to run the same image everywhere
* Emmanuel works on it; no one present uses it
* Maybe it would be good to have an official Debian account; for now it looks like Emmanuel uses his own account to upload

Docker
* Maintainers of the current images
* Managing accounts: through delegates; we'll need to contact the current maintainers of the Docker Debian image
* We (cloud team and SPI) manage the account
* Marketplace permissions are not granular enough to allow for just publishing a Docker image and nothing else
  - that's why we'll need to create a separate (restricted) account for them
* Debian Docker image (curated) - as the basis of other images
  - a trusted image
  - it's also about discoverability of the Debian image
  - container marketplace
* Current maintainers use their own custom tool
  - they have their own repo on GitHub, a shell script on top of debootstrap

Date of next IRC meeting
* November 20th, 19:00 UTC

OpenStack operators' meeting
* London, January 2020
* 4x a year:
  - 2x at the OpenStack summit, large events (Fall, Spring)
  - 2x smaller, just operators (Winter, Summer)
* User survey - who is running which version of OpenStack
  - Nova network - problematic for upgrades
  - either live-migrate, or create a new cluster and move workloads there
* Discuss/show our image testing
* Get their results?
  - in a controlled fashion, not trusting their results blindly
  - especially if we don't fully know/have access to their environment
  - or use it more as a social, not technical, measure of our success?

Adding class EXTRA to images, to have a bit more useful tools
* curl, networking stuff, etc.
* Details - TBD (see the sketch below)
* locale-all? big
* We are close to the Salsa artifact size limit - 40MB below it
  - if we add EXTRA, we'll be above it
  - but in any case we'll hit this limit organically soon
  - we are not the only team that needs bigger artifacts
* Many flavours of image
  - smallest, then useful
  - Amazon Linux: standard and minimal
  - Docker: default and slim
  - Is any of those more popular? OTOH if both are used, it means that both are needed
  - Python and others are using docker slim to save space
  - but not Alpine, so glibc works correctly :-)
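A minimal sketch, assuming the EXTRA class would be fed by a package_config file in the fai config space; the file name and package list are purely illustrative placeholders, since the actual contents are still TBD:

    # package_config/EXTRA -- hypothetical; actual package list is TBD
    PACKAGES install
    curl
    dnsutils
    iproute2
    traceroute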