Cambridge, MA
Self Introductions of attendees
===============================

Noah Meyerhans - AWS, EC2
David Duncan - AWS
Arthur Diniz - GSoC student, image finder author and maintainer
Zach Marano - Google
Liam Hopkins - Google
Ross Vandegrift - ClearDATA, cloud user
Bastian Blank - credativ, Azure images
Thomas Goirand - OpenStack package maintainer since 2011
Tomasz Rybak - cloud team delegate
Thomas Lange - aka MrFAI
Lucas Kanashiro - DD, GSoC mentor, moved to Canonical
Thomas Stringer
Matt Bearup
Jon Proulx - MIT CSAIL, OpenStack architect/operator

Agenda
======

* Official account status and user management
* Building images (go through all steps from a debian-cloud-images repo commit to publishing it)
  - New team at Salsa - Salsa workflow and CI setup
  - Live testing of images on cloud providers' infrastructure
  - Better directory layout on cdimage.debian.org/images/cloud/
  - Secure Boot class for cloud images. Relationship to CD-builder?
* Image finder: deployment, code and database review
  - Publishing to OpenStack providers
* Package mirrors on cloud provider CDNs
  - "Professionalize" it - CDNs, not only academic institutions
  - in AWS: CloudFront, etc.; currently on an old account

Day 2, 2019-10-15
=================

Overview of CI setup
* 4 different projects
* Main project: the pipeline
  - Source tests: flake8, pytest
  - Then builds images - architectures x providers
  - Only builds unstable by default; 11 images from the matrix above
  - Takes 15-20 minutes
  - .gitlab-ci.yml defines the stages
  - Artifacts expire in 7 days
  - Pipeline can also be run on a branch
  - Separate group for sensitive stuff

Daily images: Debian Cloud Images - Daily
* Almost all errors are ignored here; it tries to produce something even if other artifacts fail
* Runner runs on casulana, gitlab-runner using kvm
* Limit of 8 jobs in parallel; casulana has 384G of RAM, but we use at most 64G
* Need to synchronize with CD image building so we don't use too much RAM and IO at the same time
* Uploads to pettersson using sftp
* Every job produces output
* Builds Stretch through Sid; uploads all to pettersson, all to Azure, and Buster and later to AWS, as older releases use different tooling

Debian Cloud Images - Housekeeping
* For Azure we need to manually "click" "Go Live" as part of publishing; this is the script that does it
* It also cleans up old (daily) images
* Does not clean up on AWS - not implemented yet

Debian Cloud Images - Release
* For publishing "release" images; uses images built by "Daily"
* 2 variables: DIST (only Buster for now) and VERSION (from Daily) - a trigger sketch follows below
* Runs on the master branch
* Pipeline: upload to Azure and AWS, then publish
* qemu is used for arm64, ppc64
* Separate project, as we use this for publishing images and we don't want to allow people to mess with it
* Also need to manage the secrets used to access cloud providers: we need to protect those, as they are used to publish our _official_ images

Daily housekeeping is run from cron, 6h after the daily build; it automatically publishes images
Release - manual click on "Go Live", after receiving the email that the image was published (only for Azure)
(email is sent to Bastian now, maybe it should go to debian-cloud-images@spi?)
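A minimal sketch, assuming the Release pipeline would be triggered through GitLab's pipeline-trigger API on Salsa; the project path, trigger token, and VERSION value are hypothetical placeholders, and in practice the run may simply be started from the Salsa web UI:

    # Sketch: start a Release pipeline run with DIST and VERSION set.
    # Project path, token and VERSION value are hypothetical placeholders.
    import requests

    SALSA_API = "https://salsa.debian.org/api/v4"
    PROJECT = "cloud-team%2Fdebian-cloud-images-release"  # hypothetical path

    resp = requests.post(
        f"{SALSA_API}/projects/{PROJECT}/trigger/pipeline",
        data={
            "token": "<pipeline trigger token>",
            "ref": "master",
            "variables[DIST]": "buster",
            "variables[VERSION]": "<VERSION from the Daily build>",
        },
    )
    resp.raise_for_status()
    print("pipeline:", resp.json()["web_url"])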
When we have live tests, we'll make the pipeline longer:
* upload, mark as private
* run tests
* make public
* publish on marketplace
(for AWS; similar for other providers)

Plan to have a gitlab page with a summary of images which were uploaded (after the pipeline finishes)
Ability to skip the pipeline after a commit

Tests: currently some tests are failing for Stretch
  We should either fix them, or drop Stretch
Pipelines for devel branches run on Salsa, not on casulana (as we don't know what's there)

When we publish an image (on AWS), we should make the snapshot publicly readable.
It'll allow other users to copy images to their own accounts.

Azure - gen 2 images available
* Gen 2 - UEFI, modern basis. Required for Secure Boot
* Daily already supports it
* Problem with (virtual) resource disks - waiting for upstream (the Azure agent) to apply patches

* Agents, and additional features for clouds
  - cloud specific vs. stable release
  - qemu-guest-agent?
* Release cadence vs. vendors' speed

Changelog of waagent: getting into stable takes months - too long for cloud providers' needs
Should we take the package version from unstable? pinning? a PPA? (bikeshed)
backports - still takes some time
fai-config space: we could put packages there (but only for testing, not for release)
Cloud Blend (not a real Blend, but close)
Stable with backports
Backports (with pinning)
  - does not work for oldstable
Up-to-date SDK or up-to-date kernel?
SDK and agents: the SDK to access the cloud, agents for the image to talk to the cloud

proposed-updates: we can access it, test it, and then put packages into updates
*-updates:
  proposed-updates -(ACK by Release Team)-> updates
stable-updates (former volatile): https://wiki.debian.org/StableUpdates
  stable package -> proposed-updates
  on a point release, packages from proposed-updates get copied to updates
  stable-NEW
  see criteria for stable-updates: https://www.debian.org/News/2011/20110215
  7-10 days to go through this path
We need to talk with the Release Team about getting those packages into stable-updates
We'll need to do correct packaging: no vendoring, proper dependencies (and build-deps)
Means more work, and possibly the need to upload not only leaf packages

Azure - in quite an OK state
GCE - old agent, repo at Salsa (https://tracker.debian.org/pkg/google-compute-image-packages)
  new agent is in Go, we'll try to upload it
  serpent will be the initial sponsor and reviewer; Arthur is a member of the Go team and could also help
AWS - needs cloud-init; other agents are not so critical: nice to have, but lower priority
We might use Azure and GCE as test grounds for uploading to stable{-updates}

Photo
https://www.dropbox.com/s/nlqlg8s4v9ud39a/cloud_sprint_group.jpg?dl=0

* Debian cloud kernel bugs and additions
New features, new HW - patches
RTC, new drivers in the cloud kernel
HW support - no problems from the release team; file a bug with severity "important"
Example hardware support update:
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=941291
  https://salsa.debian.org/kernel-team/linux/merge_requests/172
Kernel team - do we need to help them? They need manpower
Kernel - one source tree; different kernels are just different configurations
Drivers, to go into a new version, should already be upstream

Team - delegates, communication

Checksums
* Nice to have; used (as human data) by OpenStack
* Handled automatically when uploading to S3/Glacier
* We should calculate them - but from which file and at which stage?
* We should compute at many stages - raw image, tar (compressed), something else? (e.g. qcow)
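A minimal sketch of computing a checksum at each artifact stage; the artifact names and the choice of SHA-512 are assumptions for illustration, not something decided at the sprint:

    # Sketch: checksum each artifact stage (raw image, compressed tar, qcow2).
    # Artifact names and the SHA-512 choice are illustrative assumptions.
    import hashlib
    from pathlib import Path

    def file_digest(path, algo="sha512", chunk_size=1024 * 1024):
        """Stream the file so large images don't have to fit in memory."""
        h = hashlib.new(algo)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    for artifact in ("disk.raw", "image.tar.xz", "image.qcow2"):
        if Path(artifact).exists():
            print(f"{file_digest(artifact)}  {artifact}")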
Merging
* Thumbs up? A comment is better. If sure - merge it.
* If not 100% sure, then add a comment "Looks OK, but..." and somebody else should look at it.

Day 3, 2019-10-16
=================

Live testing

Vagrant
* Popular, but is slowly being replaced by Docker
* Can be run across different infrastructure - i.e. a universal image, able to run the same image everywhere
* Emmanuel works on it; no one present uses it
* Maybe it would be good to have an official Debian account; for now it looks like Emmanuel uses his own account to upload

Docker
* Maintainers of the current images
* Managing accounts: through delegates; we'll need to contact the current maintainers of the Docker Debian image
* We (cloud team and SPI) manage the account
* Marketplace permissions are not granular enough to allow for just publishing a Docker image and nothing else
  - that's why we'll need to create a separate (restricted) account for them
* Debian Docker image (curated) - as the basis of other images
  - a trusted image
  - it's also about discoverability of the Debian image
  - container marketplace
* Current maintainers use their own custom tool
  - they have their own repo on GitHub, a shell script on top of debootstrap

Date of next IRC meeting
* November 20th, 19:00 UTC

OpenStack operators' meeting
* London, January 2020
* 4x a year:
  - 2x at the OpenStack summit, large events (Fall, Spring)
  - 2x smaller, just operators (Winter, Summer)
* User survey - who is running which version of OpenStack
  - Nova network - problematic for upgrades
  - either live-migrate, or create a new cluster and move workloads there
* Discuss/show our image testing
* Get their results?
  - in a controlled fashion, not trusting their results blindly
  - especially if we don't fully know/have access to their environment
  - or use it more as a social, not technical, measure of our success?

Adding class EXTRA to images, to have a bit more useful tools
* curl, networking stuff, etc.
* Details - TBD (see the sketch below)
* locale-all? big
* We are close to the Salsa artifact size limit - 40MB below it
  - if we add EXTRA, we'll be above it
  - but in any case we'll hit this limit organically soon
  - we are not the only team that needs bigger artifacts
* Many flavours of image
  - smallest, then useful
  - Amazon Linux: standard and minimal
  - Docker: default and slim
  - Is any of those more popular? OTOH if both are used, it means that both are needed
  - Python and others are using docker slim to save space
  - but not Alpine, so glibc works correctly :-)
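A minimal sketch, assuming the EXTRA class would be fed by a package_config file in the fai config space; the file name and package list are purely illustrative placeholders, since the actual contents are still TBD:

    # package_config/EXTRA -- hypothetical; actual package list is TBD
    PACKAGES install
    curl
    dnsutils
    iproute2
    traceroute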