* Building images (go through all steps from a debian-cloud-images repo commit to publishing the image)
  - Buster: quite unified, consistent, built on Salsa CI
  - Salsa coordinates, Casulana is used as the runner
  - Salsa is nice: we have logs, and it's standard GitLab so people (should) know it
  - gitlab-ci.yml creates a VM, installs the FAI packages, the Makefile calls the Python build script
    (a wrapper around options, mostly for disk size and the list of classes)
  - Result: artifacts (files), uploaded to ? Upload is part of the CI process
  - Offline testing is part of the FAI process, using pytest (hooks/test.CLOUD)
    It tests what is in the image, but does not run the image
  - Upload to cloud providers, and also to pettersson.d.o
    cloud.d.o points to pettersson.d.o (the physical machine on which it's hosted)
  - Logs are only in Salsa; publicly readable
    [TODO]: we should host relevant logs (and manifests?) also on cloud.d.o
  - OpenStack: sources file - a tar of all source files, but without all build-dependencies.
    We have snapshots, and we could provide information (and documentation) on how to rebuild the exact image from those files.
  - Maybe more detailed logs would be useful (with a list of all installed packages and their exact versions).
    We have this for binary packages (the manifest) - all information about the built image.
    It means that we can rebuild the image (with the same configuration and options).
    We might need to provide the equivalent command used to build a particular image.
  - Logs: inside the image we have a file with details (job ID, etc.),
    which means we have the URL of the job that was used to build it.
    [TODO]: extract this link (of the job), add it to the manifest
    If we ever need to migrate Salsa, job output might go missing,
    so we might extract the logs in order to preserve them (more long-term solution).
  - [TODO]: Add an indication which packages are not in stable

Salsa workflow and CI setup
===========================
Noah used our tools outside of Salsa CI to build an image. Good test of building.
Need to provide new classes.
Overlay - to integrate a config with another one.
Goal: people can build images by themselves.
It's easy to add new classes to FAI, but our Python script does not allow for that (not so easily).
For now people use FAI without the Makefile/Python script.
[TODO]: add to the documentation (README.md) how to call FAI without the help of our scripts
        (a rough invocation sketch is at the end of these notes)

Live testing of images on cloud providers' infrastructure
=========================================================
We should do that.
It takes a long time to have an image registered. Metadata is static.
Simple tests vs. many variants (on different HW, instance types, regions, etc.)? The latter.
* Thomas (zigo) set up OpenStack on Casulana - not used right now, and probably not the best option (no monitoring, etc.).
* Thomas will provide an account on an amd64 cloud which we can use for testing. This is better, because that's a production cloud which is constantly maintained and monitored.
* The Salsa workflow is not very well adapted to testing different variants of the same thing. We'll need external tools running on a separate VM for that, and Salsa will be the coordinator. That VM will run tempest tests.
* We need to test on different architectures: amd64, arm64, ppc64el.
  - We may contact Linaro (through Steve McIntyre?) to test on arm64.
  - For ppc64el, we can contact that university in Brazil.
    Lucas has a contact at miniCloud (ppc64el - http://openpower.ic.unicamp.br/minicloud/);
    he and zigo will try to contact them and request an account; request access via https://forms.gle/9Yf2RkG7ES24JURR7

Results of tests
- Logs of tests: what was run, what succeeded, what failed.
  Not really useful for end users, but valuable for developers.
- Logs in a separate directory on cloud.d.o (or cdimage.d.o)?
  It's hard to find logs in Salsa.
  Daisy has User Experience built in, so it should be easier to find logs there.
  But still - it's important to have them in one place, and not need to hunt for them.
- Google and Microsoft have their own tests. For now they are internal; work on publishing at least some of the results.
  CI/CD system on Kubernetes.
  Link to the results of those tests? How to integrate them into our workflow?
- AWS: a distro based on RPM, and their tools are tightly coupled to it. Not sure how useful it would be for us.
- Many dimensions:
  - providers
  - regions
  - architectures
  - hypervisors: kvm, xen, etc. (not so relevant for AWS and the like - it's internal for them; but relevant for OpenStack)
  Focus not on all possible variants, but on the most popular ones.
- OpenStack user survey: how people are using it. Let's use it to measure popularity and decide what we support
  (https://www.openstack.org/analytics - 2019 results likely to be added in November).
- The next OpenStack operators event (sprint-like discussion, not presentations) will be in London at Bloomberg, January 7-8 2020.
  Not yet listed at https://www.openstack.org/community/events/ but should be "soon".
  Possibly an opportunity to discover how OpenStack operators currently consume images & what they like/dislike about that?
  (weekly planning on freenode IRC: weekly on Tuesdays at 1400 UTC in #openstack-operators,
  logs and minutes at http://eavesdrop.openstack.org/meetings/ops_meetup_team)
- We'd like to test on everything Debian supports (in this order of preference):
  - QEMU / KVM
  - Ironic
  - LXC
- Also: Debian is used not only in the cloud, so kernels are tested anyway - and we could assume that they (mostly) work on OpenStack.
  This might not be true for commercial clouds, as they use their own hardware, their own forked hypervisors or kernels, etc.
- Need to gather all results (and logs) in one place, especially to ease debugging and looking for regressions.

* List of regressions between the image on cdimage.debian.org and the daily image
==================================================================================
- /etc/hosts not updated - https://bugs.debian.org/942325
  cloud-init should update it. What is missing from it?
  Apparently a difference between image variants - need to check how they differ.
- backports activated twice - https://bugs.debian.org/942326
  The FAI config space had it in both sources.list and sources.list.d.
  Or maybe cloud-init messes it up.
  Should we detect this with live tests?
- not using the cloud kernel
  It is the generic image, so it should have the full kernel.
  There are 2 images, generic and generic-cloud: the former uses the normal kernel, the latter uses the cloud kernel.
  [TODO]: better description of this; add it to the Image Finder
- not published depending on security updates
  No daily image, but only updated when there is a need for that.
  A new image should be generated only when there is a security update.
  Script (by Steve) to analyze security updates and generate new images only when needed.
  Updating the image for all security updates might be too aggressive.
  Maybe only create new images when a reboot is required (kernel, libc, systemd, etc.) - see the sketch below.
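As a starting point for the "only rebuild when a reboot-relevant package changed" idea above (and the manifest-comparison TODO further down), a minimal sketch. It assumes the manifests are plain "package version" listings (dpkg-query style); the real manifest files may use a different format, and the package prefix list is purely illustrative:

#!/usr/bin/env python3
# Rough sketch (not our actual tooling): compare the package manifest of the
# last published image with the manifest of the latest build and report
# whether any "reboot-relevant" package changed.  Assumes manifests are
# plain "package<whitespace>version" listings; adjust the parsing for the
# real manifest format.

import sys

# Illustrative prefix list for packages whose update usually implies a reboot.
REBOOT_PREFIXES = ("linux-image", "libc6", "systemd", "libsystemd")


def read_manifest(path):
    """Return a dict mapping package name to version."""
    packages = {}
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) >= 2:
                packages[fields[0]] = fields[1]
    return packages


def reboot_relevant_changes(published, candidate):
    """List changed or newly added packages that match REBOOT_PREFIXES."""
    old = read_manifest(published)
    new = read_manifest(candidate)
    changed = {pkg for pkg in new if old.get(pkg) != new[pkg]}
    return sorted(pkg for pkg in changed if pkg.startswith(REBOOT_PREFIXES))


if __name__ == "__main__":
    relevant = reboot_relevant_changes(sys.argv[1], sys.argv[2])
    if relevant:
        print("publish: reboot-relevant packages changed:", ", ".join(relevant))
        sys.exit(0)
    print("skip: no reboot-relevant package changed")
    sys.exit(1)

The same comparison could also drive the "compare with the last released image on cdimage.d.o" idea in the CD-builder section further down.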
  cloud-init can install updates on launch, so there is no need to generate a new image just for that.
  It slows down boot - but not by much on AWS, which uses a local mirror.
  What is more painful - image churn, or longer boot time? Management answer: shorter boot time is always better.
  Google publishes monthly, and additionally when there is a critical security update.
  Problem with image identity management.
  How do we know when to build an image?
  When the security team publishes a new version, we need to check whether it is relevant for us and build an image if needed.
  Need to parse the manifest to know which packages we have.
  Trigger mechanism. Timing: the mirror update happens before the DSA announcement.
  Daily builds, but not daily publication.
  It does not make sense to publish daily builds of stable - for weeks they will be identical,
  and we'd be introducing noise and confusion for users (not to mention problems with reconfiguring auto-scaling groups).
  [TODO]: Compare manifests between different builds (to decide whether we want to publish)
  We already publish only when there is a new security update (only AWS and Azure currently),
  but images are still published to https://cloud.debian.org/images/cloud/buster/daily
- lots of packages not installed: locales-all, vim, screen, etc.
  Stretch had the class Extra (with more packages). We don't install them for Buster.
  Images should be small - but also useful.
  Different needs of users: automated use, or also interactive work (cattle vs. pets).
  Variants of images?
  curl! instead of wget
  AWS has larger and minimal variants of Amazon Linux; Docker and Docker Slim.
  A slim VM? People can take the slim image and extend it; not as flexible as building their own image using FAI, but with a smaller footprint.
  Not OpenStack specific.
- generic images - are they OpenStack specific? OpenStack metadata?
  cloud-init checks where it runs; it takes a long time if that fails. Shortcuts to deal with it.
  Should OpenStack use the cloud kernel?

Better directory layout on cdimage.debian.org/images/cloud/
===========================================================
One directory with a date for the latest release, then daily/ with the history.
Once we have the image finder - does this layout matter?
[TODO]: Add an HTML file with some description; header.html? We have it in the parent directory.
A soft link to the "latest" release? Who is the target audience for that?
Why do we keep all those images from the last few months?
Daily builds are not available in GitLab CI - it removes artifacts after 7 days.
How many do we keep, and when do we remove them?
Stable: no sense in publishing daily builds as they are identical. Confusing for users - which one contains the fix?
For stable: have an archive, and move older images there after publishing a new one. Do we remove them?
What was published should be kept. But who will use old images?
Sometimes people use an older software version to compare (e.g. performance).
If somebody wants to keep an old version, we should not be responsible for keeping all versions.
We should focus on providing the latest images, with current software.
Packages are still in the archive (on snapshot.d.o), so if someone has specific needs, they can build their own images.
Directory structure vs. what we keep (i.e. do not delete).
Consensus: we do not need to keep so many images.
On cdimage.d.o or in the cloud environments? If someone added those to autoscaling, things will break.
[TODO]: We should hide them after a short time (14 days) and delete them after a longer time (see the retention sketch below)
What is released should stay (with possible deletion after the entire suite is obsolete).
EC2 publishes daily images of Debian from a different account - to make them less discoverable. You need to know what to look for.
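For the retention TODO above (hide dailies after 14 days, delete them after a longer period), a minimal sketch. The directory layout, the 90-day deletion window and the "hide by moving into a dot-directory" convention are assumptions, not the actual cdimage.d.o setup:

#!/usr/bin/env python3
# Rough sketch: hide daily image directories after 14 days by moving them
# into a hidden .attic/ directory, and delete them after 90 days.
# Paths, directory naming and time windows are assumptions.

import datetime
import pathlib
import shutil

DAILY_ROOT = pathlib.Path("/srv/cdimage/cloud/buster/daily")  # assumed path
ATTIC = DAILY_ROOT / ".attic"                                 # hidden, not linked from the index
HIDE_AFTER = datetime.timedelta(days=14)
DELETE_AFTER = datetime.timedelta(days=90)


def build_date(dirname):
    """Assumes daily directories are named YYYYMMDD-<build-id>."""
    try:
        return datetime.datetime.strptime(dirname.split("-")[0], "%Y%m%d")
    except ValueError:
        return None


def prune(now=None):
    now = now or datetime.datetime.utcnow()
    ATTIC.mkdir(exist_ok=True)
    # Hide dailies older than HIDE_AFTER.
    for entry in sorted(DAILY_ROOT.iterdir()):
        if not entry.is_dir() or entry.name.startswith("."):
            continue
        date = build_date(entry.name)
        if date and now - date > HIDE_AFTER:
            shutil.move(str(entry), str(ATTIC / entry.name))
    # Delete hidden dailies older than DELETE_AFTER.
    for entry in sorted(ATTIC.iterdir()):
        date = build_date(entry.name)
        if date and now - date > DELETE_AFTER:
            shutil.rmtree(entry)


if __name__ == "__main__":
    prune()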
Consensus: daily images are OK, but they should be regularly deleted - and this should be documented
(explicitly stating that they are ephemeral and that you need to know what you're doing when using them).
OpenStack: 2 directories, one built by the old scripts, the other by FAI.
For now we provide both; there are still some regressions in the FAI-built images.
[TODO]: We should describe the current situation and that people can use both - and test them.
        And describe that since bullseye we'll only have the "generic" ones.

Relationship to CD-builder?
===========================
Rebuild (and publish) only when needed (i.e. a new version of a package).
Compare manifests to see if any of the installed packages changed.
Compare with the last released image on cdimage.d.o, not by parsing the Salsa job description.
CD images are not rebuilt after packages are updated - only after point releases.
We want to be integrated with the CD images team to have point releases etc.
CD building uses a local mirror on Casulana; we use the deb.debian.org mirror.

Secure Boot class for cloud images
==================================
It works: install grub and shim-signed into the removable location.
Add them to an existing class, or a new class?
Our own key management?
Everything should work, but when we are building images we are not installing the *-signed packages.
Potential problem: can shim-* (etc.) be installed to a non-standard location (e.g. an additional EBS volume, etc.) when we bootstrap a new image?
What about removable locations, and auto-detection?
Work will continue; the mailing list should be updated on progress.

The Octavia image: why we need it
=================================
Load Balancer as a Service in OpenStack. We need a special image to be able to run it.
haproxy, keepalived, octavia-agent.
octavia-agent in Buster is not working; an update is pending.
Amphora: the special image (VM) for Octavia.
Specific software for a specific cloud - should the Cloud Team even take care of it?
It's a common pattern for clouds: commercial providers offer load balancers, so we might want to provide one too.
Kubernetes is happy when a load balancer is available.
Upstream uses Disk Image Builder. If we want people to use Debian for amphorae, we'll need to provide the image ourselves.
It's needed for OpenStack deployment, not for users.
Smaller organizations who run their own OpenStack cluster will (might) need it, but not OpenStack users.
Setup is non-trivial - but they might be able to build their own image, given scripts to build it.
Where do we stop with providing specific images (i.e. blends)?
Bus factor? How stable is Octavia - i.e. will Octavia from Buster work with Bullseye, etc.? Upstream commitment.
We will do this (provide the Octavia image), but reevaluate before Bullseye.
We can use it as a test opportunity for image variants. Blends.
Should we (if it works) present at an OpenStack conference how to use FAI to build your own images, variants, etc.?
Not now, but in some time? There are 2 big conferences: North America and Europe (in Spring).
Should we get (as a team) more contacts with the OpenStack community for feedback?
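Related to the README TODO in the "Salsa workflow and CI setup" section (how to call FAI without our wrapper scripts) and to the idea of showing people how to build their own images and variants: a minimal sketch of what the Python wrapper boils down to, assembling a class list and a disk size and handing them to fai-diskimage. The class names, hostname, size and config-space path are illustrative only, and the option names should be double-checked against fai-diskimage(8) for the FAI version in use:

#!/usr/bin/env python3
# Rough sketch: build a disk image by calling fai-diskimage directly,
# roughly what the Makefile/Python wrapper does for us.  Class names,
# config-space path and size are illustrative, not the real configuration.

import subprocess


def build_image(output, classes, size_gb=8,
                config_space="/path/to/debian-cloud-images/config_space"):
    cmd = [
        "sudo", "fai-diskimage",
        "--verbose",
        "--hostname", "debian",
        "--class", ",".join(classes),
        "--size", f"{size_gb}G",
        "--cspace", config_space,
        output,
    ]
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Illustrative class list for a Buster "generic" amd64 image.
    build_image("image.raw", ["DEBIAN", "CLOUD", "BUSTER", "AMD64", "GENERIC"])

Calling fai-diskimage directly like this also makes it easy to append extra or local classes (the overlay / new-classes point above) without changing the wrapper.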