(Big) Debian data packages

Current problems
- archive can't scale to host large datasets
- duplication of data between VCS/.orig/.deb
- tar/ar have inherent limitations on file size
  https://wiki.debian.org/Teams/Dpkg/DebSupport
- upgrades (regardless of change size) are again large tarballs

Interesting approaches which were taken upstream
- ITK, ANTs use a key-value store to fetch test data identified by checksums

Google spreadsheet comparing existing/prospective packaging approaches
https://docs.google.com/spreadsheets/d/13o---vHAuyNeLElSp_EUhzefW0dayO8FHGRn4XmTeFs/edit?usp=sharing

Interesting in Debian
- debdelta support in deb

Official data archive:
- is almost there
- no automated migration planned yet, but some packages are begging for it
- would primarily be for "support" data packages (e.g. data for scientific applications, or games)
- "value for Debian" would be the criterion for inclusion
- no objections yet against contrib/non-free
- current plan is to bring up 2TB of storage

Related topics
- data(sets) copyright/licensing

Interesting ideas related to "thin git-annex-based packages"
- delay actual data fetch until apt hook time
- debtorrent approach of online Debian package generation
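
Sketch of the "key-value store, data identified by checksums" idea noted under the upstream approaches. This is only an illustration, not how ITK or ANTs actually implement it: the store here is a local directory, SHA-256 is an assumed choice of checksum, and the names `store_put` and `fetch_verified` are hypothetical. The point is that the package only needs to carry the key; the data is fetched later and verified against that same key.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_put(store: Path, source: Path) -> str:
    """Copy a file into the store under its checksum; return the key.

    Hypothetical helper: a real store would be a remote service.
    """
    key = sha256_of(source)
    store.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, store / key)
    return key


def fetch_verified(store: Path, key: str, dest: Path) -> Path:
    """Fetch the object stored under `key` and verify it against the key."""
    blob = store / key
    if not blob.is_file():
        raise KeyError(f"no object stored under {key}")
    shutil.copyfile(blob, dest)
    actual = sha256_of(dest)
    if actual != key:
        # Do not leave unverified data behind.
        dest.unlink()
        raise ValueError(f"checksum mismatch: expected {key}, got {actual}")
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        sample = root / "sample.dat"
        sample.write_bytes(b"example dataset contents")
        key = store_put(root / "store", sample)
        fetched = fetch_verified(root / "store", key, root / "fetched.dat")
        print(key, fetched.stat().st_size)
```

Because the key is the checksum, identical data is stored once regardless of how many packages reference it, which also addresses the VCS/.orig/.deb duplication noted under current problems.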