(Big) Debian data packages

Current problems

  - archive can't scale to host large datasets

  - duplication of data between VCS/.orig/.deb

  - tar/ar have inherent limits on file size (see https://wiki.debian.org/Teams/Dpkg/DebSupport)

  - upgrades ship the whole data again as a large tarball, regardless of how small the change is
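
To put rough numbers on the tar/ar size limits, here is a small sketch. It assumes the classic header layouts only: a 10-character decimal size field in the ar member header and an 11-digit octal size field in the ustar header. GNU and pax extensions to tar can lift the tar-side limit, so treat these as the baseline figures, not as a statement of what dpkg supports today. The function names are made up for illustration.

```python
# Largest member size representable in the fixed-width header size fields.
AR_SIZE_DIGITS = 10      # ar member header: size is 10 decimal characters
USTAR_SIZE_DIGITS = 11   # ustar header: 12-byte field, 11 octal digits + terminator


def max_ar_member_size() -> int:
    """Biggest value that fits in the decimal ar size field, in bytes."""
    return 10 ** AR_SIZE_DIGITS - 1


def max_ustar_member_size() -> int:
    """Biggest value that fits in the octal ustar size field, in bytes."""
    return 8 ** USTAR_SIZE_DIGITS - 1


def fits_in_ar_member(size_in_bytes: int) -> bool:
    """data.tar.* inside a .deb is itself a single ar member."""
    return size_in_bytes <= max_ar_member_size()


if __name__ == "__main__":
    print(max_ar_member_size())     # 9999999999 bytes, a bit under 10 GB
    print(max_ustar_member_size())  # 8589934591 bytes, i.e. 8 GiB - 1
```

Under these assumptions a single data member of a few tens of gigabytes already does not fit, which is the scale many scientific datasets start at.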


Interesting approaches which were taken upstream

  - ITK and ANTs use a key-value store to fetch test data identified by checksums
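
A minimal sketch of the checksum-keyed fetch idea, to make the approach concrete. This is not the actual ITK or ANTs mechanism: the function name, the choice of SHA-256, the cache layout, and the `download` callback are all assumptions for illustration. The point is only that the checksum is both the lookup key and the integrity check, so the data can live anywhere.

```python
import hashlib
from pathlib import Path
from typing import Callable


def fetch_by_checksum(
    sha256_hex: str,
    cache_dir: Path,
    download: Callable[[str], bytes],
) -> Path:
    """Return a local path for the object whose SHA-256 is `sha256_hex`.

    `download` is a hypothetical callback that asks some key-value store
    for the bytes stored under the given key.
    """
    target = cache_dir / sha256_hex
    if target.exists():
        # Already fetched earlier; the file name is its checksum.
        return target

    data = download(sha256_hex)
    actual = hashlib.sha256(data).hexdigest()
    if actual != sha256_hex:
        # Never store content that does not match its key.
        raise ValueError(f"checksum mismatch: wanted {sha256_hex}, got {actual}")

    cache_dir.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

With this shape, the source tree only needs to carry the list of checksums; the data itself is fetched on demand and cached.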


Google spreadsheet comparing existing/prospective packaging approaches

https://docs.google.com/spreadsheets/d/13o---vHAuyNeLElSp_EUhzefW0dayO8FHGRn4XmTeFs/edit?usp=sharing

Interesting approaches in Debian

 - debdelta support in deb 


Official data archive:
- is almost there
- no automated migration planned yet, but some packages are begging for it
- would primarily be for "support" data packages (e.g. data for scientific applications, or games)
- "value for Debian" would be the criterion to include it
- no objections yet against contrib/non-free
- current plan is to bring up 2 TB of storage

Related topics

    Data(sets) copyright/licensing


Interesting ideas related to "thin git-annex-based packages"

     delay actual data fetch until apt hook time

     debtorrent approach of generating Debian packages online