Hello,
I haven't seen this topic come up in a cursory search, so I'm raising it here, demonstrated on one particularly common package, findutils:
$ rpm -qi findutils | grep '^\(Version\|Release\|Size\)'
Version : 4.7.0
Release : 4.fc33
Size : 1808667

$ rpm -ql findutils | { while read f; do test -d "$f" || echo "$f"; done; } | xargs du -b | tee >(cut -d "$(printf '\t')" -f1 | paste -s -d+ - | bc)
328944 /usr/bin/find
80072 /usr/bin/xargs
25 /usr/lib/.build-id/43/34dd190460c23206f77d2e79168f0164cb5fdf
24 /usr/lib/.build-id/63/c7dd8dd86c5bae0f774c341fe9323bd3a9713c
1375 /usr/share/doc/findutils/AUTHORS
83731 /usr/share/doc/findutils/NEWS
4539 /usr/share/doc/findutils/README
1539 /usr/share/doc/findutils/THANKS
2860 /usr/share/doc/findutils/TODO
24251 /usr/share/info/find-maint.info.gz
89616 /usr/share/info/find.info-1.gz
1878 /usr/share/info/find.info-2.gz
2478 /usr/share/info/find.info.gz
35149 /usr/share/licenses/findutils/COPYING
2343 /usr/share/locale/be/LC_MESSAGES/findutils.mo
48466 /usr/share/locale/bg/LC_MESSAGES/findutils.mo
7982 /usr/share/locale/ca/LC_MESSAGES/findutils.mo
36184 /usr/share/locale/cs/LC_MESSAGES/findutils.mo
34612 /usr/share/locale/da/LC_MESSAGES/findutils.mo
36905 /usr/share/locale/de/LC_MESSAGES/findutils.mo
44457 /usr/share/locale/el/LC_MESSAGES/findutils.mo
34447 /usr/share/locale/eo/LC_MESSAGES/findutils.mo
24941 /usr/share/locale/es/LC_MESSAGES/findutils.mo
33712 /usr/share/locale/et/LC_MESSAGES/findutils.mo
36236 /usr/share/locale/fi/LC_MESSAGES/findutils.mo
37042 /usr/share/locale/fr/LC_MESSAGES/findutils.mo
20984 /usr/share/locale/ga/LC_MESSAGES/findutils.mo
24078 /usr/share/locale/gl/LC_MESSAGES/findutils.mo
35520 /usr/share/locale/hr/LC_MESSAGES/findutils.mo
37131 /usr/share/locale/hu/LC_MESSAGES/findutils.mo
20287 /usr/share/locale/id/LC_MESSAGES/findutils.mo
33636 /usr/share/locale/it/LC_MESSAGES/findutils.mo
28336 /usr/share/locale/ja/LC_MESSAGES/findutils.mo
1916 /usr/share/locale/ko/LC_MESSAGES/findutils.mo
2663 /usr/share/locale/lg/LC_MESSAGES/findutils.mo
6271 /usr/share/locale/lt/LC_MESSAGES/findutils.mo
1514 /usr/share/locale/ms/LC_MESSAGES/findutils.mo
34789 /usr/share/locale/nb/LC_MESSAGES/findutils.mo
35503 /usr/share/locale/nl/LC_MESSAGES/findutils.mo
35962 /usr/share/locale/pl/LC_MESSAGES/findutils.mo
35253 /usr/share/locale/pt/LC_MESSAGES/findutils.mo
36212 /usr/share/locale/pt_BR/LC_MESSAGES/findutils.mo
6589 /usr/share/locale/ro/LC_MESSAGES/findutils.mo
46244 /usr/share/locale/ru/LC_MESSAGES/findutils.mo
24148 /usr/share/locale/sk/LC_MESSAGES/findutils.mo
35181 /usr/share/locale/sl/LC_MESSAGES/findutils.mo
46489 /usr/share/locale/sr/LC_MESSAGES/findutils.mo
34848 /usr/share/locale/sv/LC_MESSAGES/findutils.mo
33280 /usr/share/locale/tr/LC_MESSAGES/findutils.mo
46292 /usr/share/locale/uk/LC_MESSAGES/findutils.mo
38059 /usr/share/locale/vi/LC_MESSAGES/findutils.mo
32873 /usr/share/locale/zh_CN/LC_MESSAGES/findutils.mo
13436 /usr/share/locale/zh_TW/LC_MESSAGES/findutils.mo
21948 /usr/share/man/man1/find.1.gz
5466 /usr/share/man/man1/xargs.1.gz
1808716
(Note: the 1808667 vs. 1808716 discrepancy is 49 bytes, which matches the two .build-id entries (25 + 24), so it seems they are what is not being counted. EDIT: filed a bug.)
We can easily see that, setting aside a possible split of find and xargs, the must-have portion is: 328944 + 80072 + 25 + 24 = 409065, or ~23% of the whole package.
The rest is:
documentation (droppable, perhaps except for %license; see also rpm --nodocs): 1375+83731+4539+1539+2860+24251+89616+1878+2478+35149+21948+5466 = 274830, or ~15%
localization: the rest, i.e. 1124821, or ~62%
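The breakdown above can be recomputed mechanically. The following is only a sketch: the function name classify_sizes is made up for illustration, and the path prefixes used to tell documentation and localization apart are assumptions that happen to fit the findutils listing, not a general rule.

```shell
# Classify "SIZE<TAB>PATH" lines (the format printed by `du -b`) into three
# buckets and print each bucket's byte count and share of the total.
# Hypothetical helper; the path prefixes are assumptions fitting findutils.
#
# Intended use, reusing the pipeline from the listing above:
#   rpm -ql findutils \
#     | while read -r f; do test -d "$f" || echo "$f"; done \
#     | xargs du -b | classify_sizes
classify_sizes() {
    awk -F '\t' '
        # translated message catalogs
        $2 ~ /^\/usr\/share\/locale\//                  { l10n += $1; next }
        # docs, info pages, man pages and license texts
        $2 ~ /^\/usr\/share\/(doc|info|man|licenses)\// { docs += $1; next }
        # everything else: binaries and .build-id links
                                                        { core += $1 }
        END {
            total = core + docs + l10n
            if (total == 0) exit 1
            printf "core %d %.0f%%\n", core, 100 * core / total
            printf "docs %d %.0f%%\n", docs, 100 * docs / total
            printf "l10n %d %.0f%%\n", l10n, 100 * l10n / total
        }'
}
```

Fed the listing above, this should reproduce the 409065 / 274830 / 1124821 split (~23% / ~15% / ~62%).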
At least for quick containers, mock builds, etc., only about 1/4 of the packaged bits are useful. Would there be room for improvement regarding minimization?
Something like rpm --locale-filter=CMD, perhaps?
Btw. this "content demultiplexing" is what I had in mind that would nicely combine with cleverly chunked RPMs: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/EQ2UDRE6NA7IUC7IA7VZMEHIUJQ7H2K6/
You can already set the %_install_langs rpm macro to install only the relevant files. This is common in container builds. Admittedly, this does not save download bandwidth, only disk space.
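For illustration, a minimal sketch of setting the macro persistently. The helper name write_install_langs is hypothetical, and the sketch assumes the macro takes a colon-separated list of locales and that rpm picks up macro files dropped under /etc/rpm.

```shell
# Write an rpm macros file that restricts which locale files get installed.
# Usage: write_install_langs FILE LANG...
# Hypothetical helper; assumes %_install_langs is a colon-separated list.
write_install_langs() {
    file=$1; shift
    # refuse to write an empty language list
    [ "$#" -gt 0 ] || return 1
    # join the remaining arguments with ":" inside a subshell
    langs=$(IFS=:; printf '%s' "$*")
    printf '%%_install_langs %s\n' "$langs" > "$file"
}
```

As root, something like `write_install_langs /etc/rpm/macros.image-language-conf en_US` would then apply to later installs (the file name is the one image builders tend to use, as far as I know). For a one-off, `rpm -i --define '_install_langs en_US' some.rpm` should have the same effect.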
Oh, thanks, so that's something thought of, perfect! Just the interface towards users is rather buried.
This idea, besides being eclipsed by %_install_langs as mentioned, is mostly superseded by the existing --excludepath option, which I originally missed (the idea was to "functionize" the choice of which language identifiers to allow, where CMD could be something like:
cut -z -c1-1024 /etc/locale.conf /home/*/.config/locale.conf | xargs -0 -I{} sh -x -c "echo '{}' | sed -nE 's|^LANG=([\"]?)([[:alnum:]]+)[[:alnum:]._-]*\1|\2|p'"
). The problem with the exclusion approach is that it is harder to work with than a list of the desired languages, but making %_install_langs trigger something like the above command would be doable nonetheless. Note: the command would certainly need more hardening.
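A somewhat tidier variant of that command, still only a sketch: the function name langs_from_locale_conf is made up, and it keeps just the bare language code (e.g. "en" from "en_US.UTF-8"), as the original sed expression does. Whether the bare code or the full locale is the better thing to hand to %_install_langs is left open here.

```shell
# Print the unique language codes found in LANG= lines of the given files,
# joined with ":" (the list format %_install_langs is assumed to take).
# Hypothetical helper; only LANG= is considered, LC_* lines are ignored.
langs_from_locale_conf() {
    # missing or unreadable files are skipped silently
    cat -- "$@" 2>/dev/null |
        # keep the leading alphabetic part of the value, with or without quotes
        sed -nE 's/^LANG="?([[:alpha:]]+)[[:alnum:]_.@-]*"?[[:space:]]*$/\1/p' |
        sort -u |
        paste -s -d: -
}
```

Usage would be along the lines of `langs_from_locale_conf /etc/locale.conf /home/*/.config/locale.conf`.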
Side note: .build-id links will be explicitly avoidable at install time with the --excludeartifacts option to rpm.