This blog post has been referenced a few times on fedora-devel-list, along with the suggestion that its approach could be used in Fedora for libraries with hand-crafted AVX2 code, for example:
We used to use this facility on %ix86 for libraries built with SSE2 instructions. If I am reading sysdeps/x86/dl-procinfo.c and sysdeps/x86/cpu-features.c in the glibc sources correctly, then the following could be supported:
Other combinations could be supported, but I do not think any of them are useful.
My question is whether we want to support any of these in Fedora and, if so, which ones, and what restrictions or guidelines should be placed around their use. Also note that no package currently owns any of the magic directories. Since the filesystem package owns /usr/lib64, it seems like the natural owner for them.
I ask because I maintain a fairly large number of packages that can use the extended instructions if they are available, and the performance difference can be considerable in some cases. For example, the ntl package can use POPCNT, AVX2, FMA, and AVX512F instructions if they are available, and ntl sits underneath some of our heavy computational packages (e.g., Singular). I have been carrying a Fedora-specific patch for some years now that determines availability at runtime and enables the corresponding code, but upstream doesn't want to accept it, and the burden of porting to every new version of ntl is quite heavy. I would like to drop that patch and use this facility instead.
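For anyone unfamiliar with the kind of runtime dispatch such a patch performs, here is a minimal sketch in C. It is not the ntl patch itself: the function names (popcount_generic, popcount_hw, select_popcount) are hypothetical, and it assumes GCC or Clang, whose __builtin_cpu_init and __builtin_cpu_supports builtins are only available on x86. On other architectures the sketch simply falls back to the generic code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: count the set bits in an array of 64-bit words. */
typedef uint64_t (*popcount_fn)(const uint64_t *, size_t);

/* Portable fallback that works on any CPU. */
static uint64_t popcount_generic(const uint64_t *v, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t x = v[i];
        while (x) {          /* clear the lowest set bit each round */
            x &= x - 1;
            total++;
        }
    }
    return total;
}

#if defined(__x86_64__) || defined(__i386__)
/* Same kernel, but the compiler may emit the POPCNT instruction here. */
__attribute__((target("popcnt")))
static uint64_t popcount_hw(const uint64_t *v, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += (uint64_t)__builtin_popcountll(v[i]);
    return total;
}
#endif

/* Pick an implementation once, based on what the running CPU reports. */
popcount_fn select_popcount(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("popcnt"))
        return popcount_hw;
#endif
    return popcount_generic;
}
```

A caller would store the result of select_popcount() in a function pointer at startup and call through it afterwards. Every kernel needs this kind of selection logic, which is what makes such a patch costly to carry; with the directory-based facility the dynamic loader makes the choice once per library instead.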
- Add /usr/lib64/haswell and /usr/lib64/haswell/avx512_1 to the filesystem package. Skip the xeon_phi directory for now, as it has more limited applicability.
- Add rpm macros %haswell_optflags and %haswell512_optflags containing modified -march and -mtune flags for each.
- Mention the directories and optflags in the guidelines, perhaps in or near https://docs.fedoraproject.org/en-US/packaging-guidelines/#_architecture_support. State that libraries installed in those directories MUST contain code that can make explicit use of one or more of the CPU instruction sets unique to that directory. Simply building every library multiple times with different -march flags in the hopes that it will perform better is not enough.
- It is tempting to require benchmarks showing a significant performance advantage, but then one has to deal with questions such as: "What constitutes significant?", "Who is going to check the benchmark results?", "Whose hardware is going to run the benchmarks?", and "Who is going to check the benchmark code to see that it isn't cheating?" I think this is a rathole that nobody will want to go down.
The suggestion sounds reasonable.
My opinion: this seems reasonable but I think it's premature for us to think too hard about it until someone has actually discussed this with the glibc maintainers and verified that this all works as intended. When I've talked to glibc folks about it in the past, there didn't appear to be much confidence that what support exists in the glibc source should be used or would work properly.
We spoke about this at today's meeting:
Thank you for considering it. I wasn't aware of the glibc maintainers' attitude towards the facility. I'll be interested to hear their take on it.