#352 BLAS and LAPACK packaging
Closed: Fixed. Opened 5 years ago by kkofler.

It looks like there is significant contention over how best to package different BLAS and LAPACK implementations. Susi Lehtola thinks this should be an FPC matter, so I am bringing this up here.

For some background, see the discussions in Fedora: [https://lists.fedoraproject.org/pipermail/devel/2013-September/189467.html]
and at OpenBLAS upstream: [https://github.com/xianyi/OpenBLAS/issues/296]. (I don't think this has been taken up with ATLAS upstream yet. It probably should be.)

So in short (summing up the mailing list thread), the status quo for ATLAS in Fedora 18 and 19 is this: ATLAS ships a %{_libdir}/atlas* subdirectory (whose suffix depends on the subpackage, i.e. the instruction set, selected), which contains:
* libatlas.so.3
* libcblas.so.3
* libclapack.so.3
* libf77blas.so.3
* liblapack.so.3
* libptcblas.so.3
* libptf77blas.so.3
This directory is prepended to the dynamic linker's search path through /etc/ld.so.conf.d/, so anything linked against liblapack.so.3 will get the ATLAS implementation, but libblas.so.3 is not overridden that way, because only libatlas.so.3 is shipped. This is inconsistent and unexpected. Which BLAS symbols you get can even depend on linking order (-llapack -lblas vs. -lblas -llapack)!
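
As a rough illustration of the override described above, the following Python sketch models soname lookup as a walk over an ordered list of directories. It assumes, as the thread describes, that the directory added through /etc/ld.so.conf.d/ is searched first and that the first directory containing the soname wins. The ATLAS directory contents are the ones listed above. The `/usr/lib64/atlas` path and the contents of `/usr/lib64` are illustrative assumptions. The real lookup goes through the ld.so.cache built by ldconfig, not through this simple loop.

{{{#!python
# Simplified model of how the dynamic linker picks a file for a soname: the
# directories are searched in order and the first one containing the soname wins.
SEARCH_PATH = [
    "/usr/lib64/atlas",  # assumed path, added via a file in /etc/ld.so.conf.d/
    "/usr/lib64",        # default directory with the reference BLAS/LAPACK
]

DIRECTORY_CONTENTS = {
    # ATLAS directory contents as listed in this ticket (Fedora 18/19 layout).
    "/usr/lib64/atlas": {
        "libatlas.so.3", "libcblas.so.3", "libclapack.so.3", "libf77blas.so.3",
        "liblapack.so.3", "libptcblas.so.3", "libptf77blas.so.3",
    },
    # Assumed: the reference implementation provides both sonames here.
    "/usr/lib64": {"libblas.so.3", "liblapack.so.3"},
}


def resolve_soname(soname, search_path=SEARCH_PATH, contents=DIRECTORY_CONTENTS):
    """Return the path that soname would resolve to, or None if it is not found."""
    for directory in search_path:
        if soname in contents.get(directory, set()):
            return f"{directory}/{soname}"
    return None


if __name__ == "__main__":
    for name in ("liblapack.so.3", "libblas.so.3"):
        print(name, "->", resolve_soname(name))
}}}

With this data, liblapack.so.3 resolves into the ATLAS directory while libblas.so.3 still falls through to the reference library, which is the inconsistency described above. Adding `libblas.so.3` to the ATLAS directory's set (i.e. the proposed symlink) makes both sonames resolve to ATLAS.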

The new ATLAS packaging in Rawhide which started the mailing list thread changes this to use the shared library support upstream added in their latest release. This ships ATLAS as a monolithic libsatlas.so (serial) and libtatlas.so (threaded) instead. This means only stuff linked against ATLAS at compile time will pick ATLAS. So as a packager, one needs to pick a BLAS/LAPACK implementation at compile time, and probably patch the upstream build system, which almost certainly will '''not''' support the new libsatlas yet.

The situation with OpenBLAS is similar to that of the new ATLAS: Upstream ships a monolithic libopenblas.so. We would have to make a decision at compile time and patch build systems to use it. For those who have not followed the developments there: GotoBLAS was released under a BSD-style license in 2011 (when Goto discontinued it); OpenBLAS is the community fork that continues development of the now Free codebase, making it an interesting alternative to ATLAS.

Another concern I have is library/symbol conflicts, which arise if (through indirect linking) programs accidentally end up at runtime with BLAS and LAPACK implementations that are incompatible with the program (or one of its libraries) or with each other. All the libraries have different names but export the same symbols, which is just asking for symbol conflicts.
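
To see why shared symbol names are a problem, here is a simplified Python model of ELF symbol binding. Libraries enter the global lookup scope in breadth-first order of their DT_NEEDED entries, and each reference binds to the first library in that order that defines the symbol. `libA.so` and `libB.so` are hypothetical libraries built against ATLAS and OpenBLAS respectively. The model ignores symbol versioning, `-Bsymbolic` and `dlopen()` with `RTLD_LOCAL`, and it leaves out the executable itself, which is searched first.

{{{#!python
from collections import deque

# Hypothetical dependency graph: the program links libA.so and libB.so, which were
# built against different BLAS implementations exporting the same symbols.
NEEDED = {
    "libA.so": ["libsatlas.so"],
    "libB.so": ["libopenblas.so"],
}
EXPORTS = {
    "libsatlas.so": {"dgemm_", "dgesv_"},
    "libopenblas.so": {"dgemm_", "dgesv_"},
}


def global_scope(program_needed, needed=NEEDED):
    """Breadth-first load order of shared libraries, skipping duplicates."""
    order, seen = [], set()
    queue = deque(program_needed)
    while queue:
        lib = queue.popleft()
        if lib in seen:
            continue
        seen.add(lib)
        order.append(lib)
        queue.extend(needed.get(lib, []))
    return order


def bind(symbol, scope, exports=EXPORTS):
    """Return the first library in scope that defines symbol (None if undefined)."""
    for lib in scope:
        if symbol in exports.get(lib, set()):
            return lib
    return None


if __name__ == "__main__":
    scope = global_scope(["libA.so", "libB.so"])
    print(scope)
    print("dgemm_ ->", bind("dgemm_", scope))
}}}

In this model, calls to dgemm_ from libB.so are bound to ATLAS even though libB.so was built against OpenBLAS. Swapping the order of libA.so and libB.so on the link line flips the winner, which is the kind of link-order dependence mentioned above.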

Therefore, I am proposing a guideline that would mandate packaging BLAS and LAPACK implementations the way ATLAS is now packaged in Fedora 18 and 19, except that libblas.so.3 '''MUST''' also be overridden. The advantages of that setup are obvious:
* Packages can simply build against the reference BLAS and LAPACK, and will automatically end up with whatever optimized implementation is installed on the system: ATLAS (with or without SSE etc.), OpenBLAS (which should probably become our default on x86/x86_64, because it can do runtime CPU detection) or whatever third-party implementation the user installs on their own.
* The BuildRequires also do not depend on the architecture. For x86, OpenBLAS is probably the best option nowadays, but e.g. s390x is not supported by OpenBLAS at all. If packages need to pick their BLAS/LAPACK at compile time, we'll end up with a big %ifarch mess.
* Packages are binary-compatible with older Fedora releases (assuming we keep a libatlas.so.3 symlink around; we could even have OpenBLAS ship such a thing too to support the transition), with Debian, and with third-party packages expecting the common libblas.so.3 and liblapack.so.3 sonames.

Debian has patches to support that setup with the latest ATLAS:
[http://patch-tracker.debian.org/package/atlas/3.10.1-2]. (For OpenBLAS, they currently use a simpler workaround where they symlink both libblas.so.3 and liblapack.so.3 to the monolithic libopenblas instead of building truly separate libraries.)

The only alternative to the above proposal I can see is to ship one and only one default implementation (probably OpenBLAS, due to the runtime CPU detection support that ATLAS is sorely missing), have it Obsolete all the others, and rebuild all the packages against the implementation we picked. It is just not practical to support multiple BLAS and LAPACK implementations in any other way than the above proposal.


I'll briefly state my view as well.

First of all, this would only apply to the sequential libraries, which are slowly fading out, because architectures are nowadays multicore. You can't make the parallel versions switchable, because the parallelism is controlled in a different fashion in different libraries (environment variables), and because there are many approaches to parallelism that are mutually incompatible (i.e. if you use them together you'll wreak havoc on your performance).

IMHO the current (to be) scheme is much clearer, because you can be 100% sure at link time which library your program will be using. This is relevant e.g. if you want to do benchmarks, or if some library is misbehaving (e.g. there's a memory leak in OpenBLAS that upstream hasn't managed to fix, and it produced incorrect results at one time).

Linking -lsatlas or -lopenblas is much easier for the user than linking to -llapack -lblas. Also, the latter is something that you shouldn't accustom end users to, because -llapack -lblas is usually the netlib reference version, which is orders of magnitude slower than an optimized version. Changing the scheme to something else would only confuse the majority of the users and developers. And what is to guarantee that the liblapack and libblas will actually point to something other than the reference version?

This also has a degree of reproducibility. I don't really look forward to bug reports where the library can be different than the one I've explicitly chosen for the package.

Having had a hand in the majority of packages that use linear algebra in Fedora, I can tell you that ATLAS is already used in every package where performance matters. And if you want to squeeze the last drop of performance out of your hardware, there's nothing to prevent you from building a native ATLAS package for your machine, which is also a drop-in replacement - nothing to do on the Fedora side. IMHO Kevin's argument does not hold in this respect either.

The only issue I've seen in Kevin's arguments is that if you have multiple libraries that all link to BLAS/LAPACK routines, then you might end up in a clash if library A uses ATLAS and library B uses OpenBLAS. This is currently mostly a theoretical argument. Also, it can be overcome simply by establishing a guideline on what library should be used on Fedora. This might be OpenBLAS on ix86 and x86_64, and ATLAS on others. Regardless of the chosen approach, this WILL lead to conditionals in BLAS/LAPACK enabled spec files, e.g.

{{{
%ifarch %ix86 x86_64
BuildRequires: openblas-devel
%else
BuildRequires: atlas-devel
%endif
}}}

On top of that, picking the library to be either -lsatlas or -lopenblas is a trivial matter. If the project's configure does not support manual selection of the LAPACK/BLAS libraries, then the project is at fault, as there are many different vendor proprietary libraries out there that are (or at least have been) more efficient than OpenBLAS or ATLAS.

Lastly, because the clash is caused by having different implementations of the same standard, the only proper way to resolve it would be to follow the MPI route and wrap everything in environment modules, which I think is overkill.

In summary: although the proposal does address a single real problem (possible library clash), it's 5-10 years late. Every performance issue has already been fixed in Fedora (everything uses ATLAS, which you can easily replace with a native version). As multicore CPUs become the norm, packages should switch to parallel libraries when relevant; and in any case these can't be made switchable. The proposed changes would only cause a great deal of confusion.

A concrete issue that kkofler's proposal would fix is, for example, the Scilab case:

Scilab depends on arpack for some features, and the arpack maintainers selected the ATLAS implementation for BLAS/LAPACK. As the Scilab maintainer, I'm forced to use the ATLAS implementation and to track any modification of the arpack package to avoid a library clash. '''This is not a theoretical issue'''.

Another way to support this use case is the Debian approach. They use the ''alternatives'' system to manage the multiple implementations.

https://wiki.debian.org/DebianScience/LinearAlgebraLibraries

> IMHO the current (to be) scheme is much clearer, because you can be 100% sure at link
> time which library your program will be using.

Well, that's exactly its weakness: You have to make the decision at build (link) time.

> Linking -lsatlas or -lopenblas is much easier for the user than linking to
> -llapack -lblas.

Whether you use 1 or 2 -l flags isn't going to make a big difference effort-wise, and historically, BLAS and LAPACK have always been separate. The much bigger practical difference is that build systems of existing packages know about -llapack and -lblas and, if you're lucky, about the old -latlas, but most definitely not about -lsatlas or -lopenblas.

> Also, the latter is something that you shouldn't accustom end users to, because
> -llapack -lblas is usually the netlib reference version, which is orders of magnitude
> slower than an optimized version.

You missed the train by years there. -llapack -lblas is already what everything defaults to.

> Changing the scheme to something else would only confuse the majority of the users and
> developers. And what is to guarantee that the liblapack and libblas will actually point
> to something other than the reference version?

That's really the most interesting question: How do we select a default version? Installing the optimized implementation by default through comps would be an option, but I'm not sure it's used widely enough for that to make sense. Otherwise, I guess we would have to play with the depsolver's resolution priority rules. Of course, that assumes we have an implementation with runtime CPU detection. With ATLAS, the user is really the only one who can pick the right package.

> This also has a degree of reproducibility. I don't really look forward to bug reports
> where the library can be different than the one I've explicitly chosen for the package.

Reports with unsupported libraries are what we have CLOSED CANTFIX for; this is already the situation with libGL, for example.

> Having had a hand in the majority of packages that use linear algebra in Fedora, I can
> tell you that ATLAS is already used in every package where performance matters.

And then you get an "optimized" build that doesn't even use the full instruction set of your machine, because ATLAS does not do any runtime CPU detection and thus we have to build the default package for the lowest common denominator.

And several of the packages that BR atlas-devel in Fedora only do so because our packaging of the reference implementation is not complete, we are missing at least cblas.

> And if you want to squeeze the last drop of performance out of your hardware, there's nothing to
> prevent you from building a native ATLAS package for your machine, which is also a
> drop-in replacement - nothing to do on the Fedora side. IMHO Kevin's argument does not hold
> in this respect either.

But we are still locked into ATLAS, which is not necessarily the fastest implementation. I have read several times that GotoBLAS/OpenBLAS is significantly faster at least on some machines. But even if we switch to OpenBLAS, maybe there are also machines where ATLAS performs better? (And I'm not even considering the third-party proprietary implementations there. I'm no fan of proprietary software.)

> The only issue I've seen in Kevin's arguments is that if you have multiple libraries that
> all link to BLAS/LAPACK routines, then you might end up in a clash if library A uses
> ATLAS and library B uses OpenBLAS. This is currently mostly a theoretical argument. Also,
> it can be overcome simply by establishing a guideline on what library should be used on
> Fedora.

Yes, that would be a necessity, but we would also need to enforce it, and be prepared to patch build systems which do not know about the correct library to link to.

> This might be OpenBLAS on ix86 and x86_64, and ATLAS on others. Regardless of the chosen
> approach, this WILL lead to conditionals in BLAS/LAPACK enabled spec files, e.g.
{{{
%ifarch %ix86 x86_64
BuildRequires: openblas-devel
%else
BuildRequires: atlas-devel
%endif
}}}

And then the arch list changes from release to release depending on what gets added to OpenBLAS and/or ATLAS, and then maybe somebody comes up with a pure assembly BLAS for some architecture and we end up with another %ifarch etc.

And with my proposed approach of "build against the reference implementation and override it at runtime", there is no %ifarch needed, only:
{{{
BuildRequires: lapack-devel
}}}
(which also drags in blas-devel).

> On top of that, picking the library to be either -lsatlas or -lopenblas is a trivial
> matter. If the project's configure does not support manual selection of the LAPACK/BLAS
> libraries, then the project is at fault, as there are many different vendor proprietary
> libraries out there that are (or at least have been) more efficient than OpenBLAS or
> ATLAS.

According to Sam Halliday (fommil), most of them actually install as libblas.so.3 and libatlas.so.3, due to the same binary compatibility concerns.

Many packages will just look for -llapack and -lblas, '''maybe''' -latlas, and that's it.

> Lastly, because the clash is caused by having different implementations of the same
> standard, the only proper way to resolve it would be to follow the MPI route and wrap
> everything in environment modules, which I think is overkill.

I don't like the MPI packaging approach at all. It is necessary there exactly because you have to pick the MPI implementation at compile time, and people could not agree on selecting only one for Fedora. With BLAS/LAPACK, it should be possible to build against the reference implementation and use another one at runtime.

> In summary: although the proposal does address a single real problem (possible library
> clash), it's 5-10 years late. Every performance issue has already been fixed in Fedora
> (everything uses ATLAS, which you can easily replace with a native version). As multicore
> CPUs become the norm, packages should switch to parallel libraries when relevant; and in
> any case these can't be made switchable. The proposed changes would only cause a great
> deal of confusion.

I'm only proposing '''one''' change compared to the ATLAS packages we are shipping in stable Fedora releases right now: add a symlink between libblas.so.3 and libatlas.so.3 (I don't care which is the canonical name as long as both work), to ensure libblas.so.3 will be overridden by ATLAS the way liblapack.so.3 already is. Anything else would stay exactly as it is in released Fedora.

> Another way to support this use case is the Debian approach. They use the
> ''alternatives'' system to manage the multiple implementations.

I know, but I think ld.so.conf.d is a better/simpler approach than ''alternatives''; having the symlinks in %{_libdir} is not necessary.

Replying to [comment:7 kkofler]:

>> IMHO the current (to be) scheme is much clearer, because you can be 100% sure at link
>> time which library your program will be using.

> Well, that's exactly its weakness: You have to make the decision at build (link) time.

That depends on whether you value predictability or not.

>> Linking -lsatlas or -lopenblas is much easier for the user than linking to
>> -llapack -lblas.

> Whether you use 1 or 2 -l flags isn't going to make a big difference effort-wise, and historically, BLAS and LAPACK have always been separate. The much bigger practical difference is that build systems of existing packages know about -llapack and -lblas and, if you're lucky, about the old -latlas, but most definitely not about -lsatlas or -lopenblas.

Packages where linear algebra performance matters that do not support a configure-time option for the linear algebra library are pretty much broken by definition.

>> Also, the latter is something that you shouldn't accustom end users to, because
>> -llapack -lblas is usually the netlib reference version, which is orders of magnitude
>> slower than an optimized version.

> You missed the train by years there. -llapack -lblas is already what everything defaults to.

Uhh what do you mean?

Also, incidentally I had to help a friend just yesterday with an Ubuntu system. Marvelously, the alternatives were configured in such a way that libblas was the OpenBLAS version, while liblapack was the ATLAS one. And this is exactly the system you are proposing - how do you intend to prevent this kind of a situation from happening?

Now, because both the ATLAS and OpenBLAS versions of LAPACK have some functions overridden with more efficient versions, it's clear that if you want to be able to swap the BLAS implementation, you're going to have to drop the optimized LAPACK libraries altogether, and only ship the reference LAPACK library that needs to be combined with a BLAS library.

>> Changing the scheme to something else would only confuse the majority of the users and
>> developers. And what is to guarantee that the liblapack and libblas will actually point
>> to something other than the reference version?

> That's really the most interesting question: How do we select a default version? Installing the optimized implementation by default through comps would be an option, but I'm not sure it's used widely enough for that to make sense. Otherwise, I guess we would have to play with the depsolver's resolution priority rules. Of course, that assumes we have an implementation with runtime CPU detection. With ATLAS, the user is really the only one who can pick the right package.

Exactly, so currently the default would be reference blas because of the shorter name of the package. By default, your proposal would suck for the majority of users.

>> This also has a degree of reproducibility. I don't really look forward to bug reports
>> where the library can be different than the one I've explicitly chosen for the package.

> Reports with unsupported libraries are what we have CLOSED CANTFIX for; this is already the situation with libGL, for example.

"Unsupported libraries"?
I'd call this a pretty broken policy.

>> Having had a hand in the majority of packages that use linear algebra in Fedora, I can
>> tell you that ATLAS is already used in every package where performance matters.

> And then you get an "optimized" build that doesn't even use the full instruction set of your machine, because ATLAS does not do any runtime CPU detection and thus we have to build the default package for the lowest common denominator.

> And several of the packages that BR atlas-devel in Fedora only do so because our packaging of the reference implementation is not complete, we are missing at least cblas.

>> And if you want to squeeze the last drop of performance out of your hardware, there's nothing to
>> prevent you from building a native ATLAS package for your machine, which is also a
>> drop-in replacement - nothing to do on the Fedora side. IMHO Kevin's argument does not hold
>> in this respect either.

> But we are still locked into ATLAS, which is not necessarily the fastest implementation. I have read several times that GotoBLAS/OpenBLAS is significantly faster at least on some machines. But even if we switch to OpenBLAS, maybe there are also machines where ATLAS performs better? (And I'm not even considering the third-party proprietary implementations there. I'm no fan of proprietary software.)

Again, it's been pointed out many times that if you want to crank out 100% of the speed in ATLAS, you need to recompile a native version for your machine.

>> The only issue I've seen in Kevin's arguments is that if you have multiple libraries that
>> all link to BLAS/LAPACK routines, then you might end up in a clash if library A uses
>> ATLAS and library B uses OpenBLAS. This is currently mostly a theoretical argument. Also,
>> it can be overcome simply by establishing a guideline on what library should be used on
>> Fedora.

> Yes, that would be a necessity, but we would also need to enforce it, and be prepared to patch build systems which do not know about the correct library to link to.

Again, if you can't choose the library you want to use, the build system is pretty much broken and in need of a fix anyway.

>> On top of that, picking the library to be either -lsatlas or -lopenblas is a trivial
>> matter. If the project's configure does not support manual selection of the LAPACK/BLAS
>> libraries, then the project is at fault, as there are many different vendor proprietary
>> libraries out there that are (or at least have been) more efficient than OpenBLAS or
>> ATLAS.

> According to Sam Halliday (fommil), most of them actually install as libblas.so.3 and libatlas.so.3, due to the same binary compatibility concerns.

> Many packages will just look for -llapack and -lblas, '''maybe''' -latlas, and that's it.

Having been involved for years in scientific high performance computing, I can tell you that this is NOT the case. ACML is available as libacml and MKL as libmkl; '''there are not, nor have there ever been,''' any libblas or liblapack symlinks.

>> In summary: although the proposal does address a single real problem (possible library
>> clash), it's 5-10 years late. Every performance issue has already been fixed in Fedora
>> (everything uses ATLAS, which you can easily replace with a native version). As multicore
>> CPUs become the norm, packages should switch to parallel libraries when relevant; and in
>> any case these can't be made switchable. The proposed changes would only cause a great
>> deal of confusion.

> I'm only proposing '''one''' change compared to the ATLAS packages we are shipping in stable Fedora releases right now: add a symlink between libblas.so.3 and libatlas.so.3 (I don't care which is the canonical name as long as both work), to ensure libblas.so.3 will be overridden by ATLAS the way liblapack.so.3 already is. Anything else would stay exactly as it is in released Fedora.

... really? I think you've asked for a lot more in this thread, namely that packages be compiled against the reference version of BLAS and LAPACK, and that the more optimized implementations be switchable. Otherwise your demand doesn't even make sense - there'd be no sense in overriding libblas, because '''it has never been''' the ATLAS BLAS library.

The problem with your suggested approach is that it's more likely to cause problems than to solve any because by default the reference implementations would be used, causing a slowdown by an order of magnitude or two, and it can't be done for parallel libraries (which should be the norm nowadays).

Replying to [comment:7 kkofler]:

> I'm only proposing '''one''' change compared to the ATLAS packages we are shipping in stable Fedora releases right now: add a symlink between libblas.so.3 and libatlas.so.3 (I don't care which is the canonical name as long as both work), to ensure libblas.so.3 will be overridden by ATLAS the way liblapack.so.3 already is. Anything else would stay exactly as it is in released Fedora.

Surely you mean symlink between libblas.so.3 and libf77blas.so.3. libatlas.so.3 is not equivalent.
Still, I need advice on how to do it correctly.

ln -s /lib64/atlas/libf77blas.so.3 /lib64/libblas.so.3; /sbin/ldconfig does not work for me. ldconfig -v then shows a weird broken link libf77blas.so.3 -> libblas.so.3.

ln -s /lib64/atlas/libf77blas.so.3 /lib64/atlas/libblas.so.3; /sbin/ldconfig does not work either. ldconfig -v then shows nothing about libblas.

> Also, incidentally I had to help a friend just yesterday with an Ubuntu system.
> Marvelously, the alternatives were configured in such a way that libblas was the
> OpenBLAS version, while liblapack was the ATLAS one. And this is exactly the system you
> are proposing - how do you intend to prevent this kind of a situation from happening?

Simple: You don't use alternatives. :-)

Instead, you ship both libraries in one package and in one directory, with a file in ld.so.conf.d which drags the whole directory into the linker search path. Then there's no way to get only one of the 2 libraries unless you're really into messing things up (using rm -f on RPM-owned files or something similarly evil).

> Now, because both the ATLAS and OpenBLAS versions of LAPACK have some functions
> overridden with more efficient versions, it's clear that if you want to be able to swap
> the BLAS implementation, you're going to have to drop the optimized LAPACK libraries
> altogether, and only ship the reference LAPACK library that needs to be combined with a
> BLAS library.

No, see above. Just put both libblas.so.3 and liblapack.so.3 into the same directory and use ld.so.conf.d to override both of them at once.

> "Unsupported libraries"? I'd call this a pretty broken policy.

Well, you want to desupport them even harder, by forcing people to recompile all their LAPACK-using packages to get another implementation.

> ln -s /lib64/atlas/libf77blas.so.3 /lib64/atlas/libblas.so.3; /sbin/ldconfig does not
> work either. ldconfig -v then shows nothing about libblas.

(This one is the correct approach, you don't want to use /lib64/libblas.so.3 or you'll conflict with the reference blas instead of just overriding it.)

Try ldd on something that links libblas.so.3, I think that's more reliable than ldconfig.
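
Following the suggestion to check with ldd, a small Python helper could report where libblas.so.3 and liblapack.so.3 actually resolve for a given binary. It could also flag the mixed situation described earlier in the thread, where BLAS comes from one implementation and LAPACK from another. It parses the usual `name => path (address)` lines of ldd output. The function names are made up for this sketch, and comparing parent directories is only a heuristic for "same implementation".

{{{#!python
import os
import re
import subprocess

# Matches ldd lines such as "\tlibblas.so.3 => /usr/lib64/atlas/libblas.so.3 (0x...)".
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)")
BLAS_LAPACK = ("libblas.so.3", "liblapack.so.3")


def parse_ldd(output):
    """Map each soname in ldd output to its resolved path (None for 'not found')."""
    mapping = {}
    for line in output.splitlines():
        match = LDD_LINE.match(line)
        if match:
            soname, target = match.groups()
            mapping[soname] = None if target == "not" else target
    return mapping


def blas_lapack_dirs(mapping):
    """Directory each BLAS/LAPACK soname was loaded from (None if absent)."""
    return {name: (os.path.dirname(mapping[name]) if mapping.get(name) else None)
            for name in BLAS_LAPACK}


def is_consistent(mapping):
    """True if libblas.so.3 and liblapack.so.3 were both found in one directory."""
    dirs = set(blas_lapack_dirs(mapping).values())
    return None not in dirs and len(dirs) == 1


def check_binary(path):
    """Run ldd (which must be installed) on a binary and summarise the result."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=True)
    mapping = parse_ldd(result.stdout)
    return blas_lapack_dirs(mapping), is_consistent(mapping)
}}}

Calling `check_binary()` on any program linked against liblapack.so.3 would show whether the ld.so.conf.d override took effect for both sonames or only for one of them.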

Replying to [comment:13 kkofler]:

>> Also, incidentally I had to help a friend just yesterday with an Ubuntu system.
>> Marvelously, the alternatives were configured in such a way that libblas was the
>> OpenBLAS version, while liblapack was the ATLAS one. And this is exactly the system you
>> are proposing - how do you intend to prevent this kind of a situation from happening?

> Simple: You don't use alternatives. :-)

> Instead, you ship both libraries in one package and in one directory, with a file in ld.so.conf.d which drags the whole directory into the linker search path. Then there's no way to get only one of the 2 libraries unless you're really into messing things up (using rm -f on RPM-owned files or something similarly evil).

... and this will be done by all packages, which means that ''a priori'' you don't really know what library comes out on top...

Regarding your complaint that the new scheme requires patching packages that use -latlas (which is now broken, since the library is -lsatlas): you'll still have to patch all of these packages, since linking to -latlas will break your interchangeable libraries.

I'd like to hear what other people have to say as well. Orion? Spot?

As this was not mentioned explicitly: I would like to formally ask for a bundling exception.

FPC discussed this at today's meeting and found advantages to both positions. We decided to discuss this more when spot is around (he'll be out for next week's meeting as well) so that he can tell us whether copying the strategy in the MPI guidelines makes sense here.

Also:
@fklknav: We're not sure what you're asking to be able to bundle here...

Atlas source + Lapack source => Atlas binary.
This is the current upstream solution. Some of the alternatives mentioned in previous discussion also involve bundling.

You should be able to replace the lapack sources with binary copies from the lapack static library package.

I had the exact same situation with OpenBLAS, for which an exception was not even considered (see ticket #237).

But this is '''really''' off-topic for the current thread - fkluknav should open another ticket for the bundling exception.

If you ''do'' decide to enforce LAPACK and BLAS library interchangeability, I suggest it be implemented with environment modules (akin to the MPI modules). That way the libraries can be changed at the user level per session, and the user can be sure which libraries are used, since only one implementation can then be enabled at any one time.

One would then run, e.g.,
{{{
$ module load lapack/atlas
}}}
or
{{{
$ module load lapack/openblas
}}}
or
{{{
$ module load lapack/reference
}}}
to get the ATLAS, OpenBLAS or reference implementation.

One library should always be loaded, which could be implemented by autoload of the relevant module. This could be implemented with dummy packages, e.g., atlas-autoload, openblas-autoload and lapack-autoload, which would be mutually conflicting, and whose effect would be a file in /etc/profile.d that loads the corresponding module.
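
The "only one implementation at a time" property of this proposal can be sketched in Python by emulating what loading a lapack/<impl> module might do to the environment. The emulation refuses to load a second lapack module, then prepends the implementation's library directory to LD_LIBRARY_PATH. The module names follow the examples above, but the library directories are hypothetical. This is an emulation of the intended semantics, not how Environment Modules itself is implemented.

{{{#!python
# Hypothetical library directories for each lapack/<impl> module.
IMPLEMENTATIONS = {
    "lapack/atlas": "/usr/lib64/atlas",
    "lapack/openblas": "/usr/lib64/openblas",
    "lapack/reference": "/usr/lib64/lapack-reference",
}


class ModuleConflict(RuntimeError):
    """Raised when a second lapack/* module is loaded in the same session."""


def _split(value):
    """Split a colon-separated variable, dropping empty entries."""
    return [item for item in value.split(":") if item]


def load(env, module):
    """Return a copy of env with module loaded; only one lapack/* module is allowed."""
    libdir = IMPLEMENTATIONS[module]  # KeyError for unknown modules
    loaded = _split(env.get("LOADEDMODULES", ""))
    clash = [m for m in loaded if m.startswith("lapack/")]
    if clash:
        raise ModuleConflict(f"{module} conflicts with loaded module {clash[0]}")
    new_env = dict(env)
    new_env["LD_LIBRARY_PATH"] = ":".join([libdir] + _split(env.get("LD_LIBRARY_PATH", "")))
    new_env["LOADEDMODULES"] = ":".join(loaded + [module])
    return new_env


def unload(env, module):
    """Return a copy of env with module and its library directory removed."""
    libdir = IMPLEMENTATIONS[module]
    new_env = dict(env)
    new_env["LOADEDMODULES"] = ":".join(
        m for m in _split(env.get("LOADEDMODULES", "")) if m != module)
    new_env["LD_LIBRARY_PATH"] = ":".join(
        p for p in _split(env.get("LD_LIBRARY_PATH", "")) if p != libdir)
    return new_env
}}}

Switching implementations within a session is then an explicit `unload` followed by `load`, so two lapack modules are never active at once. The autoload packages described above would amount to running the equivalent of `load(env, "lapack/openblas")` from /etc/profile.d at login.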

At today's meeting:

info Approved (+1:6, 0:0, -1:0) FPC favors using environment modules for this as it seems to avoid the problem of applications which might want to link to one blas implementation but a dependent library is linked to another. If someone would present us with a draft based on that we'll be happy to review and vote on it.

The packaging committee is still awaiting additional information before it can properly address your request. Please provide that within the next month or this ticket will be closed. (Of course, you can always reopen it if the situation changes) Thanks!

No response; closing. Please feel free to reopen if you have more information to add.

Is it possible to re-open this with any different result? It's not clear to me whether the committee really understood the issue from the point of view of managing the sort of system on which BLAS is presumably most important. (I'm not convinced the MPI model is a good one, as someone who has to deal with it in packaging and operation, and MPI has been a disaster area recently in EL6.)

If so, what's the right way to do it -- open a new ticket or follow up on this one with a specific proposal?

For what it's worth, I have a workaround for the current mess in production. GEMM-based operations in dynamically linked packages that aren't built against OpenBLAS become several times faster on our Sandy Bridge cluster.

Replying to [comment:25 loveshack]:

> If so, what's the right way to do it -- open a new ticket or follow up on this one with a specific proposal?

Please file a new ticket.

