#816 Add notes for changes to Python installation scheme
Merged 2 years ago by bcotton. Opened 2 years ago by bcotton.

@@ -3,3 +3,214 @@ 

  

  [[sect-python]]

  = Python

+ 

+ [[sect-install_scheme]]

+ == New installation scheme of Python packages

+ 

+ In Fedora Linux 36, Python changes the way installation paths of Python packages are handled.

+ These changes affect the main Python in Fedora 36, Python 3.10, as well as any newer Python version included.

+ Most Fedora Linux users should not be affected by the change, but in some situations the behavior differs slightly.

+ 

+ When Python packages are installed by `sudo pip`, `sudo python setup.py install` or similar methods, Python packages are installed to `/usr/local/lib(64)/python3.10/site-packages/`.

+ This has already been https://fedoraproject.org/wiki/Changes/Making_sudo_pip_safe[happening since Fedora Linux 27]. However, the way this is achieved was significantly re-implemented in Fedora Linux 36, which introduces several minor differences.

+ 

+ The `sysconfig` Python module from the standard library defines several *installation schemes*.

+ By default, the installation scheme used on Fedora 36 when installing Python packages using root privileges (for example via `sudo`) is `{prefix}/local/lib(64)/python3.10/site-packages/` (where `{prefix}` is defined as `/usr` by default).

+ When Python itself runs from a Python virtual environment or when building RPM packages, the installation scheme is `{prefix}/lib(64)/python3.10/site-packages/` (it does not include `/local/`).
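
To see how the `/local/` part is encoded in a scheme, it can help to look at the unexpanded path templates. The following is a minimal sketch, not part of the change itself; the helper name `purelib_templates` is made up for illustration, and the set of schemes it reports depends on the Python build (for example, `rpm_prefix` is only expected on Fedora Linux 36 and newer).

```python
import sysconfig


def purelib_templates():
    """Map every known installation scheme to its raw 'purelib' template.

    Hypothetical helper for illustration. expand=False returns the
    template string with placeholders instead of a resolved path.
    """
    return {
        name: sysconfig.get_path('purelib', scheme=name, expand=False)
        for name in sysconfig.get_scheme_names()
    }


if __name__ == '__main__':
    for name, template in sorted(purelib_templates().items()):
        print(f'{name}: {template}')
```

Comparing the templates of two schemes shows whether `/local/` comes from the scheme itself rather than from the prefix.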

+ 

+ Previously, the `/local/` part was only artificially added when installing packages; now it is part of the installation scheme.

+ This change makes Fedora more consistent with other Python distributors, so the scheme is more likely to be accepted in upstream Python and to work well with upstream tools such as setuptools or pip, which we cannot modify when they are installed or upgraded using pip directly from the https://pypi.org/[Python Package Index].

+ 

+ Here are the differences that might be observed with the new approach:

+ 

+ 

+ === `sysconfig.get_path(key)` returns paths with `/local/`

+ 

+ Previously, on Fedora Linux 35:

+ 

+ [source,python]

+ ----

+ >>> import sysconfig

+ >>> for key in sysconfig.get_path_names():

+ ...     print(f'{key} = {sysconfig.get_path(key)}')

+ ... 

+ stdlib = /usr/lib64/python3.10

+ platstdlib = /usr/lib64/python3.10

+ purelib = /usr/lib/python3.10/site-packages

+ platlib = /usr/lib64/python3.10/site-packages

+ include = /usr/include/python3.10

+ scripts = /usr/bin

+ data = /usr

+ ----

+ 

+ Now, on Fedora Linux 36 (except during RPM build):

+ 

+ [source,python]

+ ----

+ >>> import sysconfig

+ >>> for key in sysconfig.get_path_names():

+ ...     print(f'{key} = {sysconfig.get_path(key)}')

+ ... 

+ stdlib = /usr/lib64/python3.10

+ platstdlib = /usr/lib64/python3.10

+ purelib = /usr/local/lib/python3.10/site-packages

+ platlib = /usr/local/lib64/python3.10/site-packages

+ include = /usr/include/python3.10

+ scripts = /usr/local/bin

+ data = /usr/local

+ ----

+ 

+ The values now reflect the reality of where packages are actually going to be installed with pip, setuptools, distutils, etc.

+ However, if your Python code uses the values to determine where to load Python packages *from*, it won't see dnf-installed packages, which are installed in `/usr/lib(64)/python3.10/site-packages/`.

+ Generally, `sysconfig.get_path(key)` gives results that determine where the Python packages should be installed *to*.

+ To fix affected code, avoid assumptions that "where to install packages *to*" is the same as "where to load Python modules *from*".
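
One way to avoid that assumption is to ask the import system where a module is actually loaded from. The sketch below assumes the code only needs the directory of an importable module; `module_location` is a hypothetical helper name, not something the affected projects use.

```python
import importlib.util
import sysconfig
from pathlib import Path


def module_location(name):
    """Return the directory a module would be imported from, or None.

    Hypothetical helper: it consults the import system instead of
    guessing from sysconfig.get_path('purelib').
    """
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


if __name__ == '__main__':
    # Where new packages would be installed *to*.
    print('install target:', sysconfig.get_path('purelib'))
    # Where an existing module is loaded *from*.
    print('json loaded from:', module_location('json'))
```

The two printed locations answer different questions and are not guaranteed to match.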

+ 

+ Example fixes from affected open source projects:

+ 

+  - https://github.com/rhinstaller/anaconda/pull/3646[anaconda/dracut: Don't assume Python modules are in sysconfig.get_path('purelib')]

+  - https://github.com/rpm-software-management/dnf/pull/1782[dnf: Determine the default plugin path at configure time, rather than runtime]

+ 

+ If you need to restore the previous behavior of `sysconfig.get_path(key)`, you may explicitly select the `rpm_prefix` installation scheme:

+ 

+ [source,python]

+ ----

+ >>> for key in sysconfig.get_path_names():

+ ...     print(f'{key} = {sysconfig.get_path(key, scheme="rpm_prefix")}')

+ ... 

+ stdlib = /usr/lib64/python3.10

+ platstdlib = /usr/lib64/python3.10

+ purelib = /usr/lib/python3.10/site-packages

+ platlib = /usr/lib64/python3.10/site-packages

+ include = /usr/include/python3.10

+ scripts = /usr/bin

+ data = /usr

+ ----

+ 

+ However, this installation scheme is specific to Fedora Linux 36 and newer, and such code will not work on other operating systems (or even on older Fedora releases).
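
Code that must also run elsewhere could guard the scheme selection. This is a sketch under the assumption that falling back to the interpreter's default scheme is acceptable when `rpm_prefix` is not defined; `system_path` is a hypothetical helper name.

```python
import sysconfig


def system_path(key):
    """Return a path for `key`, preferring the 'rpm_prefix' scheme.

    Hypothetical helper: on systems without the Fedora-specific
    'rpm_prefix' scheme it falls back to the default scheme.
    """
    if 'rpm_prefix' in sysconfig.get_scheme_names():
        return sysconfig.get_path(key, scheme='rpm_prefix')
    return sysconfig.get_path(key)


if __name__ == '__main__':
    for key in ('purelib', 'platlib', 'scripts'):
        print(f'{key} = {system_path(key)}')
```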

+ 

+ 

+ === pip/setup.py installation with `--prefix`

+ 

+ When pip or `python setup.py` installation is invoked with the `--prefix` option, the `/usr` part of the standard installation path is replaced with the given `--prefix` value. Note that **`/local/` is not a part of the prefix** but a part of the installation scheme. Hence, https://bugzilla.redhat.com/show_bug.cgi?id=2026979[despite some quite reasonable expectations], the following invocation:

+ 

+ ----

+ $ sudo pip install --prefix /usr Pello

+ ----

+ 

+ Will *still* install the `Pello` package to `/usr/local/lib/python3.10/site-packages/`.

+ 

+ And this:

+ 

+ ----

+ $ sudo pip install --prefix /usr/local Pello

+ ----

+ 

+ Will even install the `Pello` package to `/usr/local/local/lib/python3.10/site-packages/`.
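
The effect of the prefix can be approximated without installing anything by expanding a scheme with a custom base. This is only a sketch: it assumes that overriding the `base` and `platbase` variables mirrors what `--prefix` does, which is a simplification of what pip and setuptools actually do, and `purelib_for_prefix` is a hypothetical helper name.

```python
import sysconfig


def purelib_for_prefix(prefix, scheme=None):
    """Expand the 'purelib' template with `prefix` as the base directory.

    Hypothetical helper. With scheme=None the interpreter's default
    scheme is used; any '/local/' part comes from the scheme itself.
    """
    variables = {'base': prefix, 'platbase': prefix}
    if scheme is None:
        return sysconfig.get_path('purelib', vars=variables)
    return sysconfig.get_path('purelib', scheme=scheme, vars=variables)


if __name__ == '__main__':
    for prefix in ('/usr', '/usr/local'):
        print(prefix, '->', purelib_for_prefix(prefix))
```

If the default scheme contains `/local/`, the second prefix should yield a doubled `local/local` component, matching the behavior described above.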

+ 

+ The only supported way to explicitly install a Python package directly to `/usr/lib(64)/python3.10/site-packages/` is to build an RPM package with it and install it.

+ Python checks the `$RPM_BUILD_ROOT` environment variable and selects the `rpm_prefix` installation scheme when it is set.

+ 

+ Strictly for testing purposes, it is possible to set the variable outside of an RPM build environment to simulate installation from an RPM package.

+ **Beware, this usage might have unexpected consequences** on a production system, including an entirely unrecoverable breakage.

+ 

+ ----

+ $ sudo env RPM_BUILD_ROOT=/ pip install Pello

+ ----

+ 

+ ==== Installation with `--prefix` and `--root`

+ 

+ The change in behavior also applies when a custom `--root` value is passed together with `--prefix`.

+ The following command will install `Pello` to `~/myroot/usr/local/lib/python3.10/site-packages/`:

+ 

+ ----

+ $ pip install --prefix /usr --root ~/myroot Pello

+ ----

+ 

+ To install it to `~/myroot/usr/lib/python3.10/site-packages/`, the simulated RPM environment can be used:

+ 

+ ----

+ $ RPM_BUILD_ROOT=~/myroot pip install --prefix /usr --root ~/myroot Pello

+ ----

+ 

+ 

+ === RPM build–related caveats

+ 

+ When Python runs during RPM build, it selects the `rpm_prefix` installation scheme.

+ This behavior is triggered when the `$RPM_BUILD_ROOT` environment variable is set. That has several caveats:

+ 

+ ==== Executing Python in Python's subprocess

+ 

+ If the Python code that runs in RPM build (for example in `%check`) executes another Python instance via a subprocess, it is relatively easy to inadvertently unset all environment variables. When this happens, the *inner* Python will not know it runs within RPM build and will return paths with the `/local/` infix.

+ 

+ In the most trivial example, the surrounding environment variables are implicitly passed to subprocess and everything works as expected:

+ 

+ [source,python]

+ ----

+ >>> import os, subprocess, sys

+ >>> 'RPM_BUILD_ROOT' in os.environ

+ True

+ >>> command = [sys.executable, '-c', 'import sysconfig; print(sysconfig.get_path("purelib"))']

+ >>> subprocess.check_output(command)

+ b'/usr/lib/python3.10/site-packages\n'

+ ----

+ 

+ But when a custom environment is passed, it breaks the detection, because `$RPM_BUILD_ROOT` is no longer set:

+ 

+ [source,python]

+ ----

+ >>> subprocess.check_output(command, env={'FOO': 'bar'})

+ b'/usr/local/lib/python3.10/site-packages\n'

+ ----

+ 

+ A solution is to always copy the surrounding environment, edit the copy, and pass it to the subprocess.

+ That is generally valid Python advice.

+ 

+ [source,python]

+ ----

+ >>> env = os.environ | {'FOO': 'bar'}

+ >>> subprocess.check_output(command, env=env)

+ b'/usr/lib/python3.10/site-packages\n'

+ ----
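
Outside the interactive prompt, the same idea could be wrapped in a small helper. This is a sketch only; `run_python` and the `DEMO_MARKER` variable are made-up names used for illustration.

```python
import os
import subprocess
import sys


def run_python(code, extra_env=None):
    """Run `code` in a child Python while keeping the parent environment.

    Hypothetical helper: the parent's variables (including
    RPM_BUILD_ROOT, if set) are copied first, then `extra_env` is
    layered on top, so nothing is dropped by accident.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    return subprocess.check_output(
        [sys.executable, '-c', code], env=env, text=True
    )


if __name__ == '__main__':
    os.environ['DEMO_MARKER'] = '1'  # stands in for RPM_BUILD_ROOT
    code = "import os; print(os.environ['DEMO_MARKER'], os.environ['FOO'])"
    print(run_python(code, {'FOO': 'bar'}).strip())
```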

+ 

+ Example fixes from affected open source projects:

+ 

+  - https://github.com/praiskup/argparse-manpage/pull/39[argparse-manpage: Preserve the environment when running pip or setup.py from within the tests]

+ 

+ ==== `%(...)` RPM macros

+ 

+ When RPM macros in the form of `%(command ...)` are expanded, `$RPM_BUILD_ROOT` is not yet set. Hence, Python does not know it is invoked from RPM build and the paths returned by `sysconfig.get_path(...)` contain `/local/`. To fix this, set `$RPM_BUILD_ROOT` in the macro definition (to any value, even empty). For example, a macro defined like this:

+ 

+ ----

+ %global python3_scripts_dir %(python3 -c "import sysconfig; print(sysconfig.get_path('scripts'))")

+ ----

+ 

+ Needs to be changed to this:

+ 

+ ----

+ %global python3_scripts_dir %(RPM_BUILD_ROOT= python3 -c "import sysconfig; print(sysconfig.get_path('scripts'))")

+ ----

+ 

+ The affected RPM macros supplied by Fedora's `python-rpm-macros` packages have all been https://src.fedoraproject.org/rpms/python-rpm-macros/pull-request/110[changed accordingly].

+ 

+ === Paths for bootstrapping Python virtual environments

+ 

+ If your Python code uses installation schemes to determine paths to be used in created virtual environments, and the Python interpreter executing that code does not run from a Python virtual environment itself, the paths will not match.

+ 

+ To bootstrap Python virtual environments, the code should use the `venv` installation scheme (but only if it exists).

+ 

+ [source,python]

+ ----

+ >>> scheme = 'venv' if 'venv' in sysconfig.get_scheme_names() else sysconfig.get_default_scheme()

+ >>> sysconfig.get_path('purelib', scheme=scheme, vars={'base': '<venv>'})

+ '<venv>/lib/python3.10/site-packages'

+ >>> sysconfig.get_path('scripts', scheme=scheme, vars={'base': '<venv>'})

+ '<venv>/bin'

+ ----

+ 

+ The `venv` scheme is currently Fedora specific, but other Python distributors (such as https://launchpad.net/~deadsnakes/+archive/ubuntu/ppa[Ubuntu Deadsnakes]) may define it as well.

The venv scheme on Python 3.10 is ... as well. The next major Python version (Python 3.11) will include the venv scheme by default.

+ Because the code checks whether the `venv` installation scheme exists, it also works on other operating systems, where it falls back to backwards-compatible behavior.
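
As a non-interactive variant, the check and the fallback could live in one function. This is a sketch under stated assumptions: the fallback to `posix_prefix` (or `nt` on Windows) is one plausible choice for the backwards-compatible behavior, not necessarily what the projects listed below do, and `venv_paths` is a hypothetical helper name.

```python
import os
import sysconfig


def venv_paths(venv_dir):
    """Return installation paths for a virtual environment at `venv_dir`.

    Hypothetical helper: prefers the 'venv' scheme when the running
    Python defines it, otherwise falls back to a prefix-style scheme
    (an assumption made for this sketch).
    """
    if 'venv' in sysconfig.get_scheme_names():
        scheme = 'venv'
    else:
        scheme = 'nt' if os.name == 'nt' else 'posix_prefix'
    variables = {'base': venv_dir, 'platbase': venv_dir}
    return sysconfig.get_paths(scheme=scheme, vars=variables)


if __name__ == '__main__':
    paths = venv_paths('/tmp/demo-venv')
    print(paths['purelib'])
    print(paths['scripts'])
```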

+ 

+ Example fixes from affected open source projects:

+ 

+  - https://github.com/pypa/virtualenv/pull/2209[virtualenv: Favor the "venv" sysconfig install scheme over the default and distutils scheme]

+  - https://github.com/pypa/build/pull/434[build: Prefer the venv installation scheme if it exists]

Looks good. Thanks. One update, will post inline, but can send a PR after this PR to change it.


Pull-Request has been merged by bcotton

2 years ago

I'm not sure what @pbokoc's process is for when release notes start getting published.