Issue #580: Limiting/reducing/rethinking our Python packaging/packages - NeuroFedora

neuro-sig / NeuroFedora

#580 Limiting/reducing/rethinking our Python packaging/packages

Opened 2 months ago by ankursinha. Modified a month ago

Hi folks,

I've been thinking of this a while but was waiting to get some feedback from folks at Flock etc. before putting this up for discussion. At the moment, we (the neuro-sig) is at ~500 packages. Many of these we've inherited, others we've taken over because they're deps for ours, and so on. It's quite a large number, and it can get a bit tricky to keep up with them, especially with newer releases of Python etc. when a bunch of things break simultaneously.

A lot of our packages are python libraries but I'm seeing more and more users just rely on pip (or anaconda etc.) to install their packages instead of using dnf installed packages. I was therefore, wondering if packaging up Python libraries off PyPi was worth really doing.

I managed to catch up with Karolina from the Python SIG here at Flock today (Miro couldn't make it, unfortunately), and I had a good chat with them about this particular issue. Karolina said that the Python SIG's priority tends to be to make sure that the core python packages work---i.e., the interpreter, pip, things like setuptools. For libraries etc., it is really up to us packagers to see if our use-cases required system packages, and I think it is unlikely that users will rely on system packages for their libraries, especially given that most upstream documentation will suggest using pip etc. directly. The other advantage of installing bits from pip is also that one can install different versions of packages in isolated virtual environments. This cannot be done with the dnf installed system packages.

So, I was wondering if in the future, we should consider only packaging python packages that are not available on Pypi, and perhaps slowly dropping our packages that are on pypi already. I.e., if something is pip installable, recommend that people use these directly off pypi.

Pros:

fewer packages to maintain
allows us to focus on other complex packages that are not installable off pypi (packages in other languages like c/c++ which require building from source)
?

Cons:

fewer neuro packages in Fedora
upstreams don't benefit from our contributions (for example when we jump to a new python version early in Fedora and send fixes upstream)
?

Things to keep in mind:

our users/researchers don't really care about the latest versions of Python (in fact they tend to use a stable release and stick to it until it's absolutely necessary to move to the next one)
for most python packages off pypi, users ask upstream for troubleshooting help directly anyway

Questions:

is there a use case where people/users will prefer a system installed version?

What do you think? (I may also discuss this with the Fedora python community to solicit wider community feedback, but I wanted our team's views first).

PS: the python sig have attempted to automatically convert all of pypi to rpms in a COPR, but Karolina said that the quality of packages there is not good enough to suggest its use.

music commented 2 months ago

As in Fedora at large, the main reason to package PyPI libraries in the neuro-sig should probably be to support packaged versions of Python-based tools and applications. If we want to maintain those, then we need their dependencies to be packaged too.

For example, python-fslpy ships a collection of command-line tools itself, plus it is a dependency for the FSLeyes application (python-fsleyes), for python-nifti-mrs (which ships its own collection of tools), and indirectly for the command-line tool spec2nii, so it has a pretty strong justification for spending time on it.

On the other hand, python-bioframe is a pure-Python library package that does not provide any command-line tools and is a leaf in Fedora, so – unless it’s a dependency for something someone is still working on packaging – it has the weakest justification.

There’s still a valid question for tools and applications: what kinds of tools and applications are most valuable for our users to have packaged directly in Fedora, versus installing them in a development virtualenv or using something like pipx or uvx/uv run?

Obviously, anyone can spend time packaging anything they want for any reason they want, but it does make sense to try to define and agree on the SIG’s goals, and to use them to prioritize the application of the limited shared pool of collaborative packaging effort.

Edited 2 months ago by music

ankursinha commented 2 months ago

Thanks for the quick reply, Ben.

I think our goal can be summarised as "make Fedora an excellent platform (if not the 'goto platform') for neuroscience". It's suitably (intentionally) vague, and the implementation details are something we need to figure out. :D

I agree with your view that PyPi libraries should be included as dnf packages primarily to support packaged versions of tools/applications. I think that's a good guideline to follow. So, things like bioframe should perhaps not be packaged, since users can install them directly off PyPi.

Looking at the particular example of spec2nii, I just noticed that it is also available on PyPi, and one can install it (and deps) using pip:

$ pip install spec2nii
...
Installing collected packages: pytz, tzdata, tqdm, six, pyyaml, pyparsing, pydicom, pillow, packaging, numpy, kiwisolver, fonttools, dill, cycler, scipy, python-dateutil, nibabel, h5py, contourpy, brukerapi, pandas, matplotlib, fslpy, pyMapVBVD, nifti-mrs, spec2nii
Successfully installed brukerapi-0.1.9 contourpy-1.3.2 cycler-0.12.1 dill-0.4.0 fonttools-4.58.1 fslpy-3.22.1 h5py-3.14.0 kiwisolver-1.4.8 matplotlib-3.10.3 nibabel-5.3.2 nifti-mrs-1.3.3 numpy-2.2.6 packaging-25.0 pandas-2.3.0 pillow-11.2.1 pyMapVBVD-0.6.1 pydicom-3.0.1 pyparsing-3.2.3 python-dateutil-2.9.0.post0 pytz-2025.2 pyyaml-6.0.2 scipy-1.15.3 six-1.17.0 spec2nii-0.8.6 tqdm-4.67.1 tzdata-2025.2

$ which spec2nii
~/.local/share/virtualenvs/spec2nii/bin/spec2nii

$ which fsl_abspath 
~/.local/share/virtualenvs/spec2nii/bin/fsl_abspat

So, is there a case for keeping it iinstallable via dnf, or is this another case where we should ask users to install it off PyPi?

I note that even NEURON can be installed using pip now---they've managed to build wheels sometime ago and these do provide the various commands/libraries and interfaces (but these won't follow our compiler flags and guidelines, of course). Other simulation engines, and non-python tools like NEST/Arbor/STEPS are not, though, and so they present a use case where it'll be beneficial for our users for us to make them available via dnf.

An edge case is fsleyes: fsleyes is also pip installable but requires gtk3-devel to build wxpython as part of the install process (wxpython don't provide wheels for Linux from the looks of it)

  Building wheel for wxpython (pyproject.toml) ... error
  error: subprocess-exited-with-error
...
  checking for GTK+ - version >= 3.0.0... no

So, we could say "we should package fsleyes", or we could say "we should clearly document how to install and run fsleyes on Fedora in docs". (or are there other solutions between these extremes?).

music commented 2 months ago

Looking at the particular example of spec2nii, I just noticed that it is also available on PyPi, and one can install it (and deps) using pip:

[…]

So, is there a case for keeping it iinstallable via dnf, or is this another case where we should ask users to install it off PyPi?

Right, and to add complexity to that particular case, the original goal of packaging it was as a dependency for bidscoin, which now has most of its dependencies packaged but is still on the NeuroFedora wish list. But bidscoin itself is on PyPI, and it’s perfectly possible to use it directly from PyPI:

$ uvx bidscoin --help
[…]
usage: bidscoin [-h] [-l] [-p] [-i NAME [NAME ...]] [-u NAME [NAME ...]] [-d FOLDER] [-t [TEMPLATE]]
                [-b BIDSMAP] [-c OPTIONS [OPTIONS ...]] [-r] [--tracking {yes,no,show}] [-v]
[…]
$ uvx --from bidscoin bidsmapper --help
usage: bidsmapper [-h] [-b NAME] [-t NAME] [-p NAME [NAME ...]] [-n PREFIX] [-m PREFIX] [-u PATTERN]
                  [-s] [-a] [-f] [--no-update]
                  sourcefolder bidsfolder
[…]
$ pipx install bidscoin
[…]
$ bidsmapper --help
usage: bidsmapper [-h] [-b NAME] [-t NAME] [-p NAME [NAME ...]] [-n PREFIX] [-m PREFIX] [-u PATTERN]
                  [-s] [-a] [-f] [--no-update]
                  sourcefolder bidsfolder
[…]

There’s a lot of nuance here, and a lot of room for positions between minimalist (package nothing that is on PyPI unless absolutely forced to) and maximalist (package everything the light touches).

In general, I personally find some value in having command-line tools packaged so that people can use them without caring what language they are written in, and without having to use and understand language-specific tools and repositories. However, distribution packaging is certainly much more valuable for software that is difficult to install, with things like system-wide configuration files, daemons, and awkward dependencies, and for software that is not available directly from language-specific indexes like PyPI.

ankursinha commented 2 months ago

I guess the goal for us, for example, in this case is:

"Ensure that bidscoin/spec2nii/fsleyes/... can be used easily on Fedora (current Fedora releases)".

This can be achieved in multiple ways:

package it all up (current strategy): packaging gives us greater control---we run tests etc. and that gives us confidence that it all works
install all of this stuff from pypi, test that it installs, and runs

We know how do the first one well, since that's how we do it now.

The second moves us more towards testing than maintaining. We know how to do it manually---install, run the command lines, perhaps run import checks for libraries (run unit tests?)---but I reckon it's possible to come up with a pipeline to automate this process to scale it to a large number of packages (at least Python to begin with) and generate a searchable list that can be published on our docs. (if it works, we could even do this for non neuro-sig packages as a Fedora wide thing?)

Here's an incomplete list of python packages that we maintain. It's incomplete because it's only checking for packages named python-*, so it'll miss "applications" like NEURON, but it should include all libraries:

https://pagure.io/neuro-sig/NeuroFedora/blob/main/f/package-list/python-list.txt

I wonder if a middle ground would work (leaning slightly more towards "don't package unless necessary). Something like this:

if a package is not installable from a software ecosystem forge, like pypi, it is worth packaging. Eg: NEST/Arbor/other non-python software that needs to be built from source
if a package is directly installable from a forge, consider if packaging it is necessary (what is the use case for a user preferring/requiring a dnf installed version of this software?)
- instead consider testing it to ensure that it is installable/usable on Fedora + documenting this in the neuro-sig documentation
- avoid packaging libraries, unless they are dependencies of other software that should be packaged

Whatever we come up with, they will not be rules. We'll leave it to individuals to decide where on the spectrum of "package everything ---- don't package anything that's on PyPi) they prefer to be for each package.

I think I'd personally lean more towards packaging less, and focussing on packages that do need to be dnf installable---primarily because I'm struggling with time, and I'm not necessarily seeing advantages of the "package everything" approach---at least not anecdotally.

I'd love to hear from more packagers here. Is it worth putting this on the python-sig list for feedback too?

ankursinha commented 2 months ago

I spoke to more folks here at the packaging workshop. Fabio pointed out that one important purpose of distribution packages is that it enables us maintainers to push updates to users in a timely manner (ideally/theoretically). When installing from forges directly, there doesn't seem to be a way of users being notified if packages that they're using have new updates. (Users have to do it themselves. For example, for Python, one needs to go: pip list --outdated)

This certainly applies to lower level libraries such as ITK etc. (in addition to languages where everything is always statically compiled---rust (which is where Fabio was coming from) and golang), where we will continue to provide system packages anyway. I'm uncertain if it applies to high level Python libraries. With them, in the general case, users may not want these libraries to be automatically updated to begin with.

ankursinha commented 2 months ago

Bugs from the Python mass-rebuild have just been filed. I'll wait a few more days for feedback here before working on them.

@gui1ty : not seen you around the past few days, but would love to hear what you think, since you help with so many of the packages.

ankursinha commented 2 months ago

Hrm, i haven't seen @gui1ty around in the past couple of weeks. We can wait a few more days for more input.

Here's what I have in mind:

make a list of all python packages of interest to us
write scripts to:
- test whether they install on Fedora releases via pip
- check if they function correctly (import tests, unit tests: need to think of how this will be done)
- generate a status table that can be added to our docs
- include any additional steps required to use these packages in the table (eg: some packages need wxpython that can't be installed via pip)
orphan these packages

We'll hold on to packages that are not installable by pip.

I think one should be able to use tmt for this, and if these are found to be generally useful by the community, we can run these on the testing farm infrastructure too.

What do you think? I'll e-mail all sig members with these updates too, so everyone is aware and has a chance to chime in.

gui1ty commented 2 months ago

I don't think I have a strong opinion on the matter either way. As I'm not in the field of science, I have no use case myself for the software we provide as part of Neuro Fedora, nor do I know how others approach installing (Python) software in scientific environments. Let me provide some general feedback on the concerns raised.

Number of packages maintained by SIG

I think it makes sense reducing the number of packages maintained by the SIG. With only three active members in the SIG currently trying to stay on top of 500+ packages can prove challenging.

What to package

I agree that we should prioritize packages that are a direct dependency of some other package which cannot be directly installed from an upstream index. Maybe packages providing common command line tools might also be worth considering.

As to the mix of packages we currently have, I believe a good chunk is test only dependencies for other packages.

Having a good guideline of what we want to package will be helpful.

Instructions for local installation

Maybe we should decide on a specific tool (pipx, uv) that we recommend and write instructions for when it comes to installing packages / libraries from upstream repos locally. Much like Python packaging tools, this will likely boil down to personal preferences. However, I think supporting (and understanding) one tool well is better than supporting many tools half-heartedly.

Testing and updating

I'm not sure how I feel about manually testing packages. As part of packaging Python software for Fedora it certainly made sense to put some effort into running tests and figuring out and reporting on issues with tests. It is our responsibility as packagers to make sure the RPM packages we ship are usable and compatible with other RPM packages. When it comes to local installation my first instinct is that the onus is with the user.

A good compromise might be providing information and recommendations in the documentation regarding running tests. From my experience the degree of difficulty may vary greatly depending on upstream documentation as well as packaging quality. I'm thinking of a clear distinction between runtime and test dependencies as well as linters.

Feedback

In addition to gathering feedback from the wider Python packaging community in Fedora it may also be worth reaching out to other distros. For example, Debian has a neuro sciences SIG as well. I would be interested to learn what their approach / motivation is as to what to package and what not.

ankursinha commented 2 months ago

Hi @gui1ty , thanks for your comment.

I agree with your notes. How does this sound for a general SIG packaging guideline:

Prioritise software that cannot be installed from an upstream index

Apart from this, I think we can leave individual packages up to individual maintainers. For example, if one uses a tool regularly and thinks there's value in having it as a system package, one can take it on. This ensures that we follow the general community guideline of packaging + maintaining what we use---which will also mean that we test out our packages properly and have a vested interest in maintaining them. If people think command line tools are worth packaging, by all means, they can maintain them---and the team will collectively help out as we do now.

I would like to test if packages installed from indexes also work on Fedora, but not manually. Only if I can figure out if this can be automated in some way will this be doable. Perhaps this can be a "nice to have" that we can think about in phase 2, while phase 1 can be:

Get a list of all our packages (done here already: https://pagure.io/neuro-sig/NeuroFedora/blob/main/f/package-list/list.txt
Automate testing installation of those that can be installed from packaging indexes
Document these
Orphan them and packages included only as their dependencies in Fedora
Retain packages that cannot be installed easily from packaging indexes

I'm personally not bothered if people use uv or pip---they both pick from upstream indexes and should do the same thing at the end based on the python packaging guidelines. The difference, from what I see, is the interface/features and speed/performance. We should be able to test/document both if we want, or we can test with pip as the "default" and let users do uv on their own. (I know @music maintains uv and the python and rust sigs are involved too, so it's in good hands :P)

How does that sound?

I can post on python-devel to announce/gather feedback on our intentions too, but at the end of the day, it's our decision, since we're going to have to do the work :)

NeuroDebian: my understanding has always been that neurodebian is more neuro-imaging focussed. For example, they don't seem to have any packages related to computational modelling at all. Their package list is also relatively small compared to ours:

http://neuro.debian.net/pkglists/toc_all_pkgs.html#toc-all-pkgs

ankursinha commented 2 months ago

Post to python-devel: https://lists.fedoraproject.org/archives/list/python-devel@lists.fedoraproject.org/message/L6M4ISCAGCLGPWIGN3PLDG7YUIFUIYC6/

gui1ty commented 2 months ago

I think the big picture is clear and I'm okay with your proposal. Having had some time to ponder, I do have some practical questions. I'll jot 'em down for discussion later.

Regarding NeuroDebian: I knew it existed and I had looked into one or two of their packages at some stage for inspiration, patches or the like. I wasn't aware they are focusing on neuro imaging. I suppose their approach applies to us as well when it comes to the packages not directly installable from upstream indices. That is the trivial Python packages are required dependencies for some other non-trivial package.

ankursinha commented 2 months ago

Sounds good. I'll go make a list of our python packages so that we can see which ones can be dropped in favour of direct installation from pypi.

ankursinha commented 2 months ago

Based on this script (I think it should catch most cases?)

#!/bin/bash

# Copyright 2025 Ankur Sinha
# Author: Ankur Sinha <sanjay DOT ankur AT gmail DOT com> 
# File : get-python-packages.sh
#


# empty out files
echo > python-devel-reqs.txt
echo > python-devel-reqs-leaves.txt

while read package; do
    echo "** Checking ${package}"
    # dnf repoquery is incomplete because sometimes the subpackage requires python3-devel..
    re="python"
    if [[ ${package} =~ $re ]] || dnf repoquery --requires "${package}" --srpm --quiet | grep "python3-devel"
    then
        echo "Requires python"
        echo "${package}" >> python-devel-reqs.txt

        echo "Checking for leaf"
        DEPS=( $(fedrq whatrequires-src -X "${package}") )
        if [ 0 == ${#DEPS[@]} ]
        then
            echo "Yep, is leaf!"
            echo "${package}" >> python-devel-reqs-leaves.txt
        fi
    else
        echo "Does not require python"
    fi
done < list.txt

We get the following list of python packages that are leaf packages:

COPASI
cffconvert
dlib
dolfin
getdp
moose
openmeeg
pydeps
pyplane
python-PyLEMS
python-PyLink
python-SALib
python-airspeed
python-amply
python-annarchy
python-autograd
python-bioframe
python-bioread
python-bluepyopt
python-chaospy
python-cro
python-cyipopt
python-dandischema
python-datrie
python-devicely
python-editdistance
python-elephant
python-ephyviewer
python-exdir
python-fsleyes
python-git-changelog
python-glfw
python-glymur
python-gradunwarp
python-grip
python-hdf5storage
python-hdfs
python-imbalanced-learn
python-intern
python-irodsclient
python-klusta
python-lazy-ops
python-lqrt
python-matplotlib-venn
python-maya
python-missingno
python-mne-bids
python-moss
python-multiecho
python-neatdend
python-netpyne
python-neurodsp
python-neurom
python-neurotune
python-niaclass
python-nipype
python-nixio
python-odml
python-openctm
python-outdated
python-palettable
python-pingouin
python-plotnine
python-probeinterface
python-pyABF
python-pyactivetwo
python-pycatch22
python-pydapsys
python-pydotplus
python-pyfim
python-pygiftiio
python-pylatex
python-pynetdicom
python-pynn
python-pynsgr
python-pyopengltk
python-pyphi
python-pyriemann
python-pysb
python-pysdl2
python-pyspike
python-pyswarms
python-pytest-lazy-fixture
python-pyunicorn
python-pyvhacd
python-pyxdf
python-pyxid
python-ratelimiter
python-ratinabox
python-read-roi
python-resumable-urlretrieve
python-scipy-doctest
python-sciunit
python-simframe
python-sklearn-genetic
python-sklearn-genetic-opt
python-sklearn-nature-inspired-algorithms
python-snakemake-executor-plugin-azure-batch
python-snakemake-executor-plugin-flux
python-snakemake-executor-plugin-kubernetes
python-snakemake-executor-plugin-slurm
python-snakemake-executor-plugin-tes
python-snakemake-storage-plugin-azure
python-snakemake-storage-plugin-ftp
python-snakemake-storage-plugin-gcs
python-snakemake-storage-plugin-webdav
python-snakemake-storage-plugin-xrootd
python-snakemake-storage-plugin-zenodo
python-spyking-circus
python-steps
python-stopit
python-toposort
python-tvb-data
python-tvb-gdist
python-vascpy
shybrid
smoldyn
spec2nii

music commented 2 months ago

I attempted a categorization or initial triage of the above list.

The following are not really Python packages, but (mostly C++) packages that offer Python bindings. They should not be considered under this proposed realignment.

COPASI
dlib
dolfin
getdp
smoldyn

The following are no longer leaf packages in Rawhide and should be retained:

python-scipy-doctest

The following are desktop/GUI tools, which may argue in favor of retaining them:

pyplane: @ankursinha
python-ephyviewer: @gui1ty
python-fsleyes: @ankursinha
python-spyking-circus (circus-artefacts, circus-folders, circus-gui-matlab, circus-gui-python, circus-multi, spyking-circus, spyking-circus-launcher, spyking-circus-subtask): @ankursinha
shybrid: @music

The following are Python command-line tools can be used effectively from PyPI via something like uvx, pipx, or manual pip/uv installation into a temporary virtualenv. They might be candidates for dropping/orphaning under this realignment, depending on their primary maintainers’ wishes. If there are packages where we don’t want to dedicate NeuroFedora resources but the primary maintainer still wants the package, we can ask for the neuro-sig group to be removed from the package.

cffconvert: @iztokf
pydeps: @lbazan
spec2nii: @music : would also be required if we finish packaging bidscoin, https://pagure.io/neuro-sig/NeuroFedora/issue/500

The following are Python libraries that also offer command-line tools, and the command-line tools appear to be usable from PyPI via something like uvx, pipx, or manual pip/uv installation into a temporary virtualenv. They might be candidates for dropping/orphaning under this realignment, depending on their primary maintainers’ wishes. If there are packages where we don’t want to dedicate NeuroFedora resources but the primary maintainer still wants the package, we can ask for the neuro-sig group to be removed from the package. I didn’t necessarily check whether enabling extras works well (e.g. uvx --from mne-bids[full] mne_bids or pipx run mne-bids[full]), and I didn’t try anything more than printing the usage message. The details of these should probably be considered on a case-by-case basis.

python-PyLEMS (pylems) @ankursinha
python-SALib (salib) @ankursinha
python-bioread (acq2hdf5, acq2mat, acq2txt, acq_info, acq_layout, acq_markers) @music
python-git-changelog (git-changelog) @topazus
python-gradunwarp (gradient_unwarp: when run from PyPI, gradient_unwarp.py) @ankursinha
python-grip (grip) @gui1ty
python-mne-bids (mne_bids) @ankursinha
python-multiecho (mecombine) @music : would also be required if we finish packaging bidscoin, https://pagure.io/neuro-sig/NeuroFedora/issue/500
python-nipype (nipypecli) @ankursinha
python-pynetdicom (echoscp, echoscu, findscu, getscu, movescu, qrscp, storescp, storescu) @alciregi
python-pynsgr (nsgr_job, nsgr_submit) @ankursinha
python-pysb (pysb_export) @zbyszek
python-sciunit (sciunit) @ankursinha
python-vascpy (vascpy) @gui1ty

The following are Python libraries that also offer command-line tools, but there may be some difficulties in using them from PyPI via something like uvx, pipx, or manual pip/uv installation into a temporary virtualenv. At least on Python 3.13, the tools might not work as expected or might fail due to obsolete imports, or there might be PyPI dependencies that do not have binary wheels and have to be compiled at install time. These probably merit closer consideration. In some cases the difficulties may indicate that these should be kept, because they are useful but difficult to install and run, and in other cases the difficulties may suggest that it might be time to drop a package because it’s not adequately maintained upstream.

python-bluepyopt (bpopt_tasksdb) @ankursinha
python-glymur (jp2dump, jpeg2jp2, tiff2jp2) @music
python-hdfs (hdfscli, hdfscli-avro) @ankursinha
python-klusta (klusta) @ankursinha
python-moss (check_mni_reg, recon_status, recon_movie, recon_process_stats, ts_movie, warp_qc) @ankursinha This package last released upstream in 2017, and the spec file needs significant attention if we want to keep it in Fedora.
python-neatdend (compilechannels) @ankursinha
python-nixio (nixio) @ankursinha
python-pyxdf (pure Python), @music : This only “kind of” offers a CLI: python3-pyxdf-examples includes a couple of tools that can be run with e.g. python3 -m pyxdf.cli.print_metadata, and this usage is documented in the readme. I have included it in this category because these can be run from a virtualenv, but aren’t a good fit for pipx/uvx.

The following is a Python library that also includes a Jupyter Notebook extension that is installed system-wide.

python-exdir @ankursinha

The following are Python libraries that don’t offer command-line tools and are leaf packages (verified in F42 to make sure we don’t miss dependencies broken in the Python 3.14 transition). If they are not part of a packaging project we are still working on, they are particularly strong candidates for dropping under this realignment.

python-PyLink (pure-Python), @ankursinha
python-airspeed (pure-Python), @ankursinha
python-annarchy (C++/Cython), @gui1ty
python-autograd (pure Python), @gui1ty
python-bioframe (pure Python), @gui1ty
python-chaospy (pure Python), @lbazan
python-cro (pure Python), @iztokf
python-cyipopt (C++/Cython), @music
python-dandischema (pure Python), @mithunveluri
python-elephant (includes C++), @ankursinha
python-glfw (pure Python + ctypes), @gui1ty
python-hdf5storage (pure Python), @ankursinha
python-imbalanced-learn (pure Python), @iztokf
python-intern (pure Python), @music ; originally packaged for https://pagure.io/neuro-sig/NeuroFedora/issue/542
python-lazy-ops (pure Python), @ankursinha
python-lqrt (pure Python), @ankursinha
python-missingno (pure Python), @lbazan
python-netpyne (includes C++), @ankursinha
python-neurotune (pure Python), @ankursinha
python-niaclass (pure Python), @iztokf
python-outdated (pure Python), @ankursinha
python-palettable (pure Python), @gui1ty
python-pingouin (pure Python), @ankursinha
python-plotnine (pure Python), @gui1ty
python-probeinterface (pure Python), @music
python-pyABF (pure Python), @gui1ty
python-pyactivetwo (pure Python), @ankursinha
python-pycatch22 (includes C), @music
python-pydapsys (pure Python), @gui1ty
python-pydotplus (pure Python), @ankursinha
python-pyfim (Cython), @ankursinha
python-pygiftiio (pure Python), @ankursinha
python-pylatex (pure Python), @ankursinha
python-pynn (includes C++ and MPI), @ankursinha
python-pyopengltk (pure Python with ctypes), @gui1ty
python-pyriemann (pure Python), @ankursinha
python-pysdl2 (pure Python with ctypes), @gui1ty
python-pyspike (Cython), @ankursinha
python-pytest-lazy-fixture (pure Python), @ankursinha
python-pyunicorn (Cython), @ankursinha
python-pyxid (pure Python), @ankursinha
python-ratinabox (pure Python), @music ; originally packaged for https://pagure.io/neuro-sig/NeuroFedora/issue/527
python-read-roi (pure Python), @ankursinha
python-resumable-urlretrieve (pure Python), @lbazan
python-simframe (pure Python), @iztokf
python-sklearn-genetic (pure Python), @iztokf
python-sklearn-genetic-opt (pure Python), @gui1ty
python-sklearn-nature-inspired-algorithms (pure Python), @iztokf @ankursinha
python-steps (C++/Cython), @ankursinha
python-tvb-data (pure Python), @ankursinha
python-tvb-gdist (C++/Cython), @ankursinha

The following packages were already at least orphaned, and in most cases retired, for F43 or earlier.

python-amply
python-datrie
python-devicely
python-editdistance
python-matplotlib-venn
python-maya
python-neurodsp
python-odml
python-openctm
python-pyphi
python-pyswarms
python-pyvhacd
python-ratelimiter
python-stopit
python-toposort

The following packages are Snakemake plugins, or were packaged to support a Snakemake plugin. They should be retained, assuming we want to keep Snakemake.

python-irodsclient @music : packaged as a dependency for a future python-snakemake-storage-plugin-irods
python-snakemake-executor-plugin-azure-batch
python-snakemake-executor-plugin-flux
python-snakemake-executor-plugin-kubernetes
python-snakemake-executor-plugin-slurm
python-snakemake-executor-plugin-tes
python-snakemake-storage-plugin-azure
python-snakemake-storage-plugin-ftp
python-snakemake-storage-plugin-gcs
python-snakemake-storage-plugin-webdav
python-snakemake-storage-plugin-xrootd
python-snakemake-storage-plugin-zenodo

music commented 2 months ago

Based on that analysis, the packages that I personally should consider dropping in the short term are python-cyipopt, python-intern, python-probeinterface, python-pycatch22, python-ratinabox, and python-pyxdf.

I’ll probably keep these for now:

python-glymur’s CLI tools still seem useful and it doesn’t install well from PyPI
spec2nii and python-multiecho stay for now because of the possible bidscoin package
python-bioread could probably go, but it’s not hard to maintain, and it’s not bringing in any weird dependencies that we otherwise wouldn’t need
shybrid installs OK, but is a GUI app, and it’s nice to at least have a real .desktop launcher; it also hasn’t been much trouble and doesn’t keep weird dependencies around

gui1ty commented 2 months ago

Thanks for the overview, @music. I do have a question regarding the Python libraries list: What's your definition of pure Python?

I thought pure Python referred to packages only using libraries that are shipped with Python itself. There are packages on the list depending on third party libraries/modules. Hence my question.

Some other questions I wanted to ask:

How are we going to approach venvs?

Will it be one venv per package? Or will it be one big venv for all things neuro science? I suppose the latter is not feasible, because of version conflicts in dependencies since we are no longer or less in control of upstream version pinning. Having one venv per package undoubtedly will increase duplication of packages as the number of installed packages increases. I'm thinking of very common large(r) dependencies like NumPy, SciPy, Pandas, Matplotlib, etc.

What Python version to use for testing?

Currently the Fedora release dictates what Python version all Python modules are built against. Thanks to the work of the Python SIG and Python package maintainers, this closely follows upstream's release cycle. However, upstream maintainers are not always interested in the most recent Python releases and are sometimes slow to adapt to changes introduced in recent releases. Are we going to follow the bleeding edge here or will we stick to the oldest supported version? By that I mean the default Python version in the oldest still supported Fedora release. Currently that would be Python 3.13. Or do we follow the default that is shipped with a particular Fedora release? Tht would mean we'd have to test on multiple versions.

Earlier today I looked into an issue with the latest release of mizani. I seized the opportunity to install it locally in a Python 3.14 venv. Simply running pip install . didn't work since mizani depends on SciPy and there appears to be no binary package (yet?). Thus I couldn't use --prefer-binary. Installing as a dependency of mizani failed due to OpenBLAS not being found. In Fedora SciPy is built with FlexiBLAS. To make this work, flexiblas-devel needs to be installed and two options need to be passed on to the build backend (-Csetup-args=-Dblas=flexiblas -Csetup-args=-Dlapack=flexiblas) and SciPy needs to be installed separately in the venv.

Would this be part of documenting how to install mizani? Or would this provide justification to keep mizani as a system installable RPM package? In the former case, how do we ensure that we catch all required dependencies? I tripped over OpenBLAS/FlexiBLAS, but there are other dependencies which I just happened to have installed?

music commented 2 months ago

Thanks for the overview, @music. I do have a question regarding the Python libraries list: What's your definition of pure Python?

I thought pure Python referred to packages only using libraries that are shipped with Python itself. There are packages on the list depending on third party libraries/modules. Hence my question.

By pure Python, I mean (and I understand this to be the usual meaning) that a package is written solely in Python, without including anything in other languages (C, C++, Cython), particularly those that would be compiled into a shared-library extension module. I would tend to exclude those that wrap shared libraries with ctypes due to the tight coupling with a dependency outside the Python ecosystem, but this is a marginal case. Pure Python packages may certainly have dependencies on other packages that do contain compiled code, like pandas or numpy.

Some other questions I wanted to ask:

How are we going to approach venvs?

Will it be one venv per package? Or will it be one big venv for all things neuro science?

We can’t and won’t ship venvs. My assumption is that we would test individual tools in their own venvs. If we wanted to make everything work together in one environment, then we might as well just package everything. At least then we would actually control it.

I suppose the latter is not feasible, because of version conflicts in dependencies since we are no longer or less in control of upstream version pinning. Having one venv per package undoubtedly will increase duplication of packages as the number of installed packages increases. I'm thinking of very common large(r) dependencies like NumPy, SciPy, Pandas, Matplotlib, etc.

The idea as I understand it is not that we ship venvs with bundled dependencies (which is problematic in many ways from a guidelines perspective, plus it runs into the technical barrier that virtualenvs are not relocatable), but that we would stop packaging some things and just tell people to install some things directly from PyPI themselves, perhaps using one of the tools like pipx or uv run/uvx that can help manage tools installed this way.

What Python version to use for testing?

Currently the Fedora release dictates what Python version all Python modules are built against. Thanks to the work of the Python SIG and Python package maintainers, this closely follows upstream's release cycle. However, upstream maintainers are not always interested in the most recent Python releases and are sometimes slow to adapt to changes introduced in recent releases. Are we going to follow the bleeding edge here or will we stick to the oldest supported version? By that I mean the default Python version in the oldest still supported Fedora release. Currently that would be Python 3.13. Or do we follow the default that is shipped with a particular Fedora release? Tht would mean we'd have to test on multiple versions.

Good question!

ankursinha commented 2 months ago

Thanks for that @music. I've started a shared doc here now, since that'll be a little bit easier than comments on pagure:

https://hackmd.io/@sanjayankur31/HJ8V-_-Qee

Please edit it as required

Venvs: yes, we won't ship these---I didn't think we could. The idea is to test that packages can be installed on Fedora installations, ideally in virtual envs. (I know some folks that don't use venvs and just install everything into user directories, but as we know that this is bad practice, we will not recommend this)

What python version(s) to use for testing is a good question indeed. The default answer could be "whatever versions are supported on Fedora" as we do follow upstream Python quite closely---the only caveat being that we include the newest version of Python before it's released.

https://devguide.python.org/versions/

From what I see, most researchers aren't really too quick to jump to the latest python because it generally takes the various packages some time to catch up. If we want to continue testing with the latest python to help/inform upstreams, we can do so, but we can assume that most of our users will not be on the latest Python (rather, they'd be on a "stable" python).

How about perhaps (and this can be tweaked as we go): "default python version in Fedora rawhide and the previous 4 releases"? For example, we're at Python 3.14 in Fedora (even though it hasn't been released), so we test on 3.10--3.14? (or would that be too much?))

@music: did you write any scripts etc. for your thorough analysis, or was it manually for the moment?

Action items for me:

I'm going to tinker with some scripting/automation using tmt to see if I can hack up a pipeline for testing package installation from pypi
I'm going to go through my packages in the "to drop" list
I'll play with setting up a table in our documentation (hopefully searchable) where we can start listing/noting information on our packages.

How does that sound?

music commented 2 months ago

@music: did you write any scripts etc. for your thorough analysis, or was it manually for the moment?

It was all manual skimming of spec files and in some cases pyproject.toml or other source files, and manual testing of command-line tools with uvx. Much of the initial categorization could have perhaps been scripted, but there were enough different things I needed to consider that I really needed to actually look at the packages this time around.

gui1ty commented 2 months ago

I didn't mean to suggest that we are going to ship venvs. I am probably overthinking this. But in absence of any guidelines questions popped up in my head when I was going to setup a new venv for testing some stuff locally. I agree, trying to force everything into one venv is gonna give us headaches of the kind we experienced when packaging for RPM. Though, in some cases, closely related software may benefit from sharing a venv. Let's cross that bridge when we get there.

With regards to Python version, if we want to allow for mixing system packages with locally installed packages, we'll have to stick with the default Python release for any particular release. @music already alluded to that on Matrix with regards to python-steps.

My question regarding mizani remains unanswered, though.

I'll go through my packages and check on what qualifies for retirement. Some leaf packages of mine may have been in preparation for other packages. I'll have to jog my memory on that.

ankursinha commented 2 months ago

I've made a start on using tmt here:

https://pagure.io/neuro-sig/NeuroFedora/blob/feat-python-package-checks/f/python-package-usage-check

You should be able to test this out by installing tmt:

sudo dnf install tmt tmt+provision-container

and then running tmt run in the python-package-usage-check folder. There are tests for neuron and pyneuroml for the moment.

tmt is a little different from things like github actions. Here, one can't really define a matrix of environments to run all the tests in (or I haven't been able to figure out how to do it). It's more hierarchical and file/folder based, so a different folder is used for each python version etc.

I also haven't figured out if tests run (or can run) in parallel. Nothing in the docs about it. I've been asking in the matrix channel, but perhaps I'll also ask on the github discussion to make sure I'm doing this the right way.

I'm also going to see how this can be done using python/pytest with something like this virtualenv plugin for pytest in combination with parameterization:

https://pypi.org/project/pytest-virtualenv/
https://docs.pytest.org/en/7.1.x/example/parametrize.html

We know from experience that pytests does parameterisation + parallelisation very well.

Re: venvs:

Yeh, it' is unlikely that users will install all the packages into one giant venv. In most cases, people will only use a few packages---one for modelling, another for data analysis. Since upstream developers use similar pipelines, they do check that there are no conflicts when it comes to packages that are commonly used together. We do this for the neuroml stack, for example:

https://github.com/NeuroML/pyNeuroML/actions/runs/15591468055/job/43911369538

Re: steps

I see that they use scikit_build as the build-backend, so one should be able to go pip install . in a virtual environment, right? I.e., it should still be installable in a virtual environment? If that's the case, our job becomes to document the non-python system-wide packages required to build steps in that way (cmake etc.?). These non-python shared libraries/tools are still available in virtual environments, and presumably, the python deps will be pulled in from pypi?

Have I got that right?

Re: mizani

For me, the fact that it does not install cleanly on py3.14 is not a good enough reason to keep it as a system package on its own---because I can't see any of our users on py3.14 yet, even if it is the default for Fedora rawhide. If other reasons apply---provides CLI or a GUI---then it's worth keeping (but we leave that decision up to the primary maintainer).

I would be happy enough to have a note in our table in docs under the py3.14 column for mizani that says: "cannot currently be installed from pypi as scipy is not available as a wheel", and reporting this upstream so that they are aware. The assumption is that scipy will make a release for 3.14 once 3.14 is released: https://docs.scipy.org/doc/scipy/dev/toolchain.html#python-versions

This will serve the purpose of us making users and upstream aware that this currently does not install on the latest python version but doesn't require us to jump through the various hoops to maintain it as a system package.

What do you think?

ankursinha commented 2 months ago

Here is a pytest based checker:

https://pagure.io/neuro-sig/NeuroFedora/pull-request/581

Much simpler than the tmt bits. If you create a venv on a Fedora machine (for whatever python you want to use) and then install the requirements, running pytest -n auto -v should check for the example packages in the json file.

This can be tweaked to improve the workflow, but this prototype already works. (We should probably have separate files for each package perhaps, so that one file doesn't get super long etc.)

What do you think?

ankursinha commented 2 months ago

Initial PR for docs (more tweaks needed there to the text, but this starts by putting a searchable table in)

https://pagure.io/neuro-sig/documentation/pull-request/26

I've only used pip install ... for the moment, but we can modify this to whatever we wish

ankursinha commented a month ago

I'm going to merge the pytest PR into our repo now, and we can continue dropping packages as we see fit. Please remember to:

add packages that you drop to the test framework
add them to the docs (I've merged this PR here to start with the "reorganisation": https://pagure.io/neuro-sig/documentation/pull-request/26 )

ankursinha commented a month ago

Reports will be here:

https://ankursinha.fedorapeople.org/neurofedora/package-status/

Metadata

Assignee

None

Tags

Blocking

None

Depending on

None

Priority

Normal

Milestone

None

neuro-sig / NeuroFedora

Source Code

#580 Limiting/reducing/rethinking our Python packaging/packages Opened 2 months ago by ankursinha. Modified a month ago

Close issue as:

How are we going to approach venvs?

What Python version to use for testing?

How are we going to approach venvs?

What Python version to use for testing?

Metadata

T: Community T: Software S: Needs review

#580 Limiting/reducing/rethinking our Python packaging/packages

Opened 2 months ago by ankursinha. Modified a month ago