#11821 rhel7 - sundries servers
Closed: Fixed with Explanation 17 days ago by kevin. Opened 4 months ago by zlopez.

Describe what you would like us to do:


RHEL 7 EOL is approaching (June 2024) and we still have some servers running on RHEL7 machines.

sundries01.iad2.fedoraproject.org, sundries01.stg.iad2.fedoraproject.org and sundries02.iad2.fedoraproject.org are among these servers, and we need to do something about them.

When do you need this to be done by? (YYYY/MM/DD)


June 2024


Metadata Update from @kevin:
- Issue assigned to kevin

4 months ago

I plan to look at these very soon. (Help welcome!)

We need to see everything they are still doing and decide whether we should just move them to rhel9 or try to rework things so they can be dropped.

I know they are used to sync a lot of things to proxies. They are also used, I think, for geoip requests.

So, I think a good step here might be to create a sundries02.stg that is rhel9 and work through deploying it and see what breaks/needs adjustment.

We can then look at it to test whether things work.

Then we can redeploy sundries01.stg as rhel9, make sure all the things that use that main node work.

Then we can do production.

Metadata Update from @zlopez:
- Issue assigned to zlopez (was: kevin)

3 months ago

I will start working on that.

Running the playbook on sundries02.stg, it failed on the zanata role (missing package). Do we still need zanata? I thought it had been dead for some time.

I also noticed that the geoip-city-wsgi/app role is limited to RHEL7. Not sure if we need it.

Yeah, I don't think we use zanata anymore for anything. We use weblate now...

On geoip, I thought it was still used by anaconda to figure out timezone/locations, but that may no longer be the case.
We will need to investigate.

Thanks @kevin, I will ask anaconda about that.

I got confirmation that they are still using it, so let's see if the role will work on RHEL 9.

So for the geoip-city-wsgi/app role, the only missing package for EPEL9 is python-iso3166. I created an EPEL9 request and will wait for the maintainer to respond.

Just to test it out I tried to build the rawhide spec in EPEL9 and it works without issue.

Currently waiting for EPEL9 package update to be pushed to stable.

After python-iso3166 was pushed to EPEL9, the geoip-city-wsgi/app role deployed without issue.

However, I got stuck on the translate-toolkit package for the fedora-docs role. I tried to build it in COPR, but I ended up with tests failing on python-scikit-build. The build log can be found here.

I don't currently have much time to work on this, so if anybody wants to take over, feel free to do so.

The logs seem to be missing now... perhaps the builds were too long ago? Can you refire it?

@misc you added this package with a commit of "for pocount". Is this really needed? Or could we just move forward without it for now and circle back to get it later?

@kevin Strange, the log is still in place for me; here is the direct link.

So I had some time today to continue with the copr project and I was able to get past python-scikit-build by skipping the Fortran compile test.

Now I'm stuck on python-rapidfuzz, which complains about:

error: Empty %files file /builddir/build/BUILD/rapidfuzz-3.5.2/debugsourcefiles.list

Not sure why the file is empty.

Looking into the build machine, I can see that debugfiles.list is filled in, but debugsourcefiles.list is really empty :-/

This is very likely because it's not building with -g (debug symbols). I see it also failing to extract debuginfo...

@kevin After some digging I found out that the CXXFLAGS were missing completely. After adding them I was able to build it.

I was able to build translate-toolkit in COPR for EPEL 9. Here are all the packages that need to be rebuilt in epel9-infra tag to get it running. I needed to fork some of them and do small changes to get them working on EPEL 9.

I will start building them in the epel9-infra tag next week, as I will be at DevConf.CZ for the rest of this week. Hopefully this will be the last obstacle to deploying the sundries playbook on RHEL9.

After starting to build packages in epel9-infra I got stuck on python-rapidfuzz, which has the build requirement python3-pandas or python3(x86-32). This was satisfied in COPR because it includes python3.i686 from the CodeReady Builder repository, but it fails in Fedora koji. I tried to build python3-pandas, but it has about 20 missing dependencies on EPEL9.

I also tried to just comment out the requirements, but it introduces plenty of failed tests (see https://kojipkgs.fedoraproject.org/work/tasks/473/119330473/build.log).

So I'm not sure whether we can just drop the translate-toolkit dependency from the playbook, or whether I should try to build python3-pandas in COPR and then rebuild it in the epel9-infra tag.

So, I think that requires was added when python-pandas dropped i686. According to the comment, I guess it was to keep python-rapidfuzz available on i686?

@churchyard can you recall what's going on there?

This has turned into quite the rabbit hole. ;(

python3-pandas or python3(x86-32) is actually a "clever" way to get pandas on anything but i686 while still having the dependency resolvable in the repo for repoquery. But it has downsides: when a multilib repo is enabled (e.g. in local mock or Copr), it fetches 32-bit Python.

Anyway, pandas is optional. Drop the line if you don't have it.
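
Dropping the optional requirement can be scripted when rebuilding a forked spec. This is a minimal sketch, not the change that was actually made: the function name drop_pandas_buildrequires is hypothetical, and the assumption that the requirement sits on a single BuildRequires: line mentioning python3-pandas is based only on the dependency string quoted above.

```shell
# Hypothetical helper: comment out any BuildRequires line that mentions
# python3-pandas in the given spec file. The line spelling is an assumption.
drop_pandas_buildrequires() {
    spec="$1"
    tmp="$(mktemp)" || return 1
    # Prefix matching lines with "# "; all other lines pass through unchanged.
    sed '/^BuildRequires:.*python3-pandas/ s/^/# /' "$spec" > "$tmp" &&
        cat "$tmp" > "$spec"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage (the file name is an example): drop_pandas_buildrequires python-rapidfuzz.spec
```

Commenting the line out rather than deleting it keeps the diff against the Fedora spec easy to read when the fork is later rebased.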

I also tried to just comment out the requirements, but it introduces plenty of failed tests (see https://kojipkgs.fedoraproject.org/work/tasks/473/119330473/build.log).

The failures are caused by this relevant line in the build.log:

  ==========================================================================
  WARNING: The C extension could not be compiled, speedups are not enabled.
  Plain-Python build succeeded.
  ==========================================================================

Which itself is caused by:

  CMake Error at /usr/share/cmake/Modules/CMakeTestCXXCompiler.cmake:60 (message):
    The C++ compiler
      "/usr/bin/c++"
    is not able to compile a simple test program.
    It fails with the following output:
      Change Dir: /builddir/build/BUILD/rapidfuzz-3.5.2/.pyproject-builddir/pip-req-build-_szb_tpu/_cmake_test_compile/build/CMakeFiles/CMakeScratch/TryCompile-UyMMRP
      Run Build Command(s):/usr/bin/cmake -E env VERBOSE=1 /usr/bin/gmake -f Makefile cmTC_11f0b/fast && /usr/bin/gmake  -f CMakeFiles/cmTC_11f0b.dir/build.make CMakeFiles/cmTC_11f0b.dir/build
      gmake[1]: Entering directory '/builddir/build/BUILD/rapidfuzz-3.5.2/.pyproject-builddir/pip-req-build-_szb_tpu/_cmake_test_compile/build/CMakeFiles/CMakeScratch/TryCompile-UyMMRP'
      Building CXX object CMakeFiles/cmTC_11f0b.dir/testCXXCompiler.cxx.o
      /usr/bin/c++   -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64 -march=x86-64-v2 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection  -o CMakeFiles/cmTC_11f0b.dir/testCXXCompiler.cxx.o -c /builddir/build/BUILD/rapidfuzz-3.5.2/.pyproject-builddir/pip-req-build-_szb_tpu/_cmake_test_compile/build/CMakeFiles/CMakeScratch/TryCompile-UyMMRP/testCXXCompiler.cxx
      Assembler messages:
      Error: unknown architecture `x86-64-v2'
      Error: unrecognized option -march=x86-64-v2
      cc1plus: error: unrecognized command-line option ‘-m64’
      cc1plus: error: unknown value ‘x86-64-v2’ for ‘-march’
      cc1plus: note: valid arguments are: armv8-a armv8.1-a armv8.2-a armv8.3-a armv8.4-a armv8.5-a armv8.6-a armv8-r native
      cc1plus: error: ‘-fcf-protection=full’ is not supported for this target
      gmake[1]: *** [CMakeFiles/cmTC_11f0b.dir/build.make:78: CMakeFiles/cmTC_11f0b.dir/testCXXCompiler.cxx.o] Error 1
      gmake[1]: Leaving directory '/builddir/build/BUILD/rapidfuzz-3.5.2/.pyproject-builddir/pip-req-build-_szb_tpu/_cmake_test_compile/build/CMakeFiles/CMakeScratch/TryCompile-UyMMRP'
      gmake: *** [Makefile:127: cmTC_11f0b/fast] Error 2

Which itself is probably caused by passing x86-specific options in CXXFLAGS on a noarch package built on aarch64.

Which is caused by having this in the spec:

%build
# Set the CXXFLAGS, those are not set on EPEL9
CXXFLAGS='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64 -march=x86-64-v2 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection'
export CXXFLAGS

%pyproject_wheel

Naughty, naughty...

Use a macro instead. %set_build_flags should be available in RHEL 9.

BTW the fact that %pyproject_wheel sets CFLAGS only and not CXXFLAGS is probably a bug. It only affects RHEL < 10 because Fedora has https://fedoraproject.org/wiki/Changes/SetBuildFlagsBuildCheck

The macro sets CFLAGS and LDFLAGS in a way that was copied from %py3_build which does it in a way that was copied from actual specfiles in the 2000s. If you open an RFE to pass CXXFLAGS as well, it will likely be fixed.
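
The failure mode in the log above (x86-only options reaching an aarch64 compiler) can be caught with a quick check before flags are exported by hand. This is a minimal sketch: has_x86_only_flags is a hypothetical helper, not part of any packaging tool, and its three patterns are just the options the aarch64 compiler rejected in that log, not an exhaustive list.

```shell
# Hypothetical helper: succeed if the flag string contains one of the
# options that the aarch64 compiler rejected in the build log.
has_x86_only_flags() {
    case " $1 " in
        *" -m64 "* | *" -march=x86-64"* | *" -fcf-protection"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: warn when such flags are exported on a builder that is not x86_64.
if [ "$(uname -m)" != "x86_64" ] && has_x86_only_flags "${CXXFLAGS:-}"; then
    echo "warning: CXXFLAGS contains x86-only options" >&2
fi
```

A check like this only flags the symptom; letting the distribution macros choose the flags per architecture avoids the problem in the first place.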

@churchyard Thanks for the help. I saw the issue with CXXFLAGS on COPR, so I assumed it was because of the EPEL9 build, or that it could be COPR-specific. Could you point me to where to open an RFE for that?

I will use the %set_build_flags macro for now.

I think I found the naughty hack somewhere in the packager docs, but I'm not sure and can't find it now.

You can either open a Fedora bugzilla for pyproject-rpm-macros saying that CXXFLAGS is not set when _auto_set_build_flags is disabled (which is the non-default option in Fedora, but it is still possible) or open a RHEL 9 Jira saying CXXFLAGS is not set by default. Either way, this will be fixed in Fedora first, RHEL 9 later.

I built translate-toolkit in the epel9-infra tag, but I'm unable to install it on the sundries02.stg server.

When trying to do so, I get:

[root@sundries02 ~][STG]$ dnf install translate-toolkit
Last metadata expiration check: 7:24:40 ago on Fri 21 Jun 2024 06:52:16 AM UTC.
No match for argument: translate-toolkit
Error: Unable to find a match: translate-toolkit

But when trying to search for it, it's available:

[zlopez@sundries02 ~][STG]$ dnf search translate-toolkit
Last metadata expiration check: 0:07:21 ago on Fri 21 Jun 2024 02:08:01 PM UTC.
================================================================== Name Exactly Matched: translate-toolkit ==================================================================
translate-toolkit.noarch : Tools to assist with translation and software localization
================================================================= Name & Summary Matched: translate-toolkit =================================================================
translate-toolkit-docs.noarch : Documentation for translate-toolkit

I'm kind of puzzled about what is happening here. Will look into that later.

Yeah, user and root caches for dnf are different. Your user one has it, but the root one doesn't.

I usually use --refresh with installs like this to force it to redownload metadata.

[root@sundries02 ~][STG]# dnf --refresh install translate-toolkit
Extras Packages for Enterprise Linux 9 - x86_64                87 kB/s | 4.3 kB     00:00    
Fedora Infrastructure tag 9 - x86_64                           49 kB/s | 3.0 kB     00:00    
Fedora Infrastructure tag 9 - x86_64                           53 kB/s | 3.0 kB     00:00    
Fedora Infrastructure tag 9 - x86_64                          234 kB/s |  41 kB     00:00    
rhel9 baseos dvd                                               57 kB/s | 2.7 kB     00:00    
rhel9 AppStream dvd                                            60 kB/s | 2.8 kB     00:00    
rhel9 BaseOS x86_64                                            89 kB/s | 4.1 kB     00:00    
rhel9 AppStream x86_64                                         98 kB/s | 4.5 kB     00:00    
rhel9 CodeReadyBuilder x86_64                                  99 kB/s | 4.5 kB     00:00    
Dependencies resolved.
==============================================================================================
 Package                      Arch     Version                Repository                 Size
==============================================================================================
Installing:
 translate-toolkit            noarch   3.12.2-2.el9           infrastructure-tags-stg   954 k
Installing dependencies:
 enchant                      x86_64   1:1.6.0-30.el9         rhel9-dvd-AppStream        66 k
 hunspell                     x86_64   1.7.0-11.el9           rhel9-dvd-AppStream       329 k
 hunspell-en-US               noarch   0.20140811.1-20.el9    rhel9-dvd-AppStream       178 k
 hunspell-filesystem          x86_64   1.7.0-11.el9           rhel9-dvd-AppStream       9.0 k
 iso-codes                    noarch   4.16.0-3.el9           infrastructure-tags-stg   3.4 M
 python3-Levenshtein          x86_64   0.23.0-5.el9           infrastructure-tags-stg   140 k
 python3-aeidon               noarch   1.15-1.el9             epel                      187 k
 python3-charset-normalizer   noarch   2.0.10-1.el9           epel                       72 k
 python3-cheroot              noarch   8.6.0-4.el9            epel                      172 k
 python3-enchant              noarch   3.2.0-5.el9            rhel9-dvd-AppStream        90 k
 python3-jaraco               noarch   8.2.1-3.el9            epel                       11 k
 python3-jaraco-functools     noarch   3.5.0-2.el9            epel                       19 k
 python3-more-itertools       noarch   8.12.0-2.el9           epel                       79 k
 python3-phply                noarch   1.2.5-9.el9            infrastructure-tags-stg   115 k
 python3-pycountry            noarch   23.12.7-2.el9          infrastructure-tags-stg    35 k
 python3-rapidfuzz            x86_64   3.5.2-9.el9            infrastructure-tags-stg   2.1 M
 python3-ruamel-yaml          x86_64   0.16.6-7.el9.1         rhel9-AppStream           213 k
 python3-ruamel-yaml-clib     x86_64   0.2.7-3.el9            rhel9-AppStream           148 k
 python3-simplejson           x86_64   3.17.6-1.el9           epel                      264 k
 python3-toml                 noarch   0.10.2-6.el9           rhel9-dvd-AppStream        46 k
 python3-vobject              noarch   0.9.6.1-5.el9          epel                       83 k
 xml-common                   noarch   0.6.3-58.el9           rhel9-dvd-AppStream        36 k
Installing weak dependencies:
 hunspell-en                  noarch   0.20140811.1-20.el9    rhel9-dvd-AppStream       191 k
 hunspell-en-GB               noarch   0.20140811.1-20.el9    rhel9-dvd-AppStream       226 k

Transaction Summary
==============================================================================================
Install  25 Packages

Total download size: 9.1 M
Installed size: 42 M
Is this ok [y/N]: n
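
The two "Last metadata expiration check" lines earlier in the thread already show the mismatch: root's cache was over seven hours old, while the user's was a few minutes old. A minimal sketch for turning that line into a number is below; metadata_age_seconds is a hypothetical helper, and it assumes the plain H:MM:SS form shown in this thread rather than every format dnf may print.

```shell
# Hypothetical helper: convert dnf's
#   "Last metadata expiration check: H:MM:SS ago on ..."
# line into an age in seconds. Only the H:MM:SS form is handled.
metadata_age_seconds() {
    printf '%s\n' "$1" | awk '
        /^Last metadata expiration check: / {
            split($5, t, ":")   # $5 is the H:MM:SS field
            print t[1] * 3600 + t[2] * 60 + t[3]
        }'
}

# Usage with the root line from this thread:
metadata_age_seconds "Last metadata expiration check: 7:24:40 ago on Fri 21 Jun 2024 06:52:16 AM UTC."
```

For the two lines quoted in this thread, this gives 26680 seconds for root and 441 seconds for the user, which is consistent with the package having landed in the repo between the two metadata downloads.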

So it's OK that the playbook is failing on the first run?

So after that, the next error happened in the [nfs/client] role:

"Error mounting /srv/docs: mount.nfs: access denied by server while mounting ntap-iad2-c02-fedora01-nfs01a:/openshift_stg_docs\n"

@kevin Does this need some special permissions to be mounted?

Yes, it needs to be in the netapp export-policy for that volume.

I added it and ran the playbook and now all the nfs mounts are there.

@kevin That is awesome, that means that I can finally get to testing on sundries instead of just figuring out the issues during deployments :-)

The last thing that was missing was the reg package for the reg-server playbook. Now I was able to run the whole playbook without error. The next thing will be to check the logs for any errors.

I have now reinstalled all of them with rhel9.

We will see what, if anything, is broken... but all nagios checks are fine, and I don't see anything obviously broken yet.

Thanks for all the packaging work on it!

Metadata Update from @kevin:
- Issue close_status updated to: Fixed with Explanation
- Issue status updated to: Closed (was: Open)

17 days ago
