#7345 Update the mass-rebuild SOP
Merged 5 years ago by mohanboddu. Opened 6 years ago by kellin.
kellin/releng update-mass-rebuild-sop  into  master

file modified
+137 -197
@@ -20,88 +20,118 @@ 

  This also assumes that the mass rebuild does not need to be done in dependency

  order, and that the mass rebuild does not involve an ABI change.

  

- Set up a web page for maintainers & send notice about rebuild

- =============================================================

+ Considerations

+ ==============

  

- Firstly, describe the mass rebuild for maintainers; why it's being done, and

- how they can opt out in a wiki page. See `the Fedora 26 example`_.

+ * The most important thing to keep in mind while doing a mass rebuild is to

+   communicate clearly what actions are being performed and the status of the

+   rebuild.

+ * Check in on scripts frequently so that a stalled command does not add

+   significant delays to the rebuild.

+ * Check with the secondary arches whether they are up-to-date enough with

+   primary; create the rebuild tag and target when they are. It will then take

+   care of rebuilds of the arch-specific packages in the appropriate kojis.

  

- == Update releng scripts ==

+ Actions

+ =======

  

- Release engineering scripts for mass rebuilds live in the `releng git

- repository`_. You need to edit the following scripts:

+ Preparatory Steps

+ -----------------

+ The following steps may be completed in the weeks leading up to the

+ scheduled mass rebuild.

  

- * mass-rebuild.py

- * find-failures.py

- * mass-tag.py

- * need-rebuild.py

+ #. Create the Mass Rebuild Pagure Issue

  

- Change the following items:

+     Create an issue on the `Release Engineering issues page`_ that

+     points at the schedule for the current release.

  

- * the build tag, holding tag, and target tag should be updated to reflect the

-   Fedora release you're building for

- * the ``epoch`` tag should be updated to the point at which all features that

-   the mass rebuild is for have landed in the build system (and a newRepo task

-   completed with those features)

- * the comment which is inserted into spec changelogs

+     See `the Fedora 27 mass rebuild issue example`_.

+    

+ #. Set up the Mass Rebuild Wiki Page

+ 

+     The mass rebuild wiki page should answer the following questions for

+     maintainers:

  

- == Create the rebuild holding tag ==

+     * Why the mass rebuild is happening

+     * How to opt out of the mass rebuild

  

- The ``add-tag`` command is used for creating the rebuild holding tag.

+     .. note::

+    

+         See `the Fedora 26 Wiki example`_.

  

- ::

+ #. Send out the Mass Rebuild Notice

  

-     $ koji add-tag --help

-     Usage: koji add-tag [options] name

-     (Specify the --help global option for a list of other help options)

+     Send out the same information posted on the wiki to the

+     ``devel-announce@lists.fedoraproject.org`` mailing list.

  

-     Options:

-       -h, --help       show this help message and exit

-       --parent=PARENT  Specify parent

-       --arches=ARCHES  Specify arches

+     .. note::

  

+          See `the Fedora 26 e-mail example`_.

  

- The options let you specify a parent for the holding tag.

+ #. Create a Tag to Contain the Mass Rebuild

  

- For example, if we wanted to create a rebuild holding tag for Fedora 26

- development we would issue:

+     Mass rebuilds require their own tag to contain all related builds. The

+     example assumes we are doing a rebuild for Fedora 26.

  

- ::

+     ::

  

-     koji add-tag f26-rebuild --parent f26

+         $ koji add-tag f26-rebuild --parent f26

  

- .. note::

- Please ask someone from infra to enable autosigning for newly created

- ``f26-rebuild`` tag.

+ #. Request Package Auto-Signing for New Mass-Rebuild Tag

  

- Create the rebuild target

- =========================

+     File a ticket with `Fedora Infrastructure`_ requesting the new

+     mass-rebuild tag be enabled for package auto-signing.

  

- The ``add-target`` command is used for creating the rebuild target.

+ #. Create the Koji Target for the Mass Rebuild

  

- ::

+     Using the same ``f26-rebuild`` tag created in the previous example:

  

-     $ koji add-target --help

-     Usage: koji add-target name build-tag <dest-tag>

-     (Specify the --help global option for a list of other help options)

+     ::

  

-     Options:

-       -h, --help  show this help message and exit

+         $ koji add-target f26-rebuild f26-build

  

- The arguments define the name of the target, the build-tag to use, and what

- tag to apply to builds as they complete.  To continue our example, the

- following command would add the target for the Fedora 26 mass rebuild:

+     .. note::

  

- ::

+         **koji add-target** *target-name* *buildroot-tag* *destination-tag*

+         describes the syntax format above. If the *destination-tag* is not

+         specified then it will be the same as the *target-name*.

  

-     koji add-target f26-rebuild f26-build

  

- When the dest-tag is not specified, it is assumed that the dest-tag is the

- same as the name of the target, in this case ``f26-rebuild``.

+ #. Update Scripts

  

- Building the packages

- =====================

+     The mass rebuild depends on four main scripts from the

+     `releng git repository`_. Each one requires some changes in variables

+     for each new mass rebuild cycle.

  

+     * mass-rebuild.py

+         * buildtag

+         * targets

+         * epoch

+         * comment

+         * target

+     * find-failures.py

+         * buildtag

+         * desttag

+         * epoch

+     * mass-tag.py

+     * need-rebuild.py

+         * buildtag

+         * target

+         * updates

+         * epoch

+ 

+ Change the following items:

+ 

+ * the build tag, holding tag, and target tag should be updated to reflect the

+   Fedora release you're building for

+ * the ``epoch`` should be updated to the point at which all features that

+   the mass rebuild is for have landed in the build system (and a newRepo task

+   completed with those features)

+ * the comment which is inserted into spec changelogs

+ 

+ 

+ Starting the Mass Rebuild

+ -------------------------

  The ``mass-rebuild.py`` script takes care of:

  

  * Discovering available packages in koji
@@ -112,148 +142,72 @@ 

  * git tagging the change

  * Submitting the build request to Koji

  

- The requirements for the script are as follows:

  

- * Ran as a user with a proper koji cert setup

- * Ran as a user with commit access to all packages

- * Ran as a user with a valid ssh agent for git actions

- * Ran on a system with a reliable network connection

+ #. Connect to the mass-rebuild Machine

  

- .. note::

- In Fedora Infra, the user is ``mass-rebuild``

+     ::

  

- The script has error checking at every step of the way and will gracefully

- recover and continue on with the next package.  It does the rebuilds in an

- alphanumerical order (provided by python sorted()) by source package name, and

- it does a complete checkout, bump, commit, tag, and build one package at a

- time. The current bottleneck when mass rebuilding is the git server, but

- generally 4 packages per minute can be processed.

+         $ ssh branched-composer.phx2.fedoraproject.org

  

- The script isn't very resource intensive, once it has discovered the available

- packages and trimmed out the things which have already been rebuilt.  Those

- tasks require a fair amount of cpu time to process the XML data returned by

- koji. Once the script has moved on to the git, bump, tag, build phase the

- resource usage is light, mostly network to do the git checkouts.

  

- Tips

- ----

+ #. Start a terminal multiplexer

  

- The script logs everything to stderr and stdout, so it is generally a good idea

- to redirect and capture the output to a log file, with something like

- ``2>&1 | tee massbuild.out``.

+     ::

  

- Running mass-rebuild.py

- -----------------------

+         $ tmux

  

- * ssh into branched-composer.phx2.fedoraproject.org

- * Change to mass-rebuild user

- * Clone `releng repo`_

- * cd to releng/scripts/

- * ./mass-rebuild.py 2>&1 | tee massbuild.out

+ #. Clone or check out the latest copy of the `releng git repository`_.

  

- Track the failures

- ------------------

+ #. Run the mass-rebuild.py script from *releng/scripts*

  

- Failures can happen at any stage.  Missing git module, no spec file to bump,

- malformed spec file causing the bump script to exit, git commit failures,

- tagging failures, and even koji outages.  Finally the build itself may fail.

+     ::

  

- The most common failures are build failures, and there is a script to deal

- with those (``find-failures.py``)  More on that later.

+         $ cd path/to/releng_repo/scripts

+         $ ./mass-rebuild.py 2>&1 | tee ~/massbuild.out

  

- Outside of build failures, the rest of the failures happen leading up to the

- submission of the build, and can be tracked via the mass-rebuild script output.

- Any error that the script detects will be output to stderr and will contain the

- "failed" keyword.  Searching the output can find these failures, which will

- look like:

- 

- ::

- 

-     GMT failed tag: Command '['make', 'tag']' returned non-zero exit status -9

- 

-     PyOpenGL failed checkout: Command '['git', '-d', ':ext:jkeating@git.fedoraproject.org:/git/pkgs', 'co', 'PyOpenGL']' returned non-zero exit status -9

- 

-     R-BSgenome.Celegans.UCSC.ce2 failed spec check

- 

-     eggdbus failed commit: Command '['git', 'commit', '-m', '- Rebuilt for https://fedoraproject.org/wiki/Fedora_12_Mass_Rebuild']' returned non-zero exit status 1

- 

-     gupnp-ui failed bumpspec: Command '['rpmdev-bumpspec', '-u', 'Fedora Release Engineering <rel-eng@lists.fedoraproject.org>', '-c', '- Rebuilt for https://fedoraproject.org/wiki/Fedora_12_Mass_Rebuild', '/home/jkeating/massbuild/gupnp-ui/devel/gupnp-ui.spec']' returned non-zero exit status 1

- 

- 

- Because stderr flushes immediately it may be hard to find the stdout that

- matches the error.  However just repeating the command can often enough show

- you what is going on.  Here is a list of common issues and the typical solution:

- 

- * checkout failure: Module may not have been added to git yet, skip it.

- * spec check: Module may have been retired but not blocked from koji.  Verify

-   and block it.

- * bumpspec failed: Bump, commit, tag, build manually.  Optionally fix the spec

-   so that bumspec works in the future.

- * commit failed: Module may have been changed or git / ssh outage.  Repeat

-   manually

- * git tag failed: Most often this is due to NVR collisions with other branches

-   or previous builds.  Re-bump and commit/tag/build manually.

- * build submission failed: usually due to a koji or local network outage.

-   Re-submit the build manually.

- 

- In all cases of fixing failures, verify that no newer build has been done in

- the mean time.

+ Monitoring Mass Rebuilds

+ ------------------------

+ The community has a very high interest in the status of rebuilds and many

+ maintainers will want to know if their build failed right away. The

+ ``find-failures.py`` and ``need-rebuild.py`` scripts are designed to update

+ publicly available URLs for stakeholders to monitor.

  

- find-failures.py

- ----------------

+ #. Connect to a Compose Machine

  

- This script will discover attempted builds that have failed, and then generate

- an html file that lists the failed builds (as a link to the build failure) and

- sorts them by package owner.  It requires koji installed on the host it runs on.

+     ::

  

- As the build logs expire, this script is only useful for the first few weeks

- after the mass rebuild attempt.

+         $ ssh compose-x86-02.phx2.fedoraproject.org

  

- This script should be setup to run often and the output put somewhere public.

- This can be tricky if you are running it and uploading the output via ssh as

- you will need either an active ssh agent or an open shared socket.  The script

- is somewhat resource intensive as it processes a lot of XML from koji.

- Updating once an hour is reasonable.

+ #. Start a terminal multiplexer

  

- Running find-failures.py

- ------------------------

+     ::

  

- * ssh into compose-x86-01.phx2.fedoraproject.org

- * Clone `releng git repository`_

- * cd to releng/scripts/

- * while true; do ./need-rebuild.py > f26-need-rebuild.html && cp f26-need-rebuild.html /mnt/koji/mass-rebuild/f26-need-rebuild.html; sleep 600; done

+         $ tmux

  

- .. note::

- Make sure you run this in screen or tmux

+ #. Clone or check out the latest copy of the `releng git repository`_

  

- need-rebuild.py

- ---------------

+ #. Set Up the Rebuild Failures Notification Web Site

+ 

+     The ``find_failures.py`` script discovers attempted builds that have

+     failed. It lists those failed builds and sorts them by package owner.

  

- This script will discover packages that have a need to be rebuilt and haven't

- been yet.  It will then generate an html file that lists the packages (as a

- link to the package page in koji) and sorts them by package owner.  It requires

- koji installed on the host it runs on.

+     ::

  

- This script should be setup to run often and the output put somewhere public.

- This can be tricky if you are running it and uploading the output via ssh as

- you will need either an active ssh agent or an open shared socket.  The script

- is somewhat resource intensive as it processes a lot of XML from koji.

- Updating once an hour is reasonable.

+         $ while true; do ./find_failures.py > f26-failures.html && cp f26-failures.html /mnt/koji/mass-rebuild/f26-failures.html; sleep 600; done

  

- Running need-rebuild.py

- ------------------------

+ #. Start a second pane in the terminal multiplexer

  

- * ssh into compose-x86-01.phx2.fedoraproject.org

- * Clone `releng git repository`_

- * cd to releng/scripts/

- * while true; do ./find_failures.py > f26-failures.html && cp f26-failures.html /mnt/koji/mass-rebuild/f26-failures.html; sleep 600; done

+ #. Set Up the Site for Packages that Need Rebuilding

+ 

+     The ``need-rebuild.py`` script discovers packages that have not yet been

+     rebuilt and generates an HTML file listing them, sorted by package owner.

+     This gives external stakeholders a rough idea of how much work is

+     remaining in the mass rebuild.

  

- .. note::

- Run this in another screen or tmux session from find-failues.py

+     ::

  

- Tag the builds

- ==============

+         $ while true; do ./need-rebuild.py > f26-need-rebuild.html && cp f26-need-rebuild.html /mnt/koji/mass-rebuild/f26-need-rebuild.html; sleep 600; done

  

+ Post Mass Rebuild Tasks

+ -----------------------

  Once the mass rebuild script completes, and all the pending builds have

  finished, the builds will need to be tagged.  The ``mass-tag.py`` script will

  accomplish this task.  The script will:
@@ -262,38 +216,24 @@ 

  * Trim out builds that are older than the latest build for a given package

  * Tag remaining builds into their final destination (without generating email)

  

- The script is fairly fast.  The longest time is taken processing the XML from

- koji to discover the builds and weed out builds that are not the latest.  The

- final tag action is very quick.  Output will go to stdout and should be saved

- for later review.

- 

- Running mass-tag.py

- -------------------

- 

- * Clone `releng git repository`_

- * cd to releng/scripts/

- * ./mass-tag.py --source f26-rebuild --target f26-pending

+ #. Clone or check out the latest copy of the `releng git repository`_

  

- Consider Before Running

- =======================

- 

- * The most important thing to keep in mind while doing a mass rebuild is to

-   communicate clearly what actions are being performed and the status of the

-   rebuild.

- * Check in on scripts frequently to avoid a long stalled command from adding

-   significant delays in completing the rebuild.

- * Check with secondary arches, whether they up-to-date enough with primary,

-   create rebuild tag and target when they are. It will then take care of

-   rebuilds of the arch specific packages in appropriate kojis.

+ #. Run the ``mass-tag.py`` script (requires koji kerberos authentication)

  

- Email

- -----

+     ::

  

- Once the mass rebuild is done, send an email to ``devel-announce@lists.fedoraproject.org``

+         $ cd path/to/releng_repo/scripts

+         $ ./mass-tag.py --source f26-rebuild --target f26-pending

  

- `Email Example`_

+ #. Send the final notification to the

+    *devel-announce@lists.fedoraproject.org* list

  

+     The contents should look something like this `example email`_.

  

- .. _the Fedora 26 example: https://fedoraproject.org/wiki/Fedora_26_Mass_Rebuild

+ .. _the Fedora 26 Wiki example: https://fedoraproject.org/wiki/Fedora_26_Mass_Rebuild

+ .. _the Fedora 26 e-mail example: https://lists.fedoraproject.org/archives/list/devel-announce@lists.fedoraproject.org/message/QAMEEWUG7ND5E7LQYXQSQLRUDQPSBINA/

  .. _releng git repository: https://pagure.io/releng

- .. _Email Example: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/QAMEEWUG7ND5E7LQYXQSQLRUDQPSBINA/

+ .. _Release Engineering issues page: https://pagure.io/releng/issues

+ .. _example email: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/QAMEEWUG7ND5E7LQYXQSQLRUDQPSBINA/

+ .. _Fedora Infrastructure: https://pagure.io/fedora-infrastructure/issues

+ .. _the Fedora 27 mass rebuild issue example: https://pagure.io/releng/issue/6898
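
The updated SOP still pipes the `mass-rebuild.py` output to `~/massbuild.out`, and the removed text notes that any error the script detects contains the "failed" keyword. A minimal sketch of grouping those lines by stage is shown here. It assumes the `<package> failed <stage>: <detail>` shape of the sample lines in the removed section still holds; `parse_failures` and the regular expression are illustrative names and are not part of the releng repository.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the sample output in the removed section:
#   "<package> failed <stage>[: <detail>]"
FAILED_LINE = re.compile(
    r"^(?P<package>\S+) failed (?P<stage>[^:]+?)(?::\s*(?P<detail>.*))?$"
)


def parse_failures(lines):
    """Group package names by the stage that failed (hypothetical helper)."""
    by_stage = defaultdict(list)
    for line in lines:
        match = FAILED_LINE.match(line.strip())
        if match:
            by_stage[match.group("stage")].append(match.group("package"))
    return dict(by_stage)
```

Calling `parse_failures(open("massbuild.out"))` would return a dictionary such as `{"tag": [...], "spec check": [...]}`, which may make it easier to decide which packages need a manual bump, commit, or resubmission.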

  • adjust for changes to permissions required
  • make steps more clear for newcomers
  • make step-by-step process more explicit

Signed-off-by: Robert Marshall rmarshall@redhat.com
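
The monitoring loops in the diff regenerate an HTML report and then `cp` it into `/mnt/koji/mass-rebuild/`. Because `cp` writes the destination in place, a reader could in principle fetch a partially copied page. One possible alternative is sketched below, assuming the report script writes the page to stdout as in the SOP. `publish_report` and `publish_forever` are hypothetical helpers, not existing releng scripts.

```python
import os
import subprocess
import tempfile
import time


def publish_report(command, destination):
    """Run `command` and atomically replace `destination` with its stdout.

    Returns True when the report was published, False when the command
    failed and the previously published file was left in place.
    """
    result = subprocess.run(command, stdout=subprocess.PIPE)
    if result.returncode != 0:
        return False
    directory = os.path.dirname(os.path.abspath(destination))
    # Write next to the destination so os.replace stays on one filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as handle:
        handle.write(result.stdout)
    os.chmod(temp_path, 0o644)  # mkstemp creates 0600; readers need access
    os.replace(temp_path, destination)
    return True


def publish_forever(command, destination, interval=600):
    """Mirror the SOP loop: regenerate the report every `interval` seconds."""
    while True:
        publish_report(command, destination)
        time.sleep(interval)
```

For example, `publish_forever(["./find_failures.py"], "/mnt/koji/mass-rebuild/f26-failures.html")` would stand in for the first shell loop. The 600-second default matches the `sleep 600` in the SOP.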

rebased onto 6297e4a95de183ed53d016f779baf670ded1b3d2

6 years ago

rebased onto 0e8db86f41437e53f6a189c736f99833d9291310

6 years ago

rebased onto 07fb93ef646e69e9465117a4c0819b7ae5ba3f72

5 years ago

@puiterwijk - would you please review for the places we changed?

@mohanboddu - would you please take a first pass look at this for review?

rebased onto 2f19d7a30406e76bc62ad36016bdb5ff8491602f

5 years ago

@mohanboddu made the updates you requested.

rebased onto 24953cedb58a5cf379f6e88e9bd5f8adfee4a44d

5 years ago

rebased onto 630d60b360c597cd25ad6ccd3509bb6a345ab167

5 years ago

rebased again to keep it green.

@puiterwijk - have you had a chance to review per our discussion last week in the releng meeting?

I would like to keep this statement, so that it will help new people to understand where the builds will be going since it's not mentioned anywhere.

rebased onto bb0ad8e

5 years ago

Commit 9192f14 fixes this pull-request

Pull-Request has been merged by mohanboddu

5 years ago
