#1694 Ensure that a build can transition to done when all components are reused
Opened 3 years ago by mikem. Modified 3 years ago
mikem/fm-orchestrator all-components-reused  into  master

@@ -422,7 +422,11 @@ 
          build.transition(db_session, conf, state=models.BUILD_STATES["build"])
          db_session.add(build)
          db_session.commit()
-         return []
+         # Return a KojiRepoChange message so that the build can be transitioned to done
+         # in the repos handler
+         from module_build_service.scheduler.handlers.repos import done as repos_done_handler
+         events.scheduler.add(repos_done_handler, ("fake_msg", builder.module_build_tag["name"]))
+         return
  
      log.debug("Starting build batch 1")
      build.batch = 1

This change addresses an issue where builds can get stuck in the build state.
Without triggering component builds, there is nothing else to spur further action, hence the fake repo message.

Build 1b0eb97 FAILED!
Rebase or make new commits to rebuild.

Can you explain the reasoning behind this? I believe the way it is supposed to work is:

  • scheduler/reuse.py:attempt_to_reuse_all_components() calls builder.tag_artifacts()
  • KojiBuilder.tag_artifacts() calls Koji to tag the artifacts into the new tag
  • As messages are received back from Koji, scheduler/handlers/tags/tagged.py:tagged marks the components (which are already in the COMPLETE state) as tagged.
  • When all are tagged, the tagged() handler adds the fake msg
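
As a rough illustration of the flow in the list above, here is a minimal sketch. It uses hypothetical stand-in classes (`Component`, `ModuleBuild`) and handler functions (`on_tagged`, `on_repo_done`), not the real MBS models or handler signatures, and it assumes the repo handler moves the build to done once the fake message arrives.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    # Hypothetical stand-in for an MBS component build record.
    nvr: str
    tagged: bool = False


@dataclass
class ModuleBuild:
    # Hypothetical stand-in for a module build whose components were all reused.
    tag: str
    components: list = field(default_factory=list)
    state: str = "build"


def on_tagged(build, nvr, queue):
    """Handle one tag message; queue the fake repo event after the last one."""
    for component in build.components:
        if component.nvr == nvr:
            component.tagged = True
    if all(c.tagged for c in build.components):
        queue.append(("fake_msg", build.tag))


def on_repo_done(build, event):
    """Move the build to done when a repo event for its tag arrives."""
    _, tag = event
    if tag == build.tag:
        build.state = "done"


build = ModuleBuild("module-foo-1", [Component("a-1-1"), Component("b-1-1")])
queue = []
on_tagged(build, "a-1-1", queue)  # first tag message: nothing queued yet
on_tagged(build, "b-1-1", queue)  # last tag message: fake repo event queued
for event in queue:
    on_repo_done(build, event)
```

In this sketch, if the message for `b-1-1` never arrives, nothing is ever queued and the build stays in the `build` state, which matches the stuck-build symptom.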

And this is the way it seems to work for modules with just a few components, and (as far as I know) modules in Fedora MBS. Problems only seem to pop up for modules in internal MBS with lots of components and lots of tag events flying around.

If this is a workaround, it would seem like a good idea to document it as such, and what it is working around.

With this patch, could a build be completed before all of its component builds have been tagged? I know some internal Red Hat tests (mini-tps) that trigger on module build completion download module builds by getting all the packages in the module tag in Koji, rather than by looking up the NVRs in the uploaded modulemd - is there a race condition there?

Forgot to say - thanks for working on this! It's definitely a frustrating experience when builds get stuck this way.

@otaylor thanks for the analysis! In the case I was looking at, one of the components was listed as untagged in MBS (but was tagged in Brew). I presume it was a missed message scenario. I thought that was a secondary issue at first, but I guess it was the cause.

My reasoning was that this stanza appeared to leave things in a similar state to the previous one, i.e. there are no components to be built, but in light of your analysis, I think I was missing some of what attempt_to_reuse_all_components does.

I'm going to re-evaluate this. Perhaps what is really needed is a producer task that compensates for missed tagging messages.
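
A minimal sketch of what such a compensating check might look like. The names are hypothetical: `find_missed_tags` is not an existing MBS function, and `list_tagged_nvrs` is an injected callable standing in for whatever Koji query would list the builds in a tag.

```python
def find_missed_tags(untagged_nvrs, list_tagged_nvrs, tag):
    """Return NVRs recorded as untagged that are in fact already in the tag."""
    actually_tagged = set(list_tagged_nvrs(tag))
    return sorted(nvr for nvr in untagged_nvrs if nvr in actually_tagged)


def fake_koji_listing(tag):
    # Hypothetical stand-in for a Koji query; returns the NVRs present in the tag.
    return ["a-1-1", "b-1-1"]


# "b-1-1" is recorded as untagged but is already in the tag, so its message was missed;
# "c-1-1" is genuinely not tagged yet.
missed = find_missed_tags(["b-1-1", "c-1-1"], fake_koji_listing, "module-foo-1")
```

A periodic task could run a check like this for builds sitting in the build state and replay a tag event for each missed NVR, so the normal tagged handler path still decides when to emit the fake repo message.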

> My reasoning was that this stanza appeared to leave things in a similar state to the previous one, i.e. there are no components to be built, but in light of your analysis, I think I was missing some of what attempt_to_reuse_all_components does.

If I didn't know that it did (sometimes) work, I don't think I would have figured out how it works :-) It's confusing that things work entirely differently in the code path from batches.py and there isn't a clear reason for the difference - the "repos_done_handler" usage there may also be a workaround for dropped messages, or maybe it's actually needed in that case.
