#1226 backend: move initial createrepo check from dispatcher to worker
Merged 4 years ago by praiskup. Opened 4 years ago by praiskup.
Unknown source fix-1222  into  master

@@ -141,19 +141,6 @@

          worker.start()

          return worker

  

-     def assure_repodata_exist(self, job):

-         repodata = os.path.join(job.destdir, job.chroot, "repodata/repomd.xml")

-         if not os.path.exists(repodata) and job.chroot != "srpm-builds":

-             self.log.error("Executing 'copr-repo' tool for %s job, because copr_base repo is not available yet.", job)

-             if not call_copr_repo(os.path.join(job.destdir, job.chroot), timeout=60):

-                 job.status = BuildStatus.FAILURE

-                 build = job.to_dict()

-                 self.log.error("Build %s failed", build)

-                 data = {"builds": [build]}

-                 try:

-                     self.frontend_client.update(data)

-                 except:

-                     pass

      def run(self):

          """

          Executes build dispatching process.
@@ -199,8 +186,6 @@

                      self.log.debug("Skipped job %s, cached", job)

                      continue

  

-                 self.assure_repodata_exist(job)

- 

                  # ... and if the task is new to us,

                  # allocate new vm and run full build

                  try:

@@ -156,6 +156,32 @@

      #     job.status = BuildStatus.SKIPPED

      #     self._announce_end(job)

  

+     def wait_for_repo(self, job):

+         if job.chroot == 'srpm-builds':

+             # we don't need copr_base repodata for srpm builds

+             return True

+ 

+         repodata = os.path.join(job.chroot_dir, "repodata/repomd.xml")

+         waiting_from = time.time()

+         while time.time() - waiting_from < 60:

+             if os.path.exists(repodata):

+                 return True

+ 

+             # Either (a) the very first copr-repo run in this chroot dir

+             # is still running on background (or failed), or (b) we are

+             # hitting the race condition between

+             # 'rm -rf repodata && mv .repodata repodata' sequence that

+             # is done in createrepo_c.  Try again after some time.

+             self.log.info("waiting for copr_base repository")

+             time.sleep(2)

+ 

+         self.log.error("giving up waiting for copr_base repository")

+ 

+         # This should never happen, but if yes - we need to debug

+         # properly.  Give up waiting, and fail the build.  That should

+         # motivate people to report bugs.

+         return False

+ 

      def do_job(self, job):

          """

          Executes new job.
@@ -178,6 +204,8 @@

                  failed = True

  

          if not self.reattach:

+             if not self.wait_for_repo(job):

+                 failed = True

              self.prepare_result_directory(job)

  

          if not failed:

Previously we faced concurrency issues between post-build copr-repo runs
and these pre-build copr-repo runs. The pre-build copr-repo could hit
the small time window when 'repodata' don't exist when craterepo_c is
run (moving .repodata to repodata) and start running our own 'copr-repo'
process mistakenly. That would be a long-running process (full
createrepo_c run, without --recycle-pkglist), and more, would be blocked
by other concurrent copr-repo processes (so it would be eventually
killed by timeout -> the build would continue -> and it would fail
anyways - regardless of the actual mock results, issue #1222). This
race should be closed now.

Also, stop running copr-repo before build is started (again), and "only"
wait till the repodata become available (and fail if they won't, after
certain time period). We used to have problems with this approach
before -- but we did not fail the builds, but rather kept waiting
indefinitely. Neither approach is ideal, but with the current approach
we should better motivate people to report issues.

Also move the logic out from BuildDispatcher to Worker so we never block
other builds by those checks.

Fixes: #1222

rebased onto 972524fa04d6cb107e14c7ff59b50b0e9a52eda3

4 years ago

LGTM, the wait_for_repo(...) function looks much clearer than the previous code.

Thanks for the review.

rebased onto 43cefbe

4 years ago

Pull-Request has been merged by praiskup

4 years ago