#1128 getAverageBuildDuration returns None for containers
Opened 3 years ago by ktdreyer. Modified 3 hours ago

The getAverageBuildDuration RPC returns None for container packages that come from OSBS.

I suspect this is because the SQL query has AND build.task_id IS NOT NULL, and the OSBS content generator does not set a task_id for builds. It uses container_koji_task_id instead, as described at https://pagure.io/koji/issue/215

It seems like we could drop this task_id check from getAverageBuildDuration.


Metadata Update from @julian8628:
- Issue tagged with: discussion

3 years ago

The issue is not nearly as simple as this makes it out to be.

Koji does not have a built-in process for container builds. Such builds are imported through the content generator api (e.g. by the koji-containerbuild plugin). This does mean that they don't have a task id. However, other imports also lack a task id and should definitely be excluded from this test. We cannot simply drop this condition.

For container builds, the data is substantially different and we'll have to adjust this calculation. I'm not sure if it can remain a simple query like this.

For container builds, we have a "start" and "end" time from the content generator. Can't we trust those values in getAverageBuildDuration?

For container builds, or at least for container builds after a certain time, yes, we have the data. The problem is not in the same place as for rpm builds. I'm not sure we can handle this with a simple db query as we currently do.

So, I think that to do this, we'll need to refactor how this call works. That's probably a good thing, as I don't think we really want to look at kernel builds from 10 years ago to decide what the average build duration is.

Side note: can I ask what the actual end problem here is? As near as I can tell, the only places this call is used are:

  • in rpm buildArch tasks to adjust the task weight
  • in the web ui (taskinfo and buildinfo pages) to show an estimated completion time.

The first does not affect container builds and the second seems like a pretty minor deficiency.

I don't see the call used in the containerbuild plugin code. Is some other external tool relying on this call?

Sure, I use our chatbot plugin to estimate when a current (ongoing) build will complete. It works like this:

03:14 < ktdreyer> helgabot: current ceph build
03:14 < helgabot> ktdreyer, ceph-12.2.1-1.el7cp should finish building in
                  3 min 45 sec
                  https://cbs.centos.org/koji/buildinfo?buildID=20348

Unfortunately Ceph takes a long time to build. Even generating containers in OSBS can take 20-30 minutes. This helps us know at a glance whether a build is stuck. It also lets other teams know when they can expect a build to be ready for testing or delivery, etc.

It can use the same logic to estimate scratch build task completion times as well, since we can parse out the package's name from the build or buildContainer task parameters, and then feed that into getAverageBuildDuration.

Btw, is not it that as easy as dropping task id from there? So, even builds without task will return their durations. (Maybe additional check "where build.completion_time is not null"). I can't imagine what could be bougs data in that:

SELECT EXTRACT(epoch FROM AVG(build.completion_time - events.time))
  FROM build
  JOIN events ON build.create_event = events.id
  WHERE
    build.pkg_id = PACKAGE_ID
    AND build.state = koji.BUILD_STATES['COMPLETE']
    AND build.completion_time IS NOT NULL;

Metadata Update from @tkopecek:
- Custom field Size adjusted to None

2 years ago

Metadata Update from @tkopecek:
- Issue tagged with: scheduler

3 months ago

Simple workaround (before unifying CG task_id references) #3402

Metadata Update from @tkopecek:
- Issue set to the milestone: 1.30

2 months ago

Metadata Update from @tkopecek:
- Issue untagged with: discussion

2 months ago

Metadata Update from @tkopecek:
- Issue tagged with: no_qe

10 days ago

Changes after PR 3402 was returning
<Fault 1: "<class 'psycopg2.ProgrammingError'>: argument formats can't be mixed">

PR 3457 fixes it.

Metadata Update from @jcupova:
- Issue status updated to: Open (was: Closed)

2 days ago

Metadata Update from @jcupova:
- Issue untagged with: no_qe
- Issue tagged with: testing-ready

2 days ago

Metadata Update from @jobrauer:
- Issue tagged with: testing-done

3 hours ago

Login to comment on this ticket.

Metadata
Related Pull Requests
  • #3457 Last updated 3 hours ago
  • #3402 Merged 10 days ago