#10637 Stuck `fedpkg module-build`
Closed: Fixed 2 years ago by kevin. Opened 2 years ago by sbergmann.

$ fedpkg clone flatpaks/libreoffice
$ cd libreoffice
$ fedpkg module-build
Submitting the module build...
The build 14242 was submitted to the MBS
Build URLs:
https://mbs.fedoraproject.org/module-build-service/2/module-builds/14242
$ fedpkg module-build-watch 14242
[Build #14242] libreoffice-stable-3520220412074628-b9d1b5f4 is in "build" state.
  Koji tag: module-libreoffice-stable-3520220412074628-b9d1b5f4
  Link: https://mbs.fedoraproject.org/module-build-service/2/module-builds/14242
  Components: [0%]: 1 in building, 0 done, 0 failed
    Building:
      - module-build-macros
        https://koji.fedoraproject.org/koji/taskinfo?taskID=85548062

got stuck and didn't proceed for ~a day (https://koji.fedoraproject.org/koji/taskinfo?taskID=85548062 says "GenericError: Build already in progress (task 85548060)"), and a fresh attempt

$ fedpkg module-build-cancel 14242
$ git commit --allow-empty -m 'Rebuild (in an attempt to make Koji happy)'
$ git push
$ fedpkg module-build
Submitting the module build...
The build 14266 was submitted to the MBS
Build URLs:
https://mbs.fedoraproject.org/module-build-service/2/module-builds/14266
$ fedpkg module-build-watch 14266
[Build #14266] libreoffice-stable-3520220413084014-b9d1b5f4 is in "build" state.
  Koji tag: module-libreoffice-stable-3520220413084014-b9d1b5f4
  Link: https://mbs.fedoraproject.org/module-build-service/2/module-builds/14266
  Components: [0%]: 1 in building, 0 done, 0 failed
    Building:
      - module-build-macros
        https://koji.fedoraproject.org/koji/taskinfo?taskID=85602073

got stuck again in the same way (https://koji.fedoraproject.org/koji/taskinfo?taskID=85602073 also says "GenericError: Build already in progress (task 85602070)").


Metadata Update from @mohanboddu:
- Issue tagged with: medium-gain, medium-trouble, ops

2 years ago

Metadata Update from @mohanboddu:
- Issue priority set to: Waiting on Assignee (was: Needs Review)

2 years ago

I have never built a flatpak, but can you try with fedpkg flatpak-build? Maybe there is some difference in how the build is called.

Please also try again, we got builds processing again...

Still does not work. A rebuild attempt still hung for several hours until I cancelled it:

$ fedpkg module-build-watch 14266
[Build #14266] libreoffice-stable-3520220413084014-b9d1b5f4 is in "failed" state.
  Koji tag: module-libreoffice-stable-3520220413084014-b9d1b5f4
  Link: https://mbs.fedoraproject.org/module-build-service/2/module-builds/14266
  Components: 0 done, 196 failed
  Reason: Failed to build artifact module-build-macros in Koji
$ fedpkg module-build
Submitting the module build...
The build 14266 was submitted to the MBS
Build URLs:
https://mbs.fedoraproject.org/module-build-service/2/module-builds/14266
$ fedpkg module-build-watch 14266
[Build #14266] libreoffice-stable-3520220413084014-b9d1b5f4 is in "build" state.
  Koji tag: module-libreoffice-stable-3520220413084014-b9d1b5f4
  Link: https://mbs.fedoraproject.org/module-build-service/2/module-builds/14266
  Components: [0%]: 0 in building, 1 done, 0 failed
    No building task.
# hangs
^C
$ fedpkg module-build-cancel 14266
Cancelling module build #14266...
The module build #14266 was cancelled

And then a fresh full rebuild failed in a slightly different way:

$ git commit --allow-empty -m 'Rebuild (in a 2nd attempt to make Koji happy)'
$ git push
$ fedpkg module-build
Submitting the module build...
The build 14303 was submitted to the MBS
Build URLs:
https://mbs.fedoraproject.org/module-build-service/2/module-builds/14303
$ fedpkg module-build-watch 14303
[Build #14303] libreoffice-stable-3520220414111606-b9d1b5f4 is in "init" state.
  Koji tag: None
  Link: https://mbs.fedoraproject.org/module-build-service/2/module-builds/14303
[Build #14303] libreoffice-stable-3520220414111606-b9d1b5f4 is in "wait" state.
  Components: 195
[Build #14303] libreoffice-stable-3520220414111606-b9d1b5f4 is in "build" state.
  Components: [0%]: 1 in building, 0 done, 0 failed
    Building:
      - module-build-macros
        https://koji.fedoraproject.org/koji/taskinfo?taskID=85662211
[Build #14303] libreoffice-stable-3520220414111606-b9d1b5f4 is in "failed" state.
  Components: 0 done, 1 failed
  Reason: Failed to build artifact module-build-macros in Koji

and https://koji.fedoraproject.org/koji/taskinfo?taskID=85662211 is in green "closed" state.

Things appear to work normally again. A fresh

$ fedpkg module-build
Submitting the module build...
The build 14303 was submitted to the MBS
Build URLs:
https://mbs.fedoraproject.org/module-build-service/2/module-builds/14303

did a full rebuild as expected now (and then ultimately failed for a different, well-understood reason).

Metadata Update from @sbergmann:
- Issue close_status updated to: Fixed with Explanation
- Issue status updated to: Closed (was: Open)

2 years ago

...and now it's broken again,

$ git commit --allow-empty -m 'Rebuild against fixed libreoffice'
$ git push
$ fedpkg module-build
Submitting the module build...
The build 14308 was submitted to the MBS
Build URLs:
https://mbs.fedoraproject.org/module-build-service/2/module-builds/14308
$ fedpkg module-build-watch 14308
[Build #14308] libreoffice-stable-3520220420200926-b9d1b5f4 is in "init" state.
  Koji tag: None
  Link: https://mbs.fedoraproject.org/module-build-service/2/module-builds/14308
[Build #14308] libreoffice-stable-3520220420200926-b9d1b5f4 is in "wait" state.
  Components: 195
[Build #14308] libreoffice-stable-3520220420200926-b9d1b5f4 is in "build" state.
  Components: [0%]: 1 in building, 0 done, 0 failed
    Building:
      - module-build-macros
        https://koji.fedoraproject.org/koji/taskinfo?taskID=85988237
[Build #14308] libreoffice-stable-3520220420200926-b9d1b5f4 is in "failed" state.
  Components: 0 done, 1 failed
  Reason: Failed to build artifact module-build-macros in Koji

and https://koji.fedoraproject.org/koji/taskinfo?taskID=85988237 says "GenericError: Build already in progress (task 85988230)".

Metadata Update from @sbergmann:
- Issue status updated to: Open (was: Closed)

2 years ago

@sbergmann this is not a new issue; in fact I've been facing it for a few weeks already. Luckily, just running fedpkg module-build again is enough to restart the build. As for the cause: it looks like when fedpkg module-build is run for the first time after a new commit, it actually spawns two builds of module-build-macros at nearly the same time. The one that is not reported by fedpkg module-build-watch (https://koji.fedoraproject.org/koji/taskinfo?taskID=85988230 in your case) completes first, and the one that is reported (https://koji.fedoraproject.org/koji/taskinfo?taskID=85988237 in your case) fails because the other one is in progress. I would also expect the reported task to fail much earlier, but it only fails once the "unreported" build has actually finished.
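To find the "unreported" task from the failed one, the task ID can be pulled out of the Koji error text quoted in this ticket. The following is only a small sketch: the helper names `competing_task_id` and `taskinfo_url` are made up for illustration, and the pattern assumes the error message keeps the exact wording seen above.

```python
import re

# Matches the Koji error text as quoted in this ticket (assumed stable wording).
_IN_PROGRESS = re.compile(r"Build already in progress \(task (\d+)\)")


def competing_task_id(koji_error):
    """Return the ID of the task that Koji says already owns the build, or None."""
    match = _IN_PROGRESS.search(koji_error)
    return int(match.group(1)) if match else None


def taskinfo_url(task_id):
    """Build the Koji taskinfo URL in the form used throughout this ticket."""
    return f"https://koji.fedoraproject.org/koji/taskinfo?taskID={task_id}"


error = "GenericError: Build already in progress (task 85988230)"
print(taskinfo_url(competing_task_id(error)))
```

For the error above this points at task 85988230, i.e. the build that fedpkg module-build-watch did not report.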

@otaylor this is the issue that we talked about a few weeks ago, when I thought that it was caused by building multiple modules at the same time, but that's not true: building just one module is enough.

My best guess here is that the "wait" handler that spawns the module-build-macros build is getting run twice - this would be some sort of problem at the FedMsg or celery level.

Would really need to see logs on the mbs-backend node to make any progress in figuring out what is going on.
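To illustrate the guess above, here is a minimal sketch of a handler that tolerates being run twice for the same module build. It is not MBS code and not the eventual fix: `MacrosBuildSubmitter`, `handle_wait` and `fake_submit` are hypothetical names, and an in-memory lock like this would only help within a single process (separate workers would presumably need a database-level guard instead).

```python
import threading


class MacrosBuildSubmitter:
    """Hypothetical stand-in for a "wait" handler; not taken from MBS."""

    def __init__(self, submit):
        self._submit = submit          # callable that starts the Koji build
        self._lock = threading.Lock()
        self._task_by_module = {}      # module build ID -> Koji task ID

    def handle_wait(self, module_build_id):
        # Check and record under one lock, so a duplicate delivery cannot
        # slip in between the check and the submission.
        with self._lock:
            if module_build_id not in self._task_by_module:
                self._task_by_module[module_build_id] = self._submit(module_build_id)
            return self._task_by_module[module_build_id]


submitted = []


def fake_submit(module_build_id):
    """Pretend to submit module-build-macros and return a made-up task ID."""
    submitted.append(module_build_id)
    return 1000 + len(submitted)


handler = MacrosBuildSubmitter(fake_submit)
first = handler.handle_wait(14308)
second = handler.handle_wait(14308)  # duplicate delivery of the same message
print(first, second, submitted)
```

With such a guard, the second delivery returns the task that already exists instead of starting a competing one.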

Gave this another look today. I haven't figured out the root cause yet, but while the frontend submits one build as expected, the backend either consumes the same build message twice for module-build-service, or perhaps receives a duplicate message, and sends out a second build.

I've restarted the processes for now, in case they were in a bad state, but otherwise I will need to investigate this further.

Another data point from my LibreOffice build attempts: Last night I did

$ git commit --allow-empty -m 'Rebuild (in an attempt to make mbs happy)'
$ git push
$ fedpkg module-build
Submitting the module build...
The build 14359 was submitted to the MBS
Build URLs:
https://mbs.fedoraproject.org/module-build-service/2/module-builds/14359
$ fedpkg module-build-watch 14359
[Build #14359] libreoffice-stable-3520220426155144-b9d1b5f4 is in "init" state.
  Koji tag: None
  Link: https://mbs.fedoraproject.org/module-build-service/2/module-builds/14359
[Build #14359] libreoffice-stable-3520220426155144-b9d1b5f4 is in "wait" state.
  Components: 195
[Build #14359] libreoffice-stable-3520220426155144-b9d1b5f4 is in "build" state.
  Components: [0%]: 1 in building, 0 done, 0 failed
    Building:
      - module-build-macros
        https://koji.fedoraproject.org/koji/taskinfo?taskID=86258554
[Build #14359] libreoffice-stable-3520220426155144-b9d1b5f4 is in "failed" state.
  Components: 0 done, 1 failed
  Reason: Failed to build artifact module-build-macros in Koji

which apparently ran into this double-build issue, so this morning I picked up again

$ fedpkg module-build-watch 14359
[Build #14359] libreoffice-stable-3520220426155144-b9d1b5f4 is in "failed" state.
  Koji tag: module-libreoffice-stable-3520220426155144-b9d1b5f4
  Link: https://mbs.fedoraproject.org/module-build-service/2/module-builds/14359
  Components: 0 done, 196 failed
  Reason: Failed to build artifact module-build-macros in Koji
$ fedpkg module-build
Submitting the module build...
The build 14359 was submitted to the MBS
Build URLs:
https://mbs.fedoraproject.org/module-build-service/2/module-builds/14359
$ fedpkg module-build-watch 14359
[Build #14359] libreoffice-stable-3520220426155144-b9d1b5f4 is in "build" state.
  Koji tag: module-libreoffice-stable-3520220426155144-b9d1b5f4
  Link: https://mbs.fedoraproject.org/module-build-service/2/module-builds/14359
  Components: [0%]: 0 in building, 1 done, 0 failed
    No building task.

[...]

[Build #14359] libreoffice-stable-3520220426155144-b9d1b5f4 is in "build" state.
  Components: [97%]: 3 in building, 192 done, 0 failed
    Building:
      - xorg-x11-fonts
        https://koji.fedoraproject.org/koji/taskinfo?taskID=86302948
      - yajl
        https://koji.fedoraproject.org/koji/taskinfo?taskID=86303004
      - zxing-cpp
        https://koji.fedoraproject.org/koji/taskinfo?taskID=86303039
[Build #14359] libreoffice-stable-3520220426155144-b9d1b5f4 is in "build" state.
  Components: [98%]: 2 in building, 193 done, 0 failed
    Building:
      - xorg-x11-fonts
        https://koji.fedoraproject.org/koji/taskinfo?taskID=86302948
      - zxing-cpp
        https://koji.fedoraproject.org/koji/taskinfo?taskID=86303039
[Build #14359] libreoffice-stable-3520220426155144-b9d1b5f4 is in "build" state.
  Components: [98%]: 1 in building, 194 done, 0 failed
    Building:
      - zxing-cpp
        https://koji.fedoraproject.org/koji/taskinfo?taskID=86303039
[Build #14359] libreoffice-stable-3520220426155144-b9d1b5f4 is in "build" state.
  Components: [99%]: 0 in building, 195 done, 0 failed
    No building task.
[Build #14359] libreoffice-stable-3520220426155144-b9d1b5f4 is in "build" state.
  Components: [99%]: 1 in building, 195 done, 0 failed
    Building:
      - libreoffice
        https://koji.fedoraproject.org/koji/taskinfo?taskID=86303904
[Build #14359] libreoffice-stable-3520220426155144-b9d1b5f4 is in "failed" state.
  Components: 195 done, 1 failed
  Reason: Component(s) libreoffice failed to build.

which apparently again ran into this double-build issue:
- The above output claims the build of the libreoffice component is https://koji.fedoraproject.org/koji/taskinfo?taskID=86303904, which is still in progress and in state "open" as of now.
- But https://mbs.fedoraproject.org/module-build-service/2/module-builds/14359 lists the libreoffice component with task_id 86305596, i.e., https://koji.fedoraproject.org/koji/taskinfo?taskID=86305596, which is in state "failed" with "GenericError: Build already in progress (task 86303904)"
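The comparison in the two points above can be done mechanically. This is only a sketch: `watched_tasks` and `mismatches` are made-up helper names, the parser assumes the fedpkg module-build-watch layout shown in this ticket, and `mbs_task_ids` is a hand-filled mapping of the task_id values read off the MBS page (the MBS response format is deliberately not assumed here).

```python
import re

# A "- <component>" line followed by its Koji taskinfo URL on the next line,
# as printed by fedpkg module-build-watch in this ticket.
_TASK_URL = re.compile(
    r"^[ \t]*-[ \t]+(\S+)[ \t]*\n"
    r"[ \t]*https://koji\.fedoraproject\.org/koji/taskinfo\?taskID=(\d+)",
    re.MULTILINE,
)


def watched_tasks(watch_output):
    """Map component name -> task ID as reported by module-build-watch."""
    return {name: int(task_id) for name, task_id in _TASK_URL.findall(watch_output)}


def mismatches(watch_output, mbs_task_ids):
    """Return {component: (watched_id, mbs_id)} where the two sources disagree."""
    watched = watched_tasks(watch_output)
    return {
        name: (watched[name], mbs_id)
        for name, mbs_id in mbs_task_ids.items()
        if name in watched and watched[name] != mbs_id
    }


sample = """\
  Components: [99%]: 1 in building, 195 done, 0 failed
    Building:
      - libreoffice
        https://koji.fedoraproject.org/koji/taskinfo?taskID=86303904
"""
print(mismatches(sample, {"libreoffice": 86305596}))
```

A non-empty result means the task being watched is not the one MBS recorded for that component, which is the symptom described here.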

For quite some time I experienced the issue in the following way:
when I submitted a module build, it created (e.g.) four tasks (one for each Fedora release) and every single one of them failed.

That happened to me recently too.


However, I've made builds of different streams of the same module, and in the two different streams only some of the MBS tasks failed on this issue.

That sounds like there is a race condition over which of those two Koji tasks will be used.

I do have two builds that are stuck:

https://release-engineering.github.io/mbs-ui/module/14487/components
-> leads to https://koji.fedoraproject.org/koji/taskinfo?taskID=87228438 which says "GenericError: Build already in progress (task 87228435)"

https://release-engineering.github.io/mbs-ui/module/14490/components
-> leads to https://koji.fedoraproject.org/koji/taskinfo?taskID=87234757 which says "GenericError: Build already in progress (task 87234754)"

Can we please raise the priority? While working on Fedora 36 Flatpak runtime and then porting applications to it I hit this many times and it's really becoming a nightmare to do anything with modules in Fedora.

Yes, please. I also keep running into this issue.

I've tried fedpkg module-build-cancel 14490 followed by fedpkg module-build, which used to be the way to go in the past, but the result is still the same :/ I don't want to pollute the repository with empty commits just to get the module built.

There was an additional issue today as the mbs-poller went down on the backend, causing some builds to stall. I've restarted it, that should get some builds moving again.

As for the duplicated build issue, I believe it may be resolved by this change, which is not yet deployed in Fedora infra:
https://pagure.io/fm-orchestrator/pull-request/1711
I'll see about getting a hotfix release out ASAP, potentially stage today and production early next week.

The patch has been deployed to stage, will update once it's been deployed to production.

The patch has now been deployed to production. Will keep this ticket open for some time to confirm that it resolved the issue. Please let me know if you see it reoccur.

As subsequent builds have not had this issue, resolving ticket. Please file a new issue if problems reoccur.

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 years ago
