#1563 Add cancel-on-fail to batch build feature
Opened 2 years ago by iucar. Modified a year ago

With the initial implementation of #854, batches simply follow one another regardless of any failures. It would be nice to add, e.g., a --cancel-on-fail feature so that dependent batches are automatically cancelled if any upstream build does not succeed. I'm not sure that is a good name for the flag, though. Suppose we have:

  • Batch#1{A, B, C}->Batch#2{D, E}->Batch#3{F}

If build B fails, batch 2 is cancelled, and batch 3 should be cancelled too. But note that F depends on D and E, which didn't fail but were cancelled. So is "cancel-on-fail" misleading?
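A minimal sketch of the cascade described above, assuming a hypothetical model in which each batch is a mapping of build names to states and batches form a linear chain. `Batch`, `cancel_downstream` and the state strings are illustrative names, not Copr's API:

```python
from dataclasses import dataclass

# Hypothetical build states; the real state names in Copr may differ.
UNSUCCESSFUL = {"failed", "canceled"}


@dataclass
class Batch:
    name: str
    builds: dict  # build name -> state ("pending", "succeeded", "failed", ...)


def cancel_downstream(chain):
    """Walk a linear chain of batches. Once any batch contains a failed or
    canceled build, cancel every still-pending build in all later batches."""
    upstream_broken = False
    for batch in chain:
        if upstream_broken:
            for build, state in list(batch.builds.items()):
                if state == "pending":
                    batch.builds[build] = "canceled"
        if any(state in UNSUCCESSFUL for state in batch.builds.values()):
            upstream_broken = True
    return chain


# Batch#1{A, B, C} -> Batch#2{D, E} -> Batch#3{F}, where build B fails.
chain = [
    Batch("1", {"A": "succeeded", "B": "failed", "C": "succeeded"}),
    Batch("2", {"D": "pending", "E": "pending"}),
    Batch("3", {"F": "pending"}),
]
cancel_downstream(chain)
for batch in chain:
    print(batch.name, batch.builds)
```

In this sketch F ends up canceled although neither D nor E failed; it is the canceled state propagating, not the failure, which is exactly the naming concern raised above.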

Anyway, I don't care much about the specific name, but it's a feature that would allow us to stop potential cascades of build failures, saving resources and debugging time.


Well, I'd like to have the behavior you describe as the "default" behavior.
@frostyx, what do you think, would that work for modules?

What I don't like about this is that any build failure would lead to a
"cancel-everything" situation. This could require you to re-submit
everything from scratch, and for nontrivial cascades that may be very painful.
So a somewhat better solution would be to give the user some time to fix the
build failure, e.g. by something like --replace-batch-build-id.
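One way to picture the "time period to fix the failure" idea is a decision function the scheduler could call for the batches waiting on a given batch. This is a sketch under loudly stated assumptions: the 24-hour `GRACE_PERIOD`, the function name and the returned action strings are all hypothetical, and --replace-batch-build-id is only the flag name proposed in this comment:

```python
from datetime import datetime, timedelta

# Hypothetical grace period; the value and the very existence of such a
# setting are assumptions made for illustration.
GRACE_PERIOD = timedelta(hours=24)


def downstream_action(states, first_failure_at, now):
    """Decide what happens to the batches that depend on a batch whose
    builds are in `states`. `first_failure_at` is when the first build of
    the batch failed, or None if nothing has failed."""
    if all(state == "succeeded" for state in states):
        return "start-next"
    if first_failure_at is None or now - first_failure_at < GRACE_PERIOD:
        # Either builds are still running, or the user may still replace
        # the failed build (e.g. via something like --replace-batch-build-id).
        return "wait"
    return "cancel-downstream"


t0 = datetime(2024, 1, 1, 12, 0)
print(downstream_action(["succeeded", "failed"], t0, t0 + timedelta(hours=1)))   # wait
print(downstream_action(["succeeded", "failed"], t0, t0 + timedelta(hours=25)))  # cancel-downstream
print(downstream_action(["succeeded", "succeeded"], None, t0))                   # start-next
```

Once the failed build is replaced and the replacement succeeds, all states are "succeeded" and the chain simply continues; only an unfixed failure eventually turns into "cancel everything".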

Metadata Update from @praiskup:
- Issue tagged with: RFE

2 years ago

Yep, a period to fix the build would definitely be the best of both worlds, but I was proposing this as an intermediate step.

> Well, I'd like to have the behavior you describe as the "default" behavior.
>
> What I don't like about this is that any build failure would lead to the
> "cancel-everything" situation.

I am a bit confused because the proposed behavior sounds to me like a "cancel-everything" situation.

> @frostyx, what do you think, would that work for modules?

IMHO we don't need any special behavior for modules. Either everything succeeds and the module is considered successfully built, or some packages fail and we don't really care what happens with the rest of them; the module is already considered failed and we won't create a repository for it.

> I am a bit confused because the proposed behavior sounds to me like a
> "cancel-everything" situation.

What I meant is that we should aim for a better solution than just that.
I really don't view that solution as optimal, but it's still better than what
we have now.

IOW, "cancel-everything" sounds better than continuing to build
everything, which in the best case ends up in a "fail everything"
situation. A worse situation may happen if some of the builds actually
succeed, but under wrong assumptions (miscompilation).

> IMHO we don't need any special behavior for modules.

OK. Sounds good, then.

> What I meant is that we should aim for a better solution than just that.
> I really don't view that solution as optimal, but it's still better than what
> we have now.

Ah, I see, and I agree. Though I cannot imagine a better approach than the described "cancel-everything".

A "resubmit cancelled batches" functionality could be added on top of that, which I think would be easier to implement than giving the user some time period to fix the build failure.

The "canceled" state is assigned not to a batch but to a build. So when canceling
the batches, we'll have to go through all the builds and their buildchroots and
cancel them. Switching them back to "pending" is not really safe; that would
invalidate assumptions we make in other parts of our code.
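The point that a batch carries no state of its own, and that "canceled" must stay final, can be sketched with a small hypothetical state machine. The transition table, `BuildChroot`, `Build` and `cancel_batches` are assumptions for illustration only and do not mirror Copr's actual models:

```python
from dataclasses import dataclass, field

# Hypothetical state machine: once a buildchroot reaches a terminal state
# it never moves again, in particular never back to "pending".
ALLOWED = {
    "pending": {"running", "canceled"},
    "running": {"succeeded", "failed", "canceled"},
    "succeeded": set(),
    "failed": set(),
    "canceled": set(),
}


@dataclass
class BuildChroot:
    chroot: str
    state: str = "pending"

    def set_state(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.chroot}: {self.state} -> {new_state} not allowed")
        self.state = new_state


@dataclass
class Build:
    build_id: int
    chroots: list = field(default_factory=list)


def cancel_batches(batches):
    """Batches carry no state of their own, so canceling a batch means
    canceling every unfinished buildchroot of every build in it."""
    for batch in batches:
        for build in batch:
            for build_chroot in build.chroots:
                if build_chroot.state in ("pending", "running"):
                    build_chroot.set_state("canceled")


batch2 = [Build(4, [BuildChroot("fedora-39-x86_64"), BuildChroot("fedora-rawhide-x86_64")])]
cancel_batches([batch2])
print([bc.state for bc in batch2[0].chroots])  # ['canceled', 'canceled']

try:
    batch2[0].chroots[0].set_state("pending")
    revert_allowed = True
except ValueError as err:
    revert_allowed = False
    print(err)  # fedora-39-x86_64: canceled -> pending not allowed
```

Under this model a "resubmit cancelled batches" feature could not reuse the canceled builds; it would have to create new builds, which is why replacing a failed build before the batch finishes looks simpler.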

I really think it is easier to allow fixing the build by replacing it with a different one
if the batch is still not finished.
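A minimal sketch of "replace the failed build while the batch is not finished", assuming a hypothetical `Batch` with a `finished` flag that the scheduler would set once it closes the batch (for example, when the next batch starts or the user gives up). None of these names come from Copr:

```python
from dataclasses import dataclass, field


@dataclass
class Batch:
    # Hypothetical model: build id -> state, plus a flag the scheduler would
    # set once it closes the batch (e.g. the next batch has started).
    builds: dict = field(default_factory=dict)
    finished: bool = False


def replace_build(batch, failed_id, new_id):
    """Swap a failed build for a newly submitted one, but only while the
    batch is not finished, so downstream batches still wait for it."""
    if batch.finished:
        raise RuntimeError("batch is already finished; builds cannot be replaced")
    if batch.builds.get(failed_id) != "failed":
        raise ValueError(f"build {failed_id} is not a failed build of this batch")
    del batch.builds[failed_id]
    batch.builds[new_id] = "pending"


batch = Batch({101: "succeeded", 102: "failed"})
replace_build(batch, 102, 103)
print(batch.builds)  # {101: 'succeeded', 103: 'pending'}
```

Because the failed build is swapped out rather than switched back to "pending", no build ever leaves a terminal state, which sidesteps the safety problem mentioned above.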

Hi there. I'd really like the default behaviour to be that subsequent batches are cancelled if any previous one was cancelled or failed.
This may be an extension, but you could invert the logic and say: only continue with the next batch if the previous one succeeded, which should be the default IMHO.
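The inverted logic amounts to a whitelist check: instead of asking "did anything fail?", ask "did everything succeed?". A sketch, assuming hypothetical state strings and a made-up `next_batch_action` helper:

```python
def next_batch_action(previous_states):
    """Whitelist logic: the next batch may start only if every build of the
    previous batch succeeded. A failed *or* canceled build means the next
    batch is canceled too, so cancellation propagates down the chain."""
    if all(state == "succeeded" for state in previous_states):
        return "start"
    if any(state in ("failed", "canceled") for state in previous_states):
        return "cancel"
    return "wait"  # some builds are still pending or running


print(next_batch_action(["succeeded", "succeeded"]))  # start
print(next_batch_action(["succeeded", "running"]))    # wait
print(next_batch_action(["canceled", "succeeded"]))   # cancel
```

Treating "canceled" like "failed" is what makes the cancellation reach batch 3 in the Batch#1 -> Batch#2 -> Batch#3 example, even though nothing in batch 2 failed.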

Just FYI, I ran into such a situation when building dependent components of LLVM (e.g. llvm, clang, lld, lldb, etc.). There, everything is built from one mono-repo, and therefore I cannot fix a single build without also rebuilding its dependencies. Maybe this situation is special. I talked to @frostyx and @msuchy yesterday about our weird setup.

After giving it some thought, I have a different opinion now. A build should continue to the next batch, but only for the chroots in which the previous builds succeeded. Makes sense?
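The per-chroot variant can be sketched as a filter over chroots: the next batch is built only in chroots where every build of the previous batch succeeded. The data layout (build name -> {chroot: state}) and the function name are hypothetical:

```python
def chroots_to_continue(previous_batch, chroots):
    """`previous_batch` maps build name -> {chroot: state}. Return the chroots
    in which every build of the previous batch succeeded; under this proposal
    the next batch would be built only in those chroots."""
    return [
        chroot
        for chroot in chroots
        if all(states.get(chroot) == "succeeded" for states in previous_batch.values())
    ]


previous = {
    "A": {"fedora-39-x86_64": "succeeded", "fedora-rawhide-x86_64": "failed"},
    "B": {"fedora-39-x86_64": "succeeded", "fedora-rawhide-x86_64": "succeeded"},
}
print(chroots_to_continue(previous, ["fedora-39-x86_64", "fedora-rawhide-x86_64"]))
# ['fedora-39-x86_64']
```

A failure in one chroot then stops the chain only for that chroot, while the remaining chroots keep building.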
