#1563 Add cancel-on-fail to batch build feature
Opened 4 months ago by iucar. Modified 4 months ago

With the initial implementation of #854, batches just follow one another regardless of failures. It would be nice to add e.g. a --cancel-on-fail feature so that dependent batches are automatically cancelled if any upstream build does not succeed. That said, I'm not sure it's a good name for the flag. Suppose we have:

  • Batch#1{A, B, C}->Batch#2{D, E}->Batch#3{F}

If build B fails, batch 2 is cancelled, and 3 should be cancelled too. But note that F depends on D and E, which didn't fail, but were cancelled. So "cancel-on-fail" may be misleading?
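
The cascade described above can be sketched as follows. This is only an illustration, not Copr code: the state names and the `cancel_downstream` helper are hypothetical, and a batch chain is modelled as a plain list of dicts mapping build name to state.

```python
# Hypothetical sketch: propagate cancellation down a linear batch chain.
# Each batch is a dict of build name -> state; states are assumed names.

def cancel_downstream(chain):
    """Return a new chain where every pending build that follows a
    failed or canceled build in an earlier batch is marked canceled."""
    blocked = False
    result = []
    for batch in chain:
        if blocked:
            # Only pending builds are canceled; finished ones keep their state.
            batch = {
                name: ("canceled" if state == "pending" else state)
                for name, state in batch.items()
            }
        result.append(batch)
        # A canceled build blocks later batches just like a failed one,
        # which is why batch 3 is canceled although D and E never failed.
        if any(state in ("failed", "canceled") for state in batch.values()):
            blocked = True
    return result


chain = [
    {"A": "succeeded", "B": "failed", "C": "succeeded"},
    {"D": "pending", "E": "pending"},
    {"F": "pending"},
]
for number, batch in enumerate(cancel_downstream(chain), start=1):
    print(f"Batch#{number}: {batch}")
```

Note that the condition treats "canceled" the same as "failed", so the flag effectively means "cancel on anything but success".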

Anyway, I don't care much about the specific name, but it's a feature that would allow us to stop potential cascades of build failures, saving resources and debugging time.


Well, I'd like to have the behavior you describe as the "default" behavior.
@frostyx, what do you think, would that work for modules?

What I don't like about this is that any build failure would lead to the
"cancel-everything" situation. This would potentially require you to re-submit
everything from scratch - and for nontrivial cascades, it may be very painful.
So a slightly better solution would be to give the user some time period to fix the build
failure, e.g. with something like --replace-batch-build-id.

Metadata Update from @praiskup:
- Issue tagged with: RFE

4 months ago

Yep, a period to fix the build would definitely be the best of both worlds, but I was proposing this as an intermediate step.

Well, I'd like to have the behavior you describe as the "default" behavior.

What I don't like about this is that any build failure would lead to the
"cancel-everything" situation.

I am a bit confused because the proposed behavior sounds to me like a "cancel-everything" situation.

@frostyx, what do you think, would that work for modules?

IMHO we don't need any special behavior for modules. Either everything succeeds and the module is considered to have built successfully, or some package fails and we don't really care what happens with the rest of the packages: the module is already considered failed and we won't create a repository for it.

I am a bit confused because the proposed behavior sounds to me like a
"cancel-everything" situation.

What I meant is that we should aim for a better solution than just that.
I really don't view that solution as optimal, but still better than what
we have now.

IOW, "cancel-everything" sounds better than continuing to build
everything, which in the best case will end up in a "fail everything"
situation. A worse situation may arise if some of the builds actually
succeed, but under wrong assumptions (miscompilation).

IMHO we don't need any special behavior for modules.

Ok. Sounds good then.

What I meant is that we should aim for a better solution than just that.
I really don't view that solution as optimal, but still better than what
we have now.

Ah, I see, and I agree. Though I cannot imagine a better approach than the described "cancel-everything".

A "resubmit cancelled batches" functionality could be added on top of that, which I think would be easier to implement than giving the user some time period to fix the build failure.

The state "canceled" is assigned to builds, not to batches. So when canceling the
batches, we'll have to go through all the builds and related buildchroots and cancel
them. Switching them back to "pending" is not really safe; that would invalidate
assumptions we have in other parts of our code.
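
To make the point concrete, here is a rough sketch of what "canceling a batch" would involve when the state lives on builds and buildchroots. The `Build`, `BuildChroot`, and `cancel_batch` names and the state strings are assumptions for illustration only and do not mirror the real models.

```python
from dataclasses import dataclass, field

# Assumed set of terminal states; these must never be overwritten.
FINISHED = {"succeeded", "failed", "canceled"}


@dataclass
class BuildChroot:
    name: str
    state: str = "pending"


@dataclass
class Build:
    id: int
    chroots: list = field(default_factory=list)


def cancel_batch(builds):
    """Cancel every unfinished buildchroot of every build in a batch.

    Returns the number of buildchroots that were switched to "canceled".
    There is no batch-level state to flip, so everything is walked.
    """
    canceled = 0
    for build in builds:
        for chroot in build.chroots:
            if chroot.state not in FINISHED:
                chroot.state = "canceled"
                canceled += 1
    return canceled
```

The transition is one-way here: nothing in the sketch moves a "canceled" buildchroot back to "pending", which is the unsafe step mentioned above.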

I really think it is easier to allow fixing the build by replacing it with a different one
as long as the batch has not finished yet.
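
A minimal sketch of that replacement idea, under stated assumptions: the `Batch` class, its `finished` flag (set by the scheduler once the batch is closed), and `replace_build` are all hypothetical names, not an existing API.

```python
from dataclasses import dataclass, field


@dataclass
class Batch:
    id: int
    builds: dict = field(default_factory=dict)  # build id -> state
    finished: bool = False  # assumed to be set once the batch is closed


def replace_build(batch, old_id, new_id):
    """Swap a failed build for a fresh one while the batch is still open."""
    if batch.finished:
        raise ValueError(f"batch {batch.id} is already finished")
    if batch.builds.get(old_id) != "failed":
        raise ValueError(f"build {old_id} is not a failed build of batch {batch.id}")
    # The failed build leaves the batch; the replacement starts as pending,
    # so no existing build ever goes back from a terminal state to "pending".
    del batch.builds[old_id]
    batch.builds[new_id] = "pending"
```

With this shape, downstream batches keep waiting on the same batch and never have to be cancelled and resubmitted.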
