Issue #144: Tasks are not always reproducible - copr

#144 Tasks are not always reproducible

Closed: MIGRATED a year ago by nikromen. Opened 6 years ago by frostyx.

TL;DR: Tasks are not always reproducible with expected parameters

Build tasks are not stored in the database; they are generated on request. Their data is composed from several objects, such as Copr, CoprChroot and Build. A user cannot change data stored in the build table, but they can modify data in the copr and copr_chroot tables. This behavior is correct and intended.

The problem

The problem occurs in the following scenario. A user sets some external repositories for the project -> saves -> creates a new build -> removes the external repositories from the project settings -> requests a task from the previous build. Such a task is not the same task that created the previous build.
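
To make the scenario concrete, here is a minimal sketch of a task that is composed on request. The class layout and the get_build_task helper are simplified assumptions for illustration, not the actual frontend code; only the field names repos, buildroot_pkgs and use_bootstrap_container come from this issue.

```python
from dataclasses import dataclass, field


@dataclass
class CoprChroot:
    # Mutable project settings; the user can edit these at any time.
    repos: list = field(default_factory=list)
    buildroot_pkgs: list = field(default_factory=list)
    use_bootstrap_container: bool = False


@dataclass(frozen=True)
class Build:
    # Immutable once created (hypothetical, reduced set of fields).
    id: int
    source_url: str


def get_build_task(build, chroot):
    """Compose the task from the *current* state of both objects."""
    return {
        "build_id": build.id,
        "source_url": build.source_url,
        "repos": list(chroot.repos),
        "buildroot_pkgs": list(chroot.buildroot_pkgs),
        "use_bootstrap_container": chroot.use_bootstrap_container,
    }


chroot = CoprChroot(repos=["https://example.com/extra-repo/"])
build = Build(id=1, source_url="https://example.com/pkg.src.rpm")

task_at_build_time = get_build_task(build, chroot)
chroot.repos.clear()  # the user removes the external repositories
task_requested_later = get_build_task(build, chroot)

# The same build now yields a different task.
tasks_differ = task_at_build_time != task_requested_later
```

Because nothing is stored per build, the later request has no way to recover the repositories that were configured when the build actually ran.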

This problem was insignificant until recently. Users had no reason to obtain tasks, so nobody cared. This changed with the release of copr-rpmbuild. We now want to provide a way to reproduce (on some level) builds on the user's machine. This issue can lead to unexpected behavior of copr-rpmbuild and basically defeats its main purpose.

Solution

There are several possible solutions:

1. Save data within Build

Most of the task data is stored in Build, which is immutable. Only repos, buildroot_pkgs and use_bootstrap_container are stored elsewhere. We can create a copy of that data and store it with every build. We would actually need to store it within BuildChroot because it is chroot-specific, but the point is the same.

Pros:
- No need to create any additional tables
- Easy to implement

Cons:
- Doesn't deal with data duplication on any level
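
A rough sketch of option 1, assuming simplified stand-ins for CoprChroot and BuildChroot (the real models have many more fields, and create_build_chroot is a hypothetical helper): the three chroot-specific values are copied when the build is created, and the task is later composed from the copy.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class CoprChroot:
    # Mutable project settings.
    repos: list = field(default_factory=list)
    buildroot_pkgs: list = field(default_factory=list)
    use_bootstrap_container: bool = False


@dataclass
class BuildChroot:
    build_id: int
    # Copies of the chroot-specific settings, taken at build creation.
    repos: list
    buildroot_pkgs: list
    use_bootstrap_container: bool


def create_build_chroot(build_id, chroot):
    """Freeze the current project settings into the new BuildChroot."""
    return BuildChroot(
        build_id=build_id,
        repos=copy.deepcopy(chroot.repos),
        buildroot_pkgs=copy.deepcopy(chroot.buildroot_pkgs),
        use_bootstrap_container=chroot.use_bootstrap_container,
    )


def get_build_task(build_chroot):
    """Compose the task only from data stored with the build."""
    return {
        "build_id": build_chroot.build_id,
        "repos": list(build_chroot.repos),
        "buildroot_pkgs": list(build_chroot.buildroot_pkgs),
        "use_bootstrap_container": build_chroot.use_bootstrap_container,
    }


chroot = CoprChroot(repos=["https://example.com/extra-repo/"])
build_chroot = create_build_chroot(build_id=1, chroot=chroot)

task_before = get_build_task(build_chroot)
chroot.repos.clear()  # later project change
task_after = get_build_task(build_chroot)
```

Every build carries its own copy, even when the settings never change between builds. That is the duplication listed under Cons.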

2. Task entity

Another solution would be to create a proper Task model stored in the database. Such an object would contain all the data needed for the frontend to render /backend/get-build-task/<task_id>. This means that some data that is now stored in Build would have to be moved to Task. A downside is that a task is chroot-related, so the data would be duplicated. This is solvable by not moving the build data and storing only repos, buildroot_pkgs and use_bootstrap_container within the Task. But then the name Task would be misleading.

Pros:
- Solves data duplication for reproduced (not resubmitted) builds. We don't have that feature, though, and may never have it.
- Pretty integer task identifier

Cons:
- Needs to create a new database table
- Basically, every BuildChroot will need a paired Task
- Either duplicates data that is currently not duplicated, or does not store all the task data within this object (which kind of ruins the whole idea).
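
For comparison, option 2 could look roughly like the sketch below. The Task fields, the in-memory TASKS dict standing in for a database table, and the create_tasks helper are all assumptions made for illustration.

```python
from dataclasses import asdict, dataclass
from itertools import count

_task_ids = count(1)
TASKS = {}  # stands in for a hypothetical "task" database table


@dataclass(frozen=True)
class Task:
    id: int            # the "pretty integer task identifier"
    build_id: int
    chroot: str
    source_url: str    # build-level data, repeated for every chroot
    repos: tuple
    buildroot_pkgs: tuple
    use_bootstrap_container: bool


def create_tasks(build_id, source_url, chroot_settings):
    """Create one Task per chroot of the build."""
    tasks = []
    for chroot_name, settings in chroot_settings.items():
        task = Task(
            id=next(_task_ids),
            build_id=build_id,
            chroot=chroot_name,
            source_url=source_url,
            repos=tuple(settings["repos"]),
            buildroot_pkgs=tuple(settings["buildroot_pkgs"]),
            use_bootstrap_container=settings["use_bootstrap_container"],
        )
        TASKS[task.id] = task
        tasks.append(task)
    return tasks


def get_build_task(task_id):
    """Return everything needed to render the task, from one row."""
    return asdict(TASKS[task_id])


settings = {"repos": [], "buildroot_pkgs": [], "use_bootstrap_container": False}
tasks = create_tasks(
    build_id=1,
    source_url="https://example.com/pkg.src.rpm",
    chroot_settings={"fedora-rawhide-x86_64": settings, "epel-7-x86_64": settings},
)
```

The source_url value ends up in both rows, which is the duplication of currently non-duplicated data mentioned in the last point.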

3. Project snapshots

It would also be possible to create snapshots of Copr and CoprChroot whenever they change, and then point the BuildChroot to a snapshot.

Pros:
- Deals with the data duplication on the best possible level

Cons:
- Huge overkill
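
A very reduced sketch of the snapshot idea, assuming a hypothetical ChrootSnapshot record and an in-memory list instead of a database table. It only covers CoprChroot; the real thing would also have to snapshot Copr.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ChrootSnapshot:
    version: int
    repos: tuple = ()
    buildroot_pkgs: tuple = ()
    use_bootstrap_container: bool = False


class CoprChroot:
    """Keeps every past state of the chroot settings, newest last."""

    def __init__(self):
        self.snapshots = [ChrootSnapshot(version=1)]

    @property
    def current(self):
        return self.snapshots[-1]

    def update(self, **changes):
        # Every change appends a new immutable snapshot instead of
        # overwriting the old values.
        new = replace(self.current, version=self.current.version + 1, **changes)
        self.snapshots.append(new)
        return new


@dataclass(frozen=True)
class BuildChroot:
    build_id: int
    snapshot: ChrootSnapshot  # stands in for a foreign key


chroot = CoprChroot()
chroot.update(repos=("https://example.com/extra-repo/",))
build_chroot = BuildChroot(build_id=1, snapshot=chroot.current)
chroot.update(repos=())  # later project change
```

Builds that run against the same settings share one snapshot, so nothing is duplicated, but every place that edits the project has to go through the snapshotting path.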

Conclusion

Which solution do you prefer? Is there anything better?
Of the described solutions, I personally prefer the first one. It would be fairly easy to implement, the data duplication would still be quite reasonable, and it is a good enough solution with the potential to be improved once necessary.

Metadata Update from @frostyx:
- Issue assigned to frostyx

6 years ago

praiskup commented 6 years ago

> The problem occurs in the following scenario. A user sets some external
> repositories for the project -> saves -> creates a new build -> removes
> external repositories from the project settings -> requests a task from the
> previous build. Such a task is not the same task that created the previous
> build.

Whether a package build succeeds or fails is mostly about some buildroot change
in Fedora/CentOS/. So I've practically never had such a problem -- in the
sense that I wanted to reproduce a build against an old copr config. IMO, it
would be a sign that the project is a bit out of control, and it would
naturally lead me to better practices.

In 100% of cases, the reason I want to re-submit the build AND "reproduce" it
locally is that my package fails to build from source (so I want to
reproduce the failure). And if that accidentally succeeds, it means
that the FTBFS is over and the job is done... (I'm not going to resubmit
against an older config).

I'm saying this because playing with "the past" is not the daily job
-- and we really have to keep the "FTBFS resubmit" feature. Since both
cli/webui implement the resubmit "naturally" now, please don't change
that. The "reproduce" feature this PR is about should be optional.

> Which solution do you prefer? Is there anything better?

Since I could use this "reproduce" feature only theoretically (I don't think I'll
ever need that, unless we can guarantee the previous buildroot state), I prefer
the less intrusive implementation -- which is IMO 1) BUT with one slight change.
Instead of generating a new mock_chroot row with each build, you should only
generate a new row when there's (a) a project change and (b) some build against
the new config-set is submitted.
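
One way to read this suggestion, sketched with a hypothetical SnapshotStore (the name, the list-backed storage and the JSON comparison are assumptions, not anything from Copr): a row is written only at build submission, and only if the settings differ from the most recent row.

```python
import json


class SnapshotStore:
    """Stores settings rows lazily, at build submission time."""

    def __init__(self):
        self.rows = []  # each row is a canonical JSON string

    def snapshot_for_build(self, settings):
        """Return the index of the row that describes `settings`.

        A new row is appended only when the settings differ from the
        latest stored row, i.e. after (a) a project change and (b) a
        build submitted against the changed configuration.
        """
        serialized = json.dumps(settings, sort_keys=True)
        if not self.rows or self.rows[-1] != serialized:
            self.rows.append(serialized)
        return len(self.rows) - 1


store = SnapshotStore()
settings = {"repos": [], "buildroot_pkgs": [], "use_bootstrap_container": False}

first = store.snapshot_for_build(settings)   # first build creates a row
second = store.snapshot_for_build(settings)  # same config, row is reused

settings["buildroot_pkgs"] = ["gcc"]         # edit with no build: no row
settings["buildroot_pkgs"] = []
settings["repos"] = ["https://example.com/extra-repo/"]
third = store.snapshot_for_build(settings)   # changed config plus a build
```

The sketch compares only against the latest row, so reverting to an older configuration would still append a new row.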

praiskup commented 6 years ago

> (a) project change and (b) some build against the new config-set is submitted.

Ah, maybe this is the snapshot way (option 3), but I don't see the overkill
so I think I don't understand.

Anyways, to be 100% clear -- I'm not even against not-implement-this idea :).

clime commented 6 years ago

I guess I like option one the best, as it is the least intrusive.

By the way, the main problematic scenario is this:

1) user submits a build and it waits in a queue because there are jobs ahead of it
2) user changes the project settings for the next build
3) the build submitted in step 1 is now affected by the changes as well when a builder is finally allocated for it

I don't think this is particularly desired.

And, yeah, option 1 (the least intrusive) seems the best.

frostyx commented 6 years ago

> Since both cli/webui implement the resubmit "naturally" now, please don't change
> that. The "reproduce" feature this PR is about should be optional.

Don't worry, this issue is not going to change that. The resubmit feature should work as is. However, the copr-rpmbuild should imho reproduce the build with the same properties as the original build had. I wouldn't expect anything different when I pass a build ID to it.

> I'm saying this because playing with "the past" is not the daily job

Hmm, ok, I probably shouldn't describe that problem as reproducibility-related. Clime has a much better scenario in the post above ^^.

> BUT with one slight change. Instead of generating a new mock_chroot row with each build, you should only generate a new row when there's (a) project change and (b) some build against
> the new config-set is submitted.
>
> Ah, maybe this is the snapshot way (option 3), but I don't see the overkill
> so I think I don't understand.

Yeah, it seems that you described a variation of my option 3. I suppose that you want to store mock configs from the particular settings. This won't be that easy, because mock configs are not generated on the frontend. Many changes would need to be made. For the sake of solving this problem, it is IMHO not worth it.

msuchy commented 6 years ago

Option 3 is massive overkill. Option 2 has a problem when you remove data, i.e., in the past you have builds with data that is not used now. So I vote for option 1. I only hesitate if we should store the data in BuildChroot as JSON or whether to normalize the data and create table BuildChrootData with (key, values) columns.

frostyx commented 6 years ago

> normalize the data and create table BuildChrootData with (key, values) columns.

Please no. This would kill the simplicity of it.

> I only hesitate if we should store the data in BuildChroot as JSON or whether to normalize the data

Same here, but with normalizing the data in a different way: whether to add a data (or something) column to BuildChroot, which would be a JSON blob, or to add three columns (repos, additional_packages, use_bootstrap_container) to BuildChroot.

msuchy commented 6 years ago

Then I vote for a data column with a JSON blob.
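
A small sketch of what the JSON blob variant could look like. It uses the standard library sqlite3 module and a simplified build_chroot table purely for illustration; the column set, the helper names and the choice of database are assumptions, not the actual Copr schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE build_chroot (
        id INTEGER PRIMARY KEY,
        build_id INTEGER NOT NULL,
        mock_chroot TEXT NOT NULL,
        data TEXT NOT NULL  -- JSON blob with the frozen settings
    )
    """
)


def save_build_chroot(conn, build_id, mock_chroot, settings):
    """Store the settings used for this build as one JSON document."""
    cursor = conn.execute(
        "INSERT INTO build_chroot (build_id, mock_chroot, data) VALUES (?, ?, ?)",
        (build_id, mock_chroot, json.dumps(settings, sort_keys=True)),
    )
    return cursor.lastrowid


def load_settings(conn, build_chroot_id):
    """Read the frozen settings back when the task is requested."""
    row = conn.execute(
        "SELECT data FROM build_chroot WHERE id = ?", (build_chroot_id,)
    ).fetchone()
    return json.loads(row[0])


settings = {
    "repos": ["https://example.com/extra-repo/"],
    "buildroot_pkgs": ["gcc"],
    "use_bootstrap_container": False,
}
row_id = save_build_chroot(conn, 1, "fedora-rawhide-x86_64", settings)
restored = load_settings(conn, row_id)
```

The blob keeps the table shape stable if more settings need to be frozen later, at the cost of not being able to query the individual values with plain column filters.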

praiskup commented 6 years ago

@frostyx

> The resubmit feature should work as is.

Ack.

> However, the copr-rpmbuild should imho reproduce the build with the same properties as the original build had.

I disagree, at least if it will be the only option. If I want to do a local build, I want to rebuild the package from copr dist-git against the current copr project configuration, not against the old one.

praiskup commented 6 years ago

I wrote:

> I disagree, at least if it will be the only option. If I want to do a local build, I want to rebuild the package from copr dist-git against the current copr project configuration, not against the old one.

By that I mean that I'm not against the "reproduce" feature at all, I just claim that the majority of maintainers will want the "resubmit" action even locally. So 'copr-rpmbuild' should support both "resubmit" (needed in > 90% of cases) and "reproduce" (needed in < 10% of cases).
(edit: that's my safe estimate; if I were talking only about my "maintainer" use-cases, I would want the "resubmit" feature in 99.9% of cases)

Edited 6 years ago by praiskup
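
Supporting both actions could come down to choosing which settings feed the local build. The sketch below is hypothetical: select_settings and the mode names are made up for illustration and are not existing copr-rpmbuild options.

```python
def select_settings(mode, current_settings, stored_settings):
    """Pick the settings for a local build.

    "resubmit"  -> build against the current project configuration
    "reproduce" -> build against the settings stored with the build
    """
    if mode == "resubmit":
        return dict(current_settings)
    if mode == "reproduce":
        if stored_settings is None:
            raise ValueError("no settings were stored for this build")
        return dict(stored_settings)
    raise ValueError(f"unknown mode: {mode!r}")


current = {"repos": []}
stored = {"repos": ["https://example.com/extra-repo/"]}

resubmit_settings = select_settings("resubmit", current, stored)
reproduce_settings = select_settings("reproduce", current, stored)
```

Making "resubmit" the default would match the estimate that it covers the large majority of use cases.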

Metadata Update from @msuchy:
- Issue tagged with: RFE

5 years ago

Metadata Update from @frostyx:
- Assignee reset

5 years ago

nikromen commented a year ago

This issue has been migrated to GitHub: https://github.com/fedora-copr/copr/issues/144

Metadata Update from @nikromen:
- Issue close_status updated to: MIGRATED
- Issue status updated to: Closed (was: Open)

a year ago
