TL;DR: Tasks are not always reproducible with expected parameters
Build tasks are not stored in the database but are generated on request. Their data is composed from several objects, such as Copr, CoprChroot and Build. A user cannot change data stored in the build table, but can modify data in the copr and copr_chroot tables. This behavior is correct and intended.
The problem occurs in the following scenario: a user sets some external repositories for the project -> saves -> creates a new build -> removes the external repositories from the project settings -> requests a task for the previous build. The resulting task is not the same as the task that produced the previous build.
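To make the scenario concrete, here is a minimal sketch in plain Python. The classes and the generate_task function are hypothetical stand-ins, not the actual frontend code; they only mimic a task being assembled on request from an immutable build plus mutable project settings.

```python
from dataclasses import dataclass, field


@dataclass
class Copr:
    # Mutable project settings; the owner can edit these at any time.
    repos: list = field(default_factory=list)


@dataclass(frozen=True)
class Build:
    # Immutable once created.
    id: int
    source: str


def generate_task(build: Build, copr: Copr) -> dict:
    # Hypothetical stand-in for generating a task on request: it reads the
    # project's *current* settings, not the ones in effect at build time.
    return {"build_id": build.id, "source": build.source, "repos": list(copr.repos)}


copr = Copr(repos=["https://example.com/extra-repo/"])
build = Build(id=1, source="foo.src.rpm")
task_at_build_time = generate_task(build, copr)

copr.repos.clear()  # the user removes the external repositories afterwards
task_requested_later = generate_task(build, copr)

print(task_at_build_time == task_requested_later)  # prints False
```

The second task has lost the external repository even though the build itself never changed.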
This problem was insignificant until recently. Users had no reason to obtain tasks, so nobody cared. This changed with the release of copr-rpmbuild. We now want to provide a way to reproduce builds (on some level) on the user's machine. This issue can lead to unexpected behavior of copr-rpmbuild and basically ruins its main purpose.
There are several possible solutions:
Most of the task data is stored in Build, which is immutable. Only repos, buildroot_pkgs and use_bootstrap_container are stored elsewhere. We can create a copy of this data and store it with every build. We would actually need to store it in BuildChroot, because it is chroot-specific, but the point is the same.
Pros:
- No need to create any additional tables
- Easy to implement
Cons:
- Does not deal with data duplication on any level
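The first option could look roughly like the sketch below. ChrootSettings, create_build_chroot and generate_task are hypothetical names chosen for illustration; only the three field names come from the description above. The point is that the values are copied when the build is created, so the task no longer depends on the mutable settings.

```python
from dataclasses import dataclass, field


@dataclass
class ChrootSettings:
    """Hypothetical view of the mutable, per-chroot project settings."""
    name: str
    repos: list = field(default_factory=list)
    buildroot_pkgs: list = field(default_factory=list)
    use_bootstrap_container: bool = False


@dataclass(frozen=True)
class BuildChroot:
    """Immutable record created together with the build."""
    build_id: int
    chroot_name: str
    repos: tuple
    buildroot_pkgs: tuple
    use_bootstrap_container: bool


def create_build_chroot(build_id: int, settings: ChrootSettings) -> BuildChroot:
    # Copy the values (as tuples) so later edits to the settings cannot leak in.
    return BuildChroot(
        build_id=build_id,
        chroot_name=settings.name,
        repos=tuple(settings.repos),
        buildroot_pkgs=tuple(settings.buildroot_pkgs),
        use_bootstrap_container=settings.use_bootstrap_container,
    )


def generate_task(build_chroot: BuildChroot) -> dict:
    # The task is assembled only from data frozen at submission time.
    return {
        "build_id": build_chroot.build_id,
        "chroot": build_chroot.chroot_name,
        "repos": list(build_chroot.repos),
        "buildroot_pkgs": list(build_chroot.buildroot_pkgs),
        "use_bootstrap_container": build_chroot.use_bootstrap_container,
    }


settings = ChrootSettings("fedora-rawhide-x86_64", repos=["https://example.com/extra-repo/"])
build_chroot = create_build_chroot(1, settings)
settings.repos.clear()  # project settings change after submission
task = generate_task(build_chroot)
print(task["repos"])  # still contains the external repository
```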
Another solution would be to create a proper Task model stored in the database. Such an object would contain all the data the frontend needs to render /backend/get-build-task/<task_id>. This means that some data now stored in Build would have to move to Task. A downside is that a task is chroot-related, so that data would be duplicated. This is solvable by not moving the build data and storing only repos, buildroot_pkgs and use_bootstrap_container in the Task. But then the name Task would be misleading.
Pros:
- Solves data duplication for reproduced (not resubmitted) builds. We don't have that feature, though, and maybe never will.
- Pretty integer task identifier
Cons:
- Needs a new database table
- Basically every BuildChroot would need a paired Task
- Either duplicates data that is currently not duplicated, or does not store all the task data in this object (which kind of ruins the whole idea)
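As a rough illustration of the second option, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table layout and the create_task / get_build_task helpers are assumptions made for this example, not the real schema. It shows the reduced variant, where the task row holds only the three chroot-specific values.

```python
import json
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE task (
        id INTEGER PRIMARY KEY,              -- integer task identifier
        build_id INTEGER NOT NULL,
        chroot TEXT NOT NULL,
        repos TEXT NOT NULL,                 -- JSON-encoded list
        buildroot_pkgs TEXT NOT NULL,        -- JSON-encoded list
        use_bootstrap_container INTEGER NOT NULL,
        UNIQUE (build_id, chroot)            -- one task per build chroot
    )
    """
)


def create_task(build_id: int, chroot: str, repos: list,
                buildroot_pkgs: list, use_bootstrap_container: bool) -> int:
    cur = conn.execute(
        "INSERT INTO task (build_id, chroot, repos, buildroot_pkgs,"
        " use_bootstrap_container) VALUES (?, ?, ?, ?, ?)",
        (build_id, chroot, json.dumps(repos), json.dumps(buildroot_pkgs),
         int(use_bootstrap_container)),
    )
    return cur.lastrowid


def get_build_task(task_id: int) -> Optional[dict]:
    # Returns the stored task data, or None if no such task exists.
    row = conn.execute(
        "SELECT build_id, chroot, repos, buildroot_pkgs, use_bootstrap_container"
        " FROM task WHERE id = ?",
        (task_id,),
    ).fetchone()
    if row is None:
        return None
    return {
        "task_id": task_id,
        "build_id": row[0],
        "chroot": row[1],
        "repos": json.loads(row[2]),
        "buildroot_pkgs": json.loads(row[3]),
        "use_bootstrap_container": bool(row[4]),
    }


task_id = create_task(1, "fedora-rawhide-x86_64",
                      ["https://example.com/extra-repo/"], [], False)
print(get_build_task(task_id))
```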
It would also be possible to create snapshots of Copr and CoprChroot whenever they change, and then point the BuildChroot to a snapshot.
Pros:
- Deals with data duplication on the best possible level
Cons:
- Huge overkill
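For comparison, the snapshot idea can be sketched as follows. ProjectSettings is a hypothetical in-memory class, not an existing model; a real implementation would have to snapshot database rows. Every change appends a new immutable state, and a build chroot only remembers the id of the snapshot that was current when it was submitted.

```python
import copy


class ProjectSettings:
    """Hypothetical settings store that keeps every past state as a snapshot."""

    def __init__(self, **initial):
        self._snapshots = [copy.deepcopy(initial)]

    @property
    def current_snapshot_id(self) -> int:
        return len(self._snapshots) - 1

    def change(self, **changes) -> int:
        new_state = {**self._snapshots[-1], **copy.deepcopy(changes)}
        # Record a new snapshot only if something actually changed.
        if new_state != self._snapshots[-1]:
            self._snapshots.append(new_state)
        return self.current_snapshot_id

    def snapshot(self, snapshot_id: int) -> dict:
        # Hand out a copy so callers cannot modify the stored history.
        return copy.deepcopy(self._snapshots[snapshot_id])


settings = ProjectSettings(
    repos=["https://example.com/extra-repo/"],
    use_bootstrap_container=False,
)
# The build chroot points at the snapshot that is current at submission time.
build_chroot = {"build_id": 1, "snapshot_id": settings.current_snapshot_id}

settings.change(repos=[])  # later edit creates snapshot 1
print(settings.snapshot(build_chroot["snapshot_id"])["repos"])
```

Builds submitted against unchanged settings share one snapshot, which is where the reduced duplication comes from.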
Which solution do you prefer? Is there anything better? Of the described solutions, I personally prefer the first one. It would be fairly easy to implement, the data duplication would still be quite reasonable, and it is a good enough solution with the potential to be improved when necessary.
Metadata Update from @frostyx: - Issue assigned to frostyx
Whether a package build succeeds or fails is mostly about some buildroot change in Fedora/CentOS/. So I've practically never had such a problem, in the sense that I wanted to reproduce a build against an old copr config. IMO, it would be a sign that the project is a bit out of control, and it would naturally lead me to better practices.
In 100% of the cases where I want to re-submit the build AND "reproduce" it locally, it is because my package fails to build from source (so I want to reproduce the failure). And if that accidentally succeeds, it means that the FTBFS is over and the job is done... (I'm not going to resubmit against an older config).
I'm saying this because playing with "the past" is not the daily job -- and we really have to keep the "FTBFS resubmit" feature. Since both cli/webui implement the resubmit "naturally" now, please don't change that. The "reproduce" feature this PR is about should be optional.
Which solution do you prefer? Is there anything better?
Since I could use this "reproduce" feature only theoretically (I don't think I'll ever need that, unless we can guarantee the previous buildroot state), I prefer the less intrusive implementation -- which is IMO 1) BUT with one slight change. Instead of generating new mock_chroot row with each build, you should only generate new row when there's (a) project change and (b) some build against the new config-set is submitted.
Ah, maybe this is the snapshot way (option 3), but I don't see the overkill so I think I don't understand.
Anyways, to be 100% clear -- I'm not even against not-implement-this idea :).
I guess, I like option one the best as the least intrusive.
By the way, the main problematic scenario is this:
1) The user submits a build and it waits in the queue because there are jobs ahead of it.
2) The user changes the project settings for the next build.
3) When a builder is finally allocated, the build submitted in step 1 is affected by those changes as well.
I don't think this is particularly desired.
And, yeah, option 1 (the least intrusive) seems the best.
Since both cli/webui implement the resubmit "naturally" now, please don't change that. The "reproduce" feature this PR is about should be optional.
Don't worry, this issue is not about to change this. The resubmit feature should work as is. However, the copr-rpmbuild should imho reproduce the build with same properties as the original build had. I wouldn't expect anything different when I pass a build ID to it.
I'm saying this because playing with "the past" is not the daily job
Hmm, ok, I probably shouldn't describe that problem as reproducibility-related. Clime has a much better scenario in the post above ^^.
BUT with one slight change. Instead of generating new mock_chroot row with each build, you should only generate new row when there's (a) project change and (b) some build against the new config-set is submitted. Ah, maybe this is the snapshot way (option 3), but I don't see the overkill so I think I don't understand.
Yeah, it seems that you described a variation of my option 3. I suppose that you want to store the mock configs for the particular settings. This won't be that easy, because mock configs are not generated on the frontend. Many changes would be needed. For the sake of solving this problem, it is IMHO not worth it.
Option 3 is massive overkill. Option 2 has a problem when you remove data. I.e., in the past you have builds with data that is not used now. So I vote for option 1. I only hesitate if we should store the data in BuildChroot as JSON or whether to normalize the data and create table BuildChrootData with (key, values) columns.
normalize the data and create table BuildChrootData with (key, values) columns.
Please no. This would kill the simplicity of it.
I only hesitate if we should store the data in BuildChroot as JSON or whether to normalize the data
Same here, but with normalizing the data in a different way: whether to add a data (or similarly named) column to BuildChroot holding a JSON blob, or to add three columns (repos, additional_packages, use_bootstrap_container) to BuildChroot.
Then I vote for data column with JSON blob.
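A minimal sketch of the JSON-blob variant, using Python's built-in sqlite3 and json modules. The build_chroot table layout and the helper names are assumptions made for this example; the key names are the ones mentioned above.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE build_chroot (build_id INTEGER, chroot TEXT, data TEXT)"
)


def save_build_chroot(build_id: int, chroot: str, settings: dict) -> None:
    # The whole settings dict goes into a single text column as JSON.
    conn.execute(
        "INSERT INTO build_chroot (build_id, chroot, data) VALUES (?, ?, ?)",
        (build_id, chroot, json.dumps(settings, sort_keys=True)),
    )


def load_build_chroot_data(build_id: int, chroot: str) -> dict:
    row = conn.execute(
        "SELECT data FROM build_chroot WHERE build_id = ? AND chroot = ?",
        (build_id, chroot),
    ).fetchone()
    return json.loads(row[0]) if row else {}


save_build_chroot(1, "fedora-rawhide-x86_64", {
    "repos": ["https://example.com/extra-repo/"],
    "additional_packages": [],
    "use_bootstrap_container": False,
})
print(load_build_chroot_data(1, "fedora-rawhide-x86_64"))
```

With a blob, adding a fourth setting later needs no schema migration. The trade-off is that the individual values cannot be queried or constrained as easily as separate columns.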
@frostyx
The resubmit feature should work as is.
Ack.
However, the copr-rpmbuild should imho reproduce the build with same properties as the original build had.
I disagree, at least if it is going to be the only option. If I want to do a local build, I want to rebuild the package from copr-dist git against the current copr project configuration, not against the old one.
I wrote:
By that I mean that I'm not against "reproduce" feature at all, I just claim that majority of maintainers will want the "resubmit" action even locally. So the 'copr-rpmbuild' should support both "resubmit" (needed in > 90% cases), and also "reproduce" (needed in < 10% cases). (edit: that's my safe estimation, if I was talking only about my "maintainer" use-cases, I would want in 99.9% cases the "resubmit" feature)
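The distinction between the two actions could be expressed as in the sketch below. This is purely illustrative: Mode and settings_for_local_build are hypothetical names and do not correspond to any existing copr-rpmbuild option. "Resubmit" uses the current project configuration, while "reproduce" uses the values stored with the original build.

```python
from enum import Enum


class Mode(Enum):
    RESUBMIT = "resubmit"    # build against the current project settings
    REPRODUCE = "reproduce"  # build against the settings stored with the build


def settings_for_local_build(mode: Mode, current_settings: dict,
                             stored_settings: dict) -> dict:
    # Return a copy so the caller cannot modify either source by accident.
    if mode is Mode.REPRODUCE:
        return dict(stored_settings)
    return dict(current_settings)


current = {"repos": []}
stored = {"repos": ["https://example.com/extra-repo/"]}

print(settings_for_local_build(Mode.RESUBMIT, current, stored))
print(settings_for_local_build(Mode.REPRODUCE, current, stored))
```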
Metadata Update from @msuchy: - Issue tagged with: RFE
Metadata Update from @frostyx: - Assignee reset
This issue has been migrated to GitHub: https://github.com/fedora-copr/copr/issues/144
Metadata Update from @nikromen: - Issue close_status updated to: MIGRATED - Issue status updated to: Closed (was: Open)