#2 WIP: design and workflow proposal
Opened 3 years ago by csomh. Modified 3 years ago
fedora-source-git/ csomh/docs design-and-workflow  into  main

file added
+142
@@ -0,0 +1,142 @@ 

+ # Source-git design

+ 

+ ## About source-git

+ 

+ The content of a source-git repository is equivalent to dist-git, but uses the

+ upstream format. This means that upstream sources are present as upstream Git

+ history, not as an archive created from upstream Git history as in dist-git.

+ 

+ With this setup we aim to allow package maintainers to use modern Git-tools to

+ patch and update their packages.

+ 

+ Source-git repositories are meant as an add-on on top of dist-git. This means

+ that while dist-git will continue to be the authoritative place for source

+ code, work can be done in source-git and tooling will help with keeping the

+ two places in sync.

+ 

+ ## Design goals

+ 

+ ### Compatibility with dist-git

+ 

+ ### Consistency across packages

+ 

+ ### Respect Fedora Packaging Guidelines

+ 

+ ## Layout and structure

+ 

+ ## Workflow

+ 

+ ### Creating a source-git repository

+ 

+ In order to construct a source-git repository the following steps need to be

+ taken:

+ 

+ 1. Pick the upstream revision (version tag, branch or commit) in the upstream

+    Git repository which serves as the base for downstream development and

+    start a downstream branch from it.

+ 2. Take packaging related files (like spec-file, scripts), place them in a

+    `.distro` subdirectory and commit them.

+ 3. Apply the downstream patches (found in the corresponding dist-git

+    repository, at a selected revision) as defined in the spec-file as Git

+    commits.

+ 

+ Running a command similar to the one below would perform the steps above. Its

+ inputs would be the upstream repository and one of its revisions, plus the

+ corresponding dist-git repository and a selected revision of it. Each

+ repository would be specified as a local path or URL.

+ 

+     $ packit init --upstream-url <upstream-git-url> --upstream-ref <ref> \

+                   --dist-git-path <path> --dist-git-branch <branch>
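For illustration, here is a minimal sketch of the three steps done by hand with plain Git, in a throwaway directory. Everything in it is a hypothetical stand-in: the `hello` package, its one-line spec file, the `0001-greet-fedora.patch` file and the `f34` branch name are assumptions made for the demo, not part of the proposal. Patches are applied with `git apply` because dist-git patches are often plain diffs rather than mailbox-format patches.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Hypothetical upstream repository with one tagged release.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/main
echo 'hello' > hello.txt
git add hello.txt
git commit -q -m 'Initial upstream commit'
git tag v1.0

# Hypothetical dist-git content: a spec file and one downstream patch (plain diff).
mkdir ../dist-git
echo 'hello, Fedora' > hello.txt
git diff > ../dist-git/0001-greet-fedora.patch
git checkout -q -- hello.txt
cd ..
printf 'Name: hello\nVersion: 1.0\nPatch1: 0001-greet-fedora.patch\n' > dist-git/hello.spec

# Step 1: start the downstream branch from the chosen upstream revision.
git clone -q upstream source-git
cd source-git
git checkout -q -b f34 v1.0

# Step 2: commit the packaging files under .distro.
mkdir .distro
cp ../dist-git/hello.spec .distro/
git add .distro
git commit -q -m 'Add packaging files'

# Step 3: turn each downstream patch into a Git commit.
for patch in ../dist-git/*.patch; do
    git apply --index "$patch"
    git commit -q -m "Apply $(basename "$patch")"
done

git log --oneline
```

A real tool would read the patch order from the spec file instead of relying on the glob order used here.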

+ 

+ ### Introducing a downstream change

+ 

+ Creating a change downstream (in the distribution) would be similar to

+ creating a change in the upstream repository:

+ 

+ 1. Start with the downstream branch to be updated.

+ 2. Do the change or cherry-pick commits from upstream branches.

+ 3. Commit, push and open a PR.

I'd appreciate it if steps 1, 2 and 3 explicitly stated that the operations happen in the source-git repository, which may not be obvious to a casual reader.

+ 4. Automation takes the source-git PR, transforms it and opens a mirror PR in

+    dist-git. This way CI results produced in dist-git could be mirrored back

+    to source-git. Automation keeps the dist-git PR up to date as the

+    source-git PR is updated.

+ 5. After approval the source-git PR is merged. TODO(csomh): this needs to be

+    better defined, as there might be some non-trivial orchestration needed in

+    order to ensure consistency and avoid race-conditions.

+ 

+ ### Rebasing a distribution branch to a new upstream version

+ 

+ Downstream (distribution) history might diverge from

The same as Tomas's comment above - I'd explicitly state that this means source-git, because it took me quite a while to realize that distribution != dist-git

+ upstream, and so some packages might have a need to rebase these downstream

+ branches.

+ 

+ These rebases can potentially be "hard case" rebases, where conflicts need to

+ be resolved during the rebase.

+ 

+ Tooling should help contributors to review and test such rebases, and to

+ recover their working branches after a rebase.

+ 

+ #### Proposing rebases

+ 

+ Proposing a rebase to a new rev of upstream would be done using a dedicated

+ branch. Lacking a better name, let’s call them rebase-proposal branches.

+ 

+ Rebase-proposal branches would be named as `<distro-token>-<ascending-index>`

+ or(?) `<distro-token>-<ascending-index>-<upstream-version>`. For example:

+ `f34-0000001` or `f34-0000001-2.23.4.g1233`.

I'd say that having a branch for (every) upstream release is crucial here, and that's how teams that use a source-git style of work already operate.

Just wondering what customization we want to enable because I presume the naming and branch hygiene would differ between projects.

I'd personally vote to make the naming completely up to users while providing best practices: basically enabling people to shoot themselves in the foot while giving them a manual on how not to do that :)

+ 

+ These branches would be kept "forever" and would serve as a way to track the

+ history of the rebases. This would be achieved by relying on the

+ `<ascending-index>` component of the branch names. Tooling would take care of

+ implementing this index in a consistent manner.

+ 

+ The `<upstream-version>` component of the branch names could make more obvious

+ which upstream rev was the parent of the rebase-proposal branch. Not sure if

+ this would be actually useful or something wanted.

+ 

+ The process would be as follows:

+ 1. Checkout a rebase-proposal branch from the downstream distribution branch.

+ 2. Rebase the rebase-proposal branch to the new upstream revision.

+ 3. Push the rebase-proposal branch and open a PR.

+ 

+     $ git fetch upstream

+     $ git checkout -b f34-0000001-2.23.4.g1233 f34

checkout -b is semi-deprecated, please use switch -c.

+     $ git rebase v2.23.4 f34-0000001-2.23.4.g1233

+     $ git push -u origin f34-0000001-2.23.4.g1233

+ 

+ Some of the commands above could be wrapped in an rpkg clone. Something like:

+ 

+     $ fedpkg propose-rebase v2.23.4 f34

+ 

+ Open a PR and review. Label the PR with "rebase". This blocks the PR from being

+ merged, while it can still be used for reviews and testing. After the rebase is

+ approved:

+ 

+     $ git checkout f34

+     $ git reset --hard f34-0000001-2.23.4.g1233

+     $ git push --force-with-lease source-git f34

git checkout f34

git switch f34

git reset --hard f34-0000001-2.23.4.g1233 && git push --force-with-lease source-git f34

That seems very wrong. "History rewriting" is generally only OK when it stays in a local repo,
or when you have a very small group of people who are in constant communication and
can shout "rebasing now!" across the room (or do the IM equivalent). When you have a looser
group of committers, history rewrites become extremely confusing: local patches don't
apply anymore but you don't know where exactly to rebase, you refer to commits in other
places but they are gone, etc.

Then there's the problem that people who are not git experts (i.e. most people) tend
to lose work sooner or later when history rewriting and force pushes are employed.

Instead, a solution that preserves history should be found.
We want something like this:

git switch f34
git merge --strategy=theirs f34-0000001-2.23.4.g1233 -m 'Update to 2.23.4.g1233'

i.e. we generate a commit where the contents are taken exactly from their branch,
but history is preserved and the primary parent is the previous commit on f34.

Sadly, there is --strategy=ours, but no --strategy=theirs.

https://blog.holisticon.de/2021/03/git-merge-strategy-theirs/ suggests the following:

our=f34
their=f34-0000001-2.23.4.g1233
version=2.23.4.g1233

git checkout $our
COMMIT_ID=$(git commit-tree -p HEAD -p $their -m "Update to $version" $their^{tree})
git merge --ff-only $COMMIT_ID

Maybe we could convince git folks to add -s theirs, but even without that the workaround
is not too terrible. We could stash it in fedpkg.

With this, you can look at history and it is meaningful:

$ git diff HEAD~ '**.spec'

and so on.

Once that's in place, the whole procedure of accepting force-pushes when (conditions)
is not necessary anymore. The update is an ff merge.
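To make the workaround concrete, here is a minimal sketch that runs it end to end in a throwaway repository. The upstream history, the `v1.2`/`v1.4` tags and the `f34-0000001-1.4` branch are made up for the demo. The tree argument is passed to `git commit-tree` first, which follows the synopsis in the Git documentation and is otherwise equivalent to the snippet above.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main

# Hypothetical upstream history with two releases.
echo '1.2' > VERSION
git add VERSION
git commit -q -m 'Release 1.2'
git tag v1.2
echo '1.4' > VERSION
git commit -q -a -m 'Release 1.4'
git tag v1.4

# Downstream branch based on 1.2 carrying one downstream commit.
git checkout -q -b f34 v1.2
echo 'fix' > downstream-fix.txt
git add downstream-fix.txt
git commit -q -m 'Downstream fix'
old_tip=$(git rev-parse f34)

# Rebase-proposal branch: the downstream commit replayed on top of 1.4.
git checkout -q -b f34-0000001-1.4 f34
git rebase -q v1.4

# "Theirs" merge: a commit whose tree is exactly the proposal's tree,
# whose first parent is the old f34 tip and whose second parent is the proposal.
git checkout -q f34
their=f34-0000001-1.4
commit_id=$(git commit-tree "$their^{tree}" -p HEAD -p "$their" -m 'Update to 1.4')
git merge -q --ff-only "$commit_id"

git log --oneline --graph f34
```

Because the new commit descends from the old tip, a plain `git push` of `f34` would be accepted without forcing, and `git diff HEAD~` shows what the update changed.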

+ 

+ The Git-forge (i.e. Pagure) would only accept a force push for the f34 branch

+ if the target commit is flagged as tested and approved for rebases and the

+ target commit is the head of a "rebase-proposal" branch.

+ 

+ #### Rebasing WIP after a distribution branch rebase

+ 

+ Recreate the local f34 branch and fetch all other branches from source-git.

+ 

+ On a feature branch:

+ 1. Find the nearest "rebase proposal" branch.

+ 2. Rebase commits since that branch onto f34.

+ 3. Force push.
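A possible reading of steps 1 and 2, sketched in a throwaway repository. It assumes the zero-padded `<distro-token>-<ascending-index>` naming, so that the "nearest" rebase-proposal branch is the highest-numbered `f34-*` branch that is an ancestor of the feature branch. The branch names, tags and file contents are hypothetical, and the force push of step 3 is left out because the demo has no remote.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main

# Hypothetical upstream history with two releases.
echo '1.2' > VERSION
git add VERSION
git commit -q -m 'Release 1.2'
git tag v1.2
echo '1.4' > VERSION
git commit -q -a -m 'Release 1.4'
git tag v1.4

# Old state: f34 equals the first rebase-proposal branch.
git checkout -q -b f34 v1.2
echo 'fix' > downstream-fix.txt
git add downstream-fix.txt
git commit -q -m 'Downstream fix'
git branch f34-0000001 f34

# Work in progress started from the old f34.
git checkout -q -b feature f34
echo 'wip' > wip.txt
git add wip.txt
git commit -q -m 'WIP change'

# The distribution branch gets rebased to 1.4 through a second proposal branch.
git checkout -q -b f34-0000002 f34
git rebase -q v1.4
git branch -f f34 f34-0000002

# Step 1: nearest rebase-proposal branch contained in the feature branch.
git checkout -q feature
base=$(git for-each-ref --count=1 --merged=feature --sort=-refname \
    --format='%(refname:short)' 'refs/heads/f34-*')

# Step 2: replay only the commits made since that branch onto the new f34.
git rebase -q --onto f34 "$base" feature

git log --oneline --graph feature
```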

+ 

+ ### Building a new release

+ 

+ This is just a bump in the release number. Doing it through source-git will

+ introduce unnecessary friction: time will be spent to sync the change to

+ dist-git, and only then the build can be triggered. How can this be improved?

+ 

+ ## Tooling

+ 

+ ### Local (CLI)

+ 

+ ### Remote (aka services)

There are still many things to figure out, but I'll just put it up here for visibility.

I'd say that having a branch for (every) upstream release is crucial here and that's how teams already work who use source-git style of work.

While this is common, this may not be optimal for every project. For instance, kernel has branches for every upstream base, but not specific stable releases. There is a fedora-5.11 branch which has included 5.11.0-5.11.20 and counting. We can do it this way because it ties in well with upstream and we rebase stable Fedora so that a single release is built across all stable Fedora releases. What happens for upstreams that do not release as often as kernel, and do not necessarily backport major changes within a release? You may have a 1.0 that was used for Fedora 32, another 1.0 which also contains a new feature from development and is only used on Fedora 33, and yet another which has everything from Fedora 33 and a couple more changes for Fedora 34 which were not backported to f33 or f32. In this case, having a 1.0 branch would make less sense than having fedora32 fedora33 and fedora34 branches. Every package should have some sort of convention here which is optimal for that package, and that should be documented within their repository, but should not be dictated by the SIG.

@jforbes thanks for providing the kernel context

I should have elaborated a bit more on my point. I see source-based maintenance as being different from dist-git in that one wants to track upstream releases and then decide where and how they land in actual downstream branches (as an example, here's how python-maint does it.). You are right that this scheme is going to vary between projects and packages: for some, it may be as easy as having one branch for a major release and just syncing it to as many downstream branches as possible. For others, it will be more complex.

This conversation just proves to me that we need to have the best practices documented and cover several common scenarios:

  • release often, no backports (packit, cockpit)
  • release often, with backports (kernel, systemd)
  • release infrequently
  • "sane default"

The suggestion:

"Rebase-proposal branches would be named as
<distro-token>-<ascending-index> or(?)
<distro-token>-<ascending index>-<upstream-version>.
For example: f34-0000001 or f34-0000001-2.23.4.g1233."

makes sense as a way to review rebases, but unless I'm misreading, it seems like the merges into <distro-token> lose history.

Consider initially we have an 'f34' branch with version 1.2 inherited from the 'f33' branch. We do a few more commits in 'f34', then we decide to rebase to 1.4.

So we create an f34-000001-1.4 branch to review a rebase to 1.4. This gets reviewed, approved, and force pushed into 'f34'.

This loses the history of commits made between 'f33' and 'f34'.

We do a few more commits on 'f34' to fix issues with 1.4 and then we rebase to 1.6, via an f34-000002-1.6 branch which gets force pushed into f34. This again loses the history of the fixes on 'f34' in between the 1.4 and 1.6 rebases.

The fact that the <distro-token>-<ascending index> branches are kept forever doesn't help us, because that only preserves the rebase commit, not the fixes made on <distro-token> in between rebases.

The only way to preserve history is if we never commit to <distro-token> branch at all. Only ever use the versioned branches <distro-token>-<ascending index> for all commits. src-git would not even need a <distro-token> branch - dist-git would always just sync from the latest <distro-token>-<ascending index>

In our CentOS Stream / RHEL maint work, we have a slightly different approach. All work is done on the <distro-token> branch. When the time comes to rebase, the content of the <distro-token> branch is copied into a new branch <distro-token>-<old-upstream-version>. Now <distro-token> is force pushed with the new upstream content. This doesn't allow for review of the rebase force push though. If the rebase is complex enough, we do sometimes create a <distro-token>-<new-upstream-version> branch in a contributor's fork in order to do review, before it then gets force pushed. I'm not saying this is a perfect workflow, but at least the history is always preserved.
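As a rough illustration of the archive-then-force-push approach described above, here is a sketch against a local bare repository standing in for the forge. The remote, the `f34-1.2` archive branch name and the upstream history are assumptions for the demo only, not the actual CentOS Stream / RHEL tooling.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git init -q "$work/clone"
cd "$work/clone"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$work/origin.git"

# Hypothetical upstream history with two releases.
echo '1.2' > VERSION
git add VERSION
git commit -q -m 'Release 1.2'
git tag v1.2
echo '1.4' > VERSION
git commit -q -a -m 'Release 1.4'
git tag v1.4

# All downstream work happens on the distribution branch.
git checkout -q -b f34 v1.2
echo 'fix' > downstream-fix.txt
git add downstream-fix.txt
git commit -q -m 'Downstream fix'
git push -q origin f34
old_tip=$(git rev-parse f34)

# Before rebasing, archive the current content under the old upstream version.
git branch f34-1.2 f34
git push -q origin f34-1.2

# Rebase the distribution branch to the new upstream version and force push it.
git rebase -q v1.4 f34
git push -q --force-with-lease origin f34
```

The old downstream commits stay reachable through `f34-1.2` on the remote even though `f34` itself was rewritten.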
