#4072 Running git hooks on server side
Closed: Won't Fix 3 months ago by pingou. Opened 6 months ago by bkabrda.

Running git hooks (to be more precise the dynamic authentication plugin and subclasses of `BaseRunner) in sshd has several disadvantages:

  • requires a lot of CPU to execute hooks fast (because of importing and initializing the whole Pagure machinery) for sshd and also for workers, which execute hooks when e.g. merging PRs
  • requires that the sshd service actually contains Pagure and all of its libraries (this relates to running in OpenShift as somewhat described in #3555)

The solution that I'd like to propose is that we make hookrunner a very lightweight process that only gathers basic info and triggers an internal endpoint of Pagure server. Running the hooks inside Pagure server will actually be pretty cheap, since it has all the libraries and connections initialized and should be able to execute the hooks very fast.

After some poking into this, I'd split this into two parts:

  • passing an additional writer argument to all hooks and writing to that object instead of using print directly
  • actually implementing the endpoint and making hookrunner trigger it

The advantage of the two-step approach is that even if we later figure out that the approach is generally not viable, we can still use the writer object to better control what/when/how gets printed to user instead of letting hooks print stuff randomly (if we ever do need to control this).


Metadata Update from @pingou:
- Issue tagged with: RFE

6 months ago

Would we need one or two writers then? (stdout vs stderr)

Also, I'm wondering if the first step isn't decoupled from the second (and thus we could separate both issues)

@pingou good point with 2 writers. It'd be possible to just pass a writer object that would have out() and err() methods, but I think this might be a problem, We'd need to record when every line was emitted to be able to print them to user in the correct order. Since user can't really tell which lines are printed to stdout and which to stderr, I'd suggest that we create just writer.out() method and leave the space open for implementing the writer.err() method if we ever see a usecase for it (or we make err an alias of out in our default implementation).

And yes, it was exactly my point to separate the two issues, even though the second one depends on the first.

Metadata Update from @bkabrda:
- Issue untagged with: RFE

6 months ago

Metadata Update from @pingou:
- Issue tagged with: RFE

6 months ago

I'm not quite sure the second depends on the first, the second could just print to the user the output of the API call

Well if the API endpoint has no means of gathering the output of hooks it runs, it can't return the output to hookrunner and that can't display it to user. Technically the tasks don't depend on each other in terms of hook execution, but they do in terms of getting proper feedback to user.

Metadata Update from @bkabrda:
- Issue untagged with: RFE

6 months ago

Well if the API endpoint has no means of gathering the output of hooks it runs, it can't return the output to hookrunner and that can't display it to user.

Duh! You are of course right :)

Metadata Update from @pingou:
- Issue tagged with: RFE

6 months ago

The problem I can see with this is a matter of scaling. You'd bottleneck on the central pagure server, which you can only have one instance of. How do we deal with needing the handle more webhook runs?

@ngompa I've previously successfully scaled Pagure server to more replicas (it's possible if you use redis-based sessions). Is there any new issue with scaling up server that I don't know about?

Metadata Update from @pingou:
- Issue set to the milestone: 5.3

5 months ago

Metadata Update from @karsten:
- Issue assigned to karsten

4 months ago

So I'm going to try to explain why after discussing this with @puiterwijk we believe this is not going to be something that we will be able to fix.

Basically, with repoSpanner the git hook does not have access to the git objects being pushed at the time they run, which means we are not able to rely on them existing, if we create the API endpoint to accept enough information to act, then we end up with basically sending the entire git objects to the API endpoint.

In addition, from the discussion we had it turns out that most of the hooks we have should be moved from update to post-receive and most of the pre-receive should become update (but we can fix this in another commit/PR later).

So I'm going to close this ticket as Won't Fix since it's not something we can fix :(

Metadata Update from @pingou:
- Issue close_status updated to: Won't Fix
- Issue status updated to: Closed (was: Open)

3 months ago

Login to comment on this ticket.

Metadata