#11641 Commits don't end up on the scm-commits list
Opened 6 months ago by abompard. Modified 6 days ago

The old FMN was used to watch all commits on src.fedoraproject.org and send them to the scm-commits@lists.fedoraproject.org list.
Since I took down the old FMN, there are no more emails there. ;(

This list was super high volume, but it was a way to have commits for all packages transparently go out and allow auditing from 3rd parties, etc. It would be super nice if we could make this work again.

I think it would be a nice exercise for someone who wants to learn writing a fedora-messaging consumer. Maybe a toddler would be better btw.


Metadata Update from @zlopez:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: dev, medium-gain, medium-trouble

6 months ago

Probably a good ticket to bring up on I&R standup.

OK, so let me make a summary of this, after a week of work or so. The solution is composed of 3 elements:
1. A git hook running on the server that will send messages on push. This hook already exists.
2. A schema for the messages sent by the git hook. This is what will be used to generate the Subject and the body of the email. This work used to be done by fedmsg_meta_fedora_infrastructure/scm.py on fedmsg.
3. A fedora-messaging consumer that will receive the messages, use the schema to generate the email components, and send the email. One process to rule them all, and in the darkness of the datacenter, bind them.
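
To make Parts 2 and 3 a bit more concrete, here is a minimal sketch of the "message in, email out" step. It uses only the Python standard library and stands in for the real schema and consumer; the body fields (`agent`, `repo`, `branch`, `commit`), the sender address, and the function name `build_commit_email` are all assumptions for illustration, not the actual schema.

```python
from email.message import EmailMessage

LIST_ADDRESS = "scm-commits@lists.fedoraproject.org"

# Hypothetical message body: the field names in the real schema may differ.
SAMPLE_BODY = {
    "agent": "abompard",
    "repo": "rpms/python-example",
    "branch": "rawhide",
    "commit": {
        "rev": "0123456789abcdef0123456789abcdef01234567",
        "summary": "Update to 1.2.3",
    },
}


def build_commit_email(body, sender="notifications@example.com"):
    """Turn one commit message body into an email for the list.

    In the real setup the schema would provide the Subject and the
    text; here two f-strings play that role (an assumption).
    """
    commit = body["commit"]
    msg = EmailMessage()
    msg["From"] = sender  # placeholder sender address
    msg["To"] = LIST_ADDRESS
    msg["Subject"] = f"[{body['repo']}] {body['branch']}: {commit['summary']}"
    msg.set_content(
        f"Repository: {body['repo']}\n"
        f"Branch: {body['branch']}\n"
        f"Pushed by: {body['agent']}\n"
        f"Commit: {commit['rev']}\n\n"
        f"{commit['summary']}\n"
    )
    return msg


if __name__ == "__main__":
    print(build_commit_email(SAMPLE_BODY))
```

Actually sending the result (for example with `smtplib`) is left out on purpose, since the mail relay settings are deployment-specific.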

Part 3 could be done by anyone new to fedora-messaging and Toddlers, it's simple enough I think. I'm happy to give some mentoring on the fedora-messaging part (I haven't ever written a Toddler yet so I don't know about that part).

Part 1 is... trickier. The git hook is actually installed by Ansible, and has two versions, slightly different from each other, one adapted to Python 3 but not the other, one supporting namespaces but not the other. It's a bit of a mess. And hard to evolve into something that would use schemas as required by Part 2. Let's try to minimize the amount of raw code that we deploy with Ansible, please.

I've taken those git hooks and tried to merge them into a proper repo, retaining the features and the Python 3.6 compatibility (wasn't easy, but Pagure is currently running on Python 3.6), and adding some unit tests and QA tools at the same time.

I've also written the schemas required by Part 2 in another repo.

I don't think we want to deploy code directly from Github on pkgs01 and batcave, so I had those two be packaged into RPMs by Packit into a Copr repo: https://copr.fedorainfracloud.org/coprs/abompard/fedora-messaging-git-hook/.

I'm going to rebuild the SRPMs from this repo into Koji's infra tag each time there's an update in the code. It's an extra step, but this is pretty sensitive code: it runs on batcave and dist-git on each commit and has full access to the repos. Or do you think it's not necessary?

I'll try on staging today and hopefully it'll work out fine enough. We can work on the consumer (Part 3) in parallel.

This all seems kinda complex when it could just send email from the hook, but makes sense. :(

One thing I am a bit worried about with this and toddlers is that it will get a heavy processing flow... if we have a bunch of things in toddlers we need to make sure a problem with one toddler doesn't cause all of them to stop processing... or slow down.

> This all seems kinda complex when it could just send email from the hook, but makes sense. :(

Agreed, if the hook has no other use than sending an email to scm-list, then it's very overkill. But I think we want a fedora-messaging message to be sent when a commit is received nonetheless, so that people can get those from FMN, have it be recorded in datanommer, etc.

> One thing I am a bit worried about with this and toddlers is that it will get a heavy processing flow... if we have a bunch of things in toddlers we need to make sure a problem with one toddler doesn't cause all of them to stop processing... or slow down.

Good point. I haven't looked at the Toddlers code, but I would bet that processing of a message starts only when the processing of the previous message has finished, so one slow toddler will slow down the whole processing. We should however be able to run multiple Toddlers pods in parallel, and they would not be waiting on each other.

One more thing: Pagure sends a message on each push, so for dist-git we actually have two messages from the same action. Not a big deal, but maybe a bit superfluous.
We still need this git hook for the batcave repos (such as ansible), though.

toddlers is actually very good at scaling horizontally. I've had times where it
was stuck due to messages not being in the expected format and the toddler was
thus crashing. I just bumped the number of pods to 50 and let it sit over the
weekend and waited for the next week to make the code change that allowed the
toddler to handle these messages.

So if you're worried about processing being slow, just increase the number of
pods.
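
For the "messages not being in the expected format" case, here is a minimal sketch of a guard that skips a malformed message instead of crashing on it. This is not the toddlers API: the `process` function, the logger name, and the required field names are hypothetical and only illustrate the idea.

```python
import logging

log = logging.getLogger("toddler-sketch")

# Hypothetical field names: the real message schema may differ.
REQUIRED_FIELDS = ("repo", "branch", "commit")


def process(body, handler):
    """Run handler on one message body, skipping malformed messages.

    Returns True if the message was handled, False if it was skipped.
    """
    missing = [field for field in REQUIRED_FIELDS if field not in body]
    if missing:
        # Log and move on, so one bad message does not block the queue.
        log.warning("Skipping message, missing fields: %s", ", ".join(missing))
        return False
    handler(body)
    return True
```

Skipping with a warning keeps the queue moving; whether dropping such messages is acceptable, or they should be parked somewhere for later, is a separate decision.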

@abompard Any news here? ;( It's been down a while now... makes me sad. ;(

Metadata Update from @abompard:
- Issue assigned to abompard

4 months ago

Writing a toddler is still up for grabs for anyone who wants to learn how to do that.
That said, no one wants a sad Kevin, so I'll have a look hopefully next week.

@abompard any news here? The lack of emails... so sad. ;)

Got bogged down by mirrormanager, sorry.

The new distgit_commit_processor is merged, but needs new configuration to be merged in Ansible and deployed before it can be tested in staging.

@zlopez thanks for the review!

@kevin I would deploy the changes myself, but I don’t think I’ve ever deployed on our OpenShift before, and I’m not sure if there’s anything I need to keep in mind (compared to non-OpenShift playbooks). Also, no idea how to test the change, for largely the same reason. 😉

@nphilipp The playbooks are being run the same way, you just need to check how the build and deployment finished in openshift.

Yeah... for openshift apps ansible runs and templates things to os-control01 or os-control01.stg (or both) and then runs an 'oc apply -f ...' with some validation to make the thing exist in openshift.

Then use the normal console or oc to debug and such.

Testing something like this is kinda hard. We could deploy to staging, but there's not much traffic there, although I suppose doing a commit to a package on src.stg should result in a commit email going out?

Metadata
Boards 1
dev Status: Backlog