#41 support triggering tasks based on github PRs
Closed: Fixed 6 years ago Opened 6 years ago by kparal.

@jscotka wants to run https://pagure.io/task-mtf-containers on each github PR for a set of projects. Please add this functionality to trigger, to be able to run tasks on github PRs.

The documentation for fedmsgs from github PRs are here:
https://fedora-fedmsg.readthedocs.io/en/latest/topics.html#github-pull-request-closed
(and the sections below this one).

I imagine we'd have to create a new item+type for this, something like type: git_pr and item: https://github.com/fedora-infra/bodhi/pull/1799.


Sure thing, seems easy enough. Just to be sure. The item would be the msg->pull_request->html_url value, right?
What other data is important/interesting for the purposes of triggering? How would the "set of projects" be defined? msg->repository->full_name seems like a reasonable candidate, to me, but it really depends on what (and how) is triggered, and that is something @jscotka needs to clarify.

Metadata Update from @jskladan:
- Issue assigned to jskladan

6 years ago

Hi, yes, this sounds like good candidate.
My idea is then for example do it with asterisks if possible or another matching:
this is example: https://github.com/container-images/memcached/
msg->repository->full_name: container-images/memcached
msg->pull_request->html_url: https://github.com/container-images/memcached/pull/12
I would like to schedule it for all containers under this structure: container-images/* (or similar syntax)

It reminds me if we would like to have it github specific, then this full name is okay, in case there will be possible to do it with similar gitlab repos, then probably more unique will be: msg->repository->html_url like https://github.com/container-images/memcached/

From my perspective I'm interested just in github, so collision should not happen, in case there will be added more similar git* then more unique name should be used.

Good point with the git source, @jscotka. Especially with the item/type combo @kparal suggests, which implies some level of general usage out of github (for example Pagure PRs).

So, to put things in the trigger-perspective, we have a rules-engine, that gets configured with stuff like this:

- when:
    message_type: KojiBuildPackageCompleted
    name:
      $in: ${critpath_pkgs}
      $nin: [firefox, kernel]
  do:
    - {tasks: [abicheck]}

Where the important part (for this ticket) is the when clause, which matches data grabbed from the received fedmsg (here it is the message type, and package name, though the trigger also provides stuff like version, release or distgit_branch).

IIUIC the rule for what @jscotka needs would look something like this:

when:
  message_type: PullRequestCreated
  git_host: github
  repo_full_name: {$regex: '/^container-images\/.+/'}
do:
  - {tasks: [task-mtf]}

That checks the full_name of each "pull request created" message from github against the regexp, and schedules the task-mtf-containers task.

We have an issue though - I might be blind, but I can't find the FedMsg spawned for the "pull request is created" event on gitihub. There are:
- github.pull_request.closed whenever someone comments on a pull request.
- github.pull_request_review
-- whenever someone approves a pull request.
-- whenever someone comments on pull request review but it’s in pending state and only visible to commenter.
-- whenever someone requests changes on pull request review.
- github.pull_request_review_comment whenever someone comments on a pull request.

And I suppose that the task should be ran when a PR is created (in order to test it), not when it's closed. @jscotka, @kparal - do you see a fedmsg on which we could reasonably trigger? If not, then I suppose requesting a new fedmsg for when the PR is created is a must, before we can continue, @jscotka.

Watch this :-)
https://apps.fedoraproject.org/datagrepper/raw?topic=org.fedoraproject.prod.github.pull_request.opened

( Reported a bug: https://github.com/fedora-infra/fedmsg_meta_fedora_infrastructure/issues/449 )

Now, thinking more about this, we might need to do type=github_pr (or github_pull_request) instead of type=git_pr. Because there is no standardized API for working with PRs, it's service specific (you'll access github PRs in a different way than gitlab or pagure PRs). A task will implement support just for a specific one or more services.

So, if this is the case, it might make sense to use msg->pull_request->url, which gives a json that can be directly worked with:

type: github_pull_request
item: https://api.github.com/repos/fedora-infra/github2fedmsg/pulls/6

For example item.diff_url gives you a standard patch you can apply. So the task will know how to operate with this github-specific json and will do whatever it wants with it.

Wdyt?

Watch this :-)
https://apps.fedoraproject.org/datagrepper/raw?topic=org.fedoraproject.prod.github.pull_request.opened
( Reported a bug: https://github.com/fedora-infra/fedmsg_meta_fedora_infrastructure/issues/449 )

Cool!

Now, thinking more about this, we might need to do type=github_pr (or github_pull_request) instead of type=git_pr. Because there is no standardized API for working with PRs, it's service specific (you'll access github PRs in a different way than gitlab or pagure PRs). A task will implement support just for a specific one or more services.

Yeah, that was one of the things that I considered, but then again, and the reason for the git_host key in trigger. But on a general level, I don't think the git_pr or pagure_pr types are necessary. The tasks will only be scheduled for the git-providers they can work with, and if somebody can make use of both pagure and gituhub (for example), and know they only get pagure or github PRs (because of trigger), the code in the task can easily use the right path for either.

This is also another reason for us to really sit down and consider the "we only pass item and type params to tasks" concept. I really, really believe we should not have those limits, and be able to (e.g.) pass stuff like git_source=github to the tasks, if needed. But that is for another ticket.

For example item.diff_url gives you a standard patch you can apply. So the task will know how to operate with this github-specific json and will do whatever it wants with it.
Wdyt?

Json looks great, I'm not necessarily against using that as the item, but it depends more on the task, than us, I think. @jscotka - what would you prefer?

Hmm, this might be a problem:

$ curl 'https://api.github.com/repos/fedora-infra/github2fedmsg/pulls/6'
{
  "message": "API rate limit exceeded for 213.175.37.10. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)",
  "documentation_url": "https://developer.github.com/v3/#rate-limiting"
}

Anyway, I just wanted to say that the json will surely contain even the "html url" of the PR, if the task wanted to use it for something. So it's a superset of information and probably the most useful thing we can give to the task.

As for the generic type=pull_request and limit the hosts in the trigger, yes, it's also a possible approach. Not sure which one is better but both should work.

@kparal - the reason I'm against github_pr, pagure_pr, whatever_pr types is consistency. Similarly to your issues with https://github.com/fedora-infra/bodhi/pull/1847 I don't really believe that we should be overcoming our limitations with creating multiple "this is PR" types. If we want to pass the "this is from github" or "this is from pagure" (with the current limitations), we already have it in the URL, and the tasks can parse it from there. Or if we don't thing matching (https?|git)://[^/]*github.com/ is viable in the tasks (I don't see a huge problem there, given we probably won't ever have more then a couple of git-pr-sources anyway), we could add the identifier to the item. But it IMO is redundant, especially since we don't even have tasks that want to consume PRs from multiple sources yet :)

Hi,
there is long discussion :-) but yep I agree with some generic type and then have host based filters for trigerring. example: type = pull_request item=https://github.com/container-images/memcached/pull/12 and filter like

when:
  message_type: pull_request
  git_host: github
  repo_full_name: {$regex: '/^container-images\/.+/'}
do:
  - {tasks: [task-mtf-containers]}

It seems like to be very generic and very friendly to my usecase, and to be scheduled on opened event (maybe also for closing one could work for both of them)

Otherwise URL https://github.com/container-images/memcached/pull/12 contains all information (it is github, full name container-images/memcached and that it is PR 12)

otherwise if it will work with pagure, this URL is also unique:
https://pagure.io/taskotron/task-modularity-testing-framework/pull-request/3 just changed pull to pull-request (any idea why this is different on github and pagure?)

@jscotka - are you sure this is going to be enough? As I understand it, that will only trigger on PR creation, not on updates or anything else?

I figure this is a good place to start but also wanted to point out that it may not be a complete end solution for the feature desired.

@tflink Sure, I want to have it trigerred on all PR events, changes updates, but as you mentioned, it is good start and then we can continue then.

I deployed a scratch build of taskotron-trigger and libtaskotron to dev, and github PRs are working!
https://github.com/container-images/dovecot/pull/8

This is now deployed to stg and working. I'm hoping to deploy this to production next week.

This has been deployed to production!
https://taskotron.fedoraproject.org/resultsdb/results?&testcases=dist.mtf-containers&since=2017-11-27
@jscotka now needs to create a fedmsg listener which will comment into their github and link to the logs:
https://pagure.io/task-mtf-containers/issue/2

Also we'll need to respond to github PR updates in the future. But that we can do in a followup ticket. Closing this.

Metadata Update from @kparal:
- Issue close_status updated to: Fixed
- Issue priority set to: Normal

6 years ago

Login to comment on this ticket.

Metadata