#84 Staging pipeline doesn't seem to be running anymore
Closed 4 years ago by pingou. Opened 4 years ago by pingou.

Looking at: https://jenkins-continuous-infra.apps.ci.centos.org/job/fedora-rawhide-stage-build-pipeline/ the CI pipeline for rawhide in staging doesn't seem to be running anymore.

I've made a few builds and bodhi updates since November 15th but that is the last build there.


There was an issue with the trigger, that I've just fixed. But there seems to be another issue.

The current issue appears to be that the default CI parameters are empty when a job gets triggered by a CI message: https://jenkins-continuous-infra.apps.ci.centos.org/view/Fedora%20All%20Packages%20Pipeline/job/fedora-stage-build-trigger/build?delay=0sec

@jimbair do you think it could be related to the plugin upgrade? I think the trigger jobs used in production don't set any parameters, so they are not affected by it...

@bgoncalv that seems likely given the timing. I do recall someone was tasked to update the plugin to add some data into the payload, so maybe this was it? I'll try to dig up the specifics for who was assigned the task.

The payload change was requested for the messages we send to Fedora-Messaging, but this case is about the trigger, and we still trigger on FedMsg (here, FedMsg stage).

@bstinson is it possible the upgrade may have affected FedMsg stage? That seems unlikely but not entirely impossible...

@bgoncalv none from my side - I'm not sure what the best method is to sort this out, though I am curious if the payloads generated from before/after the migration can be examined in some way? So we can see what, if anything, changed. We already fixed the unix time, so maybe there's something on the fedmsg side that's also been changed in a similar manner?

I'm not sure how to do that short of rolling back the update to see if things behave again. @bstinson any thoughts?

Or maybe stage Bodhi stopped sending messages to stage FedMsg?

While trying to figure this out, I found that bodhi did indeed have some issues in stg; hopefully they're fixed now.

I'll update this ticket ASAP :)

Ok, so I have a build, an update which was announced by bodhi: https://apps.stg.fedoraproject.org/datagrepper/id?id=2020-59d6de8c-4f48-4468-bc9d-55f845256ab4&is_raw=true&size=extra-large but I'm not seeing the corresponding CI run/messages for it (the NVR is: fedora-gather-easyfix-0.2.1-86.fc32)

Do you have more luck?
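
For anyone who wants to look up a message the same way, here is a minimal sketch that rebuilds the datagrepper link quoted above from a message id. The function name is hypothetical; the query parameters (`id`, `is_raw`, `size`) are simply the ones visible in that URL, and the base URL is the staging datagrepper from the same link.

```python
from urllib.parse import urlencode

# Staging datagrepper endpoint, taken from the link quoted in this ticket.
STG_DATAGREPPER_ID = "https://apps.stg.fedoraproject.org/datagrepper/id"


def datagrepper_id_url(message_id, base=STG_DATAGREPPER_ID):
    """Build a datagrepper lookup URL for a single message id.

    The helper name is made up for this sketch; only the query
    parameters shown in the ticket's URL are used.
    """
    query = urlencode({"id": message_id, "is_raw": "true", "size": "extra-large"})
    return f"{base}?{query}"


if __name__ == "__main__":
    print(datagrepper_id_url("2020-59d6de8c-4f48-4468-bc9d-55f845256ab4"))
```

Passing a different `base` (for example the production datagrepper URL that appears later in this ticket) gives the equivalent production lookup.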

The build gets triggered, but it seems that since the plugin upgrade the default parameters we trigger the build with are empty, and this causes it to fail on the stage env.

https://github.com/CentOS-PaaS-SIG/upstream-fedora-pipeline/blob/master/JenkinsfileStageBuildTrigger#L30

https://jenkins-continuous-infra.apps.ci.centos.org/job/fedora-stage-build-trigger/182/parameters/

I think we need @jimbair or @bstinson to check the plugin. Do we know what plugins got updated?

I reached out to @bstinson and he noticed that the staging pipeline is running the build but it's sending production messages:

https://jenkins-continuous-infra.apps.ci.centos.org/job/fedora-rawhide-stage-build-pipeline/176/console

<bstinson> so, the staging pipeline is running the koji build: https://koji.stg.fedoraproject.org/koji/taskinfo?taskID=90003965
<bstinson> but it's sending production messages:
<bstinson> 09:42:45 Message topic: org.centos.prod.ci.pipeline.allpackages-build.package.complete
<bstinson> and trying to download from production koji
<bstinson> 09:42:39 + koji download-task --arch=x86_64 --arch=src --arch=noarch --logs 90003965
<bstinson> 09:42:40 No such task: #90003965

So something, somewhere, is using prod when it should be using stage. Thoughts?
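
A quick way to spot this kind of mix-up in console logs is to pull the environment out of the message topic. This is only a sketch: it assumes the environment is the third dot-separated segment of the topic, which holds for the topics quoted in this ticket but is not a documented guarantee, and the function names are hypothetical.

```python
def topic_environment(topic):
    """Return the environment segment of a message topic.

    Assumption for this sketch: topics look like
    "<tld>.<org>.<env>.<rest...>", as in the topics quoted in this ticket.
    """
    parts = topic.split(".")
    if len(parts) < 3:
        raise ValueError(f"unexpected topic format: {topic!r}")
    return parts[2]


def is_from_environment(topic, expected_env):
    """True if the topic's environment segment equals expected_env."""
    return topic_environment(topic) == expected_env


if __name__ == "__main__":
    topic = "org.centos.prod.ci.pipeline.allpackages-build.package.complete"
    # A staging pipeline publishing this topic is using the prod setting.
    print(topic_environment(topic))
```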

It feels like this should be handled here:

https://github.com/CentOS-PaaS-SIG/upstream-fedora-pipeline/blob/master/src/org/centos/pipeline/PackagePipelineUtils.groovy#L331-L339

And we have "fedora-fedmsg-stage" and "FedoraMessagingStage" configured, so I'm not sure where the disconnect is...

As I mentioned in my first comment, the problem is with the trigger job (https://jenkins-continuous-infra.apps.ci.centos.org/view/Fedora%20All%20Packages%20Pipeline/job/fedora-stage-build-trigger/) and not with the pipeline job (https://jenkins-continuous-infra.apps.ci.centos.org/job/fedora-rawhide-stage-build-pipeline/).

In the trigger job we set parameters that override the default settings for the message provider and koji instance: the defaults are the production ones, and we change them to stage.
https://github.com/CentOS-PaaS-SIG/upstream-fedora-pipeline/blob/master/JenkinsfileStageBuildTrigger#L30

But for some reason (apparently since the plugin upgrade) these parameters are not being set in the trigger job when it is triggered by a CI message (https://jenkins-continuous-infra.apps.ci.centos.org/view/Fedora%20All%20Packages%20Pipeline/job/fedora-stage-build-trigger/217/parameters/).

If I manually try "Build with Parameters" it works:
https://jenkins-continuous-infra.apps.ci.centos.org/job/fedora-stage-build-trigger/222/
https://jenkins-continuous-infra.apps.ci.centos.org/job/fedora-rawhide-stage-build-pipeline/177/parameters/
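
To illustrate why empty trigger parameters end up pointing the staging job at production, here is a rough model of the behaviour described above. It is not the actual pipeline code: the parameter names (`MSG_PROVIDER`, `KOJI_ENV`) and the production values are made up for illustration; only the provider name "fedora-fedmsg-stage" appears in this ticket.

```python
# Hypothetical defaults: the real ones live in the pipeline repo.
PROD_DEFAULTS = {"MSG_PROVIDER": "fedora-fedmsg", "KOJI_ENV": "prod"}

# Values the stage trigger is supposed to pass down (names are assumptions).
STAGE_OVERRIDES = {"MSG_PROVIDER": "fedora-fedmsg-stage", "KOJI_ENV": "stage"}


def effective_settings(trigger_params):
    """Merge trigger parameters over the production defaults.

    Empty or missing values fall back to the default, which is the
    behaviour this sketch assumes for the downstream job.
    """
    settings = dict(PROD_DEFAULTS)
    for name, value in trigger_params.items():
        if value:  # "" or None means the parameter was not really set
            settings[name] = value
    return settings


if __name__ == "__main__":
    # Manual "Build with Parameters": overrides arrive, stage is used.
    print(effective_settings(STAGE_OVERRIDES))
    # Triggered by CI message after the upgrade: parameters arrive empty.
    print(effective_settings({"MSG_PROVIDER": "", "KOJI_ENV": ""}))
```

Under this model an empty provider never reaches an "unknown provider" branch, because the production default silently takes over.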

Sorry - I missed that part, :( But wouldn’t we hit the unknown provider block if it was empty?

https://github.com/CentOS-PaaS-SIG/upstream-fedora-pipeline/blob/master/src/org/centos/pipeline/PackagePipelineUtils.groovy#L338

But thanks for the follow up - I can look a bit more today as well and ping Brian again. :)

We had a long discussion in #fedora-ci this morning to work through this, so I'll give a lengthy summary here:

  • The current plugin is here: https://github.com/bstinsonmhk/jms-messaging-plugin/tree/fix-fedmsg-timestamp (1.1.9)
  • The 2 changes are for Fedora Messaging support (which was not yet merged at the time) and a timestamp bugfix for fedmsg
  • The pipeline was set up to trigger on fedmsg but to publish to FedoraMessaging
  • We opted to try and see if a FedoraMessaging trigger would fix the problem
  • @tflink was able to provide an example of a FedoraMessaging trigger, so @bgoncalv was able to update the staging config to use FedoraMessaging for the trigger
  • If the above fix works, we can upgrade to the latest stable module, re-test, and hopefully be good going forward
  • If the staging trigger CI message remains empty, there is a chance that 1.1.10's bugfix of "Message env var is not available in pipeline jobs (#143)" could solve our issue
  • To test, we would update jms-plugin and re-run
  • If still empty, the next step is to either upstream this change, or build our own one-off so we can still trigger off fedmsg and see if this fixes our issue https://github.com/bstinsonmhk/jms-messaging-plugin/commit/e24c97d117a33fb87680c5a7ac332a4243718b63

So hopefully the above gets us back to a working staging pipeline.

Pingou, if you can try another stage koji build so we can see if it triggers, that is our next step.

Thanks!

I've created a Jenkinsfile to test the trigger on Prod FedoraMessaging.

It triggered, but it still had the same issue with default parameters not being set.

Also, I've noticed the CI_MESSAGE changed.

example:

Fedora Messaging

{"deliveryTag":17,"msg":{"agent":"bodhi","artifact":{"builds":[{"component":"nano","id":1457844,"issuer":"kdudka","nvr":"nano-4.8-1.fc32","scratch":false,"task_id":41422439,"type":"koji-build"}],"id":"FEDORA-2020-b20bb0ca98-42150e1a8c4da1a9f1ddd1e75b05b21ac33da6f7","release":"f32","repository":"https://bodhi.fedoraproject.org/updates/FEDORA-2020-b20bb0ca98","type":"koji-build-group"},"contact":{"docs":"https://docs.fedoraproject.org/en-US/ci/","email":"admin@fp.o","name":"Bodhi","team":"Fedora CI"},"generated_at":"2020-02-07T12:05:36.924879Z","re-trigger":false,"version":"0.2.2"},"msg_id":"19eaa876-e0a8-49f4-9745-a6d662f84d75","timestamp":1581077137585,"topic":"org.fedoraproject.prod.bodhi.update.status.testing.koji-build-group.build.complete"}

FedMsg

{"version":"0.2.2","re-trigger":false,"agent":"bodhi","contact":{"docs":"https://docs.fedoraproject.org/en-US/ci/","team":"Fedora CI","name":"Bodhi","email":"admin@fp.o"},"artifact":{"release":"f32","type":"koji-build-group","id":"FEDORA-2020-b20bb0ca98-42150e1a8c4da1a9f1ddd1e75b05b21ac33da6f7","repository":"https://bodhi.fedoraproject.org/updates/FEDORA-2020-b20bb0ca98","builds":[{"nvr":"nano-4.8-1.fc32","task_id":41422439,"scratch":false,"component":"nano","type":"koji-build","id":1457844,"issuer":"kdudka"}]},"generated_at":"2020-02-07T12:05:36.924879Z"}

Basically the CI_MESSAGE from FedMsg is just a subset (the content of msg) of the message that FedoraMessaging sends. I'm not sure if this change was intended, because I'd expect the CI_MESSAGE to contain just the message, like FedMsg does.

This is the message in datagrepper: https://apps.fedoraproject.org/datagrepper/id?id=2020-19eaa876-e0a8-49f4-9745-a6d662f84d75&is_raw=true&size=extra-large
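
Until the format question is settled, a consumer could normalize both shapes. A minimal sketch, assuming the only difference is the envelope shown in the two examples above (payload nested under `msg`, next to `topic` and `msg_id`); the function name is hypothetical.

```python
import json


def extract_ci_payload(ci_message):
    """Return the CI payload from either CI_MESSAGE shape.

    If the message looks like the Fedora Messaging envelope from this
    ticket (a "msg" object alongside "topic"), return the inner "msg";
    otherwise assume it is already the bare payload, as with FedMsg.
    """
    data = json.loads(ci_message)
    if isinstance(data, dict) and "topic" in data and isinstance(data.get("msg"), dict):
        return data["msg"]
    return data


if __name__ == "__main__":
    # Shortened versions of the two examples above.
    wrapped = '{"deliveryTag":17,"msg":{"agent":"bodhi","version":"0.2.2"},"topic":"t"}'
    bare = '{"version":"0.2.2","agent":"bodhi"}'
    print(extract_ci_payload(wrapped) == extract_ci_payload(bare))
```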

Confirmed by my script:

12:18:45 - Retrieving update created                                                                  [DONE]
   Update automatically created : FEDORA-2020-b218270632
12:18:49 - Retrieving koji tags: ['f32-updates-candidate', 'f32-updates-testing-pending']             [DONE]
12:18:50 - bodhi to CI results in datagrepper returned  - ran for: 0s                                 [DONE]
12:33:52 - CI (running) results not found in datagrepper - ran for: 901s                              [FAILED]

Strange. Unless I misconfigured https://jenkins-continuous-infra.apps.ci.centos.org/view/Fedora%20All%20Packages%20Pipeline/job/fedora-stage-build-trigger/, we were not even able to trigger using FedoraMessaging Stage...

So it seems there are 3 problems being discussed here, so I opened separate issues for them :)

  1. In this issue we track the default parameters in the trigger not being set, which causes the stage job to not be triggered properly. This happens no matter which provider we use.

  2. Can't trigger on FedoraMessage Stage: https://pagure.io/fedora-ci/general/issue/94

  3. Problem with CI_MESSAGE format on FedoraMessaging: https://github.com/jenkinsci/jms-messaging-plugin/issues/165

This issue should be fixed now after updating jms-messaging-plugin to version 1.1.12.

@pingou can you confirm it?

Looks to be fine:

15:41:04 - bodhi to CI results in datagrepper returned  - ran for: 0s                                 [DONE]
15:41:42 - CI (running) results in datagrepper returned running - ran for: 37s                        [DONE]
15:48:32 - CI (complete) results in datagrepper returned error - ran for: 410s                        [DONE]

Now onto why greenwave in stg isn't sending messages :)

Thanks folks! :)

Metadata Update from @pingou:
- Issue status updated to: Closed (was: Open)

4 years ago
