Our pipeline can't send messages; the problem seems to have started at least 5 hours ago...
17:42:08 Sending message for job 'fedora-build-pipeline-trigger'.
17:42:08 FATAL: Unhandled exception in perform:
17:42:08 FATAL: java.lang.NullPointerException
17:42:08 	at com.redhat.jenkins.plugins.ci.messaging.RabbitMQMessagingWorker.sendMessage(RabbitMQMessagingWorker.java:278)
17:42:08 	at com.redhat.utils.MessageUtils.sendMessage(MessageUtils.java:133)
17:42:08 	at com.redhat.jenkins.plugins.ci.CIMessageNotifier.doMessageNotifier(CIMessageNotifier.java:145)
17:42:08 	at com.redhat.jenkins.plugins.ci.pipeline.CIMessageSenderStep$Execution$1.run(CIMessageSenderStep.java:197)
17:42:08 	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
17:42:08 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
17:42:08 	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
17:42:08 	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
17:42:08 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
17:42:08 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
17:42:08 	at java.base/java.lang.Thread.run(Thread.java:834)
17:42:08
17:42:08 exception in finally
https://jenkins-continuous-infra.apps.ci.centos.org/view/Fedora%20All%20Packages%20Pipeline/job/fedora-build-pipeline-trigger/462217/
@siddharthvipul1 @dkirwan Does this look like an issue with CentOS networking or Fedora networking?
Metadata Update from @smooge: - Issue priority set to: Waiting on Assignee (was: Needs Review) - Issue tagged with: ci, groomed, medium-gain, medium-trouble
Hmm, initially I thought it had something to do with the server side. @zlopez, hey! Do you think it has something to do with the fedora-messaging Jenkins plugin we worked on a few months ago?
@smooge: to answer your question, I don't think it's a networking issue on the cluster, but I will have to investigate more to say for certain.
Another example here: https://jenkins-continuous-infra.apps.ci.centos.org/view/Fedora%20All%20Packages%20Pipeline/job/fedora-build-pipeline-trigger/462283/
I tried re-running it, but it appears to be consistently failing: https://jenkins-continuous-infra.apps.ci.centos.org/view/Fedora%20All%20Packages%20Pipeline/job/fedora-build-pipeline-trigger/462286/console
@siddharthvipul1 Do we have any logs from the server side (RabbitMQ) describing the issue?
We haven't updated the Jenkins plugin for quite some time, and it worked before, so I wonder what has changed on the server side recently.
Also, this issue completely blocks Fedora Rawhide gating, as we cannot send test results for dist-git tests.
We are able to publish to fedmsg, so if we can't solve this issue soon, we could work around it by using fedmsg instead.
It is a one-line change in https://github.com/CentOS-PaaS-SIG/upstream-fedora-pipeline/blob/master/src/org/centos/pipeline/PackagePipelineUtils.groovy#L266: set it to fedora-fedmsg.
If it helps, the error says to check the server logs, as @bookwar suggested:
13:54:41 ERROR: Exception sending message. Please check server logs.
I can look at the logs. What username are you using to connect to RabbitMQ?
New issue: our Jenkins instance went down at some point in the last hour:
https://jenkins-continuous-infra.apps.ci.centos.org/
As for the username, it appears to be centos-ci, per:
https://github.com/CentOS-PaaS-SIG/upstream-fedora-pipeline/blob/master/config/s2i/jenkins/master/configuration/init.groovy#L37
Something is going on with pods erroring out:
https://console.apps.ci.centos.org:8443/console/project/continuous-infra/browse/pods
Update: Jenkins is finally up, but the workers aren't up yet...
Update 2: okay, the workers are up now too. Not sure what happened; if I had to guess, it was overloaded?
Huh, I tried restarting one job after the pod restart, and it seems to have worked:
https://jenkins-continuous-infra.apps.ci.centos.org/view/Fedora%20All%20Packages%20Pipeline/job/fedora-build-pipeline-trigger/462458
So that's... interesting? I'm not sure whether you found anything or whether the pod restart fixed it.
I see no error for the centos-ci user in the RabbitMQ logs :-/
Well, it appears to be working for now, so we can close this out, and we can reopen and reference it if the problem returns.
It is working now. Before we close the ticket, do we actually know what caused the problem?
@bgoncalv I don't believe so. Jenkins died at random, and when it came back up, it just worked. As @abompard mentioned, they didn't see anything in the logs from our user either. :(
Let's close this as there is nothing else we can do.
Metadata Update from @bgoncalv: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)