#9104 https://jenkins-continuous-infra.apps.ci.centos.org can't send message to FedoraMessaging
Closed: Fixed 3 years ago by bgoncalv. Opened 3 years ago by bgoncalv.

Our pipeline can't send messages. The problem seems to have started at least 5 hours ago.

17:42:08  Sending message for job 'fedora-build-pipeline-trigger'.
17:42:08  FATAL: Unhandled exception in perform: 
17:42:08  FATAL: java.lang.NullPointerException
17:42:08    at com.redhat.jenkins.plugins.ci.messaging.RabbitMQMessagingWorker.sendMessage(RabbitMQMessagingWorker.java:278)
17:42:08    at com.redhat.utils.MessageUtils.sendMessage(MessageUtils.java:133)
17:42:08    at com.redhat.jenkins.plugins.ci.CIMessageNotifier.doMessageNotifier(CIMessageNotifier.java:145)
17:42:08    at com.redhat.jenkins.plugins.ci.pipeline.CIMessageSenderStep$Execution$1.run(CIMessageSenderStep.java:197)
17:42:08    at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
17:42:08    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
17:42:08    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
17:42:08    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
17:42:08    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
17:42:08    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
17:42:08    at java.base/java.lang.Thread.run(Thread.java:834)
17:42:08  
17:42:08  exception in finally

https://jenkins-continuous-infra.apps.ci.centos.org/view/Fedora%20All%20Packages%20Pipeline/job/fedora-build-pipeline-trigger/462217/


@siddharthvipul1 @dkirwan Does this look to be an issue with CentOS networking or Fedora networking?

Metadata Update from @smooge:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: ci, groomed, medium-gain, medium-trouble

3 years ago

Hmm, initially I thought it had something to do with the server side.
@zlopez, hey! Do you think it has something to do with the fedora-messaging Jenkins plugin we worked on a few months ago?

@smooge: just to answer, I don't think it's a networking issue on the cluster, but I will have to investigate more to say for certain.

@siddharthvipul1 Do we have any logs from the server side (RabbitMQ) describing the issue?

We haven't updated the Jenkins plugin for quite some time, and it worked before, so I wonder what has changed on the server side recently.

Also, this issue completely blocks Fedora Rawhide gating, as we cannot send test results for dist-git tests.

We are still able to publish to fedmsg. If we can't solve this issue soon, we could work around it by using fedmsg instead.

It is a one-line change in https://github.com/CentOS-PaaS-SIG/upstream-fedora-pipeline/blob/master/src/org/centos/pipeline/PackagePipelineUtils.groovy#L266 to set the provider to fedora-fedmsg.
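
For illustration only, the workaround above can be generalized into a provider fallback: try the primary provider and, if sending fails, fall back to the next one. This is a minimal Python sketch, not the pipeline's actual Groovy code. The function `send_with_fallback`, the sender callables, and the primary provider name `fedora-messaging` are all hypothetical; only `fedora-fedmsg` appears in this ticket.

```python
from typing import Callable, Dict, List, Sequence


def send_with_fallback(
    message: dict,
    senders: Dict[str, Callable[[dict], None]],
    order: Sequence[str],
) -> str:
    """Try each provider in `order`; return the name of the first that succeeds."""
    errors: List[str] = []
    for name in order:
        try:
            senders[name](message)
            return name
        except Exception as exc:
            # Broad catch on purpose: any send failure should trigger the fallback.
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def broken_sender(message: dict) -> None:
    # Hypothetical stand-in for the provider that is currently failing.
    raise ConnectionError("simulated send failure")


delivered: List[dict] = []


def fedmsg_sender(message: dict) -> None:
    # Hypothetical stand-in for the provider that still works; it records the message.
    delivered.append(message)


senders = {
    "fedora-messaging": broken_sender,  # assumed name for the primary provider
    "fedora-fedmsg": fedmsg_sender,     # provider name taken from this ticket
}

used = send_with_fallback(
    {"topic": "example"},
    senders,
    ["fedora-messaging", "fedora-fedmsg"],
)
print(f"message delivered via {used}")
```

The one-line change proposed above corresponds to putting `fedora-fedmsg` first (or alone) in `order`. An automatic fallback like this would hide outages unless the collected errors are also logged somewhere.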

If it helps, the error says to check the server logs, as @bookwar suggested:

13:54:41  ERROR: Exception sending message. Please check server logs.

I can look at the logs. What username are you using to connect to RabbitMQ?

Something is going on with pods erroring out:

https://console.apps.ci.centos.org:8443/console/project/continuous-infra/browse/pods

Update: Jenkins is finally up, but the workers aren't up yet...

Update 2: okay, workers are up now too. Not sure what happened - overloaded if I had to guess?

Huh, I tried restarting one job after the pod-restart, and it seems to have worked:

https://jenkins-continuous-infra.apps.ci.centos.org/view/Fedora%20All%20Packages%20Pipeline/job/fedora-build-pipeline-trigger/462458

So that's... interesting? I'm not sure if you found anything or if the pod restart did something.

I see no errors for the centos-ci user in the RabbitMQ logs :-/

Well, it appears to be working for now, so we can close this out, then re-open and reference it if the problem returns.

It is working now. Before we close the ticket, do we actually know what caused the problem?

@bgoncalv I don't believe so. Jenkins died at random, and when it came back up, it just worked. As @abompard mentioned, they didn't see anything in the logs from our user either. :(

Let's close this, as there is nothing else we can do.

Metadata Update from @bgoncalv:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

3 years ago
