#8821 FMN workers crashing
Closed: Fixed 3 months ago by cverna. Opened 3 months ago by kevin.

Workers on notifs-backend01 are crashing. ;(

Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]: [2020-04-08 23:52:45][celery.worker CRITICAL] Unrecoverable error: TypeError("'NoneType' object is not callable",)
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]: Traceback (most recent call last):
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/celery/worker/worker.py", line 203, in start
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     self.blueprint.start(self)
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     step.start(parent)
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/celery/bootsteps.py", line 370, in start
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     return self.obj.start()
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 318, in start
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     blueprint.start(self)
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     step.start(parent)
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 594, in start
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     c.loop(*c.loop_args())
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/celery/worker/loops.py", line 47, in asynloop
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     consumer.consume()
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 476, in consume
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     self._basic_consume(T, no_ack=no_ack, nowait=False)
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 597, in _basic_consume
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     no_ack=no_ack, nowait=nowait)
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 737, in consume
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     arguments=self.consumer_arguments)
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1564, in basic_consume
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     wait=None if nowait else spec.Basic.ConsumeOk,
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 59, in send_method
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     return self.wait(wait, returns_tuple=returns_tuple)
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 79, in wait
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     self.connection.drain_events(timeout=timeout)
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 471, in drain_events
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     while not self.blocking_read(timeout):
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 477, in blocking_read
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     return self.on_inbound_frame(frame)
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 77, in on_frame
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     callback(channel, msg.frame_method, msg.frame_args, msg)
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 481, in on_inbound_method
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     method_sig, payload, content,
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 128, in dispatch_method
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     listener(*args)
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1599, in _on_basic_deliver
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     fun(msg)
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 623, in _receive_callback
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     return on_m(message) if on_m else self.receive(decoded, message)
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 568, in on_task_received
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     callbacks,
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/celery/worker/strategy.py", line 145, in task_message_handler
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     handle(req)
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/celery/worker/worker.py", line 221, in _process_task_sem
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     return self._quick_acquire(self._process_task, req)
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/kombu/async/semaphore.py", line 62, in acquire
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     callback(*partial_args, **partial_kwargs)
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/celery/worker/worker.py", line 226, in _process_task
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     req.execute_using_pool(self.pool)
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/celery/worker/request.py", line 531, in execute_using_pool
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     correlation_id=task_id,
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib/python2.7/site-packages/celery/concurrency/base.py", line 155, in apply_async
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     **options)
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:   File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1486, in apply_async
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]:     self._quick_put((TASK, (result._job, None, func, args, kwds)))
Apr 08 23:52:45 notifs-backend01.phx2.fedoraproject.org celery[6031]: TypeError: 'NoneType' object is not callable

Perhaps we can figure out what messages are causing this and purge them?

Failing that, updating to a supported release and fixing the traceback would be lovely too.
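
As a starting point for narrowing this down, here is a minimal sketch (not part of FMN) that pulls the innermost traceback frame out of journal output shaped like the log above. The helper names `unwrap` and `innermost_frame` are hypothetical, and the prefix pattern is only assumed from the lines pasted in this ticket. It joins terminal-wrapped continuation lines first, so it should work on the raw paste too.

```python
import re

# Journal prefix such as "Apr 08 23:52:45 host celery[6031]: "
# (shape assumed from the log pasted in this ticket).
PREFIX = re.compile(r"^\w{3} \d{2} \d{2}:\d{2}:\d{2} \S+ \S+\[\d+\]: ?")
# A standard Python traceback frame line.
FRAME = re.compile(r'File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\w+)')


def unwrap(lines):
    """Strip the journal prefix and re-join terminal-wrapped lines."""
    records = []
    for raw in lines:
        raw = raw.rstrip("\n")
        match = PREFIX.match(raw)
        if match:
            records.append(raw[match.end():])
        elif records:
            # No prefix: treat it as the wrapped tail of the previous record.
            records[-1] += raw
    return records


def innermost_frame(lines):
    """Return (path, line, function) of the last traceback frame, or None."""
    frames = [m for m in map(FRAME.search, unwrap(lines)) if m]
    if not frames:
        return None
    last = frames[-1]
    return last.group("path"), int(last.group("line")), last.group("func")
```

Only feed it the log lines themselves: any unprefixed line is treated as a continuation of the previous record. For the traceback in this ticket the innermost frame is `apply_async` in `billiard/pool.py`, line 1486, so the failure is in handing the task to the worker pool rather than in FMN task code.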


Metadata Update from @kevin:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: high-trouble, medium-gain

3 months ago

I went with a bit of a nuclear option and ran `celery purge -A fmn` to clear the tasks in the queue. Before that I ran `celery inspect scheduled`, and there were only a couple of tasks waiting, so I don't think we lost much.

All workers are back and running now.

I have opened https://github.com/fedora-infra/fmn/issues/307 to see if we can at least catch the exception.
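
For reference, the inspect-then-purge step above could be scripted roughly as follows. This is a sketch under assumptions: `purge_if_few_scheduled` and the `max_scheduled` threshold are hypothetical, and `app` is assumed to be the Celery application object for FMN. It relies only on Celery's `app.control.inspect().scheduled()` and `app.control.purge()`.

```python
def purge_if_few_scheduled(app, max_scheduled=10):
    """Purge the task queue only when few scheduled tasks would be lost.

    `app` is assumed to be a Celery application. Returns the value of
    app.control.purge() (the number of discarded messages), or None when
    the purge was skipped.
    """
    # scheduled() maps worker name -> list of ETA/countdown tasks held by
    # that worker; it is None when no worker replied.
    scheduled = app.control.inspect().scheduled()
    if scheduled is None:
        # Nothing answered, so we cannot tell what would be lost: skip.
        return None
    waiting = sum(len(tasks) for tasks in scheduled.values())
    if waiting > max_scheduled:
        return None
    return app.control.purge()
```

Note that `scheduled()` only reports tasks the workers are already holding, not every message still sitting in the broker queue, so the count is a lower bound on what a purge discards.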

Metadata Update from @cverna:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

3 months ago
