#98 long running tasks are timing out in our deployments
Closed: Fixed None Opened 9 years ago by tflink.

At the moment, some of the tags we're working with seem to be large enough that buildbot kills the runtask step when it times out (the default timeout is 1200 seconds).

An example in dev is:
https://taskotron.fedoraproject.org/taskmaster/builders/x86_64/builds/56742

However, this has been happening in stg and prod as well.

Increase the command timeout to see if that solves the problem. Start in dev, then move to stg and prod if it is successful.
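
To make the idea concrete, here is a minimal stdlib-only sketch of a per-environment command timeout. It is not buildbot's configuration or implementation (buildbot has its own step timeout settings, which are not used here). The names `TIMEOUTS` and `run_step` and the raised dev value of 3600 seconds are assumptions for illustration only; 1200 is the default mentioned above.

```python
import subprocess
import sys

# Hypothetical per-environment timeouts in seconds. 1200 is the default
# mentioned in this ticket; the raised dev value is an illustrative guess.
TIMEOUTS = {"dev": 3600, "stg": 1200, "prod": 1200}


def run_step(command, env, timeouts=TIMEOUTS):
    """Run `command`, killing it if it runs longer than the timeout for `env`.

    Returns (finished, returncode). `finished` is False and `returncode`
    is None when the command was killed for exceeding the timeout.
    """
    try:
        proc = subprocess.run(command, timeout=timeouts[env])
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return (False, None)
    return (True, proc.returncode)


if __name__ == "__main__":
    # A trivial command finishes well inside the dev timeout.
    print(run_step([sys.executable, "-c", "pass"], "dev"))
```

Raising the value for dev first and leaving stg and prod at the default mirrors the rollout order proposed above.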


This ticket had some Differential requests assigned:
D339
D344

Change deployed to dev, waiting to see if it helps.

The change on dev seems to be working. The last large depcheck job just failed in stg and production.

I'm deploying to stg and asking for a freeze break on production.

Turns out that my timeout change didn't work - there have been timeouts on dev, stg and prod since the fix was applied.

Looking for another way to get the timeout failure numbers down.

This seems to have been fixed with a combination of D339 and D344. I haven't seen any timeouts like this for a while, so I'm closing the ticket.

Metadata Update from @tflink:
- Issue tagged with: infrastructure

6 years ago
