#9179 Copr stuck - cannot start VMs
Closed: Fixed 2 years ago by praiskup. Opened 2 years ago by praiskup.

Because we still don't have any VMs other than those in AWS (due to the Fedora lab migration), we rely completely on being able to spawn workers there, but AWS now reports:

Instance creation failed => RequestLimitExceeded: Request limit exceeded.

I don't have more data right now and I don't know the precise numbers, but Copr
has been running without interruption for about 6 weeks now and was able to
start several tens of thousands of VMs in that period.


I'm not sure whether this is an API rate limit or an instance-launch rate limit.
Any clue? I reduced the number of VMs that can start concurrently to 4, to see
whether that gets us back under the rate limit, but I really don't know.
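The approach described above (capping concurrent VM starts at 4 and waiting out throttling) could look roughly like this. It is a minimal Python sketch, not Copr's actual spawner: `launch` is a hypothetical callable that starts one VM, and `RequestLimitExceeded` is a stand-in exception. With boto3, the real error arrives as a `botocore.exceptions.ClientError` whose error code is `RequestLimitExceeded`, and AWS recommends retrying such calls with exponential backoff.

```python
import random
import threading
import time


class RequestLimitExceeded(Exception):
    """Hypothetical stand-in for the EC2 'RequestLimitExceeded' error."""


# At most this many VM launches may be in flight at the same time.
MAX_CONCURRENT_SPAWNS = 4
_spawn_slots = threading.BoundedSemaphore(MAX_CONCURRENT_SPAWNS)


def spawn_with_backoff(launch, max_attempts=6, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Run the hypothetical launch() callable while holding one of the
    spawn slots; on throttling, wait with capped exponential backoff plus
    full jitter and try again, up to max_attempts times in total."""
    for attempt in range(max_attempts):
        with _spawn_slots:
            try:
                return launch()
            except RequestLimitExceeded:
                if attempt == max_attempts - 1:
                    raise  # give up and let the caller report the failure
        # Sleep after releasing the slot so other workers can keep launching.
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(random.uniform(0, delay))
```

Sleeping outside the semaphore keeps a throttled worker from holding a slot while it waits, and the random jitter spreads the retries so parallel workers don't all hit the API again at the same moment.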

The weird thing is that this worked fine for quite some time. Is it possible that
users other than Copr are using up the rate limits of a shared account?

Hmm, now the message is different:

We currently do not have sufficient a1.xlarge capacity in the Availability Zone you requested (us-east-1c). Our system will be working on provisioning additional capacity. You can currently get a1.xlarge capacity by not specifying an Availability Zone in your request or choosing us-east-1b, us-east-1e.

So perhaps we have to tweak our spawn scripts as well.
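The second error suggests one such tweak: if the requested Availability Zone has no capacity, fall back to another zone or let EC2 choose one. A minimal Python sketch, assuming a hypothetical `launch(zone)` callable where `zone=None` means "no Availability Zone specified" (with boto3 this would roughly correspond to omitting `Placement={'AvailabilityZone': ...}` from `run_instances`), and a stand-in `InsufficientCapacity` exception for EC2's `InsufficientInstanceCapacity` error. The default zone names are only the ones mentioned in the message above.

```python
from typing import Callable, Optional, Sequence


class InsufficientCapacity(Exception):
    """Hypothetical stand-in for EC2's 'InsufficientInstanceCapacity' error."""


def launch_in_any_zone(
    launch: Callable[[Optional[str]], str],
    zones: Sequence[Optional[str]] = ("us-east-1c", "us-east-1b",
                                      "us-east-1e", None),
) -> str:
    """Try each Availability Zone in order; None lets EC2 pick the zone."""
    last_error = None
    for zone in zones:
        try:
            return launch(zone)
        except InsufficientCapacity as err:
            last_error = err  # this zone is out of capacity, try the next one
    if last_error is None:
        raise ValueError("no Availability Zones given")
    raise last_error
```

Ending the list with `None` follows the AWS hint that capacity may be available when no zone is specified at all.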

So, this looks like an Amazon issue.

Looking at our 'personal status dashboard', it shows 3 issues:

EC2 RunInstances API issue:

We want to provide more information on this issue and progress toward resolution. At 5:18 AM PDT, we began experiencing increased API errors that originated from one of our EC2 sub-systems that is responsible for managing EC2 instances. This resulted in increased error rates for some EC2 APIs and affected new instance launches in a Single Availability Zone. The root cause of the error rates affecting the sub-system has been identified and engineers are currently working on resolving the issue. Existing instances remain unaffected by this issue.

EC2 API issue:

(Same text as the EC2 RunInstances API issue above.)

EC2 operational issue:

Increased API Error Rates 

[06:21 AM PDT] We have identified the cause of the increased API error rates in a single Availability Zone in the US-EAST-1 Region and continue working towards resolution. Customers experiencing errors launching new EC2 instances may attempt to launch their EC2 instances in another Availability Zone. Existing running instances are unaffected.

[08:23 AM PDT] We want to provide more information on this issue and progress toward resolution. At 5:18 AM PDT, we began experiencing increased API errors that originated from one of our EC2 sub-systems that is responsible for managing EC2 instances. This resulted in increased error rates for some EC2 APIs and affected new instance launches in a Single Availability Zone. The root cause of the error rates affecting the sub-system has been identified and engineers are currently working on resolving the issue. Existing instances remain unaffected by this issue.

Metadata Update from @mohanboddu:
- Issue priority set to: Waiting on External (was: Needs Review)
- Issue tagged with: groomed, outage

2 years ago

Is this fixed now? At least my builds went through.

Yes, seems to be OK now.

Metadata Update from @praiskup:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 years ago

> Looking at our 'personal status dashboard'

You probably mean the Personal Health Dashboard (our account doesn't allow us
to see the data, but it doesn't matter: Copr works again).
