UI says:
Spot Request Failed You are not authorized to perform this operation. Hide launch log Creating security groups Successful (sg-0359ec8d33209de46) Authorizing inbound rules Successful Requesting Spot Instances Failure
FAS group: aws-copr
Metadata Update from @mizdebsk: - Issue tagged with: aws
Can you try again now?
Metadata Update from @kevin: - Issue priority set to: Waiting on Reporter (was: Needs Review)
Still the same issue:
Spot Request Failed You are not authorized to perform this operation. Hide launch log Creating security groups Successful (sg-0d1643ae167188f78) Authorizing inbound rules Successful Requesting Spot Instances Failure
ok. Made another change... try again?
Also, can you try with the CLI and see whether it gives more information about what failed?
fatal: [127.0.0.1]: FAILED! => {"changed": false, "msg": "Instance creation failed => UnauthorizedOperation: You are not authorized to perform this operation."}
The CLI prints the same useless message. I tried through an Ansible playbook, though, because I don't know how to use the CLI properly yet. Judging from the AMI permission problem in #8421, I don't think the CLI would be more descriptive anyway.
Metadata Update from @smooge: - Issue priority set to: Waiting on Assignee (was: Waiting on Reporter)
So I think this was sorted out by @kevin. Can we close this ticket?
Last time (last week) when I talked to @msuchy this did not work yet.
It does not; I just hit the same thing with our account as well.
Not sure if this is still an issue. When a user needs to create a spot fleet, they need the following IAM permissions to be able to pass a role. They are required to specify an IAM fleet role of some sort:
"iam:ListRoles", "iam:PassRole", "iam:ListInstanceProfiles"
AWS offers a default IAM fleet role, aws-ec2-spot-fleet-tagging-role, which can be passed to the instances and allows them to be requested, launched, terminated, and tagged automatically.
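For illustration only, the three IAM permissions listed above could be expressed as a policy document roughly like the sketch below. The account ID is a made-up placeholder, and scoping iam:PassRole to the default fleet role is an assumption; this is not the policy that was actually applied to the Fedora account.

```python
import json

# Hypothetical placeholder; replace with the real AWS account ID.
ACCOUNT_ID = "123456789012"


def spot_fleet_iam_policy(account_id: str) -> dict:
    """Build a policy document granting the IAM actions named in this comment."""
    role_arn = f"arn:aws:iam::{account_id}:role/aws-ec2-spot-fleet-tagging-role"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Assumption: listing is allowed on all resources.
                "Effect": "Allow",
                "Action": ["iam:ListRoles", "iam:ListInstanceProfiles"],
                "Resource": "*",
            },
            {
                # Assumption: only the default fleet role needs to be passed.
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": role_arn,
            },
        ],
    }


print(json.dumps(spot_fleet_iam_policy(ACCOUNT_ID), indent=2))
```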
It is a problem. I get this when trying to launch an instance with a spot request:
You are not authorized to perform this operation. Encoded authorization failure message: 4Mi3M1NKwlumeltBTaHYtYz7EKgpS_f9O1BygIZhMfVP3woxQdpeou_j5woo3A1FBB4g57Y0314Y9Kt8R4zlhFfxMqIunIU2FKghF9eglNzTQ7Ihq-J28ACkpr-_vaJEv04EJwm-9gd824W-7PNooUWfIJpXQCKKBugl1xEf4XUaXuhKGPeKehxdlgvasFnunWy9lbxn9ZcdTrIMiz-9pDTGWUMBdeqbkoaRaW4ib3-ZJf3zPdW_pLGVV6bLS1zNnhI4ysYJtFPYw9l0L9-gQ4VZaV6LGLTwWfzmss-FI3huf5rfUD8v8o9NeoWy9Du0MjoRsYTTzDHW3otXJKx73Km1Adqnf-9NUpasuss8obAH8M_Rh87vqd5odjrQn5yMzb3Vx3gGck14ABa6AU_pP-mDPKQtR5cIEH5qLLNwtsyarOlmZ_PZCYohTpEHPQU3R5v5BcZWdnUQ5nt9yTgsMbIm14eAg5a7zCsVsbeCX6HNSdEeotlzD1KWlrtc1m2zPevcvJnZ5OHk1ZZF6Lr-nWDG5nuH1638lg
@mvadkert could you please run the command aws sts decode-authorization-message --encoded-message <encoded_string>, using the encoded string from your comment, to decode it?
@mobrien thanks for the command :)
An error occurred (AccessDenied) when calling the DecodeAuthorizationMessage operation: User: arn:aws:iam::125523088429:user/fedora-ci-testing-farm is not authorized to perform: sts:DecodeAuthorizationMessage
quite a funny message :D
because it works from cmdline :)
But there I use the automation user we have:
"Arn": "arn:aws:iam::125523088429:user/fedora-ci-testing-farm"
OK, it looks like we will need an administrator to run the sts command to decode the message. The message is encoded because it could potentially contain privileged information. It should hopefully name the permissions that are causing the issue, though.
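For whoever ends up decoding these messages, here is a minimal Python sketch of the same step. It assumes a client object that behaves like boto3.client("sts") and a caller holding sts:DecodeAuthorizationMessage. The helper names are made up for this example, and the assumption that the denied action sits under context.action in the decoded document should be checked against a real decoded message.

```python
import json


def decode_authorization_failure(sts_client, encoded_message: str) -> dict:
    """Decode an EC2 'Encoded authorization failure message'.

    `sts_client` is expected to behave like boto3.client("sts").
    """
    response = sts_client.decode_authorization_message(
        EncodedMessage=encoded_message
    )
    # DecodedMessage is itself a JSON document serialized as a string.
    return json.loads(response["DecodedMessage"])


def denied_action(decoded: dict) -> str:
    """Return the action the request was denied for, or '' if absent.

    Assumption: the decoded document carries it under context.action.
    """
    return decoded.get("context", {}).get("action", "")


# Usage (needs boto3 and admin credentials, so it is left commented out):
# import boto3
# decoded = decode_authorization_failure(boto3.client("sts"), "<encoded_string>")
# print(denied_action(decoded))
```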
Ah right, could be :) the message just confused me
@kevin could we maybe resolve this one as well sometime, please?
So, from the output of the decoded message that Kevin provided (thanks Kevin), it appears that the assumed role aws-fedora-ci/mvadkert does not have the ec2:RequestSpotInstances permission.
I have added ec2:RequestSpotInstances to both fedora-ci-ec2 and copr-ec2 policies. Can you both try again and see what happens?
Spot Request Failed The provided credentials do not have permission to create the service-linked role for EC2 Spot Instances. Hide launch log Creating security groups Successful (sg-0ac77ca30c28155e5) Authorizing inbound rules Successful Requesting Spot Instances Failure
Looks like there needs to be a service-linked role for requesting spot instances. A service-linked role allows an AWS service to make requests on your behalf (EC2 requests in this case). There is a default role for this scenario, AWSServiceRoleForEC2Spot:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-requests.html#service-linked-roles-spot-instance-requests
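As a rough sketch of how an administrator could make sure that role exists, assuming a client that behaves like boto3.client("iam") and a caller allowed to call iam:GetRole and iam:CreateServiceLinkedRole (the function name is made up for this example):

```python
SPOT_ROLE_NAME = "AWSServiceRoleForEC2Spot"


def ensure_spot_service_linked_role(iam_client) -> bool:
    """Create the EC2 Spot service-linked role if it is missing.

    `iam_client` is expected to behave like boto3.client("iam").
    Returns True if the role was created, False if it already existed.
    """
    try:
        iam_client.get_role(RoleName=SPOT_ROLE_NAME)
        return False  # role already exists, nothing to do
    except iam_client.exceptions.NoSuchEntityException:
        # spot.amazonaws.com is the service principal for Spot Instance requests.
        iam_client.create_service_linked_role(AWSServiceName="spot.amazonaws.com")
        return True
```

Checking for the role first keeps the call idempotent, so it can be re-run without failing when the role is already there.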
Ah ha.
Added.
@praiskup try now?
Seems like some max limit for spot instances:
Spot Request Failed Max spot instance count exceeded Hide launch log Creating security groups Successful (sg-016538d7de562c291) Authorizing inbound rules Successful Requesting Spot Instances Failure
Looks like the permissions are good. AWS places an initial limit on spot instances per account, as they do with most resources, to help them not get overloaded unexpectedly. There will usually be an overall limit and an instance type limit.
These limits can be seen in the limits section of the EC2 dashboard. We have the default limit, which is 20; this limit is also per region.
This can be raised with a request to AWS support, and they will almost always accept the request if it is reasonable.
Well, our reason is that we would like to use spot instances for our CI workloads, where spot instances are the perfect fit :)
@mobrien thanks for the note about the default limit, I was not even aware there is such a thing.
@davdunc Can we get this limit raised on our account? Or should we use the normal process? or something else?
@kevin I asked to raise it on our internal account and it was not an issue, they raised it to 100 instances without any additional questions.
Are we supposed to request it ourselves (copr team)?
@praiskup nope, it is a per account setting afaik. So the Fedora admins owning the AWS account need to do it:
https://console.aws.amazon.com/support/home?#/case/create?issueType=service-limit-increase&limitType=service-code-ec2-spot-instances
We do not have permissions for that anyway
To clarify here, @davdunc is our community contact at Amazon. Since our account is a community account, I wanted to ask him about it before blindly following the normal process. It may be that he can just raise it for us, or that there's some other process for community accounts, or indeed that we should just follow the normal process. But I wanted to actually find out.
I'll try to catch him on Tuesday and see if I can find out more.
David was nice enough to increase the limit to 100 (in us-east-1, please let me know if other regions are needed)
I think we are all done here? Please reopen if I missed something we still need to do.
Metadata Update from @kevin: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)
@kevin, @davdunc thank you! I suppose the limit of 100 is shared across the whole Fedora community account (not only aws-copr). Since spot instances are an almost perfect fit for Copr (we plan to run most of the builders as spot instances, as we can afford restarting them), I'd expect that we'll be able to utilize ~150 spot instances at peak times (the next Copr release will be much, much more flexible from this POV).
Could the limit be raised to something around 150 (for Copr) + others, so say 250?
Metadata Update from @praiskup: - Issue status updated to: Open (was: Closed)
I tried to launch a single instance as aws-copr, and it still doesn't work:
Spot Request Failed Max spot instance count exceeded Hide launch log Requesting Spot Instances Failure
Can we set the limit per sub-account within our community account?
What instance type are you planning on using?
updated request to max 150.
The increase is under evaluation. This is expected and can take some time to move to approved. I'll update with the approval.
Currently we use i3.large for x86_64, and a1.xlarge for aarch64. But that's because those are the cheapest types that have specs >= those of the builders we previously had in OpenStack.
Our plan was to have some set of normal instances to guarantee some throughput, and complement it by spot instances.
I want to underline again that the new Copr release will be much more flexible, and we'd like to go up only when there's high load (Copr would do this automatically). But in normal situations we plan to start an even smaller set of workers than we have now (now we have 50 x86 and 10 ARM builders; I think we could go with "15+5" to "150+50").
We basically didn't want to have different instance types across our builders at the beginning, but indeed only a minimal percentage of builds in Copr needs to go with i3.large; there's an open possibility to use a smaller instance type in general, and the large variants on an explicit Copr user's request. (@msuchy, FYI, as we didn't want to concentrate on this topic ATM.)
Thank you.
That's probably why I got "Max spot instance count exceeded". What is the current (previous) limit? Am I able to see (with aws-copr credentials) which instances ate the quota, given that it is counted for the whole Fedora account?
Thank you for looking at this!
Still in review. The original request was rejected, and the reason given was "account compromised". I am chasing down why customer service identified that status. I have a feeling that it is related to having an internal payer account for an external account. It's non-standard.
@davdunc any news here?
@davdunc did you manage to get it through?
@davdunc This might go through now that we cleaned up that status issue? Can you try again now?
So, I currently see:
Service quota                                                        Applied quota value   AWS default quota value   Adjustable
All F Spot Instance Requests                                         128                   0                         Yes
All G Spot Instance Requests                                         128                   0                         Yes
All Inf Spot Instance Requests                                       128                   0                         Yes
All P Spot Instance Requests                                         128                   0                         Yes
All Standard (A, C, D, H, I, M, R, T, Z) Spot Instance Requests      1,152                 0                         Yes
All X Spot Instance Requests                                         128                   0                         Yes
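The same numbers can probably be pulled programmatically instead of from the console. A minimal sketch, assuming a client that behaves like boto3.client("service-quotas") and credentials allowed to list quotas; the function name and the filter on the quota name are choices made for this example only:

```python
def spot_request_quotas(quotas_client) -> dict:
    """Map each EC2 'Spot Instance Requests' quota name to its applied value.

    `quotas_client` is expected to behave like boto3.client("service-quotas").
    """
    result = {}
    kwargs = {"ServiceCode": "ec2"}
    while True:
        page = quotas_client.list_service_quotas(**kwargs)
        for quota in page.get("Quotas", []):
            if "Spot Instance Requests" in quota["QuotaName"]:
                result[quota["QuotaName"]] = quota["Value"]
        # Results are paginated; keep going until there is no NextToken.
        token = page.get("NextToken")
        if not token:
            return result
        kwargs["NextToken"] = token
```

Quotas are per region, so the client has to be created for the region in question (us-east-1 here).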
So, it seems to be 128 for most types. Is that enough? Can you try again now?
Metadata Update from @smooge: - Issue assigned to kevin - Issue tagged with: medium-gain, medium-trouble, ops
@praiskup Had any chance to test? :)
Sorry for the delay. A quota of 128 is just fine, though I still cannot create spot instances:
You are not authorized to perform this operation. Encoded authorization failure message: Ua9pyA8IHRMqgdLouI2LqOi2n6y0IgYn-tbZpsp-5W3_SzSU_6vcGs5O147-_N9cTMgFlca3wqE8bhs321uKxnqJULh9s_oxii6dMh5fgOrIwfBpo6XQjQqKxzNOxba05ArEn7poVyGQKnT0BFqxsDtjtBmXjNA-4qqqBNV8rzfoWalCPqRKr4NAObeeRtHr2_b7JgURaj-JQe_GhaNA8DfBI-RfZy9Ii4V5_n1QLX1T1Q14RCI9fTpvebGN5vTzACQo6oPMPH08j3yLxf8b9TQYt-GjNt7UKlBJcBx10keiDauFKXhmZbucu523phGoU3nsY3gn5O4eWuHPCPApTv5uZHC5XFM5GF9KDU5cGZj9F5wVyq2mvmpQ74_GTfBpgSNr1q1VFqEEPUxeqL7-swJ1Id5a7xLrlYdSwVd1moNVBOoxq3z2K3rX-RjJWQ4s_t2UtpdJIOHubdHP7KFidJN8v-aog3ZgitgqzNLzAqTB5p7GTquAhQTnZSiGU0E6x3gumGOyLX6YRZpQpr2pAeFWtTQGHLY
OK, it looks like the permission in question was in the wrong resource block. I have updated it now, and that should hopefully solve the spot instance issue.
Something is still missing:
You are not authorized to perform this operation. Encoded authorization failure message: Yb-HPDNRS_LE4tB65stdVV1NO_qMv1geILYjWTbreh4LbYDQpExI0peW736-ZfhmVsIVQm2DLMT3YpXI20IeolOXIhRB8ZunfQcxMhTY6N3VlrGGLhZH2FFnnUBN74u9euRJp7B52bwpXH1PikcdbuQbjIt6lB8i3_96WcwAohJ4edhB4hqhGC4zyoQGzub_LT7j2uge7K3YJXY3ikPGFDWUDxeYs8LR36w3U_BbMoJVrzl08dvWGYxl9Wn07QzASuytnf1G2ZwbaNdTdWXoVzmwgkwMpU6PsIEObkV-0oHlVc8Uw2e70Z-EXoeBMCXy12MM3iwpnkSsAEfRWAAfIh4XiFJYvs0WM0sGFHhefu8Noj71Ch-9dqDuZDuxqH3aIRYgcr3RwUfCJjrb3o8HVJZJLWKo4oFX8gnRMTHEnJi_PIq-dO-thg9Sx2QhUZn-KTFf22CzIV1fEyqDwy1ELy96hJj-LNxVZ87jZlEAub_vmCiDToO7TgZGHaD04HjYfzFi7g8dAhLYHhUV9dt1LjqNOm8GELU
Adding the ec2:RunInstances permission seemed to fix the issue.
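For anyone hitting this later: the two EC2 actions that turned out to be missing over the course of this ticket were ec2:RequestSpotInstances and the RunInstances one. A deliberately naive sketch for checking whether a policy document mentions them is below. It only looks at the Action lists of Allow statements and ignores Deny statements, resources, and conditions, so it is no substitute for real IAM policy evaluation; the function name is made up for this example.

```python
from fnmatch import fnmatchcase

REQUIRED_ACTIONS = ("ec2:RequestSpotInstances", "ec2:RunInstances")


def missing_actions(policy: dict, required=REQUIRED_ACTIONS) -> list:
    """Return the required actions that no Allow statement in `policy` mentions."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]  # a single statement may be a bare object
    allowed = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        # Action names are case-insensitive, so compare in lower case.
        allowed.extend(action.lower() for action in actions)
    return [
        action
        for action in required
        if not any(fnmatchcase(action.lower(), pattern) for pattern in allowed)
    ]
```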
Metadata Update from @mobrien: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)
Issue status updated to: Open (was: Closed)