#11694 AWS account for openQA cloud enablement effort
Closed: Fixed 3 months ago by kevin. Opened 6 months ago by adamwill.

Describe what you would like us to do:


We would like an AWS (sub)account for an effort to run openQA in the cloud. @anitazha and @dcavalca are kindly running a project with Collabora to try and get a PoC cloud openQA deployment running; it would be great if Fedora could provide the AWS resources for this (Meta could provide some, but they would come with inconvenient security restrictions like no access to the console).

The usage should not be too high, as this is an experimental / proof-of-concept effort: the deployment would not be testing everything all the time like the real deployments do. If we manage to get it to the point of potentially being able to work for real, we would then re-evaluate who should own it, fund it, maintain it, etc.

When do you need this to be done by? (YYYY/MM/DD)


It's not super urgent, but in the next week or two would be great.


Metadata Update from @phsmoura:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: low-gain, low-trouble, ops

6 months ago

Metadata Update from @kevin:
- Issue assigned to kevin

5 months ago

So, do you just need console access here? Or CLI access? Or both?

I assume most of the use would be EC2, spinning worker instances up and down? Are there any other AWS services that would be needed?

We use groups to access the console, and we already have an 'aws-qa' group, which currently has @tflink in it. Could this be used? Or should this be a different group entirely? (aws-qa-openqa?)

AWS console access (so we can spin up instances, create security groups, etc.) and ideally API/CLI access (so we can do so programmatically down the road). Just EC2, I believe, unless openQA has stateful artifacts we want to preserve (cc @adamwill), in which case we'd want S3 as well.

Not sure exactly how to answer "unless openQA has stateful artifacts we want to preserve". It produces things we need to keep around for some time, yes - test logs, essentially (both literal log files, and the screenshots you can view for completed tests in the web UI).

I'm not sure what @tflink is using the aws-qa group for...can you let us know, Tim?

Yeah, in that case I suspect we'll want S3 or EFS as well -- it's definitely cheaper to keep those in storage than on a running EC2 instance. Can we get those added as well? We can yank them out later if it turns out we don't need them, but it'll be useful while prototyping for the best solution here.

OK, yeah, if that's the concern then I'd say we're definitely going to want that. In the non-cloud deployment, we mount the places where that kind of data gets stored - /var/lib/openqa/testresults and /var/lib/openqa/images - as NFS shares from a reliable storage server, and have backups configured for it; this means we can actually redeploy the server instances from scratch any time we like and they still have all the existing test data in them. We'd definitely want the analogous setup for a cloud deployment, with the test data stored somewhere safe and separated from the server instances, and automatically backed up.

The other thing that has to be kept separately from the server for this to work is the database, of course. For the current deployment, each openQA instance uses a database stored on a separate database server.
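The property described above, that server instances can be rebuilt from scratch because test results and images live on separate storage, can be spot-checked on a host. The Python sketch below is an illustration, not part of the existing deployment: it reads a /proc/mounts-style table and reports which of the two openQA state directories are not on NFS. It assumes a Linux host and that EFS, if used, appears with fstype `nfs4`; the function names are hypothetical, and the separate database server is not covered.

```python
"""Report which openQA state directories are not on network storage.

Sketch only: assumes a Linux /proc/mounts-style table (device, mountpoint,
fstype, options, dump, pass). EFS is assumed to appear as fstype "nfs4".
"""
from pathlib import PurePosixPath

# Directories the ticket says are NFS-mounted in the current deployment.
STATE_DIRS = ["/var/lib/openqa/testresults", "/var/lib/openqa/images"]
NETWORK_FSTYPES = {"nfs", "nfs4"}


def parse_mounts(text):
    """Return (mountpoint, fstype) pairs, in table order."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            mounts.append((fields[1], fields[2]))
    return mounts


def backing_fstype(path, mounts):
    """Return the fstype of the deepest mountpoint containing `path`.

    Later entries win ties, since a later mount on the same point
    shadows an earlier one.
    """
    target = PurePosixPath(path)
    best_depth, best_type = -1, None
    for mountpoint, fstype in mounts:
        mp = PurePosixPath(mountpoint)
        if mp == target or mp in target.parents:
            depth = len(mp.parts)
            if depth >= best_depth:
                best_depth, best_type = depth, fstype
    return best_type


def local_state_dirs(mounts_text, dirs=STATE_DIRS):
    """Return the state directories that are not backed by NFS/EFS."""
    mounts = parse_mounts(mounts_text)
    return [d for d in dirs if backing_fstype(d, mounts) not in NETWORK_FSTYPES]
```

On a real host you would pass in the contents of /proc/mounts, e.g. `local_state_dirs(open("/proc/mounts").read())`; an empty list means both directories are on network storage.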

I'm not sure what @tflink is using the aws-qa group for...can you let us know, Tim?

That was for testing ROCm on AWS back when I thought that was possible with the g4ad instances. It may become possible in the future if an instance type with a ROCm-compatible GPU becomes available, but for now I'm not using it for anything.

So, shall we just reuse that one and have @tflink get you the creds/add you all to the group?

@kevin how do we access the console now that @tflink has added us to the group?

I think this has been answered in Matrix. The only way I know of to access the console is to use the link in the sysadmin guide:

https://docs.fedoraproject.org/en-US/infra/sysadmin_guide/aws-access/

@kevin davide noted that this seems to give access to a role on the main Fedora account rather than to a subaccount...are we concerned about that, or do we trust him? :D

Yes, it's all under one account... there's no way to do 'subaccounts' (AWS has a way, but it isn't possible on community accounts).

However, permissions are such that each 'role' should only be able to see/do anything with things that are tagged with their group name.

i.e., if you log in with the 'qa' role, you shouldn't even see resources tagged with 'infra' or 'fedora-ci'.

As noted in that guide, any time you create a new resource, you should tag it with 'FedoraGroup' set to 'qa'. If you fail to tag something, it could be cleaned up as untagged.
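To catch instances that slipped through without a tag before they get cleaned up, a group could periodically list untagged instances. This is a minimal sketch that assumes the response shape returned by boto3's EC2 `describe_instances()` (`Reservations`, then `Instances`, then `Tags`); it works on plain dicts, so it runs without AWS credentials, and the helper names are hypothetical.

```python
"""Find EC2 instances that have no FedoraGroup tag.

Sketch only: operates on the dict shape of a boto3 EC2 describe_instances()
response; no AWS calls are made here.
"""

TAG_KEY = "FedoraGroup"


def instance_group(instance):
    """Return the FedoraGroup tag value of one instance dict, or None if absent."""
    for tag in instance.get("Tags", []):
        if tag.get("Key") == TAG_KEY:
            return tag.get("Value")
    return None


def untagged_instances(response):
    """List the IDs of instances in the response that lack a FedoraGroup tag."""
    ids = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            if instance_group(instance) is None:
                ids.append(instance["InstanceId"])
    return ids
```

With boto3 and the qa role's credentials, the input could come from `boto3.client("ec2").describe_instances()`; with many instances, iterate over `client.get_paginator("describe_instances").paginate()` and check each page. An untagged instance could have been created by any group, so treat the list as a starting point rather than a list of qa resources.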

When I log in to AWS (which I assume is the qa role), I can see COPR instances and a bunch of other stuff; I just assumed that was expected. The COPR instances, for example, do show up in the console, but I see errors about alarms on them due to missing permissions.

I've never tried to touch those instances, though, so I have no idea whether I could disrupt anything.

Let me check if I made some kind of typo or something...

I don't think so. Perhaps we always allowed seeing other resources, but you should definitely be prevented from changing their state, etc.

We could look at tightening things down, but then you run into the problem that newly created things often have no tag until you tag them...
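One way to avoid that untagged window is to tag resources in the same API call that creates them: EC2's `run_instances` accepts a `TagSpecifications` parameter for this, and IAM policies can require tags at creation through the `aws:RequestTag` condition key. The sketch below assumes a boto3 EC2 client is passed in; `run_tagged_instances` is a hypothetical helper, and the ticket does not say whether Fedora's policies would enforce tagging this way.

```python
"""Launch EC2 instances that carry the FedoraGroup tag from the moment they exist.

Sketch only: `ec2_client` is assumed to be a boto3 EC2 client
(boto3.client("ec2")); `run_tagged_instances` is a hypothetical helper.
"""

GROUP_TAG_KEY = "FedoraGroup"
# Resource types created by run_instances that we tag in the same call.
TAGGED_TYPES = ("instance", "volume")


def run_tagged_instances(ec2_client, group, **run_kwargs):
    """Call run_instances with FedoraGroup=<group> merged into TagSpecifications."""
    # Collect caller-supplied tags by resource type so each type appears
    # only once in the final request.
    by_type = {}
    for spec in run_kwargs.pop("TagSpecifications", []):
        by_type.setdefault(spec["ResourceType"], []).extend(spec.get("Tags", []))

    for rtype in TAGGED_TYPES:
        # Drop any conflicting FedoraGroup tag, then add the correct one.
        tags = [t for t in by_type.get(rtype, []) if t["Key"] != GROUP_TAG_KEY]
        tags.append({"Key": GROUP_TAG_KEY, "Value": group})
        by_type[rtype] = tags

    tag_specs = [{"ResourceType": r, "Tags": t} for r, t in by_type.items()]
    return ec2_client.run_instances(TagSpecifications=tag_specs, **run_kwargs)
```

A call would look like `run_tagged_instances(boto3.client("ec2"), "qa", ImageId=..., InstanceType=..., MinCount=1, MaxCount=1)`, with real values filled in for the image and instance type.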

Subaccounts are not possible in the sponsored accounts because they are already subaccounts for an organization. It's a flat hierarchy only, so you can't have a tree.

So we're stuck trusting this guy?! We all know what happens when you trust Facebook... =)

OK, so I think that seeing things (but not being able to do much with them) is expected.
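That "visible but not modifiable" behavior is typical of tag-scoped IAM policies, because most EC2 `Describe*` actions do not support resource-level permissions and can only be granted on all resources. The Python sketch below builds a policy document of that shape as a dict; it is hypothetical and is not Fedora's actual policy, which this ticket does not show.

```python
"""Sketch of a tag-scoped IAM policy for a group role such as 'qa'.

Hypothetical: illustrates the pattern only; Fedora's real policy is not shown
in the ticket.
"""
import json


def tag_scoped_policy(group):
    """Return an IAM policy document (as a dict) scoped by the FedoraGroup tag."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Most EC2 Describe* actions can't be limited per resource,
                # so they are granted on "*". This is why other groups'
                # instances still show up in the console.
                "Sid": "ReadOnlyEverywhere",
                "Effect": "Allow",
                "Action": "ec2:Describe*",
                "Resource": "*",
            },
            {
                # State-changing calls only match instances tagged for this group.
                "Sid": "ManageOwnInstances",
                "Effect": "Allow",
                "Action": [
                    "ec2:StartInstances",
                    "ec2:StopInstances",
                    "ec2:RebootInstances",
                    "ec2:TerminateInstances",
                ],
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {"StringEquals": {"aws:ResourceTag/FedoraGroup": group}},
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(tag_scoped_policy("qa"), indent=2))
```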

Is there anything more needed on this ticket? Or can we close?

I'll go and close this. If there's further to do please re-open or file a new issue. Thanks!

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

3 months ago
