#8436 cleanup volumes in aws
Closed: Fixed 3 years ago by kevin. Opened 4 years ago by praiskup.

When I filter by State: available and Created: < October 31, I see 693 unused volumes (they look like leftovers from experiments). Those should be removed so they don't unnecessarily eat into the free quota.
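
For reference, that filter can be reproduced from the AWS CLI; a hedged sketch, assuming configured credentials (the CLI's JMESPath allows lexical comparison of ISO 8601 timestamps, so the date string works as a cutoff):

$ aws ec2 describe-volumes \
      --filters Name=status,Values=available \
      --query "Volumes[?CreateTime<='2019-10-31'].[VolumeId,Size,CreateTime]" \
      --output table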


Metadata Update from @mizdebsk:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: aws

4 years ago

ok. I'd like to get a few more eyes on this before we nuke em.

here's the full list; almost none of them were made in 2019, all of them 'available' and 6GB...

@pfrields and @puiterwijk and @dustymabe Any of you see any reason to keep these around? speak now or I will nuke.

ebs-volumes-us-east1

if they're in the prod account I don't even have access to them. I don't know what they could be used for. +1 from me

A good idea here might be to create a Lambda function which would check all volumes in a region once a day and delete the ones that meet certain criteria.

A possible set of criteria (see the sketch after this list) may be:

  • State: available
  • No tags exist (or possibly search for the existence of a certain tag, such as DoNotDelete)
  • Over 30 days old
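
A minimal sketch of those criteria using the AWS CLI and jq (a real Lambda would more likely be Python/boto3; DoNotDelete is just the hypothetical tag name from the list above):

#!/bin/bash
# List candidate volumes: available, older than 30 days, and carrying no
# DoNotDelete tag. Listing only -- review the output before deleting anything.
set -euo pipefail

cutoff=$(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%S)

aws ec2 describe-volumes \
    --filters Name=status,Values=available \
    --output json |
jq -r --arg cutoff "$cutoff" '
    .Volumes[]
    | select(.CreateTime < $cutoff)
    | select((.Tags // []) | map(.Key) | index("DoNotDelete") | not)
    | .VolumeId'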

I could write some Ansible to create this, but I would need assistance running it as I have no access. Let me know what you think.

> A good idea here might be to create a Lambda function which would check all volumes in a region once a day and delete the ones that meet certain criteria.

Anything which is not run manually is very dangerous. My suggestion would be
to have something like:

$ delete_orphaned_images.sh --dry-run
volume<tab>size<tab>any other info
...

... so it can be run by an admin explicitly, once the output from --dry-run shows
that everything listed is safe to remove.
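
Something along those lines might look like this (a sketch only; delete_orphaned_images.sh is a hypothetical name, and the real cleanup script is attached later in this ticket):

#!/bin/bash
# delete_orphaned_images.sh -- only print what would be removed unless the
# admin explicitly drops --dry-run.
set -euo pipefail

dry_run=0
[ "${1:-}" = "--dry-run" ] && dry_run=1

# Unattached volumes: id, size (GiB), creation time, tab-separated.
aws ec2 describe-volumes \
    --filters Name=status,Values=available \
    --query 'Volumes[].[VolumeId,Size,CreateTime]' \
    --output text |
while IFS=$'\t' read -r vol size created; do
    if [ "$dry_run" -eq 1 ]; then
        printf '%s\t%s\t%s\n' "$vol" "$size" "$created"
    else
        aws ec2 delete-volume --volume-id "$vol"
    fi
done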

@praiskup I agree that something which is not run manually could potentially be dangerous; the first run should absolutely be a manual script with safeguards.

I will attempt a bash script to do just that.

I do think some automation could be applied here if the right criteria were agreed upon, even if it were just a regularly generated report of volumes that could be removed.

So, all these are old and I have no idea who created them or for what reason.

So, I think we just need to make them unavailable, wait a while to see if anyone yells, then delete them.

I'm not sure we will have any of these moving forward. Our normal images are controlled right now via fedimg, which has its own cleanup scripts, and we will be moving that to plume, which I think does as well.

So, this is kind of a one off cleanup.

I have created a bash script to manage these old volumes. I can't run it myself as I don't have API access keys.

The script can take a date, or it defaults to 3 months ago.
It will then list all the volumes in the available state that were created before this date.
It can then snapshot each of these with the tags:
Name: OldVolumeSnapshot
VolumeId: "Id of Volume snapshotted"
MarkedForDelete: True
DeleteDate: "Date 1 month from now"

This will allow all the volumes to be deleted (which the script can also do), as they will be backed up by snapshots.
The snapshots can then be deleted after that month if there are no screams (which the script also handles, based on the tags).

If desired, this can all be skipped and the volumes just deleted outright.
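
For a single volume, the snapshot-then-delete step could look roughly like this (a sketch; the volume id is an example, and the attached script handles the listing and looping):

# Example volume id; the real script loops over the volumes it listed.
vol=vol-0123456789abcdef0
delete_date=$(date -u -d '+1 month' +%Y-%m-%d)

# Snapshot the volume with the tags described above.
snap=$(aws ec2 create-snapshot \
    --volume-id "$vol" \
    --tag-specifications "ResourceType=snapshot,Tags=[{Key=Name,Value=OldVolumeSnapshot},{Key=VolumeId,Value=$vol},{Key=MarkedForDelete,Value=True},{Key=DeleteDate,Value=$delete_date}]" \
    --query SnapshotId --output text)

# Wait for the snapshot to finish, then drop the volume.
aws ec2 wait snapshot-completed --snapshot-ids "$snap"
aws ec2 delete-volume --volume-id "$vol"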

Where would be a good place to store this, or who should I send it to? @kevin maybe?

Can you attach the script here? And can you please ignore volumes which have
the FedoraGroup tag set (someone is accountable for those, and we should ping
the responsible people instead).
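
For what it's worth, excluding those could be a single JMESPath filter along these lines (a sketch; it keeps only volumes that carry no FedoraGroup tag):

$ aws ec2 describe-volumes \
      --filters Name=status,Values=available \
      --query "Volumes[?!(Tags[?Key=='FedoraGroup'])].VolumeId" \
      --output text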

I have updated the script to ignore any volumes with tags, as you suggested; that is a good point, as tagged volumes are likely owned. One caveat: volumes don't inherit the tags of the instance they were attached to unless that is explicitly specified, so it is probably best to only run this against volumes that are a couple of months old.

There is a help prompt if you run the script with the -h flag, and the -l flag will list affected volumes. If no flag is given, the script only validates that your AWS credentials work.

manage_ebs_volumes.sh

Yes, I can take and run this. Possibly early next week? If you want to be around in case of trouble @mobrien we can schedule some time mon/tue?

Great! Thanks @kevin. I am available 19:30-22:00 UTC (12:30-15:00 PDT) on either of those days, which will hopefully suit you.

This fell off my radar. :( Perhaps we could work on this while openshift is installing on thursday?

Sounds good. I was going to bring it up yesterday if time allowed, but that wasn't the case. There is one other AWS item we can also discuss Thursday if time permits.

ok. ran the script with -l and then -s to make snapshots and now with -d for delete.

We have a calendar event in 1 month to run -p (purge) if no one needs them back. :)
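
For the record, the full run as reconstructed from this thread (these flag names are the ones mentioned above; the rest of the script's interface is in the attachment):

$ ./manage_ebs_volumes.sh -l   # list affected volumes
$ ./manage_ebs_volumes.sh -s   # snapshot them with MarkedForDelete tags
$ ./manage_ebs_volumes.sh -d   # delete the now-snapshotted volumes
$ ./manage_ebs_volumes.sh -p   # a month later: purge the snapshots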

Thanks for all the help on this one...

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

3 years ago

