#8126 purge-ami script not processing correctly
Opened 10 months ago by kevin. Modified 3 months ago

We have a script in releng/scripts/clean-amis.py

It runs every 10 days and is intended to delete any AMIs we have on Amazon that are older than 10 days.

Sadly, it's not working correctly and we have a bunch of old images.

From the cron output:

Traceback (most recent call last):
  File "./clean-amis.py", line 188, in <module>
    amis = _get_nightly_amis_nd(delta=864000, end=int(end))
  File "./clean-amis.py", line 84, in _get_nightly_amis_nd
    compose_id = msg['compose']['compose_id']
TypeError: string indices must be integers

We should try and get this fixed asap.
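
For context, "string indices must be integers" is the error Python raises when a str is indexed with a string key, so msg was presumably a string rather than a dict at that point. The actual fix is not shown in this ticket. The following is only a sketch of a defensive lookup, assuming the message body may arrive as JSON text; the helper name extract_compose_id is hypothetical.

```python
import json


def extract_compose_id(msg):
    """Return msg['compose']['compose_id'], or None if it is not available.

    Assumption: when msg is a str or bytes it holds JSON text. The real
    payload handled by clean-amis.py may be shaped differently.
    """
    if isinstance(msg, (str, bytes)):
        try:
            msg = json.loads(msg)
        except ValueError:
            # Not valid JSON, so there is nothing to extract.
            return None
    if not isinstance(msg, dict):
        return None
    compose = msg.get("compose")
    if not isinstance(compose, dict):
        return None
    return compose.get("compose_id")
```

Returning None lets the caller skip a malformed message instead of aborting the whole purge run.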

CC: @sayanchowdhury and @pfrields


@kevin also suggested the output be changed so that the report is less likely to be ignored on sysadmin-main. For instance, no output if everything works, otherwise the traceback is seen. Something like that?
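
A minimal sketch of that "quiet on success" behaviour, assuming the script's work can be wrapped in a single callable. The names run_quietly and job are hypothetical and not part of clean-amis.py.

```python
import sys
import traceback


def run_quietly(job, stream=sys.stderr):
    """Run job(); print nothing on success, print the traceback on failure.

    Returns a process exit code: 0 on success, 1 on failure.
    """
    try:
        job()
    except Exception:
        # Only failures produce output, so a cron mail means "look at this".
        traceback.print_exc(file=stream)
        return 1
    return 0
```

The script's entry point could then end with something like sys.exit(run_quietly(main)), where main is whatever function does the purge. Cron typically sends mail only when a job produces output, so a clean run would stay silent.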

I asked @sayanchowdhury to take a look at this as a matter of urgency, tomorrow AM his TZ. Hopefully it's a quick fix, if not, manually deleting the offending images and then investigating is the recommended course of action.

I haven't raised a PR yet, but a couple of points:

  • I've fixed the error reported in this ticket; it was a trivial one.

  • The script assumes that every image fedimg uploads (other than GA) has compose_type set to nightly in the fedmsg messages, which is not true. A large percentage of composes, though built every day, have compose_type set to production, so the script leaves out a lot of AMIs.

  • I talked with @mohanboddu, who told me we can treat anything with compose_label set to RC-YYYYMMDD as nightly, and anything with RC-X.Y as GA.

  • I'm testing whether I can simply treat any compose whose compose_label contains a date as nightly, but this needs to be thoroughly checked.
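
The label rule described in the points above could be sketched roughly as follows. This is an illustration, not the code from the PR: the function name classify_compose_label is hypothetical, and allowing an optional numeric suffix after the date (for example ".0") is an assumption.

```python
import re

# Label shapes taken from the discussion above:
#   RC-YYYYMMDD -> nightly (an optional ".N" suffix is an assumption)
#   RC-X.Y      -> GA
_NIGHTLY_LABEL = re.compile(r"^RC-\d{8}(\.\d+)?$")
_GA_LABEL = re.compile(r"^RC-\d+\.\d+$")


def classify_compose_label(label):
    """Return 'nightly', 'ga', or 'unknown' for a compose_label value."""
    if not label:
        return "unknown"
    # Check the dated form first: "RC-20190101.0" would also match the
    # looser GA pattern.
    if _NIGHTLY_LABEL.match(label):
        return "nightly"
    if _GA_LABEL.match(label):
        return "ga"
    return "unknown"
```

Anything that comes back as "unknown" is probably best left alone rather than deleted, at least until the rule has been checked against real composes.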

Also, with the migration to mantle, this would become easier, as plume (the sub-project of mantle responsible for AMI uploads) supports tagging. So we can add a post-release step for releng to tag the images with a tag like ga-release, and any other images will be scheduled to be deleted.
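
A rough sketch of how a tag-based purge could pick its candidates, assuming image records shaped like the entries in an EC2 DescribeImages response (an "ImageId" plus an optional "Tags" list of Key/Value pairs). Treating ga-release as a tag key is an assumption, as is the function name images_to_purge. How plume actually writes its tags is not covered in this ticket.

```python
def images_to_purge(images, keep_tag="ga-release"):
    """Return the ImageId of every image that does not carry keep_tag.

    images: iterable of dicts with an "ImageId" and an optional "Tags"
    list of {"Key": ..., "Value": ...} dicts.
    """
    purge = []
    for image in images:
        # Untagged images may have no "Tags" key at all.
        tag_keys = {tag.get("Key") for tag in image.get("Tags", [])}
        if keep_tag not in tag_keys:
            purge.append(image["ImageId"])
    return purge
```

An age check (older than 10 days) would still be needed on top of this, so that fresh images awaiting a release decision are not removed.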

@mohanboddu

We can discuss how to make use of it. But I need more details.

Sure.

The PR that fixes the issue is merged. It is yet to be deployed. I plan to test the script manually and do a couple of dry runs before actually deploying.

Is this deployed now?

So, is everything done here? Or should I go check to see myself?

I can take a look and update this in a few days. I've been in an accident, so I can look once I'm okay again.

Sure. The reports look like it's processing ok now... but do check it.

Hope you're ok and have a good recovery!

Can I get copied on those reports and maybe access to the project that runs it (if it's in openshift)?

All the reports go to https://pagure.io/ami-purge-report so you're welcome to watch that project if you want.

The script is not in openshift, it's on a releng machine I think...

Thanks!

Probably a good candidate for moving into openshift as it's lightweight. Thanks kevin for the links.
