#10532 Fedora nightlies are gone from AWS
Closed: Fixed 8 months ago by mvadkert. Opened 10 months ago by mvadkert.

Hi,

All recent Fedora images are gone from AWS:

$ AWS_DEFAULT_REGION="us-east-2" aws ec2 describe-images --filter Name=name,Values=Fedora-Cloud-Base-Rawhide-2022* | jq '.Images | .[] | .Name' | sort
(no output)

$ AWS_DEFAULT_REGION="us-east-2" aws ec2 describe-images --filter Name=name,Values=Fedora-Cloud-Base-35* | jq '.Images | .[] | .Name' | sort
"Fedora-Cloud-Base-35-1.1.aarch64-hvm-us-east-2-gp2-0"
"Fedora-Cloud-Base-35-1.1.aarch64-hvm-us-east-2-standard-0"
"Fedora-Cloud-Base-35-1.1.x86_64-hvm-us-east-2-gp2-0"
"Fedora-Cloud-Base-35-1.1.x86_64-hvm-us-east-2-standard-0"
"Fedora-Cloud-Base-35-1.2.aarch64-hvm-us-east-2-standard-0"
"Fedora-Cloud-Base-35-1.2.x86_64-hvm-us-east-2-gp2-0"
"Fedora-Cloud-Base-35-1.2.x86_64-hvm-us-east-2-standard-0"
"Fedora-Cloud-Base-35_Beta-1.1.aarch64-hvm-us-east-2-standard-0"
"Fedora-Cloud-Base-35_Beta-1.1.x86_64-hvm-us-east-2-gp2-0"
"Fedora-Cloud-Base-35_Beta-1.1.x86_64-hvm-us-east-2-standard-0"
"Fedora-Cloud-Base-35_Beta-1.2.aarch64-hvm-us-east-2-gp2-0"
"Fedora-Cloud-Base-35_Beta-1.2.aarch64-hvm-us-east-2-standard-0"
"Fedora-Cloud-Base-35_Beta-1.2.x86_64-hvm-us-east-2-gp2-0"
"Fedora-Cloud-Base-35_Beta-1.2.x86_64-hvm-us-east-2-standard-0"

The Rawhide query returns an empty list, and the Fedora 35 query shows no recent nightly images. This is blocking Fedora CI.
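To make the check above scriptable (e.g. for CI monitoring), counting matches is handier than eyeballing the list. A minimal sketch using only the shell and grep, with a couple of the AMI names above inlined as sample data, since the real check needs AWS credentials:

```shell
# Sample of the describe-images name output above; in practice this
# would come from the aws/jq pipeline.
names='Fedora-Cloud-Base-35-1.1.x86_64-hvm-us-east-2-gp2-0
Fedora-Cloud-Base-35_Beta-1.2.x86_64-hvm-us-east-2-gp2-0'

# Count names matching the nightly pattern; 0 means the nightlies
# are missing. "|| true" keeps grep's non-zero exit (no matches)
# from aborting a script running under "set -e".
count=$(printf '%s\n' "$names" | grep -c 'Rawhide-2022' || true)
echo "$count"   # 0 -> no Rawhide 2022 nightlies present
```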


It looks like the old nightlies were cleaned up, and a new one hasn't been pushed since Jan 20th.

The fedimg upload is failing with:

euca-import-volume: error (InvalidParameter): Parameter import-manifest-url = https://s3.amazonaws.com/fedora-s3-bucket-fedimg/dd3b2775-3843-40a7-b9e6-285a900ba51b//tmp/tmpufS4y_/Fedora-Cloud-Base-Rawhide-20220207.n.0.aarch64.raw.manifest.xml?AWSAccessKeyId=<redacted>&Expires=1646829000&Signature=<redacted> has an invalid format.

According to the ImportVolume API reference, the format looks OK to me, so I'm not sure what the issue is: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_ImportVolume.html
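One detail worth noting (my observation, not confirmed as the cause): the failing URL contains a doubled slash where the S3 key prefix meets an absolute /tmp path. A quick shell check for that, with the query string dropped and keys redacted:

```shell
# The failing manifest URL from the error above (query string dropped).
url='https://s3.amazonaws.com/fedora-s3-bucket-fedimg/dd3b2775-3843-40a7-b9e6-285a900ba51b//tmp/tmpufS4y_/Fedora-Cloud-Base-Rawhide-20220207.n.0.aarch64.raw.manifest.xml'

# Strip the scheme and host to get "bucket/key", then look for "//",
# which usually indicates an absolute path was concatenated onto the
# key prefix without normalization.
path="${url#https://s3.amazonaws.com/}"
case "$path" in
  *//*) echo "doubled slash in S3 key" ;;
  *)    echo "key looks clean" ;;
esac
```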

mmkay, backup plan ongoing ... will try to register some qcow2s myself

Metadata Update from @mohanboddu:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: high-gain, high-trouble, ops

10 months ago

FTR we uploaded images ourselves from qcow2s we found on kojipkgs, so the severity of this can go down.

Just noting that this is still blocking us and forcing us to sync the images ourselves.

I will put some time into this. The fedimg tool is definitely not doing what it did before.

Hey everyone, I think I found a way to work around this problem.
Just did the PR here: https://github.com/fedora-infra/fedimg/pull/162

Looks like that's working; it's been uploading today... @mvadkert can you confirm the images you are looking for are back?

I guess let's close this as fixed. If you still see any issue or there's something further to do here, please feel free to reopen...

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

10 months ago

@kevin sorry, just checked now:

$ aws --profile fedora_us_east_2 ec2 describe-images --filter Name=name,Values=Fedora-Cloud-Base-Rawhide* | jq '.Images | .[] | .Name' | sort | grep 2022
(nothing except our images)

$ aws --profile fedora_us_east_2 ec2 describe-images --filter Name=name,Values=Fedora-Cloud-Base-35* | jq '.Images | .[] | .Name' | sort | grep 2022
(nothing except our images)

So reopening?

Metadata Update from @mvadkert:
- Issue status updated to: Open (was: Closed)

10 months ago

@phsmoura can you take a look? Is it uploading OK?

Still not seeing any positive change.

I tested an upload with an image and the script is working, although it failed to upload in these regions:

us-east-1-standard-0
eu-south-1-standard-1
eu-north-1-standard-1
ap-northeast-3-standard-1
ap-southeast-3-standard-1
af-south-1-standard-1
ap-east-1-standard-1
me-south-1-standard-1
eu-west-3-standard-1
eu-south-1-gp2-0
eu-north-1-gp2-0
ap-northeast-3-gp2-0
ap-southeast-3-gp2-0
af-south-1-gp2-0
ap-east-1-gp2-0
me-south-1-gp2-0
eu-west-3-gp2-0

Still investigating
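The entries above combine a region name with a volume-type suffix; for retrying or auditing per region, it helps to reduce them to unique region names first. A small sketch with a few of the entries inlined as sample data:

```shell
# A few of the failed "region-voltype-index" entries from above.
failed='us-east-1-standard-0
eu-south-1-standard-1
eu-south-1-gp2-0
eu-west-3-gp2-0'

# Strip the "-standard-N" / "-gp2-N" suffix and deduplicate to get
# plain AWS region names.
regions=$(printf '%s\n' "$failed" | sed -E 's/-(standard|gp2)-[0-9]+$//' | sort -u)
printf '%s\n' "$regions"
```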

Are there any AWS request IDs in the failures? Have you identified a consistent pattern or process? If you have the request IDs, I can track down the source of the failure.

Hey everyone, another update with a bit of a recap. I still haven't found the root cause.

We were getting an invalid parameter error because of the format of the URL. When running trigger_upload.py manually, it complained that it couldn't find /tmp. I therefore did a PR to work around that issue, and now it's possible to run trigger_upload.py manually without getting any errors.

# trigger_upload.py -u <file.raw.xz URL> -c <compose id>
[2022-03-21 20:30:22][fedimg.uploader    INFO] Starting to process AWS EC2Service.

<...output omitted...>

[2022-03-21 20:56:22][fedimg.uploader    INFO] AWS EC2Service process is completed.

However, looking at the logs, the same issue persists: it complains the URL has an invalid format. This is a bit intriguing, because trigger_upload.py runs just fine manually, but throws errors when triggered automatically.

euca-import-volume: error (InvalidParameter): Parameter import-manifest-url = https://s3.amazonaws.com/fedora-s3-bucket-fedimg/dd3b2775-3843-40a7-b9e6-285a900ba51b//tmp/tmpufS4y_/Fedora-Cloud-Base-Rawhide-20220207.n.0.aarch64.raw.manifest.xml?AWSAccessKeyId=<redacted>&Expires=1646829000&Signature=<redacted> has an invalid format.
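For illustration, the doubled slash in that URL is what you get when a key prefix is joined naively with an absolute temp path. A sketch of the normalization that avoids it (variable names and paths are mine, not taken from the fedimg code or the PR):

```shell
# Joining a key prefix with an absolute local path naively reproduces
# the "prefix//tmp/..." shape seen in the error above.
prefix='dd3b2775-3843-40a7-b9e6-285a900ba51b'
local_path='/tmp/tmpufS4y_/image.raw.manifest.xml'   # illustrative name

naive="${prefix}/${local_path}"        # -> prefix//tmp/...
fixed="${prefix}/${local_path#/}"      # strip the leading "/" first

echo "$naive"
echo "$fixed"
```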

Regarding the regions it failed to upload to, I believe that is a secondary problem: according to the logs, it has been happening since November 2020, while this issue was only opened a month ago.

@davdunc I couldn't find any AWS request IDs, either in the logs or when running these tests. trigger_upload.py does the upload using a CLI tool called euca-import-volume from euca2ools; it doesn't have a verbose mode, but maybe it's possible to get more info from there.

On the regions thing, I see things like:

[2022-03-24 17:49:01][fedimg.services.ec2.ec2imgpublisher    INFO] Could not register with name: u'Fedora-Cloud-Base-36_Beta-1.4.aarch64-hvm-ap-northeast-3-standard-0'                                  
[2022-03-24 17:49:01][fedimg.services.ec2.ec2imgpublisher    INFO] Failed  

from the trigger_upload.py script...

(I am manually uploading the rc4 beta images so we can release beta. ;)

Hey everyone, I'm seeing uploads without errors in fedimg.
@mvadkert can you check if you can see the images?

Indeed, it seems the images are back. Thanks!

Metadata Update from @mvadkert:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

8 months ago
