It seems the image uploading or some cleanup went wild: I cannot find recent images for any Fedora release on our AWS account.
This is seriously hitting us; please resolve ASAP.
Metadata Update from @zlopez: - Issue priority set to: None (was: Needs Review) - Issue tagged with: Needs investigation, aws, high-gain, ops
I assume that the issue is with the fedimg, but I don't see any error in the journalctl -u fedmsg-hub on fedimg01.
@mvadkert will upload the cloud images from https://koji.fedoraproject.org/koji/packageinfo?packageID=39730
I will upload F39, F40 and Rawhide under the expected names with the same settings (legacy-bios) until this is resolved on the Fedora infra side.
I restarted it, but it needs to load all the datagrepper pages before it starts.
During the start of fedmsg-hub, I can see this warning, but I'm not sure if this is causing any problem:
Aug 05 10:34:46 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: [2024-08-05 10:34:46][fedmsg.crypto WARNING] Message specified an impossible crypto backend
Aug 05 10:34:46 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: [2024-08-05 10:34:46][moksha.hub WARNING] Received invalid message RuntimeWarning('Failed to authn message.',)
The restart failed with the following error when trying to retrieve datagrepper pages. I will try to restart it once more.
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: Unhandled Error
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: Traceback (most recent call last):
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: File "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: self.__bootstrap_inner()
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: File "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: self.run()
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: File "/usr/lib64/python2.7/threading.py", line 765, in run
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: self.__target(*self.__args, **self.__kwargs)
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: --- <exception caught here> ---
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: File "/usr/lib64/python2.7/site-packages/twisted/python/threadpool.py", line 167, in _worker
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: result = context.call(ctx, function, *args, **kwargs)
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: File "/usr/lib64/python2.7/site-packages/twisted/python/context.py", line 118, in callWithContext
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: return self.currentContext().callWithContext(ctx, func, *args, **kw)
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: File "/usr/lib64/python2.7/site-packages/twisted/python/context.py", line 81, in callWithContext
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: return func(*args,**kw)
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: File "/usr/lib/python2.7/site-packages/fedmsg/consumers/__init__.py", line 186, in _backlog
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: for message in self.get_datagrepper_results(then, now):
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: File "/usr/lib/python2.7/site-packages/fedmsg/consumers/__init__.py", line 222, in get_datagrepper_results
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: data = _make_query(page=page)
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: File "/usr/lib/python2.7/site-packages/fedmsg/consumers/__init__.py", line 208, in _make_query
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: rows_per_page=100, page=page, start=then, end=now, order='asc'
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: File "/usr/lib/python2.7/site-packages/requests/models.py", line 802, in json
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: return json.loads(self.text, **kwargs)
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: return _default_decoder.decode(s)
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: obj, end = self.raw_decode(s, idx=_w(s, 0).end())
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: File "/usr/lib64/python2.7/json/decoder.py", line 384, in raw_decode
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: raise ValueError("No JSON object could be decoded")
Aug 05 11:12:01 fedimg01.iad2.fedoraproject.org fedmsg-hub[25166]: exceptions.ValueError: No JSON object could be decoded
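The traceback above ends with `json.loads` raising `ValueError` on a datagrepper response body that was not JSON, which aborted the whole backlog fetch. The following is a minimal sketch, not fedmsg's actual code, of how a paging loop could skip such a page instead of dying. `decode_page` and `collect_messages` are hypothetical helpers, and the `raw_messages` key is an assumption about the datagrepper response shape.

```python
import json


def decode_page(body):
    """Return the decoded page as a dict, or None if the body is unusable.

    Hypothetical helper for illustration only. Catching ValueError works on
    both Python 2.7 (as in the traceback) and Python 3, where
    json.JSONDecodeError is a ValueError subclass.
    """
    try:
        page = json.loads(body)
    except ValueError:
        return None
    return page if isinstance(page, dict) else None


def collect_messages(bodies):
    """Gather messages from each page body, counting pages that were skipped.

    Assumes each valid page stores its messages under "raw_messages".
    """
    messages = []
    skipped = 0
    for body in bodies:
        page = decode_page(body)
        if page is None:
            skipped += 1
            continue
        messages.extend(page.get("raw_messages", []))
    return messages, skipped
```

Skipping a page silently loses messages, so a real consumer would more likely retry the page a few times before giving up; the sketch only shows where the failure could be contained.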
We have worked around the problem in Testing Farm. Looking forward to getting this back to a working state, as the workaround means a bit of manual work for us.
It seems like the fedimg is stuck on the same command over and over:
Aug 07 08:38:54 fedimg01.iad2.fedoraproject.org fedmsg-hub[27918]: [2024-08-07 08:38:54][fedimg.utils INFO] Starting the command: ['euca-describe-conversion-tasks', 'import-vol-02eb96fd623a6f4d8', '--region', u'us-east-1']
Aug 07 08:38:54 fedimg01.iad2.fedoraproject.org fedmsg-hub[27918]: [2024-08-07 08:38:54][fedimg.utils INFO] Finished executing the command: ['euca-describe-conversion-tasks', 'import-vol-02eb96fd623a6f4d8', '--region', u'us-east-1']
@jcline @kevin Do you have any idea what could be happening here?
@zlopez It's stuck polling to retrieve the volume id, and was not able to fetch the volume to create the snapshot.
If you have the credentials, you can check the status of the task and see what's failing using https://docs.aws.amazon.com/cli/latest/reference/ec2/describe-conversion-tasks.html
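For anyone who prefers Python over the CLI, here is a minimal sketch of the same check using boto3's `describe_conversion_tasks`. The helper names are hypothetical, and it assumes boto3 is installed and credentials for the account are configured. The task ID and region in the usage note come from the log above.

```python
def summarize_conversion_tasks(response):
    """Return (task_id, state, status_message) for each task in a
    DescribeConversionTasks response dict."""
    return [
        (task.get("ConversionTaskId"), task.get("State"), task.get("StatusMessage", ""))
        for task in response.get("ConversionTasks", [])
    ]


def fetch_conversion_task(task_id, region):
    """Query EC2 for one conversion task (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the summary helper works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.describe_conversion_tasks(ConversionTaskIds=[task_id])
```

With credentials in place, `summarize_conversion_tasks(fetch_conversion_task("import-vol-02eb96fd623a6f4d8", "us-east-1"))` should list the state and status message of the task fedimg keeps polling.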
The last thing missing for the replacement of fedimg was https://pagure.io/fedora-infrastructure/issue/12005, and that got closed at Flock.
I'm closing this as we already have a better solution.
Metadata Update from @zlopez: - Issue close_status updated to: Will Not/Can Not fix - Issue status updated to: Closed (was: Open)
FYI, in case anyone cares... the problem with fedimg was that it had gotten stuck importing an image. I canceled that import, but then it got stuck because it didn't see it complete. Then I just cleared it from trying to handle its backlog, and it's uploading again fine now (but all the images from previous days are just missing).
We have the replacement ready, but need to do a few more things permissions-wise on AWS, so we will do that next week and retire fedimg for good then.
In the meantime fedimg can keep working until we take it down. ;)
I still cannot find the latest images on AWS for any of the releases, so I am reopening.
Metadata Update from @mvadkert: - Issue status updated to: Open (was: Closed)
@zlopez @kevin I am re-opening this issue, as the state is still not satisfactory for us. The other linked issue is also closed, but that did not help...
I just checked fedimg and it seems to be stuck again :-/
I don't know how to unstick it, as the documentation for fedimg is not great.
AWS support for fedora-image-uploader got flipped on yesterday so I wouldn't try to unstick fedimg, and I would recommend turning it off.
This morning's upload with fedora-image-uploader failed with
[2024-08-14 08:30:21,906 fedora_image_uploader.handler WARNING] Failed to import image: An error occurred (UnauthorizedOperation) when calling the CopyImage operation: You are not authorized to perform this operation. User: arn:aws:iam::125523088429:user/fedimg-upload is not authorized to perform: ec2:CopyImage on resource: arn:aws:ec2:us-east-1::image/* because no identity-based policy allows the ec2:CopyImage action. Encoded authorization failure message: kMGOx6Xl1wM-cZ0PstV9KIWIieCVGxVISMYzz-aSiDAHnlzXVDfvFqMKcqhO_tsPK7zSRt_yb-hydSFS9qwFJHGrrNm1gqum7iBrLnt7Pr3hAw-60gBzGNF3EBgj0dDLs41VeFnw_TS0dGx20ce9EXdhNEGzFIfENjVrGV6zk6p1OslKMGTjAAqXk24wOd6uj4MUQc9FPrt62srny-7Sk6JHGtkiT-XDlkexbwM8FGaygT2_Eex8pSbxxTadBQExVOUS5tNtQs6MAi6ixJiZlE6HGQqI1J-ptS2Kk5ylPcvh1qNRxLRPdAz_dstiebZ2i6h-PeXdXGDnP6C_kW1rh48RWeB2tyOJts-RmSxvWWbhNT2UD75zd_4x6RhLhTxgrjI4NfNCo3JVsRh0WW9Yg5qx30jfA5BDjEr9T_3uux5UrUoeehlam9teiZZu3UzRjvZ2TSirMQk91GgxUfQH7MoJyBwTo0XxT9bxWHZFhDdIawk3QYGEb419ASb2PrUE5DUX5bKGTH7Zt5R6
The image snapshot import task succeeded, but the job failed when attempting to copy the snapshot to additional regions, and the uploader currently does not handle this gracefully; it's stuck in a loop retrying the import:
Failed to import image: An error occurred (InvalidAMIName.Duplicate) when calling the RegisterImage operation: AMI name Fedora-Cloud-Base-AmazonEC2.aarch64-40-20240814.0 is already in use by AMI ami-0fd35a338fdb86f8a
I'm going to work on a fix on the code side of things to make it handle this better, but the AWS permissions need a bit more adjusting (cc @kevin).
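One common way to make a retried import tolerate `InvalidAMIName.Duplicate` is to look up the AMI by name before registering it again. The sketch below illustrates that idea with boto3's `describe_images` and `register_image`. It is not the fix that went into fedora-image-uploader, and `find_ami_by_name` and `register_once` are hypothetical names.

```python
def find_ami_by_name(ec2, name):
    """Return the ID of an AMI owned by this account with exactly this name,
    or None if there is none. `ec2` is a boto3 EC2 client."""
    response = ec2.describe_images(
        Owners=["self"],
        Filters=[{"Name": "name", "Values": [name]}],
    )
    images = response.get("Images", [])
    return images[0]["ImageId"] if images else None


def register_once(ec2, name, **register_kwargs):
    """Register an AMI unless one with the same name already exists.

    Returns the existing or newly created image ID, so a retry after a
    partial failure can carry on with the copy step instead of looping.
    """
    existing = find_ami_by_name(ec2, name)
    if existing is not None:
        return existing
    return ec2.register_image(Name=name, **register_kwargs)["ImageId"]
```

A lookup followed by a register is not atomic, so two uploaders racing each other could still collide; for a single consumer retrying its own work that is usually acceptable.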
I've found a place I didn't update correctly. Can you trigger a retry? The perms should now be the same as they were in staging for fedimg-upload
Hmm, it still seems to be failing. I deployed a quick code fix so it retries properly every minute. You should be able to tail the logs on the pod in https://console-openshift-console.apps.ocp.fedoraproject.org/k8s/ns/cloud-image-uploader/deployments/cloud-image-uploader and see it retry as you adjust permissions.
Feel free to ping me on matrix if you would like to poke at this more synchronously, I will be around pretty much all day (except for one quick meeting).
OK, I fixed that and then it copied to the other regions, but then the pod crashed:
56f7c69-gr2pl, message=[2024-08-14 17:55:31,337 fedora_image_uploader.handler INFO] Coping image to region us-west-1
Aug 14 17:55:31 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message=[2024-08-14 17:55:31,684 fedora_image_uploader.handler INFO] Coping image to region us-west-2
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message=[2024-08-14 17:55:32,001 fedora_messaging.twisted.consumer ERROR] Received unexpected exception from consumer Consumer(queue=cloud-image-uploader, callback=<fedora_image_uploader.handler.Uploader object at 0x7f7c5bd1bb60>)
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message=Traceback (most recent call last):
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= File "/srv/image-uploader/venv/lib64/python3.12/site-packages/fedora_messaging/message.py", line 118, in get_name
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= return _class_to_schema_name[cls]
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= ~~~~~~~~~~~~~~~~~~~~~^^^^^
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message=KeyError: <class 'fedora_image_uploader_messages.publish.AwsPublishedV1'>
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message=
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message=The above exception was the direct cause of the following exception:
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message=
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message=Traceback (most recent call last):
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= File "/srv/image-uploader/venv/lib64/python3.12/site-packages/fedora_messaging/twisted/consumer.py", line 220, in _read_one
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= yield d
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= File "/srv/image-uploader/venv/lib64/python3.12/site-packages/twisted/python/threadpool.py", line 269, in inContext
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= result = inContext.theWork() # type: ignore[attr-defined]
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= ^^^^^^^^^^^^^^^^^^^
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= File "/srv/image-uploader/venv/lib64/python3.12/site-packages/twisted/python/threadpool.py", line 285, in <lambda>
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= inContext.theWork = lambda: context.call( # type: ignore[attr-defined]
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= File "/srv/image-uploader/venv/lib64/python3.12/site-packages/twisted/python/context.py", line 117, in callWithContext
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= return self.currentContext().callWithContext(ctx, func, *args, **kw)
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= File "/srv/image-uploader/venv/lib64/python3.12/site-packages/twisted/python/context.py", line 82, in callWithContext
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= return func(*args, **kw)
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= ^^^^^^^^^^^^^^^^^
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= File "/srv/image-uploader/venv/lib64/python3.12/site-packages/fedora_image_uploader/handler.py", line 102, in __call__
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= handler(image, ffrel)
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= File "/srv/image-uploader/venv/lib64/python3.12/site-packages/fedora_image_uploader/handler.py", line 447, in handle_aws
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= message = AwsPublishedV1(
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= ^^^^^^^^^^^^^^^
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= File "/srv/image-uploader/venv/lib64/python3.12/site-packages/fedora_messaging/message.py", line 343, in __init__
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= self._properties = properties or self._build_properties(headers)
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= File "/srv/image-uploader/venv/lib64/python3.12/site-packages/fedora_messaging/message.py", line 350, in _build_properties
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= headers["fedora_messaging_schema"] = get_name(self.__class__)
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= ^^^^^^^^^^^^^^^^^^^^^^^^
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= File "/srv/image-uploader/venv/lib64/python3.12/site-packages/fedora_messaging/message.py", line 120, in get_name
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message= raise TypeError(
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message=TypeError: The class <class 'fedora_image_uploader_messages.publish.AwsPublishedV1'> is not in the message registry, which indicates it is not in the current list of entry points for "fedora_messaging". Please check that the class has been added to your package's entry points.
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message=[2024-08-14 17:55:32,098 fedora_messaging.cli ERROR] Unexpected error occurred in consumer Consumer(queue=cloud-image-uploader, callback=<fedora_image_uploader.handler.Uploader object at 0x7f7c5bd1bb60>): <twisted.python.failure.Failure builtins.TypeError: The class <class 'fedora_image_uploader_messages.publish.AwsPublishedV1'> is not in the message registry, which indicates it is not in the current list of entry points for "fedora_messaging". Please check that the class has been added to your package's entry points.>
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message=[2024-08-14 17:55:32,188 fedora_messaging.twisted.protocol INFO] Waiting for 0 consumer(s) to finish processing before halting
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message=[2024-08-14 17:55:32,188 fedora_messaging.twisted.protocol INFO] Finished canceling 0 consumers
Aug 14 17:55:32 worker03.ocp.iad2.fedoraproject.org : namespace_name=cloud-image-uploader, container_name=cloud-image-uploader, pod_name=cloud-image-uploader-6dc56f7c69-gr2pl, message=[2024-08-14 17:55:32,192 twisted INFO] Stopping factory FedoraMessagingFactoryV2(parameters=<URLParameters host=rabbitmq.fedoraproject.org port=5671 virtual_host=/pubsub ssl=True>, confirms=True)
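The `TypeError` in this log says the `AwsPublishedV1` class is missing from the installed entry points. A quick way to confirm that inside the pod's virtualenv is to list what is actually registered. This is a minimal sketch using `importlib.metadata`; the group name `fedora.messages` is assumed to be the one fedora-messaging reads, and the `module:Class` value in the usage note is inferred from the class path in the traceback.

```python
from importlib.metadata import entry_points


def registered_message_classes(group="fedora.messages"):
    """Return the sorted 'module:Class' values registered in the group."""
    eps = entry_points()
    # Newer Python returns an object with select(); older versions return a dict.
    if hasattr(eps, "select"):
        selected = eps.select(group=group)
    else:
        selected = eps.get(group, [])
    return sorted(ep.value for ep in selected)


def is_registered(class_path, group="fedora.messages"):
    """Check whether a 'module:Class' path is among the registered entry points."""
    return class_path in registered_message_classes(group)
```

For example, `is_registered("fedora_image_uploader_messages.publish:AwsPublishedV1")` returning `False` would point at a stale or missing install of the messages package rather than at the uploader code.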
Okay, just to keep folks in the loop here: I fixed the above issue, along with a couple others yesterday and today the image did seem to upload. However, I realized there were at least two other issues:
I think I've fixed both issues so tomorrow's image should hopefully be both public and replicated to all the expected regions. I'll check in tomorrow and on Monday to make sure things look alright.
@kevin it looks like the permissions need to be slightly expanded to allow the uploader to mark the images public:
Failed to import image: An error occurred (UnauthorizedOperation) when calling the ModifyImageAttribute operation: You are not authorized to perform this operation. User: arn:aws:iam::125523088429:user/fedimg-upload is not authorized to perform: ec2:ModifyImageAttribute on resource: arn:aws:ec2:us-east-1::image/ami-06c456ec7c23c1b13 because no identity-based policy allows the ec2:ModifyImageAttribute action.
Added.
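For reference, the two actions the uploader was denied in this thread were `ec2:CopyImage` and `ec2:ModifyImageAttribute`. The sketch below builds an IAM policy statement granting just those two. The real policy attached to `fedimg-upload` certainly contains more actions, and `"Resource": "*"` is a placeholder assumption, not the scope used in production.

```python
import json

# The two actions reported as UnauthorizedOperation in this ticket.
REPORTED_MISSING_ACTIONS = ["ec2:CopyImage", "ec2:ModifyImageAttribute"]


def build_policy(actions, resource="*"):
    """Build a minimal identity-based policy document allowing the actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(set(actions)),
                "Resource": resource,
            }
        ],
    }


def policy_json(actions, resource="*"):
    """Serialize the policy in the form the IAM console or CLI accepts."""
    return json.dumps(build_policy(actions, resource), indent=2)
```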
AWS update:
I can now see Fedora 40, Fedora 41 and Fedora 42 images :)
Outstanding issues:
Fedora-Cloud-Base-AmazonEC2.x86_64-40-20240818.0 ami-0bc5a8a603a1e880d
Fedora-Cloud-Base-AmazonEC2.x86_64-40-20240818.0 ami-0bed7c983e4ac2fa1
AWS update: I can now see Fedora 40, Fedora 41 and Fedora 42 images :) Outstanding issues: No Fedora 39 images
I'll take a look at this.
Any idea why Fedora Rawhide is uploaded as Fedora 42 (I assume?)
Yeah, I can fix this today.
Any idea why there are duplicates? For example, I see:
Fedora-Cloud-Base-AmazonEC2.x86_64-40-20240818.0 ami-0bc5a8a603a1e880d
Fedora-Cloud-Base-AmazonEC2.x86_64-40-20240818.0 ami-0bed7c983e4ac2fa1
My guess is this is related to a couple of bugs in the uploader which caused it to fail part-way through. When I hacked this together I wasn't sure if it was the correct set of API calls so now that it is mostly working, I will go back and add a bit more error handling.
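Duplicates like the pair quoted above are easy to spot programmatically by grouping AMIs by name. A minimal sketch, assuming `images` is the `Images` list from an EC2 `describe-images` response; `find_duplicate_names` is a hypothetical helper.

```python
from collections import defaultdict


def find_duplicate_names(images):
    """Map each AMI name that occurs more than once to its list of image IDs."""
    by_name = defaultdict(list)
    for image in images:
        by_name[image["Name"]].append(image["ImageId"])
    return {name: ids for name, ids in by_name.items() if len(ids) > 1}
```

Running it per region matters, because the same name legitimately appears once in every region the image is copied to.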
Okay, regarding the Fedora 39 images, I think this is related (sort of) to the Kiwi switch in Fedora 40. The image uploader is looking for images that include AmazonEC2 in the image name (like those built for https://koji.fedoraproject.org/koji/packageinfo?packageID=39730).
Were the images used for EC2 prior to F40 the raw images from https://koji.fedoraproject.org/koji/packageinfo?packageID=21547 ?
OK, more problems appeared. The disk layout seems to have changed: the newly added images do not automatically extend to the full disk size. This works with the images we uploaded ourselves.
New cloud images:
# df -h | grep nvme0
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p4  4.0G  620M  2.9G  18% /
/dev/nvme0n1p4  4.0G  620M  2.9G  18% /home
/dev/nvme0n1p4  4.0G  620M  2.9G  18% /var
/dev/nvme0n1p3  966M  146M  755M  17% /boot
/dev/nvme0n1p2  100M   17M   84M  17% /boot/efi
tmpfs           379M  4.0K  379M   1% /run/user/0
# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
zram0       252:0    0  3.7G  0 disk [SWAP]
nvme0n1     259:0    0  100G  0 disk
├─nvme0n1p1 259:1    0    2M  0 part
├─nvme0n1p2 259:2    0  100M  0 part /boot/efi
├─nvme0n1p3 259:3    0 1000M  0 part /boot
└─nvme0n1p4 259:4    0 98.9G  0 part /var /home /
For reference with our own uploaded image Fedora-Cloud-Base-AmazonEC2.x86_64-40-20240805.0:
[root@ip-172-31-23-59 ~]# df -h | grep nvme0
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p4   99G  625M   98G   1% /
/dev/nvme0n1p4   99G  625M   98G   1% /home
/dev/nvme0n1p4   99G  625M   98G   1% /var
/dev/nvme0n1p3  966M  145M  756M  17% /boot
/dev/nvme0n1p2  100M   17M   84M  17% /boot/efi
tmpfs           381M  4.0K  381M   1% /run/user/0
I believe this is related to https://bodhi.fedoraproject.org/updates/FEDORA-2024-d44bd4abd9
https://bodhi.fedoraproject.org/updates/FEDORA-2024-d44bd4abd9 should fix it. The thing that broke it was https://bodhi.fedoraproject.org/updates/FEDORA-2024-67f6df918c .
Apologies for the unclear comment. Yes, I meant related in that it'll be fixed by the update.
Also, yesterday I adjusted things so that Rawhide images are named Rawhide (and Fedora 41 image names include Fedora-41-Prerelease). This morning I also updated things so F39 images should be uploaded (so hopefully tomorrow we'll see one) and fixed an issue where there were probably duplicate AMIs in us-east-1.
Do let me know if you spot any other issues.
How is the 'Prerelease' added? I.e., will we need to flip that for GA? Also, having it in the name may make the images less discoverable, but I am not sure... I do like making them more clearly labeled.
It uses fedfind and if the compose label includes "rc" it doesn't mark it as a pre-release. I re-used what I had for the Azure image naming. I believe it will "just work" and not need any manual intervention at GA time beyond marking one of the RC images as the official release (and maybe that could be automated), but it's not been running for a GA yet so I will watch with interest during the F41 release.
We could leave it out of the name if that impacts discoverability and have it as a tag on the image if that's preferable (we can also tag it prerelease and leave it in the name, whatever folks find most useful). Right now the image has a few tags I picked fairly arbitrarily and I don't know if they're useful. I'm happy to adjust them.
For reference, here's a description of today's F41 aarch64 image:
aws ec2 describe-images --region=us-east-1 --image-ids=ami-02f62c012aa55d03b
{
    "Images": [
        {
            "Architecture": "arm64",
            "CreationDate": "2024-08-21T11:34:29.000Z",
            "ImageId": "ami-02f62c012aa55d03b",
            "ImageLocation": "125523088429/Fedora-Cloud-Base-AmazonEC2.aarch64-41-Prerelease-20240821.0",
            "ImageType": "machine",
            "Public": true,
            "OwnerId": "125523088429",
            "PlatformDetails": "Linux/UNIX",
            "UsageOperation": "RunInstances",
            "State": "available",
            "BlockDeviceMappings": [
                {
                    "DeviceName": "/dev/sda1",
                    "Ebs": {
                        "DeleteOnTermination": true,
                        "SnapshotId": "snap-050d5d7082d24fcc8",
                        "VolumeSize": 6,
                        "VolumeType": "gp3",
                        "Encrypted": false
                    }
                }
            ],
            "Description": "Fedora-Cloud-41-Prerelease.20240821.0 (aarch64) for HVM Instances",
            "EnaSupport": true,
            "Hypervisor": "xen",
            "Name": "Fedora-Cloud-Base-AmazonEC2.aarch64-41-Prerelease-20240821.0",
            "RootDeviceName": "/dev/sda1",
            "RootDeviceType": "ebs",
            "SriovNetSupport": "simple",
            "Tags": [
                { "Key": "fedora-version", "Value": "41.20240821.0" },
                { "Key": "end-of-life", "Value": "2025-08-21" },
                { "Key": "fedora-compose-id", "Value": "Fedora-41-20240821.n.0" },
                { "Key": "fedora-release", "Value": "41" },
                { "Key": "fedora-subvariant", "Value": "Cloud_Base" }
            ],
            "VirtualizationType": "hvm",
            "BootMode": "uefi-preferred",
            "DeprecationTime": "2026-08-21T11:34:29.000Z",
            "ImdsSupport": "v2.0",
            "DeregistrationProtection": "disabled"
        }
    ]
}
And today's Rawhide:
aws ec2 describe-images --region=us-east-1 --image-ids=ami-0c4c072fb6a1f20c6
{
    "Images": [
        {
            "Architecture": "arm64",
            "CreationDate": "2024-08-21T10:34:51.000Z",
            "ImageId": "ami-0c4c072fb6a1f20c6",
            "ImageLocation": "125523088429/Fedora-Cloud-Base-AmazonEC2.aarch64-Rawhide-20240821.0",
            "ImageType": "machine",
            "Public": true,
            "OwnerId": "125523088429",
            "PlatformDetails": "Linux/UNIX",
            "UsageOperation": "RunInstances",
            "State": "available",
            "BlockDeviceMappings": [
                {
                    "DeviceName": "/dev/sda1",
                    "Ebs": {
                        "DeleteOnTermination": true,
                        "SnapshotId": "snap-06569246a9ab43c26",
                        "VolumeSize": 6,
                        "VolumeType": "gp3",
                        "Encrypted": false
                    }
                }
            ],
            "Description": "Fedora-Cloud-Rawhide.20240821.0 (aarch64) for HVM Instances",
            "EnaSupport": true,
            "Hypervisor": "xen",
            "Name": "Fedora-Cloud-Base-AmazonEC2.aarch64-Rawhide-20240821.0",
            "RootDeviceName": "/dev/sda1",
            "RootDeviceType": "ebs",
            "SriovNetSupport": "simple",
            "Tags": [
                { "Key": "fedora-subvariant", "Value": "Cloud_Base" },
                { "Key": "fedora-version", "Value": "42.20240821.0" },
                { "Key": "fedora-release", "Value": "rawhide" },
                { "Key": "end-of-life", "Value": "2025-08-21" },
                { "Key": "fedora-compose-id", "Value": "Fedora-Rawhide-20240821.n.0" }
            ],
            "VirtualizationType": "hvm",
            "BootMode": "uefi-preferred",
            "DeprecationTime": "2026-08-21T10:34:51.000Z",
            "ImdsSupport": "v2.0",
            "DeregistrationProtection": "disabled"
        }
    ]
}
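Since the open question is whether these tags are useful, it may help to see how a consumer would read them. The sketch below turns the `Tags` list from a `describe-images` result into a dict and compares two images. The helper names are hypothetical, and the input is assumed to be one element of the `Images` list shown above.

```python
def tags_as_dict(image):
    """Convert an image's [{'Key': ..., 'Value': ...}] tag list into a dict."""
    return {tag["Key"]: tag["Value"] for tag in image.get("Tags", [])}


def diff_tags(image_a, image_b):
    """Return {key: (value_a, value_b)} for every tag that differs."""
    a, b = tags_as_dict(image_a), tags_as_dict(image_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }
```

For the two images above, the differing tags are `fedora-compose-id`, `fedora-release` and `fedora-version`, so a tool could select Rawhide with `fedora-release == "rawhide"` without parsing the image name.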
Outstanding issues: No Fedora 39 images
https://apps.fedoraproject.org/datagrepper/v2/search?topic=org.fedoraproject.prod.fedora_image_uploader.published.v1.aws.Cloud_Base.39.x86_64
https://apps.fedoraproject.org/datagrepper/v2/search?topic=org.fedoraproject.prod.fedora_image_uploader.published.v1.aws.Cloud_Base.39.aarch64
39 images are being published for AWS as of this morning.
I also spot-checked and didn't see duplicates with the latest images so I think this is now in good shape.
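The two datagrepper links in this comment follow a regular topic pattern, so the same check can be repeated for other releases and architectures. A minimal sketch that rebuilds those URLs; the pattern is inferred only from the two links above, and `published_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

DATAGREPPER_SEARCH = "https://apps.fedoraproject.org/datagrepper/v2/search"
TOPIC_PREFIX = "org.fedoraproject.prod.fedora_image_uploader.published.v1.aws"


def published_search_url(release, arch, subvariant="Cloud_Base"):
    """Build the datagrepper search URL for AWS publish messages.

    The topic layout (prefix.subvariant.release.arch) is an assumption
    based on the two links posted in this ticket.
    """
    topic = f"{TOPIC_PREFIX}.{subvariant}.{release}.{arch}"
    return f"{DATAGREPPER_SEARCH}?{urlencode({'topic': topic})}"
```

For example, `published_search_url(39, "x86_64")` reproduces the first link, and swapping in `40` or `"aarch64"` gives the corresponding search for other images.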
Awesome.
I think we can close this now?
Please re-open if there's anything further needing doing here.
Metadata Update from @kevin: - Issue close_status updated to: Fixed with Explanation - Issue status updated to: Closed (was: Open)
@jcline is this naming change expected pls?
Fedora-Cloud-Base-AmazonEC2.x86_64-41-Prerelease-20240822.0
Is the Prerelease string needed?
See https://pagure.io/fedora-infrastructure/issue/12110#comment-926295. I'm flexible, and I have no particular preference. People expressed that the Azure images should make it clear pre-release images were pre-release, so I just aligned the AWS images with that.
I'd recommend having a discussion in the #cloud matrix room or on the Cloud SIG mailing list regarding naming if you'd like to see something different.
@jcline ack, sorry missed that comment, so it is an expected and wanted change. Ack on that.