#7506 release-monitoring.org - delete cronjob pod
Closed: Fixed 5 years ago by zlopez. Opened 5 years ago by zlopez.

  • Describe what you need us to do:
    I need to delete the pod anitya-1547485200-6hstr on production OpenShift in the release-monitoring.org project. It is pulling the wrong image. I made a change in cron.yml to fix this, but it is still using the same pod instead of starting a new one. After the mentioned pod is deleted, the job should be restarted automatically (a command sketch follows this list).

  • When do you need this? (YYYY/MM/DD)
    As soon as possible

  • When is this no longer needed or useful? (YYYY/MM/DD)
    When the cron job is no longer used.

  • If we cannot complete your request, what is the impact?
    Anitya would not be able to check for new versions of projects.
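
For reference, a minimal sketch of the requested operation with the oc client; the pod name is the one from this ticket, while the project name release-monitoring is an assumption:

  $ oc project release-monitoring          # assumed project name; switch to the affected project
  $ oc delete pod anitya-1547485200-6hstr  # remove the stuck pod; its owning Job recreates it
  $ oc get pods -w                         # watch for the replacement pod to appear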


I think we need to allow application owners to delete jobs or pods, since this seems to be one of the first steps when things go wrong.

Is there any reason for not allowing application owners to delete pods?

Deleted.

I have no objection to adding that to app-owner perms...
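
A minimal sketch of what such a grant could look like with plain oc, assuming a per-user binding of the built-in edit role; the actual app-owner permissions are managed elsewhere (e.g. via ansible), so the user and project names here are illustrative:

  $ # the built-in "edit" role allows deleting pods and jobs in the project (assumed user/project names)
  $ oc adm policy add-role-to-user edit zlopez -n release-monitoring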

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

5 years ago

Unfortunately the issue is still there. Here is the log from oc describe pod anitya-1547485200-gl782:

  FirstSeen  LastSeen  Count  From                                       SubObjectPath                             Type     Reason   Message
  ---------  --------  -----  ----                                       -------------                             ----     ------   -------
  11h        38m       131    kubelet, os-node04.phx2.fedoraproject.org  spec.containers{release-monitoring-web}   Warning  Failed   Failed to pull image "release-monitoring/release-monitoring-web:latest":
                              rpc error: code = Unknown desc = Error reading manifest latest in docker.io/release-monitoring/release-monitoring-web: errors:
                              denied: requested access to the resource is denied
                              unauthorized: authentication required
  11h        28m       133    kubelet, os-node04.phx2.fedoraproject.org  spec.containers{release-monitoring-web}   Normal   Pulling  pulling image "release-monitoring/release-monitoring-web:latest"
  11h        8m        2917   kubelet, os-node04.phx2.fedoraproject.org  spec.containers{release-monitoring-web}   Normal   BackOff  Back-off pulling image "release-monitoring/release-monitoring-web:latest"
  11h        3m        2938   kubelet, os-node04.phx2.fedoraproject.org  spec.containers{release-monitoring-web}   Warning  Failed   Error: ImagePullBackOff

Not sure what is happening; the cronjob image is now set to the same value as before the change that caused this. This is happening on both staging and production :-(

The cronjob is pulling this image: docker-registry.default.svc:5000/release-monitoring/release-monitoring-web:latest, which should be the same one the frontend is using.
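
A quick way to check which image the CronJob template and the already-created Job actually reference; the cronjob and job names below are inferred from the pod names and may differ:

  $ oc get cronjob anitya -n release-monitoring \
      -o jsonpath='{.spec.jobTemplate.spec.template.spec.containers[*].image}'
  $ oc get job anitya-1547485200 -n release-monitoring \
      -o jsonpath='{.spec.template.spec.containers[*].image}'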

@zlopez for some reason it is trying to pull the image from Docker Hub; see the docker.io prefix here --> docker.io/release-monitoring/release-monitoring-web

I was able to fix this on staging by manually changing the YAML, which still used the previous image path without the docker-registry.default.svc:5000/ prefix.
I hope the next scheduled job will use the new cron.yml instead of the old one.

But I can't edit the YAML on production, so the issue is still there.
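
A sketch of how the production object could be corrected by someone with edit rights; the container name comes from the event log above, while the cronjob name and the cron.yml path are assumptions:

  $ # re-apply the corrected template...
  $ oc apply -f cron.yml -n release-monitoring
  $ # ...or patch just the image in place (strategic merge keyed on the container name)
  $ oc patch cronjob anitya -n release-monitoring -p \
      '{"spec":{"jobTemplate":{"spec":{"template":{"spec":{"containers":[{"name":"release-monitoring-web","image":"docker-registry.default.svc:5000/release-monitoring/release-monitoring-web:latest"}]}}}}}}'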

@cverna Yes, it is using the old cron.yml instead of the new one. I'm not sure why.

So I did a little experiment and ran the playbook again to be sure that the new cron.yml is on OpenShift.

After this I deleted the pod on staging (it looks like I have the permissions for this on staging) and it was recreated immediately, but the pod was still using the old YAML config. Not sure what more I can do with it :-(

According to @cverna we need to delete the job first, which makes sense.

Unfortunately I can't do this myself on either staging or production. Could you restart the job, @kevin?
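
The reasoning behind that: deleting only the pod makes the Job controller recreate it from the Job's already-created (stale) spec, so the new cron.yml is only picked up once the CronJob controller creates a fresh Job. A sketch, with an assumed job name:

  $ oc get jobs -n release-monitoring                      # find the Job created from the stale template
  $ oc delete job anitya-1547485200 -n release-monitoring  # removing the Job drops the stale pod spec
  $ # the CronJob controller will create a new Job from the updated template at the next scheduled run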

Metadata Update from @zlopez:
- Issue status updated to: Open (was: Closed)

5 years ago

After the restart of the job (thanks @mizdebsk), the issue is gone.
I'm closing this again.

Metadata Update from @zlopez:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

5 years ago
