fedora-infrastructure

#8684 docs.stg.fp.o doesn't build, openshift cronjob confused

Closed: Fixed 4 years ago by asamalik. Opened 4 years ago by asamalik.

Describe what you would like us to do:

The staging fedora docs site is not rebuilding. There is the following error:

"Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew."

https://console.app.os.stg.fedoraproject.org/k8s/ns/docsbuilding/cronjobs/build/events

I believe that deleting this job:

https://console.app.os.stg.fedoraproject.org/k8s/ns/docsbuilding/jobs/build-1573408800

... would fix the problem, but I don't have permissions for that.
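
For context on the error quoted above: the Kubernetes CronJob controller counts how many scheduled start times have passed since the last one it recorded, and refuses to schedule once that count exceeds 100. The sketch below only illustrates the arithmetic. The hourly interval and the five-day gap are assumptions made for the example (the ticket does not state the real schedule), and `missed_starts` is a hypothetical helper, not part of `oc`.

```shell
#!/bin/sh
# Count how many start times of a fixed-interval schedule fall between the
# last recorded schedule time and "now".
# Arguments: last schedule time (epoch s), current time (epoch s), interval (s).
missed_starts() {
    last=$1
    now=$2
    interval=$3
    echo $(( (now - last) / interval ))
}

# Hypothetical numbers: hourly schedule, five days without a recorded start.
last=1573408800                 # numeric suffix of the stuck job's name
now=$(( last + 5 * 86400 ))     # assumed current time, five days later
missed=$(missed_starts "$last" "$now" 3600)
echo "missed start times: $missed"

if [ "$missed" -gt 100 ]; then
    echo "over the limit of 100, so the controller stops scheduling"
fi
```

With these assumed numbers the count is 120, which is already past the limit.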

When do you need this to be done by? (YYYY/MM/DD)

Of course ASAP :) but I don't have any hard deadline

cverna commented 4 years ago

You can actually delete an object, but you have to use Ansible for that. You can also make the Ansible task run only when a tag is specified.

You can find an example here --> https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/playbooks/openshift-apps/coreos-koji-tagger.yml#n65

You can adapt the object type to delete the cronjob, or just reuse that task as-is to delete the whole project and then recreate it by running the playbook again.

Would that work for you?

Metadata Update from @cverna:
- Issue priority set to: Waiting on External (was: Needs Review)

4 years ago

Metadata Update from @smooge:
- Issue priority set to: Waiting on Assignee (was: Waiting on External)

4 years ago

Metadata Update from @cverna:
- Issue priority set to: Waiting on External (was: Waiting on Assignee)
- Issue tagged with: OpenShift

4 years ago

jibecfed commented 4 years ago

Blocking: https://pagure.io/fedora-infrastructure/issue/8691

cverna commented 4 years ago

@asamalik @jibecfed could you make the changes in Ansible and reopen this if that is not enough?

Metadata Update from @cverna:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

4 years ago

jibecfed commented 4 years ago

Clément, thanks for your answer.
It would be greatly appreciated if you could solve it.
You'll probably save Adam some pain.

I personally have no knowledge in Ansible and did not understand your solution. Maybe he didn't catch it either?

The issue isn't fixed, our users still are impacted. Users are community members testing the new i18n system.
Please reopen.

cverna commented 4 years ago

Ok this was done in https://infrastructure.fedoraproject.org/cgit/ansible.git/commit/?id=a29d21317fc211ef93a7a9a71e3be84530d72f8e

You can now delete the cronjob by running

sudo rbac-playbook -l os_masters_stg[0] -t delete openshift-apps/docsbuilding.yml

for staging or

sudo rbac-playbook -l os_masters[0] -t delete openshift-apps/docsbuilding.yml

for production.

To redeploy the cronjob you can run the playbook

sudo rbac-playbook openshift-apps/docsbuilding.yml

I have also opened 2 PRs to update the Fedora version of the base image (moving to F31):
- https://pagure.io/fedora-docs/docs-fp-o/pull-request/131
- https://pagure.io/fedora-docs/docs-fp-o/pull-request/130

Hope that helps.

asamalik commented 4 years ago

Thanks for the PRs, I'll look at those asap, but not today.

It looks like the cronjob has been deleted instead of the job, so the site is not building. I don't have time to look into this today, but I believe putting the cronjob back and making sure no job is stuck, as one was before, will fix it.

Sorry, I don't know how or where to run the 'rbac-playbook' utility, and have no time to learn this at this very moment.

Metadata Update from @asamalik:
- Issue status updated to: Open (was: Closed)

4 years ago

cverna commented 4 years ago

> Thanks for the PRs, I'll look at those asap, but not today.
>
> It looks like the cronjob has been deleted instead of the job, so the site is not building. I don't have time to look into this today, but I believe putting the cronjob back and making sure no job is stuck, as one was before, will fix it.

I'll investigate. My understanding is that a Job is created by the cronjob every time it runs.

> Sorry, I don't know how or where to run the 'rbac-playbook' utility, and have no time to learn this at this very moment.

A bit more info is available here (https://fedora-infra-docs.readthedocs.io/en/latest/sysadmin-guide/sops/ansible.html).

If you want something more self-service (i.e. not going through Ansible), maybe communishift would be a better fit for this project.
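
On the point that each cronjob run creates its own Job: the Job names in this ticket end in a number that looks like the scheduled start time in Unix epoch seconds. Assuming that reading is right, a quick way to see when the stuck job was scheduled is to decode the suffix. `job_time` is a hypothetical helper, and `date -d @...` is GNU date syntax.

```shell
#!/bin/sh
# Print the numeric suffix of a Job name as a UTC timestamp.
# Assumes the suffix is the scheduled start time in epoch seconds.
job_time() {
    suffix=${1##*-}    # keep only the text after the last "-"
    date -u -d "@$suffix" '+%Y-%m-%d %H:%M:%S UTC'    # GNU date
}

job_time build-1573408800
# prints: 2019-11-10 18:00:00 UTC
```

A suffix that decodes to a date months in the past would be consistent with the cronjob having missed a large number of start times.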

cverna commented 4 years ago

> Thanks for the PRs, I'll look at those asap, but not today.
> It looks like the cronjob has been deleted instead of the job, so the site is not building. I don't have time to look into this today, but I believe putting the cronjob back and making sure no job is stuck, as one was before, will fix it.
>
> I'll investigate. My understanding is that a Job is created by the cronjob every time it runs.

OK, so the cronjob is back in place and a job was triggered:

oc -n docsbuilding get jobs
NAME              DESIRED   SUCCESSFUL   AGE
cron-1583319600   1         0            24m

A pod is running and building the docs:

oc -n docsbuilding get pods
NAME                     READY     STATUS      RESTARTS   AGE
builder-build-35-build   0/1       Completed   0          8d
builder-build-36-build   0/1       Completed   0          3h
builder-build-37-build   0/1       Completed   0          59m
cron-1583319600-vrhvh    1/1       Running     0          25m

I am not sure how long it takes to build the docs. It also seems that there are a few errors in the logs, but I can't do much here to help investigate.

Let me know if you need anything else or if we can close this.
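
To pick out the pod that is still building from a listing like the one above, a small `awk` filter on the STATUS column is enough. This is only a sketch: it uses the pasted output as sample input instead of calling `oc`, and `running_pods` is a hypothetical helper name.

```shell
#!/bin/sh
# Sample input copied from the listing above; in practice it would be the
# output of `oc -n docsbuilding get pods`.
pods='NAME                     READY     STATUS      RESTARTS   AGE
builder-build-35-build   0/1       Completed   0          8d
builder-build-36-build   0/1       Completed   0          3h
builder-build-37-build   0/1       Completed   0          59m
cron-1583319600-vrhvh    1/1       Running     0          25m'

# Skip the header row and print the NAME column of rows whose STATUS is Running.
running_pods() {
    awk 'NR > 1 && $3 == "Running" { print $1 }'
}

printf '%s\n' "$pods" | running_pods
# prints: cron-1583319600-vrhvh
```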

asamalik commented 4 years ago

Thanks!

The build itself is running, so we should be good.

(It takes some time, but I can take it from there.)

Cheers!

Metadata Update from @asamalik:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

4 years ago

Metadata

Assignee: None
Tags: OpenShift
Blocking: None
Depending on: None
Priority: Waiting on External
