#8736 Communishift Volume Claim
Closed: Fixed 4 years ago by kevin. Opened 4 years ago by jimbair.

Within the Communishift cluster, under the fedora-ci project, I am trying to deploy a standard Jenkins instance using `oc new-app jenkins-persistent`, but it's failing on persistent volume creation with the error "no persistent volumes available for this claim and no storage class is set":

https://console-openshift-console.apps.os.fedorainfracloud.org/k8s/ns/fedora-ci/persistentvolumeclaims/jenkins/events
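
For anyone following along, this is roughly what I ran (the `jenkins` claim name comes from the template, as the console URL above shows):

```
# Deploy the persistent Jenkins template into the fedora-ci project
oc new-app jenkins-persistent -n fedora-ci

# The claim sits in Pending; describe shows the event above
oc get pvc jenkins -n fedora-ci
oc describe pvc jenkins -n fedora-ci
```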

I did some testing on Thursday trying to deploy a custom instance, so I'm not sure whether I still have claims against volumes I can't see within the project. I'm fairly green with OpenShift, so it could be me. I tried a second project, then deleted both and created fedora-ci a second time so it was a clean slate, but it still doesn't want to deploy.

I looked up the docs, and supposedly we should be allowed up to 5 volumes, per https://fedoraproject.org/wiki/Infrastructure/Communishift#Access
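
(I can't see cluster-scoped volumes myself, but if that 5-volume limit is enforced as a quota, something like this should show where I stand; this is just my guess at how to check:)

```
# If the limit is enforced as a ResourceQuota, usage shows up here
oc describe quota -n fedora-ci

# Claims visible to me inside the project
oc get pvc -n fedora-ci
```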

I appreciate any help y'all can send my way. Thanks!

-Jim


My best guess is that during my initial tests I claimed 5 volumes, which are now being held by the reclaim policy even though the claims themselves have been deleted:

https://docs.openshift.com/container-platform/4.3/storage/understanding-persistent-storage.html
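
If that guess is right, the old volumes should be sitting in Released with a Retain reclaim policy. PVs are cluster-scoped, so I can't check this myself, but an admin could confirm with something like:

```
# PVs are cluster-scoped, so this likely needs admin rights
oc get pv

# A Retain volume whose claim was deleted would look roughly like:
# NAME               CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM
# openshift-01gb-17  1Gi        RWO            Retain           Released   fedora-ci/jenkins
```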

Metadata Update from @smooge:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: communishift

4 years ago

There were several issues... this PVC was for a RWO (ReadWriteOnce) 1GB volume, and there weren't any available. I created a number more, and also some more RWX (ReadWriteMany) ones. Your Jenkins job should pick up one of the new volumes. ;)
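
For reference, these are plain NFS-backed PVs; here's a sketch of what one looks like (the name and path here are illustrative):

```
# Illustrative sketch of one of the Communishift NFS volumes
cat <<'EOF' | oc create -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: openshift-01gb-example
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce            # rwo; the rwx ones use ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: storinator01.fedorainfracloud.org
    path: /srv/nfs/openshift-01gb-example
EOF
```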

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

4 years ago

Thanks! I tried to re-deploy Jenkins, but I am getting errors (new ones, at least!):

Generated from kubelet on os-node08.fedorainfracloud.org:

```
MountVolume.SetUp failed for volume "openshift-01gb-17" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/5cb1fd33-6557-11ea-92f8-405cfda580af/volumes/kubernetes.io~nfs/openshift-01gb-17 --scope -- mount -t nfs stornator01.fedorainfracloud.org:/srv/nfs/openshift-01gb-17 /var/lib/kubelet/pods/5cb1fd33-6557-11ea-92f8-405cfda580af/volumes/kubernetes.io~nfs/openshift-01gb-17
Output: Running scope as unit: run-r3818582d79954af79893b36959bc665b.scope
mount.nfs: Failed to resolve server stornator01.fedorainfracloud.org: Name or service not known
```

Metadata Update from @jimbair:
- Issue status updated to: Open (was: Closed)

4 years ago

Oh man, typo in the hostname. ;( Will fix.
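
(The giveaway in the log above: the volume pointed at stornator01 instead of storinator01. For anyone curious, a quick way to sanity-check a hostname from any node is something like:)

```
# The typo'd name fails to resolve; the corrected one should return an address
getent hosts stornator01.fedorainfracloud.org
getent hosts storinator01.fedorainfracloud.org
```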

ok, should be fixed now. Please test and reopen if you still see an issue. Thanks!

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

4 years ago

Looks like we have a new one this morning:

```
MountVolume.SetUp failed for volume "openshift-01gb-20" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/6af13298-22c5-4c75-a75d-bc0a3d3d489f/volumes/kubernetes.io~nfs/openshift-01gb-20 --scope -- mount -t nfs storinator01.fedorainfracloud.org:/srv/nfs/openshift-01gb-20 /var/lib/kubelet/pods/6af13298-22c5-4c75-a75d-bc0a3d3d489f/volumes/kubernetes.io~nfs/openshift-01gb-20
Output: Running scope as unit: run-r79ac1e749ee0455eb4e17ee3ff83b002.scope
mount.nfs: mounting storinator01.fedorainfracloud.org:/srv/nfs/openshift-01gb-20 failed, reason given by server: No such file or directory
```
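
(If I'm reading mount.nfs right, the server is saying the export path doesn't exist on its side; just my guess at the check, but something like this from a client should list what the server actually exports:)

```
# List the exports the NFS server is actually offering
showmount -e storinator01.fedorainfracloud.org
```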

Metadata Update from @jimbair:
- Issue status updated to: Open (was: Closed)

4 years ago

Odd. I see that volume bound just fine now...

But I did find a permissions issue. :( Can you try unbinding and rebinding and see if it works?
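
(Roughly, something like this, assuming the `jenkins` claim name and the template's default app label:)

```
# Tear down the previous deployment and its claim, then redeploy
# so the new claim binds a fresh volume
oc delete all -l app=jenkins-persistent -n fedora-ci
oc delete pvc jenkins -n fedora-ci
oc new-app jenkins-persistent -n fedora-ci
```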

I tried re-deploying top to bottom and it happened again:

```
MountVolume.SetUp failed for volume "openshift-01gb-15" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/ef2b47f2-76c9-4c8d-83c3-7e71b58ad1e1/volumes/kubernetes.io~nfs/openshift-01gb-15 --scope -- mount -t nfs storinator01.fedorainfracloud.org:/srv/nfs/openshift-01gb-15 /var/lib/kubelet/pods/ef2b47f2-76c9-4c8d-83c3-7e71b58ad1e1/volumes/kubernetes.io~nfs/openshift-01gb-15
Output: Running scope as unit: run-rd28295ac474a464c845f5642a809a4ce.scope
mount.nfs: mounting storinator01.fedorainfracloud.org:/srv/nfs/openshift-01gb-15 failed, reason given by server: No such file or directory
```

I know it's a bit heavy-handed, but given the fluid nature of the infra, I like to make sure the full app deployment behaves as expected. It looks like we may still be missing permissions or have some typos somewhere.

Also, please feel free to login to the fedora-ci project and fiddle around if it helps.

Humf. OK, can I take some of your time later this afternoon to work on this? After 1pm PDT I should be free; you can catch me in #fedora-admin on IRC (nick: nirik).

I can focus on it then and make sure it's working for you...

That would be great! I appreciate it so much.


Kevin, sounds good! I'll ping you on IRC here in about an hour and a half and see what we can figure out. =)

Rats, I forgot a meeting at 1... so after 1:30pm (about 1.5 hours from this comment) I should be free. :)

3:30PM CDT it is =) I'll be there! I'll log in to IRC early in case your meeting wraps up early.

Thanks for the help! Closing this out as things seem to be behaving now. :)

Metadata Update from @jimbair:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

4 years ago

Hate to re-open this, but it looks like we're hitting this one again:

Generated from persistentvolume-controller (26 times in the last 7 minutes):

```
no persistent volumes available for this claim and no storage class is set
```

Metadata Update from @jimbair:
- Issue status updated to: Open (was: Closed)

4 years ago

Fixed. We are likely going to need to come up with a better story around reclaiming volumes... perhaps we should just set them to Recycle instead of Retain, but I am sure someone will then yell about their data getting lost. :)
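
(A sketch of the two options, with an illustrative volume name; manual reclaim means wiping the old data on the export first, then clearing the stale claimRef so the PV goes back to Available, while the Recycle route lets the cluster scrub and re-offer volumes on its own:)

```
# Manual reclaim: after wiping the old data on the NFS export, clear the
# stale claimRef so the volume returns to Available (name illustrative)
oc patch pv openshift-01gb-17 --type json \
  -p '[{"op": "remove", "path": "/spec/claimRef"}]'

# Or flip the reclaim policy so released volumes get scrubbed automatically
oc patch pv openshift-01gb-17 \
  -p '{"spec":{"persistentVolumeReclaimPolicy":"Recycle"}}'
```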

Anyhow, it should be good again for a bit and we can discuss a better longer term solution.

Sorry for the hassle.

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

4 years ago
