The directory used by https://jenkins-continuous-infra.apps.ci.centos.org to store the logs from the runs is full:
```
df -h /var/lib/jenkins/
Filesystem                                 Size  Used Avail Use% Mounted on
172.22.6.19:/exports/os-pv-100gi-00000001  187G  187G  1.0M 100% /var/lib/jenkins
```
I'm not sure if it is possible to get more space on the volume; otherwise we might need to store fewer results of the Fedora pipeline runs.
```
$ du -a /var/lib/jenkins/jobs | sort -n -r | head -n 10
99270116  /var/lib/jenkins/jobs
64994800  /var/lib/jenkins/jobs/fedora-rawhide-build-pipeline
64978548  /var/lib/jenkins/jobs/fedora-rawhide-build-pipeline/builds
17688372  /var/lib/jenkins/jobs/fedora-f30-build-pipeline
17672212  /var/lib/jenkins/jobs/fedora-f30-build-pipeline/builds
3782064   /var/lib/jenkins/jobs/continuous-infra-ci-pipeline-f27
3743368   /var/lib/jenkins/jobs/continuous-infra-ci-pipeline-f27/builds
3588000   /var/lib/jenkins/jobs/fedora-rawhide-build-pipeline/builds/4754
3584704   /var/lib/jenkins/jobs/fedora-rawhide-build-pipeline/builds/4754/archive
3584328   /var/lib/jenkins/jobs/fedora-rawhide-build-pipeline/builds/4754/archive/images/Fedora-Rawhide.qcow2
```
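In case anyone wants to reproduce this, a hedged one-liner for surfacing the biggest qcow2 offenders directly (the `-printf` flags assume GNU findutils):

```
# A minimal sketch: list archived qcow2 images by size, largest first,
# so the biggest space consumers can be reviewed before anything is deleted.
find /var/lib/jenkins/jobs -name '*.qcow2' -printf '%s\t%p\n' | sort -n -r | head -n 10
```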
https://pagure.io/fedora-infrastructure/issue/7915 filed
#7915 is fixed for now. Let's leave this thread open until we can coordinate the move to a larger PV.
We might have to reduce the number of jobs we store or change the way we store qcow2 images. I'm not sure how many people are using the qcow2 images, but the pipeline saves the qcow2 image used by the test as a build artifact for each build that has some failure (infrastructure or test).
For example: https://jenkins-continuous-infra.apps.ci.centos.org/view/Fedora%20All%20Packages%20Pipeline/job/fedora-rawhide-build-pipeline/lastFailedBuild/artifact/images/
And we keep the logs of the last 100 builds and 100 PRs for each supported release, at the moment F28-F30 and Rawhide.
https://github.com/CentOS-PaaS-SIG/upstream-fedora-pipeline/blob/master/Jenkinsfile#L59
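If we do go the "change the way we store qcow2 images" route, one option could be pruning only the archived images while keeping the build logs themselves. A rough sketch, not current policy, with an assumed 14-day threshold:

```
# Hypothetical pruning sketch: delete archived qcow2 images older than
# 14 days, leaving the build logs in place. The path pattern matches the
# layout in the du output above; the 14-day threshold is an assumption.
find /var/lib/jenkins/jobs -path '*/archive/images/*.qcow2' -mtime +14 -delete
```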
We hit the issue again during the weekend. I've just deleted 10 qcow2 images to bring the Jenkins master back online.
I'm wondering if someone deleted all the qcow2 images under /var/lib/jenkins/jobs/. The base qcow2 images used by the pipeline seem to have been removed, and the pipeline was failing like https://pagure.io/fedora-ci/general/issue/43#comment-579380
The base qcow2 images are stored by Jenkins jobs like: https://jenkins-continuous-infra.apps.ci.centos.org/view/Fedora%20All%20Packages%20Pipeline/job/fedora-f30-image-test/
https://jenkins-continuous-infra.apps.ci.centos.org/view/Fedora%20All%20Packages%20Pipeline/job/fedora-rawhide-image-test/
I'm now rerunning the jobs with missing qcow2 images...
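For the record, a hedged sketch of how the rerun can be kicked off through the standard Jenkins remote-build endpoint (USER and TOKEN are placeholders for real credentials with build permission on these jobs):

```
# Trigger a rebuild of the image jobs via the Jenkins remote-build API.
for job in fedora-f30-image-test fedora-rawhide-image-test; do
    curl -X POST -u "USER:TOKEN" \
        "https://jenkins-continuous-infra.apps.ci.centos.org/job/$job/build"
done
```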
If so, it wasn't from the CentOS CI folks. We never had hands on the volume itself.
Do we have thoughts about migration? We can provision a new PV any time, and schedule the cutover.
We are hitting this issue again... https://jenkins-continuous-infra.apps.ci.centos.org/computer/(master)/
Disk space is too low. Only 0.954GB left on /var/lib/jenkins.
I've deleted some of the qcow2 images and brought the Jenkins master back online.
I've also created https://pagure.io/fedora-infrastructure/issue/8047 to request an increase of the volume size.
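Until the bigger volume lands, a small cron-able check could warn us before the master goes offline again; a minimal sketch (the 5 GiB threshold is my own assumption, and `df --output` assumes GNU coreutils):

```
# Warn when /var/lib/jenkins drops below 5 GiB of free space.
avail=$(df --output=avail -BG /var/lib/jenkins | tail -n 1 | tr -dc '0-9')
if [ "$avail" -lt 5 ]; then
    echo "WARNING: only ${avail}G left on /var/lib/jenkins"
fi
```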
@bstinson do you know if os-pv-100gi-00000001 is used by something else? Because currently /var/lib/jenkins seems to use only 23G, but df says 166G are being used.
```
sh-4.2$ df -h
Filesystem                                                                                          Size  Used Avail Use% Mounted on
/dev/mapper/docker-253:0-402684926-9300699b03108265ed364ae8a9a3d58923e0482b961c21ede4e6c8d4f83f05d1  20G  1.3G   19G   7% /
tmpfs                                                                                                63G     0   63G   0% /dev
tmpfs                                                                                                63G     0   63G   0% /sys/fs/cgroup
/dev/mapper/vg_n21-root                                                                             221G   13G  208G   6% /etc/hosts
shm                                                                                                  64M     0   64M   0% /dev/shm
172.22.6.19:/exports/os-pv-100gi-00000001                                                           187G  166G   22G  89% /var/lib/jenkins
```
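For reference, the two numbers in my question come from comparing per-directory usage against the whole mount:

```
# Per-directory usage vs. usage reported for the entire export behind the mount.
du -sh /var/lib/jenkins    # ~23G: what Jenkins itself stores
df -h /var/lib/jenkins     # 166G used: reported for the whole export, not just this directory
```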
These volumes are mounted via NFS, and another tenant is bursting above their allocation.
If I provisioned another PV in a different storage location, would you be able to migrate /var/lib/jenkins and move the mount when ready?
@bstinson yes, I think we can do it.
PVC jenkins-4 is bound to a new volume.
Please let me know if you need anything else.
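The cutover plan, roughly. This is a sketch under assumptions: that the volume behind PVC jenkins-4 is temporarily mounted at /mnt/jenkins-4 (a hypothetical path) and that Jenkins runs from an OpenShift DeploymentConfig named "jenkins" (also an assumption):

```
oc scale dc/jenkins --replicas=0                         # stop Jenkins so files don't change mid-copy
rsync -aHAX --delete /var/lib/jenkins/ /mnt/jenkins-4/   # preserve perms, hardlinks, ACLs, xattrs
# swap the mounts so the new volume backs /var/lib/jenkins, then:
oc scale dc/jenkins --replicas=1                         # bring Jenkins back up
```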
I've migrated to the new storage, but since the Jenkins redeployment it seems the jobs are running much slower.
I'm wondering if it could be the new storage or just a side effect of a freshly started Jenkins...
For example, this trigger job was taking a few seconds, and now a couple of minutes:
https://jenkins-continuous-infra.apps.ci.centos.org/job/fedora-task-pipeline-trigger/
@bstinson thanks for the new storage, trigger performance seems to be back to normal.
I think this issue and also https://pagure.io/fedora-infrastructure/issue/8047 can be closed.
We should keep an eye on the Fedora trigger jobs and make sure they are not running for too long; they should take less than 1 minute to run (a quick check is sketched after the list below):
https://jenkins-continuous-infra.apps.ci.centos.org/job/fedora-pr-comment-trigger/
https://jenkins-continuous-infra.apps.ci.centos.org/job/fedora-pr-new-trigger
https://jenkins-continuous-infra.apps.ci.centos.org/job/fedora-task-pipeline-trigger
https://jenkins-continuous-infra.apps.ci.centos.org/job/fedora-build-pipeline-trigger
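Something like this could do as a quick check, reading each job's last build duration from the standard Jenkins JSON API ("duration" is reported in milliseconds):

```
# Print the last build duration, in seconds, for each trigger job.
base=https://jenkins-continuous-infra.apps.ci.centos.org
for job in fedora-pr-comment-trigger fedora-pr-new-trigger \
           fedora-task-pipeline-trigger fedora-build-pipeline-trigger; do
    ms=$(curl -s "$base/job/$job/lastBuild/api/json" |
         python3 -c 'import json,sys; print(json.load(sys.stdin)["duration"])')
    echo "$job: $((ms / 1000))s"
done
```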
@bstinson The biggest change shows up when running the trigger for larger repos.
For example, kernel:
Old storage:
- https://jenkins-continuous-infra.apps.ci.centos.org/view/all/job/fedora-build-pipeline-trigger/351702/ - 40s

New storage:
- https://jenkins-continuous-infra.apps.ci.centos.org/view/all/job/fedora-build-pipeline-trigger/352058/ - 10 min
- https://jenkins-continuous-infra.apps.ci.centos.org/view/all/job/fedora-build-pipeline-trigger/352088/ - 11 min
- https://jenkins-continuous-infra.apps.ci.centos.org/view/all/job/fedora-build-pipeline-trigger/352129/ - 10 min
Metadata Update from @bookwar: - Issue tagged with: jenkins
Closing as the homedir has plenty of space now =)
Metadata Update from @jimbair: - Issue status updated to: Closed (was: Open)