#58 /var/lib/jenkins is full
Closed 4 years ago by jimbair. Opened 4 years ago by bgoncalv.

The directory used by https://jenkins-continuous-infra.apps.ci.centos.org to store the logs from the runs is full.

$ df -h /var/lib/jenkins/
Filesystem                                 Size  Used Avail Use% Mounted on
172.22.6.19:/exports/os-pv-100gi-00000001  187G  187G  1.0M 100% /var/lib/jenkins

I'm not sure if it is possible to get more space on the volume. Otherwise, we might need to store fewer results of the Fedora pipeline.

$ du -a /var/lib/jenkins/jobs | sort -n -r | head -n 10
99270116        /var/lib/jenkins/jobs
64994800        /var/lib/jenkins/jobs/fedora-rawhide-build-pipeline
64978548        /var/lib/jenkins/jobs/fedora-rawhide-build-pipeline/builds
17688372        /var/lib/jenkins/jobs/fedora-f30-build-pipeline
17672212        /var/lib/jenkins/jobs/fedora-f30-build-pipeline/builds
3782064         /var/lib/jenkins/jobs/continuous-infra-ci-pipeline-f27
3743368         /var/lib/jenkins/jobs/continuous-infra-ci-pipeline-f27/builds
3588000         /var/lib/jenkins/jobs/fedora-rawhide-build-pipeline/builds/4754
3584704         /var/lib/jenkins/jobs/fedora-rawhide-build-pipeline/builds/4754/archive
3584328         /var/lib/jenkins/jobs/fedora-rawhide-build-pipeline/builds/4754/archive/images/Fedora-Rawhide.qcow2
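
Since the largest single entry above is a qcow2 image, it may help to list only the image files. The following is a minimal sketch, not something that was run on this Jenkins master; the function name largest_qcow2 and its defaults are made up for illustration.

```shell
#!/bin/sh
# List the largest *.qcow2 files under a Jenkins jobs directory.
# largest_qcow2 is a hypothetical helper; the default path is the one
# from this ticket and should be overridden as needed.
largest_qcow2() {
    jobs_dir=${1:-/var/lib/jenkins/jobs}
    count=${2:-10}
    # du -k prints the size in KiB followed by the path;
    # sort numerically, largest first, and keep the top entries.
    find "$jobs_dir" -type f -name '*.qcow2' -exec du -k {} + 2>/dev/null \
        | sort -n -r | head -n "$count"
}
```

Calling largest_qcow2 /var/lib/jenkins/jobs 10 would print up to ten lines in the same "size path" layout as the du output above.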

#7915 is fixed for now. Let's leave this thread open until we can coordinate the move to a larger PV.

We might have to reduce the number of jobs we store or change the way we store qcow2 images.
I'm not sure how many people use the qcow2 images, but for each build that has a failure (infrastructure or test), the pipeline saves the qcow2 image used by the test in the build artifacts.

ex: https://jenkins-continuous-infra.apps.ci.centos.org/view/Fedora%20All%20Packages%20Pipeline/job/fedora-rawhide-build-pipeline/lastFailedBuild/artifact/images/

We also keep the logs of the last 100 builds and 100 PRs for each supported release, which at the moment means F28-F30 and Rawhide.

https://github.com/CentOS-PaaS-SIG/upstream-fedora-pipeline/blob/master/Jenkinsfile#L59

We hit the issue again during the weekend. I've just deleted 10 qcow2 images to bring the Jenkins master back online.

I'm wondering if someone deleted all the qcow2 images under /var/lib/jenkins/jobs/. The base qcow2 images used by the pipeline seem to have been removed, and the pipeline was failing as in https://pagure.io/fedora-ci/general/issue/43#comment-579380

The base qcow2 images are stored in Jenkins jobs such as:
https://jenkins-continuous-infra.apps.ci.centos.org/view/Fedora%20All%20Packages%20Pipeline/job/fedora-f30-image-test/

https://jenkins-continuous-infra.apps.ci.centos.org/view/Fedora%20All%20Packages%20Pipeline/job/fedora-rawhide-image-test/

I'm now rerunning the jobs with missing qcow2 images...

If so, it wasn't the CentOS CI folks. We never had hands on the volume itself.

Do we have thoughts about migration? We can provision a new PV any time, and schedule the cutover.

We are hitting this issue again...
https://jenkins-continuous-infra.apps.ci.centos.org/computer/(master)/

Disk space is too low. Only 0.954GB left on /var/lib/jenkins.

I've deleted some of the qcow2 images and brought the Jenkins master back online.

I've also created https://pagure.io/fedora-infrastructure/issue/8047 to request an increase of the volume.
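
For reference, a manual cleanup like this could be scripted roughly as follows. This is only a sketch under assumptions: the function name prune_archived_qcow2 and the age cutoff are hypothetical, the builds/<n>/archive layout is taken from the du output earlier in this ticket, and the exclusion of jobs named *-image-test assumes that is where the base images live, as the two job URLs above suggest. It lists matches by default and deletes only when asked to.

```shell
#!/bin/sh
# Print, and optionally delete, archived qcow2 images older than a cutoff.
# prune_archived_qcow2 is a hypothetical helper, not part of the pipeline.
# Assumption: base images live in jobs named *-image-test and must be kept.
prune_archived_qcow2() {
    jobs_dir=$1
    min_age_days=$2
    mode=${3:-dry-run}   # anything other than "delete" only lists matches
    if [ "$mode" = "delete" ]; then
        # Print each match, then remove it.
        find "$jobs_dir" -type f -path '*/builds/*/archive/*' -name '*.qcow2' \
            ! -path '*-image-test/*' -mtime +"$min_age_days" \
            -print -exec rm -f {} +
    else
        # Dry run: print what would be removed and touch nothing.
        find "$jobs_dir" -type f -path '*/builds/*/archive/*' -name '*.qcow2' \
            ! -path '*-image-test/*' -mtime +"$min_age_days" -print
    fi
}
```

For example, prune_archived_qcow2 /var/lib/jenkins/jobs 7 would list archived images last modified more than 7 days ago, and adding delete as a third argument would remove them.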

@bstinson do you know if os-pv-100gi-00000001 is used by something else? I ask because /var/lib/jenkins currently seems to use only 23G, but df says 166G are being used.

sh-4.2$ df -h
Filesystem                                                                                           Size  Used Avail Use% Mounted on
/dev/mapper/docker-253:0-402684926-9300699b03108265ed364ae8a9a3d58923e0482b961c21ede4e6c8d4f83f05d1   20G  1.3G   19G   7% /
tmpfs                                                                                                 63G     0   63G   0% /dev
tmpfs                                                                                                 63G     0   63G   0% /sys/fs/cgroup
/dev/mapper/vg_n21-root                                                                              221G   13G  208G   6% /etc/hosts
shm                                                                                                   64M     0   64M   0% /dev/shm
172.22.6.19:/exports/os-pv-100gi-00000001                                                            187G  166G   22G  89% /var/lib/jenkins
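
The comparison above (what is visible under the mount point versus what df reports as used) can be sketched as a small helper. The function name usage_gap_kb is hypothetical and the output format is only illustrative. A large positive gap means space is being consumed by something that is not visible under the mount point.

```shell
#!/bin/sh
# Compare the space du can see under a mount point with what df reports.
# usage_gap_kb is a hypothetical helper written for this ticket.
usage_gap_kb() {
    mount_point=$1
    # Space reachable from the mount point, in KiB (-x stays on one filesystem).
    visible=$(du -skx "$mount_point" 2>/dev/null | awk '{print $1}')
    # Space the filesystem reports as used, in KiB
    # (-P keeps each filesystem on a single output line).
    used=$(df -kP "$mount_point" | awk 'NR==2 {print $3}')
    echo "du=${visible:-0} df=${used:-0} gap=$(( ${used:-0} - ${visible:-0} ))"
}
```

Running usage_gap_kb /var/lib/jenkins prints the two totals and their difference in KiB.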

These volumes are mounted via NFS, and another tenant is bursting above their allocation.

If I provisioned another PV in a different storage location, would you be able to migrate /var/lib/jenkins and move the mount when ready?

PVC jenkins-4 is bound to a new volume.

Please let me know if you need anything else.

I've migrated to the new storage, but since the Jenkins redeployment it seems that jobs are running much slower.

I'm wondering if it could be the new storage or just a side effect of a freshly started Jenkins...

Ex: this trigger job was taking a few seconds, and now it takes a couple of minutes...

https://jenkins-continuous-infra.apps.ci.centos.org/job/fedora-task-pipeline-trigger/

@bstinson thanks for the new storage, trigger performance seems to be back to normal.

I think this issue and also https://pagure.io/fedora-infrastructure/issue/8047 can be closed.

Metadata Update from @bookwar:
- Issue tagged with: jenkins

4 years ago

Closing as the homedir has plenty of space now =)

Metadata Update from @jimbair:
- Issue status updated to: Closed (was: Open)

4 years ago
