The directory used by https://jenkins-continuous-infra.apps.ci.centos.org to store the logs from the runs is full:
```
df -h /var/lib/jenkins/
Filesystem                                 Size  Used Avail Use% Mounted on
172.22.6.19:/exports/os-pv-100gi-00000001  187G  187G  1.0M 100% /var/lib/jenkins
```
I'm not sure if it is possible to get more space on the volume; otherwise we might need to store fewer results of the Fedora pipeline runs.
```
$ du -a /var/lib/jenkins/jobs | sort -n -r | head -n 10
99270116  /var/lib/jenkins/jobs
64994800  /var/lib/jenkins/jobs/fedora-rawhide-build-pipeline
64978548  /var/lib/jenkins/jobs/fedora-rawhide-build-pipeline/builds
17688372  /var/lib/jenkins/jobs/fedora-f30-build-pipeline
17672212  /var/lib/jenkins/jobs/fedora-f30-build-pipeline/builds
3782064   /var/lib/jenkins/jobs/continuous-infra-ci-pipeline-f27
3743368   /var/lib/jenkins/jobs/continuous-infra-ci-pipeline-f27/builds
3588000   /var/lib/jenkins/jobs/fedora-rawhide-build-pipeline/builds/4754
3584704   /var/lib/jenkins/jobs/fedora-rawhide-build-pipeline/builds/4754/archive
3584328   /var/lib/jenkins/jobs/fedora-rawhide-build-pipeline/builds/4754/archive/images/Fedora-Rawhide.qcow2
```
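In case anyone wants to reproduce this, a hedged one-liner for surfacing the biggest qcow2 offenders directly (the `-printf` flags assume GNU findutils):

```
# A minimal sketch: list archived qcow2 images by size, largest first,
# so the biggest space consumers can be reviewed before anything is deleted.
find /var/lib/jenkins/jobs -name '*.qcow2' -printf '%s\t%p\n' | sort -n -r | head -n 10
```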
https://pagure.io/fedora-infrastructure/issue/7915 filed
#7915 is fixed for now. Let's leave this thread open until we can coordinate the move to a larger PV.
We might have to reduce the number of jobs we store or change the way we store qcow2 images. I'm not sure how many people are using the qcow2 images, but the pipeline saves the qcow2 image used by the test as a build artifact for each build that has some failure (infrastructure or test).
For example: https://jenkins-continuous-infra.apps.ci.centos.org/view/Fedora%20All%20Packages%20Pipeline/job/fedora-rawhide-build-pipeline/lastFailedBuild/artifact/images/
And we keep the logs of the last 100 builds and 100 PRs for each supported release, at the moment F28-F30 and Rawhide.
https://github.com/CentOS-PaaS-SIG/upstream-fedora-pipeline/blob/master/Jenkinsfile#L59
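If we do go the "change the way we store qcow2 images" route, one option could be pruning only the archived images while keeping the build logs themselves. A rough sketch, not current policy, with an assumed 14-day threshold:

```
# Hypothetical pruning sketch: delete archived qcow2 images older than
# 14 days, leaving the build logs in place. The path pattern matches the
# layout in the du output above; the 14-day threshold is an assumption.
find /var/lib/jenkins/jobs -path '*/archive/images/*.qcow2' -mtime +14 -delete
```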
We hit the issue again during the weekend. I've just deleted 10 qcow2 images to bring the Jenkins master back online.
I'm wondering if someone deleted all the qcow2 images under /var/lib/jenkins/jobs/. The base qcow2 images used by the pipeline seem to have been removed, and the pipeline was failing like https://pagure.io/fedora-ci/general/issue/43#comment-579380
The base qcow2 images are stored by Jenkins jobs like: https://jenkins-continuous-infra.apps.ci.centos.org/view/Fedora%20All%20Packages%20Pipeline/job/fedora-f30-image-test/
https://jenkins-continuous-infra.apps.ci.centos.org/view/Fedora%20All%20Packages%20Pipeline/job/fedora-rawhide-image-test/
I'm now rerunning the jobs with missing qcow2 images...
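For the record, a hedged sketch of how the rerun can be kicked off through the standard Jenkins remote-build endpoint (USER and TOKEN are placeholders for real credentials with build permission on these jobs):

```
# Trigger a rebuild of the image jobs via the Jenkins remote-build API.
for job in fedora-f30-image-test fedora-rawhide-image-test; do
    curl -X POST -u "USER:TOKEN" \
        "https://jenkins-continuous-infra.apps.ci.centos.org/job/$job/build"
done
```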
If so, it wasn't from the CentOS CI folks. We never had hands on the volume itself.
Do we have thoughts about migration? We can provision a new PV any time, and schedule the cutover.
We are hitting this issue again... https://jenkins-continuous-infra.apps.ci.centos.org/computer/(master)/
Disk space is too low. Only 0.954GB left on /var/lib/jenkins.
I've deleted some of the qcow2 images and brought the Jenkins master back online.
I've also created https://pagure.io/fedora-infrastructure/issue/8047 to request an increase of the volume size.
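Until the bigger volume lands, a small cron-able check could warn us before the master goes offline again; a minimal sketch (the 5 GiB threshold is my own assumption, and `df --output` assumes GNU coreutils):

```
# Warn when /var/lib/jenkins drops below 5 GiB of free space.
avail=$(df --output=avail -BG /var/lib/jenkins | tail -n 1 | tr -dc '0-9')
if [ "$avail" -lt 5 ]; then
    echo "WARNING: only ${avail}G left on /var/lib/jenkins"
fi
```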
@bstinson do you know if os-pv-100gi-00000001 is used by something else? Because currently /var/lib/jenkins seems to use only 23G, but df says 166G are being used.
```
sh-4.2$ df -h
Filesystem                                                                                          Size  Used Avail Use% Mounted on
/dev/mapper/docker-253:0-402684926-9300699b03108265ed364ae8a9a3d58923e0482b961c21ede4e6c8d4f83f05d1  20G  1.3G   19G   7% /
tmpfs                                                                                                63G     0   63G   0% /dev
tmpfs                                                                                                63G     0   63G   0% /sys/fs/cgroup
/dev/mapper/vg_n21-root                                                                             221G   13G  208G   6% /etc/hosts
shm                                                                                                  64M     0   64M   0% /dev/shm
172.22.6.19:/exports/os-pv-100gi-00000001                                                           187G  166G   22G  89% /var/lib/jenkins
```
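For reference, the two numbers in my question come from comparing per-directory usage against the whole mount:

```
# Per-directory usage vs. usage reported for the entire export behind the mount.
du -sh /var/lib/jenkins    # ~23G: what Jenkins itself stores
df -h /var/lib/jenkins     # 166G used: reported for the whole export, not just this directory
```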
These volumes are mounted via NFS, and another tenant is bursting above their allocation.
If I provisioned another PV in a different storage location, would you be able to migrate /var/lib/jenkins and move the mount when ready?
@bstinson yes, I think we can do it.
PVC jenkins-4 is bound to a new volume.
Please let me know if you need anything else.
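The cutover plan, roughly. This is a sketch under assumptions: that the volume behind PVC jenkins-4 is temporarily mounted at /mnt/jenkins-4 (a hypothetical path) and that Jenkins runs from an OpenShift DeploymentConfig named "jenkins" (also an assumption):

```
oc scale dc/jenkins --replicas=0                         # stop Jenkins so files don't change mid-copy
rsync -aHAX --delete /var/lib/jenkins/ /mnt/jenkins-4/   # preserve perms, hardlinks, ACLs, xattrs
# swap the mounts so the new volume backs /var/lib/jenkins, then:
oc scale dc/jenkins --replicas=1                         # bring Jenkins back up
```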
I've migrated to the new storage, but since the Jenkins redeployment it seems the jobs are running much slower.
I'm wondering if it could be the new storage or just a side effect of a freshly started Jenkins...
For example, this trigger job was taking a few seconds, and now a couple of minutes:
https://jenkins-continuous-infra.apps.ci.centos.org/job/fedora-task-pipeline-trigger/
@bstinson thanks for the new storage, trigger performance seems to be back to normal.
I think this issue and also https://pagure.io/fedora-infrastructure/issue/8047 can be closed.
We should keep an eye on the Fedora trigger jobs and make sure they are not running for too long; they should take less than 1 minute to run (a quick check is sketched after the list below):
https://jenkins-continuous-infra.apps.ci.centos.org/job/fedora-pr-comment-trigger/
https://jenkins-continuous-infra.apps.ci.centos.org/job/fedora-pr-new-trigger
https://jenkins-continuous-infra.apps.ci.centos.org/job/fedora-task-pipeline-trigger
https://jenkins-continuous-infra.apps.ci.centos.org/job/fedora-build-pipeline-trigger
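Something like this could do as a quick check, reading each job's last build duration from the standard Jenkins JSON API ("duration" is reported in milliseconds):

```
# Print the last build duration, in seconds, for each trigger job.
base=https://jenkins-continuous-infra.apps.ci.centos.org
for job in fedora-pr-comment-trigger fedora-pr-new-trigger \
           fedora-task-pipeline-trigger fedora-build-pipeline-trigger; do
    ms=$(curl -s "$base/job/$job/lastBuild/api/json" |
         python3 -c 'import json,sys; print(json.load(sys.stdin)["duration"])')
    echo "$job: $((ms / 1000))s"
done
```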
@bstinson The biggest change shows up when running the trigger for larger repos.
For example, kernel:
Old storage:
- https://jenkins-continuous-infra.apps.ci.centos.org/view/all/job/fedora-build-pipeline-trigger/351702/ - 40s

New storage:
- https://jenkins-continuous-infra.apps.ci.centos.org/view/all/job/fedora-build-pipeline-trigger/352058/ - 10 min
- https://jenkins-continuous-infra.apps.ci.centos.org/view/all/job/fedora-build-pipeline-trigger/352088/ - 11 min
- https://jenkins-continuous-infra.apps.ci.centos.org/view/all/job/fedora-build-pipeline-trigger/352129/ - 10 min
Metadata Update from @bookwar: - Issue tagged with: jenkins
Closing as the homedir has plenty of space now =)
Metadata Update from @jimbair: - Issue status updated to: Closed (was: Open)