#105 High latency PV I/O in Jenkins OCP instance
Closed: Fixed 3 years ago by arrfab. Opened 3 years ago by mrc0mmand.

Hello,

Several hours ago I noticed that our systemd OCP Jenkins instance[0] takes ages to load certain pages (like [1] or [2]). After monitoring it for a few hours, it looks like this is caused by high latency when accessing the PV mounted under /var/lib/jenkins. Consequently, the load average in the container (jenkins-1-5vxk4) fluctuates between 8 and 20, and any attempt to save changes in Jenkins usually times out.

I also checked the Jenkins logs and load statistics and couldn't find anything suspicious that would explain the unresponsiveness.
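One way to put a number on "high latency when accessing the PV" is to time small write+fsync cycles on the mount from inside the pod. The sketch below is a hypothetical helper (`fsync_latencies` is not part of Jenkins or OpenShift); it assumes a POSIX system and a writable target directory such as /var/lib/jenkins. Running it periodically and logging the results could give some data for tracing a recurrence.

```python
import os
import statistics
import sys
import tempfile
import time


def fsync_latencies(directory, samples=20, size=4096):
    """Time `samples` write+fsync cycles of `size` bytes in `directory`.

    Returns a list of latencies in seconds. Hypothetical probe, not a
    Jenkins/OCP tool.
    """
    payload = os.urandom(size)
    latencies = []
    # Create a scratch file on the volume under test.
    fd, path = tempfile.mkstemp(dir=directory, prefix="latency-probe-")
    try:
        for _ in range(samples):
            start = time.perf_counter()
            os.pwrite(fd, payload, 0)  # overwrite the same block each time
            os.fsync(fd)               # force it down to the backing storage
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)  # always clean up the scratch file
    return latencies


if __name__ == "__main__":
    # Pass the PV mount point (e.g. /var/lib/jenkins) as the first argument;
    # falls back to the system temp dir so the script runs anywhere.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    lat = fsync_latencies(target)
    print(f"median {statistics.median(lat) * 1000:.1f} ms, "
          f"max {max(lat) * 1000:.1f} ms")
```

Using fsync rather than a plain write keeps the page cache from hiding slow storage, which is closer to what Jenkins sees when it saves configuration.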

OCP project: systemd
PVC: systemd/jenkins

[0] https://jenkins-systemd.apps.ocp.ci.centos.org/
[1] https://jenkins-systemd.apps.ocp.ci.centos.org/job/upstream-centos7/
[2] https://jenkins-systemd.apps.ocp.ci.centos.org/job/upstream-vagrant-archlinux/


Update: the issue seems to have resolved itself overnight (not sure whether on its own or through some other intervention). Is there any hope of tracing what exactly happened so it doesn't happen again?

Metadata Update from @siddharthvipul1:
- Issue tagged with: groomed

3 years ago

Metadata Update from @arrfab:
- Issue marked as depending on: #53
- Issue tagged with: centos-ci-infra, centos-common-infra

3 years ago

See #53 for a possible resolution. We'll be in a better position after that hardware maintenance (the nodes will also be running an up-to-date kernel by then).

Metadata Update from @arrfab:
- Issue assigned to arrfab

3 years ago

Just looking for some feedback: the "md reshape" operation is still in progress, at very low speed, but I was wondering whether you are still experiencing slowness (today, as over the last few days, we suffered from multiple other infra issues).

Apart from occasional and intermittent slow loads (and by slow I mean ~10 seconds) I didn't notice any slowness in general. PRs are dispatched immediately, pods are started as soon as they receive a request, etc. I even installed and configured some plugins without any issues or noticeable slowness.

There's definitely none of the complete unresponsiveness there was when I created this issue.

@mrc0mmand thanks for the feedback. I guess we'll let the reshape operation continue at its current speed, as long as it's at least "usable" for CI projects. I'd expect the storage to respond better once the operation has finished, but that means "days" ... :-(

 [======>..............]  reshape = 33.6% (3285555276/9766304768) finish=41613.4min speed=2595K/sec
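The progress line above comes from /proc/mdstat. A minimal sketch of reading it, assuming the block counts are in 1 KiB units (matching the `K/sec` speed), shows where the `finish=` estimate comes from: the remaining blocks divided by the current speed. `parse_progress` is a hypothetical helper, not an mdadm tool.

```python
import re

# Progress line copied from the comment above.
LINE = (" [======>..............]  reshape = 33.6% "
        "(3285555276/9766304768) finish=41613.4min speed=2595K/sec")

PATTERN = re.compile(
    r"(?P<op>\w+)\s*=\s*(?P<pct>[\d.]+)%\s*"
    r"\((?P<done>\d+)/(?P<total>\d+)\)\s*"
    r"finish=(?P<finish>[\d.]+)min\s*"
    r"speed=(?P<speed>\d+)K/sec"
)


def parse_progress(line):
    """Extract md resync/reshape progress and recompute the ETA."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("no md progress information found")
    done, total = int(m["done"]), int(m["total"])
    speed = int(m["speed"])        # KiB per second
    remaining = total - done       # assumed to be 1 KiB blocks
    return {
        "op": m["op"],
        "percent": float(m["pct"]),
        "remaining_kib": remaining,
        "eta_min_reported": float(m["finish"]),
        "eta_min_computed": remaining / speed / 60,
    }


info = parse_progress(LINE)
print(f"{info['op']}: {info['eta_min_computed'] / 1440:.1f} days remaining")
```

With these numbers, roughly 6.48e9 KiB remain at about 2.6 MB/s, which works out to about 29 days; the kernel's own `finish=` figure differs slightly because its speed is averaged over time.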

Let's close this one as it seems "usable" (not yet ideal, but it should be better once the reshape has finished, which by the current ETA is more than a week away).

#53 is now closed as the array finished its underlying reshape, so closing this one too.

Metadata Update from @arrfab:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

3 years ago


Metadata
Boards 2
CentOS CI Infra Status: Done