#343 rebooting jenkins should be predictable
Closed: Fixed 2 years ago by dkirwan. Opened 2 years ago by benesv.

Hi all,
I know security is something we have to follow, but restarting all instances of Jenkins, killing tens or even hundreds of jobs (in my case only 5) here and there, makes things a bit unpredictable.
Would it be possible to somehow announce outages beforehand? Is there any mechanism that can leave jobs running and let Jenkins pick them up again when it is back online? In my case it didn't work:

check the end of https://jenkins-networkmanager.apps.ocp.ci.centos.org/job/NetworkManager-code-mr/73/console

I know it's pretty difficult to make everyone happy, but maybe I am just missing some interesting plugin or Jenkins feature that would get rid of these!

Thanks so much,
Vladimir, the NetworkManager breaker!


Welcome to OpenShift, where your pods are automatically rebooted when OpenShift itself is updated... (I was myself surprised that this was needed, as simple "VM live migration" was possible 10 years ago, but it seems it is not supported by containers.)

Let's discuss this on the ci-users list to find the best way to come up with a solution.
Either announce when David or Vipul are updating the OpenShift cluster (which leads to all pods being torn down and restarted, meaning loss of data in a stateless container), or see if Jenkins has a plugin to "queue" jobs on disk (and so automatically restart queued jobs after a restart), assuming that you can queue on a PersistentVolume.
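
To make the "queue on disk" idea concrete, here is a toy sketch in Python. It is not how Jenkins or any Jenkins plugin works; the `DiskBackedQueue` class, the `queue.json` file name and the second job name are all made up for illustration. It only shows the general principle: if the queue is written to storage that outlives the process (as a PersistentVolume would outlive a pod), a fresh process can reload it after a restart.

```python
import json
import tempfile
from pathlib import Path


class DiskBackedQueue:
    """Toy job queue (hypothetical) that persists its contents to a file."""

    def __init__(self, path):
        self.path = Path(path)
        # Reload whatever a previous process left behind, if anything.
        if self.path.exists():
            self.jobs = json.loads(self.path.read_text())
        else:
            self.jobs = []

    def _save(self):
        # Write to a temporary file first, then rename it over the real one,
        # so a crash in the middle of a write does not truncate the queue.
        tmp = self.path.with_name(self.path.name + ".tmp")
        tmp.write_text(json.dumps(self.jobs))
        tmp.replace(self.path)

    def enqueue(self, job_name):
        self.jobs.append(job_name)
        self._save()

    def dequeue(self):
        job = self.jobs.pop(0)
        self._save()
        return job


def demo():
    """Queue two jobs, simulate a restart, and return what survived."""
    with tempfile.TemporaryDirectory() as workdir:
        queue_file = Path(workdir) / "queue.json"  # stands in for a file on a PV

        before = DiskBackedQueue(queue_file)
        before.enqueue("NetworkManager-code-mr")
        before.enqueue("another-job")  # hypothetical job name

        # "Restart": a brand new object that only knows the file path.
        after = DiskBackedQueue(queue_file)
        return after.jobs
```

With ephemeral storage the file would disappear together with the pod, so the second `DiskBackedQueue` would start empty. Note that this only covers queued jobs; whether a job that was already running can be resumed is a separate question that would still need testing.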

Metadata Update from @dkirwan:
- Issue tagged with: centos-ci-infra, medium-gain, medium-trouble

2 years ago

Metadata Update from @dkirwan:
- Issue priority set to: None (was: Needs Review)

2 years ago

@siddharthvipul1, @dkirwan: would it be possible to investigate using the Jenkins template that uses a PersistentVolume, so that jobs would still be on "disk/queue" when the pod is redeployed with the same PV?
It is worth testing first, and then coming up with the following proposal to projects on CI:

  • use ephemeral storage for the Jenkins container: fast, but jobs are lost on each restart (new pod/deploy)
  • use a PersistentVolume for the Jenkins container: much slower, but at least a rebuild/redeploy would resume jobs that were in the previous Jenkins container (hopefully, and to be tested)

Does that sound like a plan?

I've spoken with @siddharthvipul1, and we've decided that we will only roll out normal updates to the cluster on Fridays. If there are CVEs or other critical fixes, we will roll them out as soon as they are available.
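
For anyone who wants to encode this policy in their own tooling (for example, to avoid scheduling long jobs in the update window), a minimal sketch in Python could look like the following. The function name `can_roll_out` and its parameters are made up for illustration; this is not taken from any CentOS CI script.

```python
from datetime import date

FRIDAY = 4  # date.weekday() counts from Monday = 0, so Friday is 4


def can_roll_out(today, is_critical):
    """Return True if a cluster update may be rolled out on `today`.

    Hypothetical helper encoding the policy from this ticket:
    critical fixes (CVEs etc.) go out immediately, normal updates
    only on Fridays.
    """
    if is_critical:
        return True
    return today.weekday() == FRIDAY
```

For example, `can_roll_out(date(2021, 3, 3), is_critical=False)` is `False` because that date is a Wednesday, while the same call with `is_critical=True` is `True` on any day.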

Metadata Update from @dkirwan:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 years ago

Metadata
Boards 1
CentOS CI Infra Status: Backlog