#79 kempty-n11.ci.centos.org and kempty-n9.ci.centos.org seed pods with wrong umask
Closed: Fixed 3 years ago by dkirwan. Opened 3 years ago by jlebon.

Looks like https://pagure.io/centos-infra/issue/48 is back, this time on kempty-n11.

Before wiping the machine and reprovisioning, let's try to figure out what is causing this and file RHBZs/touch base with the appropriate teams as needed. Feel free to reach out if you need help debugging!
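
For anyone picking this up cold: the symptom is visible straight from a shell in an affected pod (a sketch; 0002 is the value the coreos-assembler image expects per the test further down, and the pod name is a placeholder):

oc exec -ti <affected-pod> -- bash
[builder@<affected-pod> srv]$ umask              # 0002 on a healthy node; anything else indicates the bug
[builder@<affected-pod> srv]$ touch f && ls -l f # a wrong umask shows up as unexpected file permissions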


Metadata Update from @dkirwan:
- Issue tagged with: centos-ci-infra, high-trouble, medium-gain

3 years ago

Metadata Update from @dkirwan:
- Issue priority set to: Waiting on Assignee (was: Needs Review)

3 years ago

This is affecting kempty-n9.ci.centos.org too now.

@jlebon can we book a block of time this week where you can explain in more detail what the problem is, how I can replicate it, and how it's affecting your workloads? I'm wondering if it's something caused by the elevated access your service accounts have, or if it's a bug in RHCOS.

@jlebon and I met and attempted to replicate the issue on the n9-n11 nodes using the pod definition:

apiVersion: v1
kind: Pod
metadata:
  name: coreos-assembler-sleep
spec:
  nodeName: kempty-n10.ci.centos.org
  containers:
    - name: coreos-assembler-sleep
      image: quay.io/coreos-assembler/coreos-assembler:latest
      imagePullPolicy: Always
      workingDir: /srv/
      command: ['/usr/bin/dumb-init']
      args: ['sleep', 'infinity']
      resources:
        requests:
          cpu: "4"

Tested with my cluster-admin user and with the jenkins service account in the coreos-ci project:

oc apply -f ~/testpod.yaml
oc get pods --watch
^C
oc exec -ti coreos-assembler-sleep /bin/bash
[builder@coreos-assembler-sleep srv]$ umask
0002
[builder@coreos-assembler-sleep srv]$ exit
command terminated with exit code 130
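
For the service account case, the same pod can also be created while impersonating jenkins rather than logging in as it (a sketch using oc's standard impersonation flag; project and account names as above):

oc --as=system:serviceaccount:coreos-ci:jenkins apply -f ~/testpod.yaml
oc exec coreos-assembler-sleep -- bash -c umask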

In each case it returned the expected umask of 0002, so I'll mark this issue blocked. It is likely to reoccur in the coming days; when it does, we'll jump on a call, attempt to replicate the issue, capture any relevant information, and report a bug upstream.
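
Until then, a quick way to sweep all three suspect nodes in one go is to pin the same sleep pod to each node in turn and read the umask back non-interactively (a sketch; node list and pod definition taken from above):

for node in kempty-n9 kempty-n10 kempty-n11; do
  sed "s/nodeName: .*/nodeName: ${node}.ci.centos.org/" ~/testpod.yaml | oc apply -f -
  oc wait --for=condition=Ready pod/coreos-assembler-sleep --timeout=120s
  echo -n "${node}: "; oc exec coreos-assembler-sleep -- bash -c umask
  oc delete pod coreos-assembler-sleep --wait
done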

Metadata Update from @dkirwan:
- Issue assigned to dkirwan

3 years ago

Metadata Update from @dkirwan:
- Issue priority set to: None (was: Waiting on Assignee)

3 years ago

Hi @jlebon, have you noticed this issue affecting you recently?

Will close; please reopen if the issue reoccurs.

Metadata Update from @dkirwan:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

3 years ago

