#11013 Redeploy openQA workers with consistent storage configuration
Opened a year ago by adamwill. Modified 2 months ago



Describe what you would like us to do:

Between deploying different workers at different times and trying to work around https://bugzilla.redhat.com/show_bug.cgi?id=2009585 , we've wound up with quite a mess in terms of the storage configuration on the openQA workers. Some are encrypted, some aren't. Some are luks-on-btrfs-on-mdraid (I think that's the right ordering), one (I think) uses btrfs native RAID, and one (I think) just uses a single disk and leaves all the others idle...

Since none of the dodges actually worked around 2009585 (except using a single disk, but that makes things slower, so we probably shouldn't stick with that), we should probably just give them all one consistent storage config. Encrypting them is probably a good idea - they don't have anything really super super secret, but they do have the openQA database password and credentials for resultsdb and maybe a couple other not-really-that-big-of-a-deal things I'm forgetting.

When do you need this to be done by? (YYYY/MM/DD)

Whenever, it's not urgent. Please co-ordinate this with me on chat.fp.o . We'll want to do the staging ones first, then prod once we're sure staging is good.

Metadata Update from @kevin:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: low-trouble, medium-gain, ops

a year ago

Do we know what the actual config we want to use on all of them is?

I am traveling next week, but we could look at trying to do them the week after (which is a week before freeze)?

Well, the native btrfs RAID thing seemed to work fine, so maybe that on LUKS?
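For the record, "btrfs native RAID on LUKS" in kickstart terms would look something like the sketch below. This is a hypothetical fragment, not our actual deployment config: the disk names, sizes, label, and especially the passphrase handling are placeholders that would need to be filled in from the real provisioning setup.

```
# Hypothetical kickstart sketch: btrfs native RAID1 on LUKS across two disks.
# Disk names, sizes, label, and passphrase are placeholders.
zerombr
clearpart --all --initlabel --drives=sda,sdb
part /boot --fstype=ext4 --size=1024 --ondisk=sda
# one LUKS container per disk, then btrfs RAID1 on top of both
part btrfs.01 --size=1 --grow --ondisk=sda --encrypted --passphrase=CHANGEME
part btrfs.02 --size=1 --grow --ondisk=sdb --encrypted --passphrase=CHANGEME
btrfs none --data=raid1 --metadata=raid1 --label=openqa btrfs.01 btrfs.02
btrfs / --subvol --name=root LABEL=openqa
```

The ordering matters: LUKS sits on the partitions, and btrfs handles the mirroring itself, so there is no mdraid layer at all in this variant.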

ok. Remind me week after next and we can try and get them done.

Also, I will try and rearrange some staging resources and possibly free up some more workers for you.

This is low priority right now, but we still want to do it.

We didn't get to this yet... after beta?

Moved to next year.

current state of play:

prod:

  • x86-worker01 - xfs-on-lvm-on-luks-on-mdraid mq-deadline
  • x86-worker02 - xfs-on-lvm-on-mdraid mq-deadline
  • x86-worker06 - native-btrfs-raid mq-deadline

lab:

  • x86-worker03 - btrfs-on-luks-on-mdraid bfq
  • x86-worker04 - btrfs-on-luks-on-mdraid bfq
  • x86-worker05 - btrfs-on-luks-on-mdraid bfq
so, a nice mix :D edit: we recently redeployed 04 and 05, and now the lab workers are consistent. I do not understand why they're on the bfq scheduler, though - they all have the udev dropin cmurf recommended, but it doesn't seem to be applying.
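For reference, the kind of scheduler drop-in being discussed is a udev rule along these lines. The exact path, file name, and match are from memory and may differ from the rule cmurf actually recommended:

```
# /etc/udev/rules.d/64-ioschedulers.rules (hypothetical name/content)
# Force mq-deadline on SATA/SAS disks instead of the bfq default.
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="mq-deadline"
```

To debug why it isn't applying, `udevadm test /sys/block/sda` shows which rule files fire for a given disk, and `cat /sys/block/sda/queue/scheduler` shows the currently active scheduler (the one in square brackets).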

Still, I'll keep an eye on things for a while and see if lab behaves itself consistently now that 04 isn't using just one disk, as it had been for some time. prod seems to be behaving fine, despite the mix of layouts.

if we ever do get to a full redeployment, we should also standardize the network interface naming strategy and make the kickstart match it. right now we have the following mix:

a64-worker01 - enp1s0 (udev)
a64-worker02 - eth1 (old-style)
a64-worker03 - eth1 (old-style)
a64-worker04 - eth2 (old-style)
p09-worker01 - enp1s0f0 (udev)
p09-worker02 - eth0 (old-style)
x86-worker01 - em1 (biosdevname)
x86-worker02 - em1 (biosdevname)
x86-worker03 - eno1 (udev)
x86-worker04 - em1 (biosdevname)
x86-worker05 - eno1 (udev)
x86-worker06 - em1 (biosdevname)

in summary: ugh.
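If we do standardize on a redeploy, one option (a sketch of one possible approach, not a decision) is to force old-style names via kernel arguments in the kickstart, so the `network --device=` lines are predictable on every box regardless of hardware:

```
# Hypothetical kickstart fragment: disable biosdevname and udev
# "predictable" naming so every worker gets old-style ethN names.
bootloader --append="net.ifnames=0 biosdevname=0"
network --bootproto=dhcp --device=eth0 --activate
```

The opposite direction (keeping udev naming everywhere and using `network --device=link` to pick the connected interface) would also work; the point is just to pick one scheme and make the kickstart match it.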

