#11751 power9 builders slow
Closed: Fixed with Explanation 3 months ago by kevin. Opened 3 months ago by kevin.

Lately the buildvm-ppc64le builders have been slow, sometimes causing build failures with network timeouts.

I have moved the createrepo channel off all of them (and over to the much faster aarch64 buildvms).

I've done a bunch of investigating on bvmhost-p09-01 today and found:

  • The 6.5.x kernel seems to handle things better. 6.6.x, 6.7.x and 6.8.x all get really high load, but they do at least keep running.
  • In the end the problem seems to just be disk I/O. These machines have 7200rpm spinning SATA drives and they are just slow. ;(

I'm not sure of a good solution yet. I am going to move them all back to 6.5.x kernels since that seems to handle things best currently.
Perhaps we can look at replacing the drives in them with something better, or, once we get new storage in Q2, moving to iSCSI.

This is just a tracking ticket for trying to make them more performant.


Current status:

I booted 01 in a 6.8.0rc2 kernel and it actually seemed to do pretty well overnight.
So, I booted them all in that same kernel and will keep an eye out for problems.

This new kernel seems to get load spikes, but it's still actually processing fine and is quite responsive. It does hit process alert limits in nagios (which I have silenced for now).

The bottleneck is really the 7200rpm SATA drives. They just can't handle all the write seeking from the guests very well. In all cases they are 100% utilized when the machines are under heavy builds. I don't think this is related to RAID or other higher-level items, since the drives themselves are being saturated. Perhaps we can see if we can replace them with SSD/NVMe drives.
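
For anyone who wants to check drive saturation on a similar host, here is a rough sketch of how a utilization percentage can be derived from two snapshots of /proc/diskstats. It is not the tooling used for the numbers above. The function names `parse_io_ticks` and `utilization` are made up for illustration, and the sketch assumes the standard Linux diskstats layout, where the tenth statistic after the device name is the number of milliseconds the device spent doing I/O.

```python
def parse_io_ticks(diskstats_text):
    """Map device name -> milliseconds spent doing I/O (io_ticks).

    Assumes the standard /proc/diskstats layout: major, minor, name,
    then the per-device counters, with io_ticks as the 10th counter.
    """
    ticks = {}
    for line in diskstats_text.splitlines():
        parts = line.split()
        if len(parts) < 13:
            continue  # skip blank or truncated lines
        ticks[parts[2]] = int(parts[12])
    return ticks


def utilization(before_text, after_text, interval_s):
    """Percent of the interval each device was busy, capped at 100."""
    before = parse_io_ticks(before_text)
    after = parse_io_ticks(after_text)
    interval_ms = interval_s * 1000.0
    result = {}
    for dev, ticks_after in after.items():
        if dev not in before:
            continue  # device appeared between snapshots
        busy_ms = ticks_after - before[dev]
        result[dev] = min(100.0, 100.0 * busy_ms / interval_ms)
    return result
```

To use it, read /proc/diskstats twice a few seconds apart and pass both texts plus the elapsed seconds. A drive that stays near 100 during heavy builds is saturated in the sense described above.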

The 6.8.x kernel seems to have made this much better... I am going to go ahead and close this now, but if anyone notes super slow ppc64le builders please feel free to file a new ticket or reopen this one.

Metadata Update from @kevin:
- Issue close_status updated to: Fixed with Explanation
- Issue status updated to: Closed (was: Open)

3 months ago
