#9595 Investigate moving RISC-V infrastructure into AWS
Closed: Fixed 3 years ago by kevin. Opened 3 years ago by ahs3.

Describe what you would like us to do:

The current infrastructure for Fedora RISC-V is hosted on a system that will be going away in the near future. What I would like to be able to do is gain access to Fedora AWS infrastructure to see if it is practical to (a) re-bootstrap Fedora for RISC-V (f34/rawhide), and (b) see if we can spin up a koji instance from that work. Even though RISC-V is still in its early stages as an architecture, there is significant interest in Fedora for the architecture.

If this infrastructure works and is practical -- performance is the primary concern -- this would
be the first step towards the goal of making RISC-V a fully supported architecture in the future.

When do you need this to be done by? (YYYY/MM/DD)

Soon? We can retain access to the existing Fedora RISC-V koji for a while; it is not clear if that is
one month or three or six. What we're trying to determine is if something like AWS will work as
well as the current Xeon server being used (or work better). Ideally, this use of the Fedora infrastructure will work well enough that we can replace the existing machine and start moving further towards full support like x86.


Metadata Update from @humaton:
- Issue tagged with: low-gain, low-trouble, ops, request-for-resources

3 years ago

Metadata Update from @smooge:
- Issue priority set to: Waiting on Assignee (was: Needs Review)

3 years ago

So, what kind of instance(s) do you need here? Can you tell us the stats of the current one?
(cpu, disk, memory, etc).

The current machine is a 4 core Xeon with 32GB RAM, and over 1TB of disk space -- it's currently used for the RISC-V Koji instance. I don't think that much disk space is needed right away, perhaps half that would be plenty for timing builds and bootstrapping off of f33 (that's what I'm using to test speed with).

@mobrien would you have time to work on this?
I guess we should look and see what size instance would work here (or instance + EBS volume)

I guess we could just put this in the infra tagged instances and set up their ssh key(s). Or make a new group for it...

Metadata Update from @mobrien:
- Issue assigned to mobrien

3 years ago

The current Koji machine we use for Fedora/RISCV is 2S Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz (20C/40T) with 128G of DRAM.

We use an expensive storage setup (PCIe 3.0 x8 NVMe drives: WDC SN200 PCIe 3.0 cards HL-HL 6.4TB). Two of these are used for /mnt/koji in RAID and one for Koji DB. There is additional storage for backups.

This was designed for large mixed IOPS (I think benchmarks showed 1.25 million IOPS for 75/25 read/write). This allows us to generate distribution repositories extremely fast. On our old server it used to take hours; this new server originally took only 1-2 minutes to generate a 150-200G distribution repository (and you could do several at a time). Now it's slower, I think the last one took 7 minutes, but that is still very fast. We are close to filling up /mnt/koji (<600G left). That's gonna be 6+TB in /mnt/koji soon. This setup also allows feeding a large builders pool quite well (we had some issues keeping Koji responsive on the old hardware).

We use a 256GB M.2 NVMe for the boot drive, but that requires regular maintenance (it is easy to fill up with logs).

This system feeds <200 QEMU builders and a few physical boards.

> The current machine is a 4 core Xeon with 32GB RAM, and over 1TB of disk space -- it's currently used for the RISC-V Koji instance. I don't think that much disk space is needed right away, perhaps half that would be plenty for timing builds and bootstrapping off of f33 (that's what I'm using to test speed with).

We could supply something similar to this. An r5.xlarge has 4 vCPUs, 19 ECU and 32 GiB RAM, and is optimized for memory performance. We could attach an EBS volume with provisioned IOPS, which should give good disk performance.

> The current Koji machine we use for Fedora/RISCV is 2S Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz (20C/40T) with 128G of DRAM.
>
> We use an expensive storage setup (PCIe 3.0 x8 NVMe drives: WDC SN200 PCIe 3.0 cards HL-HL 6.4TB). Two of these are used for /mnt/koji in RAID and one for Koji DB. There is additional storage for backups.
>
> This was designed for large mixed IOPS (I think benchmarks showed 1.25 million IOPS for 75/25 read/write). This allows us to generate distribution repositories extremely fast. On our old server it used to take hours; this new server originally took only 1-2 minutes to generate a 150-200G distribution repository (and you could do several at a time). Now it's slower, I think the last one took 7 minutes, but that is still very fast. We are close to filling up /mnt/koji (<600G left). That's gonna be 6+TB in /mnt/koji soon. This setup also allows feeding a large builders pool quite well (we had some issues keeping Koji responsive on the old hardware).
>
> We use a 256GB M.2 NVMe for the boot drive, but that requires regular maintenance (it is easy to fill up with logs).
>
> This system feeds <200 QEMU builders and a few physical boards.

This may be too much for AWS. While we could find an instance to match the CPU/memory, I don't think the disk performance could be matched reliably.

I'm not sure we need to match the disk speeds -- perhaps, but that's one of the things I'd like to find out. My suspicion is that AWS will be sufficient but not optimal.

Any news? We're getting to the point where it may be necessary to shut down the existing koji server for RISC-V, and we're no closer to being able to replace it with AWS.

Thanks. Sorry to be a bit of a nag....

Sorry for the delay, I will try to get this set up today. Just to clarify what you need:

An r5.xlarge (4 vCPU, 19 ECU, 32 GiB RAM) instance with a 700GB attached EBS disk with provisioned IOPS. Is Fedora 33 ok for the OS?
I will need a public ssh key to put on the instance for access.

Also, is the koji instance that @davidlt mentioned necessary now, or is that for some time in the future?

I am fine with the suggested server. If we hit bottlenecks, we can improve it again. I am a bit worried about 700GB storage. We currently have <6TB of data. So we would need to prune that. A single distro repo is 150-200GB IIRC.
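To put rough numbers on that worry, here is a minimal back-of-the-envelope sketch in Python. It only uses the figures mentioned in this ticket (a 700GB volume, 150-200GB per distro repo, just under 6TB of existing data); the function name and the choice of 1TB = 1000GB are illustrative assumptions, and filesystem overhead is ignored.

```python
import math


def repos_that_fit(volume_gb: float, repo_gb: float) -> int:
    """Return how many whole distro repos of repo_gb fit on a volume_gb volume."""
    return math.floor(volume_gb / repo_gb)


# Figures quoted in this ticket (decimal units assumed: 1 TB = 1000 GB).
VOLUME_GB = 700
REPO_GB_RANGE = (150, 200)
CURRENT_DATA_GB = 6000  # upper bound; the ticket only says "<6TB"

for repo_gb in REPO_GB_RANGE:
    count = repos_that_fit(VOLUME_GB, repo_gb)
    print(f"{repo_gb} GB repos on a {VOLUME_GB} GB volume: {count}")

# Share of the existing data that would not fit, if the data really were
# at the 6 TB upper bound.
prune_fraction = 1 - VOLUME_GB / CURRENT_DATA_GB
print(f"data that would need pruning: about {prune_fraction:.0%}")
```

So on these assumptions the 700GB volume holds only a handful of distro repos, which is why pruning (or a larger second volume) comes up.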

The 700GB was a suggestion based on the original request, but I could give that as a "fast" root volume with a 10TB volume available for storage at standard speed if needed.
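For illustration only, the two-volume layout described here could be written down as EC2 block device mappings, in the shape the EC2 `RunInstances` API accepts. This is a hedged sketch, not what was actually run for this ticket: the device names, the `io1`/`gp2` volume types and the 5000 IOPS figure are all assumptions, and `ImageId`/`KeyName` are deliberately left out because the ticket does not give them. Note that EC2 sizes are in GiB, while the ticket talks in GB/TB.

```python
def block_device_mappings(root_gib: int, root_iops: int, data_gib: int) -> list:
    """Build EC2-style block device mappings for a fast root + bulk data volume."""
    return [
        {
            # Assumed root device name; the real one depends on the AMI.
            "DeviceName": "/dev/sda1",
            "Ebs": {
                "VolumeSize": root_gib,
                "VolumeType": "io1",   # provisioned IOPS (assumed type)
                "Iops": root_iops,     # hypothetical value, not from the ticket
                "DeleteOnTermination": True,
            },
        },
        {
            # Assumed device name for the "standard speed" storage volume.
            "DeviceName": "/dev/sdf",
            "Ebs": {
                "VolumeSize": data_gib,
                "VolumeType": "gp2",   # general purpose (assumed type)
                "DeleteOnTermination": False,
            },
        },
    ]


# Partial request parameters; ImageId and KeyName would still be needed.
request = {
    "InstanceType": "r5.xlarge",
    "MinCount": 1,
    "MaxCount": 1,
    "BlockDeviceMappings": block_device_mappings(700, 5000, 10000),
}
```

A dictionary like `request` could then be passed as keyword arguments to an EC2 client once the missing fields are filled in.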

> The current machine is a 4 core Xeon with 32GB RAM, and over 1TB of disk space -- it's currently used for the RISC-V Koji instance. I don't think that much disk space is needed right away, perhaps half that would be plenty for timing builds and bootstrapping off of f33 (that's what I'm using to test speed with).

Is it just one instance needed as requested in the above quoted comment or this plus a second larger instance as referenced in the below quoted comment?

> The current Koji machine we use for Fedora/RISCV is 2S Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz (20C/40T) with 128G of DRAM.
> We use an expensive storage setup (PCIe 3.0 x8 NVMe drives: WDC SN200 PCIe 3.0 cards HL-HL 6.4TB). Two of these are used for /mnt/koji in RAID and one for Koji DB. There is additional storage for backups.

> > I am fine with the suggested server. If we hit bottlenecks, we can improve it again. I am a bit worried about 700GB storage. We currently have <6TB of data. So we would need to prune that. A single distro repo is 150-200GB IIRC.
>
> The 700GB was a suggestion based on the original request, but I could give that as a "fast" root volume with a 10TB volume available for storage at standard speed if needed.

That would be excellent. Thank you.

> > The current machine is a 4 core Xeon with 32GB RAM, and over 1TB of disk space -- it's currently used for the RISC-V Koji instance. I don't think that much disk space is needed right away, perhaps half that would be plenty for timing builds and bootstrapping off of f33 (that's what I'm using to test speed with).
>
> Is it just one instance needed as requested in the above quoted comment or this plus a second larger instance as referenced in the below quoted comment?

I think we only need the one instance for now. Let's see if and how demand grows before we add more.

> The current Koji machine we use for Fedora/RISCV is 2S Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz (20C/40T) with 128G of DRAM.
> We use an expensive storage setup (PCIe 3.0 x8 NVMe drives: WDC SN200 PCIe 3.0 cards HL-HL 6.4TB). Two of these are used for /mnt/koji in RAID and one for Koji DB. There is additional storage for backups.

This is my pub ssh key (also used in FAS):

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICZmY6oeQ0PXrrEZ/NNNsOeuLoQtKSni0BSPpVUEUaWY ahs3@cherubino

Thanks again, Mark! No worries about any delays. We're all doing the best we can these days :).

I have created the instance below. To access it, just ssh fedora@3.239.70.25 using the ssh key referenced above. Let me know if you need anything else. The security group may need to be updated with the correct ports.

Type: r5.xlarge
Root disk: 700G (provisioned iops)
Attached disk: 10000G (Will likely need to be mounted)
IPv4: 3.239.70.25
IPv6: 2600:1f18:8ee:ae04:5c97:c0ed:a866:1eb0
Open ports: 22, 80, 443 (may need to be updated, just let me know)
Distro: Fedora 33
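
Since the security group may need adjusting, one quick way to see which of the listed ports actually answer from the outside is a small TCP connect check. The following is a minimal sketch using only the Python standard library; the function names are made up for illustration and are not part of any Fedora infrastructure tooling.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(host: str, ports, timeout: float = 3.0) -> dict:
    """Map each port in ports to whether a TCP connection could be opened."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

For the instance above this would be called as `check_ports("3.239.70.25", [22, 80, 443])`. A `False` result does not necessarily mean the security group blocks the port: it is also what you get when the port is allowed but nothing is listening on it yet (for example 80/443 before a web server is set up).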

Most excellent. I'll get on this as soon as I can and let you know if there are any additional ports to open. Thanks!

Feel free to re-open or file a new ticket if you need any adjustments!

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

3 years ago
