#5665 RFR: staging and production machines for MBS
Closed 6 years ago Opened 7 years ago by ralph.

MBS stands for the module build service.

For a FESCo-approved change, we want to deploy staging and production versions of this for the F26 development cycle. As discussed in the FESCo meeting, we will lock down access so that only a few members of the modularity working group will be allowed to submit builds, since we do not yet have any "packaging policy" in place for modules.

In both production and staging, we need:

  • Two frontend VMs for load balancing our flask frontend (it is only a JSON api - no web UI).
  • One backend VM for a fedmsg-hub process

That is six VMs in total.

We do not expect to have significant CPU or memory requirements, as the bulk of the real building is done in koji itself. Copying the CPU and memory values from a similar service would be sufficient for initial values.

Please let me know if any other information is needed.


Hostnames could be something like:

  • mbs-frontend01.phx2.fedoraproject.org
  • mbs-frontend02.phx2.fedoraproject.org
  • mbs-backend01.phx2.fedoraproject.org

FYI, we have submitted a package review but it doesn't have a reviewer yet: https://bugzilla.redhat.com/show_bug.cgi?id=1404012

We'll need a standard postgres database (one on the shared db host will be fine).

All looks good to me.

@ralph changed the status to Closed

7 years ago

These instances are missing a lot of parts that the RFR procedure is supposed to take care of.
Can we please get the required information in this ticket and respective places?
From the RFR document, the steps I'm not seeing mentioned here:

Planning:
* MUST determine who is involved in the deployment/maintaining the resource.

Note here that "If your RFR only has a single person working on it, please gather at least another person before moving forward. Single points of failure are to be avoided."
So we should identify at least two people.

Development:
* MUST add any needed SOP's for the service. Should there be an Update SOP? A troubleshooting SOP? Any other tasks that might need to be done to the instance when those who know it well are not available?

Production:
* Monitoring of the resource is added and confirmed to be effective.

Metadata Update from @puiterwijk:
- Issue status updated to: Open (was: Closed)

7 years ago
  • MUST determine who is involved in the deployment/maintaining the resource.

These are the members of sysadmin-mbs.

  • MUST add any needed SOP's for the service. Should there be an Update SOP? A troubleshooting SOP? Any other tasks that might need to be done to the instance when those who know it well are not available?

Gotcha. We'll get on this.

  • Monitoring of the resource is added and confirmed to be effective.

We have host-level monitoring in place. Are you asking for application-specific monitoring?

MUST determine who is involved in the deployment/maintaining the resource.

These are the members of sysadmin-mbs.

Ack, thanks.

Monitoring of the resource is added and confirmed to be effective.

We have host-level monitoring in place. Are you asking for application-specific monitoring?

Yeah, I meant application-specific monitoring.
Especially when people start to depend on it at a later stage, we really need this, and right now you guys are all accounted for and available, so it might be easiest to just get it done :).

Metadata Update from @kevin:
- Issue tagged with: request-for-resources

7 years ago

Nagios checks have been added.
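
For readers unfamiliar with what an application-specific check involves, here is a minimal sketch in Python. It is not the check that was actually deployed (this ticket does not show it). It only assumes what is stated above: the frontend is a JSON API with no web UI. The URL, the function names `classify`, `check` and `main`, and the choice to treat non-JSON output as a WARNING are all hypothetical. The exit codes follow the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import json
import sys
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical endpoint: the hostname is only a suggestion from this ticket,
# and the path is a placeholder, not the real MBS API path.
DEFAULT_URL = "http://mbs-frontend01.phx2.fedoraproject.org/api/"


def classify(status, body):
    """Map an HTTP status and response body to (exit_code, message)."""
    if status != 200:
        return CRITICAL, "CRITICAL: HTTP %s" % status
    try:
        json.loads(body)
    except ValueError:
        # Assumption: a 200 with a non-JSON body means the app is misbehaving.
        return WARNING, "WARNING: response is not valid JSON"
    return OK, "OK: API answered with valid JSON"


def check(url, timeout=10):
    """Fetch url and classify the result; network errors are CRITICAL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return classify(resp.status, body)
    except urllib.error.HTTPError as exc:
        return CRITICAL, "CRITICAL: HTTP %s" % exc.code
    except (urllib.error.URLError, OSError) as exc:
        return CRITICAL, "CRITICAL: %s" % exc


def main(argv):
    """Entry point for a Nagios-style plugin: print one line, return exit code."""
    url = argv[1] if len(argv) > 1 else DEFAULT_URL
    code, message = check(url)
    print(message)
    return code
```

To use it as a plugin, one would call `sys.exit(main(sys.argv))` at the bottom of the script. Keeping `classify` separate from the network call makes the decision logic easy to test without a running service.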

So, all that's left here is some docs/SOPs, right?

Yeah, that's the only thing left.

I'll ping @jkaluza, @mprahl, and @fivaldi to see if we can come up with a list of "things we commonly have to fix" that would provide the table of contents for the SOP.

Can we get any progress on this?
I have not seen any SOP at all, and people are considering this production now.

Metadata Update from @ralph:
- Issue status updated to: Closed (was: Open)

6 years ago
