#3054 Request for resources: Darkserver

Created 6 years ago by kushal
Modified 15 days ago

= Project Sponsor =

Project Team (FAS Names): kushal sundaram jankratochvil

Infrastructure Sponsor:

= Project Info =

Project Name: Darkserver

Target Audience: developers and users

Expiration/Delivery Date 2012-05-08 (Fedora 17 release date)

Description/Summary:

Darkserver is a service written to help people find details of build-id(s). People will be able to query the service by build-id(s) or RPM package names. The service will provide output in JSON format, as that is easier for other tools to parse. My idea is to run darkserver under http://darkserver.fedoraproject.org so that developers can develop or enable their tools to use this service.
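
To make the intended usage concrete, here is a minimal client-side sketch in Python. Only the base URL and the JSON output format come from this request; the `/buildids/` and `/package/` paths and the shape of the response are hypothetical placeholders, so check the wiki page for the real routes before relying on them.

```python
import json
from urllib.parse import quote

# Base URL as proposed in this request.
BASE_URL = "http://darkserver.fedoraproject.org"


def buildid_url(build_ids, base_url=BASE_URL):
    """Build a lookup URL for one or more build-ids.

    ASSUMPTION: the "/buildids/<id>,<id>" path layout is hypothetical.
    """
    joined = ",".join(build_ids)
    return f"{base_url}/buildids/{quote(joined, safe=',')}"


def package_url(name, base_url=BASE_URL):
    """Build a lookup URL for an RPM package name.

    ASSUMPTION: the "/package/<name>" path layout is hypothetical.
    """
    return f"{base_url}/package/{quote(name, safe='')}"


def parse_response(body):
    """Decode the JSON body returned by the service into Python objects."""
    return json.loads(body)
```

A tool would fetch one of these URLs with any HTTP client and pass the body to `parse_response`; keeping URL construction separate from the network call makes the sketch easy to adapt once the real routes are known.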

Project plan (Detailed):

Detailed plan can be found in the wiki page https://fedoraproject.org/wiki/Darkserver and on a blog at http://kushaldas.in/2011/11/11/darkserver-a-gnu-build-id-details-provider-web-service-idea-currently-in-development/

Goals: To serve as the primary place to find build-id details for any package related to Fedora, and to help developers fix bugs more easily.

Confirm you have read:
https://fedoraproject.org/wiki/Request_For_Resources
and
https://fedoraproject.org/wiki/Request_for_resources_SOP

Yes, I read both.

I'll go ahead and sponsor this request. ;)

Will see about getting you a test/dev instance soon.

I have created a darkserver01.dev.fedoraproject.org instance for you as well as a 'sysadmin-darkserver' group with you in it.

The instance is at 140.211.169.213 and should be in dns soon.

Please use this to test install and work out config.
Also, if you could start a doc in the infra-docs git repo (on lockbox01) for a SOP on how to manage, update, and troubleshoot darkserver instances, that would be great.

Let me know if you need anything further.

I am able to log in to the system but cannot run sudo or su. I tried my FAS password as well. Please add me to the sudo list.

Sorry about that. Should be fixed now.

How are things looking here? How is the dev instance working?
Is there any thought of moving to staging?

The dev instance is working perfectly. The next build of the server will also contain the koji plugin as a subpackage.
The service does not depend on any other Fedora service. It can use normal Apache-level caching.
I am ready to move to staging for this service.

We now are live in production. ;)

Thanks for all the work.

I may look at decommissioning the dev and stg instances, since we can't easily test this in staging right now and development should be pretty much done upstream and not very rapid now.

Request for comments

For the new version of darkserver, we have migrated the database to PostgreSQL. We can now start a darkserver-backend instance, which can begin filling the db with the latest build information. We can also schedule all the old missing builds in the backend's import system. This way, users will not see the new production system until we create the web frontends and flip haproxy to use them.

By the current estimate, it may take 7 days to backfill the old builds from koji.

This approach seems reasonable to me

Could you give an overview of what resources you need with the new version?

Nr of backend nodes, nr of frontend nodes, backup strategy?
Who are responsible for maintenance of the software, and where are the docs?
Who is the RFR sponsor for this this time around?

Replying to [comment:10 puiterwijk]:

Could you give an overview of what resources you need with the new version?

Nr of backend nodes, nr of frontend nodes, backup strategy?

We want only 1 backend node, and 2 frontend nodes behind haproxy. The frontends can be standard dual-core, 2 GB RAM, 40 GB VMs; they run the Django-based frontend.

The backend should have 4 cores, as in the previous version of darkserver.

Who are responsible for maintenance of the software, and where are the docs?

I am the primary developer, and I am also responsible for maintenance of the application. Right now, for installation docs, I have an updated SOP file at https://github.com/kushaldas/darkserver/blob/master/darkserver-sop.rst. We also updated and tested the Ansible playbook on the stg instances.

Who is the RFR sponsor for this this time around?

I do not have a name for this yet.

I can be the RFR sponsor here.

@kushal, will we need to establish any backups for this service? We can backup the DB. Anything else?

@puiterwijk, were all your questions answered sufficiently?

Replying to [comment:12 ralph]:

I can be the RFR sponsor here.

@kushal, will we need to establish any backups for this service? We can backup the DB. Anything else?

Other than the DB, nothing else needs to be backed up. We can easily spin up new instances from the DB backup.
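
As a rough illustration of a DB-only backup, the sketch below builds a `pg_dump` invocation in Python. The database name `darkserver` and the `/backups` directory are assumptions made for the example, not values from this ticket; `--format=custom` and `--file` are standard `pg_dump` options.

```python
import shlex
from datetime import date


def pg_dump_command(dbname, out_dir, day):
    """Return the argv list for a dated, custom-format PostgreSQL dump.

    ASSUMPTION: dbname and out_dir are placeholders chosen by the caller;
    the ticket does not name the database or a backup location.
    """
    out_file = f"{out_dir}/{dbname}-{day.isoformat()}.dump"
    return ["pg_dump", "--format=custom", f"--file={out_file}", dbname]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    cmd = pg_dump_command("darkserver", "/backups", date.today())
    print(shlex.join(cmd))
```

A custom-format dump can later be restored with `pg_restore`, which fits the "spawn new instances from the DB backup" approach described above.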

Some other questions from the RFR SOP:

How is the load balancing situation? Will it work behind a normal haproxy setup?

Do we want to cache anything, and if yes what, on the proxies?

Is the documentation in the infra-docs repo up to date, including update & troubleshooting sections?

Are all Ansible playbooks written and ready to be run?

What is the expected timing of the deployment?

Is monitoring setup in Nagios and working?

Ralph, we need your signoff on moving onwards.

Replying to [comment:14 puiterwijk]:

Some other questions from the RFR SOP:

How is the load balancing situation? Will it work behind a normal haproxy setup?

I can't answer this. Kushal, can you? I can't think of a reason why it wouldn't work (like expecting to share state in-memory in-between requests).

Do we want to cache anything, and if yes what, on the proxies?

I don't know. Kushal, can you answer this? Does darkserver2 have any static css or js assets that should be cached?

Note - darkserver1 does not: https://darkserver.fedoraproject.org/

Is the documentation in the infra-docs repo up to date, including update & troubleshooting sections?

It looks like the answer is no: https://infrastructure.fedoraproject.org/infra/docs/darkserver.rst

We should update that, yes.

Are all Ansible playbooks written and ready to be run?

I think the answer here is "yes":

Kushal, please correct me if I'm wrong.

What is the expected timing of the deployment?

We'll pick a day to do it after we get through this process.

Is monitoring setup in Nagios and working?

No. But it cannot possibly be so before we get the RFR approved. (We don't want nagios monitoring a service that is not there yet. In practice, we always set this up afterwards.)

Ralph, we need your signoff on moving onwards.

I'll wait until we get further response from kushal on some of the above items.

Replying to [comment:15 ralph]:

Is monitoring setup in Nagios and working?

No. But it cannot possibly be so before we get the RFR approved. (We don't want nagios monitoring a service that is not there yet. In practice, we always set this up afterwards.)

A lot of apps also add Nagios to the stg version, just to make sure that we have valid Nagios checks that work after an upgrade, so you could work on that.
But yes, we should get a noc01.stg.

Replying to [comment:15 ralph]:

Replying to [comment:14 puiterwijk]:

Some other questions from the RFR SOP:

How is the load balancing situation? Will it work behind a normal haproxy setup?

I can't answer this. Kushal, can you? I can't think of a reason why it wouldn't work (like expecting to share state in-memory in-between requests).
This should work, as every request is independent; we do not keep any state.

Do we want to cache anything, and if yes what, on the proxies?

I don't know. Kushal, can you answer this? Does darkserver2 have any static css or js assets that should be cached?

Note - darkserver1 does not: https://darkserver.fedoraproject.org/

There is no css or js.

Is the documentation in the infra-docs repo up to date, including update & troubleshooting sections?

It looks like the answer is no: https://infrastructure.fedoraproject.org/infra/docs/darkserver.rst

We should update that, yes.

I have an updated SOP doc on GitHub: https://github.com/kushaldas/darkserver/blob/master/darkserver-sop.rst

Are all Ansible playbooks written and ready to be run?

I think the answer here is "yes":

Kushal, please correct me if I'm wrong.
Yes, they are ready and have been tested in the stg environment.

Replying to [comment:17 kushal]:

Replying to [comment:15 ralph]:

Replying to [comment:14 puiterwijk]:

Is the documentation in the infra-docs repo up to date, including update & troubleshooting sections?

It looks like the answer is no: https://infrastructure.fedoraproject.org/infra/docs/darkserver.rst

We should update that, yes.

I have an updated SOP doc on GitHub: https://github.com/kushaldas/darkserver/blob/master/darkserver-sop.rst

Note that pretty much none of the instructions in this SOP are what should go in the infra-docs repo.
This document describes how to install darkserver, but I would hope I need to run none of that, as it's all done in Ansible.

What I do want to see is:
1. How does it function? Which parts are there, and how do they communicate?
2. What to do if something goes wrong? Common services to restart, common logs to check, etc.
3. What other special considerations do the sysadmins need to take into account when managing this service?
4. What info do we need to recover the service if it goes down at 3AM?

Agreed. For example, there is the text in the existing SOP: "Please make sure that the backend instance can reach koji, ppc.koji, and s390.koji instances."

Perhaps (in the SOP) you could explain why that is. What does darkserver-importer do? Why does it need to talk to those places? What happens if it cannot talk to them for some reason? If they go away and come back, does it need to be restarted? How do you check on it to see if it is working correctly? Etc. @puiterwijk's questions are a good guide here.
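
One concrete thing a troubleshooting section could include is a quick reachability check run from the backend. The Python sketch below is only an illustration: the full hostnames and port 443 are assumptions (the SOP names the hubs only as koji, ppc.koji, and s390.koji), and a successful TCP connect does not prove that darkserver-importer itself is healthy.

```python
import socket

# ASSUMPTION: hostnames and port are guesses for illustration only.
KOJI_HUBS = {
    "koji": ("koji.fedoraproject.org", 443),
    "ppc.koji": ("ppc.koji.fedoraproject.org", 443),
    "s390.koji": ("s390.koji.fedoraproject.org", 443),
}


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_hubs(hubs, probe=can_connect):
    """Probe every hub and return a {name: reachable} mapping."""
    return {name: probe(host, port) for name, (host, port) in hubs.items()}


def unreachable(results):
    """Return the sorted names of hubs that failed the probe."""
    return sorted(name for name, ok in results.items() if not ok)


if __name__ == "__main__":
    failed = unreachable(check_hubs(KOJI_HUBS))
    print("all hubs reachable" if not failed else f"unreachable: {failed}")
```

The `probe` parameter exists so the check can be exercised without network access, and so a sysadmin could swap in a stricter probe (for example, an actual API call) later.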

What's the status here? It would be nice to get this rolled out in production before Beta freeze (next Tuesday).

What is the status here @kushal ?

I know we rolled out the orig darkserver years ago, and now there is a new one, but we only have it in staging currently? We should finish this up and get it into production.

@kushal what do you want to do here?

The current status seems to be that darkserver isn't at all functional, and hasn't been for a while.

We have one version in prod and one in stg (but neither one works currently).

Do you have time to move this forward? Or should we just retire the service (at least for now)?

We have shelved darkserver now.

15 days ago

Metadata Update from @kevin:
- Issue close_status updated to: Will Not/Can Not fix
- Issue status updated to: Closed (was: Open)
