#2 debuginfod service
Closed 2 months ago by amoloney. Opened 6 months ago by fche.

New initiative: debuginfod

Transcribing content from
https://docs.google.com/document/d/1jnlV_QT8KK_IfoeE07JHrPKILa6EnEgk9MiPLyXzKvY/edit

See also: https://pagure.io/fedora-infrastructure/issue/9371

What is this initiative about?

Let's get an elfutils-debuginfod server up and running for the public fedora community. This will let developers/users fully debug/trace fedora software without having to run sudo yum/dnf commands by hand.

This entails operating a new public-facing HTTP service, backed by one or more VMs that index koji build artifacts. One large-enough server could handle centos+fedora, or we could break it down by release / architecture, depending on available machine/storage options.

fedora-devel thread: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/K54HO3X7ANUBFN4OQCQ4QBOYSR4HTSR5/

upstream project: https://sourceware.org/elfutils/Debuginfod.html

Why this initiative?

To make developing, debugging, and troubleshooting as quick & easy as possible. It's a weird experience when you can open a debugger on any running process and just see everything.

To show leadership w.r.t. other distros (who are also investigating setting up such servers).

Definition of success

We are successful if a Fedora developer needs to do nothing manual, nothing as root, nothing requiring a lot of unrelated disk space, merely to debug / probe / trace any fedora-built binary. The clients (gdb, systemtap, perf, etc.) should automatically query the fedora debuginfod server when they need information.

Area/community impacted

Impacts CentOS and/or Fedora developers/troubleshooters, in the sense that they would use the resulting service.

No other impact to existing services/infrastructure, except ordinary system load (especially during the initial indexing step, which could take days/weeks). A rolled-out service might reduce CDN usage for -debuginfo RPM downloads later on, as this service is more granular, and permits federation/caching short of full-fledged mirroring.

Dependencies

Does this initiative have any dependencies?

Mainly: Provision of the VM to run the server, with nearby read-only access to the koji build artifacts. We can advise resource requirement estimates on request.

Skills needed?
Person who must or should be involved?

We (RH tools team) would be happy to help operate this service, and/or advise others. We would appreciate working with security / ops type folks to nail down any considerations from that side (mainly: DoS handling, and secondarily: all the usual ops issues).

Other work that should be completed prior to this initiative?

None known.

Deadline

Is this initiative under a time-constraint? Should it start or end before a certain date?

No particular deadline. As soon as it starts coming online, a quick elfutils update (setting the default $DEBUGINFOD_URLS environment variable on the client system) would activate usage.
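As a sketch of that client-side activation: once the elfutils default points at the server, it is effectively a one-line environment setting. The URL below is a hypothetical placeholder for whatever the eventual Fedora server address would be.

```shell
# Hypothetical server URL; the real default would ship in the elfutils
# package once the Fedora service exists.
export DEBUGINFOD_URLS="https://debuginfod.fedoraproject.org/"

# With the variable set, clients need no other configuration, e.g.:
#   gdb /usr/bin/ls                          # debuginfo/sources fetched on demand
#   debuginfod-find debuginfo /usr/bin/ls    # explicit fetch by path/build-id
echo "DEBUGINFOD_URLS=$DEBUGINFOD_URLS"
```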


Metadata Update from @amoloney:
- Issue tagged with: In Review

6 months ago

Hi @fche

Thank you for submitting this request. I am reviewing it for consideration in our next quarterly planning session on December 10th 2020.
Our team will go through the proposal over the next 2 weeks and outline a technical plan, which I will discuss with you to make sure it meets your requirements.
The initiative will then be discussed at quarterly planning for Q1 (Jan, Feb & March of next year) and prioritised in that session against other projects we have to consider.
I will tag it as 'accepted' and 'Q1 CY21' by Dec 4th if it will be reviewed in that session.

Any questions, or if you want to talk more about this initiative, please don't hesitate to reach out to me! :)

Thanks for submitting a ticket!
Aoife

Thanks, Aoife, looking forward to working with y'all.

Hi @fche !

A few questions on the debuginfod service that came from the team when we were reviewing this ask as a group:

Can this service run on openshift? For longer-term ease of maintenance our team has been looking at this possibility but wasn't sure whether it can run on openshift.

Will you or someone from your team be available to assist us with deployment should we hit any issues?

Do you know how big of a service this is? Is it something that sees a high volume of users, etc.? We would like to fully understand the ask so we don't underestimate something and end up struggling to keep the service running.

If you could get those answers to me today, that would be incredibly helpful, and apologies for the late notice on them :)

Thanks a million!
Aoife

A few questions on the debuginfod service that came from the team when we were reviewing this ask as a group:

No problem.

Can this service run on openshift? For longer-term ease of maintenance our team has been looking at this possibility but wasn't sure whether it can run on openshift.

The program is a basic one-process executable, so readily runs from a command line, systemd, or containers. So yes, deploying into openshift should be trivial, including health checking / prometheus monitoring.
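To make that concrete, here is a sketch of the intended single-process invocation. The flags are standard elfutils-debuginfod options (-d for the index database, -R to enable RPM scanning, -t for the rescan interval), but the paths, port number, and interval here are hypothetical placeholders, not the actual deployment values.

```shell
DB=/var/cache/debuginfod/debuginfod.sqlite   # -d: where the sqlite index lives
RPMS=/mnt/koji/packages                      # koji artifact tree, read-only

# -R enables RPM scanning of the given path; -t sets the rescan interval (seconds)
CMD="debuginfod -p 8002 -d $DB -t 86400 -R $RPMS"
echo "$CMD"

# In a container or OpenShift pod this one command is the entire entrypoint;
# the server's built-in /metrics endpoint (Prometheus text format) doubles as
# a liveness/readiness probe target.
```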

Will you or someone from your team be available to assist us with deployment should we hit any issues?

Definitely, and on an ongoing basis.

Do you know how big of a service this is? Is it something that sees a high volume of users, etc.? We would like to fully understand the ask so we don't underestimate something and end up struggling to keep the service running.

We've tried to estimate these things in the google-doc and its predecessor fedora-infrastructure issue. Roughly speaking, if we want a debuginfod instance to index a bunch of fedora RPMs, we need:

  • close network proximity to the RPMs on the koji file server
  • fast persistent-volume / disk storage on the order of 3% of the size of the -debug{info,source} RPMs of interest, to store the index
  • RAM of say 16-32GB, to store momentarily decompressed content
  • CPU of say 4-8 cores, to index & serve
  • external network usage should be pretty small - just the actual debugging traffic by fedora developers/users; since it can displace having to manually download -debuginfo* RPMs, it could be a net savings to fedora network usage

The gadget can be highly parametrized to shard / federate, so we can instantiate separate smaller servers by (say) architecture and/or version and/or package-subset and/or replication, or we can have a big-bang single server that does all RPMs exposed to bodhi/compose. It probably depends on the storage available in a single spot. We'd love to talk to fedora koji folks to determine how best to proceed.
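As a sketch of that sharding/federation idea: a front-end debuginfod that indexes nothing itself can answer from its cache and relay misses to per-architecture shard servers through the same $DEBUGINFOD_URLS mechanism that clients use. The hostnames, ports, and paths below are hypothetical.

```shell
# Each shard indexes one slice of koji, e.g. (shown, not executed here):
#   debuginfod -p 8002 -R /mnt/koji/packages/x86_64
#   debuginfod -p 8002 -R /mnt/koji/packages/aarch64
#
# The front-end runs with no scan paths at all; with DEBUGINFOD_URLS set in
# its environment it forwards cache misses upstream to the shards:
export DEBUGINFOD_URLS="http://shard-x86_64:8002/ http://shard-aarch64:8002/"
echo "front-end federates to: $DEBUGINFOD_URLS"
#   debuginfod -p 80 -d /var/cache/debuginfod/frontend.sqlite
```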

Thank you kindly @fche!
I have pulled those answers into the project board for the team to read through too! This request will be in our Quarter 1 planning call on Thursday 10th Dec, where the CPE team management and our stakeholders (Fedora, CentOS & RHEL) will review all projects and prioritize them. We hope to be able to action this project as part of this list, and I will update you on Friday if it has been chosen to work on in Q1!
If this is not in the work item list, we will still keep your request in our backlog for consideration in our next quarter.

I hope we will be able to work together soon and thank you for taking the time to engage with me/us :)

Metadata Update from @amoloney:
- Issue tagged with: Accepted

5 months ago

@amoloney, what was the result of the Dec. 10 meeting?

Metadata Update from @amoloney:
- Issue untagged with: In Review
- Issue tagged with: 2021Q1

5 months ago

Morning @fche !

We had 6 initiatives in our prioritization review; debuginfod placed 5th with stakeholders, so it should be worked on in Q1 once we have completed adding fedora-messaging schemas to our applications and deploying some code for the Flatpak team to help their service. We will hopefully get to your ask towards late Feb or March, or earlier if we can!

I hope this is agreeable to you; we would certainly like to help with this service and will do our best to get it actioned in spring.

Hi, are things going on track? Will you need me to connect with anyone to plan next steps on prototyping etc.?

Hi Frank,

Short answer: no, unfortunately. We have had a few setbacks in the Noggin project that meant the team had to reinstall some components and retest a lot of the tech. We thought we would have some free cycles coming into Feb, but in reality we need to spend the remaining month and a half testing Noggin properly.

I will reach out to our infra & releng team lead to see if there is anyone in that section of the team who could potentially start this, but with F34 being released soon we are a little tight on free time there too :(

Let me see whether there is anything we can do in the next few weeks, and I will get back to you then to work out what's best to do.

I really do appreciate your patience and thank you for reaching out!

Aoife

@fche could you add a meeting invite for both of us sometime next week to speak about this?

I was thinking the easiest way forward may be to simply help you onboard debuginfod into our openshift and give you the access there to maintain it yourself.

Sure thing.

By the way, our friends at Debian just stood up a server. We're in ongoing discussion to help them technically and to address user questions.

https://lists.debian.org/debian-devel-announce/2021/02/msg00003.html
https://lists.debian.org/debian-devel/2021/02/threads.html#00262

Metadata Update from @amoloney:
- Issue status updated to: Closed (was: Open)

2 months ago

Initiative moved to a Request for Resources with all needed infra provided to the requestor. Closing this ticket from the initiatives repo.

Metadata Update from @amoloney:
- Issue status updated to: Open (was: Closed)

2 months ago

Metadata Update from @amoloney:
- Issue status updated to: Closed (was: Open)

2 months ago


Boards: 2021Q1 (Status: Done)