This is primarily an idea for the Security SIG, but since there's no tracker for that, I'm putting it here for coordination.
{{{
I think we did a pretty good job in responding to CVE-2014-0160, but there's also room for improvement.
One particular need is the ability to get in touch with owners of core components, or if they are not available, provenpackagers with particular security expertise -- and in either case, also testers with a security background.
Maybe we need to have some sort of (opt-in) Fedora Bat Signal for extra-critical and urgent security issues in core packages. We would promise not to use it unless the internet were actually on fire, as it appears to be in this case, and then have (escrowed somewhere?) private 24/7 contact information (phone numbers, SMS).
What do you think? Anyone interested in developing this idea further? }}}
We need to have responders for
and the ability to get at least one person in each role out of bed in the event of an emergency.
Coordination/communications could probably be merged together. Jóhann just mentioned on the security list that removing QA would make sense:
{{{ You can forget including QA in this, since maintainers don't provide the testing community with test cases, so testers can't quickly run through test cases for the affected package and provide the necessary karma.
JBG }}}
So we'd end up with:
On the release engineering side, is there really much to do there? If a package gets enough karma to move to stable quickly, doesn't it move relatively quickly on its own? I wasn't sure if release eng. had to get involved for CVE-2014-0160 (heartbleed).
Coordination and communication can be merged together, but it's important that someone have the coordination role and that they and others know who has that baton.
QA is important, whether Jóhann wants to participate or not. We don't want to send out an update that makes the situation ''worse''. At the very least, we need people to provide bodhi karma, but I'd prefer that we are actually relatively confident that the fixes are valid and do not introduce new problems.
Release Engineering needs to sign the packages and do the "push". Although it looks like packages magically move when they get enough karma, this is actually done by human beings behind the scenes. And they need to stay involved in case there are glitches with that process (as there were in the most recent case).
Totally agree on the incident leader baton. I wasn't aware of the human intervention required on the Release Eng. side.
One thing you brought up in the email thread was the way we're all brought together to solve the problem. Getting into a phone bridge might be troublesome with language barriers and people scattered in different countries. I could see an IRC channel working well but there would need to be that "bat signal" to let folks know that something is going on.
Would it be possible to use a service like PagerDuty for that? It's relatively pricey but I'm sure there are alternatives out there.
I like the idea of a coordinated/planned setup for events like these (even though they are thankfully rare).
Perhaps we could work on a wiki page that outlines the steps and process here? I don't think we should make it too heavyweight, just that we establish an IRC channel, decide on a leader, and pull in others as needed.
I think it's impractical to have cell phone contact info for all maintainers of all core components. We should hopefully be able to pull in the needed resources via a network effect (i.e., the leader asks for the maintainer of foo, others reach out to people who might know them, etc.).
It might also help to have responders from different time zones available. Since Fedora is an international effort, this might be an easy way to increase coverage.
Replying to [comment:5 till]:
Right, that was a small part of the delay in the last incident - the maintainers were in Europe and had gone offline. Fortunately a provenpackager (dgilmore) was ready and able to just do it (and the patch was clear and uncomplicated), so that was probably about a 30-minute delay while we made that call.
Do we have an updated proposal here? Or somebody who wants to create the wiki page mentioned in comment:4?
Paul Frields started a draft SOP for security updates at https://fedoraproject.org/wiki/User:Pfrields/Critical_security_update_SOP
More updates later :)
Just polling for an update here.
Can rel-eng people provide an update here?
This does not appear to have been discussed for a year, and Bodhi has since been enhanced for updates. This ticket looks to me like it is about how to handle urgent security updates, but I think there is also a need to push urgent security updates before normal updates. I think the more important factors here are the repository compose time and then the time it takes to push the security updates.
I see we have a draft page https://fedoraproject.org/wiki/Urgent_updates_policy created by Kevin.
There's a second ticket somewhere for the rel-eng side of actually pushing out the urgent updates (I don't have it at hand at the moment). This one is about communications flow.
I guess that second ticket is https://fedorahosted.org/rel-eng/ticket/5886
Replying to [comment:12 pnemade]:
Yes, that's it. Thanks!
At today's FESCo meeting we decided to defer voting until mattdm starts a conversation with the security team.
Bodhi also needs to allow direct stable pushes again if you want to have any chance of fixing such urgent issues in a timely manner for all supported Fedora releases.
Per today's meeting, removing from meeting agenda until that conversation with the security team happens.
@mattdm Do you have any update on the conversation that was supposed to happen?
@mattdm Any updates here?
I think we can take this off the FESCo radar and keep it moving with the Security Team.
@mattdm, is there a ticket opened or an email sent to the Security Team that we can link here?
It has been 10 months since this ticket had meaningful updates. Closing as WONTFIX.