#9858 rework https://www.fedorastatus.org
Closed: Fixed 3 days ago by ryanlerch. Opened a month ago by kevin.

There are many things I adore about our current status site ( https://www.fedorastatus.org , https://github.com/fedora-infra/statusfpo )

Good things:

  • It's simple: it's just synced to a s3 bucket and served from there
  • It's completely disconnected from the rest of our infra. If everything else of ours is down, as long as amazon is up, we can provide status.
  • It auto refreshes
  • It has a nice handy rss feed so you can see when outages have started/ended.

But there are some things not to like:

  • It has an incomplete list of our services. There is no way we could list all possible services here, and even if we did, it would confuse users because they have no idea how things are connected in our infrastructure. Does 'bodhi' mean something they care about? If our openvpn is marked down, what other things does that affect? Why doesn't this list bugzilla or rhn? (Hint: we don't run them, but our users may use them and think we know.)
  • People somehow think this status page is automatically updated. When they hear it's not, they immediately suggest hooking it up to our monitoring. That is not at all what we want to do.

The goal of the status site is to provide a single location where our stakeholders can check whether an issue they are seeing is KNOWN AND BEING WORKED ON. This helps us because fewer people ping us on IRC or file tickets for something we are already working on, and it helps users because they can see what's affected and know when it's back.

So, I would like to:

  • Drop all the service names. All of them.
  • Have an intro paragraph or something that explains that this is a list of KNOWN AND BEING WORKED ON issues.
  • A list of any current issues
  • Some kind of list of previous issues
  • A way for admins to describe an issue.

So, what we would have then is something much more free form, but hopefully more useful to users too. Instead of, say, a screen with "outage" next to several names, we would have a "We are seeing an outage with updates, buildsystem and account system. This is being investigated in ticket XYZ", and when closing an outage we could have "This outage is over, root cause was a switch reboot, more details in ticket XYZ".

For scheduled/planned outages we could have a "Planned outage for X, see outage ticket XYZ for more information".

I think this setup will make things nicer. If that proves hard to hack into our current status setup, we could look at moving to an open source status app. However, that means we have to deploy it somewhere, manage it, etc.

CC: @ryanlerch expressed interest in working on this.


Metadata Update from @mobrien:
- Issue assigned to ryanlerch
- Issue tagged with: dev, high-trouble, medium-gain

a month ago

Metadata Update from @mobrien:
- Issue tagged with: mini-initiative

a month ago

How much do you all value the "run a script to update JSON files" workflow? With the requirement of having more free-form statuses, we may have to look at using RST or markdown files checked into git.

I'm assuming hand-editing HTML is out of the question here, but how about markdown or RST?

markdown would be perfect..

+1 for markdown or asciidoc

Metadata Update from @mohanboddu:
- Issue priority set to: Waiting on Assignee (was: Needs Review)

a month ago

markdown would be fine... prefer separate files per outage though, or one big doc could grow big and annoying over time...

OK, here is a proof of concept i came up with:

https://github.com/ryanlerch/status-playground

Basically it uses python-pelican to create the status website. I took this approach because it does a handful of things for us automatically (like easy syncing to s3, RSS feeds). It is also python, and by default uses jinja as the templating engine, which is somewhat of a de facto standard for infra apps at the moment.

Although pelican is typically used for generating blogs, with a bit of config and templates, i have it simplified down to two pages.
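For anyone wondering what a per-outage markdown file could look like under this approach, here is a rough sketch. The header uses Pelican's Markdown metadata syntax (`Title:`, `Date:`, `Category:`), but the example outage, its date, and the `parse_outage` helper are made up for illustration and are not taken from the playground repo. The helper is only a simplified stand-in for what Pelican's own Markdown reader does.

```python
from datetime import datetime

# Hypothetical outage post. The date is invented; the category name is an
# assumption about how outages might be sorted into ongoing/planned/resolved.
EXAMPLE_OUTAGE = """\
Title: Outage affecting updates and the build system
Date: 2021-05-01 14:00
Category: ongoing

We are seeing an outage with updates, buildsystem and account system.
This is being investigated in ticket XYZ.
"""


def parse_outage(text):
    """Split a post into a metadata dict and a free-form body.

    The header ends at the first blank line; each header line is 'Key: value'.
    """
    header, _, body = text.partition("\n\n")
    metadata = {}
    for line in header.splitlines():
        # Split on the first colon only, so 'Date: 2021-05-01 14:00' keeps its time.
        key, _, value = line.partition(":")
        metadata[key.strip().lower()] = value.strip()
    metadata["date"] = datetime.strptime(metadata["date"], "%Y-%m-%d %H:%M")
    return metadata, body.strip()


if __name__ == "__main__":
    meta, body = parse_outage(EXAMPLE_OUTAGE)
    print(meta["category"], meta["date"].isoformat())
    print(body)
```

The point is just that an admin would write a short header plus free-form text, one file per outage, and the site generator takes care of the rest.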

please check it out

At the moment, the only thing i think i am missing from @kevin 's list above is the autorefresh. but that is a pretty easy addition.

Awesome work!

I see some warnings:
WARNING: Feeds generated without SITEURL set properly may not be valid
--- AutoReload Mode: Monitoring content, theme and settings for changes. ---
WARNING: Watched path does not exist: /home/kevin/git/github/status-playground/content/images

Is there a link to the rss feed?

nit: instead of asking to file a ticket and pointing to infra tickets, how about pointing to: https://docs.fedoraproject.org/en-US/cpe/day_to_day_fedora/

I assume if there was some kind of security issue somehow in pelican, it would depend on the version installed by the last person to update it as to how up to date it is?
(and not that they could do anything except possibly overwrite our files).

The resolved outage list might get long, but that should be ok... if people look there they can just deal with a big page. :)

This looks super to me. Perhaps mail infra list and ask for wider feedback before we roll it out?

Awesome work!

I see some warnings:
WARNING: Feeds generated without SITEURL set properly may not be valid

Yeah, this one is expected -- there is an additional "publishconf.py" which sets the SITEURL when running a publish command. We didn't set this in the main pelicanconf.py, because otherwise links in the devserver won't work (when just testing changes out).

--- AutoReload Mode: Monitoring content, theme and settings for changes. ---
WARNING: Watched path does not exist: /home/kevin/git/github/status-playground/content/images

I'll look into this one.

Is there a link to the rss feed?

Yeah, the rss feeds are linked in the <head>, just like on the current status.fp.o (i can add links in the body of the HTML somewhere too if we want)

either way, the feeds are currently set up to be /feeds/ongoing.rss, /feeds/planned.rss, and /feeds/resolved.rss

nit: instead of asking to file a ticket and pointing to infra tickets, how about pointing to: https://docs.fedoraproject.org/en-US/cpe/day_to_day_fedora/

Will update this.

I assume if there was some kind of security issue somehow in pelican, it would depend on the version installed by the last person to update it as to how up to date it is?
(and not that they could do anything except possibly overwrite our files).

we could probably put a check in the makefile to just require a certain minimum version of pelican? (not sure how hard this is, but will look into it)

The resolved outage list might get long, but that should be ok... if people look there they can just deal with a big page. :)

Yeah, in theory we can enable pagination -- although not sure how useful that will be without a search function on a static page. I was leaning towards having a single page still, but grouping them a little better -- maybe chunking them by month?

This looks super to me. Perhaps mail infra list and ask for wider feedback before we roll it out?

too easy -- will implement some of these changes and hit up the list.

Been testing the publishing workflow too -- but with github pages rather than s3 -- but it works:

https://ryanlerch.github.io/status-playground/

feel free to open issues in the repo too if there are more features to add (or just list them here -- happy for either approach)

A few updates on the outstanding items from @kevin

--- AutoReload Mode: Monitoring content, theme and settings for changes. ---
WARNING: Watched path does not exist: /home/kevin/git/github/status-playground/content/images

I'll look into this one.

Fixed this. By default pelican watches for an images folder in content -- we won't need this, so I changed a setting to fix this warning.

nit: instead of asking to file a ticket and pointing to infra tickets, how about pointing to: https://docs.fedoraproject.org/en-US/cpe/day_to_day_fedora/

Will update this.

Updated.

I assume if there was some kind of security issue somehow in pelican, it would depend on the version installed by the last person to update it as to how up to date it is?
(and not that they could do anything except possibly overwrite our files).

we could probably put a check in the makefile to just require a certain minimum version of pelican? (not sure how hard this is, but will look into it)

Version comparisons in the Makefile were kinda fiddly to get right -- so I created a small pelican plugin in our repo to check against a minimum pelican version, and blow up the build if the installed version is below that.
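A minimal sketch of what such a version-check plugin could look like, for reference. This is not the code from the repo: the minimum version value and the helper names (`parse_version`, `is_at_least`, `check_minimum_version`) are hypothetical. It only assumes that Pelican plugins expose a `register()` function, that `pelican.signals.initialized` exists, and that `pelican.__version__` holds the installed version string.

```python
import re

# Hypothetical minimum; the real value would be whatever the site needs.
MINIMUM_PELICAN_VERSION = "4.5.0"


def parse_version(version):
    """Turn '4.5.0' into (4, 5, 0), stopping at the first non-numeric piece."""
    numbers = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def is_at_least(installed, minimum):
    """Compare two version strings, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_minimum_version(installed, minimum=MINIMUM_PELICAN_VERSION):
    """Raise, and so abort the build, if the installed version is too old."""
    if not is_at_least(installed, minimum):
        raise RuntimeError(
            f"Pelican {installed} is older than the required {minimum}"
        )


def _on_initialized(pelican_object):
    # Imported lazily so the helpers above can be used without Pelican installed.
    from pelican import __version__

    check_minimum_version(__version__)


def register():
    """Pelican plugin entry point: run the check once Pelican has initialized."""
    from pelican import signals

    # A module-level receiver is used because signal connections are weak
    # references by default, so a local lambda could be garbage collected.
    signals.initialized.connect(_on_initialized)
```

Doing the comparison in Python rather than in the Makefile avoids shell string comparisons, which is presumably where the fiddliness came from.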

The resolved outage list might get long, but that should be ok... if people look there they can just deal with a big page. :)

Yeah, in theory we can enable pagination -- although not sure how useful that will be without a search function on a static page. I was leaning towards having a single page still, but grouping them a little better -- maybe chunking them by month?

Got these chunking nicely. If it gets too unwieldy and large in the future, pagination shouldn't be too difficult. (It's already baked into Pelican; it's just a matter of tweaking the theme logic.)
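To illustrate the month chunking idea, here is a small sketch in plain Python. In the actual site this grouping presumably happens in the Jinja theme templates, so treat this only as a picture of the logic; the `chunk_by_month` name and the sample entries are invented for the example.

```python
from datetime import date
from itertools import groupby


def chunk_by_month(outages):
    """Group (date, title) pairs into per-month chunks, newest month first.

    Returns a list of (label, titles) tuples, where label looks like '2021-05'.
    """
    # groupby only merges adjacent items, so sort by date first.
    ordered = sorted(outages, key=lambda outage: outage[0], reverse=True)
    chunks = []
    for (year, month), group in groupby(
        ordered, key=lambda outage: (outage[0].year, outage[0].month)
    ):
        label = f"{year}-{month:02d}"
        chunks.append((label, [title for _, title in group]))
    return chunks


if __name__ == "__main__":
    # Invented sample data, only to show the shape of the result.
    sample = [
        (date(2021, 5, 3), "Switch reboot"),
        (date(2021, 4, 20), "Planned storage maintenance"),
        (date(2021, 5, 17), "Account system outage"),
    ]
    for label, titles in chunk_by_month(sample):
        print(label, titles)
```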

How should we proceed from here?

Is this something that we want to roll out?

Obviously one item missing here is some better documentation on how to use this new system.

Yeah, docs would be nice.

Can we make a branch with this, or a new repo, and point it to a new s3 bucket? Then once we are satisfied it's working as we like, we can swap dns over to make it live.

Sound reasonable?

Sounds perfect.

I have added the code to this branch in my fork of statusfpo:

https://github.com/ryanlerch/statusfpo/tree/main

(i don't have privs to add a new branch to this repo)

I created it as an orphan branch, i figured we keep the old version around in this repo as a different branch, right?

Also, i found this repo on pagure: is it an old version of status.fp.o?

https://pagure.io/fedora-status

FYI, here are a couple of things that i'm working on for this at the moment:

  • adding more of the historical outages (i'm just going through the outage tag on the infra tracker)
  • removing some of the superfluous targets and other cruft from the makefile.
  • updating the UI to use clientside JS to display datetimes in the user's timezone

Sounds perfect.

I have added the code to this branch in my fork of statusfpo:

https://github.com/ryanlerch/statusfpo/tree/main

(i don't have privs to add a new branch to this repo)

I created it as an orphan branch, i figured we keep the old version around in this repo as a different branch, right?

Yeah. We can get you perms to rearrange the branches...

Also, i found this repo on pagure: is it an old version of status.fp.o?

https://pagure.io/fedora-status

Yep. I'd say we could delete this or at least put a big warning there and point to the github one.
I think we orig had it there, but then realized it makes this have a dep on pagure, which we do not want.

FYI, here are a couple of things that i'm working on for this at the moment:

  • adding more of the historical outages (i'm just going through the outage tag on the infra tracker)

The current status site has an RSS feed; it might have all the old historical outages if you want to grab them.

  • removing some of the superfluous targets and other cruft from the makefile.
  • updating the UI to use clientside JS to display datetimes in the user's timezone

Awesome.

Okay, updated the main branch on my fork with all the historical outages from the outage tickets -- there was really no way to automate that, so I did it by hand.

have also updated the makefile to be a lot simpler.

i think we are almost ready to try this out -- is it possible to make the new main branch in the repo?

We could. Can we/should we use a new s3 bucket? Or should we just reuse the existing one and if we need to roll back we just redeploy the old site?

I think the only ones who will notice us rolling this out will be people who have the rss feeds. Are the new and old feeds the same? Or can we add a link in so the old subscribers don't need to unsubscribe/resubscribe?

If we can fix that, I think it's fine to roll this out anytime you like...

We could. Can we/should we use a new s3 bucket? Or should we just reuse the existing one and if we need to roll back we just redeploy the old site?

I think the only ones who will notice us rolling this out will be people who have the rss feeds. Are the new and old feeds the same? Or can we add a link in so the old subscribers don't need to unsubscribe/resubscribe?

The feeds are currently a little different. The current site has "changes.rss", which provides updates like "all good", "there is an issue", etc.

The new setup has feeds for the three categories (scheduled, ongoing, and resolved), which is a little different from the current setup.

If we can fix that, I think it's fine to roll this out anytime you like...

okay, updated, adding back a 'changes.rss' (in the same location as the current RSS feed), and this feed is the list of all the outages / posts.
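For reference, a feed layout like this could be expressed with Pelican settings roughly as follows. This is a sketch, not the actual pelicanconf.py from the repo: it assumes the old feed lived at the site root as changes.rss, that the three outage types are Pelican categories, and that the installed Pelican uses the `{slug}` placeholder in feed paths.

```python
# pelicanconf.py fragment (sketch, assumptions noted above)

# One feed with every outage post, at the same path old subscribers already use.
FEED_ALL_RSS = "changes.rss"

# One feed per category, e.g. feeds/ongoing.rss, feeds/planned.rss, feeds/resolved.rss.
CATEGORY_FEED_RSS = "feeds/{slug}.rss"

# Atom feeds switched off here only to keep the sketch focused on RSS.
FEED_ALL_ATOM = None
CATEGORY_FEED_ATOM = None
```

Keeping the combined feed at the old path means existing subscribers would not need to resubscribe.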

Awesome. I think we can deploy whenever... perhaps I can get you the aws token out of band and you can just push things live? Or should we schedule some time to do it?

Okay, we are deployed...

just an issue with the fonts and CSP at the moment.

Font issue resolved.

Going to close this issue off now -- please file any additional bug reports / issues / feature requests to the repo:

https://github.com/fedora-infra/statusfpo

Metadata Update from @ryanlerch:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

3 days ago
