#9867 RFE: data store and/or messages and/or server for release cycle information
Opened 8 months ago by adamwill. Modified 7 months ago

  • We've been talking about this informally for years, but I figured it'd be a good idea to get a proper ticket somewhere. We have all sorts of things in Fedora that should happen at specific points in the release cycle. The simplest example is things that should happen when a release is made, or when a release goes EOL. But there is no good "source of truth" for release cycle events and information. There is no authoritative source, exactly, you can go to and find out that right now, the stable Fedora releases are 32 and 33. There was no fedora-messaging message emitted when the Fedora 33 release event happened. We have various approximations that are used by various things, but none of them is entirely satisfactory; you can find a lot of discussion about that e.g. in this GNOME Software issue.

This is a thing that should really exist. In my head, right now, it looks vaguely like a sort of small daemon thing that gets poked somehow when events "happen", and when that happens, it publishes a fedora-messaging message and updates a JSON or YAML or whatever data store which is also publicly available. So things can listen out for messages and respond to them, and they can also look stuff up at will in the data store. I dunno if it's worth making the server answer any kinds of queries, or if we can just say "go parse the data if you want a question answered".

The fun parts, I guess, are designing the data store and message formats and deciding precisely what "release events" we have. For instance, "Fedora 33 is released!" is arguably not so much an event as a sort of process that happens over several days, and we might want to represent it as several distinct "events". But I don't know how far we need to get into the weeds on this ticket, so I won't go into a lot of detail on that. I think we should be able to come up with designs for the formats and server that would allow that stuff to be figured out - and changed, importantly - on the fly as we deal with the system and as the release process changes.

Another obvious question is, should this system be somehow directly wired into actually making the events happen? And if not, how do we make sure it provides accurate and up-to-date information? i.e. when a release event happens, how do we make sure that is reflected in the system?

I think making 'release events' fly-by-wire in this way - you don't directly "release Fedora 33", you tell ReleaseBot that we want Fedora 33 released and it simultaneously does all the "release Fedora 33" actions and updates the data store - would be an interesting and useful stretch goal, but possibly if we tried to do it right away we'd never get anywhere. So I think my suggestion would be that first we come up with a viable design for the system and implement that, without worrying too much yet about how to be sure it's accurate, then we look at ways to avoid release events happening without the system being updated (for instance: git hooks in infra ansible?), then we look at the most ambitious idea.

Tagging folks I know are interested in this:

Please tag anyone else you know is interested.

  • When do you need this? Tomorrow! Now! Yesterday!

  • When is this no longer needed or useful? Probably about as long as we're making Fedora it'd be useful. But we definitely need to think about possibilities like "the release cycle" multiplying into "several release cycles", for instance. We need the data and message formats and the whole workflow to account for those sorts of possibilities.

  • If we cannot complete your request, what is the impact? Alarums, excursions, things continuing to not quite happen when they should or at all or happen wrong.


Some quick design thoughts:

  • We'd want to have arbitrary "products" or "dists" or whatever, of course. The main Fedora product would just be one of them. We'd probably want to have some kind of allowance for dists to have properties, not sure how this would look yet.

  • I can think of at least two concepts we'd probably want to consider: "events" and "states". Often "events" are changes of "state" - are they always? Do we want to account for events that aren't changes of state? How would we think about recording those in the data store? A consumer may well want to know "is release X in state Y?" I think we'd want to account for being able to answer that sort of question in the design, as opposed to - explicitly or by omission - requiring consumers to figure out "states" by querying "events". I see "states" as being potentially overlapping - for instance right now, Fedora 34 is in states "active", "development", "branched", "pre-beta"...

  • There are obviously two big ways you could broadly represent this if we're just looking at a typical dict/hash approach: are the keys states/events and the values releases, or are the keys releases and the values states/events? i.e. do we have:

    34: [active, development, branched, pre-beta]

or:

active: [32, 33, 34, rawhide]
development: [34, rawhide]
branched: [34]

This is probably something where we could really use some formal use cases, I guess.
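
For what it's worth, the two layouts carry the same information, so either can be derived from the other. A minimal Python sketch, assuming a plain dict of lists; the `invert` helper, the sample data, and the "stable" state name are hypothetical and not part of any proposed format:

```python
from collections import defaultdict

# Hypothetical sample data in the "keys are releases" layout.
states_by_release = {
    "34": ["active", "development", "branched", "pre-beta"],
    "33": ["active", "stable"],
}


def invert(mapping):
    """Swap keys and values of a dict whose values are lists."""
    inverted = defaultdict(list)
    for key, values in mapping.items():
        for value in values:
            inverted[value].append(key)
    return dict(inverted)


# The "keys are states" layout, derived from the one above.
releases_by_state = invert(states_by_release)
```

Since the conversion is this cheap, the choice between the two mostly comes down to which one consumers find easier to read.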

OK, slightly more detailed design thought...after a solid shower thinking session, I'm thinking about the idea of a "release stream". This is our little server's whole view on the world. A release stream is a sequence of "release events", and release events are changes of "release state". A "release event" is simply a POST or whatever with a release identifier, a state identifier, and an indicator of whether this is an "enter state" or "leave state" event. It may also - I think this is a good idea - include a timestamp of when the event "really happened", to allow for post-facto fixups and stuff. If this isn't included the server just assumes it happened at the time the POST request did.
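
As a rough illustration of the event described above, here is a sketch of what one record might look like in Python. The class name, field names, and the use of a dataclass are all assumptions for illustration; the ticket does not fix any of them.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _now():
    """Current time in UTC, used when the submitter gives no timestamp."""
    return datetime.now(timezone.utc)


@dataclass(frozen=True)
class ReleaseEvent:
    """One entry in the release stream (hypothetical field names)."""

    release: str   # release identifier, e.g. "33"
    state: str     # state identifier, e.g. "active"
    entered: bool  # True for "enter state", False for "leave state"
    # Optional "really happened" time; defaults to the time of creation.
    happened_at: datetime = field(default_factory=_now)


# An event with no timestamp supplied: stamped on receipt.
live = ReleaseEvent("33", "active", True)

# A post-facto fixup with an explicit (purely illustrative) timestamp.
fixup = ReleaseEvent("33", "active", True, datetime(2020, 10, 27, tzinfo=timezone.utc))
```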

Our little server receives these events, and publishes messages for them, which are pretty much exactly the event as submitted, with the addition of the information as to whether the release was in the same state before the event or not. So it'd basically publish "33 active 0-1" if 33 was not active but went active, "33 active 0-0" if it was not active and got a "leave active state" event (for some reason), etc. It also writes the events to a file, and reads that file on startup (so we can restart it, move it around, fiddle with the stream behind its back, whatever). The raw event stream file may or may not be published for download, I guess.
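
The before/after part of the message can be sketched like this. It assumes the server keeps an in-memory dict mapping each release to the set of states it is currently in; `apply_event` and the plain-string message are hypothetical stand-ins for whatever the real message body ends up being.

```python
def apply_event(current, release, state, entered):
    """Update `current` in place and return the message text.

    `current` maps a release identifier to the set of states it is in.
    The text has the form "<release> <state> <before>-<after>", where
    each flag is 1 if the release is in the state and 0 if it is not.
    """
    states = current.setdefault(release, set())
    before = int(state in states)
    after = int(entered)
    if entered:
        states.add(state)
    else:
        states.discard(state)
    return f"{release} {state} {before}-{after}"


current = {}
first = apply_event(current, "33", "active", True)    # "33 active 0-1"
repeat = apply_event(current, "33", "active", True)   # "33 active 1-1"
leave = apply_event(current, "33", "active", False)   # "33 active 1-0"
```

Consumers that only care about real changes could then ignore any message whose two flags are equal.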

The server also generates 'views'. Obviously we're most likely to want current views, but I guess this design makes views at any given point in time possible. A view would be something like the simple formats I suggested above. So I was thinking of a very simple 'view' like that, for a consumer that just wants to figure out, you know, "what are the current development releases?" But it allows us to have both "states by release" and "releases by state" views, easily - the server produces both, you just pick whichever is more convenient to consume. We can also produce more complex views that include time/date information - so a consumer can ask for a more complex view that would show when a given release entered or left each state, for instance, if it needs that information.
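
A sketch of how both simple views, and views at an earlier point in time, could fall out of replaying the stream. It assumes events are `(timestamp, release, state, entered)` tuples with numeric timestamps; `build_views` and its `as_of` parameter are made-up names, not the actual implementation.

```python
from collections import defaultdict


def build_views(events, as_of=None):
    """Replay events in time order and return both simple views.

    Returns (states_by_release, releases_by_state). If `as_of` is given,
    events with a later timestamp are ignored, which yields the view as
    it stood at that moment.
    """
    states_by_release = defaultdict(set)
    for when, release, state, entered in sorted(events):
        if as_of is not None and when > as_of:
            break
        if entered:
            states_by_release[release].add(state)
        else:
            states_by_release[release].discard(state)

    releases_by_state = defaultdict(set)
    for release, states in states_by_release.items():
        for state in states:
            releases_by_state[state].add(release)

    # Sorted lists make the output stable and JSON-friendly.
    return (
        {rel: sorted(states) for rel, states in states_by_release.items()},
        {state: sorted(rels) for state, rels in releases_by_state.items()},
    )


# Hypothetical stream with small integers standing in for timestamps.
events = [
    (100, "34", "development", True),
    (200, "34", "branched", True),
    (300, "34", "pre-beta", True),
    (400, "34", "pre-beta", False),
]
by_release_now, by_state_now = build_views(events)
by_release_then, _ = build_views(events, as_of=300)
```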

Learning the lessons of resultsdb, I intend that the release and state values be freeform. There can be formal or informal conventions about what they ought to be, we can write policies, whatever - but the tool does not enforce them. It accepts any string (maybe with a length limit or something) for each. As it parses the "event stream" it builds up an internal representation of known releases and states for generation of the 'views' (I imagine I'd write it with internal dicts of both releases and states, at least as a first cut, but this is one of those things where the best design might become apparent while you're writing it :>).

However, there's one important rule/convention I'd want to suggest up front: we use an integer to refer to Rawhide. I suggest either a really high number, or - and I think this is my favoured candidate - 0. This is to make it much less painful to deal with 'rawhide' being this kind of joker outlier when all the other releases are integers (Fedora) or at least usually parseable as some kind of number (RHEL? Fedora if we do a 33.1?). All sorts of code winds up with some sort of dodge for dealing with this, like replacing 'rawhide' with 99 or whatever. The idea of making it 0 rather than something high is it's quite hard to think of something that'd be definitely high enough, if we decide to start versioning things based on dates or something wild like that. 0 feels sort of elegantly appropriate for Rawhide in an ineffable way, too. :D Consumers would just have to be aware that if they want to do a query for the "highest" release with some property and they consider Rawhide to be at the high end, they need to take 0 if it's in the range. But at least it would make type issues much less likely (so you could do stuff like if rel > 15 or rel == 0 and not blow up because rel isn't an int, in Python). The 'really REALLY high number' option does make a lot of common cases easier, though.
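
To make the trade-off concrete, here is a small sketch of what consumers would do under the "Rawhide is 0" convention. The `RAWHIDE` constant and `ordering_key` helper are hypothetical names.

```python
RAWHIDE = 0  # proposed convention: Rawhide is release 0


def ordering_key(rel):
    """Sort key that places Rawhide above every numbered release."""
    return float("inf") if rel == RAWHIDE else rel


releases = [32, 33, 34, RAWHIDE]

# Every value is an int, so comparisons never fail on a stray string.
recent = [rel for rel in releases if rel > 32 or rel == RAWHIDE]

# "Highest" release, counting Rawhide as the high end.
highest = max(releases, key=ordering_key)

# Oldest-to-newest ordering with Rawhide last.
ordered = sorted(releases, key=ordering_key)
```

The cost is that a plain `max(releases)` silently returns 34, so consumers that treat Rawhide as the high end have to remember the key function.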

So I pushed myself over the threshold of thinking about this so much that I want to build it. So I started :D So far I have a thing that can write and read stream files and parse them into the 'simple' per release JSON. I'll bash on it some more and put it up in a Pagure project tomorrow.

One thing I decided while writing it is that the main app (this thing) will only be able to read the history in from file and write out new events, one at a time, timestamped at the time the write function is called. This avoids having to solve a lot of ickiness with how to handle 'historical' writes, and also avoids the problem of what to do with two events marked as having happened at precisely the same time, because that'll be effectively impossible. This keeps the design simple and works for the normal course of events; any 'fixups' we have to do can just be done by hand, since the format of the release stream file is intentionally very simple.
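
A sketch of that write-one-event, read-history-on-startup behaviour. The on-disk format here (one JSON object per line) and the function names are assumptions for illustration; the actual stream file format is not described in this ticket.

```python
import json
import time


def write_event(path, release, state, entered):
    """Append one event to the stream file, stamped at call time."""
    record = {
        "time": time.time(),  # always "now"; no historical writes
        "release": release,
        "state": state,
        "entered": entered,
    }
    with open(path, "a", encoding="utf-8") as stream:
        stream.write(json.dumps(record) + "\n")
    return record


def read_events(path):
    """Load the whole history, e.g. on startup; empty if no file yet."""
    try:
        with open(path, encoding="utf-8") as stream:
            return [json.loads(line) for line in stream if line.strip()]
    except FileNotFoundError:
        return []
```

With a line-oriented text format like this, a fixup by hand is just editing or inserting a line in the file while the server is stopped.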

I was hoping we could also replace pdc with this, but it sounds like you want a more narrow approach here. :(

the thing I'm writing is narrower, yeah. The "replace PDC" idea is fine, only it's been mooted for years and hasn't happened yet, so.

If that ever does happen, it should be able to subsume this quite trivially, I'd think. This is being kept intentionally extremely simple, so anything more complex should be able to implement all its functionality easily.

IT'S ALIIIIIIIIIVE

the smallest REST API ever works. Updates the release stream and generates two simple JSON views. TBD: message publishing. To Be Figured Out: authentication (sigh).

to test it, run the script, do something like http post "http://localhost:8000/event?dist=fedora&release=34&state=branched&reached=1", and check the JSON files. (I didn't make the server serve them yet, will do that shortly.)
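
For anyone who'd rather script that than use the command line, a rough Python equivalent using only the standard library. The query parameters are copied from the URL above; `build_event_request` is a hypothetical helper, and this ignores authentication entirely since that is still undecided.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_event_request(base_url, dist, release, state, reached):
    """Build (but do not send) the POST request for one release event."""
    query = urlencode(
        {"dist": dist, "release": release, "state": state, "reached": int(reached)}
    )
    return Request(f"{base_url}/event?{query}", method="POST")


req = build_event_request("http://localhost:8000", "fedora", 34, "branched", True)
```

With the server running locally, `urllib.request.urlopen(req)` would then submit it.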

@kevin I will propose PDC replacement as an initiative for the next quarter. It's not getting prioritized as CPE always has other stuff to work on.

Thanks @adamwill for working on this. I can help you with anything you need in this project.

Metadata Update from @mohanboddu:
- Issue tagged with: dev, high-gain, high-trouble, ops

8 months ago

Update on releasetracker status: I implemented message publishing (untested, but it's pretty simple) and twiddled a few other things. Biggest open question is now what to do about authentication. I documented a few other outstanding concerns in tickets. I'm off till next year after this week, and it's a slow month for most folks, so I think I'll leave it till next year to publicize this a bit more widely and try to get it actually deployed and used.


Metadata
Boards 2
Ops Status: Backlog
Dev Status: Backlog