#183 Decision time: drop old docs?
Opened 6 months ago by mattdm. Modified 2 months ago

Background

See older issue and recent mailing list thread

The old ticket suggests a banner, and in the thread I suggested another site. But on rereading the ticket, I am inclined to agree that this is causing more harm than good, and that if old reference material is needed, it exists on the Internet Archive.

I think the main exception is old release notes, because 1) they record a lot of valuable history behind some technical decisions, which is sometimes useful to refer to, and 2) by their nature, they're less likely to be confused with current docs.

Proposal

Step 1

We programmatically convert the English-language release notes from their single-page HTML form to AsciiDoc, and add those to the relevant (new, even though the numbers are old!) branches of https://pagure.io/fedora-docs/release-docs-home.

The non-English release notes (where they exist, which seems pretty random) could also get this treatment, but I don't want that to block Steps 1-3.

FC1, FC2, and FC3 have separate documents for 32-bit and 64-bit systems. For simplicity, I suggest we combine each release's two documents into a single document (one right after the other).
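A minimal sketch of the "one right after the other" combination, assuming each converted file is a single AsciiDoc document whose first line is its own `= ` title. The function names, the combined title, and the example inputs are hypothetical. Each input's title is demoted to a level-1 section so both halves sit under one new document title; header attribute lines that followed an input's title end up in the section body and may need a manual check.

```python
import re

# An AsciiDoc section title is one or more "=" at the start of a line,
# followed by a space. Delimiter lines such as "====" have no space and
# are left alone.
SECTION_TITLE = re.compile(r"^(=+) ", flags=re.MULTILINE)


def demote_headings(adoc: str) -> str:
    """Push every section title down one level ("= X" -> "== X")."""
    return SECTION_TITLE.sub(r"=\1 ", adoc)


def combine(title: str, *docs: str) -> str:
    """Join several converted documents under one new document title.

    Each input's own "= ..." title becomes a level-1 section, so the
    32-bit and 64-bit notes appear one right after the other.
    """
    parts = [f"= {title}"]
    for doc in docs:
        parts.append(demote_headings(doc.strip()))
    return "\n\n".join(parts) + "\n"
```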

Step 2

Add the following redirect rules:

RewriteRule "^(\w\w-\w\w)/Fedora(_Core)?/(\d\d?)/html/Release_Notes.*" "/$1/fedora/f$3/release-notes/" [R=301,L]
RewriteRule "^(\w\w-\w\w)/Fedora(_Core)?/.*" - [G]

Note that matching is case-sensitive: the old docs are under /en-US/Fedora.* and the new docs are under /en-US/fedora/. This is kind of terrible, but there was probably a good reason. :)

This redirects the release notes pages and returns HTTP 410 ("Gone") for everything else. In theory, that makes search engines drop the pages faster than a 404 would.
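To sanity-check the two rules before deploying them, their behavior can be simulated with Python's `re` module. This is a sketch, not Apache itself: it assumes the rules see the URL path without its leading slash (as `RewriteRule` does in per-directory `.htaccess` context), and the example paths are hypothetical. `re.ASCII` keeps `\w` and `\d` ASCII-only, closer to Apache's regex engine.

```python
import re
from typing import Optional, Tuple

# The Step 2 patterns, copied verbatim from the RewriteRule lines.
RELEASE_NOTES = re.compile(
    r"^(\w\w-\w\w)/Fedora(_Core)?/(\d\d?)/html/Release_Notes.*", re.ASCII
)
OLD_DOCS = re.compile(r"^(\w\w-\w\w)/Fedora(_Core)?/.*", re.ASCII)


def rewrite(path: str) -> Tuple[Optional[int], Optional[str]]:
    """Return (status, location) that the two rules would produce for path.

    Rules are tried in order; [L] on the first one stops processing when
    it matches, so release notes never reach the [G] rule.
    """
    m = RELEASE_NOTES.match(path)
    if m:
        # $1 is the language code, $3 the release number.
        return 301, f"/{m.group(1)}/fedora/f{m.group(3)}/release-notes/"
    if OLD_DOCS.match(path):
        return 410, None  # [G] answers 410 Gone
    return None, None  # neither rule applies; the request is served normally
```

For example, `rewrite("en-US/Fedora_Core/3/html/Release_Notes/index.html")` gives a 301 to `/en-US/fedora/f3/release-notes/`, an old Installation Guide path gives 410, and the lowercase new-docs path `en-US/fedora/f26/release-notes/` is untouched, which is the case-sensitivity point above.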

Optional: the language codes for the old docs are always in the form aa-AA. The new site uses that form for some languages, but just aa for most. Add separate rewrite rules for each of those.
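Since these per-language rules differ only in the two codes, they could be generated rather than written by hand. A sketch, assuming a hypothetical `SHORT_CODES` mapping (the real list of languages the new site serves as plain aa would have to come from the docs site itself). The generated lines would need to go before the generic Step 2 rules so they match first.

```python
# Hypothetical mapping from old aa-AA codes to the new site's short codes.
SHORT_CODES = {"de-DE": "de", "fr-FR": "fr", "ja-JP": "ja"}

# Same shape as the Step 2 release-notes rule, but with a fixed language.
# Here (_Core)? is $1 and the release number is $2.
RULE_TEMPLATE = (
    r'RewriteRule "^{old}/Fedora(_Core)?/(\d\d?)/html/Release_Notes.*" '
    r'"/{new}/fedora/f$2/release-notes/" [R=301,L]'
)


def language_rules(short_codes: dict) -> list:
    """One redirect rule per old aa-AA code, in a stable (sorted) order."""
    return [
        RULE_TEMPLATE.format(old=old, new=new)
        for old, new in sorted(short_codes.items())
    ]
```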

Step 3 (optional)

Make a custom 410 page which explains that older documentation can be found on the Internet Archive, ideally with a link.

Also while we're at it, fix the 404 error page on the docs site.

Steps 4+ (optional)

  1. Resurrect non-English release notes.
  2. Update any old docs which actually seem useful.
  3. Stop publishing old docs to the mirrors, simplifying things for the infra team
  4. Anything I've missed?

Decision

I think this has mostly stalled for lack of anyone feeling like they have enough authority to do this. Well, let's go ahead and declare such authority. Does anyone have any objection to this overall plan? If you do, please speak up quickly. I'm going to post this to the docs list... if there's no significant opposition in, say, two weeks, let's move to working out whatever details remain and then do it.


So, as part of this I would really really love to drop the old docs from our proxies. :( It takes up about 20GB and it's a pain because we have to hardlink it to avoid filling up disk space on proxies.

The process currently is a messy set of rsyncs that combine the old docs, the new docs and docs-redirects into one tree of files that we serve.

So, hopefully we can just drop that tree from the process, but I want to make sure we do that. ;)

I'll edit that in to step 4.

Here are the old release notes converted (very quickly, with a script) into AsciiDoc: https://fedorapeople.org/~mattdm/misc/old-release-notes-adoc.tar.xz

These need to be put into branches of https://pagure.io/fedora-docs/release-notes (with the corresponding framework), and also I guess https://pagure.io/fedora-docs/release-docs-home updated?

Also those converted docs should have a "This is old material for reference only" bit added at the top. I didn't do that.
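Adding that note could be scripted over the unpacked tarball. A sketch, assuming every converted file is a `.adoc` file whose first line is its `= ` title; the directory layout and the exact banner wording are assumptions. It uses an AsciiDoc `NOTE:` admonition paragraph placed right after the document header (the title plus any lines up to the first blank line), and skips files that already contain the banner so reruns are harmless.

```python
from pathlib import Path

BANNER = "NOTE: This is old material for reference only."


def add_banner(adoc: str, banner: str = BANNER) -> str:
    """Insert an AsciiDoc NOTE paragraph right after the document header."""
    lines = adoc.splitlines()
    if lines and lines[0].startswith("= "):
        # The header is the title plus any lines up to the first blank line.
        end = 1
        while end < len(lines) and lines[end].strip():
            end += 1
        lines[end:end] = ["", banner]
    else:
        # No document title on the first line: put the banner first.
        lines[0:0] = [banner, ""]
    return "\n".join(lines) + "\n"


def add_banner_to_tree(root: str) -> int:
    """Rewrite every .adoc file under root in place; return how many changed."""
    count = 0
    for path in Path(root).rglob("*.adoc"):
        text = path.read_text(encoding="utf-8")
        if BANNER not in text:  # don't add it twice on a rerun
            path.write_text(add_banner(text), encoding="utf-8")
            count += 1
    return count
```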

Can someone other than me volunteer to drive the next steps of this? I have too many ideas right now and not enough follow-through. :)

A couple questions here:

  1. Where are the old docs stored? Are they just put on the docs.fp.org server and served from there (i.e. not part of the Antora setup at all)?
  2. Are all old docs other than release notes going to be dropped?

  1. Not sure where they are physically stored, but as I understand it they're not part of the Antora setup.

  2. That's the idea, yeah. If there's anything useful in there that hasn't been updated yet, time to do it. If it turns out someone was using something, we can see what we can do to get it up to date.

The old docs are in https://pagure.io/fedora-docs-web.git, all 4.3GB of them.

We clone that out on sundries01, then rsync it to proxies.

It's kind of an annoying dance. What we do is:

  1. rsync the old docs (4.3GB)
  2. rsync docs-redirects and old docs to a 'docs combined'
  3. rsync new docs and docs-redirects and old docs into 'docs combined'

(This lets us hardlink files so that the old docs plus 'docs combined' don't take up 8.6GB plus the size of the new docs.)
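The space saving comes from hard links: a file linked into the combined tree shares its data with the original instead of being copied. The sketch below only illustrates that idea in Python; it is not the actual infra process, which uses rsync (rsync's own `--link-dest` option can do a similar job). Trees are merged in order, so a file in a later tree replaces the link from an earlier one.

```python
import os
from pathlib import Path


def link_tree(src: Path, dst: Path) -> None:
    """Hard-link every file under src into the same relative path under dst.

    An existing file at the destination is replaced, so later trees win.
    """
    for path in src.rglob("*"):
        target = dst / path.relative_to(src)
        if path.is_dir():
            target.mkdir(parents=True, exist_ok=True)
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        if target.exists() or target.is_symlink():
            target.unlink()
        os.link(path, target)  # no data copied: same inode, one more name


def build_combined(trees, combined) -> None:
    """Merge trees into combined in order, e.g. old docs, redirects, new docs."""
    for tree in trees:
        link_tree(Path(tree), Path(combined))
```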

Metadata Update from @ryanlerch:
- Issue assigned to ryanlerch

4 months ago

OK, I'm going to start having a crack at this, beginning with step 1.

I'm going to do the following steps for the Fedora 25 release notes:

  1. Branch (from rawhide/master) into a new branch f25 in the release notes repo https://pagure.io/fedora-docs/release-notes/
  2. Convert the html-single output from Publican (aka the old docs) into AsciiDoc, and put it in the f25 branch.
  3. Create a f25 branch of https://pagure.io/fedora-docs/release-docs-home/ and update the index.adoc to only talk about the release notes.
  4. Add the two f25 branches created in 1. and 3. to the proper locations in https://pagure.io/fedora-docs/docs-fp-o/blob/prod/f/site.yml
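The last step in that list could be scripted roughly as below. This sketch works on the playbook after it has been loaded into a plain dict (with a YAML library, not shown) and assumes site.yml uses Antora's usual `content: sources:` layout, with a `url` and an explicit `branches` entry per source; that has not been checked against the real file. A source without `branches` relies on Antora's default branch patterns, so the function refuses to guess there.

```python
def add_branch(playbook: dict, repo_url: str, branch: str) -> bool:
    """Add branch to the Antora content source whose url is repo_url.

    Returns True if a matching source was found, False otherwise.
    Raises ValueError for a source with no explicit branches entry.
    """
    for source in playbook.get("content", {}).get("sources", []):
        if source.get("url") != repo_url:
            continue
        branches = source.get("branches")
        if branches is None:
            raise ValueError(f"{repo_url} has no explicit branches; edit it by hand")
        if isinstance(branches, str):  # Antora also accepts a single value
            branches = [branches]
        if branch not in branches:
            branches.append(branch)
        source["branches"] = branches
        return True
    return False
```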

I realize I'm a couple months late here, but assuming @ryanlerch hasn't put a significant amount of work into this yet:

  • Issue no. 1 is keeping these old docs out of search results. I see the repo has a robots.txt that excludes some old IPA docs that people were complaining about earlier. I know those are excluded from searches - is it based on this file or is that handled somewhere else? If it is, why don't we just add en-US/Fedora/26}/* etc. and a few more lines for the other URLs, and fix that problem immediately?

  • As for reducing space taken up by old docs - each doc we published using publican back in the day has a PDF version. How about we just take all those PDFs, put them into one big tarball, put that in a normal repo as an attachment, and link to it for people who want to download the archive? Actually it doesn't even need to be one big tarball, we could just point them to a pagure repo where they could grab anything they wanted. That way we wouldn't need to worry about asciidoc conversions, and Kevin should be able to get rid of the whole rsync process since everything would be in the current system.
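Collecting the PDFs is a short job with the standard library. A sketch, assuming the PDFs sit somewhere under a checkout of the old docs repo; the function name and output file name are hypothetical. Paths inside the archive are kept relative to the checkout, so whatever directory layout the old tree has is preserved for people browsing the archive.

```python
import tarfile
from pathlib import Path


def archive_pdfs(root: str, out_path: str) -> int:
    """Pack every PDF under root into one xz-compressed tarball.

    Returns the number of PDFs archived.
    """
    root_path = Path(root)
    pdfs = sorted(p for p in root_path.rglob("*.pdf") if p.is_file())
    with tarfile.open(out_path, "w:xz") as tar:
        for pdf in pdfs:
            # Store paths relative to root, using "/" on every platform.
            tar.add(pdf, arcname=pdf.relative_to(root_path).as_posix())
    return len(pdfs)
```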

> ...why don't we just add en-US/Fedora/26}/* etc. and a few more lines for the other URLs, and fix that problem immediately?

That makes sense to me.
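A robots.txt change is easy to check locally with Python's `urllib.robotparser` before pushing it. The rules below are hypothetical; the real file in fedora-docs-web, with its existing IPA entries, is not reproduced. Path matching is a case-sensitive prefix match, so a single `/en-US/Fedora` line covers both `Fedora` and `Fedora_Core` while leaving the new lowercase `/en-US/fedora/` tree crawlable.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content covering only the en-US old docs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /en-US/Fedora
"""


def is_crawlable(url: str, robots_txt: str = ROBOTS_TXT, agent: str = "*") -> bool:
    """Ask the stdlib robots.txt parser whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

One design point: robots.txt only asks well-behaved crawlers not to fetch those URLs, and a crawler that obeys it will also never see the 410 from Step 2, so it may be worth deciding which of the two signals should do the work.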

> ...How about we just take all those PDFs, put them into one big tarball, put that in a normal repo as an attachment, and link to it for people who want to download the archive?

Yeah, I'd even go as far as saying "let's throw them on the Internet archive and provide links to that for anyone who needs the old docs".

