Following the CentOS Docs SIG meeting day in Brussels: we can try to set up a parallel wiki instance, extract static HTML pages from it, and then decide whether or not to keep them as wiki archives. Some links:
https://git.autistici.org/ale/crawl/
https://github.com/iipc/warc2html
Metadata Update from @zlopez: - Issue priority set to: Waiting on Assignee (was: Needs Review) - Issue tagged with: investigation
Metadata Update from @arrfab: - Issue assigned to arrfab
It's not possible to assign multiple "assignees" on a Pagure ticket, but for awareness: I'll work on this with @dcavalca to produce a PoC and then propose it as an archive solution for the Docs SIG (based on a discussion with @shaunm).
Deployed two EC2 instances for this so that @dcavalca can test things. The first one (c7 EC2 instance) will get the moin role applied and the actual anonymized data imported. Testing the export to static HTML files is the role of the second EC2 host (c9s).
moin
I have an export solution that seems to work well and is currently chewing through the existing content. Assuming this goes well, should have something ready for polishing sometime tomorrow.
Export was successful and I have a preliminary archive up at https://dcavalca.gitlab.io/wiki-archive (sources: https://gitlab.com/dcavalca/wiki-archive).
Had a quick look and seems really good enough (for a PoC)
@dcavalca , @shaunm ^ worth sending a mail to centos-docs list to now discuss the plan and point to the PoC ? (ideally we can use the other wiki-archives node for this, instead of personal page on gitlab.io but fine for a PoC)
hey @dcavalca and @shaunm : tempted to close this ticket and reclaim the deployed ec2 instances that were just used as a PoC for this. I guess the plan was now to start a thread on dedicated centos-docs list and list actions there ?
I talked to @shaunm about this yesterday, he's going to post to centos-docs@ soon. The next steps here would be to disable edits on the wiki, do another dump and put it on the dev instance, and do another crawl to catch any edits that happened since then. Once that's done we can start productionizing the static archive.
Well, I'd say this ticket was just about the spike/PoC; what remains is explaining the plan and when you want to see it go live. The sooner the wiki is offline, the better, but ideally let's reach consensus through a thread on the centos-docs list :-)
@dcavalca , @shaunm : any feedback on spike ? can I shutdown the nodes that were used for the test ? any documented process about archiving wiki and when we can proceed ?
Yes, this was discussed in the latest board meeting and I forgot to post an update here. Here's the game plan:
@dcavalca, @shaunm: just revisiting open tickets and wondering if we can simply schedule this at some point and announce the migration. Another thought: why not import that content into a git repository (on git.centos.org), so that even when it is no longer "moin" powered, we can still update content if needed (through PRs or plain git commits)? I'm thinking of SIGs that haven't opted in to the other doc system and would need to reflect a simple change in a hurry.
bump ?
As discussed in today's board meeting (https://git.centos.org/centos/board/issue/91), let's move forward with this. We need infra to lock edits on the wiki and take another snapshot and update the copy on wiki.dev.
Then I'll re-run the scraper so we have an updated archive and update https://gitlab.com/dcavalca/wiki-archive with it. We can either serve that from gitlab, or host it on a CentOS instance (your call); once that's settled infra will need to repoint wiki.centos.org to wherever the static archive is hosted.
We should also preserve the moinmoin backup somewhere before sunsetting the old instance, so that we can potentially improve the static archive in the future if we find a better way to convert it.
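Before repointing wiki.centos.org at a refreshed static archive, it may help to sanity-check the export for dangling internal links. The following is only a minimal sketch under assumptions, not the tooling used for the PoC: it assumes the archive is a local directory tree of `.html` files, and the names `LinkCollector` and `find_broken_links` are hypothetical, made up for illustration.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urldefrag, urlparse


class LinkCollector(HTMLParser):
    """Collect href/src attribute values from one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def find_broken_links(root):
    """Return sorted (page, link) pairs whose local target is missing under root.

    Hypothetical helper: 'root' is assumed to be the top directory of the
    static export.
    """
    broken = []
    for dirpath, _dirs, files in os.walk(root):
        for filename in files:
            if not filename.endswith(".html"):
                continue
            page = os.path.join(dirpath, filename)
            with open(page, encoding="utf-8", errors="replace") as fh:
                collector = LinkCollector()
                collector.feed(fh.read())
            for link in collector.links:
                url, _fragment = urldefrag(link)
                parsed = urlparse(url)
                # Skip external links, mailto: and pure #fragment links.
                if parsed.scheme or parsed.netloc or not parsed.path:
                    continue
                path = unquote(parsed.path)
                if path.startswith("/"):
                    # Site-absolute links resolve against the archive root.
                    target = os.path.join(root, path.lstrip("/"))
                else:
                    target = os.path.join(dirpath, path)
                if os.path.isdir(target):
                    # Assume directories are served via their index.html.
                    target = os.path.join(target, "index.html")
                if not os.path.exists(target):
                    broken.append((os.path.relpath(page, root), link))
    return sorted(broken)
```

Running `find_broken_links("public")` on a checkout of the archive (the directory name is an assumption) would list pages that still point at content the crawl did not capture, which could be fixed before the old instance is retired.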
@dcavalca: thanks for the update. I'm diving into afk/PTO mode myself, so can we revisit this in August? I'll have to redeploy temporary EC2 instances, as the previous ones were automatically discarded by Duffy CI (they were there for temporary tests and not supposed to remain online for a long time, but they are easy and fast to redeploy).
Sure, we can do this when you get back. Thanks!
As the investigation is done and the PoC was a success, closing this one; we'll track the prod migration in #1245.
Metadata Update from @arrfab: - Issue close_status updated to: Fixed with Explanation - Issue status updated to: Closed (was: Open)