#1061 [spike] : investigating export of moin wiki pages to static content
Opened 2 months ago by arrfab. Modified a month ago

Based on the CentOS Docs SIG meeting day in Brussels: we can set up a parallel wiki instance, try to extract static HTML pages from it, and then decide whether (or not) to keep them as wiki archives.
Some links:

https://git.autistici.org/ale/crawl/
https://github.com/iipc/warc2html


Metadata Update from @zlopez:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: investigation

2 months ago

Metadata Update from @arrfab:
- Issue assigned to arrfab

2 months ago

It's not possible to assign multiple "assignees" on a pagure ticket, but for awareness: I'll work on this with @dcavalca to produce a PoC and then propose it as an archive solution for the Docs SIG (based on a discussion with @shaunm).

Deployed two EC2 instances for this, and @dcavalca will be able to test things.
On the first one (c7 EC2 instance) the moin role will be applied and actual anonymized data imported.
The second EC2 host (c9s) is there to test the export to static HTML files.

I have an export solution that seems to work well and is currently chewing through the existing content. Assuming this goes well, I should have something ready for polishing sometime tomorrow.

Had a quick look and it seems really good enough (for a PoC).

@dcavalca, @shaunm ^ is it worth sending a mail to the centos-docs list now to discuss the plan and point to the PoC? (Ideally we can use the other wiki-archives node for this instead of a personal page on gitlab.io, but that's fine for a PoC.)

Hey @dcavalca and @shaunm: I'm tempted to close this ticket and reclaim the deployed EC2 instances, which were only used as a PoC for this.
I guess the plan now is to start a thread on the dedicated centos-docs list and list the actions there?

I talked to @shaunm about this yesterday, he's going to post to centos-docs@ soon. The next steps here would be to disable edits on the wiki, do another dump and put it on the dev instance, and do another crawl to catch any edits that happened since then. Once that's done we can start productionizing the static archive.
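
The "another crawl to catch any edits" step could be sanity-checked by comparing the two crawl outputs. Below is a minimal sketch, assuming both crawls have been extracted to local directories of static files; the helper names `snapshot` and `diff_snapshots` and the directory layout are hypothetical and not part of the PoC.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file path (relative to root) to the SHA-256 of its bytes."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_snapshots(old, new):
    """Return (added, changed, removed) page paths between two snapshots."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return added, changed, removed
```

Calling `diff_snapshots(snapshot("crawl-1"), snapshot("crawl-2"))` (directory names assumed) would list the pages that differ between the first export and the one taken after edits are disabled.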

Well, I'd say that this ticket was just about the spike/PoC; what remains is explaining the plan and when you want to see it go live. The sooner the wiki is offline, the better, but ideally let's reach a consensus through a thread on the centos-docs list :-)
