#6750 Help! We need some kind of search solution for docs
Opened 2 years ago by mattdm. Modified 2 months ago

Good news: Fedora Quick Docs is starting to shape up with real content. See https://docs.fedoraproject.org/quick-docs/en-US/index.html.

Bad news: browsing that horrible unsorted list on the left is the only way to find different things other than using an external search engine. As this grows, it's going to get worse and worse.

More bad: that list itself gets indexed on every page, so, oops, external search engines are overwhelmed and can't always find the right page.

We need some kind of search solution of our own, be it based on ElasticSearch or Solr or whatever. I'd be perfectly happy with something-as-a-service or hosted (open source) — I understand these things to be rather complicated to set up and maintain, so that might be most efficient. But I'm not set on any particular approach, as long as it is open source and works.

cc @bex @ryanlerch @jperrin


This is in fact one of the goals of the upcoming infrastructure hackathon.

We want to stand up a test/proof-of-concept instance of ElasticSearch and see if it will meet our needs. Of course, any other ideas or input on what we could use are welcome too.

This might work better as a mailing list thread than as a ticket; we could revisit it after the hackathon and see where we are.

Well, we didn't get to that at the hackathon (some key folks were not there).

Let's see if we can discuss in a meeting and get somewhere.

Metadata Update from @kevin:
- Issue priority set to: Next Meeting

2 years ago

I suggest we look at outsourcing the hosting of Elasticsearch and Kibana to the experts: https://www.elastic.co/cloud/as-a-service
We can maybe see if they've got some discount program for FOSS projects or the like. Otherwise, given the amount of data we have for searching/indexing per our current plans, we can see whether we can just pay the price (it starts at $45/mo, which is not that much, I'd say).

I also wonder if Algolia (we are using it on the Antora docs test site) or something from Amazon's Elastic service would also be options. Both are FOSS-friendly (and Amazon is known to be Fedora-friendly).

Would outsourcing work for other infra Elastic needs too?

As for Kibana, it is not a docs need. CommOps will probably need it for stats presentation, but the need is defined by the tooling in their case.

Metadata Update from @kevin:
- Issue priority set to: Waiting on Assignee (was: Next Meeting)

2 years ago

While I think the need is real and still present to this day, now that the CPE has given itself a mission statement, I wonder if hosting and maintaining such a service fits in the scope of that mission.

What do people think?

I'd like to work on this; docs really needs search. Is there a preference for the backend?

While I think the need is real and still present to this day, now that the CPE has given itself a mission statement, I wonder if hosting and maintaining such a service fits in the scope of that mission.

Algolia is already managed and has an OSS tier. Elasticsearch / Solr / Yacy could easily fit on communishift.

We need some kind of search solution of our own, be it based on ElasticSearch or Solr or whatever. I'd be perfectly happy with something-as-a-service or hosted (open source) — I understand these things to be rather complicated to set up and maintain, so that might be most efficient. But I'm not set on any particular approach, as long as it is open source and works.

How about we use a JavaScript-based full-text search? Those are pretty lightweight. An example can be seen in the MkDocs Python package.
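
For anyone unfamiliar with how this kind of search works: the site build produces a small index of the pages, and the browser queries that index directly, so no search server is needed. Below is a minimal sketch of the idea, written in Python for brevity rather than in the JavaScript a real site would ship. It is not MkDocs' actual implementation (its built-in search is based on lunr.js), and the page URLs and text are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: token -> set of page URLs containing it."""
    index = defaultdict(set)
    for url, body in pages.items():
        for token in tokenize(body):
            index[token].add(url)
    return index


def search(index, query):
    """Return the sorted URLs of pages that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return sorted(results)


# Hypothetical pages; a real build step would extract text from the docs.
pages = {
    "/quick-docs/upgrading": "Upgrading Fedora to a new release with DNF",
    "/quick-docs/firewalld": "Using firewalld to control network access",
    "/quick-docs/dnf": "Using the DNF package manager",
}

index = build_index(pages)
print(search(index, "dnf"))
print(search(index, "using dnf"))
```

The trade-off is that the whole index has to be downloaded by the browser, so this approach suits a docs site of modest size better than a very large one. Real libraries also add stemming and relevance ranking, which this sketch leaves out.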

I have an Algolia demo here: https://mymindstorm.fedorapeople.org/docs-search-demo/ (it will break when my trial runs out in 14 days)

The index is loaded with a crawl of docs.fp.o. It looks like they do public documentation sites for free. This is likely the simplest solution. https://docsearch.algolia.com/

docsearch-scraper config: docs.fp.o-scrape.json

@mattdm in relation to search we have a POC in place above that looked pretty good, WDYT?

@lgriffin Thanks for the ping -- missed this earlier. Works for me. It'd be lovely to have an open-source solution, but my vote is for getting something in place now, and if someone wants to work on an open source / free software replacement, awesome, but let's not hold up for it. Lack of search is really painful.

Ok, looking more into this, the clients seem to be FOSS but the server/API itself, I'm not sure (still trying to figure it out)

Ok, looking more into this, the clients seem to be FOSS but the server/API itself, I'm not sure (still trying to figure it out)

That is my understanding: the frontend and docsearch (the thing that populates the Algolia index) are open source, but the actual search backend is closed source.

if someone wants to work on an open source / free software replacement, awesome

I can try to experiment with a Solr instance when communishift comes back up.

Who should communicate with Algolia? I would do it, but one of the requirements to talk with them is to either be the owner of the website or have permission to update its content.

@mattdm did you see the question from @mymindstorm above?

ie:

Who should communicate with Algolia? I would do it, but one of the requirements to talk with them is to either be the owner of the website or have permission to update its content.

I just noticed this, but it seems like there's a separate effort on the docs side.

https://pagure.io/fedora-docs/docs-fp-o/issue/2
