#2 Add Search
Opened 2 years ago by bex. Modified a month ago

There is some testing going on with https://github.com/smitthakkar96/ascii_binder_search_plugin, which is being looked at for Gluster.org.

Testable here: https://bex.fedorapeople.org/fedora-docs-web/search.html (but not enabled at the moment)


So what about search? Maybe borrow the engine from Python Sphinx?

Since we are moving to Antora, we are looking at different options.

For now, the current Antora test site has search wired in via Algolia. We are ultimately looking at using an Elasticsearch backend provided by Fedora Infrastructure; however, this is not finalized yet. We need to find people interested in Elasticsearch to help move that forward faster. Elasticsearch is preferred because there are other needs for it in Fedora, so we can share the resource.

https://bex.fedorapeople.org/antora-test/fedora/rawhide/index.html

@bex if it's any consolation, I've worked with Elasticsearch while setting up an ELK stack.

I've never used ES as a backend for a search tool, but I'd be happy to help any way I can.

@tjzabel that would be amazing! I'm tracking search on the project's Taiga board: https://taiga.fedorainfracloud.org/project/asamalik-antora-for-docs/epic/4

Which area would you be interested in helping with? The infrastructure (setting up an Elasticsearch instance)? The integration into the website and making that work? The UI?

@asamalik I can help with setting up the ES instance at least.

I've set up an ELK stack to parse log data. I don't have experience setting up ES as the search backend for a website, but it's a challenge I'd like to help complete.

ES works with JSON documents, so I'd need to figure out the best way to convert the docs pages into a format ES can index. I don't think it can be fed raw HTML. All the how-tos I can find for using ES as a search engine skip this step and start with data that is already JSON. In theory, it shouldn't be too hard to figure out.
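The HTML-to-JSON step described above can be sketched with Python's standard-library HTMLParser. This is a minimal sketch under assumptions, not the approach the project settled on: the output fields (url, title, text) are a hypothetical schema, and skipping script, style, and nav content is just one reasonable choice for dropping page chrome.

```python
import json
from html.parser import HTMLParser


class DocPageExtractor(HTMLParser):
    """Collects the <title> and the visible text of one docs page."""

    # Hypothetical choice: content inside these tags is not indexed.
    SKIP_TAGS = {"script", "style", "nav"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()
        elif self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def html_to_document(url, html):
    """Turn one HTML page into a JSON-serialisable dict ready for indexing."""
    parser = DocPageExtractor()
    parser.feed(html)
    parser.close()
    return {"url": url, "title": parser.title, "text": " ".join(parser.text_parts)}


if __name__ == "__main__":
    page = ("<html><head><title>Installing Fedora</title></head>"
            "<body><nav>Home</nav><h1>Installing</h1><p>Boot the media.</p></body></html>")
    print(json.dumps(html_to_document("https://docs.fedoraproject.org/example.html", page)))
```

A real converter would probably also want to keep section headings separately so results can link to anchors, but the flat title/text shape is enough for a proof of concept.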

If I have time this weekend (I should), I am going to work on creating a proof-of-concept to figure out the best way to do this.

Should we be considering a hosted solution for this? Either a hosted Elastic or something like Algolia?

The Antora docs themselves actually use Algolia, so it has been proven to work with Antora: https://docs.antora.org/antora/1.1/

Hmm, it seems they do use Algolia. Algolia can definitely work, and we've made a bit more progress on the Algolia side of things.

I suppose it comes down to whether ES or Algolia would be more useful in the long term. @bex mentioned that other Fedora communities could make use of Elasticsearch as part of a full ELK stack.

From what I can see, Algolia would be quicker to set up (it already looks partly wired into the test site), but ES would be used for more than just the docs search functionality.

What's the end goal we want to achieve?

What if we just make it work with Algolia quickly (hopefully), and then figure out whether we need a self-hosted Elasticsearch and eventually move to it?

Hmm, that might be the best way to go about this. Usually I like to go with the option that is better in the long term, but in this case it may be best to work with Algolia for now.

That way, we can get doc search up and running so people can start using the search functionality, and we can take our time figuring out the best implementation of ES if we decide to pursue that further.

I have heard that Fedora Infra is thinking about bringing up an Elastic stack, and they probably want help. That said, getting docs search sooner is probably useful, and I'd prefer not to see us block on other issues indefinitely.

I have a few ideas that an Elastic stack would be useful for (e.g. community metrics), but I also know that no one is volunteering to do them right now, so they shouldn't block this.

Perhaps a pointed question to infra would give us a way to move forward?

I imagine that using Algolia is a good temporary solution, since search is desperately needed and its absence could potentially turn off some new users looking to switch to Fedora.

As for generating JSON for search consumption, it might be worth seeing whether that could be done upstream in Antora, perhaps using something like https://github.com/weixsong/elasticlunr.js

You can see how lunr.js was used in the GitBook CLI: https://github.com/GitbookIO/gitbook

mdBook (similar, but written in Rust) now uses https://github.com/mattico/elasticlunr-rs
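The lunr.js-style approach mentioned above builds a search index when the site is generated, ships it as a static JSON file, and searches it in the browser, so no search server is needed. The sketch below is a much simpler stand-in for illustration only: a plain inverted index with AND matching, no stemming or relevance ranking, and not the serialization format used by lunr.js or elasticlunr. The document fields and example URLs are hypothetical.

```python
import json
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return TOKEN_RE.findall(text.lower())


def build_index(docs):
    """Map every term to the sorted list of page URLs that contain it."""
    index = defaultdict(set)
    for doc in docs:
        for term in tokenize(doc["title"] + " " + doc["text"]):
            index[term].add(doc["url"])
    return {term: sorted(urls) for term, urls in sorted(index.items())}


def export_index(index):
    """Serialise the index compactly; this is the file a browser would fetch."""
    return json.dumps(index, separators=(",", ":"))


def search(index, query):
    """Return the URLs that contain every query term (AND semantics)."""
    results = None
    for term in tokenize(query):
        urls = set(index.get(term, []))
        results = urls if results is None else results & urls
    return sorted(results or [])
```

In the real setup the lookup would run in JavaScript against the exported file; it is written in Python here only to keep the whole round trip in one runnable example.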

If a temporary solution is what we're looking for, maybe we could also use a search box that runs a site-specific DuckDuckGo search (in DuckDuckGo: "site:docs.fedoraproject.org %searchterms")?

References:
https://jonbeebe.net/2017/07/duckduckgo-site-search/ and https://duckduckgo.com/search_box
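The site-specific search above only needs a query URL: the user's terms prefixed with the site: operator, sent as the q parameter. A minimal sketch of building that URL in Python; on the docs site itself this would more likely be a plain HTML form submitting to DuckDuckGo, and the function name is hypothetical.

```python
from urllib.parse import urlencode

DDG_ENDPOINT = "https://duckduckgo.com/"
DOCS_SITE = "docs.fedoraproject.org"


def site_search_url(terms, site=DOCS_SITE):
    """Build a DuckDuckGo search URL restricted to one site via the site: operator."""
    # urlencode percent-encodes ':' and turns spaces into '+'.
    return DDG_ENDPOINT + "?" + urlencode({"q": f"site:{site} {terms}"})
```

For example, site_search_url("install dnf") yields https://duckduckgo.com/?q=site%3Adocs.fedoraproject.org+install+dnf.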

Any update on this, please? We're launching the new AskFedora soon (in 2 weeks if the current plan goes as intended), and we're going to be pointing lots of users to quick-docs for commonly asked questions. Not being able to search quick-docs will be quite a frustrating experience for end users, especially as quick-docs grows.

Update

@asamalik @bex The best way to get search up and running is via Algolia DocSearch.

DocSearch uses an Algolia backend to crawl Fedora's doc pages and build an Algolia index. This eliminates the need for us to convert our HTML pages into JSON ourselves (Algolia still needs JSON, but the crawler produces it). At that point, we would simply add the search functionality to the UI.

How to get Search Up and Running

There are two ways of working with DocSearch:
1. Apply to DocSearch with the Fedora docs URL
2. Host our own DocSearch instance

Apply to DocSearch

We give Algolia the root URL of our documentation, and they give us back a JS snippet to add to our website's search field. Algolia takes care of parsing the HTML and storing the index on their own infrastructure, and the index is updated daily.

Host our own DocSearch Instance

We would need to set up our own Algolia app ID and API key, create a config, and run DocSearch ourselves.
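For the self-hosted option, "create a config" means writing a JSON file that tells the crawler where to start and which page elements to index. A minimal sketch that generates one, assuming the legacy docsearch-scraper format (index_name, start_urls, selectors with lvl0, lvl1, and so on plus text); the index name and every CSS selector below are hypothetical placeholders that would have to match the real Fedora Docs markup. The Algolia app ID and API key are deliberately kept out of the file.

```python
import json


def build_docsearch_config(start_url, index_name):
    """Return a DocSearch crawler config as a plain dict (hypothetical selectors)."""
    return {
        "index_name": index_name,
        "start_urls": [start_url],
        "selectors": {
            # Heading levels become the hierarchy shown in search results.
            "lvl0": "article h1",
            "lvl1": "article h2",
            "lvl2": "article h3",
            # Paragraph-level content indexed as searchable text.
            "text": "article p, article li",
        },
    }


def write_config(path, config):
    """Write the config as JSON so the crawler can read it."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
```

Usage would be something like write_config("config.json", build_docsearch_config("https://docs.fedoraproject.org/", "fedora-docs")) before pointing the crawler at the file.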

Thanks for looking into that @tjzabel!

It looks like we might be getting an Elasticsearch instance that we could use. The benefit would be more flexibility, for example for Quick Docs: potential filtering by labels and other things.

But we've been discussing this for some time already, and I really want search to appear as soon as possible. So let's time-box this: if I don't have an action plan for Elasticsearch by 20 March, I'll deploy Algolia.

We might need to pay Algolia a subscription fee because Fedora might not qualify as a "small personal project" to get the free tier, so I'd need to open a Council ticket to ask for funding if we go that route.

@tjzabel Would you be interested in helping me with making the search work using Elasticsearch?

@asamalik as for an Algolia subscription, it's totally free if you add a small "powered by Algolia" snippet somewhere in the search box. The Vue.js site is an example of this.
If that's not an option, then yes, we would have to host it ourselves, and open up an account.

As for ES, if that route is chosen, I may be able to help a bit to get it working. I'm currently spread a little thin until mid-May, but I can commit to doing some preliminary research for now.

I have prototyped search using Elasticsearch for the Fedora Docs. I'm able to load all pages into the database and search it from within the Docs.

The client code: https://pagure.io/fedora-docs/fedora-docs-ui/blob/master/f/src/layouts/search.hbs

Testing data: https://github.com/asamalik/fedora-docs-search-prototype/blob/master/load_testing_data.sh
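The prototype's own loader is the shell script linked above; as an independent sketch (not a port of that script), loading pages into Elasticsearch from Python could use the _bulk API, whose body is newline-delimited JSON: an action line followed by the document, with a trailing newline. The index name "fedora-docs", the url/title/text document shape, and using the URL as the document ID are all assumptions for illustration.

```python
import json
import urllib.request


def bulk_payload(docs, index_name="fedora-docs"):
    """Serialise docs into the newline-delimited body of an Elasticsearch _bulk request."""
    lines = []
    for doc in docs:
        # Action line: index (create or replace) the doc, using its URL as the ID.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc["url"]}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def send_bulk(es_url, payload):
    """POST the payload to <es_url>/_bulk and return the parsed JSON response."""
    request = urllib.request.Request(
        es_url.rstrip("/") + "/_bulk",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against a local test instance (assumed at http://localhost:9200), this would be called as send_bulk("http://localhost:9200", bulk_payload(docs)). Reusing the URL as the ID means re-running the load replaces pages instead of duplicating them.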

I learned quite a lot along the way. I have ideas for how we could enable tag-based search for quick docs and other things.
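Tag-based search of this kind maps naturally onto an Elasticsearch bool query: a full-text clause that scores results, plus an optional filter clause that restricts them to certain tags without affecting the score. A minimal sketch of building such a query body, assuming a hypothetical "tags" field mapped as keyword and the same hypothetical title/text fields, with titles weighted double.

```python
def build_search_query(text, tags=None, size=10):
    """Build an Elasticsearch query body: full-text match, optionally filtered by tags."""
    query = {
        "bool": {
            "must": [
                # Search both fields; a match in the title counts twice as much.
                {"multi_match": {"query": text, "fields": ["title^2", "text"]}}
            ]
        }
    }
    if tags:
        # Filter clauses don't affect scoring; "terms" matches docs with any listed tag.
        query["bool"]["filter"] = [{"terms": {"tags": list(tags)}}]
    return {"size": size, "query": query}
```

The resulting dict would be sent as the JSON body of a _search request; whether tags should match any or all of the listed values is a design choice still to be made.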

The only thing that blocks me from deploying it to staging is having an Elasticsearch instance. Good news is that Amazon may be willing to donate us one. Once I have it, I'll be able to do some testing in stg and not long after that deploy it to production.

Before that happens, at least a screenshot:

search-preview.png

So, are we OK going forward with Elasticsearch?

Sounds great. Thanks for the work! It'd be great to get that in as soon as possible!

(Sent by email in reply to Adam Samalik's comment of Mon, 25 Mar 2019, 12:03.)

@asamalik this is great!! It's much simpler than I had anticipated; it just needed that Python script to pull the HTML as JSON :)

So, are we OK going forward with Elasticsearch?

+1 from me. Thank you so much for working on this, it looks awesome.

It's taking a little longer than I'd like, but I have a prototype ready: http://docs.horsefunerals.co.uk

Still need to work on the scripts behind the scenes and then work with the Infra team to get it into production.

Looks absolutely brilliant!! I only managed to check it out on my phone; the search box may need a CSS tweak.
Screenshot_20190417-183741.png

Hey @asamalik! Have there been any updates on the search functionality? The prototype looks great ^.^

I agree some stylistic changes could be made to the search dropdown itself, but I'm not sure that qualifies as a blocker. That being said, are there any specific action items that could be worked on to get this onto the main docs site? I would love to see it released, as some stray docs can currently only be found through a direct link :(

We got asked about this on AskFedora:

https://ask.fedoraproject.org/t/search-for-topic-in-fedora-documentation/7564/2

Any updates here? Could we perhaps temporarily include a search box that uses DuckDuckGo to do a site-specific search?

@ankursinha I actually just raised this with @amoloney yesterday as a potential request to the CPE team. I don't have an update but I'm thinking about it too :)
