Issue #1055: Fedora Search Engine - fedora-infrastructure

Closed: Fixed. Opened 15 years ago by mmcgrath.

So Fedora needs a search engine. Here are the requirements as I see them:

* Crawl the websites
* Search the websites

Preferences:
* Python based
* Allows programmable keywords [1]
* Has some sort of xml or library interface so other applications can use it

[1] Allow us to have control over what pages get displayed for certain keywords
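
To make the "programmable keywords" preference concrete, here is a minimal sketch of one way it could work, assuming the search engine returns an ordered list of result URLs. The `PINNED` table, its entries, and the `apply_pins` helper are all hypothetical names made up for illustration; no candidate engine is known to expose exactly this interface.

```python
from typing import Dict, List

# Hypothetical keyword -> pinned URL table; the entries are illustrative only.
PINNED: Dict[str, List[str]] = {
    "search": ["https://fedoraproject.org/wiki/Infrastructure/Search"],
    "docs": ["https://docs.fedoraproject.org/"],
}


def apply_pins(
    query: str,
    results: List[str],
    pinned: Dict[str, List[str]] = PINNED,
) -> List[str]:
    """Move any URLs pinned to the query's keywords to the front of results."""
    front: List[str] = []
    for word in query.lower().split():
        for url in pinned.get(word, []):
            if url not in front:  # avoid duplicates when keywords share a pin
                front.append(url)
    # Keep the engine's own ordering for everything that was not pinned.
    return front + [url for url in results if url not in front]
```

The idea is that the pin table lives outside the index, so admins could change which pages show up for a keyword without re-crawling anything.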

mmcgrath commented 15 years ago

Something I'd like to see out of appropriate candidates is how much storage they take up. Also, no need to code this ourselves.

ianweller commented 14 years ago

At first glance, here's a bunch of possible extensions already written.

I also found this, and it seems like a non-option ;) but it's included for completeness.

https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Extension:RigorousSearch

akistler commented 14 years ago

Added a wiki page:
https://fedoraproject.org/wiki/Infrastructure/Search

ausil commented 14 years ago

We need something that can search more than the wiki. It needs to index fedorahosted.org, fedorapeople.org, and fedoraproject.org.

There are:
http://www.mnogosearch.org/
http://www.dataparksearch.org/
http://crawler.archive.org/

Sadly, none are Python; they are either Java or C. I've not found a Python one yet.
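
Whichever crawler gets picked, the indexing scope described above could be expressed as a simple host allow-list. This is only a rough sketch: the `in_scope` helper and the `ALLOWED_DOMAINS` name are hypothetical, and treating every subdomain of the three domains as in scope is an assumption.

```python
from urllib.parse import urlsplit

# Domains named in this comment; subdomains are assumed to be in scope too.
ALLOWED_DOMAINS = ("fedorahosted.org", "fedorapeople.org", "fedoraproject.org")


def in_scope(url: str, allowed=ALLOWED_DOMAINS) -> bool:
    """Return True if the URL's host is an allowed domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    # Match on a leading dot so look-alike hosts such as
    # "evilfedoraproject.org" are not accepted by a plain suffix check.
    return any(host == d or host.endswith("." + d) for d in allowed)
```

A crawler would call something like this on each discovered link before queueing it, so links that lead off to other sites get dropped.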

akistler commented 14 years ago

There is also Perl, which is neither Java nor C. mnoGoSearch and DataparkSearch were already on the wiki status page in Comment 6. We can add Heritrix and note that it's written in Java.

KinoSearch, Namazu, OpenFTS, and Plucene are Perl. KinoSearch and Namazu appear to be actively maintained. OpenFTS has a Python interface.

In the meantime, reassigning this ticket to me.

rlandmann commented 13 years ago

Replying to ausil (comment 7):

> we need something that can search more than the wiki. it needs to index fedorahosted.org fedorapeople.org and fedoraproject.org.

It also needs to index docs.fedoraproject.org.

Publican, which generates the structure of the documentation site, can incorporate a search form into the navigation menus that it maintains for each language.

fche commented 13 years ago

FWIW, over on sourceware.org / sources.redhat.com / gcc.gnu.org, we run mnogosearch
against the local web sites. It works okay. I believe these servers are in the same
colocation facility as fedora*, so we could do a trial run without too much fuss.

lmacken commented 12 years ago

What is the status with this project?

kevin commented 12 years ago

We now have a dev instance of dpsearch set up at:
https://search-dev.fedoraproject.org/search.cgi

It's crawling docs now. Feedback welcome.

puiterwijk commented 11 years ago

Has there been any progress on this since?

kevin commented 11 years ago

We had a dev instance, but it got very very very slow, so we reaped it.

I'd really like to see us try again and see if we can figure out what went wrong.

frankieonuonga commented 10 years ago

I think this really has to be revisited. I will take it up and reintroduce it on the mailing list. Too many factors have changed since then. We need to remain relevant. We will need to come up with both short term and long term goals.