NOTE
Searching for things on a mailing list always returns "Sorry no email could be found for this query."
An example is https://lists.fedoraproject.org/archives/list/freeipa-users@lists.fedorahosted.org/
Search for "freeipa", a string which is visible within the displayed threads, and nothing is found.
Search for "centos", also visible. No threads found.
This is more annoying than urgent, but it is a fundamental part of hyperkitty that is not working.
Metadata Update from @zlopez:
- Issue assigned to zlopez
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: high-gain, high-trouble, ops
After upgrading to the new mailman version, the search index is completely different and needs to be regenerated. Unfortunately this takes a lot of time to finish. Please be patient; we know about this.
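For context, hyperkitty's fulltext search goes through django-haystack (mentioned later in this ticket), so a full regeneration can be kicked off through the mailman-web management CLI. This is only a minimal sketch assuming haystack's standard rebuild_index command is available; the actual regeneration job in this deployment may differ.

```python
# Minimal sketch: trigger a full fulltext reindex through the mailman-web
# management CLI. Assumes django-haystack's standard rebuild_index command;
# the real regeneration job in this deployment may work differently.
import subprocess

subprocess.run(
    ["mailman-web", "rebuild_index", "--noinput"],  # clears and rebuilds the whole index
    check=True,
)
```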
Update: The fulltext search index is now at 7.2 GB and the job is still running. For comparison, the old one was around 200 GB, but the new one could be more efficient.
Update: The fulltext search index is now at 30 GB and the job is still running.
Update: The fulltext search index is now at 132 GB and the job is still running.
Update: The fulltext search index is now at 302 GB and the job is still running.
Update: The fulltext search index is now at 523 GB and the job is still running.
Update: The fulltext search index is now at 659 GB and the job is still running.
Update: The fulltext search index is now at 1015 GB and the job is still running.
How much text is there? Because that is starting to sound like a growth ratio of 10:1.
We are using a different version of the whoosh engine now, but I'm not sure why the index is that big compared to the old one.
There is a cleaning script that runs monthly, so maybe that will make it smaller. But I can't really see what exactly the script is doing, as the logs are really sparse.
My debugging approach would be to load just one small list on mailman.stg and run it to see how the index builds out, then load a large list to see what the growth might be.
I will try something today; there could be an issue with how the hyperkitty_hourly job is building the cache. So I will try to create a script to do it instead on staging, and if it works correctly I will do the same on production.
I got to this today and wrote a script that will regenerate the search index from scratch using the mailman-web update_index_one_list command.
The script is already running on staging, and if there aren't any issues I will start it on production tomorrow.
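The script itself isn't attached to the ticket, but a rough sketch of the approach described above, looping over all archived lists and running the update_index_one_list management command for each, could look like this. The lists file path and the way the list addresses are obtained are placeholders, not the real setup.

```python
#!/usr/bin/env python3
"""Rough sketch of the per-list reindexing approach described above.

Assumes a plain-text file with one list address per line; how the real
script obtains the list of archives may differ.
"""
import subprocess
from pathlib import Path

LISTS_FILE = Path("/srv/mailman/lists.txt")  # placeholder path

lists = [line.strip() for line in LISTS_FILE.read_text().splitlines() if line.strip()]

for i, address in enumerate(lists, start=1):
    # update_index_one_list rebuilds the fulltext index entries for a single list,
    # so progress can be reported and a failure only affects one list.
    subprocess.run(["mailman-web", "update_index_one_list", address], check=True)
    print(f"Processed: {i} of {len(lists)}")
```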
After running the new script for a while on staging, I think I figured out why the index was so big. There was a temp file which wasn't cleared and just grew bigger and bigger. It should be deleted after each mailing list is processed, but because the hourly script sometimes got completely stuck and needed a manual restart, it just stayed there.
Hopefully this will not happen with my script.
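A quick way to confirm this kind of runaway file is to list the largest files in the index directory; the path below is a placeholder, not the real location.

```python
# List the ten largest files in the fulltext index directory to spot a
# runaway temp file. The path is a placeholder, not the real location.
from pathlib import Path

INDEX_DIR = Path("/var/lib/mailman3/fulltext_index")  # placeholder path

files = [p for p in INDEX_DIR.rglob("*") if p.is_file()]
for p in sorted(files, key=lambda p: p.stat().st_size, reverse=True)[:10]:
    print(f"{p.stat().st_size / 2**30:8.2f} GiB  {p.name}")
```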
The script on staging is running fine, so I started it on production as well. It will probably take some time to finish, but the final cache size should be much better. And the output from the script will actually tell you how far along the processing is :-)
UPDATE: Processed: 28 of 754, size: 4.6 GB
UPDATE: Processed: 55 of 754, size: 9.2 GB
UPDATE: Processed: 141 of 754, size: 12 GB
UPDATE: Processed: 178 of 754, size: 21 GB
We finally processed the devel mailing list.
Does the disabled search engine also affect API functionality? While running the periodic inactive packagers check, I saw that the queries to https://lists.fedoraproject.org/archives/api/sender/{email}/emails/?ordering=-date were all failing with errors. Even a REST query as simple as https://lists.fedoraproject.org/archives/api/lists seems broken.
I wonder if this will automatically be solved once the search is enabled again or if it's a totally different issue.
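For reference, here is a minimal reproduction of the failing calls. The endpoints and the ordering parameter are the ones from the comment above; the sender address is a placeholder.

```python
# Minimal reproduction of the API calls mentioned above. The endpoints and
# the ordering parameter come from the comment; the sender address is a
# placeholder.
import requests

BASE = "https://lists.fedoraproject.org/archives/api"

resp = requests.get(f"{BASE}/lists", timeout=30)
print("lists:", resp.status_code)

sender = "someone@example.com"  # placeholder email
resp = requests.get(f"{BASE}/sender/{sender}/emails/", params={"ordering": "-date"}, timeout=30)
print("sender emails:", resp.status_code)
```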
The lists search should still be working. I only disabled the context search in the web UI; I didn't touch the API. But during the investigation of https://pagure.io/fedora-infrastructure/issue/12011 I applied a few patches, so those could be the ones causing the issues. Could you open a new issue with the error, so I can look at it?
UPDATE: Processed: 187 of 754, size: 22 GB
UPDATE: Processed: 188 of 754, size: 34 GB
It seems strange that only one mailing list was processed since the last update; I will check why.
Found the issue: the monthly job wasn't disabled and it started processing yesterday. This caused the index generation to get stuck; after disabling it, another mailing list was processed.
Not sure why, but mailman-web update_index_one_list sometimes spawns another copy of itself, which blocks the processing, and I need to kill it manually. This also causes the temp files not to be cleaned. After cleaning them, the size of the index is 22 GB.
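One way to spot the duplicated processes before restarting the job is something like the following; it assumes psutil is installed, and in practice pgrep works just as well.

```python
# Spot leftover/duplicate indexing processes so they can be killed before
# restarting the job. Assumes psutil is installed; pgrep would work too.
import psutil

for proc in psutil.process_iter(["pid", "cmdline"]):
    cmdline = " ".join(proc.info["cmdline"] or [])
    if "update_index_one_list" in cmdline:
        print(proc.info["pid"], cmdline)
# If more than one shows up for the same list, the extras are the stuck copies.
```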
I spent some hours playing with different ways to resume the index generation after the monthly job messed it up. It seems that the only way to prevent mailman-web update_index_one_list from blocking itself is to start generating the search index from scratch again :-/
I don't understand why this is happening; the issue is either with the mailman-web update_index_one_list command or with the underlying whoosh library.
I will try to play with it a little more, so I don't lose the progress. It seems that I found where the issue was on staging, and hopefully that will resolve the situation on the production instance as well.
After two days of playing with it I couldn't find a way to get the index regeneration process to continue, so I have to start it from scratch. I set the maximum batch size to 100; hopefully that will prevent the issue in the future.
If the issue starts happening again, I will probably look for a different cache backend.
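How the batch limit is applied inside the real script isn't shown in the ticket, but the idea is simply to cap how much work is handed to the index writer at once. A rough, hedged illustration:

```python
# Hedged illustration of the batch cap: feed the indexer at most 100 items at
# a time instead of one huge batch. How the real script applies the limit may
# differ.
def batches(items, size=100):
    """Yield successive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Example: for chunk in batches(messages): index(chunk)
```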
Current progress (after restart): Processed: 28 of 755, size: 4.8 GB
Current progress: Processed: 80 of 755, size: 9.8 GB
At a size of 12 GB, the fulltext index generation started to have issues again. There are two mailman-web update_index_one_list processes which are not doing anything.
I will start looking at another haystack backend that has better support. The options are Solr, ElasticSearch and Xapian. I will check which one is already packaged in Fedora.
I was able to build xapian-haystack for EPEL 9 in COPR. Let's see if that will work in staging.
I got it running on staging and will leave it running there for a day to test it out. It's much faster than whoosh and looks more stable right now. After the test I will package it in the infra tag and deploy it on production. Hopefully this will resolve the issues and we will finally get a search index generated.
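The switch between backends is a django-haystack configuration change; roughly, the settings go from the whoosh engine to the xapian-haystack one. The index path below is a placeholder, not the real deployment path.

```python
# Sketch of the django-haystack settings change for switching backends.
# The index path is a placeholder, not the real deployment path.
HAYSTACK_CONNECTIONS = {
    "default": {
        # previously: "haystack.backends.whoosh_backend.WhooshEngine"
        "ENGINE": "xapian_backend.XapianEngine",
        "PATH": "/var/lib/mailman3/xapian_index",  # placeholder path
    },
}
```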
I was going to suggest submitting it for review in fedora/epel, but you already did that. ;)
@kevin Michel Lind already asked me for that.
The xapian backend is really fast; it had processed 133 mailing lists as of yesterday. I will continue with packaging it for fedora/epel today.
The Xapian backend has almost finished generating the fulltext search index on staging (457 of 730 lists finished). At this rate I will let it finish and enable searching on staging again, so we can test it properly. After that I will continue with production.
Staging has 616 of 730 mailing lists already indexed. I will test it soon and then continue with production.
Thanks to everybody who is waiting for this for their patience. Hopefully it will not take much longer now.
The staging fulltext index generation finally finished, so I removed the search restrictions from staging and it works, although I will probably need to increase the timeout, as some requests are taking too long to process and are getting killed by Apache before the response is sent.
I didn't notice any memory spikes either, so that should be OK.
Now let's continue with the production deployment.
I started the fulltext search index generation on production. I will be posting updates to this ticket.
I forgot about the automatic run of the ansible playbook and the index got reverted to whoosh again. I will fix that and restart the index generation. A few mailing lists are already done, so it should be faster.
Current progress of search index generation on production: Processed: 188 of 755, size: 51 GB
Current progress of search index generation on production: Processed: 420 of 755, size: 66 GB
Current progress of search index generation on production: Processed: 479 of 755, size: 85 GB
Current progress of search index generation on production: Processed: 534 of 755, size: 113 GB
Current progress of search index generation on production: Processed: 640 of 755, size: 256 GB
Current progress of search index generation on production: Processed: 711 of 755, size: 266 GB
Current progress of search index generation on production: Processed: 719 of 755, size: 273 GB
We are getting there :-)
The search index generation finished today and I created a PR to enable the search again on hyperkitty.
This will wait for the end of the freeze, as it's not urgent.
The search on the web UI is now enabled again in production. Closing this as fixed.
Metadata Update from @zlopez:
- Issue close_status updated to: Fixed with Explanation
- Issue status updated to: Closed (was: Open)