Issue #123: resultsdb frontend is painfully slow - taskotron

#123 resultsdb frontend is painfully slow

Closed: Fixed None Opened 8 years ago by kparal.

Just opening the main page https://taskotron.fedoraproject.org/resultsdb/results takes between 10 and 15 seconds. Searches like https://taskotron.fedoraproject.org/resultsdb/results?testcase_name=depcheck are even slower. Are we missing some indexes again?

somebody commented 8 years ago

This ticket is a duplicate of https://pagure.io/taskotron/resultsdb/issue/54

somebody commented 8 years ago

This ticket had some Differential requests assigned:
D635

jskladan commented 8 years ago

OK, so this problem has two layers.

First off, we sort all the queries by submit_time in descending order, without having an index on that column. I can see two ways to tackle this: either add an index, CREATE INDEX result_submit_time_desc ON result (submit_time DESC); (which seems obvious), or just get rid of the sorting by date altogether. The second option seems weird at first, but since submit_time is set to the current UTC time at the moment the result is saved to the database, sorting by id yields the same order (as long as we can count on id being an ever-growing sequence), and id is already indexed. My preference would be to add the index, as it seems cleaner and less prone to race conditions, but the preference is not strong. What do you think, @tflink?
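To illustrate both options, here is a minimal sketch that compares the query plan before and after adding the index, and checks that sorting by id gives the same order as sorting by submit_time. It uses an in-memory SQLite database as a stand-in for the real resultsdb database, and the two-column result table and its sample rows are assumptions made only for this example.

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE result ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, submit_time TEXT NOT NULL)"
)
# Rows are inserted in chronological order, so id and submit_time grow together.
rows = [("2016-01-01 00:00:%02d" % i,) for i in range(50)]
conn.executemany("INSERT INTO result (submit_time) VALUES (?)", rows)

by_time = "SELECT id FROM result ORDER BY submit_time DESC LIMIT 20"
by_id = "SELECT id FROM result ORDER BY id DESC LIMIT 20"


def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(by_time)  # no index yet: SQLite sorts with a temporary B-tree
conn.execute("CREATE INDEX result_submit_time_desc ON result (submit_time DESC)")
plan_after = plan(by_time)   # the new index now provides the ordering

ids_by_time = [r[0] for r in conn.execute(by_time)]
ids_by_id = [r[0] for r in conn.execute(by_id)]
same_order = ids_by_time == ids_by_id

print(plan_before)
print(plan_after)
print("same order:", same_order)
```

The two orderings agree here only because the sample rows were inserted in chronological order, which is the same assumption the sort-by-id option relies on.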

The second problem is this (controllers/api_v1.py: def pagination()):
bbe42e01 (Ralph Bean 2015-07-29 13:09:04 -0400 90) total = q.count()

That line is there so the response can show how many results match the particular query, but it means going through the whole database instead of just the (by default) 20 rows. The counted query also gets sorted by a datetime column, which is absolutely unnecessary for this kind of operation.

Ideally, I'd like to get rid of the whole total = q.count() line, since I see it as unnecessary, and even harmful: it takes about 16 of the 16.5 seconds needed to load the https://taskotron.fedoraproject.org/resultsdb/results page. With the index on the submit_time column, the load time goes down to about 2.5 seconds (once again, 90% of that is spent counting records), which is "acceptable" but still far too long IMHO. Is there any particular use case for the total functionality, @ralph?
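If the count were dropped, a page could still tell the client whether more results exist by fetching one row more than the page size. The following is a minimal sketch of that idea, not the resultsdb implementation: the fetch_page helper, the has_next key, and the SQLite table are all hypothetical.

```python
import sqlite3

# In-memory SQLite with 45 sample rows stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result (id INTEGER PRIMARY KEY, outcome TEXT)")
conn.executemany("INSERT INTO result (outcome) VALUES (?)", [("PASSED",)] * 45)


def fetch_page(conn, page, limit=20):
    """Return one page of ids plus a has_next flag, without counting all rows."""
    cur = conn.execute(
        "SELECT id FROM result ORDER BY id DESC LIMIT ? OFFSET ?",
        (limit + 1, page * limit),  # ask for one extra row
    )
    ids = [row[0] for row in cur]
    # If the extra row came back, at least one more page exists.
    return {"data": ids[:limit], "has_next": len(ids) > limit}


first = fetch_page(conn, 0)
last = fetch_page(conn, 2)
print(len(first["data"]), first["has_next"])
print(len(last["data"]), last["has_next"])
```

The query reads at most limit + 1 rows past the offset, so its cost does not grow with the total number of matching results the way a full count does.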

If we decide not to get rid of it, I'd be tempted to make it optional by adding a parameter to the resultsdb query URL, something along the lines of http://.../resultsdb_api/api/v1.0/results?count_total=1, which would default to 0.
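A rough sketch of how such an opt-in flag could behave is shown below. It is only an illustration: the wants_total and build_response helpers, the response keys, and the example.invalid host are hypothetical, and a plain list stands in for the database query.

```python
from urllib.parse import parse_qs, urlparse


def wants_total(url):
    """True only when the proposed count_total parameter is set to 1."""
    params = parse_qs(urlparse(url).query)
    return params.get("count_total", ["0"])[0] == "1"


def build_response(rows, url, limit=20):
    """Build a page; compute total and pages only when the caller asks."""
    body = {"data": rows[:limit]}
    if wants_total(url):
        total = len(rows)  # stands in for the expensive q.count()
        body["total"] = total
        body["pages"] = -(-total // limit)  # ceiling division
    return body


rows = list(range(45))
base = "http://example.invalid/resultsdb_api/api/v1.0/results"
cheap = build_response(rows, base)
full = build_response(rows, base + "?count_total=1")
print(sorted(cheap))
print(full["total"], full["pages"])
```

Defaulting to 0 keeps the common case fast, and only callers that really need the totals pay for the count.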

Thoughts?

ralph commented 8 years ago

Here's the commit message for bbe42e01

With all JSON queries, return more pagination metadata.

Summary:
This now includes:

total - the total number of entities that can be retrieved.

pages - the total number of pages that can be retrieved

Without these, if you have code that is paging through resultsdb
results, there is no way to know when you are going to be done -- and no
way to jump to the last page and iterate through them in reverse.

I wrote it in July of 2014, which was right after the Bodhi2 FAD when we did the first resultsdb integration there. Grepping through the Bodhi source now, though, we don't use either total or pages. Only the next field. So, making those optional sounds fine to me.
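For illustration, a client that relies only on the next field could page through results roughly like this. It is a sketch under assumptions: the data key, the shape of the response, and the fake fetch function are hypothetical, and this is not the actual Bodhi code.

```python
def iter_results(fetch, first_url):
    """Yield every result by following next links until none is left."""
    url = first_url
    while url:
        page = fetch(url)
        for item in page["data"]:
            yield item
        url = page.get("next")  # None on the last page ends the loop


# A fake two-page server response, keyed by URL, replaces real HTTP calls.
fake_pages = {
    "page0": {"data": ["r1", "r2"], "next": "page1"},
    "page1": {"data": ["r3"], "next": None},
}

collected = list(iter_results(fake_pages.__getitem__, "page0"))
print(collected)
```

A loop like this never needs total or pages, which is consistent with making those fields optional.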

jskladan commented 8 years ago

Awesome, good to know!

Metadata Update from @kparal:
- Issue tagged with: infrastructure

6 years ago

Metadata

Assignee: None
Blocking: None
Depending on: None
Priority: Normal
