#2482 rethink pagination on web
Closed: Fixed 2 years ago by tkopecek. Opened 3 years ago by tkopecek.

Every filter page (tasks, etc.) uses count(*) to display the total count and the full list of pages. This is pretty db-heavy in many cases. E.g. a simple task list with the "failed" state and "createrepo" method selected takes more than 7 seconds in my env.

1) we can hide these completely (show only a "next" page link if it exists)
2) we can use estimates from the query planner - fast, but often quite far from the real value (for the same example the estimate is 350k while the real result is 20k)
3) do the first page fast without count and just a "next" button, while the second and later pages can show what is there today (count + pages)
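Option 3 boils down to one trick: fetch limit + 1 rows and use the extra row only to decide whether a "next" link exists. A minimal sketch (hypothetical helper, not Koji's actual pagination code):

```python
def fetch_page(get_rows, offset, limit):
    """Fetch one page without issuing count(*).

    get_rows(offset, limit) stands in for the real DB query; this is a
    hypothetical helper, not Koji's actual code. Returns (rows, has_next):
    asking for limit + 1 rows tells us whether a next page exists, without
    ever computing the total count.
    """
    rows = get_rows(offset, limit + 1)
    has_next = len(rows) > limit
    return rows[:limit], has_next

# toy backend: 105 fake task ids
data = list(range(105))
backend = lambda offset, limit: data[offset:offset + limit]

rows, has_next = fetch_page(backend, 0, 50)      # full first page, next exists
rows2, has_next2 = fetch_page(backend, 100, 50)  # short last page, no next
```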

I think the tasks page is the only one with such a heavy problem.


Agreed that we should make the "lists" pages faster. I had not investigated far enough to discover that count(*) was the problem.

Metadata Update from @tkopecek:
- Custom field Size adjusted to None
- Issue set to the milestone: 1.25 (was: 1.24)

3 years ago

I had not investigated far enough to discover that count(*) was the problem

It might not be the only problem, but it is a problem.

2) we can use estimates from query planner

I'm fine with this, but we should make sure to indicate in the UI that it is an estimate. E.g. "Tasks 1 through 50 of about 15100"
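Rounding the displayed estimate would also avoid implying false precision. A minimal sketch; the two-significant-figure rounding rule is my assumption, not anything Koji does:

```python
import math

def page_header(start, stop, estimate):
    """Format 'Tasks X through Y of about Z' with a rounded estimate.

    Rounding to two significant figures is an illustrative choice;
    the planner's estimate is too rough to show exactly anyway.
    """
    if estimate <= 0:
        rounded = 0
    else:
        digits = int(math.floor(math.log10(estimate)))  # magnitude of the estimate
        step = 10 ** max(digits - 1, 0)                 # keep two significant figures
        rounded = int(round(estimate / step) * step)
    return "Tasks %d through %d of about %d" % (start, stop, rounded)

print(page_header(1, 50, 15123))  # -> Tasks 1 through 50 of about 15000
```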

3) Do the first page fast without count and just "next" button, while second page can show what is there today (count + pages)

This also seems fine to me and is certainly easier.

As I understand it, postgres has to do the whole query even without the count call, since it has to order the rows it returns, so I wonder how much of this is because we're calling count in a separate rpc call, and therefore a separate transaction.

I wonder if things would improve if we could optionally include the count in the single query. Perhaps with this approach:

https://stackoverflow.com/questions/3984643/equivalent-of-found-rows-function-in-postgresql

I think that the last one is not easy to achieve without slowing down other cases, since the planner has a harder job with it. E.g. a simple comparison (not exactly what the task page does):

explain  select count(*) over(), id from task where method = 'buildArch' order by id desc limit 100;
                                              QUERY PLAN                                               
-------------------------------------------------------------------------------------------------------
 Limit  (cost=0.44..180.97 rows=100 width=12)
   ->  WindowAgg  (cost=0.44..5038823.90 rows=2791103 width=12)
         ->  Index Scan Backward using task_pkey on task  (cost=0.44..5003935.12 rows=2791103 width=4)
               Filter: (method = 'buildArch'::text)
(4 rows)

vs

brew=> explain  select  id from task where method = 'buildArch' order by id desc limit 100;
                                           QUERY PLAN                                            
-------------------------------------------------------------------------------------------------
 Limit  (cost=0.44..179.72 rows=100 width=4)
   ->  Index Scan Backward using task_pkey on task  (cost=0.44..5003935.12 rows=2791103 width=4)
         Filter: (method = 'buildArch'::text)
(3 rows)

The window aggregation forces the whole index scan, while without it the query can quit after finding the first 100 results. Real times are 200ms vs. something over 14 minutes (cancelled).

Option 3 is super easy and saves load in the common cases. Maybe we should start there and see how far it gets us?

Unless we think people will complain?

A tangent to #1: if we get fewer than the limit number of results, perhaps we can assume that we've hit the last page and simply calculate the count ourselves?
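That short-page case needs no extra query at all: if the page came back with fewer rows than the limit, the exact total is just the offset plus what we got. A sketch with hypothetical names:

```python
def exact_count_if_last_page(offset, limit, rows):
    """Hypothetical helper: if this page is short, we have reached the
    end and can compute the exact total locally; otherwise the total
    stays unknown (None) and the UI falls back to 'next' only."""
    if len(rows) < limit:
        return offset + len(rows)
    return None

# page starting at offset 100 with limit 50 returned 20 rows -> total is exactly 120
print(exact_count_if_last_page(100, 50, list(range(20))))  # -> 120
# a full page tells us nothing about the total
print(exact_count_if_last_page(100, 50, list(range(50))))  # -> None
```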

I'm struggling with reliable estimation. The problem is that the basic technique parses EXPLAIN output for the original query, but that needs the complete query string, which we don't have without cursor.execute(). The best variant looks to be the one here: https://www.cybertec-postgresql.com/en/postgresql-count-made-fast/

-- Return the planner's row estimate for an arbitrary query by parsing
-- EXPLAIN (FORMAT JSON) output.
CREATE FUNCTION row_estimator(query text) RETURNS bigint
   LANGUAGE plpgsql AS
$$DECLARE
   plan jsonb;
BEGIN
   EXECUTE 'EXPLAIN (FORMAT JSON) ' || query INTO plan;

   RETURN (plan->0->'Plan'->>'Plan Rows')::bigint;
END;$$;

But it also needs an SQL string as input. It could be improved by creating a prepared statement and deleting it afterwards, but that looks to me like a lot of additional complexity in QueryProcessor. Something like this:

PREPARE foo AS SELECT * FROM task WHERE parent IS NULL;
SELECT row_estimator('EXECUTE foo');
DEALLOCATE foo;

Such a call could use normal argument interpolation without having to worry about escaping single quotes.
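The PREPARE / row_estimator / DEALLOCATE dance could be wrapped in a small helper. This is only a sketch of the idea: the names are hypothetical, the cursor is DB-API style (e.g. psycopg2), and it assumes the row_estimator() function above exists in the database. The fake cursor is included just so the flow can be shown without a live connection:

```python
def estimate_query_rows(cursor, query, params=None):
    """Hypothetical helper (not Koji code): wrap the query in a prepared
    statement so normal parameter interpolation still works, ask the
    row_estimator() function for the planner's estimate, and always
    deallocate afterwards."""
    cursor.execute("PREPARE _koji_estimate AS " + query, params)
    try:
        cursor.execute("SELECT row_estimator('EXECUTE _koji_estimate')")
        return cursor.fetchone()[0]
    finally:
        cursor.execute("DEALLOCATE _koji_estimate")

class FakeCursor:
    """Stand-in for a real DB-API cursor; records the statements issued
    and returns a canned estimate."""
    def __init__(self, estimate):
        self.estimate = estimate
        self.statements = []
    def execute(self, sql, params=None):
        self.statements.append(sql)
    def fetchone(self):
        return (self.estimate,)

cur = FakeCursor(2791103)
n = estimate_query_rows(cur, "SELECT id FROM task WHERE method = %s", ("buildArch",))
# n is the planner's estimate; PREPARE / SELECT / DEALLOCATE were issued in order
```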

For the basic variant (skipping count on the first page): #2827

Metadata Update from @tkopecek:
- Issue untagged with: discussion
- Issue tagged with: testing-ready

2 years ago

Metadata Update from @jobrauer:
- Issue tagged with: testing-done

2 years ago


Metadata
Related Pull Requests
  • #2827 Merged 2 years ago