#8320 Feature: pagination
Opened 3 years ago by schlitzered. Modified 3 years ago

From what I can see, all the "*_find" endpoints have no way to do pagination. Therefore you are stuck with, by default, a maximum of 2000 results.

Therefore, please implement pagination in a way that allows specifying the offset and the page size for all "*_find" endpoints.


LDAP pagination is surprisingly hard to implement.

The LDAP protocol and 389-DS have two ways to provide efficient pagination:

  1. Simple Paged Results Control, https://tools.ietf.org/html/rfc2696
  2. Virtual browsing index (VLV, virtual list view)

Simple paged results use browsing cookies. The cookies are connection specific and require a persistent connection. IPA's HTTP API uses a standard request/response scheme: on each request the framework opens a new connection and closes it at the end of the request. We can't easily use WebSockets because the WSGI protocol and mod_wsgi don't support WebSockets; IPA would have to move to uwsgi.
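For illustration, a minimal sketch of such a paged search with python-ldap's `SimplePagedResultsControl`; the server URL, bind credentials, base DN, and filter below are placeholders, not a real IPA deployment:

```python
import ldap
from ldap.controls import SimplePagedResultsControl

# Placeholder connection details, for illustration only.
conn = ldap.initialize("ldap://ipa.example.com")
conn.simple_bind_s("cn=Directory Manager", "password")

PAGE_SIZE = 100
ctrl = SimplePagedResultsControl(criticality=True, size=PAGE_SIZE, cookie="")

while True:
    msgid = conn.search_ext(
        "cn=users,cn=accounts,dc=example,dc=com",
        ldap.SCOPE_SUBTREE,
        "(objectClass=person)",
        serverctrls=[ctrl],
    )
    rtype, rdata, rmsgid, serverctrls = conn.result3(msgid)
    for dn, entry in rdata:
        print(dn)

    # The response control carries the cookie for the next page.
    pctrls = [
        c for c in serverctrls
        if c.controlType == SimplePagedResultsControl.controlType
    ]
    if not pctrls or not pctrls[0].cookie:
        break  # no cookie: last page reached
    # The cookie is only valid on this very connection, which is why a
    # stateless request/response API cannot resume paging later.
    ctrl.cookie = pctrls[0].cookie
```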

VLV indexes are expensive to maintain. The base DN, scope, and full filter string must be static. AFAIK VLV doesn't work unless all search parameters are fixed and predefined at index time.
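For comparison, a hedged sketch of a VLV query, assuming python-ldap 3.x (which ships `ldap.controls.vlv` and `ldap.controls.sss`); it requires a server-side sort control and a browsing index that 389-DS has built in advance for exactly this base, scope, filter, and sort order:

```python
import ldap
from ldap.controls.sss import SSSRequestControl
from ldap.controls.vlv import VLVRequestControl

conn = ldap.initialize("ldap://ipa.example.com")  # placeholder host
conn.simple_bind_s("cn=Directory Manager", "password")

# VLV only works together with server-side sorting, and only if a
# matching browsing index exists on the server.
sss = SSSRequestControl(criticality=True, ordering_rules=["cn"])
vlv = VLVRequestControl(
    criticality=True,
    before_count=0,
    after_count=99,   # entries offset..offset+99: one 100-entry "page"
    offset=201,       # 1-based offset of the first entry ("page 3")
    content_count=0,  # 0: client doesn't know the total count yet
)

msgid = conn.search_ext(
    "cn=users,cn=accounts,dc=example,dc=com",
    ldap.SCOPE_SUBTREE,
    "(objectClass=person)",
    serverctrls=[sss, vlv],
)
rtype, rdata, rmsgid, serverctrls = conn.result3(msgid)
```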

Why not simply use (1) and skip, in the Python server code, to what is requested?

If, for example, you request "page=3, limit=100", this could pretty easily be done in the server code. The client would then be required to adjust the page on its own until no more results are yielded.
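A hedged sketch of what I mean; `find_entries` here is a stand-in for whatever internal search helper the framework uses, not the real IPA API:

```python
def paged_find(find_entries, page=1, limit=100):
    """Naive offset pagination: fetch everything, return one slice.

    find_entries is a hypothetical callable returning the full,
    consistently sorted result list; page is 1-based.
    """
    if page < 1 or limit < 1:
        raise ValueError("page and limit must be positive")
    entries = find_entries()           # full fetch on every request
    offset = (page - 1) * limit
    return entries[offset:offset + limit]

# The client keeps bumping the page until a short or empty page comes
# back: page=1, page=2, ... until len(result) < limit.
```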

The only problem is that if an object is added or removed on, let's say, "page=2, limit=100", the result of "page=3, limit=100" will silently shift one item left or right.

But IMHO this is how all major request/response style API implementations out there work, and I feel that this is totally fine; it just has to be mentioned in the API docs how pagination is implemented.

If at some point a WebSocket-based API is implemented, then of course one can use the browsing cookie to implement a generator that yields all results one after the other.

But for request/response based APIs, I feel that returning whatever the result for "page=3, limit=100" happens to be at the time the request is issued is good enough.

Why not simply use (1)? Because it's not simple to implement this correctly and efficiently. For starters, your proposal scales as O(n * m) (n: items, m: pages): without a persistent connection the server has to fetch, build, and sort the full result list for every single page, so paging through 100,000 entries in pages of 100 means rebuilding that list 1,000 times. The approach also yields incorrect results when an item is added, removed, or changed between requests.

(In the future, please be careful when using words like "simple" or "only". These words have a negative and degrading connotation.)

I am sorry, it was not my intent to insult anyone. Please be aware that my native language is not English.

Yes, I am aware that the method described above will yield incorrect results in case something is added or deleted, but I think there is no way around this for stateless APIs like the JSON-RPC API implemented by Red Hat IdM.

And IMHO, this is how all stateless APIs work. For my use case I am totally fine with this.

The only downside is that the Python server implementation has to fetch all the data from the LDAP server and skip to what has been requested.

I feel that this is an improvement over what is currently implemented, where you won't have a chance to fetch results that exceed the configured server limit.

The proposal may solve your problem, but we have to solve the problem in a more generic way that applies to installations ranging from 10 entries to 100,000 entries and possibly 1,000,000 entries and more.

Therefore I'm strongly against a solution that is based on SPRC and stateless HTTP, because it does not scale and serves merely a narrow range of installations. Systems with a few hundred entries already work well. For large installations with > 10k entries, SPRC queries without a persistent connection don't scale well. For very large installations I presume that even SPRC with a persistent connection will run into performance issues.

Yes, I can understand that this might not be optimal in deployments with many objects.

Maybe as an interim solution, would it be possible to internally switch to pagination if sizelimit is set to more than 2000 results? The API could then internally use a paginated search to fetch all results and then deliver one huge result set.

The downside of this is that memory utilization on the API server and the client might be huge :-/
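A rough sketch of that interim idea; `paged_search` is a hypothetical generator that yields one list of entries per LDAP page (e.g. built on the Simple Paged Results loop shown earlier, over a single server-side connection):

```python
def find_all(paged_search, sizelimit, page_size=1000):
    """Hypothetical helper: page through LDAP internally and return
    one big list, capped at sizelimit."""
    results = []
    for page in paged_search(page_size=page_size):
        results.extend(page)  # the entire result set is held in memory
        if len(results) >= sizelimit:
            break             # stop paging once the cap is reached
    return results[:sizelimit]
```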

Alternatively, I could adjust my client code to talk directly to LDAP and do the paginated search on my own, but I have to admit that this is something I would like to avoid.


Metadata Update from @cheimes:
- Issue set to the milestone: Future Releases

3 years ago
