#214 properly fix pagination in Bugzilla
Opened a month ago by kparal. Modified a month ago

There was a sudden change in Bugzilla that limited anonymous queries to just 20 results. @tflink created a hotfix, which we applied in production, and on develop it is available as 0b612d2. However, the issue is more complicated.

Bugzilla now returns at most 20 results to anonymous users. If you have a token (stored in ~/.config/python-bugzilla/bugzillarc, created e.g. with bugzilla login) and request a higher limit, it returns more. However, if you become anonymous again (e.g. the token expires), you go back to getting at most 20 results, without any warning. That means the heuristic in that code, "if len(last_query) == bugzilla_query_limit, this is probably not the last page", is flawed. If we created a token to be able to query faster, increased the requested limit, and then the token expired, the very first response would contain fewer results than the requested limit, so we would treat it as the last page and never request a second one.
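
One possible direction, sketched here under assumptions rather than taken from our code base: stop paginating only when a page comes back empty, and advance the offset by the number of results actually received instead of by the requested limit. The names fetch_all, fetch_page and make_capped_server are hypothetical; fetch_page stands in for whatever wraps bz.query() with 'limit' and 'offset', and the fake server only simulates Bugzilla silently capping results at 20.

```python
from typing import Callable, Dict, List


def fetch_all(fetch_page: Callable[[int, int], List[Dict]],
              requested_limit: int) -> List[Dict]:
    """Collect every result, stopping only when a page comes back empty.

    This does not compare len(page) with the requested limit, so it keeps
    working if the server silently enforces a smaller cap than we asked for.
    """
    results: List[Dict] = []
    offset = 0
    while True:
        page = fetch_page(offset, requested_limit)
        if not page:
            break
        results.extend(page)
        # Advance by what we actually received, not by what we requested.
        offset += len(page)
    return results


def make_capped_server(total: int, server_cap: int) -> Callable[[int, int], List[Dict]]:
    """Hypothetical stand-in for Bugzilla: never returns more than server_cap rows."""
    data = [{"id": i} for i in range(total)]

    def fetch_page(offset: int, limit: int) -> List[Dict]:
        effective = min(limit, server_cap) if limit > 0 else server_cap
        return data[offset:offset + effective]

    return fetch_page


if __name__ == "__main__":
    # We ask for 100 per page, but the fake server silently caps at 20.
    bugs = fetch_all(make_capped_server(total=45, server_cap=20), requested_limit=100)
    print("number of results: {}".format(len(bugs)))
```

The trade-off is one extra request per query (the final empty page). Offset-based paging like this also still depends on the server returning results in a stable order.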

So we either need to come up with a better pagination implementation, or we need to agree that anonymous access with 20 results per call is enough for us and a better implementation is not worth the bother. In that case we should add a comment next to the BUGZILLA_QUERY_LIMIT = 20 line saying that this is not intended to be configurable, and why.

This is also related to #184; switching to the REST API might change things.


Issue tagged with: next

a month ago

A few more pointers:

  • https://bugzilla.redhat.com/docs/en/html/integrating/api/Bugzilla/WebService/Bug.html?highlight=offset#search (search for offset and limit)
  • https://github.com/python-bugzilla/python-bugzilla/issues/149
  • from internal bugzilla-list:

    Note: There is the chance that when using limit and offset that the data set can change while you are processing results. To avoid this you would not use offset and would instead set order=bug_id in your search and rerun your search with bug_id > $largest_bug_id_processed.

  • again from bugzilla-list:

    The parameters should just get mapped directly through to the database query:
    https://www.postgresql.org/docs/12/queries-limit.html
    "When using LIMIT, it is important to use an ORDER BY clause that
    constrains the result rows into a unique order. Otherwise you will get
    an unpredictable subset of the query's rows. "
    "using different LIMIT/OFFSET values to select different subsets of a
    query result will give inconsistent results unless you enforce a
    predictable result ordering with ORDER BY. This is not a bug; it is an
    inherent consequence of the fact that SQL does not promise to deliver
    the results of a query in any particular order unless ORDER BY is used
    to constrain the order."

  • and again bugzilla-list:

    "I thought we might offer an option where instead of getting bug details you only get bug IDs, but you get them all, and in the right order. You'd then call Bug.get to get batches of them. The data could still change between when you search and when you get the bugs, but it might be a preferable compromise for some use cases."
    https://bugzilla.redhat.com/show_bug.cgi?id=2005153
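
The first bugzilla-list suggestion above (order by bug_id and rerun the search with bug_id > $largest_bug_id_processed instead of using offset) could look roughly like the sketch below. This is only an illustration under assumptions: fetch_all_by_id, fetch_after and make_fake_search are hypothetical names, and fetch_after stands in for a real search that orders by bug_id and filters on bug_id greater than the given value. How that filter is spelled in an actual python-bugzilla query is not shown here and would need to be checked.

```python
from typing import Callable, Dict, Iterable, List


def fetch_all_by_id(fetch_after: Callable[[int], List[Dict]]) -> List[Dict]:
    """Page through results by bug ID instead of by offset.

    fetch_after(last_id) must return bugs with id > last_id, ordered by id.
    The loop does not need to know the server's page size at all.
    """
    results: List[Dict] = []
    last_id = 0
    while True:
        batch = fetch_after(last_id)
        if not batch:
            break
        results.extend(batch)
        # Remember the largest bug ID processed so far; the next search
        # starts strictly after it.
        last_id = max(bug["id"] for bug in batch)
    return results


def make_fake_search(bug_ids: Iterable[int], server_cap: int) -> Callable[[int], List[Dict]]:
    """Hypothetical stand-in for an ordered Bugzilla search capped at server_cap rows."""
    ordered = sorted(bug_ids)

    def fetch_after(last_id: int) -> List[Dict]:
        return [{"id": i} for i in ordered if i > last_id][:server_cap]

    return fetch_after


if __name__ == "__main__":
    search = make_fake_search(range(1000, 1045), server_cap=20)
    bugs = fetch_all_by_id(search)
    print("number of results: {}".format(len(bugs)))
```

Because each request is anchored to the last bug ID seen rather than to a row offset, rows appearing or disappearing between requests should not shift the window, and a silently reduced page size only means more round trips.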

A script for testing query limits by @tflink:

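import bugzilla

# Tracker bug whose blockers we query (F35 Beta freeze exception tracker)
F35_BETA_FE_TRACKER = 1891954

# tokenfile=None and cookiefile=None keep the session anonymous
bz = bugzilla.Bugzilla(url='https://bugzilla.redhat.com/xmlrpc.cgi', tokenfile=None, cookiefile=None)


def do_bugzilla_query(limit):
    """Query bugs blocking the tracker and print how many results came back."""
    query = {
        'query_format': 'advanced',
        # Custom search: "blocked" contains any of the words in v1
        'f1': 'blocked',
        'o1': 'anywords',
        'v1': str(F35_BETA_FE_TRACKER),
        'extra_fields': ['flags'],
        'limit': limit,
        'offset': 0,
    }

    buglist = bz.query(query)

    print("limit: {}".format(limit))
    print("number of results: {}".format(len(buglist)))


# Compare limit=0 (no explicit limit), a limit above the anonymous cap,
# and a limit equal to the cap
do_bugzilla_query(0)
do_bugzilla_query(100)
do_bugzilla_query(20)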


Metadata
Boards 1
Next tasks Status: Picked