https://lists.fedorahosted.org/archives/list/copr-devel@lists.fedorahosted.org/thread/AFLJGEAMGRRHQAL4DPR4R4KB7V2HY2MM/
My guess: Serializing all the build information takes too much time. I'd be happy to just get build IDs.
Metadata Update from @frostyx: - Issue assigned to frostyx
Actually, the issue is IMHO different. The API endpoint for returning all packages uses the SQLAlchemy ORM, while the monitor page uses a manually written SQL query. We once had similar performance issues on other pages as well, like
https://copr.fedorainfracloud.org/coprs/<user>/<project>/builds/
and
https://copr.fedorainfracloud.org/coprs/<user>/<project>/packages/
and fixed them, so this should be fixable too.
I started working on this issue. It is not fixed yet but I want to share the progress.
I originally thought that all the API queries are slower than their HTML counterparts. They are not, but ... there is a but.
On a project with less than 100 builds
client.build_proxy.get_list("frostyx", "tracer")
curl http://127.0.0.1:5000/coprs/frostyx/tracer/builds/
We could look into this, but both are under one second, so I am not sure anyone would even notice a difference. The results are similar when querying for packages: 0.1s vs 0.2s.
Now moving to the big projects, I used iucar/cran as an example.
client.build_proxy.get_list("iucar", "cran")
curl http://127.0.0.1:5000/coprs/iucar/cran/builds/
This looks like there is a huge performance issue in the API, but we need to consider the fact that the HTML output is streamed and outputs 1000 results at a time. I believe that curl ends after obtaining the first batch. So I would add these two measurements into consideration.
client.build_proxy.get_list("iucar", "cran", pagination={"limit": 1000})
I wasn't able to get consistent times, so here are some attempts
I am not sure what we can conclude from these results, but in general, the API doesn't seem slower to me.
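Since the times above were inconsistent, one way to compare the two paths is to repeat each call several times and look at the minimum and median rather than a single run. This is a minimal sketch using only the standard library; `time_call`, `summarize` and `fake_request` are hypothetical names, and `fake_request` is just a stand-in for the real API call or HTTP GET.

```python
import statistics
import time


def time_call(func, repeats=5):
    """Run func() `repeats` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return min, median and max so a single outlier does not dominate."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


def fake_request():
    # Hypothetical stand-in for an API call or an HTTP GET.
    time.sleep(0.01)


if __name__ == "__main__":
    print(summarize(time_call(fake_request)))
```

To measure the real thing, replace `fake_request` with a function that performs the request being compared.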
See how to work with API pagination here https://python-copr.readthedocs.io/en/latest/client_v3/pagination.html
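To illustrate why fetching only the first batch is not comparable to fetching everything, here is a generic limit/offset sketch. It does not use python-copr; `fetch_page` and `iter_all` are hypothetical helpers working on an in-memory list, and the batch size of 1000 simply mirrors the number mentioned above.

```python
def fetch_page(items, offset, limit):
    """Hypothetical server side: return one page of `items`."""
    return items[offset:offset + limit]


def iter_all(items, limit=1000):
    """Yield every item, requesting one page of `limit` items at a time."""
    offset = 0
    while True:
        page = fetch_page(items, offset, limit)
        if not page:
            break
        yield from page
        offset += limit


builds = list(range(2500))  # pretend build IDs
first_batch = fetch_page(builds, offset=0, limit=1000)
everything = list(iter_all(builds, limit=1000))
print(len(first_batch), len(everything))
```

A client that stops after the first page does only a fraction of the work of one that walks all pages, so the two timings should be compared with that in mind.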
The mail archives don't currently work because of the infrastructure migration but I guess the following mail is linked in this issue
API feedback: It's faster to grep HTML
It suggests that, for obtaining all packages with their last build, it is much faster to parse the HTML returned from the Monitor page rather than run
copr list-packages --with-latest-succeeded-build <copr>
I am not able to load the monitor page for these huge projects, so I am not sure if you switched to parsing the packages page ... but anyway, the CLI command is indeed excruciatingly slow and from my POV it is the only API performance issue that needs to be fixed. I know the issue title explicitly says that the problem is with just that one command, but I suspected it was a more general issue.
The reason why it is so slow is its naive implementation. It obtains the list of all packages and then goes and sends a new request for each of them to get their last build. We need to obtain everything in just one or two requests ...
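The difference between the two approaches can be sketched by counting requests. This is not the actual copr-cli or frontend code; `FakeServer` and all of its methods are hypothetical and only model "one request per package" versus "everything in one request".

```python
class FakeServer:
    """Hypothetical in-memory backend that counts incoming requests."""

    def __init__(self, latest_builds):
        self._latest = dict(latest_builds)  # package name -> latest build ID
        self.requests = 0

    def list_packages(self):
        self.requests += 1
        return sorted(self._latest)

    def latest_build(self, package):
        self.requests += 1
        return self._latest[package]

    def list_packages_with_latest_build(self):
        self.requests += 1
        return dict(self._latest)


def naive(server):
    # One request for the list, then one more per package.
    return {name: server.latest_build(name) for name in server.list_packages()}


def batched(server):
    # Everything in a single request.
    return server.list_packages_with_latest_build()


data = {f"pkg{i}": 1000 + i for i in range(500)}

slow = FakeServer(data)
fast = FakeServer(data)
assert naive(slow) == batched(fast)
print(slow.requests, fast.requests)  # 501 vs 1
```

With 500 packages the naive variant issues 501 requests, so the per-request overhead is what dominates for large projects.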
Metadata Update from @praiskup: - Issue priority set to: High
Modified in PR#1433
Commit 1144f6e fixes this issue
Follow-up #1462
It seems that --with-latest-build is still slow for large projects, but I haven't measured it so far...
Metadata Update from @praiskup: - Issue status updated to: Open (was: Closed)
Metadata Update from @praiskup: - Assignee reset
Metadata Update from @praiskup: - Issue assigned to frostyx
Re-reported here: https://bugzilla.redhat.com/show_bug.cgi?id=2036631 (the monitor CLI command seems to work around this)
Metadata Update from @praiskup: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)