#8648 Koji task info pages load very slowly
Opened 2 months ago by omos. Modified 7 days ago

When I open any task info URL (e.g. https://koji.fedoraproject.org/koji/taskinfo?taskID=41480802), it takes ~4-5 seconds to load. For the main page, Chromium's developer tools report:

Request sent: less than 1 ms
Waiting (TTFB): 4-5 seconds
Content Download: ~4-15 ms

The pages load equally slowly in Firefox. FWIW, I am accessing them from the Red Hat internal network (BRQ).

See also: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/PLYNNXSIFCW2X4NGFCG2HQHY3FTCKXUV/
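
For what it's worth, the waiting/download split can also be reproduced outside the browser. Below is a minimal sketch using Python's requests library against the URL above; the split is only approximate, since requests does not expose the browser's exact timing phases.

#!/usr/bin/python3
# Sketch: rough "waiting (TTFB)" vs. "content download" split for the
# taskinfo page. The TTFB number here also includes connection setup.
import time
import requests

url = 'https://koji.fedoraproject.org/koji/taskinfo?taskID=41480802'

start = time.monotonic()
# stream=True makes requests return as soon as the response headers arrive...
resp = requests.get(url, stream=True)
ttfb = time.monotonic() - start

# ...and the body is only downloaded when we actually read resp.content.
body = resp.content
total = time.monotonic() - start

print("waiting (approx. TTFB): {:.3f}s".format(ttfb))
print("content download:       {:.3f}s".format(total - ttfb))
print("body size:              {} bytes".format(len(body)))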


Thanks for the geolocation. Could you add a traceroute, and in Firefox use Inspect and the Network and Performance tabs to see if any particular requests are slow?

In Firefox, right-click, choose Inspect, switch to the Network tab, and do a ^R on the koji.fedoraproject.org page. Please let us know if any of the requests are noticeably larger or slower than the others.

I am seeing similar response times.

x-fedora headers on response:

x-fedora-appserver: koji01.phx2.fedoraproject.org
x-fedora-proxyserver: proxy01.phx2.fedoraproject.org
x-fedora-requestid: XkadSxoxOPIC@62yMDIPrwAAAAE

response times and sizes:

koji_speed.png

Traceroute:

$ traceroute koji.fedoraproject.org
traceroute to koji.fedoraproject.org (209.132.181.16), 30 hops max, 60 byte packets
 1  _gateway (192.168.1.1)  1.626 ms  1.592 ms  2.078 ms
 2  10.187.21.105 (10.187.21.105)  1.595 ms * *
 3  149.red-213-0-187.static.ccgg.telefonica.net (213.0.187.149)  13.394 ms 121.red-80-58-75.staticip.rima-tde.net (80.58.75.121)  15.002 ms *
 4  * * *
 5  21.red-80-58-84.staticip.rima-tde.net (80.58.84.21)  15.287 ms * *
 6  105.red-80-58-97.staticip.rima-tde.net (80.58.97.105)  15.398 ms * *
 7  * * *
 8  * 33.red-81-45-103.staticip.rima-tde.net (81.45.103.33)  18.083 ms *
 9  * * *
10  * * *
11  * * *
12  ae8-400-grtbcnes1.net.telefonicaglobalsolutions.com (213.140.50.218)  23.633 ms * *
13  * 213.140.33.55 (213.140.33.55)  56.528 ms *
14  * be12956.rcr21.b023101-0.lon01.atlas.cogentco.com (130.117.15.157)  52.759 ms *
15  be12499.ccr21.lon01.atlas.cogentco.com (154.54.60.241)  58.099 ms be12500.ccr22.lon01.atlas.cogentco.com (154.54.61.5)  54.916 ms *
16  be2868.ccr41.lon13.atlas.cogentco.com (154.54.57.153)  122.420 ms be2869.ccr42.lon13.atlas.cogentco.com (154.54.57.161)  120.223 ms be2870.ccr41.lon13.atlas.cogentco.com (154.54.58.173)  55.751 ms
17  be2490.ccr42.jfk02.atlas.cogentco.com (154.54.42.85)  125.081 ms be2317.ccr41.jfk02.atlas.cogentco.com (154.54.30.185)  124.040 ms be12497.ccr41.par01.atlas.cogentco.com (154.54.56.130)  132.625 ms
18  * be3627.ccr41.jfk02.atlas.cogentco.com (66.28.4.197)  129.188 ms *
19  be2806.ccr41.dca01.atlas.cogentco.com (154.54.40.106)  132.859 ms be2807.ccr42.dca01.atlas.cogentco.com (154.54.40.110)  122.888 ms be2113.ccr42.atl01.atlas.cogentco.com (154.54.24.222)  137.828 ms
20  be2113.ccr42.atl01.atlas.cogentco.com (154.54.24.222)  136.427 ms be2690.ccr42.iah01.atlas.cogentco.com (154.54.28.130)  157.543 ms be2112.ccr41.atl01.atlas.cogentco.com (154.54.7.158)  147.101 ms
21  * be2687.ccr41.iah01.atlas.cogentco.com (154.54.28.70)  161.296 ms be2690.ccr42.iah01.atlas.cogentco.com (154.54.28.130)  151.906 ms
22  * be2928.ccr21.elp01.atlas.cogentco.com (154.54.30.162)  163.266 ms be2927.ccr21.elp01.atlas.cogentco.com (154.54.29.222)  174.163 ms
23  be3437.agr22.phx01.atlas.cogentco.com (154.54.85.78)  179.379 ms be2929.ccr31.phx01.atlas.cogentco.com (154.54.42.65)  185.112 ms be3436.agr21.phx01.atlas.cogentco.com (154.54.85.74)  171.622 ms
24  * te0-0-2-3.nr11.b019174-0.phx01.atlas.cogentco.com (154.24.18.26)  181.078 ms *
25  te0-0-2-0.nr11.b019174-0.phx01.atlas.cogentco.com (154.24.18.22)  174.968 ms 38.122.88.218 (38.122.88.218)  182.786 ms *
26  38.88.238.30 (38.88.238.30)  178.257 ms * transit-199-181-132-209.redhat.com (209.132.181.199)  175.084 ms
27  * * *
28  * * *
29  * * *
30  * * *

Geolocation: around 43.315378, -1.991465

In my case, the download speed is quite sane; the problem is the waiting time for the dynamic .html.

Sorry, when I said geolocation I meant something like Czech Republic, Central Europe, etc.

My first thought: for the last 4 years we have started seeing reports like this from Europe in January, lasting about 3-4 months (it usually goes away by itself in May). Transactions going to the US all show the giant pause visible in the cut-off image above: the first page takes ~4 seconds, every page after that is faster, and then the next initial request does the same thing again.

However, I am not saying that is the conclusion here. I need to get data on what the load on koji is at these slow points.

Okido, I wrote this little script to test your theory:

#!/usr/bin/python

import requests
from datetime import datetime

s = requests.Session()
for url in ['https://koji.fedoraproject.org/koji-static/koji.css', 'https://koji.fedoraproject.org/koji/taskinfo?taskID=41480802']:
    print("url: {}\nstarted: {}".format(url, datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')))
    s.get(url)
    print("ended: {}".format(datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')))

It swaps the query order, requesting the .css first and then the .html, while keeping the connection alive between the two requests.

Results:

$ python kojitest.py 
url: https://koji.fedoraproject.org/koji-static/koji.css
started: 2020-02-14 15:00:21.213242
ended: 2020-02-14 15:00:21.876458
url: https://koji.fedoraproject.org/koji/taskinfo?taskID=41480802
started: 2020-02-14 15:00:21.876510
ended: 2020-02-14 15:00:25.796787

The problem is fetching the .html, not the first connection setup.

Tested from a datacenter in Germany:

$ python kojitest.py 
url: https://koji.fedoraproject.org/koji-static/koji.css
started: 2020-02-14 15:14:44.295798
ended: 2020-02-14 15:14:44.993797
url: https://koji.fedoraproject.org/koji/taskinfo?taskID=41480802
started: 2020-02-14 15:14:44.993836
ended: 2020-02-14 15:14:49.049439

From a datacenter in France:

$ python kojitest.py 
url: https://koji.fedoraproject.org/koji-static/koji.css
started: 2020-02-14 14:14:38.545488
ended: 2020-02-14 14:14:39.719645
url: https://koji.fedoraproject.org/koji/taskinfo?taskID=41480802
started: 2020-02-14 14:14:39.719717
ended: 2020-02-14 14:14:44.029743

UK:

$ python kojitest.py
url: https://koji.fedoraproject.org/koji-static/koji.css
started: 2020-02-14 15:15:27.464990
ended: 2020-02-14 15:15:28.310617
url: https://koji.fedoraproject.org/koji/taskinfo?taskID=41480802
started: 2020-02-14 15:15:28.310752
ended: 2020-02-14 15:15:31.995204

The taskinfo .html always needs at least 3.5 seconds from all of those European locations.

Traceroute from my machine:

$ traceroute koji.fedoraproject.org
traceroute to koji.fedoraproject.org (209.132.181.15), 30 hops max, 60 byte packets
 1  _gateway (10.43.12.254)  1.601 ms  1.447 ms  1.353 ms
 2  10.43.255.50 (10.43.255.50)  22.067 ms  22.047 ms  21.979 ms
 3  10.43.255.160 (10.43.255.160)  17.238 ms  17.151 ms  17.067 ms
 4  10.40.255.138 (10.40.255.138)  4.900 ms  4.851 ms  4.764 ms
 5  10.40.255.3 (10.40.255.3)  4.693 ms  4.619 ms  4.504 ms
 6  213.175.37.9 (213.175.37.9)  1.018 ms  1.135 ms  1.026 ms
 7  cz-brn-bbr7-et-5-2-0.dialtelecom.cz (82.119.245.97)  20.704 ms  20.500 ms  20.423 ms
 8  cz-prg-p1sit-hunge0-1-0-1.dialtelecom.cz (82.119.246.105)  4.705 ms  5.153 ms  5.081 ms
 9  82.119.246.14 (82.119.246.14)  4.718 ms  4.654 ms  4.843 ms
10  cz-prg-asbr2-et-0-0-0.dialtelecom.cz (82.119.246.238)  4.715 ms  4.905 ms  4.656 ms
11  xe-4-0-0.prg11.ip4.tinet.net (213.200.74.93)  4.810 ms  4.916 ms  4.860 ms
12  be3257.ccr21.prg01.atlas.cogentco.com (130.117.14.49)  5.175 ms  5.361 ms  5.188 ms
13  be3027.ccr41.ham01.atlas.cogentco.com (130.117.1.205)  15.757 ms be3029.ccr42.ham01.atlas.cogentco.com (154.54.59.61)  16.088 ms be3027.ccr41.ham01.atlas.cogentco.com (130.117.1.205)  16.130 ms
14  be2815.ccr41.ams03.atlas.cogentco.com (154.54.38.205)  23.314 ms be2816.ccr42.ams03.atlas.cogentco.com (154.54.38.209)  23.431 ms be2815.ccr41.ams03.atlas.cogentco.com (154.54.38.205)  23.576 ms
15  be12488.ccr42.lon13.atlas.cogentco.com (130.117.51.41)  111.484 ms be12194.ccr41.lon13.atlas.cogentco.com (154.54.56.93)  115.299 ms  115.445 ms
16  be12489.ccr42.par01.atlas.cogentco.com (154.54.57.70)  113.035 ms  111.570 ms  112.904 ms
17  be3627.ccr41.jfk02.atlas.cogentco.com (66.28.4.197)  112.516 ms be3628.ccr42.jfk02.atlas.cogentco.com (154.54.27.169)  113.148 ms  113.703 ms
18  be2332.ccr41.iad02.atlas.cogentco.com (154.54.85.245)  115.866 ms  116.024 ms be2806.ccr41.dca01.atlas.cogentco.com (154.54.40.106)  110.745 ms
19  be3083.ccr41.dca01.atlas.cogentco.com (154.54.30.53)  115.409 ms be2113.ccr42.atl01.atlas.cogentco.com (154.54.24.222)  122.229 ms be3083.ccr41.dca01.atlas.cogentco.com (154.54.30.53)  115.606 ms
20  be2687.ccr41.iah01.atlas.cogentco.com (154.54.28.70)  138.354 ms be2690.ccr42.iah01.atlas.cogentco.com (154.54.28.130)  139.671 ms be2112.ccr41.atl01.atlas.cogentco.com (154.54.7.158)  128.523 ms
21  be2690.ccr42.iah01.atlas.cogentco.com (154.54.28.130)  143.051 ms  142.946 ms be2687.ccr41.iah01.atlas.cogentco.com (154.54.28.70)  141.461 ms
22  be2930.ccr32.phx01.atlas.cogentco.com (154.54.42.77)  168.846 ms  169.276 ms be2927.ccr21.elp01.atlas.cogentco.com (154.54.29.222)  162.697 ms
23  be2930.ccr32.phx01.atlas.cogentco.com (154.54.42.77)  172.224 ms be3439.agr22.phx01.atlas.cogentco.com (154.54.85.94)  168.985 ms be3438.agr21.phx01.atlas.cogentco.com (154.54.85.90)  170.037 ms
24  te0-0-0-0.nr12.b019174-0.phx01.atlas.cogentco.com (154.24.53.150)  171.258 ms be3438.agr21.phx01.atlas.cogentco.com (154.54.85.90)  172.812 ms be3439.agr22.phx01.atlas.cogentco.com (154.54.85.94)  173.537 ms
25  te0-0-1-0.nr12.b019174-0.phx01.atlas.cogentco.com (154.24.53.154)  174.213 ms 38.88.238.30 (38.88.238.30)  169.216 ms  169.230 ms
26  transit-199-181-132-209.redhat.com (209.132.181.199)  169.540 ms  169.666 ms  169.622 ms
27  * * *
28  * * *
29  * * *
30  * * *

The timing stats are similar to those in @jlanda's comment (except that the additional resources all load in ~175 ms, which is almost the same as the ping delay, i.e. 170 ms). Also note that the index page (https://koji.fedoraproject.org/koji/index) is not similarly affected - it loads in under a second for me.

Location: Ann Arbor, MI, USA

$ traceroute koji.fedoraproject.org
traceroute to koji.fedoraproject.org (209.132.181.16), 30 hops max, 60 byte packets
 1  35.3.0.2 (35.3.0.2)  2.289 ms  6.594 ms  6.524 ms
 2  o-fxb-ipsvrf-loop.r-fxb.umnet.umich.edu (10.224.254.2)  7.861 ms  7.793 ms  7.713 ms
 3  l3-fxb-lsa.r-lsa.umnet.umich.edu (141.211.0.150)  6.817 ms  6.799 ms  6.965 ms
 4  o-seb-ipsvrf-loop.r-seb.umnet.umich.edu (10.224.254.1)  7.780 ms  7.709 ms  7.654 ms
 5  l3-binseb-seb-ipsvrf.r-bin-seb.umnet.umich.edu (192.12.80.138)  6.410 ms  6.411 ms  6.267 ms
 6  anar-cor-cath.mich.net (192.12.80.37)  6.273 ms  1.877 ms  2.067 ms
 7  irbx69.chcg-lvl3-600w.mich.net (198.108.22.101)  7.780 ms  7.746 ms  7.690 ms
 8  et-11-0-5.2043.rtsw.chic.net.internet2.edu (64.57.21.217)  12.908 ms  12.879 ms  12.845 ms
 9  lo-0.8.rtsw2.ashb.net.internet2.edu (64.57.29.131)  27.902 ms  27.873 ms  27.819 ms
10  equinix-ix.iad1.us.voxel.net (206.126.236.163)  20.887 ms  20.861 ms  20.765 ms
11  bbr2.ae1.wdc002.pnap.net (64.95.159.30)  20.619 ms  20.595 ms  20.517 ms
12  bbr2.xe-1-0-1.inapbb-wdc-dal-7.dal006.pnap.net (64.95.158.209)  50.868 ms  50.940 ms  50.830 ms
13  bbr1.ae7.dal006.pnap.net (64.95.158.201)  50.716 ms  50.342 ms  50.712 ms
14  bbr1.phx010.pnap.net (64.95.158.169)  75.501 ms  75.141 ms  75.084 ms
15  64.95.160.1 (64.95.160.1)  74.472 ms  74.444 ms  74.387 ms
16  64.95.158.70 (64.95.158.70)  75.563 ms  75.846 ms  76.460 ms
17  border5.ae2-bbnet2.phx010.pnap.net (69.25.168.65)  70.232 ms border5.ae1-bbnet1.phx010.pnap.net (69.25.168.1)  69.953 ms  70.333 ms
18  redhatphx-1.phx004.pnap.net (69.25.120.242)  81.350 ms  81.293 ms redhat-2.edge1.phx004.pnap.net (69.25.121.26)  81.173 ms
19  transit-199-181-132-209.redhat.com (209.132.181.199)  81.174 ms  81.121 ms  81.762 ms
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *
$ ./kojitest.py
url: https://koji.fedoraproject.org/koji-static/koji.css
started: 2020-02-14 10:33:37.501393
ended: 2020-02-14 10:33:37.839625
url: https://koji.fedoraproject.org/koji/taskinfo?taskID=41480802
started: 2020-02-14 10:33:37.839735
ended: 2020-02-14 10:33:42.327157

Screenshot_from_2020-02-14_10-36-23.png

Thanks for the specific link. I think for this problem we have been looking at different pages: when people said koji was slow, we would test a page that is 'fast', while what people actually meant were the slower pages that need a lot of database queries to fill out. I see a 3-second delay in the US as well.

So, for folks seeing this:

  • Are you seeing it always, or is it much worse than normal at particular times?

  • 4-5 seconds load time for a task link might just be what we have now (the task table is like 92GB). At least until we are able to partition tables.

  • Has it always been this speed or when was it last faster?

  • Finally, perhaps @mikem or @tkopecek have some ideas on what we can check or if they are seeing it too?

Metadata Update from @kevin:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: koji

2 months ago

Are you seeing it always, or is it much worse than normal at particular times?

It has been happening consistently for at least the past week or two.

4-5 seconds load time for a task link might just be what we have now (the task table is like 92GB). At least until we are able to partition tables.

Has it always been this speed or when was it last faster?

This I'm not sure about. I think it wasn't always this slow, but I can't say for sure. I started to use Koji more intensively during the past few weeks, so maybe I just became more aware of the usual delay.

For the last 3-4 days Koji has worked fine for me, but before that I waited 1-3 minutes for every action.

Firefox-report_20200216084036.png

Yes, taskinfo is slow. The underlying listBuildroots SQL query is not optimal (3 seconds in this case) and should be reworked in 1.21. Anyway, I'm having some trouble making it both correct and fast at the same time right now :-)

On the Postgres side you should be able to see slow queries which are not slow per se, but use a large amount of memory/disk space due to inefficient joins.
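
For example, if the pg_stat_statements extension is enabled on the db server (an assumption; the column names below are for Postgres 12 and older, newer versions use total_exec_time/mean_exec_time instead), a sketch like this could surface the heavy queries and their temp-file usage:

#!/usr/bin/python3
# Sketch: list the most expensive statements seen by pg_stat_statements.
# Assumes psycopg2 is installed and that the DSN below is adjusted to the
# actual koji database (placeholder values here).
import psycopg2

conn = psycopg2.connect("dbname=koji user=koji")  # placeholder DSN
with conn.cursor() as cur:
    cur.execute("""
        SELECT calls,
               round(mean_time::numeric, 1)  AS mean_ms,
               round(total_time::numeric, 1) AS total_ms,
               temp_blks_written,  -- blocks spilled to disk by big joins/sorts
               left(query, 80)     AS query
        FROM pg_stat_statements
        ORDER BY total_time DESC
        LIMIT 10
    """)
    for row in cur.fetchall():
        print(row)
conn.close()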

For debugging, I would also recommend comparing the speed of the koji taskinfo CLI command vs. the web page. It is almost the same code, without the browser interfering.

https://pagure.io/koji/issue/1562
https://pagure.io/koji/pull-request/1110

It's crazy slow for me now (Sun 01 Mar 2020 07:39:55 PM UTC)

# time koji taskinfo 42071731
real    2m5.247s
user    0m0.276s
sys     0m0.085s

Hey, everyone who reported this:

  • Are you still seeing it?
    ** If yes, please say when (UTC, i.e. 'date -u') you saw it and on what URL(s).

Yes, I am still seeing this:

$ time koji taskinfo 42071731
real    0m5.177s
user    0m0.352s
sys     0m0.047s
$ ./kojitest.py 
url: https://koji.fedoraproject.org/koji-static/koji.css
started: 2020-03-06 14:22:57.701681
ended: 2020-03-06 14:22:57.962608
url: https://koji.fedoraproject.org/koji/taskinfo?taskID=41480802
started: 2020-03-06 14:22:57.962686
ended: 2020-03-06 14:23:02.124932
$ date -u
Fri 06 Mar 2020 07:23:53 PM UTC

@mkrupcale was it ever faster in the past? I mean, 5 seconds isn't great, but the task table is 52GB after all.

Are other actions slow for you, or just taskinfo?

@tkopecek any ideas here? Is this a query we could optimize?

The load on the db server is pretty normal, nothing stands out that I can see...

@kevin I may not have been paying close attention to it, and maybe it has "always" taken this long for me. Indeed it may just be the fact that the DB is large and takes time to query. Other actions such as loading tag, host, or user information can take 2-4 seconds, while things like package or build information seem to take closer to 1 second or less.

taskinfo is not slow by itself; the problem is the additional info (about 6 expensive queries) and especially buildroots. While the whole command takes roughly 6 seconds for me, the basic getTaskInfo call is about 180 ms. Compare with the speed of:

import koji
s = koji.ClientSession('https://koji.fedoraproject.org/kojihub')
s.getTaskInfo(42071731)
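
And, to put numbers on it, a timed version of the same comparison (a sketch: it uses an anonymous session, reuses the task ID from this ticket, and assumes listBuildroots accepts a taskID filter, which is the call identified above as the expensive part):

#!/usr/bin/python3
# Sketch: time the individual hub calls behind the taskinfo page.
import time
import koji

s = koji.ClientSession('https://koji.fedoraproject.org/kojihub')

for name, call in [('getTaskInfo', lambda: s.getTaskInfo(42071731)),
                   ('listBuildroots', lambda: s.listBuildroots(taskID=42071731))]:
    start = time.monotonic()
    call()
    print("{}: {:.3f}s".format(name, time.monotonic() - start))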

listBuildroots takes ~1.3 s; it should be better with 1.21 (https://pagure.io/koji/pull-request/1110), but it needs more testing before we merge it.

Another piece of slowness (~1 s) is authentication. The whole command doesn't look like it uses any authenticated calls, so we can drop it (created https://pagure.io/koji/issue/2070).
