= phenomenon = Any access to Trac instances on fedorahosted.org is very slow.
Some examples (anonymous user):
$ time wget https://fedorahosted.org/freeipa/ticket/3274 -O /dev/null
real 0m12.571s
real 0m18.466s
real 0m11.192s
real 0m13.509s
$ time wget https://fedorahosted.org/freeipa/ticket/3254 -O /dev/null
real 0m3.349s (Wow! I was lucky!)
real 0m7.730s
real 0m6.625s
real 0m7.546s
= How can we help? = Where is the bottleneck? Where should we push? Who is the responsible manager (if any)?
Hmm, my bad. Here is a table with better formatting:
{{{
Some examples (anonymous user):

$ time wget https://fedorahosted.org/freeipa/ticket/3274 -O /dev/null
real 0m12.571s
real 0m18.466s
real 0m11.192s
real 0m13.509s

$ time wget https://fedorahosted.org/freeipa/ticket/3254 -O /dev/null
real 0m3.349s (Wow! I was lucky!)
real 0m7.730s
real 0m6.625s
real 0m7.546s
}}}
Replying to [ticket:3582 pspacek]:
Do you see this with only the freeipa trac? Or others?
When did you notice the slowdown?
I don't know yet; we will investigate. The above info could help us narrow it down.
I don't think complaining to management before you give us time to investigate would be very useful.
Replying to [comment:2 kevin]:
> Do you see this with only the freeipa trac? Or others?
I see similar slowdown with the SSSD Trac as well.
So, can anyone note when they saw the slowdown? In the last few days? Weeks? Months?
Is it only downloading tickets? Or all operations?
At least for me, the fh.o Trac instances have always been kind of slow, but the performance seems to be getting worse over time (maybe proportionally to the data fh.o hosts?).
The slowest operation seems to be anything that queries the ticket database. Displaying the open tickets for the SSSD trac (https://fedorahosted.org/sssd/report/3) takes about 10 seconds on average.
ok. I have run a sqlite vacuum on both projects' DBs... this does seem to improve things from what I can see, but I'm not sure it's acceptable for you yet.
Can you test now?
It might be we are hitting some kind of sqlite issue and should consider moving over to a real database backend for projects.
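For reference, the vacuum run described above can be sketched roughly as follows. This is a hypothetical sequence, not the exact commands used on fh.o; the database path is illustrative, and the instance should be taken offline first since VACUUM rewrites the whole file under an exclusive lock.

```shell
# Sketch: compact a Trac sqlite database in place (path is hypothetical).
DB=./trac.db

if [ -f "$DB" ]; then cp -p "$DB" "$DB.bak"; fi   # cheap insurance before rewriting
sqlite3 "$DB" 'PRAGMA integrity_check;'           # prints "ok" on a healthy db
sqlite3 "$DB" 'VACUUM;'                           # rebuild the file, reclaiming free pages
sqlite3 "$DB" 'ANALYZE;'                          # refresh query-planner statistics
```

VACUUM helps here because a long-lived Trac database accumulates free pages and fragmentation from years of ticket updates; rewriting the file packs it densely again.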
Thank you Kevin. This is rather unscientific but it does seem faster now, I'd say maybe 2x?
As for moving off sqlite: we have an awful lot of data in this trac instance and we modify it frequently. I don't know what the working limits of sqlite are, but I always thought it was meant for smaller data sets. It wouldn't surprise me if using a more capable db were an improvement (postgresql?).
I appreciate the interest here and additional cc's, but they don't help any without data. ;)
Are things fast enough for now?
Still taking a long time to pull up some reports: around 30 seconds for report 3.
Re-running a query speeds things up a bit, I imagine that caching plays a role.
Fetching individual tickets took 7-10 seconds each in my tests. Slow but tolerable I suppose.
FreeIPA Trac is now faster, but it still isn't comfortable to use (from my personal point of view).
{{{
$ time wget https://fedorahosted.org/freeipa/ticket/3274 -O /dev/null
real 0m7.342s
real 0m10.690s
real 0m6.263s
real 0m7.965s

$ time wget https://fedorahosted.org/freeipa/report/3 -O /dev/null
real 0m16.710s
real 0m15.166s
real 0m20.910s
real 0m22.004s
}}}
The bind-dyndb-ldap Trac has 32x fewer tickets (100 vs. 3200 in the FreeIPA Trac) and it still takes some time to load a particular bug.
{{{
$ time wget https://fedorahosted.org/bind-dyndb-ldap/ticket/12 -O /dev/null
real 0m3.456s
real 0m4.117s
real 0m2.875s
real 0m2.978s

$ time wget https://fedorahosted.org/bind-dyndb-ldap/report/3 -O /dev/null
real 0m7.992s
real 0m4.511s
real 0m3.846s
real 0m7.356s
real 0m4.008s
}}}
Results for an "empty form page" (logged in as pspacek):
{{{
$ time wget https://fedorahosted.org/bind-dyndb-ldap/newticket -O /dev/null --load-cookies=/tmp/cookies.txt
real 0m3.413s
real 0m3.733s
real 0m2.724s
real 0m3.066s
}}}
From the results above, there is approximately 3 seconds of overhead added somewhere to each request/reply.
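One way to see where such per-request overhead sits is curl's `-w` write-out variables, which split a transfer into DNS, connect, TLS, and time-to-first-byte phases. A sketch (the URL is just the ticket used above; a high first-byte time with low connect/TLS times would point at the server side rather than the network):

```shell
# Break one request into phases to localize the ~3 s overhead.
URL="${1:-https://fedorahosted.org/freeipa/ticket/3274}"

curl -s -o /dev/null --max-time 60 "$URL" \
  -w 'DNS: %{time_namelookup}s  connect: %{time_connect}s  TLS: %{time_appconnect}s  first byte: %{time_starttransfer}s  total: %{time_total}s\n' \
  || echo "request to $URL failed"
```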
Here is a traceroute from my workstation to fedorahosted:
{{{
traceroute to 66.135.62.187 (66.135.62.187), 30 hops max, 60 byte packets
 1  *
 2  *
 3  *
 4  *
 5  *
 6  *
 7  *
 8  ge-0-0-2.vnlams1-core-1.virtela.net (77.247.0.0)  38.585 ms  39.275 ms  38.599 ms
 9  asd2-rou-1043.NL.eurorings.net (134.222.152.118)  47.139 ms  38.793 ms  39.374 ms
10  rt2-rou-1022.NL.eurorings.net (134.222.199.106)  43.314 ms  41.072 ms  43.627 ms
11  ldn-s2-rou-1101.UK.eurorings.net (134.222.232.75)  48.504 ms  48.497 ms  47.669 ms
12  ldn-s2-rou-1041.UK.eurorings.net (134.222.228.94)  47.117 ms  92.293 ms  50.010 ms
13  10ge.xe-0-0-0.linxb.ldn-teleh-dis-1.peer1.net (195.66.224.156)  47.931 ms  46.208 ms  47.188 ms
14  10ge.xe-1-1-0.nyc-telx-dis-1.peer1.net (216.187.115.117)  120.018 ms  209.518 ms  119.681 ms
15  10ge.xe-0-0-0.nyc-telx-dis-2.peer1.net (216.187.115.182)  120.327 ms  119.466 ms  119.901 ms
16  10ge.xe-1-2-0.chi-eqx-dis-1.peer1.net (216.187.114.161)  207.870 ms  140.502 ms  140.549 ms
17  10ge-ten1-1-0.dal-eqx-xe2-1.peer1.net (216.187.124.73)  169.704 ms  171.003 ms  169.169 ms
18  10ge-ten1-2.sat-8500v-cor-1.peer1.net (216.187.124.78)  245.670 ms  179.149 ms  178.620 ms
19  10ge.xe-0-0-1.sat-8500v-sbdis-1.peer1.net (216.187.124.66)  182.647 ms  182.640 ms  177.015 ms
20  hosted02.fedoraproject.org (66.135.62.187)  175.639 ms  176.038 ms  176.628 ms
21  hosted02.fedoraproject.org (66.135.62.187)  175.733 ms !X  177.950 ms !X  176.883 ms !X
}}}
180 ms is a lot (I'm based in Europe), but it should not add more than 1 second anyway I guess.
Please let me know if you need further information.
So, some things to note:
There are 2 nodes (both active) that you get in a DNS round robin.
There's caching involved. The first access will be slower than subsequent ones, and the cache is local to the node. So, if you hit hosted01, then hosted02, it will now be cached on both, but those first two accesses may be slower. In any case, the first time someone hits something in a while, it will not be cached.
I'm not sure what else to try to improve performance off hand. We could look at moving to a postgresql backend, but then we are going back to a single point of failure, which I dislike.
I have tweaked a few other things... can you retry your tests, with multiple attempts to show caching?
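Given the round robin described above, per-node cold/warm timings can be separated by pinning each backend in turn. A sketch using `curl --resolve` (which bypasses DNS without breaking TLS/SNI); the IPs are the two hosted nodes mentioned in this ticket:

```shell
# Hit each backend node directly so cold-cache vs warm-cache timings
# can be compared per node, instead of bouncing between them via DNS.
for ip in 66.135.62.187 66.135.62.201; do
    echo "== node $ip =="
    for i in 1 2 3; do
        curl -s -o /dev/null --connect-timeout 5 --max-time 60 \
             --resolve "fedorahosted.org:443:$ip" \
             -w "  try $i: %{time_total}s\n" \
             "https://fedorahosted.org/freeipa/report/3" \
            || echo "  try $i: failed"
    done
done
```

The second and third tries against the same node should show the caching effect, if any, without the round robin muddying the numbers.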
I tried running report/3 requests against the FreeIPA instance and I don't really see substantial improvements or caching results across the nodes. There is a clear difference in responses from the different nodes. In particular, the .187 node seems faster than the .201 node: all .187 responses are below 15s, varying between 10s and 14s, while .201 responds more slowly and shows less of a caching effect -- note the 28.596s spike while the previous request to the same node took 16.025s.
At the same time, if anything was cached, even a 10s response time is clearly unacceptable for the cached case. I doubt there were modifications to the tickets at the time, since all my requests were made sequentially over a few minutes, when the US had already gone to sleep and Europe hadn't seriously started working yet.
{{{
$ time wget https://fedorahosted.org/freeipa/report/3 -O /dev/null
..
real 0m16.025s
..
real 0m16.797s
..
real 0m28.596s
..
real 0m17.107s
..
real 0m10.445s
..
real 0m14.853s
..
real 0m15.593s
..
real 0m10.136s
..
real 0m14.093s
..
real 0m11.215s
}}}
Replying to [comment:18 kevin]:
Speaking of redundancy: how does a file-based (SQLite?) backend help? I'm not a database guy, but I don't see the improvement when the whole database is in one file. Is that file located on some clustered filesystem? Then the filesystem is a single point of failure (as I saw in our lab)...
Yes, the two nodes have a replicated glusterfs filesystem between them. So, all data is shared. One node could be completely down and all data access would function as normal.
I'll try and come up with some ways to improve performance more...
Another query for everyone: Is this just trac?
Are other items: downloads, git pulls/pushes, cgit access all fine? Or do you find those slow as well?
Replying to [comment:21 kevin]:
> Yes, the two nodes have a replicated glusterfs filesystem between them. So, all data is shared. One node could be completely down and all data access would function as normal.
> I'll try and come up with some ways to improve performance more...

I suspect the Gluster+SQLite combination of some sort of lock contention. Unfortunately, I know next to nothing about Gluster and about tools for diagnosing this kind of problem.
Replying to [comment:22 kevin]:
> Another query for everyone: Is this just trac? Are other items: downloads, git pulls/pushes, cgit access all fine? Or do you find those slow as well?

Cgit is fine; git access has some latency after the initial connection, but it is acceptable. Personally, Trac causes the biggest pain for me.
ok. I found another bottleneck in the freeipa and sssd trac instances.
Can you folks try now and see how things look?
It was trying to sync the git repo on every page load...which as you might imagine slowed things down quite a lot. ;(
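For the record, the usual knob for this in Trac of that era was `repository_sync_per_request` in trac.ini. A sketch, assuming a Trac 0.12-style setup; whether fh.o used exactly this mechanism isn't stated in the ticket:

```ini
# trac.ini -- sketch, assuming a Trac 0.12-style setup.
# An empty repository_sync_per_request stops Trac from syncing the
# repository on every page load; syncing is instead triggered from a
# git post-receive hook via `trac-admin $ENV changeset added ...`.
[trac]
repository_sync_per_request =
```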
'''FreeIPA'''
{{{
$ time wget -O /dev/null 'https://fedorahosted.org/freeipa/report/3'
0m11.516s
0m13.488s
0m11.310s
0m10.005s
0m9.036s
}}}
Access time for the latest tickets is much better. The .201 server seems to add +2 seconds to each request.
{{{
$ time wget -O /dev/null https://fedorahosted.org/freeipa/ticket/3274
0m2.272s (66.135.62.187)
0m2.355s (66.135.62.187)
0m2.595s (66.135.62.187)
0m2.420s (66.135.62.187)

0m4.114s (66.135.62.201)
0m4.749s (66.135.62.201)
0m4.138s (66.135.62.201)
0m4.539s (66.135.62.201)
0m4.569s (66.135.62.201)
0m4.459s (66.135.62.201)
}}}
A test with an older ticket showed something. Are old data blocks on slower storage or something? The huge difference from the previous case seems strange to me.
{{{
$ time wget -O /dev/null https://fedorahosted.org/freeipa/ticket/327
0m22.761s (66.135.62.187)
0m4.386s (66.135.62.187)
0m3.877s (66.135.62.187)

0m16.451s (66.135.62.201)
0m6.954s (66.135.62.201)
0m9.088s (66.135.62.201)
0m6.850s (66.135.62.201)
}}}
'''SSSD'''
{{{
$ time wget -O /dev/null 'https://fedorahosted.org/sssd/report/3'
0m5.519s
0m6.354s
0m6.322s
0m5.864s
0m5.724s
}}}
Can you eliminate the git sync for the bind-dyndb-ldap Trac too? It is an example of a very small Trac instance, so we could test the impact at the other end of the "spectrum".
Replying to [comment:25 pspacek]:
> Access time for latest tickets is much better. Server .201 seems to have +2 seconds for each request.
Backups may have been running against the 201 server when you were testing...
> Test with older ticket showed something. Are old data blocks on slower storage or so? Huge differences from previous case seems strange to me.
Me too. All of the tickets are in the same trac db on the same storage.
done.
Also, I have been testing here with 'ab'... it does a pretty nice job:
{{{ ab -c 1 -n 3 "https://fedorahosted.org/freeipa/report/3" }}}
So, where are we now?
Acceptable for now?
We do intend to look at some kind of 'next generation' fedorahosted next year... possibly with something better than trac, or set up with real DBs, etc.
Hopefully things as they are now will get us along until then?
I'm going to go ahead and close this out... hopefully things are at least usable for everyone for now.
Later in the year we do plan to look at moving to faster hardware, and possibly at that time look at real db backends (or even at replacing trac, if we can find something better).
Hello Kevin,
I forgot to reply to your comment:27, I'm sorry. Trac is slow, but usable. I'm looking forward to the new HW and other changes; hopefully they will help.
Thank you for assistance.