#3582 Trac instances are very slow
Closed: Fixed. Opened 11 years ago by pspacek.

= phenomenon =
Any access to Trac instances on fedorahosted.org is very slow.

Some examples (anonymous user):
{{{
$ time wget https://fedorahosted.org/freeipa/ticket/3274 -O /dev/null
real 0m12.571s
real 0m18.466s
real 0m11.192s
real 0m13.509s

$ time wget https://fedorahosted.org/freeipa/ticket/3254 -O /dev/null
real 0m3.349s (Wow! I was lucky!)
real 0m7.730s
real 0m6.625s
real 0m7.546s
}}}

= How can we help? =
Where is the bottleneck? Where should we push? Who is the responsible manager (if any)?


Hmm, my bad. Here is a table with better formatting:
{{{
Some examples (anonymous user):
$ time wget https://fedorahosted.org/freeipa/ticket/3274 -O /dev/null
real 0m12.571s
real 0m18.466s
real 0m11.192s
real 0m13.509s

$ time wget https://fedorahosted.org/freeipa/ticket/3254 -O /dev/null
real 0m3.349s (Wow! I was lucky!)
real 0m7.730s
real 0m6.625s
real 0m7.546s
}}}

Replying to [ticket:3582 pspacek]:

= phenomenon =
Any access to Trac instances on fedorahosted.org is very slow.

Do you see this with only the freeipa trac? Or others?

When did you notice the slowdown?

= How can we help? =
Where is the bottleneck? Where should we push? Who is the responsible manager (if any)?

I don't know yet; I will investigate. The info above could help us narrow it down.

I don't think complaining to management before you give us time to investigate would be very useful.

Replying to [comment:2 kevin]:

Replying to [ticket:3582 pspacek]:

= phenomenon =
Any access to Trac instances on fedorahosted.org is very slow.

Do you see this with only the freeipa trac? Or others?

I see similar slowdown with the SSSD Trac as well.

So, can anyone note when they saw the slowdown? In the last few days? Weeks? Months?

Is it only downloading tickets? Or all operations?

At least for me, the fh.o Trac instances have always been kind of slow, but the performance seems to be getting worse over time (maybe proportionally to the data fh.o hosts?).

The slowest operation seems to be anything that queries the ticket database. Displaying the open tickets for the SSSD trac (https://fedorahosted.org/sssd/report/3) takes about 10 seconds on average.

ok. I have run a sqlite vacuum on both projects' DBs... this does seem to improve things from what I can see, but I'm not sure it's acceptable for you yet.

Can you test now?

It might be that we are hitting some kind of sqlite issue and should consider moving over to a real database backend for projects.
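The effect of that vacuum can be demonstrated locally. This is only a sketch with a throwaway database (the real Trac DB path on fedorahosted isn't shown in this ticket): deleting rows leaves free pages behind in the SQLite file, and `VACUUM` rewrites the file to reclaim them.

```shell
# Build a throwaway SQLite DB, bloat it, and show that VACUUM shrinks the file.
# (The real command was presumably "sqlite3 /path/to/trac.db 'VACUUM;'" --
# the actual path is an assumption, not something stated in this ticket.)
db=$(mktemp)
sqlite3 "$db" "
  CREATE TABLE t(x);
  WITH RECURSIVE c(i) AS (SELECT 1 UNION ALL SELECT i+1 FROM c WHERE i < 50000)
  INSERT INTO t SELECT i FROM c;
  DELETE FROM t;"
before=$(wc -c < "$db")
sqlite3 "$db" "VACUUM;"
after=$(wc -c < "$db")
echo "before=$before bytes, after=$after bytes"
rm -f "$db"
```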

Thank you Kevin. This is rather unscientific but it does seem faster now, I'd say maybe 2x?

As for moving off sqlite: we have an awful lot of data in this trac instance and we modify it frequently. I don't know what the working limits are for sqlite, but I always thought it was meant for smaller data sets. It wouldn't surprise me if a more capable db (postgresql?) would be an improvement.

I appreciate the interest here and additional cc's, but they don't help any without data. ;)

Are things fast enough for now?

Still taking a lot of time to pull up some reports; about 30 seconds for report 3.

Re-running a query speeds things up a bit, I imagine that caching plays a role.

Fetching individual tickets took 7-10 seconds each in my tests. Slow but tolerable I suppose.

FreeIPA Trac is now faster, but still it isn't comfortable to use (from my personal point of view).
{{{
$ time wget https://fedorahosted.org/freeipa/ticket/3274 -O /dev/null
real 0m7.342s
real 0m10.690s
real 0m6.263s
real 0m7.965s

$ time wget https://fedorahosted.org/freeipa/report/3 -O /dev/null
real 0m16.710s
real 0m15.166s
real 0m20.910s
real 0m22.004s
}}}

bind-dyndb-ldap Trac has 32x fewer tickets (100 vs. 3200 in FreeIPA Trac), and it still takes some time to load a particular ticket.
{{{
$ time wget https://fedorahosted.org/bind-dyndb-ldap/ticket/12 -O /dev/null
real 0m3.456s
real 0m4.117s
real 0m2.875s
real 0m2.978s

$ time wget https://fedorahosted.org/bind-dyndb-ldap/report/3 -O /dev/null
real 0m7.992s
real 0m4.511s
real 0m3.846s
real 0m7.356s
real 0m4.008s
}}}

Results for "empty form page" (logged in as pspacek):
{{{
$ time wget https://fedorahosted.org/bind-dyndb-ldap/newticket -O /dev/null --load-cookies=/tmp/cookies.txt
real 0m3.413s
real 0m3.733s
real 0m2.724s
real 0m3.066s
}}}

From the results above, there is approximately 3 seconds of overhead added to each request/reply.
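A rough way to see where a constant per-request overhead like that lives is curl's timing variables; this is a diagnostic sketch (not something run in the ticket), where `time_appconnect` covers DNS + TCP + TLS and `time_starttransfer` adds the server's think time:

```shell
# Split one request into phases: name lookup, TLS handshake complete,
# time to first byte, and total. A large ttfb-minus-tls gap points at the
# server rather than the network.
url="https://fedorahosted.org/bind-dyndb-ldap/newticket"
curl -o /dev/null -s \
     -w 'dns=%{time_namelookup}s tls=%{time_appconnect}s ttfb=%{time_starttransfer}s total=%{time_total}s\n' \
     "$url" \
  || echo "request to $url failed"
```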

Here is a traceroute from my workstation to fedorahosted:
{{{
traceroute to 66.135.62.187 (66.135.62.187), 30 hops max, 60 byte packets
1 *
2 *
3 *
4 *
5 *
6 *
7 *
8 ge-0-0-2.vnlams1-core-1.virtela.net (77.247.0.0) 38.585 ms 39.275 ms 38.599 ms
9 asd2-rou-1043.NL.eurorings.net (134.222.152.118) 47.139 ms 38.793 ms 39.374 ms
10 rt2-rou-1022.NL.eurorings.net (134.222.199.106) 43.314 ms 41.072 ms 43.627 ms
11 ldn-s2-rou-1101.UK.eurorings.net (134.222.232.75) 48.504 ms 48.497 ms 47.669 ms
12 ldn-s2-rou-1041.UK.eurorings.net (134.222.228.94) 47.117 ms 92.293 ms 50.010 ms
13 10ge.xe-0-0-0.linxb.ldn-teleh-dis-1.peer1.net (195.66.224.156) 47.931 ms 46.208 ms 47.188 ms
14 10ge.xe-1-1-0.nyc-telx-dis-1.peer1.net (216.187.115.117) 120.018 ms 209.518 ms 119.681 ms
15 10ge.xe-0-0-0.nyc-telx-dis-2.peer1.net (216.187.115.182) 120.327 ms 119.466 ms 119.901 ms
16 10ge.xe-1-2-0.chi-eqx-dis-1.peer1.net (216.187.114.161) 207.870 ms 140.502 ms 140.549 ms
17 10ge-ten1-1-0.dal-eqx-xe2-1.peer1.net (216.187.124.73) 169.704 ms 171.003 ms 169.169 ms
18 10ge-ten1-2.sat-8500v-cor-1.peer1.net (216.187.124.78) 245.670 ms 179.149 ms 178.620 ms
19 10ge.xe-0-0-1.sat-8500v-sbdis-1.peer1.net (216.187.124.66) 182.647 ms 182.640 ms 177.015 ms
20 hosted02.fedoraproject.org (66.135.62.187) 175.639 ms 176.038 ms 176.628 ms
21 hosted02.fedoraproject.org (66.135.62.187) 175.733 ms !X 177.950 ms !X 176.883 ms !X
}}}

180 ms is a lot (I'm based in Europe), but it should not add more than 1 second anyway, I guess.

Please let me know if you need further information.

So, some things to note:

  • There are two nodes (both active) that you reach via a DNS round robin

  • There's caching involved. The first access will be slower than subsequent ones, and the cache is local to each node. So, if you hit hosted01, then hosted02, it will now be cached on both, but those first two accesses may be slower. In any case, the first time someone hits something in a while it will not be cached.
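Given the two round-robin nodes and per-node caches described above, one way to test each node separately is to pin the hostname to one address at a time. A sketch with curl, using the two node IPs that appear elsewhere in this ticket:

```shell
# --resolve forces fedorahosted.org to a specific backend, so the DNS round
# robin can't bounce consecutive requests between the two nodes' caches.
for ip in 66.135.62.187 66.135.62.201; do
  curl --resolve "fedorahosted.org:443:$ip" -o /dev/null -sS \
       -w "$ip: %{time_total}s\n" \
       "https://fedorahosted.org/freeipa/report/3" \
    || echo "$ip: unreachable"
done
```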

I'm not sure offhand what else to try to improve performance. We could look at moving to a postgresql backend, but then we are going back to a single point of failure, which I dislike.

I have tweaked a few other things... can you retry your tests, with multiple attempts to show caching?

I ran report/3 requests against the FreeIPA instance, and I don't really see substantial improvement or caching of results across the nodes. There is a clear difference between responses from the two nodes: the .187 node seems faster than the .201 node, with all .187 responses below 15s (varying between 10s and 14s), while .201 responds more slowly and shows less of a caching effect -- note the 28.596s spike, while the previous request to the same node took 16.025s.

At the same time, if anything was cached, even a 10s response time is clearly unacceptable for the cached case. I doubt the tickets were modified during my tests, since all my requests were made sequentially, over a few minutes, at a time when the US had already gone to sleep and Europe hadn't seriously started working yet.
{{{
time wget https://fedorahosted.org/freeipa/report/3 -O /dev/null
..
real 0m16.025s
..
real 0m16.797s
..
real 0m28.596s
..
real 0m17.107s
..
real 0m10.445s
..
real 0m14.853s
..
real 0m15.593s
..
real 0m10.136s
..
real 0m14.093s
..
real 0m11.215s
}}}

Replying to [comment:18 kevin]:

I'm not sure offhand what else to try to improve performance. We could look at moving to a postgresql backend, but then we are going back to a single point of failure, which I dislike.

Speaking about redundancy: how does a file-based (SQLite?) backend help? I'm not a database guy, but I don't see the improvement when the whole database is in one file. Is that file located on some clustered filesystem? Then the filesystem is the single point of failure (as I saw in our lab)...

Yes, the two nodes have a replicated glusterfs filesystem between them. So, all data is shared. One node could be completely down and all data access would function as normal.

I'll try and come up with some ways to improve performance more...

Another query for everyone: Is this just trac?

Are other items: downloads, git pulls/pushes, cgit access all fine? Or do you find those slow as well?

Replying to [comment:21 kevin]:

Yes, the two nodes have a replicated glusterfs filesystem between them. So, all data is shared. One node could be completely down and all data access would function as normal.

I'll try and come up with some ways to improve performance more...

I suspect the Gluster+SQLite combination of some sort of lock contention. Unfortunately, I know next to nothing about Gluster and the tools for diagnosing this kind of problem.

Replying to [comment:22 kevin]:

Another query for everyone: Is this just trac?

Are other items: downloads, git pulls/pushes, cgit access all fine? Or do you find those slow as well?

Cgit is fine; git access has some latency after the initial connection, but it is acceptable. Personally, Trac causes me the biggest pain.

ok. I found another bottleneck in the freeipa and sssd trac instances.

Can you folks try now and see how things look?

It was trying to sync the git repo on every page load... which, as you might imagine, slowed things down quite a lot. ;(
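For reference, the per-page-load git sync behaviour maps to the caching options of the stock trac-git plugin. A trac.ini fragment along these lines keeps a persistent cache instead of re-syncing the repository on every request; this is a sketch of the likely knobs (option names from the tracopt git plugin), not necessarily the exact change made here:

```ini
[git]
; Cache repository metadata instead of re-reading the repo on each request,
; and keep that cache across requests.
cached_repository = true
persistent_cache = true
```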

'''FreeIPA'''
{{{
$ time wget -O /dev/null 'https://fedorahosted.org/freeipa/report/3'
0m11.516s
0m13.488s
0m11.310s
0m10.005s
0m9.036s
}}}

Access time for the latest tickets is much better. The .201 server seems to add about 2 seconds to each request.
{{{
$ time wget -O /dev/null https://fedorahosted.org/freeipa/ticket/3274
0m2.272s (66.135.62.187)
0m2.355s (66.135.62.187)
0m2.595s (66.135.62.187)
0m2.420s (66.135.62.187)

0m4.114s (66.135.62.201)
0m4.749s (66.135.62.201)
0m4.138s (66.135.62.201)
0m4.539s (66.135.62.201)
0m4.569s (66.135.62.201)
0m4.459s (66.135.62.201)
}}}

The test with an older ticket showed something. Are old data blocks on slower storage or something like that? The huge difference from the previous case seems strange to me.
{{{
$ time wget -O /dev/null https://fedorahosted.org/freeipa/ticket/327
0m22.761s (66.135.62.187)
0m4.386s (66.135.62.187)
0m3.877s (66.135.62.187)

0m16.451s (66.135.62.201)
0m6.954s (66.135.62.201)
0m9.088s (66.135.62.201)
0m6.850s (66.135.62.201)
}}}

'''SSSD'''
{{{
$ time wget -O /dev/null 'https://fedorahosted.org/sssd/report/3'
0m5.519s
0m6.354s
0m6.322s
0m5.864s
0m5.724s
}}}

Can you eliminate the git sync for the bind-dyndb-ldap Trac too? It is an example of a very small Trac instance, so we could test the impact at the other end of the "spectrum".

Replying to [comment:25 pspacek]:

Access time for the latest tickets is much better. The .201 server seems to add about 2 seconds to each request.

Backups may have been running against the 201 server when you were testing...

The test with an older ticket showed something. Are old data blocks on slower storage or something like that? The huge difference from the previous case seems strange to me.

Me too. All of the tickets are in the same trac db on the same storage.

Can you eliminate the git sync for the bind-dyndb-ldap Trac too? It is an example of a very small Trac instance, so we could test the impact at the other end of the "spectrum".

done.

Also, I have been testing here with 'ab'... it does a pretty nice job:

{{{
ab -c 1 -n 3 "https://fedorahosted.org/freeipa/report/3"
}}}

So, where are we now?

Acceptable for now?

We do intend to look at some kind of 'next generation' fedorahosted next year... possibly with something better than trac, or a setup with real dbs, etc.

Hopefully things as they are now will tide us over until then?

I'm going to go ahead and close this out... hopefully things are at least usable for everyone for now.

Later in the year we do plan to look at moving to faster hardware and possibly at that time look at real db backends (or even replacing trac with something better if we can find something better).

Hello Kevin,

I forgot to reply to your comment:27, I'm sorry. Trac is slow, but usable. I'm looking forward to the new HW and other changes; hopefully they will help.

Thank you for assistance.
