#462 Too many open files from pagure_ev
Closed: Fixed a year ago by wombelix. Opened 8 years ago by kevin.

Tracebacks today:

Nov 02 17:01:53 pagure01.fedoraproject.org pagure-stream-server.py[2621]: socket: <socket._socketobject object at 0x4a59210>
Nov 02 17:01:53 pagure01.fedoraproject.org pagure-stream-server.py[2621]: Traceback (most recent call last):
Nov 02 17:01:53 pagure01.fedoraproject.org pagure-stream-server.py[2621]: File "/usr/lib/python2.7/site-packages/trollius/selector_events.py", line 158, in _accept_connection
Nov 02 17:01:53 pagure01.fedoraproject.org pagure-stream-server.py[2621]: File "/usr/lib/python2.7/site-packages/trollius/py33_exceptions.py", line 122, in wrap_error
Nov 02 17:01:53 pagure01.fedoraproject.org pagure-stream-server.py[2621]: File "/usr/lib64/python2.7/socket.py", line 202, in accept
Nov 02 17:01:53 pagure01.fedoraproject.org pagure-stream-server.py[2621]: error: [Errno 24] Too many open files

restarting the service seems to have cleared it. Perhaps it's not closing files/descriptors correctly?
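
One way to confirm a descriptor leak (rather than genuine load) is to watch the process's open-fd count over time. A minimal sketch, assuming a Linux host with /proc and using the PID from the journal lines above:

```python
import os
import time

PID = 2621  # pagure-stream-server.py PID taken from the journal output above

# Print the number of open file descriptors once a minute; a steadily
# growing count under roughly constant traffic points at a leak.
while True:
    fds = os.listdir('/proc/%d/fd' % PID)
    print('%s open fds: %d' % (time.strftime('%H:%M:%S'), len(fds)))
    time.sleep(60)
```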


We run into this every so often, it's kinda annoying.

I've proposed some changes at https://pagure.io/pagure/pull-request/860 which may or may not help, let's see :)

This is still happening:

May 04 05:45:45 pagure01.fedoraproject.org pagure-stream-server.py[709]: File "/usr/lib/python2.7/site-packages/trollius/selector_events.py", line 158, in _accept_connection
May 04 05:45:45 pagure01.fedoraproject.org pagure-stream-server.py[709]: File "/usr/lib/python2.7/site-packages/trollius/py33_exceptions.py", line 122, in wrap_error
May 04 05:45:45 pagure01.fedoraproject.org pagure-stream-server.py[709]: File "/usr/lib64/python2.7/socket.py", line 202, in accept
May 04 05:45:45 pagure01.fedoraproject.org pagure-stream-server.py[709]: error: [Errno 24] Too many open files

any idea how to recreate the issue locally?

I actually never could, maybe using a js shell or something to open a lot of connections?

> I actually never could, maybe using a js shell or something to open a lot of connections?

I wrote a basic Python script to reproduce this issue locally; it seems to work with 2000 threads:


```python
# Python 2 reproducer: open a lot of concurrent HTTP connections against a
# local pagure instance to run the server out of file descriptors.
import httplib
import threading

def open_connection():
    conn = httplib.HTTPConnection("localhost:8080")
    conn.request("GET", "/test/issue/26?foo=bar")
    r1 = conn.getresponse()
    print r1.status, r1.reason
    data1 = r1.read()

for i in range(2000):
    t = threading.Thread(target=open_connection, args=[])
    t.start()
    print('started thread', t.ident)
```

Good idea, you can also try the ab (Apache Benchmark) tool for that.

It seems that this is a bug in trollius.

Another project had a similar problem, and their solution was to close idle connections and clean up the closed ones: https://github.com/chfoo/wpull/issues/167

------- edit

That issue is about connecting to a server and writing files. The important point, though, is that the number of tmp files stayed constant while the number of TCP connections kept growing. After they added a cleanup function, the number of TCP connections stayed relatively constant, between 10 and 70.

------- end edit

But this doesn't seem to have fully solved it, because someone recently opened another ticket about the same issue: https://github.com/chfoo/wpull/issues/315

Another article suggested that it's a ulimit problem and that the limit should be bumped up, but when I tried that on a local machine it had no effect. Perhaps I need to reboot for the change to take effect? https://support.xmatters.com/hc/en-us/articles/202089439--Too-many-open-files-error-on-Linux-Unix
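
One thing worth checking is whether the running daemon actually picked up the new limit: ulimit only affects processes started from that shell afterwards, so a daemon launched by systemd (or an already-running test server) keeps its old limit. A minimal sketch, assuming Linux /proc and a hypothetical PID:

```python
# Show the effective "Max open files" limit of an already-running process.
# 2621 is a placeholder PID; a shell `ulimit -n` has no effect on it.
with open('/proc/2621/limits') as f:
    for line in f:
        if line.startswith('Max open files'):
            print(line.rstrip())
```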

One thing I noticed while testing this is that my test program also failed in the same way, but if you ran things in separate terminals, i.e. separate processes, there was no problem. When I tried to fork the process in Python I got the same results, so it seems they need to be independently started processes.

So what I'm thinking is that once trollius detects it's getting near the limit, it could spawn a new parent process that they can all interact with, making sure it doesn't spawn new processes for no reason.

And maybe also create a maid function that cleans up the connections once they're finished/closed.
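
Roughly what I mean by a maid function, sketched in modern asyncio terms (trollius is the Python 2 backport of asyncio); the CLIENTS registry, the timeout, and the assumption that each connection handler registers its writer are hypothetical, not how pagure-stream-server.py is actually structured:

```python
import asyncio
import time

CLIENTS = {}        # hypothetical registry: StreamWriter -> last-activity timestamp
IDLE_TIMEOUT = 300  # seconds; arbitrary value for this sketch

async def sweep_connections():
    """Forget writers that are already closed, and close idle ones."""
    while True:
        now = time.monotonic()
        for writer, last_seen in list(CLIENTS.items()):
            if writer.transport.is_closing():
                # Connection already went away; just drop our reference.
                CLIENTS.pop(writer, None)
            elif now - last_seen > IDLE_TIMEOUT:
                writer.close()  # frees the file descriptor
                CLIENTS.pop(writer, None)
        await asyncio.sleep(60)
```

Each connection handler would add its writer to CLIENTS and refresh the timestamp on activity; the sweeper just makes sure dead or idle entries don't pin file descriptors forever.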

------- edit

Forgot to mention, @pingou: the ab command was nice :) It helped, and I used both it and my script to try to make the program fault. It seems that ab closes its connections after a limited time, so it might not be that useful for debugging, but it's great for getting the program to fault quickly.

ab -n 1000 -c 1000 http://localhost:8080/test/issue/26?foo=bar

It seems that on my machine I can't go over 1000 concurrent connections, so I had to call this command several times.

------- end edit

Just tested ulimit again, and it seems to be per process, or rather per terminal session.

When I was running the command

ab -n 10000 -c 4000 http://localhost:8080/test/issue/26?foo=bar

it would say "too many open files", but when I bumped the limit with ulimit -n 4096 (my machine's max), I was able to run 4000 concurrent connections and the command worked:

ab -n 10000 -c 4000 http://localhost:8080/test/issue/26?foo=bar

I did the same in the terminal where I was running the EV server, and it seems to handle more connections.

Perhaps a temporary fix is to bump the ulimit for now (ulimit -n unlimited) and then create a maid function to properly close connections?
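
The server could also raise its own soft limit at startup instead of relying on whatever shell started it; a minimal sketch using the stdlib resource module (the soft limit can only be raised up to the hard limit, which for a systemd service comes from the unit configuration):

```python
import resource

# Raise this process's soft limit on open files as far as the hard limit allows.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
print('open-files limit raised from %s to %s' % (soft, hard))
```

That only buys headroom, though; without the cleanup function the leak would still hit whatever limit is set eventually.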

Since #1281 from @amsharma was closed and pointed to this ticket, I just wanted to follow up with a comment here since it's been three months. As I am starting to use Pagure more and more for a lot of ticket-based discussion, the problem of clicking "Update issue" and losing all of my text is becoming increasingly common and very challenging to my workflow. I'll spend 30 minutes typing a long response and then lose all of it because I click without first thinking to make a local copy of my comment.

If there is a way to resolve this through this specific ticket but it is taking some time, could there at least be some kind of workaround? Like saving a local copy of the ticket field text in the browser with cookies? Or anything?

To me, this is probably the top issue I want to see improved in Pagure because this problem is detrimental to my workflow… retyping a lot of text is more mentally exhausting than it seems…

@jflory7 well, there are two issues; this one is just about the UI not refreshing, it's not about actually losing the comment. So refreshing the page should still show the comment; it's just that the live refresh of the page with your comment doesn't work.

So I consider the live refresh less important, but the loss of data is definitely something we want to look into quickly, and I'll probably use #1333 as the tracker for it (I assumed #1281 was referring to this issue, not the one about losing comments, sorry if I was wrong).

@pingou Ah, sorry - yes, I think #1281 and #1333 are for the same thing. It would be great to have some sort of way to cache that data locally or some other solution. :)

I also think they are related somehow: restarting the server seems to fix both of them.

@petersen nope, it doesn't; that's why there are two tickets.

The last update was 6 years ago, with no further requests, updates, or actionable tasks since then, and the related tickets were closed as fixed. I'm going to close this issue for now to reduce our backlog.

Metadata Update from @wombelix:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

a year ago
