#48980 ns_handle_pr_read_ready when threaded causes use after free
Opened 3 years ago by firstyear. Modified 2 years ago

Because of the current integration of DS with NS, the function ns_handle_pr_read_ready runs in the event loop rather than a worker thread. This means that we have an overhead when a connection is ready to be read from.

Additionally, this function, actually only then releases the connection, so that the current conn table loop can detect work from it.

As a result of this design, if we are to make the connection job threaded (it is not current threaded), due to the use of preserve FD in the job, this means the FD is still in the event framework with the job. If ns_job_done is called, the FD is left in NS (as the connection table still uses it), and when a read is available, it will trigger event_cb. Of course, we are now acting on a job that is marked deleted and freed, thus we have heap use after free.

The solution is to make the job not use preserve FD, and also to handle the operation inside of the worker_thread, rather than to set the job in the connection table to ready. This would mean that after the initial connection, all the operations related to a connection are managed in a worker thread rather than the (congested) event loop.

This is the use after free.

(gdb) bt
#0  0x00007f84313b08dd in nanosleep () from /lib64/libc.so.6
#1  0x00007f84313b082a in sleep () from /lib64/libc.so.6
#2  0x00007f8434776185 in __sanitizer::SleepForSeconds (seconds=<optimized out>) at ../../../../libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cc:114
#3  0x00007f8434764950 in __asan::AsanDie () at ../../../../libsanitizer/asan/asan_rtl.cc:48
#4  0x00007f843476b07d in __sanitizer::Die () at ../../../../libsanitizer/sanitizer_common/sanitizer_common.cc:142
#5  0x00007f8434764497 in __asan::ScopedInErrorReport::~ScopedInErrorReport (this=<synthetic pointer>, __in_chrg=<optimized out>) at ../../../../libsanitizer/asan/asan_report.cc:689
#6  __asan::ReportGenericError (pc=<optimized out>, bp=bp@entry=140204874304192, sp=sp@entry=140204874304176, addr=<optimized out>, is_write=is_write@entry=false, 
    access_size=access_size@entry=8, exp=<optimized out>, fatal=<optimized out>) at ../../../../libsanitizer/asan/asan_report.cc:1074
#7  0x00007f8434764e57 in __asan::__asan_report_load8 (addr=<optimized out>) at ../../../../libsanitizer/asan/asan_rtl.cc:130
#8  0x00007f8433f1be14 in work_q_notify (job=0x60d000249110) at /home/william/development/389ds/nunc-stans/ns_thrpool.c:284
#9  0x00007f8433f1d3a2 in event_cb (job=0x60d000249110) at /home/william/development/389ds/nunc-stans/ns_thrpool.c:519
#10 0x00007f8433f219d7 in event_cb (fd=660, event=2, arg=0x60d000249110) at /home/william/development/389ds/nunc-stans/ns_event_fw_event.c:105
#11 0x00007f8433cdae90 in event_process_active_single_queue (activeq=0x602000394670, base=0x61600001d780) at event.c:1368
#12 event_process_active (base=<optimized out>) at event.c:1438
#13 event_base_loop (base=0x61600001d780, flags=1) at event.c:1639
#14 0x00007f8433f22381 in ns_event_fw_loop (ns_event_fw_ctx=0x61600001d780) at /home/william/development/389ds/nunc-stans/ns_event_fw_event.c:311
#15 0x00007f8433f1d0e7 in event_loop_thread_func (arg=0x60e00015cec0) at /home/william/development/389ds/nunc-stans/ns_thrpool.c:466
#16 0x00007f84318f07df in _pt_root (arg=0x61200008e3c0) at ../../../nspr/pr/src/pthreads/ptthread.c:216
#17 0x00007f84316b06ba in start_thread () from /lib64/libpthread.so.0
#18 0x00007f84313eb3cf in clone () from /lib64/libc.so.6


(gdb) print *job
$2 = {tp = 0x60e0590000ad, ns_event_fw_ctx = 0x61600001d780, func = 0x4321e7 <ns_handle_pr_read_ready>, data = 0x7f841ccb8820, job_type = 404, fd = 0x6040002d5e90, tv = {tv_sec = 0, 
    tv_usec = 0}, signal = 0, ns_event_fw_fd = 0x0, ns_event_fw_time = 0x0, ns_event_fw_sig = 0x0, output_job_type = 4, state = 3, work_thread = 0x0, 
  alloc_event_context = 0x7f8433f1d864 <alloc_event_context>, free_event_context = 0x7f8433f1d882 <free_event_context>, event_cb = 0x7f8433f1d24c <event_cb>, done_cb = 0x0}

Note the output_job_type of 4 (read) and the state 3 (deleted).


NOTE: I would be putting this to 1.3.7

This is the use after free.
This could be the cause of the server crash?

Replying to [comment:1 firstyear]:

NOTE: I would be putting this to 1.3.7

Could you provide why you proposed to set to 1.3.7?

Per weekly meeting, this issue looks severe and needing a fix sooner, e.g., 1.3.6 to enable the current NS...

This issue only occurs when I add a patch to DS: It does NOT affect current production releases.

It's a blocker to proper threading of NS for future DS, and thus I said push out to 1.3.7 because there is heaps to do in this area anyway.

Thank you for the clarification, William.

I guess we should have some NS version info...

For instance, the current 389-ds-base-1.3.5 is built with nunc-stans-0.1.8.

The configure.ac in master still has 0.1.8:
configure.ac:AC_INIT([nunc-stans], [0.1.8], ‚Äčhttp://fedorahosted.org/nunc-stans)

Do we want to bump it + notion of alpha? Do you have any good idea how to add/maintain it?

Thanks!

Metadata Update from @firstyear:
- Issue set to the milestone: 1.3.7 backlog

2 years ago

Metadata Update from @firstyear:
- Issue assigned to firstyear

2 years ago

Metadata Update from @firstyear:
- Issue close_status updated to: None
- Issue set to the milestone: 1.4 backlog (was: 1.3.7 backlog)

2 years ago

Login to comment on this ticket.

Metadata