#49242 Add GDB debugging script
Closed: wontfix 6 years ago Opened 6 years ago by firstyear.

Issue Description

In our stack traces we have many threads, generally idle. This makes it hard for a new user or support to identify issues quickly.

In addition, because of our access log buffering, there are often logs in memory related to a crash that are inaccessible. We give steps to display these on the wiki, but the process is not very friendly.

Instead, we should provide a Python script for gdb that allows dumping this information.


Metadata Update from @firstyear:
- Custom field reviewstatus adjusted to review
- Custom field type adjusted to defect
- Issue assigned to firstyear

6 years ago

I'm not sure we can always consider a thread in DS_Sleep as idle. It could, e.g., be a backend op sleeping between txn retries, and we should not lose these threads.

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.3.7.0

6 years ago

Yeah, that makes sense. I can remove DS_Sleep from the pattern match then.

I'm still thinking about the best approach; I do not really like removing threads. Couldn't we instead collapse identical stack traces and report each one once, saying:
thread xx and 22 other threads :
wait_on_new_work
...
thread yy and 3 other threads:
dblayer_lock
...
thread zz
..
thread a
...
...

@lkrispen This update changes how it works.

In a gdb session type:

source /usr/lib64/dirsrv/python/ds_gdb.py

This script does three things. First, it changes the backtrace output to mark frames that suggest a thread might be idle, for example:

#0  0x00007ffff3ff981b in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x00007ffff422e630 in PR_WaitCondVar () at /lib64/libnspr4.so
#2  0x0000000000424914 in [IDLE THREAD] connection_wait_for_new_work (pb=0x60800046c020, interval=4294967295) at /home/william/development/389ds/ds/ldap/servers/slapd/connection.c:966
#3  0x0000000000428380 in connection_threadmain () at /home/william/development/389ds/ds/ldap/servers/slapd/connection.c:1533
#4  0x00007ffff4233fcb in _pt_root () at /lib64/libnspr4.so
#5  0x00007ffff3ff3369 in start_thread () at /lib64/libpthread.so.0
#6  0x00007ffff38cbd0f in clone () at /lib64/libc.so.6

Note the [IDLE THREAD] addition in the stack.
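
As a rough illustration of the marking step, here is a minimal sketch of the name-matching logic only. It is not the code from the patch: the names IDLE_FUNCTIONS, decorate_function_name and decorate_stack are hypothetical, and the only idle function taken from this ticket is connection_wait_for_new_work. Inside gdb, logic like this would presumably be attached through a frame filter and a frame decorator that overrides the displayed function name; that wiring is left out so the sketch runs outside gdb.

```python
# Function names assumed to indicate an idle worker thread.
# Only connection_wait_for_new_work comes from the trace above; DS_Sleep is
# deliberately absent, since a sleeping thread is not necessarily idle.
IDLE_FUNCTIONS = frozenset({"connection_wait_for_new_work"})


def decorate_function_name(name):
    """Prefix a frame's function name with a marker if it is a known idle wait point."""
    if name in IDLE_FUNCTIONS:
        return "[IDLE THREAD] " + name
    return name


def decorate_stack(function_names):
    """Apply the marker to every frame of a stack (innermost frame first)."""
    return [decorate_function_name(name) for name in function_names]


if __name__ == "__main__":
    # Function names copied from the example backtrace in this ticket.
    stack = [
        "pthread_cond_wait@@GLIBC_2.3.2",
        "PR_WaitCondVar",
        "connection_wait_for_new_work",
        "connection_threadmain",
    ]
    for depth, name in enumerate(decorate_stack(stack)):
        print("#%d  %s" % (depth, name))
```

Keeping the idle list as an explicit set makes it easy to drop an entry such as DS_Sleep when it turns out not to be a reliable idle indicator.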

Second, it adds a command called ds-access-log which will show the contents of the in memory access log.

Finally, it adds a command called ds-backtrace which prints a backtrace but automatically detects threads that share the same stack and coalesces them together. For example:

Thread 58 (LWP 3372))
Thread 57 (LWP 3371))
Thread 56 (LWP 3370))
Thread 55 (LWP 3369))
Thread 54 (LWP 3368))
Thread 53 (LWP 3367))
Thread 52 (LWP 3366))
Thread 51 (LWP 3365))
Thread 50 (LWP 3364))
Thread 49 (LWP 3363))
Thread 48 (LWP 3362))
Thread 47 (LWP 3361))
Thread 46 (LWP 3360))
Thread 45 (LWP 3359))
Thread 44 (LWP 3358))
Thread 43 (LWP 3357))
Thread 42 (LWP 3356))
Thread 41 (LWP 3355))
Thread 40 (LWP 3354))
Thread 39 (LWP 3353))
Thread 38 (LWP 3352))
Thread 37 (LWP 3351))
Thread 36 (LWP 3350))
Thread 35 (LWP 3349))
#0  0x00007ffff3ff981b in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x00007ffff422e630 in PR_WaitCondVar () at /lib64/libnspr4.so
#2  0x0000000000424914 in [IDLE THREAD] connection_wait_for_new_work (pb=0x60800046c020, interval=4294967295) at /home/william/development/389ds/ds/ldap/servers/slapd/connection.c:966
#3  0x0000000000428380 in connection_threadmain () at /home/william/development/389ds/ds/ldap/servers/slapd/connection.c:1533
#4  0x00007ffff4233fcb in _pt_root () at /lib64/libnspr4.so
#5  0x00007ffff3ff3369 in start_thread () at /lib64/libpthread.so.0
#6  0x00007ffff38cbd0f in clone () at /lib64/libc.so.6

Because of how these changes were made, the backtrace and other commands still work as usual without the coalescing.
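
The coalescing idea can be sketched roughly as follows. This is an assumption-laden illustration, not the ds-backtrace implementation: coalesce_stacks and format_groups are hypothetical names, the thread labels in the example are made up, and stacks are compared by function name only (an assumption, since arguments and addresses usually differ between otherwise identical threads).

```python
from collections import OrderedDict


def coalesce_stacks(threads):
    """Group threads whose stacks contain the same sequence of function names.

    threads: iterable of (label, stack) pairs, where stack is a list of
    function names, innermost frame first.
    Returns a list of (labels, stack) pairs in first-seen order.
    """
    groups = OrderedDict()
    for label, stack in threads:
        # A tuple of names is hashable, so it can serve as the grouping key.
        groups.setdefault(tuple(stack), []).append(label)
    return [(labels, list(key)) for key, labels in groups.items()]


def format_groups(groups):
    """Render each group as its thread labels followed by the shared stack."""
    lines = []
    for labels, stack in groups:
        lines.extend(labels)
        for depth, name in enumerate(stack):
            lines.append("#%d  %s" % (depth, name))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical thread labels and stacks, for illustration only.
    idle = ["pthread_cond_wait", "PR_WaitCondVar", "connection_wait_for_new_work"]
    busy = ["dblayer_lock", "connection_threadmain"]
    threads = [
        ("Thread 3 (LWP 103)", idle),
        ("Thread 2 (LWP 102)", busy),
        ("Thread 1 (LWP 101)", idle),
    ]
    print(format_groups(coalesce_stacks(threads)))
```

Grouping instead of filtering keeps every thread visible, which addresses the concern about losing threads that only look idle.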

@vashirov Is /usr/lib64/dirsrv/python/ds_gdb.py okay for the rpm?

If you want it to be auto-loaded when running gdb against ns-slapd, the file name should be objfile-gdb.py, where objfile is ns-slapd, and it should be in the auto-load path for gdb (in this case /usr/share/gdb/auto-load/usr/sbin/ns-slapd-gdb.py); see https://sourceware.org/gdb/onlinedocs/gdb/objfile_002dgdbdotext-file.html#objfile_002dgdbdotext-file

Path /usr/share/gdb/auto-load belongs to many packages according to yum provides "/usr/share/gdb/auto-load/*", so I think it's pretty safe to do the same trick as others do:
http://pkgs.fedoraproject.org/cgit/rpms/gcc.git/tree/gcc.spec#n1451
http://pkgs.fedoraproject.org/cgit/rpms/mono.git/tree/mono.spec#n356
http://pkgs.fedoraproject.org/cgit/rpms/libreoffice.git/tree/libreoffice.spec#n1355
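
To make the naming rule concrete, here is a small sketch that derives the auto-load script location from an object file path. The helper name gdb_autoload_script is hypothetical, and the default directory is the one mentioned above. gdb also resolves symlinks in the object file path before looking up the script; this sketch does not, so it expects an already resolved absolute path.

```python
import posixpath


def gdb_autoload_script(objfile, autoload_dir="/usr/share/gdb/auto-load"):
    """Return the path where gdb would look for the objfile-gdb.py script.

    objfile must be an absolute path; symlink resolution is not performed here.
    """
    if not posixpath.isabs(objfile):
        raise ValueError("objfile must be an absolute path: %r" % (objfile,))
    # The script lives under autoload_dir, mirroring the objfile's full path,
    # with "-gdb.py" appended to the file name.
    return posixpath.join(autoload_dir, objfile.lstrip("/") + "-gdb.py")


if __name__ == "__main__":
    print(gdb_autoload_script("/usr/sbin/ns-slapd"))
```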

Also I'm getting errors while I'm trying to use it

NameError: name 'DSErrorLog' is not defined

I guess you meant DSAccessLog()

Another thing, you have

+   $(srcdir)/ldap/admin/src/scripts/ds-replcheck \

which is not related to this change.

Thanks!

@vashirov Is /usr/lib64/dirsrv/python/ds_gdb.py okay for the rpm?

If you want it to be auto-loaded when running gdb against ns-slapd, the file name should be objfile-gdb.py, where objfile is ns-slapd, and it should be in the auto-load path for gdb (in this case /usr/share/gdb/auto-load/usr/sbin/ns-slapd-gdb.py); see https://sourceware.org/gdb/onlinedocs/gdb/objfile_002dgdbdotext-file.html#objfile_002dgdbdotext-file
Path /usr/share/gdb/auto-load belongs to many packages according to yum provides "/usr/share/gdb/auto-load/*", so I think it's pretty safe to do the same trick as others do:
http://pkgs.fedoraproject.org/cgit/rpms/gcc.git/tree/gcc.spec#n1451
http://pkgs.fedoraproject.org/cgit/rpms/mono.git/tree/mono.spec#n356
http://pkgs.fedoraproject.org/cgit/rpms/libreoffice.git/tree/libreoffice.spec#n1355

Mate, I could not work out where it was meant to go. That's genius. I'll try to get it to go there.

Also I'm getting errors while I'm trying to use it
NameError: name 'DSErrorLog' is not defined

Oops! I wonder why this was working for me ....

I guess you meant DSAccessLog()
Another thing, you have
+ $(srcdir)/ldap/admin/src/scripts/ds-replcheck \

which is not related to this change.
Thanks!

Yep, I think I was trying to solve something and forgot to take that out.

0001-Ticket-49242-add-gdb-script-to-rpm.patch

@vashirov This adds the script to the gdb autoload path as you described. It adds two commands, ds-backtrace and ds-access-log. Finally, it has a stack filter that indicates on backtraces (both ds-backtrace and bt) if a stack may be idle.

Just one minor thing:
File name changed, usage needs an update:

+# Usage: from within gdb call:
+#      source ds_gdb.py
+#

The rest looks good to me, but I'll leave the acking to developers.

I updated the usage for you :)

Metadata Update from @vashirov:
- Custom field reviewstatus adjusted to ack (was: review)

6 years ago

commit 5ecd8ec
To ssh://git@pagure.io/389-ds-base.git
c2c512e..5ecd8ec master -> master

Metadata Update from @firstyear:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

6 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/2301

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: fixed)

3 years ago
