#3924 "Corrupted" name of "Hello" method of org.freedesktop.DBus sssd sbus interface on Fedora Rawhide
Closed: Fixed 2 years ago by jhrozek. Opened 2 years ago by atikhonov.

Most of intg-tests started to fail on Fedora Rawhide CI machine recently.

Logs inspection results in the following possible reasons for fails:

() [sssd] [sbus_dispatch] (0x4000): Dispatching.
() [sssd] [sbus_method_handler] (0x2000): Received D-Bus method org.freedesktop.DBus.Hello on /org/freedesktop/DBus
() [sssd] [sbus_method_handler] (0x0020): Unknown method!
() [sssd] [sbus_dispatch] (0x4000): Dispatching.
() [sssd] [sbus_reply_check] (0x4000): D-Bus error [org.freedesktop.DBus.Error.UnknownMethod]: Hello
() [sssd] [sbus_connect_private_done] (0x0020): Unable to initialize connection [5]: Input/output error
() [sssd] [monitor_quit] (0x0040): Returned with: 3
() [sssd] [monitor_cleanup] (0x0010): Error removing pidfile! (2 [No such file or directory])

I have debugged issue a bit and it seems method name of first ("Hello") method on this interface is somehow "corrupted" right after creation of interface descriptor.
Right after:

    struct sbus_interface bus = SBUS_INTERFACE(
            SBUS_SYNC(METHOD, org_freedesktop_DBus, Hello, sbus_server_bus_hello, server),
            SBUS_SYNC(METHOD, org_freedesktop_DBus, RequestName, sbus_server_bus_request_name, server),
            SBUS_SYNC(METHOD, org_freedesktop_DBus, ReleaseName, sbus_server_bus_release_name, server),

descriptor contains following method names:


-- instead "Hello" there is "org.freedesktop.DBus"

Valgrind doesn't complain when sssd is run inside.

If all other methods in initializer are commented out then "Hello" method has correct name in interface decriptor. Add second method - name of frist is "corrupted".
(I put "corrupted" in "" because it doesn't seem to be memory corruption in the sense of buffer overflow or such)

Breakpoint 1, sbus_server_setup_interface (server=0x942260) at /shared/workspace/sssd/src/sbus/server/sbus_server_interface.c:387

... step after bus variable is initialized ...
(gdb) p bus.methods[0] = {name = 0x7f974e127a92 "Hello", handler = {type = SBUS_HANDLER_SYNC,   ...}
(gdb) p &bus.methods[0] = (const struct sbus_method *) 0x7fffbb5327d0
... step after paths variable is initialized ...
(gdb) p &paths = (struct sbus_path (*)[2]) 0x7fffbb5327d0
(gdb) p bus.methods[0]$4 = {name = 0x7f974e127b53 "/org/freedesktop/DBus", handler = {type = (unknown: 3142787664), ...}

It seems to be some error in gcc because bus.methods and paths are on the same address. paths overwrites bus.methods that is why we see corrupted name of the method.

Alexey, will you please report bug against gcc and move this forward with them?

Thank you.

Metadata Update from @atikhonov:
- Issue assigned to atikhonov (was: pbrezina)

2 years ago

I do not think it is error in gcc.

tl,dr: problem is bus.methods pointer is assigned to the address of temporary "anonymous" variable located on stack. Naturally, compiler is eligible to re-use stack as soon as this variable goes out of scope.

Detailed explanation
Invoking gcc -E results in the following bus initialization code (formatted for readability):

struct sbus_interface bus = 
    ({ sbus_interface(
        ((void *)0),
        (((const struct sbus_method[]) 
                /* ... compile time check of function signature omitted */ ; 
                sbus_method_sync(/* ... full list of params omitted */);

Here sbus_method_sync() is a function that returns instance of struct sbus_method by value.

Then array composed of those values on stack is passed as third argument to the fucntion sbus_interface()

struct sbus_interface
sbus_interface(const char *name,
               const struct sbus_annotation *annotations,
               const struct sbus_method *methods,
               const struct sbus_signal *signals,
               const struct sbus_property *properties)

Then sbus_interface() stores this pointer in bus.methods field. But as soon as sbus_interface() returns, this temporary array goes out of scope and compiler is fully eligible to re-use stack that was occupied by this array.
This is the reason you see in your test that bus.methods points to paths (and in my test I see that bus.methods points to ... bus itself).

I guess recent version of gcc in Fedora Rawhide makes smarter use of stack and so we now hit by this bug. Before it worked "by luck".

Metadata Update from @atikhonov:
- Issue assigned to pbrezina (was: atikhonov)

2 years ago

Metadata Update from @atikhonov:
- Issue tagged with: bug

2 years ago

Actually, problem is the same with other params of sbus_interface():

(const struct sbus_signal[])
    ({ sbus_signal("NameOwnerChanged",
                   ((void *)0));

-- sbus_signal() returns struct by value.
The same with properties.

This issue can be fixed with minimal effort either by making sbus_method/sbus_signal/etc factory methods to return object on heap or by changing sbus_interface() function to take const ptrs and copying content. As far as I understand this code is executed only once during initialization, so performance is not an issue. But I am not sure if any of those solutions are fine in overall context of sbus component.

What worries me that there seems to be no easy way to check other "macro-generated" parts of code for similar problems...

I see. You are correct. Using block to return value inside those macros limits scope of the returned pointers. Running gcc with -03 actually hides this problem so it is not reproducible with rawhide rpms.

I will provide a fix.

I see. You are correct. Using block to return value inside those macros limits scope of the returned pointers. Running gcc with -03 actually hides this problem so it is not reproducible with rawhide rpms.

Not sure how did you test that but I can easily reproduce on rawhide with -O2 and -O3
and infopipe does not work even with -O1 or -O0

Metadata Update from @jhrozek:
- Issue priority set to: blocker (was: minor)

2 years ago

Metadata Update from @jhrozek:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 years ago

Metadata Update from @jhrozek:
- Issue set to the milestone: SSSD 2.1

2 years ago

SSSD is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in SSSD's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/SSSD/sssd/issues/4909

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Login to comment on this ticket.