#2792 SSSD is not closing sockets properly
Closed: Fixed None Opened 3 years ago by ondrejv2.

After certain period of time process sssd-be tends to run out of system resources, hitting the maximum number of open files. It turns out it leaves behind lots of not properly closed sockets:

[root@login05 ondrejv]# ls -l /proc/`pgrep sssd_be`/fd/
total 0
lrwx------ 1 root root 64 Sep 16 09:50 0 -> /dev/null
lrwx------ 1 root root 64 Sep 16 09:50 1 -> /dev/null
lr-x------ 1 root root 64 Sep 16 09:50 10 -> inotify
lrwx------ 1 root root 64 Sep 16 09:50 100 -> socket:[158904170]
lrwx------ 1 root root 64 Sep 16 09:50 101 -> socket:[174418463]
lrwx------ 1 root root 64 Sep 16 09:50 102 -> socket:[158905068]
lrwx------ 1 root root 64 Sep 16 09:50 103 -> socket:[158906757]
lrwx------ 1 root root 64 Sep 16 09:50 104 -> socket:[166590564]
lrwx------ 1 root root 64 Sep 16 09:50 105 -> socket:[166990886]
lrwx------ 1 root root 64 Sep 16 09:50 106 -> socket:[174408591]
lrwx------ 1 root root 64 Sep 16 09:50 107 -> socket:[166991626]
lrwx------ 1 root root 64 Sep 16 09:50 108 -> socket:[166992492]
lrwx------ 1 root root 64 Sep 16 09:50 109 -> socket:[166993970]
l-wx------ 1 root root 64 Sep 16 09:50 11 -> /var/log/sssd/sssd_default.log
lrwx------ 1 root root 64 Sep 16 09:50 110 -> socket:[168650278]
lrwx------ 1 root root 64 Sep 16 09:50 111 -> socket:[172443487]
lrwx------ 1 root root 64 Sep 16 09:50 112 -> socket:[187704428]
lrwx------ 1 root root 64 Sep 16 09:50 113 -> socket:[182234435]
lrwx------ 1 root root 64 Sep 16 09:50 114 -> socket:[179190714]
lrwx------ 1 root root 64 Sep 16 09:50 115 -> socket:[184308683]
lrwx------ 1 root root 64 Sep 16 09:50 116 -> socket:[189328893]
lrwx------ 1 root root 64 Sep 16 09:50 117 -> socket:[189353567]
lrwx------ 1 root root 64 Sep 16 09:50 118 -> socket:[189343260]
lrwx------ 1 root root 64 Sep 16 09:50 119 -> socket:[189374973]
lrwx------ 1 root root 64 Sep 16 09:50 12 -> [eventpoll]
lrwx------ 1 root root 64 Sep 16 09:50 120 -> socket:[189367203]
lrwx------ 1 root root 64 Sep 16 09:50 121 -> socket:[189407282]
lrwx------ 1 root root 64 Sep 16 09:50 122 -> socket:[189389422]
lrwx------ 1 root root 64 Sep 16 09:50 123 -> socket:[189398015]
lrwx------ 1 root root 64 Sep 16 09:50 124 -> socket:[189321826]
lrwx------ 1 root root 64 Sep 16 09:50 125 -> socket:[191857375]
lrwx------ 1 root root 64 Sep 16 09:50 126 -> socket:[191669942]
lrwx------ 1 root root 64 Sep 16 09:50 127 -> socket:[189869722]
lrwx------ 1 root root 64 Sep 16 09:50 128 -> socket:[191848106]
lrwx------ 1 root root 64 Sep 16 09:50 129 -> socket:[189870357]
lrwx------ 1 root root 64 Sep 16 09:50 13 -> /var/lib/sss/db/cache_default.ldb
lrwx------ 1 root root 64 Sep 16 09:50 130 -> socket:[195279172]
lrwx------ 1 root root 64 Sep 16 09:50 131 -> socket:[209696954]
lrwx------ 1 root root 64 Sep 16 09:50 132 -> socket:[198582238]
lrwx------ 1 root root 64 Sep 16 09:50 133 -> socket:[202870686]
lrwx------ 1 root root 64 Sep 16 09:50 134 -> socket:[202881663]
lrwx------ 1 root root 64 Sep 16 09:50 135 -> socket:[218798231]
lrwx------ 1 root root 64 Sep 16 09:50 136 -> socket:[215428278]
lrwx------ 1 root root 64 Sep 16 09:50 137 -> socket:[220534921]
lrwx------ 1 root root 64 Sep 16 09:50 138 -> socket:[218807007]
lrwx------ 1 root root 64 Sep 16 09:50 139 -> socket:[218817269]
lrwx------ 1 root root 64 Sep 16 09:50 14 -> socket:[22419983]
lrwx------ 1 root root 64 Sep 16 09:50 140 -> socket:[220525178]
lrwx------ 1 root root 64 Sep 16 09:50 141 -> socket:[220549570]
lrwx------ 1 root root 64 Sep 16 09:50 142 -> socket:[222279094]
lrwx------ 1 root root 64 Sep 16 09:50 143 -> socket:[230783349]
lrwx------ 1 root root 64 Sep 16 09:50 144 -> socket:[225518746]
lrwx------ 1 root root 64 Sep 16 09:50 145 -> socket:[230774051]
lrwx------ 1 root root 64 Sep 16 09:50 146 -> socket:[237821496]
lrwx------ 1 root root 64 Sep 16 09:50 147 -> socket:[237813267]
lrwx------ 1 root root 64 Sep 16 09:50 148 -> socket:[251038692]
lrwx------ 1 root root 64 Sep 16 09:50 149 -> socket:[239610812]
lrwx------ 1 root root 64 Sep 16 09:50 15 -> socket:[22419986]
lrwx------ 1 root root 64 Sep 16 09:50 150 -> socket:[239547005]
lrwx------ 1 root root 64 Sep 16 09:50 151 -> socket:[239621136]
lrwx------ 1 root root 64 Sep 16 09:50 152 -> socket:[242866383]
lrwx------ 1 root root 64 Sep 16 09:50 153 -> socket:[247910227]
lrwx------ 1 root root 64 Sep 16 09:50 154 -> socket:[248831180]
lrwx------ 1 root root 64 Sep 16 09:50 155 -> socket:[248050545]
lrwx------ 1 root root 64 Sep 16 09:50 156 -> socket:[248975173]
lrwx------ 1 root root 64 Sep 16 09:50 157 -> socket:[253786932]
lrwx------ 1 root root 64 Sep 16 09:50 158 -> socket:[253778043]
lrwx------ 1 root root 64 Sep 16 09:50 159 -> socket:[253633523]
lrwx------ 1 root root 64 Sep 16 09:50 16 -> socket:[22419990]
lrwx------ 1 root root 64 Sep 16 09:50 160 -> socket:[253782320]
lrwx------ 1 root root 64 Sep 16 09:50 161 -> socket:[264670302]
lrwx------ 1 root root 64 Sep 16 09:50 162 -> socket:[253946603]
lrwx------ 1 root root 64 Sep 16 09:50 163 -> socket:[258063353]
lrwx------ 1 root root 64 Sep 16 09:50 164 -> socket:[253946837]
lrwx------ 1 root root 64 Sep 16 09:50 165 -> socket:[253947206]
lrwx------ 1 root root 64 Sep 16 09:50 166 -> socket:[253947641]
lrwx------ 1 root root 64 Sep 16 09:50 167 -> socket:[260200790]
lrwx------ 1 root root 64 Sep 16 09:50 168 -> socket:[258786272]
lrwx------ 1 root root 64 Sep 16 09:50 169 -> socket:[259133490]
l-wx------ 1 root root 64 Sep 16 09:50 17 -> /var/log/sssd/ldap_child.log
lrwx------ 1 root root 64 Sep 16 09:50 170 -> socket:[266886828]
lrwx------ 1 root root 64 Sep 16 09:50 171 -> socket:[266890082]
lrwx------ 1 root root 64 Sep 16 09:50 172 -> socket:[270535370]
lrwx------ 1 root root 64 Sep 16 09:50 173 -> socket:[274378300]
lrwx------ 1 root root 64 Sep 16 09:50 174 -> socket:[270631237]
lrwx------ 1 root root 64 Sep 16 09:50 175 -> socket:[270819551]
....

Fields changed

description: After certain period of time process sssd-be tends to run out of system resources, hitting the maximum number of open files. It turns out it leaves behind lots of not properly closed sockets:
[root@login05 ondrejv]# ls -l /proc/pgrep sssd_be/fd/
total 0
lrwx------ 1 root root 64 Sep 16 09:50 0 -> /dev/null
lrwx------ 1 root root 64 Sep 16 09:50 1 -> /dev/null
lr-x------ 1 root root 64 Sep 16 09:50 10 -> inotify
lrwx------ 1 root root 64 Sep 16 09:50 100 -> socket:[158904170]
lrwx------ 1 root root 64 Sep 16 09:50 101 -> socket:[174418463]
lrwx------ 1 root root 64 Sep 16 09:50 102 -> socket:[158905068]
lrwx------ 1 root root 64 Sep 16 09:50 103 -> socket:[158906757]
lrwx------ 1 root root 64 Sep 16 09:50 104 -> socket:[166590564]
lrwx------ 1 root root 64 Sep 16 09:50 105 -> socket:[166990886]
lrwx------ 1 root root 64 Sep 16 09:50 106 -> socket:[174408591]
lrwx------ 1 root root 64 Sep 16 09:50 107 -> socket:[166991626]
lrwx------ 1 root root 64 Sep 16 09:50 108 -> socket:[166992492]
lrwx------ 1 root root 64 Sep 16 09:50 109 -> socket:[166993970]
l-wx------ 1 root root 64 Sep 16 09:50 11 -> /var/log/sssd/sssd_default.log
lrwx------ 1 root root 64 Sep 16 09:50 110 -> socket:[168650278]
lrwx------ 1 root root 64 Sep 16 09:50 111 -> socket:[172443487]
lrwx------ 1 root root 64 Sep 16 09:50 112 -> socket:[187704428]
lrwx------ 1 root root 64 Sep 16 09:50 113 -> socket:[182234435]
lrwx------ 1 root root 64 Sep 16 09:50 114 -> socket:[179190714]
lrwx------ 1 root root 64 Sep 16 09:50 115 -> socket:[184308683]
lrwx------ 1 root root 64 Sep 16 09:50 116 -> socket:[189328893]
lrwx------ 1 root root 64 Sep 16 09:50 117 -> socket:[189353567]
lrwx------ 1 root root 64 Sep 16 09:50 118 -> socket:[189343260]
lrwx------ 1 root root 64 Sep 16 09:50 119 -> socket:[189374973]
lrwx------ 1 root root 64 Sep 16 09:50 12 -> [eventpoll]
lrwx------ 1 root root 64 Sep 16 09:50 120 -> socket:[189367203]
lrwx------ 1 root root 64 Sep 16 09:50 121 -> socket:[189407282]
lrwx------ 1 root root 64 Sep 16 09:50 122 -> socket:[189389422]
lrwx------ 1 root root 64 Sep 16 09:50 123 -> socket:[189398015]
lrwx------ 1 root root 64 Sep 16 09:50 124 -> socket:[189321826]
lrwx------ 1 root root 64 Sep 16 09:50 125 -> socket:[191857375]
lrwx------ 1 root root 64 Sep 16 09:50 126 -> socket:[191669942]
lrwx------ 1 root root 64 Sep 16 09:50 127 -> socket:[189869722]
lrwx------ 1 root root 64 Sep 16 09:50 128 -> socket:[191848106]
lrwx------ 1 root root 64 Sep 16 09:50 129 -> socket:[189870357]
lrwx------ 1 root root 64 Sep 16 09:50 13 -> /var/lib/sss/db/cache_default.ldb
lrwx------ 1 root root 64 Sep 16 09:50 130 -> socket:[195279172]
lrwx------ 1 root root 64 Sep 16 09:50 131 -> socket:[209696954]
lrwx------ 1 root root 64 Sep 16 09:50 132 -> socket:[198582238]
lrwx------ 1 root root 64 Sep 16 09:50 133 -> socket:[202870686]
lrwx------ 1 root root 64 Sep 16 09:50 134 -> socket:[202881663]
lrwx------ 1 root root 64 Sep 16 09:50 135 -> socket:[218798231]
lrwx------ 1 root root 64 Sep 16 09:50 136 -> socket:[215428278]
lrwx------ 1 root root 64 Sep 16 09:50 137 -> socket:[220534921]
lrwx------ 1 root root 64 Sep 16 09:50 138 -> socket:[218807007]
lrwx------ 1 root root 64 Sep 16 09:50 139 -> socket:[218817269]
lrwx------ 1 root root 64 Sep 16 09:50 14 -> socket:[22419983]
lrwx------ 1 root root 64 Sep 16 09:50 140 -> socket:[220525178]
lrwx------ 1 root root 64 Sep 16 09:50 141 -> socket:[220549570]
lrwx------ 1 root root 64 Sep 16 09:50 142 -> socket:[222279094]
lrwx------ 1 root root 64 Sep 16 09:50 143 -> socket:[230783349]
lrwx------ 1 root root 64 Sep 16 09:50 144 -> socket:[225518746]
lrwx------ 1 root root 64 Sep 16 09:50 145 -> socket:[230774051]
lrwx------ 1 root root 64 Sep 16 09:50 146 -> socket:[237821496]
lrwx------ 1 root root 64 Sep 16 09:50 147 -> socket:[237813267]
lrwx------ 1 root root 64 Sep 16 09:50 148 -> socket:[251038692]
lrwx------ 1 root root 64 Sep 16 09:50 149 -> socket:[239610812]
lrwx------ 1 root root 64 Sep 16 09:50 15 -> socket:[22419986]
lrwx------ 1 root root 64 Sep 16 09:50 150 -> socket:[239547005]
lrwx------ 1 root root 64 Sep 16 09:50 151 -> socket:[239621136]
lrwx------ 1 root root 64 Sep 16 09:50 152 -> socket:[242866383]
lrwx------ 1 root root 64 Sep 16 09:50 153 -> socket:[247910227]
lrwx------ 1 root root 64 Sep 16 09:50 154 -> socket:[248831180]
lrwx------ 1 root root 64 Sep 16 09:50 155 -> socket:[248050545]
lrwx------ 1 root root 64 Sep 16 09:50 156 -> socket:[248975173]
lrwx------ 1 root root 64 Sep 16 09:50 157 -> socket:[253786932]
lrwx------ 1 root root 64 Sep 16 09:50 158 -> socket:[253778043]
lrwx------ 1 root root 64 Sep 16 09:50 159 -> socket:[253633523]
lrwx------ 1 root root 64 Sep 16 09:50 16 -> socket:[22419990]
lrwx------ 1 root root 64 Sep 16 09:50 160 -> socket:[253782320]
lrwx------ 1 root root 64 Sep 16 09:50 161 -> socket:[264670302]
lrwx------ 1 root root 64 Sep 16 09:50 162 -> socket:[253946603]
lrwx------ 1 root root 64 Sep 16 09:50 163 -> socket:[258063353]
lrwx------ 1 root root 64 Sep 16 09:50 164 -> socket:[253946837]
lrwx------ 1 root root 64 Sep 16 09:50 165 -> socket:[253947206]
lrwx------ 1 root root 64 Sep 16 09:50 166 -> socket:[253947641]
lrwx------ 1 root root 64 Sep 16 09:50 167 -> socket:[260200790]
lrwx------ 1 root root 64 Sep 16 09:50 168 -> socket:[258786272]
lrwx------ 1 root root 64 Sep 16 09:50 169 -> socket:[259133490]
l-wx------ 1 root root 64 Sep 16 09:50 17 -> /var/log/sssd/ldap_child.log
lrwx------ 1 root root 64 Sep 16 09:50 170 -> socket:[266886828]
lrwx------ 1 root root 64 Sep 16 09:50 171 -> socket:[266890082]
lrwx------ 1 root root 64 Sep 16 09:50 172 -> socket:[270535370]
lrwx------ 1 root root 64 Sep 16 09:50 173 -> socket:[274378300]
lrwx------ 1 root root 64 Sep 16 09:50 174 -> socket:[270631237]
lrwx------ 1 root root 64 Sep 16 09:50 175 -> socket:[270819551]
.... => After certain period of time process sssd-be tends to run out of system resources, hitting the maximum number of open files. It turns out it leaves behind lots of not properly closed sockets:
{{{
[root@login05 ondrejv]# ls -l /proc/pgrep sssd_be/fd/
total 0
lrwx------ 1 root root 64 Sep 16 09:50 0 -> /dev/null
lrwx------ 1 root root 64 Sep 16 09:50 1 -> /dev/null
lr-x------ 1 root root 64 Sep 16 09:50 10 -> inotify
lrwx------ 1 root root 64 Sep 16 09:50 100 -> socket:[158904170]
lrwx------ 1 root root 64 Sep 16 09:50 101 -> socket:[174418463]
lrwx------ 1 root root 64 Sep 16 09:50 102 -> socket:[158905068]
lrwx------ 1 root root 64 Sep 16 09:50 103 -> socket:[158906757]
lrwx------ 1 root root 64 Sep 16 09:50 104 -> socket:[166590564]
lrwx------ 1 root root 64 Sep 16 09:50 105 -> socket:[166990886]
lrwx------ 1 root root 64 Sep 16 09:50 106 -> socket:[174408591]
lrwx------ 1 root root 64 Sep 16 09:50 107 -> socket:[166991626]
lrwx------ 1 root root 64 Sep 16 09:50 108 -> socket:[166992492]
lrwx------ 1 root root 64 Sep 16 09:50 109 -> socket:[166993970]
l-wx------ 1 root root 64 Sep 16 09:50 11 -> /var/log/sssd/sssd_default.log
lrwx------ 1 root root 64 Sep 16 09:50 110 -> socket:[168650278]
lrwx------ 1 root root 64 Sep 16 09:50 111 -> socket:[172443487]
lrwx------ 1 root root 64 Sep 16 09:50 112 -> socket:[187704428]
lrwx------ 1 root root 64 Sep 16 09:50 113 -> socket:[182234435]
lrwx------ 1 root root 64 Sep 16 09:50 114 -> socket:[179190714]
lrwx------ 1 root root 64 Sep 16 09:50 115 -> socket:[184308683]
lrwx------ 1 root root 64 Sep 16 09:50 116 -> socket:[189328893]
lrwx------ 1 root root 64 Sep 16 09:50 117 -> socket:[189353567]
lrwx------ 1 root root 64 Sep 16 09:50 118 -> socket:[189343260]
lrwx------ 1 root root 64 Sep 16 09:50 119 -> socket:[189374973]
lrwx------ 1 root root 64 Sep 16 09:50 12 -> [eventpoll]
lrwx------ 1 root root 64 Sep 16 09:50 120 -> socket:[189367203]
lrwx------ 1 root root 64 Sep 16 09:50 121 -> socket:[189407282]
lrwx------ 1 root root 64 Sep 16 09:50 122 -> socket:[189389422]
lrwx------ 1 root root 64 Sep 16 09:50 123 -> socket:[189398015]
lrwx------ 1 root root 64 Sep 16 09:50 124 -> socket:[189321826]
lrwx------ 1 root root 64 Sep 16 09:50 125 -> socket:[191857375]
lrwx------ 1 root root 64 Sep 16 09:50 126 -> socket:[191669942]
lrwx------ 1 root root 64 Sep 16 09:50 127 -> socket:[189869722]
lrwx------ 1 root root 64 Sep 16 09:50 128 -> socket:[191848106]
lrwx------ 1 root root 64 Sep 16 09:50 129 -> socket:[189870357]
lrwx------ 1 root root 64 Sep 16 09:50 13 -> /var/lib/sss/db/cache_default.ldb
lrwx------ 1 root root 64 Sep 16 09:50 130 -> socket:[195279172]
lrwx------ 1 root root 64 Sep 16 09:50 131 -> socket:[209696954]
lrwx------ 1 root root 64 Sep 16 09:50 132 -> socket:[198582238]
lrwx------ 1 root root 64 Sep 16 09:50 133 -> socket:[202870686]
lrwx------ 1 root root 64 Sep 16 09:50 134 -> socket:[202881663]
lrwx------ 1 root root 64 Sep 16 09:50 135 -> socket:[218798231]
lrwx------ 1 root root 64 Sep 16 09:50 136 -> socket:[215428278]
lrwx------ 1 root root 64 Sep 16 09:50 137 -> socket:[220534921]
lrwx------ 1 root root 64 Sep 16 09:50 138 -> socket:[218807007]
lrwx------ 1 root root 64 Sep 16 09:50 139 -> socket:[218817269]
lrwx------ 1 root root 64 Sep 16 09:50 14 -> socket:[22419983]
lrwx------ 1 root root 64 Sep 16 09:50 140 -> socket:[220525178]
lrwx------ 1 root root 64 Sep 16 09:50 141 -> socket:[220549570]
lrwx------ 1 root root 64 Sep 16 09:50 142 -> socket:[222279094]
lrwx------ 1 root root 64 Sep 16 09:50 143 -> socket:[230783349]
lrwx------ 1 root root 64 Sep 16 09:50 144 -> socket:[225518746]
lrwx------ 1 root root 64 Sep 16 09:50 145 -> socket:[230774051]
lrwx------ 1 root root 64 Sep 16 09:50 146 -> socket:[237821496]
lrwx------ 1 root root 64 Sep 16 09:50 147 -> socket:[237813267]
lrwx------ 1 root root 64 Sep 16 09:50 148 -> socket:[251038692]
lrwx------ 1 root root 64 Sep 16 09:50 149 -> socket:[239610812]
lrwx------ 1 root root 64 Sep 16 09:50 15 -> socket:[22419986]
lrwx------ 1 root root 64 Sep 16 09:50 150 -> socket:[239547005]
lrwx------ 1 root root 64 Sep 16 09:50 151 -> socket:[239621136]
lrwx------ 1 root root 64 Sep 16 09:50 152 -> socket:[242866383]
lrwx------ 1 root root 64 Sep 16 09:50 153 -> socket:[247910227]
lrwx------ 1 root root 64 Sep 16 09:50 154 -> socket:[248831180]
lrwx------ 1 root root 64 Sep 16 09:50 155 -> socket:[248050545]
lrwx------ 1 root root 64 Sep 16 09:50 156 -> socket:[248975173]
lrwx------ 1 root root 64 Sep 16 09:50 157 -> socket:[253786932]
lrwx------ 1 root root 64 Sep 16 09:50 158 -> socket:[253778043]
lrwx------ 1 root root 64 Sep 16 09:50 159 -> socket:[253633523]
lrwx------ 1 root root 64 Sep 16 09:50 16 -> socket:[22419990]
lrwx------ 1 root root 64 Sep 16 09:50 160 -> socket:[253782320]
lrwx------ 1 root root 64 Sep 16 09:50 161 -> socket:[264670302]
lrwx------ 1 root root 64 Sep 16 09:50 162 -> socket:[253946603]
lrwx------ 1 root root 64 Sep 16 09:50 163 -> socket:[258063353]
lrwx------ 1 root root 64 Sep 16 09:50 164 -> socket:[253946837]
lrwx------ 1 root root 64 Sep 16 09:50 165 -> socket:[253947206]
lrwx------ 1 root root 64 Sep 16 09:50 166 -> socket:[253947641]
lrwx------ 1 root root 64 Sep 16 09:50 167 -> socket:[260200790]
lrwx------ 1 root root 64 Sep 16 09:50 168 -> socket:[258786272]
lrwx------ 1 root root 64 Sep 16 09:50 169 -> socket:[259133490]
l-wx------ 1 root root 64 Sep 16 09:50 17 -> /var/log/sssd/ldap_child.log
lrwx------ 1 root root 64 Sep 16 09:50 170 -> socket:[266886828]
lrwx------ 1 root root 64 Sep 16 09:50 171 -> socket:[266890082]
lrwx------ 1 root root 64 Sep 16 09:50 172 -> socket:[270535370]
lrwx------ 1 root root 64 Sep 16 09:50 173 -> socket:[274378300]
lrwx------ 1 root root 64 Sep 16 09:50 174 -> socket:[270631237]
lrwx------ 1 root root 64 Sep 16 09:50 175 -> socket:[270819551]
....
}}}

Do these sockets correspond with a high number of open files in /var/lib/sss/pipes ?

No:

[root@login05 ondrejv]# ls -lR /var/lib/sss/pipes
/var/lib/sss/pipes:
total 4
srw-rw-rw-  1 root root    0 Aug 12 17:39 autofs
srw-rw-rw-  1 root root    0 Aug 12 17:39 nss
srw-rw-rw-  1 root root    0 Aug 12 17:39 pam
drwx------. 2 root root 4096 Aug 12 17:39 private

/var/lib/sss/pipes/private:
total 0
srw------- 1 root root  0 Aug 12 17:39 pam
lrwxrwxrwx 1 root root 48 Aug 12 17:39 sbus-dp_default -> /var/lib/sss/pipes/private/sbus-dp_default.30794
srw------- 1 root root  0 Aug 12 17:39 sbus-dp_default.30794
srw------- 1 root root  0 Aug 12 17:39 sbus-monitor

Fields changed

owner: somebody => lslebodn

Hi, shall I file an official RH support request for this one?
It would be nice if we had it fixed for next update.

Replying to [comment:5 ondrejv2]:

Hi, shall I file an official RH support request for this one?
It would be nice if we had it fixed for next update.

Yes please.

I assume that all unclosed unix socket are to /var/lib/sss/pipes/private/sbus-dp_default.30794.
It could be proven with command "lsof -U | grep sssd_be".

It's not clear to me from code how could it happen. There might be some debug messages which could help to reveal the bug. You wrote on sssd-users that you do not have a deterministic reproducer. Would it be possible to enable debugging in domain process? If the full debug_level is not an option than it would be good to use at least 3 (or 6).

# lsof -p `pidof sssd_be`
...
sssd_be 31496 root   23u  sock                0,6      0t0  7670577 can't identify protocol
sssd_be 31496 root   24u  sock                0,6      0t0  7668937 can't identify protocol
sssd_be 31496 root   25u  sock                0,6      0t0  7711331 can't identify protocol
sssd_be 31496 root   26u  sock                0,6      0t0  7672347 can't identify protocol
sssd_be 31496 root   27u  sock                0,6      0t0  7733537 can't identify protocol
sssd_be 31496 root   28u  sock                0,6      0t0  7675184 can't identify protocol
sssd_be 31496 root   29u  sock                0,6      0t0  7766793 can't identify protocol
sssd_be 31496 root   30u  sock                0,6      0t0  7743406 can't identify protocol
sssd_be 31496 root   31u  sock                0,6      0t0  7747472 can't identify protocol
sssd_be 31496 root   32u  sock                0,6      0t0  7789642 can't identify protocol
sssd_be 31496 root   33u  sock                0,6      0t0  7750980 can't identify protocol
sssd_be 31496 root   34u  sock                0,6      0t0  7804859 can't identify protocol
...

No, lsof -U does not show anything interesting. Looks like these are some strange sockets left behind.
My wild guess is that it is somehow also related to the ticket I reported earlier - regarding AD service discovery issues. Would it be possible that if connection to AD DC times out (i.e. firewall blocking access), we do not do a proper cleanup?

No, I can replicate it quite easily - actually right now I need to put in place a cron job restarting SSSD every week.

RH support case #01516475 opened for this

I can see many problems with resolving dnf records:

[sssd[be[default]]] [fo_resolve_service_timeout] (0x0080): Service resolving timeout reached

There is almost 400 problems within 5 days

$ grep fo_resolve_service_timeout sssd_default.log | wc -l
391

It can be a root of problem. I can try to simulate it.

Meanwhile, it would be good if you could identify when new fd is open in sssd_be and provide log files from that time. Because issues with network needn't cause fd leaks.

I can also see many problems related to nsupdate. If you do not need this feature. I would be good to disable it in domain section (@see man sssd-ad -> dyndns_update). So log file will not be flooded with unrelated failures.

Hi,
I have just restarted sssd and see new unidentified socket has been created shortly after daemon restart. One leak (by "leak" I mean socket where lsof can't identify protocol) is created immediately after daemon restart, second one at about Tue Oct 6 18:44:21 CEST 2015 (or say few seconds before this time).

See the log file attached.
As per dyndns - yes we would love to have this functional, but it is not working for some reason. Different story (maybe different ticket). I will disable it next time so the logs are more readable.

Update - I have just disabled DynDNS by
dyndns_update = false
ant it seems to me that it is not leaking any more. Will confirm tomorrow.

Ok, disabling dyndns seems to have helped a little bit, but not much. The trouble still persists.

I have a suspicion it's somehow related to moving from online to offline status.

Could you try to simulate it with sending signals to sssd process. (USR1, USR2) one signal force sssd to go online mode and another to offline. It would be good to know whether it causes FD leaks.

I'm still not able to reproduce it.

I found also something else in sssd, but it's not very related to the FD leaks.

Could you tell what are permissions of file /etc/krb5.conf

(Tue Oct  6 18:43:13 2015) [sssd[be[default]]] [sss_write_domain_mappings] (0x0200): Mapping file for domain [default] is [/var/lib/sss/pubconf/krb5.include.d/domain_realm_default]
(Tue Oct  6 18:43:13 2015) [sssd[be[default]]] [sss_krb5_touch_config] (0x0020): Unable to change mtime of "/etc/krb5.conf" [13]: Permission denied
(Tue Oct  6 18:43:13 2015) [sssd[be[default]]] [sss_write_domain_mappings] (0x0020): Unable to change last modification time of krb5.conf. Created mappings may not be loaded.
}}}

Attaching a fresh log with dyndns disabled.
Not sure where the permission problem comes from. I might have captured the previous log froma diskless machine. Diskless machines can not modify /etc/krb5.conf as this file is shared for all of them.

Tried the Kill -USR1 and -USR2 at about 14:01 today.
First attempt - yes, new FD leaked indeed. Second & third attempt to do the same (i.e. kill -USR1,2) - no further FD leaked.

After some time of analysis, I can see a FD leaks in log files.

grep "Using file descriptor" sssd_default.log.leak.last | head
(Thu Oct  8 10:21:19 2015) [sssd[be[default]]] [sss_ldap_init_send] (0x4000): Using file descriptor [22] for LDAP connection.
(Thu Oct  8 10:21:25 2015) [sssd[be[default]]] [sss_ldap_init_send] (0x4000): Using file descriptor [24] for LDAP connection.
(Thu Oct  8 10:21:31 2015) [sssd[be[default]]] [sss_ldap_init_send] (0x4000): Using file descriptor [25] for LDAP connection.
(Thu Oct  8 10:21:31 2015) [sssd[be[default]]] [sss_ldap_init_send] (0x4000): Using file descriptor [25] for LDAP connection.
(Thu Oct  8 10:21:34 2015) [sssd[be[default]]] [sss_ldap_init_send] (0x4000): Using file descriptor [25] for LDAP connection.
(Thu Oct  8 10:21:35 2015) [sssd[be[default]]] [sss_ldap_init_send] (0x4000): Using file descriptor [25] for LDAP connection.
(Thu Oct  8 10:22:41 2015) [sssd[be[default]]] [sss_ldap_init_send] (0x4000): Using file descriptor [25] for LDAP connection.
(Thu Oct  8 10:24:06 2015) [sssd[be[default]]] [sss_ldap_init_send] (0x4000): Using file descriptor [26] for LDAP connection.
(Thu Oct  8 10:26:34 2015) [sssd[be[default]]] [sss_ldap_init_send] (0x4000): Using file descriptor [27] for LDAP connection.
(Thu Oct  8 10:26:34 2015) [sssd[be[default]]] [sss_ldap_init_send] (0x4000): Using file descriptor [28] for LDAP connection.

Fields changed

milestone: NEEDS_TRIAGE => SSSD 1.13.3
priority: major => critical

I checked some changes in related code between sssd-1-12 branch and sssd-1-13. There are few patches which are only in 1.13 branch.

Would it be possible for you to test 1.13 packages on testing environment?
https://copr.fedoraproject.org/coprs/lslebodn/sssd-1-13/

Tried. Yes the issue persists. Seem like I have been right, it is closely related to the issue I reported before (i.e. few DCs behind a firewall).
Look at the attached lsof trace over a time (10 mins intervals).
See how are the "SYN_SENT" connection attempts related to the FD leaked. Matches 100%.

Fields changed

status: new => assigned

I have a WIP patch for this leak but I found out a workaround with changing timeouts in sssd.conf.

The leak will disappear if the value of dns_resolver_timeout will be higher than value of ldap_network_timeout. If you didn't change the calue of these options than adding following snippet to domain section should help

dns_resolver_timeout = 7

Added, this option to the domain section - did not help (at least with 1.12.4).
sssd_be is still leaking sockets.

It helped for me. But I tested just with two sites.
I will send patch for review to upstream mailing list and then I can prepare test build for you.

Fields changed

patch: 0 => 1

Tried, it fixed the problem, indeed.
Good stuff!
Please let me know when I can expect the fix in RH-6 repo.

Awesome, moving up.

milestone: SSSD 1.13.3 => SSSD 1.13.2

Replying to [comment:27 ondrejv2]:

Tried, it fixed the problem, indeed.
Good stuff!
Please let me know when I can expect the fix in RH-6 repo.

So far it's planned for 6.8, please push in support case if you need the fix sooner...

Fields changed

resolution: => fixed
status: assigned => closed

Metadata Update from @ondrejv2:
- Issue assigned to lslebodn
- Issue set to the milestone: SSSD 1.13.2

2 years ago

Login to comment on this ticket.

Metadata