After a certain period of time, the sssd_be process tends to run out of system resources, hitting the maximum number of open files. It turns out it leaves behind a lot of sockets that are not properly closed:
[root@login05 ondrejv]# ls -l /proc/`pgrep sssd_be`/fd/ total 0 lrwx------ 1 root root 64 Sep 16 09:50 0 -> /dev/null lrwx------ 1 root root 64 Sep 16 09:50 1 -> /dev/null lr-x------ 1 root root 64 Sep 16 09:50 10 -> inotify lrwx------ 1 root root 64 Sep 16 09:50 100 -> socket:[158904170] lrwx------ 1 root root 64 Sep 16 09:50 101 -> socket:[174418463] lrwx------ 1 root root 64 Sep 16 09:50 102 -> socket:[158905068] lrwx------ 1 root root 64 Sep 16 09:50 103 -> socket:[158906757] lrwx------ 1 root root 64 Sep 16 09:50 104 -> socket:[166590564] lrwx------ 1 root root 64 Sep 16 09:50 105 -> socket:[166990886] lrwx------ 1 root root 64 Sep 16 09:50 106 -> socket:[174408591] lrwx------ 1 root root 64 Sep 16 09:50 107 -> socket:[166991626] lrwx------ 1 root root 64 Sep 16 09:50 108 -> socket:[166992492] lrwx------ 1 root root 64 Sep 16 09:50 109 -> socket:[166993970] l-wx------ 1 root root 64 Sep 16 09:50 11 -> /var/log/sssd/sssd_default.log lrwx------ 1 root root 64 Sep 16 09:50 110 -> socket:[168650278] lrwx------ 1 root root 64 Sep 16 09:50 111 -> socket:[172443487] lrwx------ 1 root root 64 Sep 16 09:50 112 -> socket:[187704428] lrwx------ 1 root root 64 Sep 16 09:50 113 -> socket:[182234435] lrwx------ 1 root root 64 Sep 16 09:50 114 -> socket:[179190714] lrwx------ 1 root root 64 Sep 16 09:50 115 -> socket:[184308683] lrwx------ 1 root root 64 Sep 16 09:50 116 -> socket:[189328893] lrwx------ 1 root root 64 Sep 16 09:50 117 -> socket:[189353567] lrwx------ 1 root root 64 Sep 16 09:50 118 -> socket:[189343260] lrwx------ 1 root root 64 Sep 16 09:50 119 -> socket:[189374973] lrwx------ 1 root root 64 Sep 16 09:50 12 -> [eventpoll] lrwx------ 1 root root 64 Sep 16 09:50 120 -> socket:[189367203] lrwx------ 1 root root 64 Sep 16 09:50 121 -> socket:[189407282] lrwx------ 1 root root 64 Sep 16 09:50 122 -> socket:[189389422] lrwx------ 1 root root 64 Sep 16 09:50 123 -> socket:[189398015] lrwx------ 1 root root 64 Sep 16 09:50 124 -> socket:[189321826] lrwx------ 1 root root 64 Sep 16 09:50 125 -> socket:[191857375] lrwx------ 1 root root 64 Sep 16 09:50 126 -> socket:[191669942] lrwx------ 1 root root 64 Sep 16 09:50 127 -> socket:[189869722] lrwx------ 1 root root 64 Sep 16 09:50 128 -> socket:[191848106] lrwx------ 1 root root 64 Sep 16 09:50 129 -> socket:[189870357] lrwx------ 1 root root 64 Sep 16 09:50 13 -> /var/lib/sss/db/cache_default.ldb lrwx------ 1 root root 64 Sep 16 09:50 130 -> socket:[195279172] lrwx------ 1 root root 64 Sep 16 09:50 131 -> socket:[209696954] lrwx------ 1 root root 64 Sep 16 09:50 132 -> socket:[198582238] lrwx------ 1 root root 64 Sep 16 09:50 133 -> socket:[202870686] lrwx------ 1 root root 64 Sep 16 09:50 134 -> socket:[202881663] lrwx------ 1 root root 64 Sep 16 09:50 135 -> socket:[218798231] lrwx------ 1 root root 64 Sep 16 09:50 136 -> socket:[215428278] lrwx------ 1 root root 64 Sep 16 09:50 137 -> socket:[220534921] lrwx------ 1 root root 64 Sep 16 09:50 138 -> socket:[218807007] lrwx------ 1 root root 64 Sep 16 09:50 139 -> socket:[218817269] lrwx------ 1 root root 64 Sep 16 09:50 14 -> socket:[22419983] lrwx------ 1 root root 64 Sep 16 09:50 140 -> socket:[220525178] lrwx------ 1 root root 64 Sep 16 09:50 141 -> socket:[220549570] lrwx------ 1 root root 64 Sep 16 09:50 142 -> socket:[222279094] lrwx------ 1 root root 64 Sep 16 09:50 143 -> socket:[230783349] lrwx------ 1 root root 64 Sep 16 09:50 144 -> socket:[225518746] lrwx------ 1 root root 64 Sep 16 09:50 145 -> socket:[230774051] lrwx------ 1 root root 64 Sep 16 09:50 146 -> socket:[237821496] 
lrwx------ 1 root root 64 Sep 16 09:50 147 -> socket:[237813267] lrwx------ 1 root root 64 Sep 16 09:50 148 -> socket:[251038692] lrwx------ 1 root root 64 Sep 16 09:50 149 -> socket:[239610812] lrwx------ 1 root root 64 Sep 16 09:50 15 -> socket:[22419986] lrwx------ 1 root root 64 Sep 16 09:50 150 -> socket:[239547005] lrwx------ 1 root root 64 Sep 16 09:50 151 -> socket:[239621136] lrwx------ 1 root root 64 Sep 16 09:50 152 -> socket:[242866383] lrwx------ 1 root root 64 Sep 16 09:50 153 -> socket:[247910227] lrwx------ 1 root root 64 Sep 16 09:50 154 -> socket:[248831180] lrwx------ 1 root root 64 Sep 16 09:50 155 -> socket:[248050545] lrwx------ 1 root root 64 Sep 16 09:50 156 -> socket:[248975173] lrwx------ 1 root root 64 Sep 16 09:50 157 -> socket:[253786932] lrwx------ 1 root root 64 Sep 16 09:50 158 -> socket:[253778043] lrwx------ 1 root root 64 Sep 16 09:50 159 -> socket:[253633523] lrwx------ 1 root root 64 Sep 16 09:50 16 -> socket:[22419990] lrwx------ 1 root root 64 Sep 16 09:50 160 -> socket:[253782320] lrwx------ 1 root root 64 Sep 16 09:50 161 -> socket:[264670302] lrwx------ 1 root root 64 Sep 16 09:50 162 -> socket:[253946603] lrwx------ 1 root root 64 Sep 16 09:50 163 -> socket:[258063353] lrwx------ 1 root root 64 Sep 16 09:50 164 -> socket:[253946837] lrwx------ 1 root root 64 Sep 16 09:50 165 -> socket:[253947206] lrwx------ 1 root root 64 Sep 16 09:50 166 -> socket:[253947641] lrwx------ 1 root root 64 Sep 16 09:50 167 -> socket:[260200790] lrwx------ 1 root root 64 Sep 16 09:50 168 -> socket:[258786272] lrwx------ 1 root root 64 Sep 16 09:50 169 -> socket:[259133490] l-wx------ 1 root root 64 Sep 16 09:50 17 -> /var/log/sssd/ldap_child.log lrwx------ 1 root root 64 Sep 16 09:50 170 -> socket:[266886828] lrwx------ 1 root root 64 Sep 16 09:50 171 -> socket:[266890082] lrwx------ 1 root root 64 Sep 16 09:50 172 -> socket:[270535370] lrwx------ 1 root root 64 Sep 16 09:50 173 -> socket:[274378300] lrwx------ 1 root root 64 Sep 16 09:50 174 -> socket:[270631237] lrwx------ 1 root root 64 Sep 16 09:50 175 -> socket:[270819551] ....
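For context, a quick way to quantify the leak on an affected host is sketched below (assuming a single sssd_be process; these are standard Linux /proc commands, nothing SSSD-specific):

# count the file descriptors currently held by sssd_be
ls /proc/$(pgrep sssd_be)/fd | wc -l

# the per-process limit that will eventually be hit
grep "Max open files" /proc/$(pgrep sssd_be)/limits

# most of the leaked entries show up as anonymous sockets
ls -l /proc/$(pgrep sssd_be)/fd | grep -c socket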
Do these sockets correspond to a high number of open files in /var/lib/sss/pipes?
No:
[root@login05 ondrejv]# ls -lR /var/lib/sss/pipes
/var/lib/sss/pipes:
total 4
srw-rw-rw- 1 root root 0 Aug 12 17:39 autofs
srw-rw-rw- 1 root root 0 Aug 12 17:39 nss
srw-rw-rw- 1 root root 0 Aug 12 17:39 pam
drwx------. 2 root root 4096 Aug 12 17:39 private

/var/lib/sss/pipes/private:
total 0
srw------- 1 root root 0 Aug 12 17:39 pam
lrwxrwxrwx 1 root root 48 Aug 12 17:39 sbus-dp_default -> /var/lib/sss/pipes/private/sbus-dp_default.30794
srw------- 1 root root 0 Aug 12 17:39 sbus-dp_default.30794
srw------- 1 root root 0 Aug 12 17:39 sbus-monitor
owner: somebody => lslebodn
Hi, shall I file an official RH support request for this one? It would be nice if we had it fixed for the next update.
Replying to [comment:5 ondrejv2]:
Yes please.
I assume that all of the unclosed unix sockets are connected to /var/lib/sss/pipes/private/sbus-dp_default.30794. It could be verified with the command "lsof -U | grep sssd_be".
It's not clear to me from the code how this could happen. There might be some debug messages which could help reveal the bug. You wrote on sssd-users that you do not have a deterministic reproducer. Would it be possible to enable debugging in the domain process? If the full debug_level is not an option, then it would be good to use at least 3 (or 6).
# lsof -p `pidof sssd_be`
...
sssd_be 31496 root 23u sock 0,6 0t0 7670577 can't identify protocol
sssd_be 31496 root 24u sock 0,6 0t0 7668937 can't identify protocol
sssd_be 31496 root 25u sock 0,6 0t0 7711331 can't identify protocol
sssd_be 31496 root 26u sock 0,6 0t0 7672347 can't identify protocol
sssd_be 31496 root 27u sock 0,6 0t0 7733537 can't identify protocol
sssd_be 31496 root 28u sock 0,6 0t0 7675184 can't identify protocol
sssd_be 31496 root 29u sock 0,6 0t0 7766793 can't identify protocol
sssd_be 31496 root 30u sock 0,6 0t0 7743406 can't identify protocol
sssd_be 31496 root 31u sock 0,6 0t0 7747472 can't identify protocol
sssd_be 31496 root 32u sock 0,6 0t0 7789642 can't identify protocol
sssd_be 31496 root 33u sock 0,6 0t0 7750980 can't identify protocol
sssd_be 31496 root 34u sock 0,6 0t0 7804859 can't identify protocol
...
No, lsof -U does not show anything interesting. It looks like these are some strange sockets left behind. My wild guess is that it is somehow also related to the ticket I reported earlier - regarding the AD service discovery issues. Would it be possible that if the connection to an AD DC times out (e.g. a firewall blocking access), we do not do a proper cleanup?
No, I can replicate it quite easily - actually, right now I need to put in place a cron job restarting SSSD every week.
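Regarding the firewall hypothesis above, one way to simulate an unreachable DC is sketched below (the DC address is a placeholder; DROP is used so connection attempts hang in SYN_SENT and time out rather than being rejected immediately):

# silently drop LDAP traffic to one DC so connection attempts time out
iptables -A OUTPUT -d <dc-ip-address> -p tcp --dport 389 -j DROP
# remove the rule again once the test is done
iptables -D OUTPUT -d <dc-ip-address> -p tcp --dport 389 -j DROP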
RH support case #01516475 opened for this
I can see many problems with resolving DNS records:
[sssd[be[default]]] [fo_resolve_service_timeout] (0x0080): Service resolving timeout reached
There are almost 400 such problems within 5 days:
$ grep fo_resolve_service_timeout sssd_default.log | wc -l
391
It could be the root of the problem. I can try to simulate it.
Meanwhile, it would be good if you could identify when a new fd is opened in sssd_be and provide log files from that time, because network issues need not cause fd leaks.
I can also see many problems related to nsupdate. If you do not need this feature, it would be good to disable it in the domain section (see man sssd-ad -> dyndns_update), so the log file will not be flooded with unrelated failures.
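For reference, disabling it would look roughly like this in sssd.conf (the domain name "default" is taken from the logs above):

[domain/default]
dyndns_update = false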
attachment sssd_default.log.test.gz (sssd log)
Hi, I have just restarted sssd and can see that a new unidentified socket was created shortly after the daemon restart. One leak (by "leak" I mean a socket for which lsof can't identify the protocol) is created immediately after the daemon restart, a second one at about Tue Oct 6 18:44:21 CEST 2015 (or a few seconds before that time).
See the log file attached. As for dyndns - yes, we would love to have it functional, but it is not working for some reason. Different story (maybe a different ticket). I will disable it next time so the logs are more readable.
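A quick way to count sockets matching that definition of a leak (a sketch, assuming a single sssd_be process):

lsof -p $(pidof sssd_be) | grep -c "can't identify protocol"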
Update - I have just disabled dyndns with dyndns_update = false and it seems to me that it is not leaking any more. Will confirm tomorrow.
Ok, disabling dyndns seems to have helped a little bit, but not much. The trouble still persists.
I have a suspicion it's somehow related to moving from online to offline status.
Could you try to simulate it by sending signals (USR1, USR2) to the sssd process? One signal forces sssd to go into online mode and the other into offline mode. It would be good to know whether that causes FD leaks.
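For the record, the test could look something like the sketch below (per sssd(8), SIGUSR1 should make SSSD simulate offline operation and SIGUSR2 should force it back online; watching the fd count in between is the part that matters):

kill -USR1 $(pidof sssd)          # simulate going offline
ls /proc/$(pgrep sssd_be)/fd | wc -l
kill -USR2 $(pidof sssd)          # force going back online
ls /proc/$(pgrep sssd_be)/fd | wc -l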
I'm still not able to reproduce it.
I also found something else in sssd, but it's not very related to the FD leaks.
Could you tell me what the permissions of the file /etc/krb5.conf are?
(Tue Oct 6 18:43:13 2015) [sssd[be[default]]] [sss_write_domain_mappings] (0x0200): Mapping file for domain [default] is [/var/lib/sss/pubconf/krb5.include.d/domain_realm_default]
(Tue Oct 6 18:43:13 2015) [sssd[be[default]]] [sss_krb5_touch_config] (0x0020): Unable to change mtime of "/etc/krb5.conf" [13]: Permission denied
(Tue Oct 6 18:43:13 2015) [sssd[be[default]]] [sss_write_domain_mappings] (0x0020): Unable to change last modification time of krb5.conf. Created mappings may not be loaded.
attachment sssd_default.log.leak.gz
Attaching a fresh log with dyndns disabled. Not sure where the permission problem comes from. I might have captured the previous log from a diskless machine. Diskless machines cannot modify /etc/krb5.conf, as this file is shared by all of them.
Tried kill -USR1 and -USR2 at about 14:01 today. First attempt - yes, a new FD leaked indeed. Second & third attempts to do the same (i.e. kill -USR1/-USR2) - no further FDs leaked.
After some analysis, I can see the FD leaks in the log files.
grep "Using file descriptor" sssd_default.log.leak.last | head (Thu Oct 8 10:21:19 2015) [sssd[be[default]]] [sss_ldap_init_send] (0x4000): Using file descriptor [22] for LDAP connection. (Thu Oct 8 10:21:25 2015) [sssd[be[default]]] [sss_ldap_init_send] (0x4000): Using file descriptor [24] for LDAP connection. (Thu Oct 8 10:21:31 2015) [sssd[be[default]]] [sss_ldap_init_send] (0x4000): Using file descriptor [25] for LDAP connection. (Thu Oct 8 10:21:31 2015) [sssd[be[default]]] [sss_ldap_init_send] (0x4000): Using file descriptor [25] for LDAP connection. (Thu Oct 8 10:21:34 2015) [sssd[be[default]]] [sss_ldap_init_send] (0x4000): Using file descriptor [25] for LDAP connection. (Thu Oct 8 10:21:35 2015) [sssd[be[default]]] [sss_ldap_init_send] (0x4000): Using file descriptor [25] for LDAP connection. (Thu Oct 8 10:22:41 2015) [sssd[be[default]]] [sss_ldap_init_send] (0x4000): Using file descriptor [25] for LDAP connection. (Thu Oct 8 10:24:06 2015) [sssd[be[default]]] [sss_ldap_init_send] (0x4000): Using file descriptor [26] for LDAP connection. (Thu Oct 8 10:26:34 2015) [sssd[be[default]]] [sss_ldap_init_send] (0x4000): Using file descriptor [27] for LDAP connection. (Thu Oct 8 10:26:34 2015) [sssd[be[default]]] [sss_ldap_init_send] (0x4000): Using file descriptor [28] for LDAP connection.
milestone: NEEDS_TRIAGE => SSSD 1.13.3 priority: major => critical
I checked some changes in the related code between the sssd-1-12 and sssd-1-13 branches. There are a few patches which are only in the 1.13 branch.
Would it be possible for you to test the 1.13 packages in a testing environment? https://copr.fedoraproject.org/coprs/lslebodn/sssd-1-13/
attachment lsof.txt.gz
Tried. Yes, the issue persists. Seems like I was right: it is closely related to the issue I reported before (i.e. a few DCs behind a firewall). Look at the attached lsof trace over time (10-minute intervals). See how the "SYN_SENT" connection attempts correlate with the FDs leaked. It matches 100%.
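The kind of capture loop behind such a trace could be as simple as the sketch below (hypothetical; the interval matches the 10-minute snapshots mentioned above and the output file name is just an example):

while true; do
    date >> lsof.txt
    lsof -p $(pidof sssd_be) >> lsof.txt
    sleep 600
done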
status: new => assigned
I have a WIP patch for this leak, but I found a workaround: changing timeouts in sssd.conf.
The leak will disappear if the value of dns_resolver_timeout is higher than the value of ldap_network_timeout. If you didn't change the value of these options, then adding the following snippet to the domain section should help:
dns_resolver_timeout = 7
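In context, and assuming the default ldap_network_timeout of 6 seconds, the domain section would look roughly like this (domain name taken from the logs above):

[domain/default]
# keep dns_resolver_timeout above ldap_network_timeout (default 6) for the workaround to apply
dns_resolver_timeout = 7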
Added this option to the domain section - it did not help (at least with 1.12.4). sssd_be is still leaking sockets.
It helped for me, but I tested with just two sites. I will send the patch for review to the upstream mailing list and then I can prepare a test build for you.
patch: 0 => 1
Could you try to test the packages from the following URL? https://lslebodn.fedorapeople.org/fd_leak_repo/
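A hypothetical way to consume such a test repo on the affected machine (the repo file contents below are an assumption; adjust to however the packages are actually published at that URL):

cat > /etc/yum.repos.d/sssd-fd-leak-test.repo <<'EOF'
[sssd-fd-leak-test]
name=SSSD fd leak test build
baseurl=https://lslebodn.fedorapeople.org/fd_leak_repo/
enabled=1
gpgcheck=0
EOF
yum update sssd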
Tried it - it fixed the problem, indeed. Good stuff! Please let me know when I can expect the fix in the RHEL 6 repo.
Awesome, moving up.
milestone: SSSD 1.13.3 => SSSD 1.13.2
Replying to [comment:27 ondrejv2]:
So far it's planned for 6.8; please push via the support case if you need the fix sooner...
Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1274633
rhbz: => [https://bugzilla.redhat.com/show_bug.cgi?id=1274633 1274633]
master:
sssd-1-13:
sssd-1-12:
resolution: => fixed status: assigned => closed
Linked to Bugzilla bug: https://bugzilla.redhat.com/show_bug.cgi?id=1313014 (Red Hat Enterprise Linux 7)
rhbz: [https://bugzilla.redhat.com/show_bug.cgi?id=1274633 1274633] => [https://bugzilla.redhat.com/show_bug.cgi?id=1274633 1274633], [https://bugzilla.redhat.com/show_bug.cgi?id=1313014 1313014]
Metadata Update from @ondrejv2: - Issue assigned to lslebodn - Issue set to the milestone: SSSD 1.13.2
SSSD is moving from Pagure to GitHub. This means that new issues and pull requests will be accepted only in SSSD's GitHub repository.
This issue has been cloned to GitHub and is available here: https://github.com/SSSD/sssd/issues/3833
If you want to receive further updates on the issue, please navigate to the GitHub issue and click on the Subscribe button.
Thank you for understanding. We apologize for any inconvenience.