The regression appears to have been introduced in 0a05eab2483e6248a3e14a97c214e531828cd9be, "Refactored wait_for_startup() (part 2)". I think we need to add requests.ReadTimeout alongside ConnectionError as a retryable exception.
2018-03-20 16:38:04 pkispawn : INFO ....... executing 'systemctl start pki-tomcatd@pki-tomcat.service'
2018-03-20 16:38:11 pkispawn : INFO ........... FIPS mode is enabled on this operating system.
2018-03-20 16:38:11 pkispawn : INFO ........... checking http://vm-210.abc.idm.lab.eng.brq.redhat.com:8080/ca
2018-03-20 16:38:12 pkispawn : INFO ........... waiting for server to start (1s)
2018-03-20 16:38:13 pkispawn : INFO ........... waiting for server to start (2s)
2018-03-20 16:38:29 pkispawn : DEBUG ....... Error Type: ReadTimeout
2018-03-20 16:38:29 pkispawn : DEBUG ....... Error Message: HTTPConnectionPool(host='vm-210.abc.idm.lab.eng.brq.redhat.com', port=8080): Read timed out. (read timeout=15)
2018-03-20 16:38:29 pkispawn : DEBUG .......   File "/usr/lib/python3.6/site-packages/pki/server/pkispawn.py", line 533, in main
    scriptlet.spawn(deployer)
  File "/usr/lib/python3.6/site-packages/pki/server/deployment/scriptlets/configuration.py", line 1268, in spawn
    secure_connection=False,
  File "/usr/lib/python3.6/site-packages/pki/server/deployment/pkihelper.py", line 1081, in wait_for_startup
    timeout=request_timeout,
  File "/usr/lib/python3.6/site-packages/pki/server/deployment/pkihelper.py", line 1024, in get_instance_status
    response = client.get_status(timeout=timeout)
  File "/usr/lib/python3.6/site-packages/pki/system.py", line 298, in get_status
    timeout=timeout,
  File "/usr/lib/python3.6/site-packages/pki/client.py", line 46, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/pki/client.py", line 160, in get
    timeout=timeout,
  File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 521, in get
    return self.request('GET', url, **kwargs)
  File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 521, in send
    raise ReadTimeout(e, request=request)
Metadata Update from @mharmsen: - Custom field component adjusted to None - Custom field feature adjusted to None - Custom field origin adjusted to None - Custom field proposedmilestone adjusted to None - Custom field proposedpriority adjusted to None - Custom field reviewer adjusted to None - Custom field type adjusted to None - Custom field version adjusted to None - Issue set to the milestone: 0.0 NEEDS_TRIAGE
Metadata Update from @edewata: - Issue priority set to: blocker - Issue set to the milestone: 10.6 (was: 0.0 NEEDS_TRIAGE)
Possibly related to this IPA ticket: https://pagure.io/freeipa/issue/7425
It makes sense to catch ReadTimeout, too. A ReadTimeout occurs when the TCP handshake succeeds but the subsequent socket read times out because the server does not respond fast enough. I think it happens when the server has opened the port for listening but is either not accepting connections yet or not responding fast enough.
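The situation described above can be reproduced locally with a rough sketch: a socket that listens but never accepts or answers. The helper name probe_silent_listener and the timeout values are hypothetical, chosen only for illustration. On a typical Linux host the kernel completes the handshake from the listen backlog, so the request should fail on the read rather than on the connect.

```python
import socket

import requests


def probe_silent_listener(read_timeout=0.5):
    """Return the name of the exception raised against a port that
    listens but never answers (hypothetical helper for illustration)."""
    # Bind to an ephemeral loopback port and listen, but never call accept().
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    session = requests.Session()
    session.trust_env = False  # ignore proxy settings from the environment
    try:
        # timeout=(connect timeout, read timeout), both in seconds
        session.get("http://127.0.0.1:%d/" % port, timeout=(2, read_timeout))
    except requests.exceptions.RequestException as exc:
        return type(exc).__name__
    finally:
        session.close()
        listener.close()
    return None


if __name__ == "__main__":
    print(probe_silent_listener())
```

This mirrors the log in the report, where the port was already open but the read timed out after 15 seconds.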
I suggest you define RETRYABLE_EXCEPTIONS = (requests.exceptions.Timeout, requests.exceptions.ConnectionError) and use the tuple instead of ConnectionError. Python allows you to catch a tuple of exceptions. Timeout also covers ConnectTimeout.
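A minimal sketch of that suggestion, assuming a generic polling loop rather than the actual wait_for_startup() code: the helper wait_until_ready and its attempts and delay parameters are hypothetical and are not taken from pkihelper.py. Only the RETRYABLE_EXCEPTIONS tuple comes from the comment above.

```python
import time

import requests

# Tuple proposed in the comment above. Timeout is the base class of both
# ReadTimeout and ConnectTimeout in requests.
RETRYABLE_EXCEPTIONS = (
    requests.exceptions.Timeout,
    requests.exceptions.ConnectionError,
)


def wait_until_ready(check, attempts=5, delay=1.0):
    """Call check() until it succeeds or the attempts run out.

    Hypothetical helper: check is any callable that performs the status
    request and returns its result.
    """
    for attempt in range(1, attempts + 1):
        try:
            return check()
        except RETRYABLE_EXCEPTIONS:
            if attempt == attempts:
                raise  # out of attempts, surface the last error
            time.sleep(delay)
```

Any exception outside the tuple still propagates immediately, so genuine errors are not hidden by the retry loop.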
ConnectionError is caught in base/server/python/pki/server/deployment/pkihelper.py and multiple times in base/server/python/pki/server/pkispawn.py.
https://review.gerrithub.io/c/405344/ should take care of the problem
Fixed in master: 353de17f540a12107509f4b06f207d5d000ff4dc
Metadata Update from @ftweedal: - Assignee reset
Metadata Update from @ftweedal: - Issue assigned to cheimes - Issue close_status updated to: fixed
Metadata Update from @edewata: - Issue set to the milestone: 10.6.0 (was: 10.6)
Dogtag PKI is moving from Pagure issues to GitHub issues. This means that existing or new issues will be reported and tracked through Dogtag PKI's GitHub Issue tracker.
This issue has been cloned to GitHub and is available here: https://github.com/dogtagpki/pki/issues/3091
If you want to receive further updates on the issue, please navigate to the GitHub issue and click the Subscribe button.
Thank you for understanding, and we apologize for any inconvenience.