#6651 [vault] improve performance
Opened 2 years ago by cheimes. Modified 10 months ago

For productization of IPA vault as secure storage backend for Custodia, we need fast and scalable vault. At the moment performance of vault_retrieve is too low and vault_retrieve calls take too long.

I created a very simple benchmark script. It simulates a small size setup in which 10 containers are start at the same time and request secrets for databases. The scripts runs 10 parallel subprocesses that request 3 secrets each. The test server is a freeipa-server-4.4.2-1.fc24.x86_64 installation in a Vagrant box with 2 CPUs. During the benchmark, load stays below 0.7.

Benchmark result

Parallel request: 10
vault retrieve each: 3
Time per worker:
     connect: 0.666sec / retrieve: 7.792sec 
     connect: 0.667sec / retrieve: 7.588sec 
     connect: 0.261sec / retrieve: 7.551sec 
     connect: 0.307sec / retrieve: 7.702sec 
     connect: 0.259sec / retrieve: 7.226sec 
     connect: 0.328sec / retrieve: 7.557sec 
     connect: 0.639sec / retrieve: 7.788sec 
     connect: 0.347sec / retrieve: 7.806sec 
     connect: 0.334sec / retrieve: 7.362sec 
     connect: 0.283sec / retrieve: 6.479sec 
total time 13.464sec

profiling

In a later I hooked up werkzeug's WSGI profiler to /usr/share/ipa/wsgi.py

    # dnf install python-werkzeug
    # mkdir /var/run/httpd/profile; chown apache:apache /var/run/httpd/profile
    from werkzeug.contrib.profiler import ProfilerMiddleware
    application = ProfilerMiddleware(application, profile_dir='/var/run/httpd/profile')

and analyzed performance

>>> import glob, pstats
>>> stat = pstats.Stats(*glob.glob('/var/run/httpd/profile/*.prof'))
>>> stat.sort_stats('tottime')
<pstats.Stats instance at 0x7f3f752baab8>
>>> stat.print_stats(50)
...
         13420440 function calls (10871945 primitive calls) in 56.277 seconds

   Ordered by: internal time
   List reduced from 1428 to 50 due to restriction <50>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     4106   17.769    0.004   17.769    0.004 {method 'read' of '_ssl._SSLSocket' objects}
      299   11.234    0.038   11.236    0.038 {built-in method sasl_interactive_bind_s}
      176    8.710    0.049    8.710    0.049 {method 'do_handshake' of '_ssl._SSLSocket' objects}
     2713    2.654    0.001    2.654    0.001 {built-in method result4}
6838/6539    1.026    0.000    1.027    0.000 {repr}
  2790767    0.811    0.000    0.926    0.000 {isinstance}
2495797/519934    0.776    0.000    1.205    0.000 /usr/lib64/python2.7/json/encoder.py:341(_iterencode_dict)
      598    0.682    0.001    0.685    0.001 {gssapi.raw.ext_cred_store.acquire_cred_from}
     1354    0.645    0.000    0.645    0.000 {built-in method search_ext}
    11977    0.572    0.000    0.603    0.000 /usr/lib64/python2.7/site-packages/ldap/schema/tokenizer.py:10(split_tokens)
247833/543    0.302    0.000    1.777    0.003 /usr/lib/python2.7/site-packages/ipalib/util.py:58(json_serialize)
      613    0.300    0.000    0.300    0.000 {method 'write' of '_ssl._SSLSocket' objects}
138038/299    0.274    0.000    0.523    0.002 /usr/lib/python2.7/site-packages/ipalib/rpc.py:276(json_encode_binary)
     3514    0.273    0.000    0.624    0.000 /usr/lib/python2.7/site-packages/ipalib/parameters.py:433(__init__)
      601    0.259    0.000    0.259    0.000 {method 'sendall' of '_socket.socket' objects}
    62744    0.220    0.000    0.324    0.000 {method 'decode' of 'str' objects}
584832/519934    0.190    0.000    1.376    0.000 /usr/lib64/python2.7/json/encoder.py:417(_iterencode)
536499/486766    0.176    0.000    0.562    0.000 /usr/lib64/python2.7/json/encoder.py:288(_iterencode_list)
      896    0.151    0.000    0.153    0.000 {gssapi.raw.creds.inquire_cred}
     4733    0.146    0.000    0.462    0.000 /usr/lib/python2.7/site-packages/ipalib/parameters.py:946(__json__)
   219189    0.141    0.000    0.151    0.000 {hasattr}
      299    0.137    0.000    0.137    0.000 {built-in method unbind_ext}
      182    0.130    0.001    0.130    0.001 {method 'connect' of '_socket.socket' objects}
39967/3842    0.125    0.000    0.280    0.000 /usr/lib64/python2.7/copy.py:145(deepcopy)
     9842    0.119    0.000    0.178    0.000 /usr/lib64/python2.7/site-packages/ldap/schema/models.py:257(_set_attrs)
   368717    0.116    0.000    0.120    0.000 {getattr}
     1209    0.111    0.000    0.111    0.000 {method 'recv' of '_socket.socket' objects}
     6166    0.099    0.000   14.845    0.002 /usr/lib64/python2.7/site-packages/ldap/ldapobject.py:87(_ldap_call)
        1    0.091    0.091    0.190    0.190 /usr/lib/python2.7/site-packages/ipaserver/plugins/schema.py:753(_calculate_fingerprint)
    11977    0.089    0.000    0.171    0.000 /usr/lib64/python2.7/site-packages/ldap/schema/tokenizer.py:52(extract_tokens)
      472    0.088    0.000    1.479    0.003 /usr/lib64/python2.7/json/encoder.py:186(encode)
     9038    0.087    0.000    2.171    0.000 /usr/lib/python2.7/site-packages/ipapython/ipaldap.py:353(_add_attr_name)
   370437    0.087    0.000    0.087    0.000 {method 'get' of 'dict' objects}
        7    0.087    0.012    1.251    0.179 /usr/lib64/python2.7/site-packages/ldap/schema/subentry.py:81(__init__)
1347/1079    0.087    0.000    5.859    0.005 /usr/lib/python2.7/site-packages/ipapython/ipaldap.py:1314(find_entries)
      299    0.076    0.000   12.005    0.040 /usr/lib/python2.7/site-packages/ipaserver/plugins/ldap2.py:123(create_connection)
   143579    0.067    0.000    0.067    0.000 /usr/lib/python2.7/site-packages/ipalib/base.py:123(__setattr__)
    60658    0.067    0.000    0.067    0.000 {_codecs.utf_8_decode}
    23131    0.066    0.000    0.116    0.000 /usr/lib64/python2.7/abc.py:128(__instancecheck__)
   394038    0.065    0.000    0.065    0.000 {method 'append' of 'list' objects}
   113350    0.064    0.000    0.109    0.000 {setattr}
  926/299    0.064    0.000   39.159    0.131 /usr/lib/python2.7/site-packages/ipalib/frontend.py:451(__do_call)
      294    0.063    0.000   52.216    0.178 /usr/share/ipa/wsgi.py:61(application)
    15529    0.061    0.000    0.193    0.000 /usr/lib/python2.7/site-packages/ipapython/dn.py:1111(_rdns_from_value)
    35819    0.061    0.000    0.117    0.000 /usr/lib64/python2.7/site-packages/ldap/schema/subentry.py:206(getoid)
   218252    0.058    0.000    0.058    0.000 {method 'lower' of 'str' objects}
     8054    0.056    0.000   17.836    0.002 /usr/lib64/python2.7/socket.py:410(readline)
    14579    0.055    0.000    0.085    0.000 {sorted}
    39967    0.052    0.000    0.074    0.000 /usr/lib64/python2.7/copy.py:267(_keep_alive)
      300    0.051    0.000   56.206    0.187 /usr/lib/python2.7/site-packages/ipaserver/rpcserver.py:270(route)

Let's figure out how to improve vault performance.


simple vault perftest script
perftest.py

Related tickets:

== FreeIPA

== Dogtag

Some comments:

vault retrieve is my primary concern. vault add, store and delete are also slow but not as performance critical as retrieval. Let's focus on either retrieval or general vault / IPA client performance improvements first. Once retrieval is fast, we can move to store.

Most vault operations depend on the result of a previous operation. This makes it impossible to benefit from batch operations. It might be possible to batch some operations, but not for vault retrieval.

The SSL operations from WSGI profiling are TLS/SSL connections from IPA framework to Dogtag's REST API. SSL and network operations between IPA client and IPA framework do not show up in the profile because they are handled by Apache and mod_nss. Dogtag's Python API can handle HTTP keep-alive. Internally a pki.client.PKIConnection object holds a requests.Session instance. Python requests automatically performs keep-alive for connections from the same session. The Dogtag plugin does not keep PKIConnection alive between framework calls. It might be feasible to keep the connection alive. I'm a bit worried how this plays along with forking WSGI server (see #6563).

Dogtag supports connection keep-alive (look for Re-using existing connection!):

# curl --keepalive 5 -v --cacert /etc/ipa/ca.crt https://master.ipa.example:8443 https://master.ipa.example:8443 >/dev/null
* Rebuilt URL to: 5/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 0.0.0.5...
* Immediate connect fail for 0.0.0.5: Invalid argument
* Closing connection 0
curl: (7) Couldn't connect to server
* Rebuilt URL to: https://master.ipa.example:8443/
*   Trying 192.168.121.57...
* Connected to master.ipa.example (192.168.121.57) port 8443 (#1)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/ipa/ca.crt
  CApath: none
* NSS: client certificate not found (nickname not specified)
* ALPN/NPN, server did not agree to a protocol
* SSL connection using TLS_RSA_WITH_AES_128_CBC_SHA
* Server certificate:
*       subject: CN=master.ipa.example,O=IPA.EXAMPLE
*       start date: Jan 09 14:50:35 2017 GMT
*       expire date: Dec 30 14:50:35 2018 GMT
*       common name: master.ipa.example
*       issuer: CN=Certificate Authority,O=IPA.EXAMPLE
> GET / HTTP/1.1
> Host: master.ipa.example:8443
> User-Agent: curl/7.47.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Server: Apache-Coyote/1.1
< Set-Cookie: JSESSIONID=3010A8CC145EAA9EAC5069689104CEEC; Path=/; Secure; HttpOnly
< Content-Type: text/html;charset=ISO-8859-1
< Content-Length: 3183
< Date: Thu, 09 Feb 2017 09:25:57 GMT
< 
{ [3183 bytes data]
100  3183  100  3183    0     0  62232      0 --:--:-- --:--:-- --:--:-- 62232
* Connection #1 to host master.ipa.example left intact
* Rebuilt URL to: https://master.ipa.example:8443/
* Found bundle for host master.ipa.example: 0x55a9326923e0 [can pipeline]
* Re-using existing connection! (#1) with host master.ipa.example
* Connected to master.ipa.example (192.168.121.57) port 8443 (#1)
> GET / HTTP/1.1
> Host: master.ipa.example:8443
> User-Agent: curl/7.47.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Server: Apache-Coyote/1.1
< Set-Cookie: JSESSIONID=96885044D4FA5A5C5EAE5550C02AEF3B; Path=/; Secure; HttpOnly
< Content-Type: text/html;charset=ISO-8859-1
< Content-Length: 3183
< Date: Thu, 09 Feb 2017 09:25:57 GMT
< 
{ [3183 bytes data]
100  3183  100  3183    0     0  2396k      0 --:--:-- --:--:-- --:--:-- 2396k
* Connection #1 to host master.ipa.example left intact

https://jiffyclub.github.io/snakeviz/ is another useful tool to analyze cprofile stats. It can only analyzse one file. Multiple profile files can be combined with the pstats module:

>>> import glob, pstats
>>> stat = pstats.Stats(*glob.glob('profile/*.prof'))
>>> stat.dump_stats('combined.prof')

Marking as critical, but only tickets in blocking should have the priority. Not the full list in comment 1.

Metadata Update from @cheimes:
- Issue assigned to someone
- Issue set to the milestone: FreeIPA 4.5

2 years ago

Metadata Update from @cheimes:
- Custom field affects_doc reset
- Custom field blockedby reset
- Custom field component reset
- Custom field keywords reset
- Custom field rhbz reset
- Custom field type reset
- Issue close_status updated to: None
- Issue set to the milestone: None (was: FreeIPA 4.5)
- Issue tagged with: integration

2 years ago

Metadata Update from @pvoborni:
- Custom field affects_doc reset
- Custom field tester adjusted to wanted
- Issue set to the milestone: FreeIPA 4.5

2 years ago

Metadata Update from @mbasti:
- Issue set to the milestone: FreeIPA 4.5.1 (was: FreeIPA 4.5)

2 years ago

Metadata Update from @pvoborni:
- Issue set to the milestone: FreeIPA 4.7 (was: FreeIPA 4.5.1)

2 years ago

Vault performance is currently not on the priority list.

Metadata Update from @cheimes:
- Issue priority set to: low (was: important)
- Issue set to the milestone: Future Releases (was: FreeIPA 4.7)

10 months ago

Login to comment on this ticket.

Metadata