#24 Handle the GET tags view by front ending and caching the OSBS registry.
Closed 6 years ago by bowlofeggs. Opened 7 years ago by bowlofeggs.
bowlofeggs/fegistry 4  into  master

@@ -5,12 +5,14 @@ 

        state: present

    with_items:

        - git

+       - python3-dogpile-cache

        - python3-flake8

        - python3-flask

        - python3-mock

        - python3-nose

        - python3-nose-cov

        - python3-PyYAML

+       - python3-requests

        - python3-sphinx

  

  - name: Install the .bashrc

file modified
+14 -1
@@ -12,6 +12,19 @@ 

  that you can use in the config file. In addition to these settings, fegistry defines the following

  settings:

  

+ * ``BACKEND_REGISTRY``: The URL to the backend OSBS registry where the stable container images are

+   stored. fegistry will make requests against this registry for any tags or manifests that are not

+   found in fegistry's cache. Default: ``https://stable-registry.fedoraproject.org``.

+ * ``DOGPILE_CACHE_BACKEND``: The dogpile.cache backend you wish you use. You can peruse the

+   dogpile.cache documentation for the list of

+   `supported backends <http://dogpilecache.readthedocs.io/en/latest/api.html#memory-backends>`_.

+   Default: ``dogpile.cache.memory_pickle``.

+ * ``DOGPILE_CACHE_EXPIRATION_TIME``: The time in seconds that values should remain in the cache.

+   Default: ``86400``.

+ * ``DOGPILE_CACHE_BACKEND_ARGUMENTS``: An associative array of settings that are relevant to the

+   dogpile.cache backend selected in ``DOGPILE_CACHE_BACKEND``. You can peruse the dogpile.cache

+   `documentation <http://dogpilecache.readthedocs.io/en/latest/api.html#memory-backends>`_ for the

+   relevant options for your chosen backend. Default: ``{}``.

  * ``LOG_LEVEL``: The logging level that fegistry should use when logging to syslog. This accepts any

    of the standard Python `logging levels <https://docs.python.org/3/library/logging.html#levels>`_,

-   and it will even upper case for you if you don't like shouting.

+   and it will even upper case for you if you don't like shouting. Default: ``WARNING``.

file modified
+4
@@ -1,2 +1,6 @@ 

  ---

+ # BACKEND_REGISTRY: https://stable-registry.fedoraproject.org

+ # DOGPILE_CACHE_BACKEND: dogpile.cache.memory_pickle

+ # DOGPILE_CACHE_EXPIRATION_TIME: 86400

+ # DOGPILE_CACHE_BACKEND_ARGUMENTS: {}

  # LOG_LEVEL: warning

file modified
+4 -1
@@ -22,7 +22,10 @@ 

  import yaml

  

  

- _DEFAULT_CONFIG = {'LOG_LEVEL': 'WARNING'}

+ _DEFAULT_CONFIG = {

+     'BACKEND_REGISTRY': 'https://stable-registry.fedoraproject.org',

To be honest, I'm not sure I really like this default. The problem is that if someone deploys their own, and forgets to configure, they get a real (working) registry that gives us lots of requests.
I would rather have this use http://example.com/ or the like, and have people explicitly configure it.

+     'DOGPILE_CACHE_BACKEND': 'dogpile.cache.memory_pickle', 'DOGPILE_CACHE_EXPIRATION_TIME': 86400,

Perhaps it's an idea to just do one per line, to make it easier to read at a glance?

+     'DOGPILE_CACHE_BACKEND_ARGUMENTS': {}, 'LOG_LEVEL': 'WARNING'}

  

  

  def load(app):

@@ -16,9 +16,11 @@ 

  """This test suite contains tests on fegistry.views."""

  

  import json

+ import mock

  import unittest

  

  import flask

+ import requests

  

  from fegistry import views

  
@@ -30,6 +32,17 @@ 

          self.app = views.app.test_client()

  

  

+ class CachedViewTestCase(ViewsTestCase):

+     """A superclass for testing cached views. It invalidates the cache before and after the test."""

+     def setUp(self):

+         super(CachedViewTestCase, self).setUp()

+         views.region.invalidate(hard=True)

+ 

+     def tearDown(self):

+         super(CachedViewTestCase, self).tearDown()

+         views.region.invalidate(hard=True)

+ 

+ 

  class TestAddDockerHeaders(unittest.TestCase):

      """This test class contains tests on the add_docker_headers() function."""

      def test_add_docker_headers(self):
@@ -41,6 +54,135 @@ 

          self.assertEqual(response.headers['Docker-Distribution-API-Version'], 'registry/2.0')

  

  

+ class TestGetFromBackendRegistry(CachedViewTestCase):

+     """This test class contains tests for the _get_from_backend_registry() function."""

+     @mock.patch.dict('fegistry.views.app.config', {'BACKEND_REGISTRY': 'http://example.com:123/'})

+     @mock.patch('fegistry.views.requests.get')

+     def test_200(self, get):

+         """

+         Ensure correct operation of the _get_from_backend_registry() function when the backend gives

+         a 200 status code.

+         """

+         get.return_value = requests.Response()

+         get.return_value._content = \

+             b'{"name":"fedora","tags":["24","latest","25","26","rawhide"]}\n'

+         get.return_value.headers = requests.structures.CaseInsensitiveDict({

+             'AppServer': 'proxy12.fedoraproject.org', 'AppTime': 'D=157698',

+             'Connection': 'Keep-Alive', 'Content-Length': '61',

+             'Content-Type': 'application/json; charset=utf-8',

+             'Date': 'Thu, 12 Jan 2017 14:57:47 GMT',

+             'Docker-Distribution-API-Version': 'registry/2.0', 'Keep-Alive': 'timeout=15, max=500',

+             'Server': 'Apache/2.4.6 (Red Hat Enterprise Linux)',

+             'Strict-Transport-Security': 'max-age=15768000; includeSubDomains; preload'})

+         get.return_value.status_code = 200

+ 

+         text, status_code, headers = views._get_from_backend_registry('/v2/fedora/tags/list')

+ 

+         get.assert_called_once_with('http://example.com:123/v2/fedora/tags/list')

+         self.assertEqual(status_code, 200)

+         self.assertEqual(json.loads(text),

+                          {'name': 'fedora', 'tags': ['24', 'latest', '25', '26', 'rawhide']})

+         expected_headers = {'Content-Type': 'application/json; charset=utf-8'}

+         self.assertEqual(headers, expected_headers)

+ 

+     @mock.patch.dict('fegistry.views.app.config', {'BACKEND_REGISTRY': 'http://example.com/'})

A different backend registry url than the first test?

Yeah I threw in a few different formats to exercise the urljoin a bit, just for variety.

+     @mock.patch('fegistry.views.requests.get')

I think what you've done here is fine, but if you're in a situation in the future where you want to mock out HTTP requests you might want to look at vcrpy or betamax which both record real HTTP requests and let you play them back in your tests.

+     def test_404(self, get):

+         """

+         Ensure correct operation of the _get_from_backend_registry() function when the backend

+         gives a 404 status code.

+         """

+         get.return_value = requests.Response()

+         get.return_value._content = (

+             b'{"errors":[{"code":"NAME_UNKNOWN","message":"repository name not known to registry",'

+             b'"detail":{"name":"meaning/of/life"}}]}\n')

+         get.return_value.headers = requests.structures.CaseInsensitiveDict({

+             'AppServer': 'proxy12.fedoraproject.org', 'AppTime': 'D=157698',

+             'Connection': 'Keep-Alive', 'Content-Length': '123',

+             'Content-Type': 'application/json; charset=utf-8',

+             'Date': 'Thu, 12 Jan 2017 14:57:47 GMT',

+             'Docker-Distribution-API-Version': 'registry/2.0', 'Keep-Alive': 'timeout=15, max=500',

+             'Server': 'Apache/2.4.6 (Red Hat Enterprise Linux)',

+             'Strict-Transport-Security': 'max-age=15768000; includeSubDomains; preload'})

+         get.return_value.status_code = 404

+ 

+         text, status_code, headers = views._get_from_backend_registry(

+             '/v2/meaning/of/life/manifests/latest')

+ 

+         get.assert_called_once_with('http://example.com/v2/meaning/of/life/manifests/latest')

+         self.assertEqual(status_code, 404)

+         self.assertEqual(

+             json.loads(text),

+             {'errors': [{'code': 'NAME_UNKNOWN', 'message': 'repository name not known to registry',

+                          'detail': {'name': 'meaning/of/life'}}]})

+         expected_headers = {'Content-Type': 'application/json; charset=utf-8'}

+         self.assertEqual(headers, expected_headers)

+ 

+     @mock.patch.dict('fegistry.views.app.config', {'BACKEND_REGISTRY': 'http://example.com:1234'})

+     @mock.patch('fegistry.views.requests.get')

+     def test_cache(self, get):

+         """Ensure that the function caches responses."""

+         get.return_value = requests.Response()

+         get.return_value._content = \

+             b'{"name":"fedora","tags":["24","latest","25","26","rawhide"]}\n'

+         get.return_value.headers = requests.structures.CaseInsensitiveDict({

+             'AppServer': 'proxy12.fedoraproject.org', 'AppTime': 'D=157698',

+             'Connection': 'Keep-Alive', 'Content-Length': '61',

+             'Content-Type': 'application/json; charset=utf-8',

+             'Date': 'Thu, 12 Jan 2017 14:57:47 GMT',

+             'Docker-Distribution-API-Version': 'registry/2.0', 'Keep-Alive': 'timeout=15, max=500',

+             'Server': 'Apache/2.4.6 (Red Hat Enterprise Linux)',

+             'Strict-Transport-Security': 'max-age=15768000; includeSubDomains; preload'})

+         get.return_value.status_code = 200

+         # Make one call to the function to get the response cached.

+         views._get_from_backend_registry('/v2/fedora/tags/list')

+         # Now let's alter what the backend would return so we can make sure the cached version is

+         # used.

+         get.return_value._content = b'This should not get used yet because the cache should be hit'

+         get.return_value.headers = requests.structures.CaseInsensitiveDict({'Not': 'Yet'})

+ 

+         text, status_code, headers = views._get_from_backend_registry('/v2/fedora/tags/list')

+ 

+         get.assert_called_once_with('http://example.com:1234/v2/fedora/tags/list')

+         self.assertEqual(status_code, 200)

+         self.assertEqual(json.loads(text),

+                          {'name': 'fedora', 'tags': ['24', 'latest', '25', '26', 'rawhide']})

+         expected_headers = {'Content-Type': 'application/json; charset=utf-8'}

+         self.assertEqual(headers, expected_headers)

+ 

+     @mock.patch.dict('fegistry.views.app.config', {'BACKEND_REGISTRY': 'http://example.com:1234'})

+     @mock.patch('fegistry.views.requests.get')

+     def test_content_type_header_not_present(self, get):

+         """

+         Ensure correct operation of the list_tags view when the backend doesn't give us a

+         Content-Type header.

+         """

+         get.return_value = requests.Response()

+         get.return_value._content = \

+             b'{"name":"fedora/cockpit","tags":["24","latest","25","26","rawhide"]}\n'

+         get.return_value.headers = requests.structures.CaseInsensitiveDict({

+             'AppServer': 'proxy12.fedoraproject.org', 'AppTime': 'D=157698',

+             'Connection': 'Keep-Alive', 'Content-Length': '69',

+             'Date': 'Thu, 12 Jan 2017 14:57:47 GMT',

+             'Docker-Distribution-API-Version': 'registry/2.0', 'Keep-Alive': 'timeout=15, max=500',

+             'Server': 'Apache/2.4.6 (Red Hat Enterprise Linux)',

+             'Strict-Transport-Security': 'max-age=15768000; includeSubDomains; preload'})

+         get.return_value.status_code = 200

+ 

+         text, status_code, headers = views._get_from_backend_registry(

+             '/v2/fedora/cockpit/manifests/latest')

+ 

+         get.assert_called_once_with('http://example.com:1234/v2/fedora/cockpit/manifests/latest')

+         self.assertEqual(status_code, 200)

+         self.assertEqual(

+             json.loads(text),

+             {'name': 'fedora/cockpit', 'tags': ['24', 'latest', '25', '26', 'rawhide']})

+         # Even though the backend didn't give us the Content-Type header, the function should have

+         # added it for us.

+         expected_headers = {'Content-Type': 'application/json; charset=utf-8'}

+         self.assertEqual(headers, expected_headers)

+ 

+ 

  class Testv2(ViewsTestCase):

      """This test class tests the v2() function."""

      def test_v2(self):
@@ -51,3 +193,98 @@ 

          self.assertEqual(json.loads(response.get_data().decode('utf-8')), {})

          self.assertEqual(response.headers['Docker-Distribution-API-Version'], 'registry/2.0')

          self.assertEqual(response.headers['Content-Type'], 'application/json')

+ 

+ 

+ class TestListTags(CachedViewTestCase):

+     """This test class contains tests for the list_tags() function."""

+     @mock.patch.dict('fegistry.views.app.config', {'BACKEND_REGISTRY': 'http://example.com'})

+     @mock.patch('fegistry.views.requests.get')

+     def test_200(self, get):

+         """Ensure correct operation of the list_tags view when the backend gives a 200 status code.

+         """

+         get.return_value = requests.Response()

+         get.return_value._content = \

+             b'{"name":"fedora","tags":["24","latest","25","26","rawhide"]}\n'

+         get.return_value.headers = requests.structures.CaseInsensitiveDict({

+             'AppServer': 'proxy12.fedoraproject.org', 'AppTime': 'D=157698',

+             'Connection': 'Keep-Alive', 'Content-Length': '61',

+             'Content-Type': 'application/json; charset=utf-8',

+             'Date': 'Thu, 12 Jan 2017 14:57:47 GMT',

+             'Docker-Distribution-API-Version': 'registry/2.0', 'Keep-Alive': 'timeout=15, max=500',

+             'Server': 'Apache/2.4.6 (Red Hat Enterprise Linux)',

+             'Strict-Transport-Security': 'max-age=15768000; includeSubDomains; preload'})

+         get.return_value.status_code = 200

+ 

+         response = self.app.get('/v2/fedora/tags/list')

+ 

+         get.assert_called_once_with('http://example.com/v2/fedora/tags/list')

+         self.assertEqual(response.status_code, 200)

+         self.assertEqual(json.loads(response.get_data().decode('utf-8')),

+                          {'name': 'fedora', 'tags': ['24', 'latest', '25', '26', 'rawhide']})

+         expected_headers = {

+             'Content-Type': 'application/json; charset=utf-8', 'Content-Length': '61',

+             'Docker-Distribution-API-Version': 'registry/2.0'}

+         self.assertEqual(dict(response.headers), expected_headers)

+ 

+     @mock.patch.dict('fegistry.views.app.config', {'BACKEND_REGISTRY': 'http://example.com/'})

+     @mock.patch('fegistry.views.requests.get')

+     def test_404(self, get):

+         """Ensure correct operation of the list_tags view when the backend gives a 404 status code.

+         """

+         get.return_value = requests.Response()

+         get.return_value._content = (

+             b'{"errors":[{"code":"NAME_UNKNOWN","message":"repository name not known to registry",'

+             b'"detail":{"name":"meaning/of/life"}}]}\n')

+         get.return_value.headers = requests.structures.CaseInsensitiveDict({

+             'AppServer': 'proxy12.fedoraproject.org', 'AppTime': 'D=157698',

+             'Connection': 'Keep-Alive', 'Content-Length': '123',

+             'Content-Type': 'application/json; charset=utf-8',

+             'Date': 'Thu, 12 Jan 2017 14:57:47 GMT',

+             'Docker-Distribution-API-Version': 'registry/2.0', 'Keep-Alive': 'timeout=15, max=500',

+             'Server': 'Apache/2.4.6 (Red Hat Enterprise Linux)',

+             'Strict-Transport-Security': 'max-age=15768000; includeSubDomains; preload'})

+         get.return_value.status_code = 404

+ 

+         response = self.app.get('/v2/meaning/of/life/tags/list')

+ 

+         get.assert_called_once_with('http://example.com/v2/meaning/of/life/tags/list')

+         self.assertEqual(response.status_code, 404)

+         self.assertEqual(

+             json.loads(response.get_data().decode('utf-8')),

+             {'errors': [{'code': 'NAME_UNKNOWN', 'message': 'repository name not known to registry',

+                          'detail': {'name': 'meaning/of/life'}}]})

+         expected_headers = {

+             'Content-Type': 'application/json; charset=utf-8', 'Content-Length': '123',

+             'Docker-Distribution-API-Version': 'registry/2.0'}

+         self.assertEqual(dict(response.headers), expected_headers)

+ 

+     @mock.patch.dict('fegistry.views.app.config', {'BACKEND_REGISTRY': 'http://example.com:1234'})

+     @mock.patch('fegistry.views.requests.get')

+     def test_content_type_header_not_present(self, get):

+         """

+         Ensure correct operation of the list_tags view when the backend doesn't give us a

+         Content-Type header.

+         """

+         get.return_value = requests.Response()

+         get.return_value._content = \

+             b'{"name":"fedora/cockpit","tags":["24","latest","25","26","rawhide"]}\n'

+         get.return_value.headers = requests.structures.CaseInsensitiveDict({

+             'AppServer': 'proxy12.fedoraproject.org', 'AppTime': 'D=157698',

+             'Connection': 'Keep-Alive', 'Content-Length': '69',

+             'Date': 'Thu, 12 Jan 2017 14:57:47 GMT',

+             'Docker-Distribution-API-Version': 'registry/2.0', 'Keep-Alive': 'timeout=15, max=500',

+             'Server': 'Apache/2.4.6 (Red Hat Enterprise Linux)',

+             'Strict-Transport-Security': 'max-age=15768000; includeSubDomains; preload'})

+         get.return_value.status_code = 200

+ 

+         response = self.app.get('/v2/fedora/cockpit/tags/list')

+ 

+         get.assert_called_once_with('http://example.com:1234/v2/fedora/cockpit/tags/list')

+         self.assertEqual(response.status_code, 200)

+         self.assertEqual(

+             json.loads(response.get_data().decode('utf-8')),

+             {'name': 'fedora/cockpit', 'tags': ['24', 'latest', '25', '26', 'rawhide']})

+         expected_headers = {

+             'Content-Type': 'application/json; charset=utf-8', 'Content-Length': '69',

+             'Docker-Distribution-API-Version': 'registry/2.0'}

+         self.assertEqual(dict(response.headers), expected_headers)

file modified
+46
@@ -13,13 +13,21 @@ 

  #

  # You should have received a copy of the GNU General Public License

  # along with fegistry.  If not, see <http://www.gnu.org/licenses/>.

+ from urllib import parse

+ 

+ from dogpile.cache import make_region

  import flask

+ import requests

  

  from fegistry import config

  

  

  app = flask.Flask(__name__)

  config.load(app)

+ region = make_region().configure(

+     app.config['DOGPILE_CACHE_BACKEND'],

+     expiration_time=app.config['DOGPILE_CACHE_EXPIRATION_TIME'],

+     arguments=app.config['DOGPILE_CACHE_BACKEND_ARGUMENTS'])

  

  

  @app.after_request
@@ -46,3 +54,41 @@ 

          flask.Response: A JSON response

      """

      return flask.json.jsonify({})

+ 

+ 

+ @app.route('/v2/<path:repo>/tags/list')

I'm reasonably sure that with the current architecture, you can make the generation of these functions more automatic?
Just a @app.generate_cached_proxy('/v2/<path:repo>/tags/list') or something?

Yeah that's a good idea. I could do something like that (but probably not a decorator, just a function generator) with my next PR since that will add another one. However, the down side would be that this way we get a nice docblock (and the docblocks automatically go into the Sphinx documentation).

+ def list_tags(repo):

+     """

+     Answer the GET /v2/repo/tags/list API call.

+ 

+     Args:

+         repo (str): The repository to query for tags

+ 

+     Returns:

+         flask.Response: A JSON response with the list of tags in the given repository.

+     """

+     return _get_from_backend_registry('/v2/{}/tags/list'.format(repo))

+ 

+ 

+ @region.cache_on_arguments()

+ def _get_from_backend_registry(path):

+     """

+     Retrieve the given path with a GET request from the backend registry, returning a tuple of the

+     text, status code, and filtered headers (expressed as a dictionary).

+ 

+     Args:

+         path (str): The path to retrieve from the backend registry

+ 

+     Returns:

+         tuple: A 3-tuple of the text of the response (str), the status code (int), and a filtered

+                set of headers from the backend registry (dict). Currently, only the Content-Type

+                header is passed through, but the caller should be able to handle additional headers

+                that may be added in the future.

+     """

+     response = requests.get(parse.urljoin(app.config['BACKEND_REGISTRY'], path))

+     # We don't need to pass all the headers from the backend through, so let's just cherry pick the

+     # ones we want.

+     headers = {}

+     headers['Content-Type'] = response.headers.get('Content-Type',

You'll probably want to have a timeout set here, since by default there is no timeout by default and this will hang until the server closes the connection or finishes the response.

You might also consider using a Session to take advantage of connection pooling.

+                                                    'application/json; charset=utf-8')

+     return (response.text, response.status_code, headers)

file modified
+1 -1
@@ -47,6 +47,6 @@ 

      long_description=README, classifiers=CLASSIFIERS, license=LICENSE, maintainer=MAINTAINER,

      maintainer_email=MAINTAINER_EMAIL, platforms=PLATFORMS, url=URL, keywords='fedora',

      packages=find_packages(exclude=('fegistry.tests', 'fegistry.tests.*')),

-     include_package_data=True, zip_safe=False, install_requires=['flask', 'PyYAML'],

+     include_package_data=True, zip_safe=False, install_requires=['flask', 'PyYAML', 'requests'],

I would appreciate it if this also gets split over multiple lines, so it gets easier to read.

      tests_require=['flake8', 'mock', 'nose', 'nose-cov'],

      test_suite="nose.collector")

This pull request contains two commits. The first commit adds support for the list tags view, and the second uses dogpile.cache to cache responses from the backend registry to help take load off of it.

1 new commit added

  • Use dogpile cache to cache responses from the backend registry.
7 years ago

rebased

7 years ago

To be honest, I'm not sure I really like this default. The problem is that if someone deploys their own, and forgets to configure, they get a real (working) registry that gives us lots of requests.
I would rather have this use http://example.com/ or the like, and have people explicitly configure it.

A different backend registry url than the first test?

Perhaps it's an idea to just do one per line, to make it easier to read at a glance?

I would appreciate it if this also gets split over multiple lines, so it gets easier to read.

I'm reasonably sure that with the current architecture, you can make the generation of these functions more automatic?
Just a @app.generate_cached_proxy('/v2/<path:repo>/tags/list') or something?

Yeah I threw in a few different formats to exercise the urljoin a bit, just for variety.

Yeah that's a good idea. I could do something like that (but probably not a decorator, just a function generator) with my next PR since that will add another one. However, the down side would be that this way we get a nice docblock (and the docblocks automatically go into the Sphinx documentation).

I think what you've done here is fine, but if you're in a situation in the future where you want to mock out HTTP requests you might want to look at vcrpy or betamax which both record real HTTP requests and let you play them back in your tests.

You'll probably want to have a timeout set here, since by default there is no timeout by default and this will hang until the server closes the connection or finishes the response.

You might also consider using a Session to take advantage of connection pooling.

Other than the timeout comment I'm fine with this.

We are killing this project, so I am closing this PR.

Pull-Request has been closed by bowlofeggs

6 years ago