#7280 The container registry is lying and I have gone mad
Closed: Fixed 9 months ago Opened 9 months ago by bowlofeggs.

  • Describe what you need us to do:
    The container registry lists 'f29' as a valid tag here:
$ http https://registry.fedoraproject.org/v2/fedora/tags/list
HTTP/1.1 200 OK
Accept-Ranges: bytes
Age: 68
AppServer: proxy04.fedoraproject.org
AppTime: D=589
Connection: Keep-Alive
Content-Length: 292
Content-Type: application/json; charset=utf-8
Date: Thu, 04 Oct 2018 03:16:14 GMT
Docker-Distribution-Api-Version: registry/2.0
Keep-Alive: timeout=15, max=500
Referrer-Policy: same-origin
Server: Apache/2.4.34 (Fedora)
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Vary: Accept
Via: 1.1 varnish (Varnish/5.1)
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Varnish: 304495 855896
X-Xss-Protection: 1; mode=block

{
    "name": "fedora",
    "tags": [
        "25",
        "26-modular",
        "29-aarch64",
        "29",
        "rawhide",
        "30",
        "28-aarch64",
        "28-armhfp",
        "28-x86_64",
        "30-s390x",
        "24",
        "26",
        "27",
        "28",
        "29-ppc64le",
        "29-s390x",
        "29-x86_64",
        "30-aarch64",
        "30-ppc64le",
        "30-x86_64",
        "latest",
        "28-ppc64le",
        "27-aarch64",
        "27-armhfp",
        "27-ppc64le",
        "27-x86_64"
    ]
}

However, if you try to retrieve the f29 image, the registry says it doesn't have one:

$ http https://registry.fedoraproject.org/v2/fedora/manifests/f29
HTTP/1.1 404 Not Found
Age: 0
AppServer: proxy04.fedoraproject.org
AppTime: D=163015
Connection: Keep-Alive
Content-Length: 93
Content-Type: application/json; charset=utf-8
Date: Thu, 04 Oct 2018 03:20:29 GMT
Docker-Distribution-Api-Version: registry/2.0
Keep-Alive: timeout=15, max=500
Referrer-Policy: same-origin
Server: Apache/2.4.34 (Fedora)
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Vary: Accept
Via: 1.1 varnish (Varnish/5.1)
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Varnish: 856018
X-Xss-Protection: 1; mode=block

{
    "errors": [
        {
            "code": "MANIFEST_UNKNOWN",
            "detail": {
                "Tag": "f29"
            },
            "message": "manifest unknown"
        }
    ]
}

The behavior is inconsistent: if you try enough times you might find that f29 is now present but rawhide is missing. I've run it over and over tonight in a CentOS CI job and never once saw it report both rawhide and f29. It was always missing one or both of them.
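For anyone who wants to reproduce this without checking tags by hand, here is a minimal sketch (not something that was run for this ticket) that compares the tag list against the manifest endpoint. The function names are hypothetical and chosen for illustration; only the registry URL comes from the requests above.

```python
import json
import urllib.error
import urllib.request

# Base URL taken from the requests shown in this ticket.
REGISTRY = "https://registry.fedoraproject.org/v2/fedora"


def list_tags(base=REGISTRY):
    """Return the tags the registry advertises via /tags/list."""
    with urllib.request.urlopen(f"{base}/tags/list", timeout=30) as resp:
        return json.load(resp)["tags"]


def manifest_status(tag, base=REGISTRY):
    """Return the HTTP status code for GET <base>/manifests/<tag>."""
    try:
        with urllib.request.urlopen(f"{base}/manifests/{tag}", timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is still what we want.
        return err.code


def unavailable_tags(tags, status_of):
    """Return the tags whose manifest lookup does not answer 200."""
    return [tag for tag in tags if status_of(tag) != 200]
```

Calling `unavailable_tags(list_tags(), manifest_status)` a few times in a row should make the flapping visible: the returned list would change between runs if different backends disagree.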

I first started noticing this during the infrastructure outage this evening, but now that the outage is over it seems to still be happening. It seems like it is likely due to the outage somehow, but I am not sure what changed during the outage. Hopefully that's a useful clue?

  • When do you need this? (YYYY/MM/DD)
    ASAP.

  • When is this no longer needed or useful? (YYYY/MM/DD)
    If we stop running a registry.

  • If we cannot complete your request, what is the impact?
    The registry does not reliably serve f29 or rawhide images, which likely affects many projects (it at least affects Bodhi's CI jobs).


There does seem to be a correlation with which proxy I get and whether it has the manifest I request. For example, with Rawhide I see that proxy06 has it (consistently):

$ http https://registry.fedoraproject.org/v2/fedora/manifests/rawhide
HTTP/1.1 200 OK                                                  
Accept-Ranges: bytes
Age: 33                   
AppServer: proxy06.fedoraproject.org
AppTime: D=1735                                                                                                                                                                                                   
Connection: Keep-Alive                                                                                                                                                                                            
Content-Length: 2164
Content-Type: application/vnd.docker.distribution.manifest.v1+prettyjws
Date: Thu, 04 Oct 2018 03:32:37 GMT
Docker-Content-Digest: sha256:de4eb6f8d2ab5a8dc5db53d390c938fb50a0f8c2eb59167512de5aa05cad8b8f
Docker-Distribution-Api-Version: registry/2.0
Etag: "sha256:de4eb6f8d2ab5a8dc5db53d390c938fb50a0f8c2eb59167512de5aa05cad8b8f"
Keep-Alive: timeout=15, max=500
Referrer-Policy: same-origin
Server: Apache/2.4.34 (Fedora)
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Vary: Accept                                                                                         
Via: 1.1 varnish (Varnish/5.1)
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN         
X-Varnish: 951132 1180576
X-Xss-Protection: 1; mode=block

{                                            
   "schemaVersion": 1,             
   "name": "fedora",                         
   "tag": "rawhide",           
   "architecture": "amd64", 
   "fsLayers": [              
      {                                                                
         "blobSum": "sha256:d3afc103d44cd915b9d8e149f1b8f910e55a0ca245df35db02d7cf4f20dac548"
      }                       
   ],                          
   "history": [            
      {                 
         "v1Compatibility": "{\"architecture\":\"amd64\",\"comment\":\"Created by Image Factory\",\"config\":{\"Systemd\":false,\"Hostname\":\"\",\"Entrypoint\":null,\"Env\":[\"DISTTAG=f30container\",\"FGC=f30\"],\"OnBuild\":null,\"OpenStdin\":false,\"MacAddress\":\"\",\"User\":\"\",\"VolumeDriver\":\"\",\"AttachStderr\":false,\"AttachStdout\":false,\"NetworkDisabled\":false,\"StdinOnce\":false,\"Cmd\":[\"/bin/bash\"],\"WorkingDir\":\"\",\"AttachStdin\":false,\"Volumes\":null,\"Tty\":false,\"Domainname\":\"\",\"Image\":\"\",\"Labels\":{\"version\":\"30\",\"vendor\":\"Fedora Project\",\"name\":\"fedora\",\"license\":\"MIT\"},\"ExposedPorts\":null},\"container_config\":{\"Systemd\":false,\"Hostname\":\"\",\"Entrypoint\":null,\"Env\":null,\"OnBuild\":null,\"OpenStdin\":false,\"MacAddress\":\"\",\"User\":\"\",\"VolumeDriver\":\"\",\"AttachStderr\":false,\"AttachStdout\":false,\"NetworkDisabled\":false,\"StdinOnce\":false,\"Cmd\":null,\"WorkingDir\":\"\",\"AttachStdin\":false,\"Volumes\":null,\"Tty\":false,\"Domainname\":\"\",\"Image\":\"\",\"Labels\":null,\"ExposedPorts\":null},\"created\":\"2018-09-10T09:55:12Z\",\"docker_version\":\"1.10.1\",\"id\":\"988782b210ca91a905f9d659ccc6ad0c547c1b6345b7e8225bf1034b3f9ebd88\",\"os\":\"linux\"}"             
      }                
   ],                            
   "signatures": [                                                                                   
      {       
         "header": {                     
            "jwk": {
               "crv": "P-256",
               "kid": "LQ55:IFOY:KG7P:F7KG:A7BW:PUOT:LRIM:5BJ5:I25A:S7IO:SFAD:7NK7",
               "kty": "EC",
               "x": "w4TjROKHhBLOep-dSwVSSRtgxprQNvAv4srfuyLQhnI",                                   
               "y": "BIjrjLGRljCjgl3dySX2CFu6niWxl3aNLdB6RWj39eA"
            },      
            "alg": "ES256"
         },                         
         "signature": "wjc4MqjMIfg6vk5hzYDHZ4tNKMJJxD6g9GE_K1bgd2YsPLbhfHNXScItmS3diNS6c_mhKLDsg6DWnqwp4ebElw",                                                                                                   
         "protected": "eyJmb3JtYXRMZW5ndGgiOjE1MTcsImZvcm1hdFRhaWwiOiJDbjAiLCJ0aW1lIjoiMjAxOC0xMC0wNFQwMzozMjozN1oifQ"                                                                                            
      }             
   ]                                                                   
}

Where proxy04 consistently doesn't have it:

$ http https://registry.fedoraproject.org/v2/fedora/manifests/rawhide
HTTP/1.1 404 Not Found        
Age: 33                        
AppServer: proxy04.fedoraproject.org
AppTime: D=633           
Connection: Keep-Alive         
Content-Length: 182
Content-Type: application/json; charset=utf-8
Date: Thu, 04 Oct 2018 03:32:36 GMT
Docker-Distribution-Api-Version: registry/2.0
Keep-Alive: timeout=15, max=500
Referrer-Policy: same-origin
Server: Apache/2.4.34 (Fedora)
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Vary: Accept                                                                                 
Via: 1.1 varnish (Varnish/5.1)
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Varnish: 797193 797172
X-Xss-Protection: 1; mode=block

{
    "errors": [
        {
            "code": "MANIFEST_UNKNOWN",
            "detail": {
                "Name": "fedora",
                "Revision": "sha256:d1df13939e52b24fef32271b8f1543e90c38a1a47e84c859db45d437e463d0f3"
            },
            "message": "manifest unknown"
        }           
    ]                         
}
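The proxy correlation above can be checked more systematically by sampling the same URL many times and grouping the status codes by the `AppServer` response header. This is only a sketch under that assumption; `probe` and `tally_by_proxy` are hypothetical names, not part of any existing tooling.

```python
from collections import Counter, defaultdict
import urllib.error
import urllib.request

# Manifest URL taken from the requests shown in this ticket.
URL = "https://registry.fedoraproject.org/v2/fedora/manifests/rawhide"


def probe(url=URL):
    """Fetch url once and return (AppServer header value, HTTP status)."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.headers.get("AppServer", "unknown"), resp.status
    except urllib.error.HTTPError as err:
        # Error responses carry the same headers, including AppServer.
        return err.headers.get("AppServer", "unknown"), err.code


def tally_by_proxy(samples):
    """Group (proxy, status) samples into {proxy: Counter({status: n})}."""
    counts = defaultdict(Counter)
    for proxy, status in samples:
        counts[proxy][status] += 1
    return counts
```

Something like `tally_by_proxy(probe() for _ in range(50))` would then show whether each proxy answers consistently. Note that the responses above carry `Age` and `Via: 1.1 varnish` headers, so samples taken close together may be cached copies of one another rather than independent checks.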

Yeah, the problem seems to be inconsistency between the gluster volumes...

oci-registry01:
oci-registry01.phx2.fedoraproject.org:/registry 186G 89G 98G 48% /srv/docker

oci-registry02:
oci-registry02.phx2.fedoraproject.org:/registry 102652956 62462152 40190804 61% /srv/docker
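The two lines above are in different units, which makes them awkward to compare at a glance. Assuming the oci-registry02 figures are 1 KiB blocks (the default for plain `df`; the ticket does not show the header, so this is an assumption), a quick conversion looks like this:

```python
def kib_to_gib(kib):
    """Convert a count of 1 KiB blocks to GiB."""
    return kib / 1024 ** 2


# Figures copied from the oci-registry02 line above; assumed to be 1 KiB blocks.
for label, kib in [("size", 102652956), ("used", 62462152), ("avail", 40190804)]:
    print(f"{label}: {kib_to_gib(kib):.1f} GiB")
```

Under that assumption 02 reports roughly 97.9 GiB total with 59.6 GiB used, against 186G total with 89G used on 01, so the two hosts really were seeing different volumes and not just different formatting.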

So I unmounted the volume on 02 and restarted glusterd. It reconnected, and both hosts now show the same size, but the volume is only using one gluster brick (01).

I think perhaps we should just make an NFS volume for this, copy the contents over, and call it a day.

+1 to NFS. It seems that I get consistent results now (rawhide seems to always work, f29 seems to always fail).

We talked about this a bit on IRC today. It sounds like someone (maybe @kevin or @puiterwijk?) will make an NFS share for the registry.

Once that's in place, I will attempt to merge the data from the two registries into the NFS share. The suggestion from @puiterwijk is that the files are unique and so it should be ok to merge the file system trees without doing any overwriting.
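A rough sketch of what such a no-overwrite merge could look like follows. This is not the procedure that was actually used; `merge_tree` is a hypothetical helper, and it only deals with regular files and directories. It copies anything missing from the destination, never replaces an existing file, and reports any path that exists on both sides with different content so the "files are unique" assumption can be verified instead of trusted.

```python
import filecmp
import shutil
from pathlib import Path


def merge_tree(src, dst):
    """Copy files from src into dst without overwriting anything.

    Returns the relative paths present in both trees with differing
    content. Those files are left untouched in dst.
    """
    src, dst = Path(src), Path(dst)
    conflicts = []
    for path in sorted(src.rglob("*")):
        rel = path.relative_to(src)
        target = dst / rel
        if path.is_dir():
            target.mkdir(parents=True, exist_ok=True)
        elif not target.exists():
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves timestamps
        elif not filecmp.cmp(path, target, shallow=False):
            # Same path, different bytes: record it and do not overwrite.
            conflicts.append(rel)
    return conflicts
```

Running it once per source registry against the NFS mount, with the registry processes stopped, should leave an empty conflict list if the trees really are disjoint or identical where they overlap.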

I would prefer to do this when the registry processes are not running. Do I need to schedule a formal outage to do that?

Metadata Update from @bowlofeggs:
- Issue assigned to bowlofeggs

9 months ago

I have adjusted the title of this ticket in response to meaningful feedback from @labbott and @smooge.

Thanks to a bunch of help from @puiterwijk, I was able to get this resolved.

Metadata Update from @bowlofeggs:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

9 months ago
