#7157 Deploy regindexer
Closed: Fixed 5 years ago Opened 5 years ago by otaylor.

High Level

regindexer is a tool for generating a static index of a registry. Such an index is required for browsing Fedora-generated Flatpaks in GNOME Software or KDE Discover. We want this for F29, so it would be great to have regindexer deployed in Fedora infrastructure before the F29 beta freeze or, failing that, as soon as possible afterwards.

We need:

  • An instance of regindexer-daemon running with appropriate /etc/fedmsg.d configuration.
  • The correct sync setup to copy the results from /var/lib/regindexer to the staging server and then to the proxies.
  • HTTP configuration to map the queries generated by the flatpak client into the correct static URLs.

Review request for regindexer: https://bugzilla.redhat.com/show_bug.cgi?id=1612141

Web structure

flatpak remote-add --oci fedora https://foo.example.com 

Will result in flatpak looking for an index at: https://foo.example.com/index/static?annotation:org.flatpak.ref:exists=1&architecture=amd64&os=linux&tag=latest

Adding an anchor will change the tag that is looked for:

flatpak remote-add --oci fedora https://foo.example.com#testing

Gives: https://foo.example.com/index/static?annotation:org.flatpak.ref:exists=1&architecture=amd64&os=linux&tag=testing
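
To make the mapping concrete, here is a minimal Python sketch of how a remote location turns into the index query URL shown above. This is not flatpak's actual code; the function name index_url and its defaults are hypothetical and only illustrate the two examples given here.

```python
from urllib.parse import urlencode, urlsplit


def index_url(remote_url: str, architecture: str = "amd64", os_name: str = "linux") -> str:
    """Build the index query URL for a remote location (illustrative only)."""
    parts = urlsplit(remote_url)
    # The anchor (URL fragment) selects the tag; without one, "latest" is used.
    tag = parts.fragment or "latest"
    base = f"{parts.scheme}://{parts.netloc}{parts.path.rstrip('/')}"
    query = urlencode(
        [
            ("annotation:org.flatpak.ref:exists", "1"),
            ("architecture", architecture),
            ("os", os_name),
            ("tag", tag),
        ],
        safe=":",  # keep ":" literal, as in the URLs above
    )
    return f"{base}/index/static?{query}"


print(index_url("https://foo.example.com"))
print(index_url("https://foo.example.com#testing"))
```

A real client may percent-encode the colons as %3A, which is why the server-side rewrite rules need to accept both forms.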

Since registry.fedoraproject.org already has a sync’ed static webroot, which doesn’t use the index/ path, it seems like we can just put the regindexer index under index/, and make

flatpak remote-add --oci fedora https://registry.fedoraproject.org

work.

regindexer configuration

The main config file for index generation and the daemon is /etc/regindexer/config.yaml. The necessary content looks like this (with adaptations to the final setup):

icons_dir: /var/lib/regindexer/icons/
icons_uri: /app-icons/
indexes:
    flatpak:
        output: /var/lib/regindexer/index/flatpak.json
        registry: http://registry.fedoraproject.org:5000
        registry_public: /
        tags: ['latest']
        required_annotations: ['org.flatpak.ref']
        extract_icons: True
    flatpak_testing:
        output: /var/lib/regindexer/index/flatpak-testing.json
        registry: https://registry.fedoraproject.org    
        registry_public: /
        tags: ['testing']
        required_annotations: ['org.flatpak.ref']
        extract_icons: True
    flatpak_amd64:
        output: /var/lib/regindexer/index/flatpak-amd64.json
        registry: https://registry.fedoraproject.org
        registry_public: /
        tags: ['latest']
        required_annotations: ['org.flatpak.ref']
        architectures: ['amd64']
        extract_icons: True
    flatpak_testing_amd64:
        output: /var/lib/regindexer/index/flatpak-testing-amd64.json
        registry: https://registry.fedoraproject.org
        registry_public: /
        tags: ['testing']
        required_annotations: ['org.flatpak.ref']
        architectures: ['amd64']
        extract_icons: True
daemon: 
    topic_prefix: org.fedoraproject
    environment: {stg/prod}

A looping template would be useful when adding more architectures. [It would be possible to extend the file format to have defaults and cut down on some of the boilerplate, but that's not implemented at the moment.]
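
As a rough illustration of what such a looping template would produce, the following Python sketch generates the indexes section for every tag/architecture combination. It is only a sketch: build_indexes, TAGS and ARCHITECTURES are hypothetical names, and it assumes a single registry URL for all entries (the hand-written config above uses a different URL for the first entry). The output is printed as JSON, which is also valid YAML flow syntax.

```python
import json

REGISTRY = "https://registry.fedoraproject.org"  # assumption: same URL for every index
TAGS = ["latest", "testing"]
ARCHITECTURES = [None, "amd64"]  # None means an index covering all architectures


def build_indexes(tags, architectures):
    """Return the 'indexes' mapping for each tag/architecture combination."""
    indexes = {}
    for arch in architectures:
        for tag in tags:
            # Naming follows the config above: flatpak, flatpak_testing, flatpak_amd64, ...
            name_parts = ["flatpak"]
            if tag != "latest":
                name_parts.append(tag)
            if arch:
                name_parts.append(arch)
            entry = {
                "output": "/var/lib/regindexer/index/" + "-".join(name_parts) + ".json",
                "registry": REGISTRY,
                "registry_public": "/",
                "tags": [tag],
                "required_annotations": ["org.flatpak.ref"],
            }
            if arch:
                entry["architectures"] = [arch]
            entry["extract_icons"] = True
            indexes["_".join(name_parts)] = entry
    return indexes


print(json.dumps({"indexes": build_indexes(TAGS, ARCHITECTURES)}, indent=4))
```

Adding another architecture is then a one-item change to ARCHITECTURES; in the real deployment the same loop would more likely live in the configuration management template.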

Syncing

There could be multiple ways of doing this: a shared NFS volume, running an rsyncd instance that shares transient storage with regindexer, or syncing over HTTPS. Some considerations:

  • All the proxy servers must have a consistent ETag for the indexes, or they will be redownloaded constantly. This means that they should be getting the mtimes from a common source.
  • If regindexer writes the index and icons to transient storage, then the ETags should preferably not change when the indexer restarts. This could be accomplished by rsyncing to a persistent location with --checksum and without --times.
  • If regindexer writes the index and icons to transient storage, then care must be taken not to sync before regindexer finishes indexing the first time after a restart. This is a bit tricky and might require regindexer changes.
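
The second consideration can be sketched in Python: copy a file to the persistent location only when its content actually differs, so that an unchanged index keeps its old mtime (and with it any ETag the web server derives from mtime and size). This is a stand-in for the rsync --checksum approach, not a replacement for it; sync_if_changed and file_digest are hypothetical helper names.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync_if_changed(src: Path, dest: Path) -> bool:
    """Copy src over dest only when the contents differ.

    Returns True if dest was (re)written, False if it was left untouched.
    """
    if dest.exists() and file_digest(src) == file_digest(dest):
        return False  # identical content: keep dest and its mtime as they are
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file in the same directory, then rename atomically
    # so that readers never see a partially written index.
    fd, tmp_name = tempfile.mkstemp(dir=str(dest.parent))
    os.close(fd)
    shutil.copyfile(src, tmp_name)
    os.chmod(tmp_name, 0o644)  # mkstemp creates 0600 files; make it web-readable
    os.replace(tmp_name, dest)
    return True
```

With this approach, restarting the indexer and regenerating byte-identical output leaves the persistent copy alone. It does not address the third consideration: something still has to signal that the first indexing pass after a restart has finished.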

HTTP Configuration

As above, flatpak expects the index to support the Flagstate query protocol, and will include the arch and tag in the query string. So something like the following:

Alias "/index/" "/srv/www/regindexer/index/"
Alias "/app-icons/" "/srv/www/regindexer/icons/"

<Directory "/srv/www/regindexer/index/">
    Options +FollowSymLinks

    ExpiresActive on
    ExpiresDefault "access plus 30 minutes"

    RewriteEngine on
    RewriteBase /index/

    RewriteCond "&%{QUERY_STRING}" &annotation(%3A|:)org.flatpak.ref(%3A|:)exists=1
    RewriteCond "&%{QUERY_STRING}" &tag=testing
    RewriteCond "&%{QUERY_STRING}" &architecture=([^&]+)
    RewriteRule "static" flatpak-testing-%1.json [L]

    RewriteCond "&%{QUERY_STRING}" &annotation(%3A|:)org.flatpak.ref(%3A|:)exists=1
    RewriteCond "&%{QUERY_STRING}" &architecture=([^&]+)
    RewriteRule "static" flatpak-%1.json [L]

    RewriteCond "&%{QUERY_STRING}" &annotation(%3A|:)org.flatpak.ref(%3A|:)exists=1
    RewriteCond "&%{QUERY_STRING}" &tag=testing
    RewriteRule "static" flatpak-testing.json [L]

    RewriteCond "&%{QUERY_STRING}" &annotation(%3A|:)org.flatpak.ref(%3A|:)exists=1
    RewriteRule "static" flatpak.json [L]

    AllowOverride None
    Require all granted
</Directory>

<Directory "/srv/www/regindexer/icons/">
    ExpiresActive on
    ExpiresDefault "access plus 1 year"

    AllowOverride None
    Options +Indexes
    Require all granted
</Directory>

The icons are extracted with content-addressed URIs, which is why the expiry is set as close to infinite as the HTTP standards recommend. It's possible that the ExpiresDefault "access plus 30 minutes" for the indexes is annoyingly long: for example, updates pushed to testing might not be visible upon receipt of a notification because the client is still using the old cached copy. If we deploy this way, we'll need to pay attention to whether it results in annoying quirks; if it does, we should use a shorter value and count on ETags to avoid too much traffic.
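
For checking the rewrite rules without a running httpd, their decision logic can be mirrored in a few lines of Python. This is a sketch of my reading of the rules above, not something Apache executes; resolve_index is a hypothetical name. As in the RewriteCond lines, a "&" is prepended to the query string so that every parameter can be matched with a leading "&".

```python
import re
from typing import Optional

# Same patterns as in the RewriteCond lines above.
ANNOTATION_RE = re.compile(r"&annotation(%3A|:)org.flatpak.ref(%3A|:)exists=1")
ARCH_RE = re.compile(r"&architecture=([^&]+)")


def resolve_index(query_string: str) -> Optional[str]:
    """Return the static index file a query maps to, or None if no rule applies."""
    qs = "&" + query_string
    if not ANNOTATION_RE.search(qs):
        return None  # every rule requires the org.flatpak.ref annotation filter
    testing = "&tag=testing" in qs
    arch = ARCH_RE.search(qs)
    if testing and arch:
        return f"flatpak-testing-{arch.group(1)}.json"
    if arch:
        return f"flatpak-{arch.group(1)}.json"
    if testing:
        return "flatpak-testing.json"
    return "flatpak.json"


print(resolve_index("annotation:org.flatpak.ref:exists=1&architecture=amd64&os=linux&tag=testing"))
```

Note that any tag other than testing falls through to the non-testing files, and that the os parameter is ignored, exactly as in the rules themselves.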


Metadata Update from @kevin:
- Issue priority set to: Waiting on Assignee (was: Needs Review)

5 years ago

We don't have registry content on proxies as far as I know, it's all on a pair of servers that the proxies send requests to.

So, could we just run this on one of those and serve it from there? Or should we look at moving this out to the proxies for speed/caching reasons?

My understanding was that the human-readable web content generated by 'reg' - e.g. https://registry.fedoraproject.org/index.html - is sync'ed to the proxies - this is like a machine-readable, more complete version of that.

A generous estimate of the eventual traffic is:

   2 * "du -hsc /usr/share/app-info/xmls/fedora.xml.gz"   -  8 MB
 * Number of Fedora systems updating Flatpaks             -  100k
 / Frequency with which clients check                     -  once a day?
=====
800 GB/day
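
Spelled out as arithmetic, under the assumption that the 8 MB figure is the per-download size (i.e. the doubling is already included) and using decimal units, this is:

```python
INDEX_SIZE_MB = 8        # per-download size, read as already including the factor of 2
CLIENTS = 100_000        # Fedora systems updating Flatpaks
CHECKS_PER_DAY = 1       # each client checks roughly once a day

daily_mb = INDEX_SIZE_MB * CLIENTS * CHECKS_PER_DAY
daily_gb = daily_mb / 1000  # decimal units: 1 GB = 1000 MB

print(f"{daily_gb:.0f} GB/day")
```

This is an upper bound on full downloads; clients that revalidate with an unchanged ETag would transfer far less.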

In the short term it's going to be much less, but since (if I'm right) registry.fedoraproject.org is already a mix of proxied content and sync'ed content, it seems natural to just set things up this way to begin with (maybe some performance improvements for international clients.)

We don't want to serve it off the CDN because the index acts as the "source of truth" for Flatpak updates - by changing it, you can change what Fedora systems download and install.

The scraping and generation service could run on one of the registry.fedoraproject.org machines if desired.

OK, this should be all set up in staging now.

There don't appear to be any flatpaks currently for it to index however, so not sure how testable it is.

I did point it to the prod candidate registry and it did find the eog one.

Please let us know if there are any issues or adjustments needed.

:coffee:

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

5 years ago
