#6022 (re)setup amazon mirroring for Fedora
Closed: Fixed 4 years ago by kevin. Opened 6 years ago by kevin.

Long ago we set up mirrors in amazon for both Fedora and EPEL for instances there to use. We found that the Fedora ones were almost never used, so we dropped those and kept the EPEL ones.

Several things have changed since then:

  • There are hopefully many more Fedora instances in amazon now.
  • Amazon is offering to help us set up a CloudFront-type thing like Debian has ( https://cloudfront.debian.net/ ) that can be used as a general mirror for Fedora instances outside amazon and can be offered in a number of regions (possibly ones where we don't have many mirrors currently).

Items we need to decide:

  • what Fedora releases / arches / repos should be mirrored? Perhaps we should just start with Fedora 24 and 25 x86_64 Everything repo?
  • what changes do we need to make to our s3-mirror setup to more efficiently mirror this content up. It's been suggested we look at 'aws s3' instead of our current 's3cmd', and also consider syncing to one region and then doing something on the amazon side to sync the others from that one.
  • Any changes on the mirrormanager side. I think we have all the ip blocks for the regions for the existing EPEL mirror, so these changes should likely be minimal.

Adding: @dustymabe @mattdm @ausil and @adrian here for comment/input.


  • what Fedora releases / arches / repos should be mirrored? Perhaps we should just start with Fedora 24 and 25 x86_64 Everything repo?

I'd like to eventually get to "everything" -- and maybe the container registry, too, and, while we're at it, ostrees. But if we want to start small, I think just F25 and then F26 is fine. (I'd rather have all of the current releases than parts of all plus the previous release.)

Any changes on the mirrormanager side. I think we have all the ip blocks for the regions for the existing EPEL mirror, so these changes should likely be minimal.

Are these IP blocks up to date, and do they cover all of the new regions as they pop up like mushrooms? We need a SoP and someone responsible for keeping 'em up to date (unless it can somehow be automated).

what changes do we need to make to our s3-mirror setup to more efficiently mirror this content up. It's been suggested we look at 'aws s3' instead of our current 's3cmd', and also consider syncing to one region and then doing something on the amazon side to sync the others from that one.

I looked briefly at this. Existing script is at https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/roles/s3-mirror/

It looks like it's doing parallel uploads to multiple regions. S3 has a feature called Cross-Region Replication which seems ideal and I think would speed things up considerably. https://aws.amazon.com/blogs/aws/new-cross-region-replication-for-amazon-s3/
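For reference, a rough sketch of what enabling Cross-Region Replication would involve (bucket names and the IAM role ARN below are made up, and both buckets need versioning turned on; this is not a tested recipe):

# Versioning has to be enabled on source and destination buckets,
# and the role must allow the S3 replication actions.
aws s3api put-bucket-versioning --bucket s3-mirror-us-east-1 \
    --versioning-configuration Status=Enabled
aws s3api put-bucket-versioning --bucket s3-mirror-eu-west-1 \
    --versioning-configuration Status=Enabled

cat > replication.json <<'EOF'
{
  "Role": "arn:aws:iam::123456789012:role/s3-mirror-replication",
  "Rules": [
    { "Status": "Enabled", "Prefix": "",
      "Destination": { "Bucket": "arn:aws:s3:::s3-mirror-eu-west-1" } }
  ]
}
EOF

aws s3api put-bucket-replication --bucket s3-mirror-us-east-1 \
    --replication-configuration file://replication.json

One thing to check: it looks like a bucket only replicates to a single destination region, so fanning out to all five of our regions may need something more.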

Also, it appears that s3cmd does an md5sum of everything: it uses that and size, but apparently not date, while aws s3 uses date and size but no checksum. That's going to be much faster, but I'm not sure if it's safe. On the other hand, we don't recommend -c (forced checksums) for mirror rsync, do we?

Also, the current script uses --delay-updates, which is supposed to make the operation atomic (at the end of the full transfer), but... the s3cmd docs mark that as obsolete. Could we do some sort of blue/green? David, do you have any suggestions here? (There also doesn't seem to be a "delete after", so unless we can do full blue/green I guess we'd want to do two runs: first a sync without delete, then immediately afterwards a cleanup pass with delete.)

(Note that replacement of individual files is atomic in S3, so at least there won't be problems on the per-file level.)
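A sketch of that two-run idea with aws s3 (the local path and bucket name here are illustrative, not our real ones):

# Pass 1: copy new/changed files, delete nothing, so existing metadata
# keeps working while packages land.
aws s3 sync /srv/pub/epel/ s3://s3-mirror-us-east-1/epel/

# Pass 2: immediately afterwards, the cleanup run that removes files
# no longer present locally.
aws s3 sync /srv/pub/epel/ s3://s3-mirror-us-east-1/epel/ --delete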

I'd like to eventually get to "everything" -- and maybe the container registry, too, and, while we're at it, ostrees. But if we want to start small, I think just F25 and then F26 is fine. (I'd rather have all of the current releases than parts of all plus the previous release.)

ok. I guess that would be Everything repo + updates + updates-testing for f25, and Everything + updates-testing for f26 to start with? x86_64 only, right?

Are these IP blocks up to date, and do they cover all of the new regions as they pop up like mushrooms? We need a SoP and someone responsible for keeping 'em up to date (unless it can somehow be automated).

It's already all automated. It's a script that runs (once a week?) that gets all the blocks in each region, checks them against its list, and adds new ones and removes old ones. Should be all set in that case.
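(For anyone curious, the general shape of that kind of automation is simple, since Amazon publishes its address ranges. This is just an illustration, not the actual script, and the output path is made up:)

# Pull Amazon's published ranges and rebuild the netblock list that
# mirrormanager consumes.
curl -s https://ip-ranges.amazonaws.com/ip-ranges.json \
  | jq -r '.prefixes[] | select(.service == "EC2") | .ip_prefix' \
  | sort -u > /var/lib/mirrormanager/aws-netblocks.txt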

ok. I guess that would be Everything repo + updates + updates-testing for f25, and Everything + updates-testing for f26 to start with? x86_64 only, right?

Unless Amazon has an objection, I'd like to include the ISOs and images, too. It'd be nice if aws s3 sync would translate hardlinks to redirects, but... it doesn't. So maybe some sort of special-casing to just upload those?
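(One possible special-case, just as a sketch with a made-up bucket and file names: upload each unique file once, and store the extra hardlink paths as zero-byte objects carrying a website redirect. Note the redirect only takes effect when the bucket is served through the S3 website endpoint or CloudFront in front of it.)

# Upload the real file once.
aws s3 cp Fedora-Workstation-Live-x86_64-25-1.3.iso \
    s3://fedora-s3-mirror/releases/25/Workstation/x86_64/iso/Fedora-Workstation-Live-x86_64-25-1.3.iso

# For a path that is a hardlink to it, store only a redirect object.
aws s3api put-object --bucket fedora-s3-mirror \
    --key alt/hardlinked-copy.iso \
    --website-redirect-location /releases/25/Workstation/x86_64/iso/Fedora-Workstation-Live-x86_64-25-1.3.iso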

It's already all automated. It's a script that runs (once a week?) that gets all the blocks in each region, checks them against its list, and adds new ones and removes old ones. Should be all set in that case.

Cool.

Metadata Update from @codeblock:
- Issue assigned to codeblock

6 years ago

This seems to have gotten stuck. What needs to happen to unstick it? :)

@mattdm I've started looking at this. I'm researching the S3 cross-region replication stuff and looking at using aws instead of s3cmd.

My tentative plan is to do this incrementally so we're not making a lot of changes at once... So something like:

  • Change our current EPEL sync to use aws s3 and make sure it works fine.
  • Enable the cross-sync and see what (if anything) we have to change.
  • Add Fedora into the mix.
  • Make sure everything is working fine and not taking forever to sync.
  • Enable Fedora in mm.

I've started looking at switching the EPEL sync over and figuring out the differences between aws s3 and s3cmd in terms of command syntax and so forth. That's where we are right now.

@codeblock - any updates on this? @puiterwijk was interested in this when I talked to him at Flock.

@dustymabe It's still on my list/I'm still working on it. The sync happens on the (currently frozen) mirrormanager backend box anyway, so I probably won't try to land anything until after beta freeze.

@codeblock — ping! what's the status on this?

I just had a look. Getting the whole fedora tree into amazon would be this simple change:

diff --git a/roles/s3-mirror/files/s3sync b/roles/s3-mirror/files/s3sync
index f8d04137c..ed21d44b0 100755
--- a/roles/s3-mirror/files/s3sync
+++ b/roles/s3-mirror/files/s3sync
@@ -32,7 +32,7 @@ S3CMD_ARGS="sync \
   --exclude-from /usr/local/etc/s3-mirror-excludes.txt \
   "

-content="epel"
+content="epel fedora"
 targets="s3-mirror-us-east-1 s3-mirror-us-west-1 s3-mirror-us-west-2 s3-mirror-eu-west-1 s3-mirror-ap-northeast-1"
 report=0

diff --git a/roles/s3-mirror/templates/report_mirror.conf b/roles/s3-mirror/templates/report_mirror.conf
index e51e1d974..7068c42b9 100644
--- a/roles/s3-mirror/templates/report_mirror.conf
+++ b/roles/s3-mirror/templates/report_mirror.conf
@@ -52,7 +52,7 @@ rsyncd=/var/log/rsyncd.log
 # path= is the path on your local disk to the top-level directory for this Category

 [Fedora Linux]
-enabled=0
+enabled=1
 path=/srv/pub/fedora/linux

 [Fedora EPEL]

It would then require adding the categories to mirrormanager and it should be good to go.

Currently the aws mirrors are all marked as private and would not be available outside of amazon.

I wrote a new script which syncs both EPEL and Fedora using aws s3 sync and Amazon accelerated uploads.

Sorry for the lack of updates here. We found that cross-region replication wouldn't work, since you can only replicate to one other region (and there's no concept of chaining regions or anything like that). However, aws s3 sync can sync one bucket to another.

So the way I've been going forward on this is: Upload to an accelerated upload endpoint (basically cloudfront in front of S3), then use bucket-syncing to sync from that bucket to the other ones.

Right now my script is one script that runs linearly: first upload EPEL, then upload Fedora, then sync the first region to the second, the second to the third, and so on.

The bucket-syncing can probably happen at any time, so we can probably run those processes independently of the main upload script.
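In outline it looks something like this (the bucket names here are placeholders, and transfer acceleration has to be enabled on the primary bucket):

# One upload from the master mirror to the primary bucket, over the
# transfer-acceleration endpoint...
aws s3 sync /srv/pub/ s3://fedora-s3-primary/pub/ \
    --endpoint-url https://s3-accelerate.amazonaws.com

# ...then chain bucket-to-bucket syncs region to region.
aws s3 sync s3://fedora-s3-primary/pub/ s3://fedora-s3-us-west-2/pub/
aws s3 sync s3://fedora-s3-us-west-2/pub/ s3://fedora-s3-eu-west-1/pub/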

It's also probably worth noting that no matter what we do, the update won't be atomic, so there's always a chance that if new metadata lands but packages haven't synced yet (or vice versa), then a user might try to download a package that doesn't exist on the mirror. I think in theory this isn't a problem - at worst, things should just fall back to another mirror for that package.

Anyway there has been progress here, sorry I haven't been transparent enough about it.

Currently the aws mirrors are all marked as private and would not be available outside of amazon.

We'll be switching to new buckets as part of switching to the new script (this was so we can try out the new script without interfering with the in-use buckets). So part of this switchover will be updating the mirrormanager URLs to point to the new buckets as well.

So @codeblock, what are next (remaining?) steps and ETA at this point?

@pfrields So right now I'm experimenting with what I can do to get the upload time down more. It (aws s3 sync) has to check every file before it uploads anything to see what has changed, and this causes the syncs to be slow.

It has a few options that I'm experimenting with (--size-only, --exact-timestamps), which alter how files are checked; they might be quicker, but are less accurate. Those are what I'm trying now.
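Roughly what those variants look like (the paths and bucket name are illustrative):

# --size-only: a file is considered unchanged if its size matches;
#              timestamps are ignored entirely (fast, but a same-size
#              change would be missed).
aws s3 sync /srv/pub/fedora/ s3://fedora-s3-primary/fedora/ --size-only

# --exact-timestamps: when syncing down from S3, same-sized files are
#                     only skipped if the timestamps match exactly.
aws s3 sync s3://fedora-s3-primary/fedora/ /srv/pub/fedora/ --exact-timestamps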

Once I can get the time down to something reasonable, next steps are moving the script to be in ansible and have ansible set up the cron. Then we update the urls in mirrormanager and should be done. That should be pretty trivial.

So we're almost there, but I'm stuck until I can figure out how to keep it from taking roughly a day to sync the Fedora tree. I might reach out to the aws-cli folks and see if they have any suggestions.

Almost everything is "content immutable" objects, right? That's true of RPMs and ostree objects. We'd just need a whitelist of things which can change (e.g. for ostree repos it's the summary file and refs), or conversely assume objects/ is content-immutable (except for .commitmeta, unfortunately).

Similarly, for rpm-md repos I believe repomd.xml is the only file that is not content-immutable.
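A sketch of how that could be used with the aws tool (the patterns, paths and bucket name are illustrative): push the immutable payload first with the cheap size-only check, then copy the few mutable files afterwards, so the metadata never references packages that haven't landed yet.

# Pass 1: immutable content only; exclude the files that can change
# in place.
aws s3 sync /srv/pub/fedora/linux/ s3://fedora-s3-primary/fedora/linux/ \
    --size-only \
    --exclude "*/repodata/repomd.xml" \
    --exclude "*/ostree/repo/summary" \
    --exclude "*/ostree/repo/refs/*"

# Pass 2: now the mutable metadata files.
aws s3 sync /srv/pub/fedora/linux/ s3://fedora-s3-primary/fedora/linux/ \
    --exclude "*" \
    --include "*/repodata/repomd.xml" \
    --include "*/ostree/repo/summary" \
    --include "*/ostree/repo/refs/*"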

Any progress? I'd love to be able to announce this with the F28 release.

Metadata Update from @kevin:
- Issue priority set to: Waiting on Assignee

6 years ago

@codeblock Any news? I'd love to be able to announce this with the F29 release :)

@mattdm yes, actually. We have the sync working now, and we have things going through CloudFront with an S3 bucket origin.

It's in mirrormanager now. I just need to add the amazon netblocks and make it active, I think.

This should be all done I think? @codeblock anything left on this?

I hope that the work on mirroring on AWS is not yet complete, as IMHO the current state is far from ideal. From what I experience as a user of Fedora on Amazon EC2, enabling the CloudFront mirror made accessing Fedora repositories significantly slower.

In the past, when accessing metalinks from EC2 instances in eu-central-1 region (Frankfurt, Germany) they were resolved to fast local mirrors, such as mirrors.n-ix.net, an up-to-date mirror with high throughput and average ping below 5 ms.

Currently metalinks are resolved to the CloudFront mirror and a set of mirrors located in North America (mostly US, some CA). The first mirror returned, the CloudFront one, is most often outdated (since, as I read on IRC, one sync cycle takes 2 days) and has incomplete content (notably, files with a plus sign in their names are not served correctly; just opened as #7440). This leads to DNF trying other mirrors, which are all located in North America and are slow when accessed from Europe.

The result is that downloading packages is significantly slower than it used to be when S3 mirror did not exist.

Yes we are still working on issues...

  • The 2-day sync was when using a concurrency of 1. We cranked this up, so it should be a lot faster now.
    @codeblock can check it after tomorrow.

  • That you get IPs in NA is pretty unexpected. It should still give you mirrors local to you, unless geoip is marking all amazon IPs as NA?

I've tested a bunch of AWS floating IPs that are located in Frankfurt, Germany. All of them were correctly geolocated by the MaxMind GeoIP2 City Database demo, but geoiplookup on Fedora Infra misrecognizes some of them as US hosts. For example:

[mizdebsk@batcave01 ~]$ geoiplookup -f /srv/web/infra/bigfiles/geoip/GeoIP.dat 3.121.240.12
GeoIP Country Edition: US, United States
[mizdebsk@batcave01 ~]$ geoiplookup -f /srv/web/infra/bigfiles/geoip/GeoLiteCity.dat 3.121.240.12
GeoIP City Edition, Rev 1: US, N/A, N/A, N/A, N/A, 37.750999, -97.821999, 0, 0

So, is this still happening? I think our sync times are down, but @codeblock would know more.

As to the GeoIP, perhaps we have old content there? @codeblock can you look into that as well?

Our GeoIP databases are badly out of date. It will need a project to rewrite everything to use the new GeoIP tools (mmdblookup) and remove the legacy geoip usage. This will be needed for everything from mirrormanager to DNS to geoip.fedoraproject.org, plus some other items. I think it should be a different ticket than this one.
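(For comparison, the newer tooling against a GeoLite2 database looks like this; the database path is whatever the geolite2 packages install, shown here as the usual Fedora location:)

# Look up the same address that legacy geoiplookup misplaces above.
mmdblookup --file /usr/share/GeoIP/GeoLite2-City.mmdb \
    --ip 3.121.240.12 country iso_code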

@smooge I rewrote the geo-dns one, it's a different setup (but same source database I believe) than the rest.

It looks like there's a new version of the geoip python library for the new format (https://github.com/maxmind/GeoIP2-python) -- but in the geo-dns script I parsed the data myself.

Do all of these use the roles/geoip-city-wsgi/app/files/geoip-city.wsgi python script? Or is there more that I'm missing somewhere? I can work on rewriting it.

Moving to #7661

ok, this has long since been set up... and recently @codeblock reworked the sync script so it finishes in minutes.

If you are still seeing out-of-date mirrors in amazon, please open a new ticket and we can sort it out, but things should be in sync pretty quickly now.

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

4 years ago
