We need to be able to copy in ostree content from our Fedora CoreOS builds into our ostree repos. This is done manually right now by a Fedora release engineer during a release. We'd like to automate this based on Fedora Messaging messages.
Since I'll be helping develop and manage this "glue", it would be nice if we could run it in openshift so that I can view logs and such without needing more access to fedora releng/infra systems. Currently that would require a writable /mnt/koji inside the openshift app being developed, which is probably a lot to ask.
General architecture:
We'll probably want a split repo setup where we import most content into a staging repo (like the compose repo we have today) and then sync over part of it when we do an actual release.
cross referencing: https://github.com/coreos/fedora-coreos-tracker/issues/199
Metadata Update from @mohanboddu: - Issue tagged with: meeting
from the meeting today:
#info We decided to create a separate volume for just ostree stuff with rw access and dustymabe will test it in stg first
So we're going to migrate the ostree repos to a separate netapp volume (with the same level of backup support) and grant openshift read/write access to it. There is a wrinkle in that we use two directories:
/mnt/koji/compose/ostree/
/mnt/koji/ostree/
Is it possible to make a single netapp volume have an ostree directory that gets mounted under those locations so that we don't have to change any of our existing consumers? I like the idea of it being a single netapp volume vs multiple as it would be easier to maintain and we get deduplication.
I think I would like to test this in stg rather than communishift so that we get a more realistic test. @kevin can you create a volume in stage and show me how to mount it?
I can try and set this up on friday (2019-10-04)
ok we have worked on this quite a bit today. We have a netapp share fedora-ostree-content that is shared to the staging openshift via two PVs fedora-ostree-content-volume-{1,2} that point at the same NFS share. We were then able to mount those two PVs into pods within two different projects within openshift. We can also mount those shares under /mnt/koji/ostree and /mnt/koji/compose/ostree even though it's a single netapp volume.
Next steps:
get the code reviewed and approved for use in Fedora Infra
I asked and was told I could use this ticket as an RFE for getting the ostree importer into Fedora. Can we please get a security audit of https://github.com/coreos/fedora-coreos-releng-automation/tree/master/coreos-ostree-importer ?
Metadata Update from @dustymabe: - Issue assigned to kevin
The security audit process is undergoing some changes. In the infrastructure meeting today @kevin volunteered to try to wrangle those changes and get this assigned for review.
The script looks okay as of commit 7e7ffb5fb6729beecbc8e857064296c16bb152f9. If any significant changes occur, please re-request a security audit.
Also, for future cases: please note that it would help if people tag security audits with the "security" tag, so that they show up in my overviews.
thanks @puiterwijk !
Updated next steps:
Are the next steps here still accurate or did some of those get sorted out?
@jlebon @dustymabe started working on it with help from @kevin; we will know more by the end of the day.
@kevin and I met today and worked on a strategy for permissions. For now we think we can appropriately run things by using a less restrictive umask (0002) that will allow for the group permissions to include write for newly created files.
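Here is a minimal Python sketch of why a umask of 0002 is enough. The helper `create_with_umask` is hypothetical and is not part of coreos-ostree-importer. The kernel clears the umask bits from the mode requested at creation time. A file requested as 0666 therefore ends up 0664, and a directory requested as 0777 ends up 0775, so both are group-writable. This assumes the filesystem has no default ACLs, because a default ACL takes precedence over the umask.

```python
import os
import stat
import tempfile
from typing import Tuple


def create_with_umask(base_dir: str, mask: int) -> Tuple[int, int]:
    """Create one file and one directory under base_dir while `mask` is the
    process umask; return their permission bits as (file_mode, dir_mode)."""
    previous = os.umask(mask)  # os.umask returns the mask it replaced
    try:
        file_path = os.path.join(base_dir, "example-file")
        dir_path = os.path.join(base_dir, "example-dir")
        # Request the usual defaults; the kernel clears the umask bits from them.
        fd = os.open(file_path, os.O_CREAT | os.O_WRONLY, 0o666)
        os.close(fd)
        os.mkdir(dir_path, 0o777)
    finally:
        os.umask(previous)  # the umask is process-wide, so always restore it
    return (stat.S_IMODE(os.stat(file_path).st_mode),
            stat.S_IMODE(os.stat(dir_path).st_mode))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        file_mode, dir_mode = create_with_umask(tmp, 0o002)
        print(oct(file_mode), oct(dir_mode))  # 0o664 0o775 (no default ACLs)
```

With the more common umask of 0022, the same calls produce 0644 and 0755, which is why the group lost write access before the change.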
We also found that we needed /usr/bin/ostree to create some files with group write permissions. There is an open PR for that now (thanks @jlebon). In the meantime we can work around it with something like find . -type d | xargs sudo chmod g+w after each time we touch the repo in each respective app.
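For reference, here is the same workaround as a Python sketch. The helper name is hypothetical. It assumes the caller owns the directories or has the privileges that `sudo` provides in the shell version. Like `find -type d`, `os.walk` does not follow symlinks by default, so only real directories are touched.

```python
import os
import stat


def add_group_write_to_dirs(root: str) -> int:
    """Add the group-write bit to root and every directory below it, like
    `find . -type d | xargs chmod g+w`; return how many directories changed."""
    changed = 0
    # os.walk does not follow symlinks by default, matching `find -type d`
    for dirpath, _dirnames, _filenames in os.walk(root):
        mode = stat.S_IMODE(os.lstat(dirpath).st_mode)
        if not mode & stat.S_IWGRP:
            # pass the existing bits (including setgid) through, adding g+w
            os.chmod(dirpath, mode | stat.S_IWGRP)
            changed += 1
    return changed
```

Regular files are left alone, which matches the shell one-liner.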
status update. I've done a lot of work to get coreos-ostree-importer running in stage and listening to fedora messages, processing the messages, and also sending a response message. I've updated our pipeline to also send a request message to the importer.
At this point we are still here:
For the 2nd item, @jlebon is working on that fix and it shouldn't be long. For the 1st bullet I'll try to work with mohan and kevin to see if we can schedule some time to do the migration.
Regarding:
here is a rough outline for the migration:
This is in progress now. I'll check off the following items as they get done.
- [x] disable new-updates-sync
- [x] make sure for real that no composes are running
- [x] do a final final rsync
- [x] move ostree directories to new directory name
- [x] mount over ostree mount locations with the new netapp volume
- [x] verify clients can still rpm-ostree upgrade
- [x] mount up net-app volume into openshift pods in prod openshift
- [x] watch new pungi composes to make sure content gets synced correctly
- [x] watch new-updates-sync to make sure content gets synced correctly
- [ ] watch coreos-ostree-importer to make sure content gets synced correctly
we are still monitoring composes to make sure things are progressing. The pungi-initiated composes were stuck because robosignatory wasn't able to sign ostree commits (we forgot to add the mount to the robosignatory machine). We added the mount, those commits got signed, and we are now progressing.
Another thing we did after the final rsync was to run find . -type d | xargs chmod g+w to give directories group write permissions, but this took FOREVER on the /mnt/koji/ostree/repo repo because find had to look at every file under objects to test whether it was a directory. An optimization here is to use the fact that entries matching ./objects/* are directories and entries matching ./objects/*/* are regular files. This optimized command lists ./objects/* directly, excludes ./objects from the find, and runs much faster:
(ls -d ./objects/*; find . -path ./objects -prune -o -type d) | xargs chmod g+w
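Here is a Python sketch of the same optimization. It assumes the ostree layout described above: fan-out directories directly under `objects/`, with regular files one level below. The helper names are hypothetical. The key step is removing `objects` from `dirnames` during a top-down `os.walk`. That prunes the descent the same way `-prune` does in `find`, so the object files themselves are never stat'ed.

```python
import os
import stat
from typing import Iterator


def repo_directories(repo: str) -> Iterator[str]:
    """Yield the directories of an ostree repo without descending into objects/.

    Mirrors `(ls -d ./objects/*; find . -path ./objects -prune -o -type d)`.
    Assumes objects/ holds only fan-out directories whose children are files.
    """
    objects = os.path.join(repo, "objects")
    # Part 1, `ls -d ./objects/*`: list the fan-out directories one level deep.
    if os.path.isdir(objects):
        with os.scandir(objects) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    yield entry.path
    # Part 2, `find . -path ./objects -prune -o -type d`.
    for dirpath, dirnames, _filenames in os.walk(repo):
        yield dirpath
        if dirpath == repo and "objects" in dirnames:
            dirnames.remove("objects")  # prune: top-down os.walk skips it
            yield objects  # find still prints the pruned directory itself


def add_group_write(repo: str) -> int:
    """chmod g+w on every directory yielded above; return how many were visited."""
    count = 0
    for path in repo_directories(repo):
        mode = stat.S_IMODE(os.lstat(path).st_mode)
        os.chmod(path, mode | stat.S_IWGRP)
        count += 1
    return count
```

The cost now scales with the number of fan-out and non-object directories rather than with the number of objects.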
ok one issue that we've hit with sharing the NFS mount into the openshift pods is that we have several different processes touching these shared mounts and the file/directory ownership is a bit all over the place. We have:
- 0:0 (root)
- 263:263 (ftpsync)
- ids in the range of 988->992
One proposal here is to pick a uid/gid for all files/directories in the repo and then setgid on the directories so that new files will be group owned by that gid. I'm not sure if this would work everywhere or not.
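Here is a minimal sketch of the setgid idea, assuming Linux semantics. The helper names are hypothetical. When `S_ISGID` is set on a directory, files created inside it take the directory's group rather than the creator's primary group. The probe is only meaningful when the creating process's gid differs from the directory's gid, for example the random-uid openshift pods that create files with gid 99. In a scratch directory owned by the caller, the two gids match anyway.

```python
import os
import stat


def make_setgid_group_writable(directory: str) -> int:
    """chmod g+ws on one directory and return the resulting permission bits.

    Note: Linux silently drops S_ISGID here unless the caller is a member of
    the directory's group or is privileged (CAP_FSETID).
    """
    mode = stat.S_IMODE(os.stat(directory).st_mode)
    os.chmod(directory, mode | stat.S_ISGID | stat.S_IWGRP)
    return stat.S_IMODE(os.stat(directory).st_mode)


def new_file_inherits_group(directory: str) -> bool:
    """Create a scratch file and report whether its gid equals the directory's."""
    probe = os.path.join(directory, ".gid-probe")  # hypothetical probe name
    with open(probe, "w"):
        pass
    try:
        return os.stat(probe).st_gid == os.stat(directory).st_gid
    finally:
        os.remove(probe)
```

Newly created subdirectories also inherit the setgid bit on Linux, so setting it once on the existing directories should be enough going forward.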
files created by robosignatory (signing): ids in the range of 988->992 (fedmsg hub?) signed ostree commits, files under refs/heads/
robosignatory now runs as uid=992 gid=988 after the switch to Fedora Messaging. The other numbers in the range are from when it ran as part of fedmsg (gid=991) on the autosign box. This means that new files created by robosignatory should have uid=992 and gid=988.
OK here are some facts:
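- prod repo:
    - files created by new-updates-sync: 263:263 (ftpsync:ftpsync)
    - files created by coreos-ostree-importer: $randomuid:99 (runs in openshift with a random uid; the gid for files created is 99)
- compose repo: 0 (root)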
OK so a few suggestions here after talking to kevin:
For the compose repo I think this depends on robosignatory. kevin mentioned we might be able to just switch the uid/gid for robosig to be ftpsync (we should think about this more).
If we can't switch robosig something like this could work:
With this last option I think new-updates-sync would still work, but I'm not sure.
Yeah, we should be able to change robosignatory user to whatever uid/gid we want... it's not used anywhere but on the autosign01 system.
This plan should be ok. It also fixes another issue/thought I had: We should look at hardlinking the two directories to save room. Hardlinking needs the files to have the same checksum and ownership, so moving everything to ftpsync would work there.
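The hardlinking constraint can be sketched in Python. The helper names are hypothetical, and using SHA-256 as the checksum is an assumption. A hardlink is a single inode, so linked paths necessarily share content, owner, group and mode. Grouping files by all four shows which files could be linked. Files in the two repos that are otherwise identical but differ only in ownership produce no candidates, which is why moving everything to ftpsync helps.

```python
import hashlib
import os
import stat
from collections import defaultdict
from typing import Dict, List, Tuple

# (sha256 hex digest, uid, gid, permission bits)
Key = Tuple[str, int, int, int]


def file_key(path: str) -> Key:
    """Content checksum plus the metadata that a hardlink forces to be shared."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    st = os.lstat(path)
    return (digest.hexdigest(), st.st_uid, st.st_gid, stat.S_IMODE(st.st_mode))


def hardlink_candidates(paths: List[str]) -> List[List[str]]:
    """Group regular files (not symlinks) that could safely share one inode."""
    groups: Dict[Key, List[str]] = defaultdict(list)
    for path in paths:
        if os.path.isfile(path) and not os.path.islink(path):
            groups[file_key(path)].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```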
How time sensitive is this? We are currently in freeze and this would need a freeze break... can we wait until after? Or should we go ahead with a freeze break request?
That's great assuming other consumers of any files robosignatory creates can handle the changed file ownership.
Yeah we probably need to talk about this a bit more to see what's possible.
I see the beta release target is 2020-03-17. I guess that's OK. One thing we probably need to do is prune some stuff though, since we only added a small amount of storage the other day.
Yeah, we should be able to change robosignatory user to whatever uid/gid we want... it's not used anywhere but on the autosign01 system. That's great assuming other consumers of any files robosignatory creates can handle the changed file ownership.
Yeah, I would think so...
This plan should be ok. It also fixes another issue/thought I had: We should look at hardlinking the two directories to save room. Hardlinking needs the files to have the same checksum and ownership, so moving everything to ftpsync would work there. Yeah we probably need to talk about this a bit more to see what's possible.
ok...
How time sensitive is this? We are currently in freeze and this would need a freeze break... can we wait until after? Or should we go ahead with a freeze break request? I see beta release target is 2020-03-17. I guess that's OK. One thing we probably need to do is prune some stuff though since we only added a small amount of storage the other day.
ok. We could request a freeze break to manually prune? Or if you like I could just grow the volume more?
maybe just add a todo to check it early next week and see if it's about to run out and if it's getting close add some more
OK - one update to https://pagure.io/releng/issue/8811#comment-629051
OK here are some facts: prod repo: files created by new-updates-sync: 263:263 (ftpsync:ftpsync) files created by coreos-ostree-importer: $randomuid:99 runs in openshift with random uid. gid for files created is 99
The nightly branched/rawhide pungi runs also run an ostree pull-local to sync content over into the prod repo.
I'm going to open an issue to see if we can explore using new-updates-sync there.
What we did yesterday:
# 20200408 ostree repo work

work items for today:

- open ticket to investigate moving nightly branched/rawhide to new-updates-sync
    - @dusty: @dusty will work on this one
    - @dusty: opened ticket: https://pagure.io/releng/issue/9392
- switch robosig to uid/gid 263:263
    - @dusty: @kevin, can you take care of this part?
    - @dusty: kevin did agree to work on this piece
    - it's working mostly now, but still needs work on the unit file to start correctly.
    - consider this done for now.
- check to see what user/group the nightly branched/rawhide composes run the OSTree syncing into the prod compose repo as
    - @mohan says they run as root
    - @dusty: can we make just the ostree pull run as ftpsync/ftpsync? XXX: MOHAN KEVIN WDYT ^^
    - I could give it a try after the current compose is done
    - @dusty: scratch that - not needed as we just
- check to make sure no composes are running (i.e. nothing writing to the repos)
    - @mohan: Currently there is a rawhide compose running, it will be done at approximately 5 EDT
    - https://koji.fedoraproject.org/koji/tasks?owner=releng&state=active&view=tree&method=all&order=-id
- maybe we should force netapp to take a snapshot here
    - safety precaution before we run the chmod/chown commands below
- run command to switch all directories to setgid
    - these commands should execute in < 60 seconds
    - `cd /mnt/koji/compose/ostree/repo`
    - `(ls -d ./objects/*; find . -path ./objects -prune -o -type d) | xargs chmod g+s`
    - `cd /mnt/koji/ostree/repo/`
    - `(ls -d ./objects/*; find . -path ./objects -prune -o -type d) | xargs chmod g+s`
- run command to switch all directories 263:263
    - these commands should execute in < 60 seconds
    - `cd /mnt/koji/compose/ostree/repo`
    - `(ls -d ./objects/*; find . -path ./objects -prune -o -type d) | xargs chown 263:263`
    - `cd /mnt/koji/ostree/repo/`
    - `(ls -d ./objects/*; find . -path ./objects -prune -o -type d) | xargs chown 263:263`
- run command to switch all files/directories in the repos to 263:263
    - unfortunately this will probably take a long time 😭
    - but that is OK because the directories are the important ones I think
    - so we should just be able to let this run and not worry about it too much
    - `cd /mnt/koji/compose/ostree/repo && chown -R 263:263`
        - Replaced it with `chown 263:263 -R /mnt/koji/compose/ostree/repo`
    - `cd /mnt/koji/ostree/repo/ && chown -R 263:263`
        - Replaced it with `chown 263:263 -R /mnt/koji/ostree/repo`
Can we do one more set of commands to add group write perms on already created directories?
cd /mnt/koji/compose/ostree/repo
(ls -d ./objects/*; find . -path ./objects -prune -o -type d) | xargs chmod g+w
cd /mnt/koji/ostree/repo/
(ls -d ./objects/*; find . -path ./objects -prune -o -type d) | xargs chmod g+w
Done
here are a few followup PRs that need to be merged and applied in order for this to be complete:
Now that those two PRs are merged I think we can call this DONE!
Metadata Update from @dustymabe: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)
Some new files have the wrong permissions:
2021-05-11 21:57:25,201 DEBUG builtins - Running command: ['ostree', '--repo=/mnt/koji/ostree/repo', 'refs']
2021-05-11 21:57:31,131 DEBUG builtins - Running command: ['ostree', '--repo=/mnt/koji/compose/ostree/repo', 'refs']
2021-05-11 21:57:33,563 WARNING builtins - Directory /mnt/koji/compose/ostree/repo/refs/heads/fedora/32/aarch64/updates does not have rwx+setgid group permissions!
2021-05-11 21:57:33,564 WARNING builtins - Directory /mnt/koji/compose/ostree/repo/refs/heads/fedora/32/ppc64le/updates does not have rwx+setgid group permissions!
2021-05-11 21:57:33,564 WARNING builtins - Directory /mnt/koji/compose/ostree/repo/refs/heads/fedora/32/x86_64/updates does not have rwx+setgid group permissions!
2021-05-11 21:57:33,564 WARNING builtins - Directory /mnt/koji/compose/ostree/repo/refs/heads/fedora/34/x86_64/updates does not have rwx+setgid group permissions!
2021-05-11 21:57:33,564 WARNING builtins - Directory /mnt/koji/compose/ostree/repo/refs/heads/fedora/34/ppc64le/updates does not have rwx+setgid group permissions!
2021-05-11 21:57:33,564 WARNING builtins - Directory /mnt/koji/compose/ostree/repo/refs/heads/fedora/34/aarch64/updates does not have rwx+setgid group permissions!
XXX: Found directories with unexpected permissions/ownership
2021-05-11 21:57:33,566 INFO builtins - Processing messages with topic: org.fedoraproject.prod.coreos.build.request.ostree-import
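The warnings above come from a permission check in the importer. The sketch below is not the importer's actual code. It shows an equivalent test under the assumption that "rwx+setgid group permissions" means the group rwx bits and the setgid bit are all set (mask 0o2070). It ignores the ownership half of the check, and the helper names are hypothetical.

```python
import os
import stat
from typing import List

# Group read/write/execute plus the setgid bit: 0o070 | 0o2000 == 0o2070.
REQUIRED_BITS = stat.S_IRWXG | stat.S_ISGID


def has_group_rwx_setgid(path: str) -> bool:
    """True if path is a directory whose mode includes group rwx and setgid."""
    mode = os.lstat(path).st_mode
    return stat.S_ISDIR(mode) and (mode & REQUIRED_BITS) == REQUIRED_BITS


def dirs_missing_group_perms(root: str) -> List[str]:
    """Walk root (inclusive) and return, sorted, the directories failing the check."""
    missing = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        if not has_group_rwx_setgid(dirpath):
            missing.append(dirpath)
    return sorted(missing)
```

A mode of 0755 fails the check because group write and setgid are both missing. The 2775 mode used for the repo directories passes.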
Let's fix them with something like:
cd /mnt/koji/compose/ostree/repo/refs/heads/fedora
find . \! -perm 2775 -type d | xargs chmod --verbose 2775
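Here is the same fix as a Python sketch. The helper name is hypothetical. Without a `-` or `/` prefix, `find -perm 2775` matches the mode exactly. The sketch therefore compares all twelve permission bits returned by `stat.S_IMODE` and resets any directory that differs. It returns the changed paths, roughly what `--verbose` reports.

```python
import os
import stat
from typing import List

TARGET_MODE = 0o2775  # rwxrwsr-x: setgid, group-writable, world-readable


def force_dir_mode(root: str, mode: int = TARGET_MODE) -> List[str]:
    r"""Give every directory under root (inclusive) exactly `mode`, like
    `find . \! -perm 2775 -type d | xargs chmod --verbose 2775`.
    Returns the directories that were changed."""
    changed = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        # `-perm 2775` with no - or / prefix is an exact match on the mode bits
        if stat.S_IMODE(os.lstat(dirpath).st_mode) != mode:
            os.chmod(dirpath, mode)
            changed.append(dirpath)
    return changed
```

Because the check is exact, a directory with extra bits (for example a sticky bit, 3775) is also reset to 2775, the same as the `find` version.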
Thanks @nirik for running 👆