#2315 koji-gc complains about default lockfile location
Opened 3 years ago by kevin. Modified 2 years ago

# sudo -u apache /usr/local/bin/lock-wrapper koji-gc-delete /usr/sbin/koji-gc --debug --action=delete
Using option ('main', 'keytab') from config file
Using option ('main', 'principal') from config file
Using option ('main', 'krb_rdns') from config file
Using option ('main', 'serverca') from config file
Using option ('main', 'server') from config file
Using option ('main', 'weburl') from config file
Using option ('main', 'smtp_host') from config file
Using option ('main', 'from_addr') from config file
Using option ('main', 'unprotected_keys') from config file
delay: 432000 seconds
grace_period: 2419200 seconds
Traceback (most recent call last):
  File "/usr/sbin/koji-gc", line 1008, in <module>
    lock_fd = os.open(options.lock_file, os.O_CREAT | os.O_RDWR)
FileNotFoundError: [Errno 2] No such file or directory: '/run/user/48/koji-gc.lock'
$ ls -la /run/user/
total 0
drwxr-xr-x.  3 root root  60 Jun 15 19:15 .
drwxr-xr-x. 28 root root 960 Jun  8 06:44 ..
drwx------.  4 root root 100 Jun 15 03:12 0

This is on Fedora 32, so could be some systemd change?


It is not related to F32. The apache user is not logged in, so /run/user/48 is never created by PAM. I did not expect koji-gc to be run this way. Do you think changing the lock file location in koji-gc.conf is OK, or do we need some more robust default behaviour?
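For illustration, a minimal sketch of what a more robust default could look like. This is not what koji-gc does today: the function name `default_lock_path`, its parameters, and the choice of fallback directory are all hypothetical. It only shows the idea of checking for the per-user runtime directory before relying on it.

```python
import os
import tempfile


def default_lock_path(uid=None, run_root="/run/user", fallback_dir=None):
    """Return a lock file path, preferring the per-user runtime directory.

    Hypothetical helper: the name, parameters and fallback are assumptions.
    """
    if uid is None:
        uid = os.geteuid()
    runtime_dir = os.path.join(run_root, str(uid))
    # /run/user/<uid> only exists while the user has a login session,
    # so check for it instead of assuming it is there.
    if os.path.isdir(runtime_dir) and os.access(runtime_dir, os.W_OK):
        return os.path.join(runtime_dir, "koji-gc.lock")
    # Placeholder fallback: a shared temp directory has its own problems,
    # so a real default would need more thought than this.
    if fallback_dir is None:
        fallback_dir = tempfile.gettempdir()
    return os.path.join(fallback_dir, "koji-gc.lock")
```

With a helper like this, a service account without a login session would get the fallback path instead of hitting FileNotFoundError in os.open().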

Metadata Update from @tkopecek:
- Custom field Size adjusted to None

3 years ago

I'm not sure what best practice is here for a lockfile on a non daemon process like this. ;(

Perhaps it would make sense for it to use a /var/tmp/ lock file? But then it could be DOSed by someone making a lock there... perhaps /var/tmp/koji-gc/ and make sure it's owned by the correct user or error?
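A rough sketch of that idea, assuming a dedicated directory such as /var/tmp/koji-gc/ (the helper name `open_lock_in_private_dir` is made up for illustration and is not part of koji-gc). It refuses to use the directory unless it is a real directory, owned by the current user, and not writable by group or others.

```python
import os
import stat


def open_lock_in_private_dir(lock_dir, name="koji-gc.lock"):
    """Open a lock file inside a directory that only the current user controls.

    Hypothetical helper, not koji-gc code.
    """
    try:
        os.mkdir(lock_dir, 0o700)
    except FileExistsError:
        pass
    # lstat: do not follow a symlink that someone else may have planted.
    st = os.lstat(lock_dir)
    if not stat.S_ISDIR(st.st_mode):
        raise RuntimeError("%s is not a directory" % lock_dir)
    if st.st_uid != os.geteuid():
        raise RuntimeError("%s is owned by uid %d" % (lock_dir, st.st_uid))
    if st.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
        raise RuntimeError("%s is writable by group or others" % lock_dir)
    path = os.path.join(lock_dir, name)
    # O_NOFOLLOW: fail instead of following a symlinked lock file.
    return os.open(path, os.O_CREAT | os.O_RDWR | os.O_NOFOLLOW, 0o600)
```

This only protects against the directory being prepared by another user. If someone else creates /var/tmp/koji-gc/ first, the call errors out rather than locking, which matches the "or error" part above.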

I don't know why apache isn't logged in there, it's running httpd and koji-gc was running as it as well (via sudo)

It looks to be a contested systemd behaviour https://bugzilla.redhat.com/show_bug.cgi?id=967509

@ktdreyer any suggestions? (as you were participating in #1333) :-)

Metadata Update from @tkopecek:
- Issue set to the milestone: 1.22
- Issue tagged with: bug

3 years ago

Kevin, I'm guessing this is part of a cron job in Fedora that runs as "apache" locally? Can you provide more context here? Does the apache UID have access to the keytab that koji-gc uses? I thought Fedora used the oscar user

I think the problem is that we do not clearly document how users ought to enable koji-gc. I added systemd unit files (similar to kojira) in #2199, but we should add instructions to the Server HowTo guide as well so the end-to-end process is clear.

Here is how I do this in my ansible playbook:

- name: Install Koji utils package
  package:
    name: koji-utils
    state: present

- name: copy /etc/koji-gc/koji-gc.conf
  copy:
    src: files/koji-gc.conf
    dest: /etc/koji-gc/koji-gc.conf
    owner: root
    group: root
    mode: 0644

# Submitted upstream at https://pagure.io/koji/issue/2198
- name: install koji-gc systemd unit files
  copy:
    src: "files/{{ item }}"
    dest: "/etc/systemd/system/{{ item }}"
    owner: root
    group: root
    mode: 0644
  notify:
  - reload systemd
  with_items:
  - koji-gc.service
  - koji-gc.timer

- name: patch for GSSAPI support
  include: gssapi.yml

- name: enable koji-gc.timer
  service:
    name: koji-gc.timer
    enabled: true

A note about security here- I don't want to run koji-gc as root, because we run far too many things as root already in Koji. But the systemd files from #2199 do run it as root currently. Eventually I would like to use a dedicated unprivileged "koji-gc" UID or something. Maybe now is the time to land such a change in Koji before it's too hard for the user community to migrate to a non-root UID.

Kevin, I'm guessing this is part of a cron job in Fedora that runs as "apache" locally? Can you provide more context here? Does the apache UID have access to the keytab that koji-gc uses? I thought Fedora used the oscar user

Yep it is:

SCRIPT=/usr/sbin/koji-gc
MAILTO=releng-cron@lists.fedoraproject.org
0 8 * * * apache /usr/local/bin/lock-wrapper koji-gc-delete $SCRIPT --action=delete --lock-file /var/tmp/koji-gc.lock
0 10 * * * apache /usr/local/bin/lock-wrapper koji-gc-prune $SCRIPT --action=prune --lock-file /var/tmp/koji-gc.lock
0 9 * * * apache /usr/local/bin/lock-wrapper koji-gc-trash $SCRIPT --action=trash --lock-file /var/tmp/koji-gc.lock

It has its own keytab:
-rw-r-----. 1 apache root 778 May 25 17:55 /etc/krb5.koji-gc_koji.fedoraproject.org.keytab

And yes, it's the 'oscar' user.

I think the problem is that we do not clearly document how users ought to enable koji-gc. I added systemd unit files (similar to kojira) in #2199, but we should add instructions to the Server HowTo guide as well so the end-to-end process is clear

...snip...

Well, this exact setup worked fine when we were using rhel7 and python2. When we moved to a new datacenter I moved the hubs to Fedora 32 and python3.
There is likely something with newer systemd where it doesn't create that /run/user/ dir in all cases. Apache is being used to run httpd. If I su to apache, it still doesn't create that dir. Perhaps it's only login users?

A note about security here- I don't want to run koji-gc as root, because we run far too many things as root already in Koji. But the systemd files from #2199 do run it as root currently. Eventually I would like to use a dedicated unprivileged "koji-gc" UID or something. Maybe now is the time to land such a change in Koji before it's too hard for the user community to migrate to a non-root UID.

Yeah, I think a separate user would be good for this. Of course it needs privs to retag things.

Note that a separate user "on the machine" needn't be a separate user "on the hub". You can still use the same keytab, and so the same permissions.

Yes, that directory is now created only in login shells, so it will always be missing for httpd. Maybe ansible with a login shell?

I think we should make the following changes:

  1. The koji-utils package should create a new koji-gc unprivileged account on the system.
  2. The koji-gc.service file should run the koji-gc service as the koji-gc service account instead of root.
  3. We should document how users ought to install and configure koji-gc. Specifically:
    1. yum install koji-utils
    2. Place the keytab on the filesystem
    3. Set permissions on keytab so koji-gc can read the file
    4. Enable the systemd unit timer

With systemd timers, users will not need a lock wrapper or lock file, because systemd will only ever create one instance of the oneshot service. If the systemd oneshot service is still running when the timer rolls around a second time, it does not start a second instance. I think we could drop the use of /usr/local/bin/lock-wrapper. We could also change the behavior of --lock-file so that koji-gc does not use any lock file if the user does not specify the --lock-file option.
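A sketch of the proposed --lock-file behavior, under assumptions: `acquire_lock` is a hypothetical helper, and the use of a non-blocking `fcntl.flock` here is illustrative and may not match how koji-gc takes its lock today. The only point is that no file is touched when no lock file was requested.

```python
import fcntl
import os


def acquire_lock(lock_file=None):
    """Return a locked file descriptor, or None when no lock file was requested.

    Hypothetical helper, not the current koji-gc code.
    """
    if lock_file is None:
        # No --lock-file given: rely on the caller (e.g. a systemd timer)
        # to prevent overlapping runs.
        return None
    fd = os.open(lock_file, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # Non-blocking exclusive lock: fail right away if another run holds it.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        raise
    return fd
```

Cron-based setups could keep passing --lock-file explicitly, while timer-based setups would skip locking altogether.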

@tkopecek what do you think?

Kevin, I have another question. Is there any reason why Fedora runs the three separate koji-gc actions explicitly with three separate cron jobs, instead of just running one single koji-gc invocation?

+1 to that plan.

Kevin, I have another question. Is there any reason why Fedora runs the three separate koji-gc actions explicitly with three separate cron jobs, instead of just running one single koji-gc invocation?

no idea. I see it was checked into puppet that way about 12 years ago by jkeating. No idea why it was that way...

Cool. I will work on this next week.

+1

On the separate actions: we ran it similarly internally. Before the major rewrite about 5 versions ago there were no threads, and a whole run took many hours. Because tags could change in the meantime, the process could easily fail somewhere in the middle. We also ran it separately against some groups of tags (e.g. *-candidate). This shouldn't be needed nowadays.

Note to self: The current Fedora packaging guidelines for system users are at https://docs.fedoraproject.org/en-US/packaging-guidelines/UsersAndGroups/

@ktdreyer
I filed #2362 to follow up the system account changes

Metadata Update from @julian8628:
- Issue set to the milestone: 1.23 (was: 1.22)

3 years ago

I'm still working on this. It's tricky because we have no unit tests for this utility. I'm going to refactor the locking code to a separate method, and drop some needless code (specifically we call logout() when we're already logged out).

Metadata Update from @tkopecek:
- Issue set to the milestone: 1.24 (was: 1.23)

3 years ago

Metadata Update from @tkopecek:
- Issue set to the milestone: 1.25 (was: 1.24)

3 years ago

Metadata Update from @tkopecek:
- Issue set to the milestone: 1.26 (was: 1.25)

2 years ago

Metadata Update from @tkopecek:
- Issue set to the milestone: 1.27 (was: 1.26)

2 years ago

Metadata Update from @tkopecek:
- Issue set to the milestone: None (was: 1.27)

2 years ago
