#9958 Planned Outage - Blockerbugs - 2021-05-18 17:00 UTC
Closed: Fixed 2 days ago by kevin. Opened a month ago by frantisekz.

Planned Outage - Blockerbugs - 2021-05-18 17:00 UTC

There will be an outage starting at 2021-05-18 17:00UTC,
which will last approximately 2 hours.

To convert UTC to your local time, take a look at
http://fedoraproject.org/wiki/Infrastructure/UTCHowto
or run:

date -d '2021-05-18 17:00UTC'
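
For anyone without GNU date at hand, the same conversion can be sketched in Python. This is only an illustration: the names OUTAGE_START_UTC, OUTAGE_LENGTH and outage_window are made up for this example, and the two-hour length is the approximate figure from this notice, not a guaranteed end time.

```python
from datetime import datetime, timedelta, timezone

# Start time taken from the notice; the length is the notice's estimate.
OUTAGE_START_UTC = datetime(2021, 5, 18, 17, 0, tzinfo=timezone.utc)
OUTAGE_LENGTH = timedelta(hours=2)


def outage_window(tz=None):
    """Return (start, end) of the outage in tz.

    With tz=None, astimezone() falls back to the system's local time zone.
    """
    end_utc = OUTAGE_START_UTC + OUTAGE_LENGTH
    return OUTAGE_START_UTC.astimezone(tz), end_utc.astimezone(tz)
```

Calling outage_window() with no argument gives the window in the local zone of the machine it runs on; passing a tzinfo object such as timezone(timedelta(hours=2)) gives it for that fixed offset.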

Reason for outage:

Host upgrade from Fedora 32 to Fedora 33.

Affected Services:

https://qa.fedoraproject.org/blockerbugs/

Ticket Link:

https://pagure.io/fedora-infrastructure/issue/9958

Please join #fedora-admin or #fedora-noc on irc.freenode.net
or add comments to the ticket for this outage above.


I have proposed a PR with the notice for this outage on status.fp.o:

https://github.com/fedora-infra/statusfpo/pull/20/files

Once this outage has been confirmed, we can merge this notice, and push the changes live to status.fp.o

Metadata Update from @smooge:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: medium-gain, medium-trouble, ops, outage

a month ago

@frantisekz was this work started and/or completed?

Sorry for the delayed update, the progress is as follows:

The blockerbugs production instance has been upgraded to F33 and is mostly working, but we have hit some problems that have yet to be resolved.

Working

  • web interface
  • login

Not Working

  • regular sync with bugzilla (manual sync works fine, it's just the cronjob that isn't working)

There are some issues which appear to be auth-related - the cron job we use to do the regular syncs fails immediately with:

pam_sss(crond:account): Access denied for user blockerbugs: 6 (Permission denied)
(blockerbugs) PAM ERROR (Permission denied)
(blockerbugs) FAILED to authorize user with PAM (Permission denied)

This worked on stg but doesn't work in production and we're really not sure why. The blockerbugs user is a valid FAS user in LDAP and it should work for this use AFAIK. That being said, the stg instance is using a local user and the prod instance is not.
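
One quick way to confirm the local-versus-directory difference between the two hosts is to check whether the account has an entry in /etc/passwd. The following is a minimal sketch, assuming the standard colon-separated passwd format; the helper names local_usernames and is_local_user are hypothetical and not part of blockerbugs or any Fedora tooling.

```python
def local_usernames(passwd_text):
    """Return the set of account names defined in passwd-format text."""
    names = set()
    for line in passwd_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and legacy NIS compat entries (+/-).
        if not line or line[0] in "#+-":
            continue
        # The account name is the first colon-separated field.
        names.add(line.split(":", 1)[0])
    return names


def is_local_user(name, passwd_text):
    """True if name has its own entry in the given passwd-format text."""
    return name in local_usernames(passwd_text)
```

Feeding it the contents of /etc/passwd from each host, for example is_local_user("blockerbugs", open("/etc/passwd").read()), would show whether the account is local there. A False result only means there is no local entry; the account may still resolve through another source such as LDAP.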

For the moment, there isn't a whole lot of change going on in the blocker bugs. As long as we remember to run a manual sync a couple of times per day, it's working "well enough", provided we get the regular sync working before much longer.

I don't think that the ticket is quite complete yet because we're going to have to rebuild the instance again once we get the whole auth thing figured out.

I think at this point it's probably best to make a local user for it... since we know that works.

What's the status here? I went and cleared the outage from status... didn't seem like it would matter to any users that there was still work to do.

The production instance is syncing now that there's a local user. The issue can now be closed.

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 days ago
