From 0fd77e5926d5ea13499e7e1798e1da05a90c96e8 Mon Sep 17 00:00:00 2001
From: Jeremy Cline
Date: Feb 13 2017 12:41:58 +0000
Subject: Create a sysadmin guide folder to mirror the dev guide


Since we'll probably want to add "Getting started" docs for sysadmins here,
I made the sysadmin and dev docs mirror each other in structure.

Signed-off-by: Jeremy Cline

---

diff --git a/docs/dev-guide/sops.rst b/docs/dev-guide/sops.rst
index ab38b78..1560fb0 100644
--- a/docs/dev-guide/sops.rst
+++ b/docs/dev-guide/sops.rst
@@ -20,7 +20,7 @@ Adding a Standard Operating Procedure
 =====================================
 
 To add a standard operating procedure, create a new `reStructuredText
 `_ file in the `sop
-directory `_
+directory `_
 and then add it to the `index file
 `_.

diff --git a/docs/index.rst b/docs/index.rst
index 9377d34..67da8a8 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -1,13 +1,24 @@
-.. Fedora Infrastructure Best Practices documentation master file, created by
+.. Fedora Infrastructure documentation master file, created by
    sphinx-quickstart on Wed Jan 25 17:17:34 2017.
    You can adapt this file completely to your liking, but it should at least
    contain the root `toctree` directive.
 
+===================================
 Fedora Infrastructure Documentation
 ===================================
 
-This documents the standard operating procedures (SOPs) and application best
-practices for Fedora Infrastructure applications.
+This contains a development and system administration guide for the Fedora
+Infrastructure team.
+
+The development guide covers how to get started with application development
+as well as application best practices. You will also find several sample
+projects that serve as demonstrations of these best practices and as an
+excellent starting point for new projects.
+
+
+The system administration guide covers how to get involved in the system
+administration side of Fedora Infrastructure as well as the standard operating
+procedures (SOPs) we use.
 
 .. toctree::
@@ -15,7 +26,7 @@ practices for Fedora Infrastructure applications.
    :caption: Contents:
 
    dev-guide/index
-   sops/index
+   sysadmin-guide/index
 
 
 Indices and tables

diff --git a/docs/sops/2-factor.rst b/docs/sops/2-factor.rst
deleted file mode 100644
index c127df6..0000000
--- a/docs/sops/2-factor.rst
+++ /dev/null
@@ -1,102 +0,0 @@
-.. title: Two Factor Auth
-.. slug: fas-two-factor
-.. date: 2013-09-19 updated: 2016-03-11
-.. taxonomy: Contributors/Infrastructure
-
-===============
-Two factor auth
-===============
-
-Fedora Infrastructure has implemented a form of two factor auth for people who
-have sudo access on Fedora machines. In the future we may expand this to
-include more than sudo but this was deemed to be a high value, low hanging
-fruit.
-
-----------------
-Using two factor
-----------------
-
-http://fedoraproject.org/wiki/Infrastructure_Two_Factor_Auth
-
-To enroll a Yubikey, use the fedora-burn-yubikey script like normal.
-To enroll using FreeOTP or Google Authenticator, go to
-https://admin.fedoraproject.org/totpcgiprovision
-
-What's enough authentication?
-=============================
-FAS Password+FreeOTP or FAS Password+Yubikey
-Note: don't actually enter a +, simply enter your FAS Password and press your
-yubikey or enter your FreeOTP code.
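As a rough sketch of what that flow looks like in practice (illustrative
only; the exact prompt text depends on the host's PAM configuration)::

    $ sudo -i
    Password: <type your FAS password, then press the YubiKey or type the FreeOTP code>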
- ---------------------------------------------- -Administrating and troubleshooting two factor ---------------------------------------------- - -Two factor auth is implemented by a modified copy of the -https://github.com/mricon/totp-cgi project doing the authentication and -pam_url submitting the authentication tokens. - -totp-cgi runs on the fas servers (currently fas01.stg and fas01/fas02/fas03 in -production), listening on port 8443 for pam_url requests. - -FreeOTP, Google authenticator and yubikeys are supported as tokens to use with -your password. - -FreeOTP, Google authenticator: -============================== - -FreeOTP application is preferred, however Google authenticator works as well. -(Note that Google authenticator is not open source) - -This is handled via totpcgi. There's a command line tool to manage users, -totpprov. See 'man totpprov' for more info. Admins can use this tool to revoke -lost tokens (google authenticator only) with 'totpprov delete-user username' - -To enroll using FreeOTP or Google Authenticator for production machines, go to -https://admin.fedoraproject.org/totpcgiprovision - -To enroll using FreeOTP or Google Authenticator for staging machines, go to -https://admin.stg.fedoraproject.org/totpcgiprovision/ - -You'll be prompted to login with your fas username and password. - -Note that staging and production differ. - -YubiKeys: -========= - -Yubikeys are enrolled and managed in FAS. Users can self-enroll using the -fedora-burn-yubikey utility included in the fedora-packager package. - -What do I do if I lose my token? -================================ -Send an email to admin@fedoraproject.org that is encrypted/signed with your -gpg key from FAS, or otherwise identifies you are you. - -How to remove a token (so the user can re-enroll)? -================================================== -First we MUST verify that the user is who they say they are, using any of the -following: - -- Personal contact where the person can be verified by member of - sysadmin-main. - -- Correct answers to security questions. - -- Email request to admin@fedoraproject.org that is gpg encrypted by the key - listed for the user in fas. - -Then: - -1. For google authenticator, login to one of the fas machines and run: -sudo totpprov delete-user username - -2. For yubikey: login to one of the fas machines and run: -/usr/local/bin/yubikey-remove.py username - -The user can then go to https://admin.fedoraproject.org/totpcgiprovision/ -and reprovision a new device. - -If the user emails admin@fedoraproject.org with the signed request, make sure -to reply to all indicating that a reset was performed. This is so that other -admins don't step in and reset it again after its been reset once. diff --git a/docs/sops/accountdeletion.rst b/docs/sops/accountdeletion.rst deleted file mode 100644 index 7d31123..0000000 --- a/docs/sops/accountdeletion.rst +++ /dev/null @@ -1,278 +0,0 @@ -.. title: Account Deletion SOP -.. slug: infra-fas-account-deletion -.. date: 2013-05-08 -.. taxonomy: Contributors/Infrastructure - -==================== -Account Deletion SOP -==================== - -For the most part we do not delete accounts. In the case that a deletion -is paramount, it will need to be coordinated with appropriate entities. - -Disabling accounts is another story but is limited to those with the -appropriate privileges. Reasons for accounts to be disabled can be one of -the following: - - * Person has placed SPAM on the wiki or other sites. 
- * It is seen that the account has been compromised by a third party.
- * A person wishes to leave the Fedora Project and wants the account
-   disabled.
-
-Contents
---------
-
-* Disabling
-
-  - Disable Accounts
-  - Disable Groups
-
-* User Requested disables
-
-* Renames
-
-  - Rename Accounts
-  - Rename Groups
-
-* Deletion
-
-  - Delete Accounts
-  - Delete Groups
-
-
-Disable
-=======
-
-Disabling accounts is the easiest to accomplish as it just blocks people
-from using their account. It does not remove the account name and
-associated UID so we don't have to worry about future, unintentional
-collisions.
-
-Disable Accounts
-----------------
-
-To begin with, accounts should not be disabled until there is a ticket in
-the Infrastructure ticketing system. After that the contents inside the
-ticket need to be verified (to make sure people aren't playing pranks or
-someone is in a crappy mood). This needs to be logged in the ticket (who
-looked, what they saw, etc). Then the account can be disabled.::
-
-    ssh db02
-    sudo -u postgres psql fas2
-
-    fas2=# begin;
-    fas2=# select * from people where username = 'FOOO';
-
-
-Here you need to verify that the account looks right, that there is only
-one match, or other issues. If there are multiple matches you need to
-contact one of the main sysadmin-db's on how to proceed.::
-
-    fas2=# update people set status = 'admin_disabled' where username = 'FOOO';
-    fas2=# commit;
-    fas2=# \q
-
-Disable Groups
---------------
-
-There is no explicit way to disable groups in FAS2. Instead, we close the
-group for adding new members and optionally remove existing members from
-it. This can be done from the web UI if you are an administrator of the
-group or you are in the accounts group. First, go to the group info page.
-Then click the (edit) link next to Group Details. Make sure that the
-Invite Only box is checked. This will prevent other users from requesting
-the group on their own.
-
-If you want to remove the existing users, View the Group info, then click
-on the View Member List link. Click on All under the Results heading. Then
-go through and click on Remove for each member.
-
-Doing this in the database instead can be quicker if you have a lot of
-people to remove. Once again, this requires someone in sysadmin-db to do
-the work::
-
-    ssh db02
-    sudo -u postgres psql fas2
-
-    fas2=# begin;
-    fas2=# update groups set invite_only = true where name = 'FOOO';
-    fas2=# commit;
-    fas2=# begin;
-    fas2=# select p.name, g.name, r.role_status from people as p, person_roles as r, groups as g
-           where p.id = r.person_id and g.id = r.group_id
-           and g.name = 'FOOO';
-    fas2=# -- Make sure that the list of users in the groups looks correct
-    fas2=# delete from person_roles where person_roles.group_id = (select id from groups where name = 'FOOO');
-    fas2=# -- number of rows in both of the above should match
-    fas2=# commit;
-    fas2=# \q
-
-User Requested Disables
-=======================
-
-According to our Privacy Policy, a user may request that their personal
-information be removed from FAS if they want to disable their account. We can
-do this but need to do some extra work over simply setting the account status
-to disabled.
-
-Record User's CLA information
------------------------------
-
-If the user has signed the CLA/FPCA, then they may have contributed something
-to Fedora that we'll need to contact them about at a later date.
For that, we -need to keep at least the following information: - -* Fedora username -* human name -* email address - -All of this information should be on the CLA email that is sent out when a -user signs up. We need to verify with spot (Tom Callaway) that he has that -record. If not, we need to get it to him. Something like:: - - select id, username, human_name, email, telephone, facsimile, postal_address from people where username = 'USERNAME'; - -and send it to spot to keep. - -Remove the personal information -------------------------------- - -The following sequence of db commands should do it:: - - fas2=# begin; - fas2=# select * from people where username = 'USERNAME'; - -Here you need to verify that the account looks right, that there is only -one match, or other issues. If there are multiple matches you need to -contact one of the main sysadmin-db's on how to proceed.:: - - fas2=# update people set human_name = '', gpg_keyid = null, ssh_key = null, unverified_email = null, comments = null, postal_address = null, telephone = null, facsimile = null, affiliation = null, ircnick = null, status = 'inactive', locale = 'C', timezone = null, latitude = null, longitude = null, country_code = null, email = 'disabled1@fedoraproject.org' where username = 'USERNAME'; - -Make sure only one record was updated:: - - fas2=# select * from people where username = 'USERNAME'; - -Make sure the correct record was updated:: - - fas2=# commit; - -.. note:: The email address is both not null and unique in the database. Due - to this, you need to set it to a new string for every user who requests - deletion like this. - -Renames -======= -In general, renames do not require as much work as deletions but they -still require coordination. This is because renames do not change the -UID/GID but some of our applications save information based on -username/groupname rather than UID/GID. - -Rename Accounts ---------------- - -.. warning:: Needs more eyes - This list may not be complete. - -* Check the databases for koji, pkgdb, and bodhi for occurrences of the - old username and update them to the new username. -* Check fedorapeople.org for home directories and yum repositories under - the old username that would need to be renamed -* Check (or ask the user to check and update) mailing list subscriptions - on fedorahosted.org and lists.fedoraproject.org under the old - username@fedoraproject.org email alias -* Check whether the user has a username@fedoraproject.org bugzilla - account in python-fedora and update that. Also ask the user to update - that in bugzilla. -* If the user is in a sysadmin-* group, check for home directories on - bastion and other infrastructure boxes that are owned byt them and - need to be renamed (Could also just tell the user to backup any files - there themselves b/c they're getting a new home directory). -* grep through ansible for occurrences of the username -* Check for entries in trac on fedorahosted.org for the username as an - "Assigned to" or "CC" entry. -* Add other places to check here - -Rename Groups -------------- - -.. warning:: Needs more eyes - This list may not be complete. - -* grep through ansible for occurrences of the group name. -* Check for group-members,group-admins,group-sponsors@fedoraproject.org - email alias presence in any fedorahosted.org or - lists.fedoraproject.org mailing list -* Check for entries in trac on fedorahosted.org for the username as an - "Assigned to" or "CC" entry. 
-* Add other places to check here
-
-Deletion
-========
-
-Deletion is the toughest one to audit because it requires that we look
-through our systems looking for the UID and GID in addition to looking for
-the username and password. The UID and GID are used on things like
-filesystem permissions so we have to look there as well. Not catching
-these places may lead to security issues should the UID/GID ever be reused.
-
-.. note:: Recommended to rename instead
-   When not strictly necessary to purge all traces of an account, it's
-   highly recommended to rename the user or group to something like
-   DELETED_oldusername instead of deleting. This avoids the problems and
-   additional checking that we have to do below.
-
-Delete Accounts
----------------
-
-.. warning:: Needs more eyes
-   This list may be incomplete. Needs more people to look at this and find
-   places that may need to be updated
-
-* Check everything for the #Rename Accounts case.
-* Figure out what boxes a user may have had access to in the past. This
-  means you need to look at all the groups a user may ever have been
-  approved for (even if they are not approved for those groups now). For
-  instance, any git*, svn*, bzr*, hg* groups would have granted access
-  to hosted03 and hosted04. packager would have granted access to
-  pkgs.fedoraproject.org. Pretty much any group grants access to
-  fedorapeople.org.
-* For those boxes, run a find over the files there to see if the UID
-  owns any files on the system::
-
-      # find / -uid 100068 -print
-
-  Any files owned by that uid must be reassigned to another user or
-  removed.
-
-.. warning:: What to do about backups?
-   Backups pose a special problem as they may contain the uid that's being
-   removed. Need to decide how to handle this
-
-* Add other places to check here
-
-Delete Groups
--------------
-
-.. warning:: Needs more eyes
-   This list may be incomplete. Needs more people to look at this and find
-   places that may need to be updated
-
-* Check everything for the #Rename Groups case.
-* Figure out what boxes may have had files owned by that group. This
-  means that you'd need to look at the users in that group, what boxes
-  they have shell accounts on, and then look at those boxes. groups used
-  for hosted would also need to add hosted03 and hosted04 to that list
-  and the box that serves the hosted mailing lists.
-* For those boxes, run a find over the files there to see if the GID
-  owns any files on the system::
-
-      # find / -gid 100068 -print
-
-  Any files owned by that GID must be reassigned to another group or
-  removed.
-
-.. warning:: What to do about backups?
-   Backups pose a special problem as they may contain the gid that's being
-   removed. Need to decide how to handle this
-
-* Add other places to check here
diff --git a/docs/sops/anitya.rst b/docs/sops/anitya.rst
deleted file mode 100644
index 7e01e29..0000000
--- a/docs/sops/anitya.rst
+++ /dev/null
@@ -1,155 +0,0 @@
-.. title: Anitya Infrastructure SOP
-.. slug: infra-anitya
-.. date: 2016-11-30
-.. taxonomy: Contributors/Infrastructure
-
-=========================
-Anitya Infrastructure SOP
-=========================
-
-Anitya is used by Fedora to track upstream project releases and maps them
-to downstream distribution packages, including (but not limited to) Fedora.
-
-Anitya production instance: https://release-monitoring.org
-
-Anitya project page: https://github.com/fedora-infra/anitya
-
-Contents
-========
-
-1. Contact Information
-2. Building and Deploying a Release
-3. Administrating release-monitoring.org
-
-
-Contact Information
-===================
-
-Owner
-  Fedora Infrastructure Team
-Contact
-  #fedora-admin, #fedora-apps
-Persons
-  pingou, jcline
-Location
-  ?
-Servers
-  anitya-backend01.vpn.fedoraproject.org
-  anitya-frontend01.vpn.fedoraproject.org
-Purpose
-  Map upstream releases to Fedora packages.
-
-Hosts
-=====
-The current deployment is made up of two hosts, anitya-backend01 and
-anitya-frontend01.
-
-anitya-frontend01
------------------
-This host runs:
-
-- The apache/mod_wsgi application for release-monitoring.org
-
-- A fedmsg-relay instance for anitya's local fedmsg bus
-
-This host relies on:
-
-- A postgres db server running on anitya-backend01
-
-- Lots of external third-party services. The anitya webapp can scrape
-  pypi, rubygems.org, sourceforge and many others on command.
-
-Things that rely on this host:
-
-- The Fedora Infrastructure bus subscribes to the anitya bus published
-  here by the local fedmsg-relay daemon at tcp://release-monitoring.org:9940
-
-- the-new-hotness is a fedmsg-hub plugin running in FI on hotness01. It
-  listens for anitya messages from here and performs actions on koji and
-  bugzilla.
-
-- anitya-backend01 expects to publish fedmsg messages via
-  anitya-frontend01's fedmsg-relay daemon. Access should be restricted by
-  firewall.
-
-anitya-backend01
-----------------
-This is responsible for running the anitya backend cronjobs. It also is
-the host for the Anitya PostgreSQL database server.
-
-The services and jobs on this host are:
-
-- A cronjob that retrieves all projects from the PostgreSQL database and
-  checks the upstream project to see if there's a new version. This is run
-  every 12 hours.
-
-- A PostgreSQL database server to be used by that cron job and by
-  anitya-frontend01.
-
-- A database backup job that runs daily. Database dumps are available at
-  `the normal database dump location `_.
-
-This host relies on:
-
-- The fedmsg-relay daemon running on anitya-frontend01.
-
-- Lots of external third-party services. The cronjob makes all kinds of
-  requests out to the Internet that can fail in various ways.
-
-Things that rely on this host:
-
-- The webapps running on anitya-frontend01 rely on the postgres db
-  server running on this node.
-
-
-Releasing
-=========
-
-The first step to making a new release is creating a Git tag for the release.
-
-Building
---------
-After `upstream `_ tags a new release in Git, a new
-release can be built. The specfile is stored in the `Anitya repository
-`_. Refer to the
-`Infrastructure repo SOP `_
-to learn how to build the RPM.
-
-Deploying
----------
-At the moment, there is no staging deployment of Anitya.
-
-Once the new version is built, it needs to be deployed. To deploy the new version, you need
-`ssh access `_ to
-batcave01.phx2.fedoraproject.org and `permissions to run the Ansible playbook
-`_.
-
-All the following commands should be run from batcave01.
-
-Configuration
-^^^^^^^^^^^^^
-First, ensure there are no configuration changes required for the new update. If there are,
-update the Ansible anitya role(s) and run the deployment playbook::
-
-    $ sudo rbac-playbook groups/anitya.yml
-
-Packages
-^^^^^^^^
-Both anitya-backend01 and anitya-frontend01 need the new package. To upgrade, run
-the upgrade playbook::
-
-    $ sudo rbac-playbook manual/upgrade/anitya.yml
-
-This will upgrade the anitya package, perform any database migrations with Alembic,
-and restart the Apache web server.
-
-Congratulations! The new version should now be deployed.
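A quick post-deploy sanity check (a sketch, not part of the documented
procedure; it assumes the installed package is simply named ``anitya``) is an
ad-hoc ansible query from batcave01 against both hosts::

    $ sudo ansible 'anitya-*' -m command -a 'rpm -q anitya'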
-
-
-Administrating release-monitoring.org
-=====================================
-Anitya offers some tools to administer the web application itself. These are useful
-for when users accidentally create duplicate projects, versions found get messed up,
-etc.
-
-Flags
-^^^^^
-Anitya lets users flag projects for administrator attention. This is accessible to
-administrators in the `flags tab `_.
diff --git a/docs/sops/ansible.rst b/docs/sops/ansible.rst
deleted file mode 100644
index f3ecc96..0000000
--- a/docs/sops/ansible.rst
+++ /dev/null
@@ -1,175 +0,0 @@
-.. title: Ansible Infrastructure SOP
-.. slug: infra-ansible
-.. date: 2015-03-03
-.. taxonomy: Contributors/Infrastructure
-
-=======================================
-Ansible infrastructure SOP/Information.
-=======================================
-
-Background
-==========
-
-Fedora infrastructure used to use func and puppet for system change management.
-We are now using ansible for all system change management and ad-hoc tasks.
-
-Overview
-========
-
-Ansible runs from batcave01 or backup01. These hosts run a ssh-agent that
-has unlocked the ansible root ssh private key. (This is unlocked manually
-by a human with the passphrase each reboot, the passphrase itself is not
-stored anywhere on the machines). Using 'sudo -i' sysadmin-main members
-can use this agent to access any machines with the ansible root ssh public
-key setup, either with 'ansible' for one-off commands or 'ansible-playbook'
-to run playbooks.
-
-Playbooks are idempotent (or should be). Meaning you should be able to re-run
-the same playbook over and over and it should get to a state where 0 items
-are changing.
-
-Additionally (see below) there is a rbac wrapper that allows members of some
-other groups to run playbooks against specific hosts.
-
-git repo(s)
------------
-
-There are 2 git repositories associated with ansible:
-
-/git/ansible on batcave01.
-    This is a public repository. Never commit private data to this repo.
-    You can access it also via a cgit web interface at:
-    https://infrastructure.fedoraproject.org/cgit/ansible.git/
-    You can check it out on batcave01 with: 'git clone /git/ansible'
-    You can also use it remotely if you have your ssh set to proxy your access
-    via bastion01: ``git clone ssh://batcave01/git/ansible``
-
-    Users in the 'sysadmin' group have commit access to this repo.
-    All commits are emailed to 'sysadmin-members' as well as announced
-    on IRC in #fedora-noc.
-
-/git/ansible-private on batcave01.
-    This is a private repository for passwords and other sensitive data.
-    It is not available in cgit, nor should it be cloned or copied remotely.
-    It's only available to members of 'sysadmin-main'.
-
-Cron job/scheduled runs
------------------------
-
-With use of run_ansible-playbook_cron.py that is run daily via cron we walk through
-playbooks and run them with `--check --diff` params to perform a dry-run.
-
-This way we make sure all the playbooks are idempotent and there are no
-unexpected changes on servers (or playbooks).
-
-Logging
--------
-
-We have in place a callback plugin that stores history for any ansible-playbook runs
-and then sends a report each day to sysadmin-logs-members with any CHANGED or FAILED
-actions. Additionally, there's a fedmsg plugin that reports start and end of ansible
-playbook runs to the fedmsg bus. Ansible also logs to syslog a verbose report of
-when and what commands and playbooks were run.
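For example, to skim what ran today on batcave01 (illustrative only; the
exact log lines and syslog identifier depend on the local logging
configuration)::

    $ sudo journalctl --since today | grep -i ansible-playbook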
-
-role based access control for playbooks
----------------------------------------
-
-There's a wrapper script on batcave01 called 'rbac-playbook' that allows non sysadmin-main
-members to run specific playbooks against specific groups of hosts. This is part of the
-ansible_utils package. The upstream for ansible_utils is: https://bitbucket.org/tflink/ansible_utils
-
-To add a new group:
-
-1. add the playbook name and sysadmin group to the rbac-playbook (ansible-private repo)
-2. add that sysadmin group to sudoers on batcave01 (also in ansible-private repo)
-
-To use the wrapper::
-
-    sudo rbac-playbook playbook.yml
-
-Directory setup
-================
-
-Inventory
----------
-
-The inventory directory tells ansible all the hosts that are managed by it and
-the groups they are in. All files in this dir are concatenated together, so you
-can split out groups/hosts into separate files for readability. They are in ini
-file format.
-
-Additionally under the inventory directory are host_vars and group_vars subdirectories.
-These are files named for the host or group and containing variables to set
-for that host or group. You should strive to set variables in the highest level
-possible, and precedence is in: global, group, host order. See the sketch after
-this section for an illustration.
-
-Vars
-----
-
-This directory contains global variables as well as OS specific variables. Note that
-in order to use the OS specific ones you must have 'gather_facts' as 'True' or ansible
-will not have the facts it needs to determine the OS.
-
-Roles
------
-
-Roles are a collection of tasks/files/templates that can be used on any host or group
-of hosts that all share that role. In other words, roles should be used except in cases
-where configuration only applies to a single host. Roles can be reused between hosts and
-groups and are more portable/flexible than tasks or specific plays.
-
-Scripts
--------
-
-In the ansible git repo under scripts are a number of utility scripts for sysadmins.
-
-Playbooks
----------
-
-In the ansible git repo there's a directory for playbooks. The top level contains
-utility playbooks for sysadmins. These playbooks perform one-off functions or gather
-information. Under this directory are hosts and groups playbooks. These playbooks are
-for specific hosts and groups of hosts, from provision to fully configured. You should
-only use a host playbook in cases where there will never be more than one of that thing.
-
-Tasks
------
-
-This directory contains one-off tasks that are used in playbooks. Some of these should
-be migrated to roles (we had this setup before roles existed in ansible). Those that
-are truly only used on one host/group could stay as isolated tasks.
-
-Syntax
-------
-
-Ansible now warns about deprecated syntax. Please fix any cases you see related to
-deprecation warnings.
-
-Templates use the jinja2 syntax.
-
-Libvirt virtuals
-================
-* TODO: add steps to make new libvirt virtuals in staging and production
-* TODO: merge in new-hosts.txt
-
-Cloud Instances
-===============
-* TODO: add how to make new cloud instances
-* TODO: merge in from ansible README file.
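To illustrate the inventory layout described in the Inventory section above
(hypothetical names, not real Fedora Infrastructure entries)::

    # inventory/webservers  (ini format)
    [webservers]
    web01.example.fedoraproject.org
    web02.example.fedoraproject.org

    # inventory/group_vars/webservers  (yaml)
    tcp_ports: [80, 443]

    # inventory/host_vars/web01.example.fedoraproject.org  (yaml)
    datacenter: phx2

With this layout, a host_vars value overrides group and global values for that
host, following the precedence order described above.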
-
-rdiff-backups
-=============
-see: https://infrastructure.fedoraproject.org/infra/docs/rdiff-backup.rst
-
-Additional Reading/Resources
-============================
-
-Upstream docs:
-    https://docs.ansible.com/
-
-Example repo with all kinds of examples:
-  * https://github.com/ansible/ansible-examples
-  * https://gist.github.com/marktheunissen/2979474
-
-Jinja2 docs:
-    http://jinja.pocoo.org/docs/
diff --git a/docs/sops/apps-fp-o.rst b/docs/sops/apps-fp-o.rst
deleted file mode 100644
index 571d2cf..0000000
--- a/docs/sops/apps-fp-o.rst
+++ /dev/null
@@ -1,38 +0,0 @@
-.. title: apps.fedoraproject.org SOP
-.. slug: infra-apps-fp-o
-.. date: 2014-06-29
-.. taxonomy: Contributors/Infrastructure
-
-apps-fp-o SOP
-=============
-
-Updating and maintaining the landing page at https://apps.fedoraproject.org/
-
-Contact Information
--------------------
-
-Owner:
-  Fedora Infrastructure Team
-Contact:
-  #fedora-apps, #fedora-admin
-Servers:
-  proxy0*
-Purpose:
-  Have a nice landing page for all our webapps.
-
-Description
------------
-
-We have a number of webapps, many of which our users don't know about. This
-page was created so there was a central place where users could stumble
-through them and learn.
-
-The page is generated by an ansible role in ansible/roles/apps-fp-o/
-It makes use of an RPM package, the source code for which is at
-https://github.com/fedora-infra/apps.fp.o
-
-You can update the page by updating the apps.yaml file in that ansible
-module.
-
-When ansible is run next, the two ansible handlers should see your
-changes and regenerate the static html and json data for the page.
diff --git a/docs/sops/archive-old-fedora.rst b/docs/sops/archive-old-fedora.rst
deleted file mode 100644
index f23f363..0000000
--- a/docs/sops/archive-old-fedora.rst
+++ /dev/null
@@ -1,81 +0,0 @@
-.. title: How to Archive Old Fedora Releases.
-.. slug: archive-old-fedora
-.. date: 2016-04-08 updated: 2016-04-08
-.. taxonomy: Releng/Infrastructure
-
-====================================
- How to Archive Old Fedora Releases
-====================================
-
-The Fedora download servers contain terabytes of data, and to allow
-for mirrors to not have to take all of that data, infrastructure
-regularly moves data of end-of-life releases (from /pub/fedora/linux)
-to the archives section (/pub/archive/fedora/linux)
-
-Steps Involved
-==============
-
-1. log into batcave01.phx2.fedoraproject.org and ssh to bodhi-backend01
-
-2. Then change into the releases directory.
-
-       cd /pub/fedora/linux/releases
-
-3. Check to see that the target directory doesn't already exist.
-
-       ls /pub/archive/fedora/linux/releases/
-
-4. If the target directory does not already exist, do a recursive link
-   copy of the tree you want to the target
-
-       cp -lvpnr 21 /pub/archive/fedora/linux/releases/21
-
-5. If the target directory already exists, then we need to do a
-   recursive rsync to update any changes in the trees since the
-   previous copy.
-
-       rsync -avSHP --delete ./21/ /pub/archive/fedora/linux/releases/21/
-
-6. We now do the updates and updates/testing in similar ways.
-
-       cd ../updates/
-       cp -lpnr 21 /pub/archive/fedora/linux/updates/21
-       cd testing
-       cp -lpnr 21 /pub/archive/fedora/linux/updates/testing/21
-
-       cd ../updates/
-       rsync -avSHP 21/ /pub/archive/fedora/linux/updates/21/
-       cd testing
-       rsync -avSHP 21/ /pub/archive/fedora/linux/updates/testing/21/
-
-7. Announce to the mirror list this has been done and that in 2 weeks
-   you will move the old trees to archives.
-
-8. In two weeks, log into mm-backend01 and run the archive script
-
-       sudo -u mirrormanager mm2_move-to-archive --originalCategory="Fedora Linux" \
-           --archiveCategory="Fedora Archive" --directoryRe='/21/Everything'
-
-9. If there are problems, the postgres DB may have issues and so you need to
-   get a DBA to update the backend to fix items.
-
-10. Wait an hour or so then you can remove the files from the main tree.
-
-       ssh bodhi-backend01
-       cd /pub/fedora/linux
-       cd releases/21
-       ls # make sure you have stuff here
-       rm -rf *
-       ln ../20/README .
-       cd ../../updates/21
-       ls # make sure you have stuff here
-       rm -rf *
-       ln ../20/README .
-       cd ../testing/21
-       ls # make sure you have stuff here
-       rm -rf *
-       ln ../20/README .
diff --git a/docs/sops/arm.rst b/docs/sops/arm.rst
deleted file mode 100644
index beb3118..0000000
--- a/docs/sops/arm.rst
+++ /dev/null
@@ -1,199 +0,0 @@
-.. title: Fedora ARM Infrastructure
-.. slug: infra-arm
-.. date: 2015-03-24
-.. taxonomy: Contributors/Infrastructure
-
-=========================
-Fedora ARM Infrastructure
-=========================
-
-Contact Information
-===================
-
-Owner
-  Fedora Infrastructure Team
-Contact
-  #fedora-admin, sysadmin-main, sysadmin-releng
-Location
-  Phoenix
-Servers
-  arm01, arm02, arm03, arm04
-Purpose
-  Information on working with the arm SOCs
-
-Description
-===========
-
-We have 4 arm chassis in phx2, each containing 24 SOCs (System On Chip).
-
-Each chassis has 2 physical network connections going out from it.
-The first one is used for the management interface on each SOC.
-The second one is used for eth0 for each SOC.
-
-Current allocations (2016-03-11):
-
-arm01
-  one retrace instance, the rest primary builders attached to koji.fedoraproject.org
-arm02
-  primary arch builders attached to koji.fedoraproject.org
-arm03
-  In cloud network, public qa/packager and copr instances
-arm04
-  primary arch builders attached to koji.fedoraproject.org
-
-Hardware Configuration
-=======================
-
-Each SOC:
-
-* Has eth0 and eth1 (unused) and a management interface.
-* Has 4 cores
-* Has 4GB ram
-* Has a 300GB disk
-
-SOCs are addressed by::
-
-    arm{Chassisnumber}-builder{number}.arm.fedoraproject.org
-
-Where Chassisnumber is 01 to 04
-and
-number is 00-23
-
-PXE installs
-============
-Kickstarts for the machines are in the kickstarts repo.
-
-PXE config is on noc01. (or cloud-noc01.cloud.fedoraproject.org for arm03)
-
-The kickstart installs the latest Fedora and sets them up with a base package set.
-
-IPMI tool Management
-====================
-
-The SOCs are managed via their mgmt interfaces using a custom ipmitool
-as well as a custom python script called 'cxmanage'. The ipmitool changes
-have been submitted upstream and cxmanage is under review in Fedora.
-
-The ipmitool is currently installed on noc01 and it has the ability to
-talk to them on their management interface. noc01 also serves dhcp and
-is a pxeboot server for the SOCs.
- -However you will need to add it to your path:: - - export PATH=$PATH:/opt/calxeda/bin/ - -Some common commands: - -To set the SOC to boot the next time only with pxe:: - - ipmitool -U admin -P thepassword -H arm03-builder11-mgmt.arm.fedoraproject.org chassis bootdev pxe - -To set the SOC power off:: - - ipmitool -U admin -P thepassword -H arm03-builder11-mgmt.arm.fedoraproject.org power off - -To set the SOC power on:: - - ipmitool -U admin -P thepassword -H arm03-builder11-mgmt.arm.fedoraproject.org power on - -To get a serial over lan console from the SOC:: - - ipmitool -U admin -P thepassword -H arm03-builder11-mgmt.arm.fedoraproject.org -I lanplus sol activate - -DISK mapping -============ - -Each SOC has a disk. They are however mapped to the internal 00-23 in a non -direct manner:: - - HDD Bay EnergyCard SOC (Port 1) SOC Num - 0 0 3 03 - 1 0 0 00 - 2 0 1 01 - 3 0 2 02 - 4 1 3 07 - 5 1 0 04 - 6 1 1 05 - 7 1 2 06 - 8 2 3 11 - 9 2 0 08 - 10 2 1 09 - 11 2 2 10 - 12 3 3 15 - 13 3 0 12 - 14 3 1 13 - 15 3 2 14 - 16 4 3 19 - 17 4 0 16 - 18 4 1 17 - 19 4 2 18 - 20 5 3 23 - 21 5 0 20 - 22 5 1 21 - 23 5 2 22 - -Looking at the system from the front, the bay numbering starts from left to -right. - -cxmanage -======== - -The cxmanage tool can be used to update firmware or gather diag info. - -Until cxmanage is packaged, you can use it from a python virtualenv:: - - virtualenv --system-site-packages cxmanage - cd cxmanage - source bin/activate - pip install --extra-index-url=http://sources.calxeda.com/python/packages/ cxmanage - - deactivate - -Some cxmanage commands - -:: - - cxmanage sensor arm03-builder00-mgmt.arm.fedoraproject.org - Getting sensor readings... - 1 successes | 0 errors | 0 nodes left | . - - MP Temp 0 - arm03-builder00-mgmt.arm.fedoraproject.org: 34.00 degrees C - Minimum : 34.00 degrees C - Maximum : 34.00 degrees C - Average : 34.00 degrees C - ... (and about 20 more sensors)... - -:: - - cxmanage info arm03-builder00-mgmt.arm.fedoraproject.org - Getting info... - 1 successes | 0 errors | 0 nodes left | . - - [ Info from arm03-builder00-mgmt.arm.fedoraproject.org ] - Hardware version : EnergyCard X04 - Firmware version : ECX-1000-v2.1.5 - ECME version : v0.10.2 - CDB version : v0.10.2 - Stage2boot version : v1.1.3 - Bootlog version : v0.10.2 - A9boot version : v2012.10.16-3-g66a3bf3 - Uboot version : v2013.01-rc1_cx_2013.01.17 - Ubootenv version : v2013.01-rc1_cx_2013.01.17 - DTB version : v3.7-4114-g34da2e2 - -firmware update:: - - cxmanage --internal-tftp 10.5.126.41:6969 --all-nodes fwupdate package ECX-1000_update-v2.1.5.tar.gz arm03-builder00-mgmt.arm.fedoraproject.org - -(note that this runs against the 00 management interface for the chassis and -updates all the nodes), and that we must run a tftpserver on port 6969 for -firewall handling. - -Links -====== -http://sources.calxeda.com/python/packages/cxmanage/ - -Contacts -========= -help.desk@boston.co.uk is the contact to send repair requests to. diff --git a/docs/sops/askbot.rst b/docs/sops/askbot.rst deleted file mode 100644 index 29742a1..0000000 --- a/docs/sops/askbot.rst +++ /dev/null @@ -1,315 +0,0 @@ -.. title: Ask Fedora SOP -.. slug: infra-ask-fedora -.. date: 2015-03-28 -.. taxonomy: Contributors/Infrastructure - -============== -Ask Fedora SOP -============== - -To set up http://ask.fedoraproject.org based on Askbot as a question and -answer support forum for the Fedora community. 
A devel instance could be -seen at http://ask01.dev.fedoraproject.org and the staging instance is at -http://ask.stg.fedoraproject.org/ - -This page describes how to set up and customize it from scratch. - -Contents -======== - -1. Contact Information -2. Creating database -3. Setting up the forum -4. Adding administrators -5. Change settings within the forum -6. Database tweaks -7. Debugging - - -Contact Information -=================== - -Owner - Fedora Infrastructure Team -Contact - #fedora-admin -Persons - mether pjp -Sponsor - nirik -Location - phx2 -Servers - ask01 , ask01.stg, ask01.dev -Purpose - To host Ask Fedora - - -Creating database -================= - -We use the postgresql database backend. To add the database to a -postgresql server:: - - # psql -U postgres - postgres# create user askfedora with password 'xxx'; - postgres# create database askfedora; - postgres# ALTER DATABASE askfedora owner to askfedora; - postgres# \q; - -Now setup the db tables if this is a new install:: - - python manage.py syncdb - python manage.py migrate askbot - python manage.py migrate django_authopenid #embedded login application - - -Setting up the forum -==================== - -Askbot is packaged and available in Rawhide, Fedora 16 and EPEL 6. On a -RHEL 6 system, you need to install EPEL 6 repo first.:: - - # yum install askbot - -The /etc/askbot/sites/ask/conf/settings.py file should look something -like:: - - DATABASE_ENGINE = 'postgresql_psycopg2' - DATABASE_NAME = 'testaskbot' - DATABASE_USER = 'askbot' - DATABASE_PASSWORD = 'xxxxx' - DATABASE_HOST = '127.0.0.1' - DATABASE_PORT = '5432' - - # Outgoing mail server settings - # - DEFAULT_FROM_EMAIL = 'askfedora@fedoraproject.org' - EMAIL_SUBJECT_PREFIX = '[Askfedora]' - EMAIL_HOST='127.0.0.1' - EMAIL_PORT='25' - - # This variable points to the Askbot plugin which will be used for user - # authentication. Not enabled yet because we don't need FAS auth but use - # Fedora id as a openid provider. - # - # ASKBOT_CUSTOM_AUTH_MODULE = 'authfas' - - Now Ask Fedora website should be accessible from the browser. - - -Adding administrators -===================== - -As of Askbot version 0.7.21, the first user who logs in automatically -becomes the administrator. In previous versions, you have to do the -following.:: - - # cd /etc/askbot/sites/ask/conf/ - # python manage.py add_admin 1 - Do you really wish to make user (id=1, name=pjp) a site administrator? - yes/no: yes - -Once a user is marked as a administrator, he or she can go into anyone's -profile, go the "moderation" tab in the end and mark them as administrator -or moderator as well as block or suspend a user. - - -Change settings within the forum -================================ - -* Data entry and display: - - Disable "Allow asking questions anonymously" - - Enable "Force lowercase the tags" - - Change "Format of tag list" to "cloud" - - Change "Minimum length of search term for Ajax search" to "3" - - Change "Number of questions to list by default" to "50" - - Change "What should "unanswered question" mean?" to "Question has no - - answers" - -* Email and email alert settings - - Change "Default news notification frequency" to "Instantly" - -* Flatpages - about, privacy policy, etc. - Change "Text of the Q&A forum About page (html format)" to the following:: - - Ask Fedora provides a community edited knowledge base and support forum - for the Fedora community. Make sure you read the FAQ and search for - existing questions before asking yours. 
If you want to provide feedback, - just a question in this site! Tag your questions "meta" to highlight your - questions to the administrators of Ask Fedora. - -* Login provider settings - - Disable "Activate local login" - -* Q&A forum website parameters and urls - - Change "Site title for the Q&A forum" to "Ask Fedora: Community Knowledge - Base and Support Forum" - - Change "Comma separated list of Q&A site keywords" to "Ask Fedora, forum, - community, support, help" - - Change "Copyright message to show in the footer" to "All content is under - Creative Commons Attribution Share Alike License. Ask Fedora is community - maintained and Red Hat or Fedora Project is not responsible for content" - - Change "Site description for the search engines" to "Ask Fedora: Community - Knowledge Base and Support Forum" - - Change "Short name for your Q&A forum" to "Ask Fedora" - - Change "Base URL for your Q&A forum, must start with http or https" to - "http://ask.fedoraproject.org" - -* Sidebar widget settings - main page - - Disable "Show avatar block in sidebar" - - Disable "Show tag selector in sidebar" - -* Skin and User Interface settings - - Upload "Q&A site logo" - - Upload "Site favicon". Must be a ICO format file because that is the only one IE supports as a fav icon. - - Enable "Apply custom style sheet (CSS)" - - Upload the following custom CSS:: - - #ab-main-nav a { - color: #333333; - background-color: #d8dfeb; - border: 1px solid #888888; - border-bottom: none; - padding: 0px 12px 3px 12px; - height: 25px; - line-height: 30px; - margin-right: 10px; - font-size: 18px; - font-weight: 100; - text-decoration: none; - display: block; - float: left; - } - - #ab-main-nav a.on { - height: 24px; - line-height: 28px; - border-bottom: 1px solid #0a57a4; - border-right: 1px solid #0a57a4; - border-top: 1px solid #0a57a4; - border-left: 1px solid #0a57a4; /*background:#A31E39; */ - background: #0a57a4; - color: #FFF; - font-weight: 800; - text-decoration: none - } - - #ab-main-nav a.special { - font-size: 18px; - color: #072b61; - font-weight: bold; - text-decoration: none; - } - - /* tabs stuff */ - .tabsA { float: right; } - .tabsC { float: left; } - - .tabsA a.on, .tabsC a.on, .tabsA a:hover, .tabsC a:hover { - background: #fff; - color: #072b61; - border-top: 1px solid #babdb6; - border-left: 1px solid #babdb6; - border-right: 1px solid #888a85; - border-bottom: 1px solid #888a85; - height: 24px; - line-height: 26px; - margin-top: 3px; - } - - .tabsA a.rev.on, tabsA a.rev.on:hover { - padding: 0px 2px 0px 7px; - } - - .tabsA a, .tabsC a{ - background: #f9f7eb; - border-top: 1px solid #eeeeec; - border-left: 1px solid #eeeeec; - border-right: 1px solid #a9aca5; - border-bottom: 1px solid #888a85; - color: #888a85; - display: block; - float: left; - height: 20px; - line-height: 22px; - margin: 5px 0 0 4px; - padding: 0 7px; - text-decoration: none; - } - - .tabsA .label, .tabsC .label { - float: left; - font-weight: bold; - color: #777; - margin: 8px 0 0 0px; - } - - .tabsB a { - background: #eee; - border: 1px solid #eee; - color: #777; - display: block; - float: left; - height: 22px; - line-height: 28px; - margin: 5px 0px 0 4px; - padding: 0 11px 0 11px; - text-decoration: none; - } - - a { - color: #072b61; - text-decoration: none; - cursor: pointer; - } - - div.side-box - { - width:200px; - padding:10px; - border:3px solid #CCCCCC; - margin:0px; - background: -moz-linear-gradient(top, #DDDDDD, #FFFFFF); - } - -Database tweaks -=============== - -To automatically delete expired sessions, we run 
a trigger
-that makes PostgreSQL delete them upon inserting a new one.
-
-The code used to create this trigger was::
-
-    askfedora=# CREATE FUNCTION delete_old_sessions() RETURNS trigger
-    askfedora-# LANGUAGE plpgsql
-    askfedora-# AS $$
-    askfedora$# BEGIN
-    askfedora$# DELETE FROM django_session WHERE expire_date < now();
-
-    >>> execfile('shelldb.py')
-
-At this point you have access to a `db` SQLAlchemy Session instance, the
-`transaction` module as `t`, and the `bodhi.models` module as `m`.
-
-::
-
-    # Fetch an update, and tweak it as necessary.
-    >>> up = m.Update.get(u'FEDORA-2016-4d226a5f7e', db)
-
-    # Commit the transaction
-    >>> t.commit()
-
-Here is an example of merging two updates together and deleting the original.
-
-::
-
-    >>> up = m.Update.get(u'FEDORA-2016-4d226a5f7e', db)
-    >>> up.builds
-    [, ]
-    >>> b = up.builds[0]
-    >>> up2 = m.Update.get(u'FEDORA-2016-5f63a874ca', db)
-    >>> up2.builds
-    []
-    >>> up.builds.remove(b)
-    >>> up.builds.append(up2.builds[0])
-    >>> delete_update(up2)
-    >>> t.commit()
-
-Troubleshooting and Resolution
-==============================
-
-Atomic OSTree compose failure
------------------------------
-
-If the Atomic OSTree compose fails with some sort of `Device or Resource busy` error, then run `mount` to see if there
-are any stray `tmpfs` mounts still active::
-
-    tmpfs on /var/lib/mock/fedora-22-updates-testing-x86_64/root/var/tmp/rpm-ostree.bylgUq type tmpfs (rw,relatime,seclabel,mode=755)
-
-You can then `umount /var/lib/mock/fedora-22-updates-testing-x86_64/root/var/tmp/rpm-ostree.bylgUq` and resume the push again.
-
-
-nfs repodata cache IOError
---------------------------
-
-Sometimes you may hit an IOError during the updateinfo.xml generation
-process from createrepo_c::
-
-    IOError: Cannot open /mnt/koji/mash/updates/epel7-160228.1356/../epel7.repocache/repodata/repomd.xml: File /mnt/koji/mash/updates/epel7-160228.1356/../epel7.repocache/repodata/repomd.xml doesn't exists or not a regular file
-
-This issue will be resolved with NFSv4, but in the meantime it can be worked
-around by removing the `.repocache` directory and resuming the push::
-
-    rm -fr /mnt/koji/mash/updates/epel7.repocache
diff --git a/docs/sops/bugzilla.rst b/docs/sops/bugzilla.rst
deleted file mode 100644
index 40a670a..0000000
--- a/docs/sops/bugzilla.rst
+++ /dev/null
@@ -1,124 +0,0 @@
-.. title: Bugzilla Sync SOP
-.. slug: infra-bugzilla-sync
-.. date: 2011-10-03
-.. taxonomy: Contributors/Infrastructure
-
-================================
-Bugzilla Sync Infrastructure SOP
-================================
-
-We do not run bugzilla.redhat.com. If bugzilla itself is down we need to
-get in touch with Red Hat IT or one of the bugzilla hackers (for instance,
-Dave Lawrence (dkl)) in order to fix it.
-
-Infrastructure has some scripts that perform administrative functions on
-bugzilla.redhat.com. These scripts sync information from FAS and the
-Package Database into bugzilla.
-
-Contents
-========
-
-1. Contact Information
-2. Description
-3. Troubleshooting and Resolution
-
-   1. Errors while syncing bugzilla with the PackageDB
-
-Contact Information
-===================
-
-Owner
-  Fedora Infrastructure Team
-Contact
-  #fedora-admin
-Persons
-  abadger1999
-Location
-  Phoenix, Denver (Tummy), Red Hat Infrastructure
-Servers
-  (fas1, app5) => Need to migrate these to bapp1, bugzilla.redhat.com
-Purpose
-  Sync Fedora information to bugzilla.redhat.com
-
-Description
-===========
-
-At present there are two scripts that sync information from Fedora into
-bugzilla.
- -export-bugzilla.py ------------------- - -``export-bugzilla.py`` is the first script. It is responsible for syncing -Fedora Accounts into bugzilla. It adds Fedora packages and bug triagers -into a bugzilla group that gives the users extra permissions within -bugzilla. This script is run off of a cron job on FAS1. The source code -resides in the FAS git repo in ``fas/scripts/export-bugzilla.*`` however the -code we run on the servers presently lives in ansible:: - - roles/fas_server/files/export-bugzilla - -pkgdb-sync-bugzilla -------------------- - -The other script is pkgdb-sync-bugzilla. It is responsible for syncing the -package owners and cclists to bugzilla from the pkgdb. The script runs off -a cron job on app5. The source code is in the packagedb bzr repo is -``packagedb/fedora-packagedb-stable/server-scripts/pkgdb-sync-bugzilla.*``. -Just like FAS, a separate copy is presently installed from ansbile to -``/usr/local/bin/pkgdb-sync-bugzilla`` but that should change ASAP as the -present fedora-packagedb package installs ``/usr/bin/pkgdb-sync-bugzilla``. - -Troubleshooting and Resolution -============================== - -Errors while syncing bugzilla with the PackageDB ------------------------------------------------- - -One frequent problem is that people will sign up to watch a package in the -packagedb but their email address in FAS isn't a bugzilla email address. -When this happens the scripts that try to sync the packagedb information -to bugzilla encounter an error and send an email like this:: - - Subject: Errors while syncing bugzilla with the PackageDB - - The following errors were encountered while updating bugzilla with information - from the Package Database. Please have the problems taken care of: - - ({'product': u'Fedora', 'component': u'aircrack-ng', 'initialowner': u'baz@zardoz.org', - 'initialcclist': [u'foo@bar.org', u'baz@zardoz.org']}, 504, 'The name foo@bar.org is not a - valid username. \n Either you misspelled it, or the person has not\n registered for a - Red Hat Bugzilla account.') - -When this happens we attempt to contact the person with the problematic -mail address and get them to change it. Here's a boilerplate message:: - - To: foo@bar.org - Subject: Fedora Account System Email vs Bugzilla Email - - Hello, - - You are signed up to receive bug reports against the aircrack-ng package - in Fedora. Unfortunately, the email address we have for you in the - Fedora Account System is not a valid bugzilla email address. That means - that bugzilla won't send you mail and we're getting errors in the script - that syncs the cclist into bugzilla. - - There's a few ways to resolve this: - - 1) Create a new bugzilla account with the email foo@bar.org as - an account at https://bugzilla.redhat.com. - - 2) Change an existing account on https://bugzilla.redhat.com to use the - foo@bar.org email address. - - 3) Change your email address in https://admin.fedoraproject.org/accounts - to use an email address that matches with an existing bugzilla email - address. - - Please let me know what you want to do! - - Thank you, - -If the user does not reply someone in the cvsadmin group needs to go into -the pkgdb and remove the user from the cclist for the package. diff --git a/docs/sops/bugzilla2fedmsg.rst b/docs/sops/bugzilla2fedmsg.rst deleted file mode 100644 index ad8bebe..0000000 --- a/docs/sops/bugzilla2fedmsg.rst +++ /dev/null @@ -1,74 +0,0 @@ -.. title: bugzilla2fedmsg SOP -.. slug: infra-bugzilla2fedmsg -.. date: 2016-04-07 -.. 
taxonomy: Contributors/Infrastructure
-
-===================
-bugzilla2fedmsg SOP
-===================
-
-Receive events from bugzilla over the RH "unified messagebus" and rebroadcast
-them over our own fedmsg bus.
-
-Contact Information
--------------------
-
-Owner
-  Messaging SIG, Fedora Infrastructure Team
-Contact
-  #fedora-apps, #fedora-fedmsg, #fedora-admin, #fedora-noc
-Servers
-  bugzilla2fedmsg01
-Purpose
-  Rebroadcast bugzilla events on our bus.
-
-Description
------------
-
-bugzilla2fedmsg is a small service running as the 'moksha-hub' process which
-receives events from bugzilla via the RH "unified messagebus" and rebroadcasts
-them to our fedmsg bus.
-
-.. note:: Unlike *all* of our other fedmsg services, this one runs as the
-   'moksha-hub' process and not as the 'fedmsg-hub'.
-
-The bugzilla2fedmsg package provides a plugin to the moksha-hub that
-connects out over the STOMP protocol to a 'fabric' of JBOSS activemq FUSE
-brokers living in the Red Hat DMZ. We authenticate with a cert/key pair that is
-kept in /etc/pki/fedmsg/. Those brokers should push bugzilla events over
-STOMP to our moksha-hub daemon. When a message arrives, we query bugzilla
-about the change to get some 'more interesting' data to stuff in our
-payload, then we sign the message using a fedmsg cert and fire it off to the
-rest of our bus.
-
-This service has no database, no memcached usage. It depends on those STOMP
-brokers and being able to query bugzilla.rh.com.
-
-Relevant Files
---------------
-
-All managed by ansible, of course:
-
-    STOMP config: /etc/moksha/production.ini
-    fedmsg config: /etc/fedmsg.d/
-    certs: /etc/pki/fedmsg
-    code: /usr/lib/python2.7/site-packages/bugzilla2fedmsg.py
-
-Useful Commands
----------------
-
-To look at logs, run::
-
-    $ journalctl -u moksha-hub -f
-
-To restart the service, run::
-
-    $ systemctl restart moksha-hub
-
-Internal Contacts
--------------------
-
-If we need to contact someone from the RH internal "unified messagebus" team,
-search for "unified messagebus" in mojo. It is operated as a joint project
-between RHIT and PnT Devops. See also the ``#devops-message`` IRC channel,
-internally.
diff --git a/docs/sops/cloud.rst b/docs/sops/cloud.rst
deleted file mode 100644
index c40deb7..0000000
--- a/docs/sops/cloud.rst
+++ /dev/null
@@ -1,138 +0,0 @@
-.. title: Fedora OpenStack Cloud
-.. slug: infra-openstack
-.. date: 2015-04-28
-.. taxonomy: Contributors/Infrastructure
-
-================
-Fedora OpenStack
-================
-
-Quick Start
-===========
-
-Controller::
-
-    sudo rbac-playbook hosts/fed-cloud09.cloud.fedoraproject.org.yml
-
-Compute nodes::
-
-    sudo rbac-playbook groups/openstack-compute-nodes.yml
-
-Description
-===========
-
-If you need to install OpenStack, either make sure the machine is clean,
-or use the ``ansible.git/files/fedora-cloud/uninstall.sh`` script to
-brute-force wipe it.
-
-.. note:: by default, the script does not wipe the LVM group with VMs; you
-   have to clean them manually. There is a commented line in that script.
-
-On fed-cloud09, remove the file ``/etc/packstack_sucessfully_finished`` to
-force packstack and a few other commands to run again.
-
-After that wipe, you have to::
-
-    ifdown eth1
-    configure eth1 to become normal Ethernet with ip
-    yum install openstack-neutron-openvswitch
-    /usr/bin/systemctl restart neutron-ovs-cleanup
-    ifup eth1
-
-Additionally, when reprovisioning OpenStack, all volumes on DellEqualogic are
-preserved and you have to manually remove them (or remove them from OS before
-it is reprovisioned).
SSH to DellEqualogic (credentials are at the bottom of -``/etc/cinder/cinder.conf``) and run:: - - show (to get list of volumes) - volume select offline - volume delete - -Before installing make sure: - - * make sure rdo repo is enabled - * ``yum install openstack-packstack openstack-packstack-puppet openstack-puppet-modules`` - * ``vim /usr/lib/python2.7/site-packages/packstack/plugins/dashboard_500.py`` - and missing parentheses:: - - ``host_resources.append((ssl_key, 'ssl_ps_server.key'))`` - -Now you can run playbook:: - - sudo rbac-playbook hosts/fed-cloud09.cloud.fedoraproject.org.yml - -If you run it after wipe (i.e. db has been reset), you have to: - - * import ssh keys of users (only possible via webUI - RHBZ 1128233 - * reset user passwords - - -Compute nodes -============= - -Compute node is much easier and is written as role. Use:: - - vars_files: - - ... SNIP - - /srv/web/infra/ansible/vars/fedora-cloud.yml - - "{{ private }}/files/openstack/passwords.yml" - - roles: - ... SNIP - - cloud_compute - -Define a host variable in ``inventory/host_vars/FQDN.yml``:: - - compute_private_ip: 172.23.0.10 - -You should also add IP to ``vars/fedora-cloud.yml`` - -And when adding new compute node, please update ``files/fedora-cloud/hosts`` - -.. important:: When reinstalling make sure you removed all members on Dell Equalogic - (credentials are in /etc/cinder/cinder.conf on compute node) otherwise the - space will be blocked!!! - -Updates -======= -Our openstack cloud should have updates applied and reboots when the rest of our servers -are updated and rebooted. This will cause an outage, please make sure to schedule it. - -1. Stop copr-backend process on copr-be.cloud.fedoraproject.org -2. Kill all copr-builder instances. -3. Kill all transient/scratch instances. -4. Update all instances we control. copr, persistent, infrastructure, qa etc. -5. Shutdown all instances -6. Update and reboot fed-cloud09 -7. Update and reboot all compute nodes -8. Start up all instances that are shutdown in step 5. - -TODO: add commands for above as we know them. - -Troubleshooting -=============== - -* could not connect to VM? - check your security group, default SG does not - allow any connection. -* packstack end up with error, it is likely race condition in puppet - BZ 1135529. Just run it again. - -* ERROR : append() takes exactly one argument (2 given - ``vi /usr/lib/python2.7/site-packages/packstack/plugins/dashboard_500.py`` - and add one more surrounding () - -* Local ip for ovs agent must be set when tunneling is enabled - restart fed-cloud09 or: - ssh to fed-cloud09; ifdown eth1; ifup eth1; ifup br-ex - -* mongodb problem? 
-* mongodb problem?  Follow
-  https://ask.openstack.org/en/question/54015/mongodbpp-error-when-installing-rdo-on-centos-7/?answer=54076#post-id-54076
-
-* ``WARNING:keystoneclient.httpclient:Failed to retrieve management_url from token``::
-
-    keystone --os-token $ADMIN_TOKEN --os-endpoint \
-      https://fedorainfracloud.org:35357/v2.0/ endpoint-create --region 'RegionOne' \
-      --service 91358b81b1aa40d998b3a28d0cfc86e7 --publicurl \
-      'https://fedorainfracloud.org:5000/v2.0' --adminurl 'http://172.24.0.9:35357/v2.0' \
-      --internalurl 'http://172.24.0.9:5000/v2.0'
-
-Fedora Classroom about our instance
-===================================
-http://meetbot.fedoraproject.org/fedora-classroom/2015-05-11/fedora-classroom.2015-05-11-15.02.log.html
diff --git a/docs/sops/collectd.rst b/docs/sops/collectd.rst
deleted file mode 100644
index 2147325..0000000
--- a/docs/sops/collectd.rst
+++ /dev/null
@@ -1,72 +0,0 @@
-.. title: Collectd SOP
-.. slug: collectd
-.. date: 2016-03-22
-.. taxonomy: Contributors/Infrastructure
-
-============
-Collectd SOP
-============
-
-Collectd ( https://collectd.org/ ) is a client/server setup that gathers
-system information from clients and allows the server to display that
-information over various time periods.
-
-Our server instance runs on log01.phx2.fedoraproject.org and most other
-servers run clients that connect to the server and provide it with data.
-
-Contents
-========
-
-1. Contact Information
-2. Collectd info
-
-Contact Information
-===================
-
-Owner
-    Fedora Infrastructure Team
-Contact
-    #fedora-admin
-Location
-    https://admin.fedoraproject.org/collectd/
-Servers
-    log01 and all/most other servers as clients
-Purpose
-    Provide load and system information on servers.
-
-Configuration
-=============
-
-The collectd roles configure collectd on the various machines:
-
-collectd/base
-    This is the base client role for most servers.
-collectd/server
-    This is the server, for use on log01.
-collectd/other
-    There are various other subroles for different types of clients.
-
-Web interface
-=============
-
-The server web interface is available at:
-
-https://admin.fedoraproject.org/collectd/
-
-Restarting
-==========
-
-collectd runs as a normal systemd or sysvinit service, so you can restart it
-with ``systemctl restart collectd`` or ``service collectd restart``.
-
-Removing old hosts
-==================
-
-Collectd keeps information around until it's deleted, so you may sometimes
-need to remove data from a host or hosts that are no longer used.
-To do this:
-
-1. Login to log01
-2. cd /var/lib/collectd/rrd
-3. sudo rm -rf oldhostname
-
-Bug reporting
-=============
-
-Collectd is in Fedora/EPEL and we use their packages, so report bugs to
-bugzilla.redhat.com.
diff --git a/docs/sops/contenthosting.rst b/docs/sops/contenthosting.rst
deleted file mode 100644
index 76743f5..0000000
--- a/docs/sops/contenthosting.rst
+++ /dev/null
@@ -1,142 +0,0 @@
-.. title: Content Hosting Infrastructure SOP
-.. slug: infra-content-hosting
-.. date: 2012-07-17
-.. taxonomy: Contributors/Infrastructure
-
-==================================
-Content Hosting Infrastructure SOP
-==================================
-
-Contact Information
-===================
-
-Owner
-    Fedora Infrastructure Team
-
-Contact
-    #fedora-admin, sysadmin-main, fedora-infrastructure-list
-
-Location
-    Phoenix
-
-Servers
-    secondary1, netapp[1-3], torrent1
-
-Purpose
-    Policy regarding hosting, removal and pruning of content.
-
-Scope
-    download.fedora.redhat.com, alt.fedoraproject.org,
-    archives.fedoraproject.org, secondary.fedoraproject.org,
-    torrent.fedoraproject.org
-
-Description
-===========
-
-Fedora hosts both Fedora content and some non-Fedora content.  Our
-resources are finite and as such we have to have some policy around when
-to remove old content.  This SOP describes the tests used to decide when
-to remove content.  The spirit of this SOP is to allow more people to host
-content and give it a try, to prove that it's useful.  If it's not popular
-or useful, it will get removed.  Out-of-date or expired content will also
-be removed.
-
-What hosting options are available
-----------------------------------
-
-Aside from the hosting at http://fedorahosted.org/ we have a series of
-mirrors we're allowing people to use.  They are located at:
-
-* http://archive.fedoraproject.org/pub/archive/ - For archives of
-  historical Fedora releases
-* http://secondary.fedoraproject.org/pub/fedora-secondary/ - For secondary
-  architectures
-* http://alt.fedoraproject.org/pub/alt/ - For misc content / catchall
-* http://torrent.fedoraproject.org/ - For torrent hosting
-* http://spins.fedoraproject.org/ - For official Fedora Spins hosting,
-  mirrored somewhat
-* http://download.fedoraproject.org/pub/ - For official Fedora Releases,
-  mirrored widely
-
-Who can host?  What can be hosted?
-----------------------------------
-Any official Fedora content can be hosted and made available for mirroring.
-Official content is determined by the Council by virtue of allowing people
-to use the Fedora trademark.  People representing these teams will be
-allowed to host.
-
-Non Official Hosting
---------------------
-
-People wanting to host unofficial bits may request approval for hosting.
-Create a ticket at https://fedorahosted.org/fedora-infrastructure/
-explaining what you want to host and why Fedora should host it.  Such
-requests will be reviewed by the Fedora Infrastructure team.
-
-Requests for non-official hosting that may conflict with existing Fedora
-policies will be escalated to the Council for approval.
-
-Licensing
----------
-Anything hosted with Fedora must come with a Free software license that is
-approved by Fedora.  See http://fedoraproject.org/wiki/Licensing for
-more.
-
-Requesting Space
-================
-
-* Make sure you have a Fedora account -
-  https://admin.fedoraproject.org/accounts/
-* Ensure you have signed the Fedora Project Contributor Agreement (FPCA)
-* Submit a hosting request -
-  https://fedorahosted.org/fedora-infrastructure/
-
-  * Include who you are, and any group you are working with (e.g. a SIG)
-  * Include space requirements
-  * Include an estimate of the number of downloads expected (if you can).
-  * Include the nature of the bits you want to host.
-
-* Apply for the group hosted-content -
-  https://admin.fedoraproject.org/accounts/group/view/hosted-content
-
-Using Space
-===========
-
-A dedicated namespace in the mirror will be assigned to you.  It will be
-your responsibility to upload content, remove old content, stay within
-your quota, etc.  If you have any questions or concerns about this please
-let us know.  Generally you will use rsync.  For example::
-
-    rsync -av --progress ./my.iso secondary01.fedoraproject.org:/srv/pub/alt/mySpace/
-
-.. important::
-   None of our mirrored content is backed up.  Ensure that you keep backups
-   of your content.
-
-Content Pruning / Purging / Removal
-===================================
-
-The following guidelines / tests will be used to determine whether or not
-to remove content from the mirror.
-
-Expired / Old Content
----------------------
-
-If content meets any of the following criteria it may be removed:
-
-* Content that has reached the end of life (is no longer receiving updates).
-* Pre-release content that has been superseded.
-* EOL releases that have been moved to archives.
-* N-2 or greater releases.  If more than 3 versions of a piece of content
-  are on the mirror, the oldest may be removed.
-
-Limited Use Content
--------------------
-If content meets any of the following criteria it may be removed:
-
-* Content with exceedingly limited seeders or downloaders, with little
-  prospect of increasing those numbers, and which is older than 1 year.
-
-* Content such as videos or audio which are several years old.
-
-Catch All Removal
------------------
-
-Fedora reserves the right to remove any content for any reason at any
-time.  We'll do our best to host things but sometimes we'll need space or
-just need to remove stuff for legal or policy reasons.
diff --git a/docs/sops/copr.rst b/docs/sops/copr.rst
deleted file mode 100644
index dc98340..0000000
--- a/docs/sops/copr.rst
+++ /dev/null
@@ -1,195 +0,0 @@
-.. title: Copr
-.. slug: infra-copr
-.. date: 2015-01-13
-.. taxonomy: Contributors/Infrastructure
-
-====
-Copr
-====
-
-Copr is a build system for 3rd-party packages.
-
-Frontend:
-    http://copr.fedorainfracloud.org/
-Backend:
-    http://copr-be.cloud.fedoraproject.org/
-Package signer:
-    copr-keygen.cloud.fedoraproject.org
-Dist-git:
-    copr-dist-git.fedorainfracloud.org
-
-Devel instances (NO NEED TO CARE ABOUT THEM, JUST THOSE ABOVE):
-
-    http://copr-fe-dev.cloud.fedoraproject.org/
-    http://copr-be-dev.cloud.fedoraproject.org/
-    copr-keygen-dev.cloud.fedoraproject.org
-    copr-dist-git-dev.fedorainfracloud.org
-
-Contact Information
-===================
-Owner
-    msuchy (mirek)
-Contact
-    #fedora-admin, #fedora-buildsys
-Location
-    Fedora Cloud
-Purpose
-    Build system
-
-TROUBLESHOOTING
-===============
-
-Almost every problem with Copr is due to a problem in OpenStack; in that
-case::
-
-    $ ssh root@copr-be.cloud.fedoraproject.org
-    # copr-backend-service stop
-    # source /home/copr/cloud/ec2rc.sh
-    # /home/copr/delete-forgotten-instances.pl
-    # # wait a minute and check
-    # euca-describe-instances
-    # # sometimes you have to run delete-forgotten-instances.pl again, as
-    # # openstack is sometimes stubborn.
-    # copr-backend-service start
-
-If this does not help you, then stop and kill all OpenStack VM builders and::
-
-    $ ssh root@fed-cloud02.cloud.fedoraproject.org
-    # source keystonerc
-    # for i in $(nova-manage floating list | grep 7ed4d | grep None | sort | awk '{ print $2}')
-      do nova-manage floating delete $i
-         nova-manage floating create $i
-      done
-
-or even (USUALLY NOT NEEDED)::
-
-    for i in /etc/init.d/openstack-*; do $i condrestart; done
-
-and then start the copr backend service again::
-
-    # copr-backend-service restart
-
-Sometimes OpenStack cannot handle spawning too many VMs at the same time,
-so it is safer to edit, on copr-be.cloud.fedoraproject.org::
-
-    vi /etc/copr/copr-be.conf
-
-and change::
-
-    group0_max_workers=12
-
-to "6".  Start the copr-backend service and increase it back to the
-original value some time later.  Copr automatically detects the change and
-increases the number of workers.
-
-Backend Troubleshooting
------------------------
-
-Information about the status of Copr backend services::
-
-    # copr-backend-service status
-
-Utilization of workers::
-
-    # ps axf
-
-Worker processes change $0 to show which task they are working on and on
-which builder.
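-
-For example, to glance at just the workers and then follow one worker's
-activity, you can filter the process list and tail its log (an illustrative
-sketch; the log paths are the ones listed in the Logs section below)::
-
-    # ps axf | grep [w]orker
-    # tail -f /var/log/copr/workers/worker-*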
-
-To list which VM builders are tracked by the copr-vmm service::
-
-    # /usr/bin/copr_get_vm_info.py
-
-Deploy information
-==================
-
-Using playbooks and rbac::
-
-    $ sudo rbac-playbook groups/copr-backend.yml
-    $ sudo rbac-playbook groups/copr-frontend.yml
-    $ sudo rbac-playbook groups/copr-keygen.yml
-    $ sudo rbac-playbook groups/copr-dist-git.yml
-
-https://git.fedorahosted.org/cgit/copr.git/plain/copr-setup.txt
-
-The copr-backend service (which spawns several processes) should run on the
-backend.  The backend spawns VMs in the Fedora Cloud.  You cannot log in to
-those machines directly; you have to::
-
-    $ ssh root@copr-be.cloud.fedoraproject.org
-    # su - copr
-    $ source /home/copr/cloud/ec2rc.sh
-    $ euca-describe-instances
-    # # instances of type m1.builder are those spawned by the backend;
-    # # check the 18th column for the internal IP
-    # # log in there if you want
-    $ ssh root@172.16.3.3
-    # or terminate that instance (the ID is in the 2nd column)
-    # euca-terminate-instances i-000003b3
-    # # you can delete all instances in an error state, or simply
-    # # forgotten ones, with:
-    # /home/copr/delete-forgotten-instances.pl
-
-Order of start up
------------------
-
-When reprovisioning, start the copr-keygen and copr-dist-git machines first
-(in any order); then you can start copr-be.  You can start it sooner, but
-make sure that the copr-* services are stopped.
-
-The copr-fe machine is completely independent and can be started at any
-time.  If the backend is stopped, it will just queue jobs.
-
-Logs
-====
-
-For the backend:
-    /var/log/copr/backend.log /var/log/copr/workers/worker-*
-    /var/log/copr/spawner.log /var/log/copr/job_grab.log
-    /var/log/copr/actions.log /var/log/copr/vmm.log
-
-For the frontend:
-    httpd logs: /var/log/httpd/{error,access}_log
-
-For keygen:
-    /var/log/copr-keygen/main.log
-
-For dist-git:
-    /var/log/copr-dist-git/main.log
-
-    httpd logs: /var/log/httpd/{error,access}_log
-
-Services
-========
-
-For the backend, use the script ``copr-backend-service {start|stop|restart}``,
-which handles all copr* services (job grabber, vmm, workers, ...); also:
-    logstash
-    redis
-    lighttpd
-
-For the frontend:
-    httpd
-    logstash
-    postgresql
-
-For keygen:
-    signd
-
-For dist-git:
-    httpd
-    copr-dist-git
-
-PPC64LE Builders
-================
-
-Builders for PPC64LE are located at rh-power2.fit.vutbr.cz, and anyone with
-access to the buildsys ssh key can get there using that key as
-msuchy@rh-power2.fit.vutbr.cz.
-
-These commands are available::
-
-    $ ls bin/
-    destroy-all.sh  reinit-vm26.sh  reinit-vm28.sh  virsh-destroy-vm26.sh  virsh-destroy-vm28.sh  virsh-start-vm26.sh  virsh-start-vm28.sh
-    get-one-vm.sh   reinit-vm27.sh  reinit-vm29.sh  virsh-destroy-vm27.sh  virsh-destroy-vm29.sh  virsh-start-vm27.sh  virsh-start-vm29.sh
-
-bin/destroy-all.sh
-    destroys all VMs and reinits them
-reinit-vmXX.sh
-    copies the VM image from the template
-virsh-destroy-vmXX.sh
-    destroys the VM
-virsh-start-vmXX.sh
-    starts the VM
-get-one-vm.sh
-    starts one VM and returns its IP - this is used in Copr playbooks.
-
-In case of a big queue of PPC64LE tasks, simply call bin/destroy-all.sh; it
-will destroy stuck VMs and the Copr backend will spawn new ones.
diff --git a/docs/sops/cyclades.rst b/docs/sops/cyclades.rst
deleted file mode 100644
index 5c84381..0000000
--- a/docs/sops/cyclades.rst
+++ /dev/null
@@ -1,33 +0,0 @@
-.. title: cyclades
-.. slug: infra-cyclades
-.. date: 2011-12-12
-.. taxonomy: Contributors/Infrastructure
-
-========
-Cyclades
-========
-
-cyclades notes
-
-1. login as root - the default password is tslinux
-2. change the password for root and admin to our password from the
-   phx2-access.txt file in the private repo
-3. port forward to the web browser for the cyclades:
-   ``ssh -L 8080:rack47-serial.phx2.fedoraproject.org:80``
-4. connect to localhost:8080 in your web browser
-5. login with root and the password you set above
-6. click on 'security'
-7. click on 'moderate'
-8. logout, port forward port 443 as above:
-   ``ssh -L 8080:rack47-serial.phx2.fedoraproject.org:443``
-9. click on the 'wizard' button at lower left
-10. proceed through the wizard - Info needed:
-
-    - serial ports are set to 115200 8N1 by default
-    - do not set up buffering
-    - give it the ip of our syslog server
-
-11. click 'apply changes'
-12. hope
-13. log back in
-14. name/setup the port aliases
-
diff --git a/docs/sops/darkserver.rst b/docs/sops/darkserver.rst
deleted file mode 100644
index 22ca7cf..0000000
--- a/docs/sops/darkserver.rst
+++ /dev/null
@@ -1,109 +0,0 @@
-.. title: Darkserver SOP
-.. slug: infra-darkserver
-.. date: 2012-03-22
-.. taxonomy: Contributors/Infrastructure
-
-==============
-Darkserver SOP
-==============
-
-To set up http://darkserver.fedoraproject.org, based on the Darkserver
-project, to provide GNU_BUILD_ID information for packages.  A devel
-instance can be seen at http://darkserver01.dev.fedoraproject.org and the
-staging instance is at http://darkserver01.stg.phx2.fedoraproject.org/.
-
-This page describes how to set up the server.
-
-Contents
-========
-
-1. Contact Information
-2. Installing the server
-3. Setting up the database
-4. SELinux Configuration
-5. Koji plugin setup
-6. Debugging
-
-
-Contact Information
-===================
-
-Owner:
-    Fedora Infrastructure Team
-Contact:
-    #fedora-admin
-Persons:
-    kushal, mether
-Sponsor:
-    nirik
-Location:
-    phx2
-Servers:
-    darkserver01, darkserver01.stg, darkserver01.dev
-Purpose:
-    To host Darkserver
-
-
-Installing the Server
-=====================
-::
-
-    root@localhost# yum install darkserver
-
-
-Setting up the database
-=======================
-We are using MySQL as the database.  We will need two users, one for the
-koji plugin and one for darkserver::
-
-    root@localhost# mysql -u root
-    mysql> CREATE DATABASE darkserver;
-    mysql> GRANT INSERT ON darkserver.* TO kojiplugin@'koji-hub-ip' IDENTIFIED BY 'XXX';
-    mysql> GRANT SELECT ON darkserver.* TO dark@'darkserver-ip' IDENTIFIED BY 'XXX';
-
-Set up this db configuration in the conf file under
-``/etc/darkserver/darkserverweb.conf``::
-
-    [darkserverweb]
-    host=db host name
-    user=dark
-    password=XXX
-    database=darkserver
-
-Now set up the db tables if it is a new install.
-
-(For this you may need to ``'GRANT * ON darkserver.*'`` to the web user, and
-then ``'REVOKE * ON darkserver.*'`` after running.)
-
-::
-
-    root@localhost# python /usr/lib/python2.6/site-packages/darkserverweb/manage.py syncdb
-
-SELinux Configuration
-=====================
-
-Do the following to allow the web server to connect to the database::
-
-    root@localhost# setsebool -P httpd_can_network_connect_db 1
-
-Setting up the Koji plugin
-==========================
-
-Install the package::
-
-    root@localhost# yum install darkserver-kojiplugin
-
-Then fill in the configuration file under ``/etc/koji-hub/plugins/darkserver.conf``::
-
-    [darkserver]
-    host=db host name
-    user=kojiplugin
-    password=XXX
-    database=darkserver
-    port=3306
-
-Then enable the plugin in the koji hub configuration.
-
-Debugging
-=========
-Set DEBUG to True in the ``/etc/darkserver/settings.py`` file and restart
-Apache.
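-
-A minimal sketch of that debug toggle (the sed pattern assumes a plain
-``DEBUG = False`` line in settings.py; adjust it to match the actual file)::
-
-    root@localhost# sed -i 's/^DEBUG = False/DEBUG = True/' /etc/darkserver/settings.py
-    root@localhost# service httpd restart
-
-Remember to set DEBUG back to False when you are done, since debug pages
-can leak configuration details.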
-
diff --git a/docs/sops/database.rst b/docs/sops/database.rst
deleted file mode 100644
index c2b2574..0000000
--- a/docs/sops/database.rst
+++ /dev/null
@@ -1,237 +0,0 @@
-.. title: Database Infrastructure SOP
-.. slug: infra-database
-.. date: 2016-09-24
-.. taxonomy: Contributors/Infrastructure
-
-===========================
-Database Infrastructure SOP
-===========================
-
-Our database servers provide database storage for many of our apps.
-
-Contents
-
-1. Contact Information
-2. Description
-3. Creating a New Postgresql Database
-4. Troubleshooting and Resolution
-
-   1. Connection issues
-   2. Some useful queries
-
-      1. What queries are running
-      2. Seeing how "dirty" a table is
-      3. XID Wraparound
-
-   3. Restart Procedure
-
-      1. Koji
-      2. Bodhi
-
-5. Note about TurboGears and MySQL
-6. Restoring from backups or specific dbs
-
-
-Contact Information
-===================
-
-Owner
-    Fedora Infrastructure Team
-
-Contact
-    #fedora-admin, sysadmin-main, sysadmin-dba group
-
-Location
-    Phoenix
-
-Servers
-    db01, db03, db-fas01, db-datanommer02, db-koji01, db-s390-koji01,
-    db-arm-koji01, db-ppc-koji01, db-qa01, db-qastg01
-
-Purpose
-    Provides database connections for many of our apps.
-
-Description
-===========
-
-db01, db03 and db-fas01 are our primary servers.
-db01 and db-fas01 run PostgreSQL.
-db03 runs MariaDB.
-db-koji01, db-s390-koji01, db-arm-koji01 and db-ppc-koji01 contain the
-secondary kojis.
-db-qa01 and db-qastg01 contain taskotron and resultsdb.
-db-datanommer02 contains all the messages datanommer has stored from the
-fedmsg bus, in a postgresql database.
-
-
-Creating a New Postgresql Database
-==================================
-
-Creating a new database on our postgresql server isn't hard, but there are
-several steps that should be taken to make the database server as secure
-as possible.
-
-We want to separate the database permissions so that we don't have a single
-user/password combination that can do anything it likes to the database on
-every host (the webapp user can usually do a lot of things even without
-those extra permissions, but every little bit helps).
-
-Say we have an app called "raffle".  We'd have three users:
-
-* raffleadmin: able to make any changes they want to this particular
-  database.  It should not be used day to day, only for things
-  like updating the database schema when an update occurs.
-  We could very likely disable this account in the db whenever we are not
-  using it.
-* raffleapp: the database user that the web application uses.  This will
-  likely need to be able to insert and select from all tables.  It will
-  probably need to update most tables as well.  There may be some tables
-  that it does *not* need delete on.  It should almost certainly not
-  need schema modifying permissions.  (With postgres, it likely also
-  needs permission to insert/select on sequences as well).
-* rafflereadonly: Only able to read data from tables, not able to modify
-  anything.  Sadly, we aren't using this often, but it can be useful for
-  scripts that need to talk directly to the database without modifying it.
-
-::
-
-    db2 $ sudo -u postgres createuser -P -E NEWDBadmin
-    Password:
-    db2 $ sudo -u postgres createuser -P -E NEWDBapp
-    Password:
-    db2 $ sudo -u postgres createuser -P -E NEWDBreadonly
-    Password:
-    db2 $ sudo -u postgres createdb -E utf8 NEWDB -O NEWDBadmin
-    db2 $ sudo -u postgres psql NEWDB
-    NEWDB=# revoke all on database NEWDB from public;
-    NEWDB=# revoke all on schema public from public;
-    NEWDB=# grant all on schema public to NEWDBadmin;
-    NEWDB=# [grant permissions to NEWDBapp as appropriate for your app]
-    NEWDB=# [grant permissions to NEWDBreadonly as appropriate for a user that
-            is only trusted enough to read information]
-    NEWDB=# grant connect on database NEWDB to nagiosuser;
-
-
-If your application needs the NEWDBapp user and password to connect to
-the database, you probably want to add these to ansible as well.  Put the
-password in the private repo on batcave01, then use a template file to
-incorporate it into the config file.  See fas.pp for an example.
-
-Troubleshooting and Resolution
-==============================
-
-Connection issues
------------------
-
-There are no known outstanding issues with the database itself.  Remember
-that every time either database is restarted, services will have to be
-restarted (see below).
-
-Some useful queries
--------------------
-
-What queries are running
-````````````````````````
-
-This can help you find out what queries are currently running on the
-server::
-
-    select datname, pid, query_start, backend_start, query from
-    pg_stat_activity where state<>'idle' order by query_start;
-
-This can help you find out how many connections to the db server there are
-for each individual database::
-
-    select datname, count(datname) from pg_stat_activity group by datname
-    order by count desc;
-
-Seeing how "dirty" a table is
-`````````````````````````````
-
-We've added a function from postgres's contrib directory to tell how dirty
-a table is.  By dirty we mean: how many tuples are active, how many have
-been marked as having old data (and therefore "dead"), and how much free
-space is allocated to the table but not used::
-
-    \c fas2
-    \x
-    select * from pgstattuple('visit_identity');
-    table_len          | 425984
-    tuple_count        | 580
-    tuple_len          | 46977
-    tuple_percent      | 11.03
-    dead_tuple_count   | 68
-    dead_tuple_len     | 5508
-    dead_tuple_percent | 1.29
-    free_space         | 352420
-    free_percent       | 82.73
-    \x
-
-Vacuum should clear out dead_tuples.  Only a vacuum full, which will lock
-the table and therefore should be avoided, will clear out free space.
-
-XID Wraparound
-``````````````
-Find out how close we are to having to perform a vacuum of a database (as
-opposed to individual tables of the db).  We should schedule a vacuum when
-about 50% of the transaction ids have been used (approximately 530,000,000
-xids)::
-
-    select datname, age(datfrozenxid), pow(2, 31) - age(datfrozenxid) as xids_remaining
-    from pg_database order by xids_remaining;
-
-See the PostgreSQL documentation for more information on wraparound.
-
-Restart Procedure
-=================
-
-If the database server needs to be restarted it should come back on its
-own.  Otherwise each service on it can be restarted::
-
-    service mysqld restart
-    service postgresql restart
-
-Koji
-----
-
-Any time postgresql is restarted, koji needs to be restarted.  Please also
-see the Restarting Koji SOP.
-
-Bodhi
------
-
-Anytime postgresql is restarted, Bodhi will need to be restarted as well;
-no SOP currently exists for this.
-
-TurboGears and MySQL
-====================
-
-.. note::
-
-   There's a known bug in TurboGears that causes MySQL clients not to
-   automatically reconnect when lost.  Typically a restart of the TurboGears
-   application will correct this issue.
-
-Restoring from backups or specific dbs
-======================================
-
-Our backups store the latest copy in /backups/ on each db server.
-These backups are created automatically by the db-backup script run from
-cron.  Look in /usr/local/bin for the backup script.
-
-To restore partially or completely you need to:
-
-1. set up postgres on a system
-
-2. start postgres/run initdb - if this new system running postgres has
-   already run ansible then it will have wrong config files in
-   /var/lib/pgsql/data - clear them out before you start postgres so
-   initdb can work.
-
-3. grab the backups you need from /backups - also grab global.sql.
-   Edit up global.sql to only create/alter the dbs you care about.
-
-4. as postgres run: ``psql -U postgres -f global.sql``
-
-5. when this completes you can restore each db with (as the postgres
-   user)::
-
-       createdb $dbname
-       pg_restore -d $dbname dbname_backup_file.db
-
-6. restart postgres and check your data.
diff --git a/docs/sops/datanommer.rst b/docs/sops/datanommer.rst
deleted file mode 100644
index ab4c9d0..0000000
--- a/docs/sops/datanommer.rst
+++ /dev/null
@@ -1,117 +0,0 @@
-.. title: Datanommer SOP
-.. slug: infra-datanommer
-.. date: 2013-02-08
-.. taxonomy: Contributors/Infrastructure
-
-==============
-datanommer SOP
-==============
-
-Consume fedmsg bus activity and stuff it in a postgresql db.
-
-Contact Information
--------------------
-
-Owner
-    Messaging SIG, Fedora Infrastructure Team
-Contact
-    #fedora-apps, #fedora-fedmsg, #fedora-admin, #fedora-noc
-Servers
-    busgateway01
-Purpose
-    Save fedmsg bus activity
-
-Description
------------
-
-datanommer is a set of three modules:
-
-python-datanommer-models
-    Schema definition and API for storing new items and querying existing
-    items
-
-python-datanommer-consumer
-    A plugin for the fedmsg-hub that actively listens to the bus and stores
-    events.
-
-datanommer-commands
-    A set of CLI tools for querying the DB.
-
-datanommer will one day serve as a backend for future web services like
-datagrepper and dataviewer.
- -Source: https://github.com/fedora-infra/datanommer/ -Plan: https://fedoraproject.org/wiki/User:Ianweller/statistics_plus_plus - -CLI tools ---------- - -Dump the db into a file as json:: - - $ datanommer-dump > datanommer-dump.json - -When was the last bodhi message?:: - - $ # It was 678 seconds ago - $ datanommer-latest --category bodhi --timesince - [678] - -When was the last bodhi message in more readable terms?:: - - $ # It was 12 minutes and 43 seconds ago - $ datanommer-latest --category bodhi --timesince --human - [0:12:43.087949] - -What was that last bodhi message?:: - - $ datanommer-latest --category bodhi - [{"bodhi": { - "topic": "org.fedoraproject.stg.bodhi.update.comment", - "msg": { - "comment": { - "group": null, - "author": "ralph", - "text": "Testing for latest datanommer.", - "karma": 0, - "anonymous": false, - "timestamp": 1360349639.0, - "update_title": "xmonad-0.10-10.fc17" - }, - "agent": "ralph" - }, - }}] - -Show me stats on datanommer messages by topic:: - - $ datanommer-stats --topic - org.fedoraproject.stg.fas.group.member.remove has 10 entries - org.fedoraproject.stg.logger.log has 76 entries - org.fedoraproject.stg.bodhi.update.comment has 5 entries - org.fedoraproject.stg.busmon.colorized-messages has 10 entries - org.fedoraproject.stg.fas.user.update has 10 entries - org.fedoraproject.stg.wiki.article.edit has 106 entries - org.fedoraproject.stg.fas.user.create has 3 entries - org.fedoraproject.stg.bodhitest.testing has 4 entries - org.fedoraproject.stg.fedoratagger.tag.create has 9 entries - org.fedoraproject.stg.fedoratagger.user.rank.update has 5 entries - org.fedoraproject.stg.wiki.upload.complete has 1 entries - org.fedoraproject.stg.fas.group.member.sponsor has 6 entries - org.fedoraproject.stg.fedoratagger.tag.update has 1 entries - org.fedoraproject.stg.fas.group.member.apply has 17 entries - org.fedoraproject.stg.__main__.testing has 1 entries - -Upgrading the DB Schema ------------------------ - -datanommer uses "python-alembic" to manage its schema. When developers want -to add new columns or features, these should/must be tracked in alembic and -shipped with the RPM. - -In order to run upgrades on our stg/prod dbs: - -1) ssh to busgateway01{.stg} -2) ``cd /usr/share/datanommer.models/`` -3) Run:: - - $ alembic upgrade +1 - - Over and over again until the db is fully upgraded. diff --git a/docs/sops/denyhosts.rst b/docs/sops/denyhosts.rst deleted file mode 100644 index 8e3e362..0000000 --- a/docs/sops/denyhosts.rst +++ /dev/null @@ -1,62 +0,0 @@ -.. title: Denyhosts Infrastructure SOP -.. slug: infra-denyhosts -.. date: 2011-10-03 -.. taxonomy: Contributors/Infrastructure - -============================ -Denyhosts Infrastructure SOP -============================ - -Denyhosts provides a protection against brute force attacks. - -Contents -======== - -1. Contact Information -2. Description -3. Troubleshooting and Resolution - - 1. Connection issues - -Contact Information -==================== - -Owner - Fedora Infrastructure Team - -Contact - #fedora-admin, sysadmin-main group - -Location - Anywhere - -Servers - All - -Purpose - Denyhosts provides a protection against brute force attacks. - -Description -=========== - -All of our servers now implement denyhosts to protect against brute force -attacks. Very few boxes should be in the 'allowed' list. Especially -internally. - -Troubleshooting and Resolution -============================== - -Connection issues ------------------ -The most common issue will be legitimate logins failing. 
First, try to figure out why a host ended up on the deny list
-(tcptraceroute, failed login attempts, etc. are all good candidates).  Then
-follow the directions below.  The example below is for a host (10.0.0.1)
-that has been banned.  Log in to the box from a different host and, as
-root, do the following::
-
-    cd /var/lib/denyhosts
-    sed -si '/10.0.0.1/d' * /etc/hosts.deny
-    /etc/init.d/denyhosts restart
-
-That should correct the problem.
-
diff --git a/docs/sops/departing-admin.rst b/docs/sops/departing-admin.rst
deleted file mode 100644
index ab61135..0000000
--- a/docs/sops/departing-admin.rst
+++ /dev/null
@@ -1,64 +0,0 @@
-.. title: Departing Admin SOP
-.. slug: infra-departing-admin
-.. date: 2013-07-15
-.. taxonomy: Contributors/Infrastructure
-
-===================
-Departing admin SOP
-===================
-
-From time to time admins depart the project; this SOP lists the access
-that should be reviewed and removed when they do.
-
-Contact Information
-===================
-
-Owner
-    Fedora Infrastructure Team
-Contact
-    #fedora-admin, sysadmin-main
-Location
-    Everywhere
-Servers
-    all
-
-Description
-===========
-
-From time to time people with admin access to various parts of the project
-may leave the project or no longer wish to contribute.  This SOP attempts
-to list the process for removing access they no longer need.
-
-0. First, make sure that this SOP is needed.  Verify the person has left
-   the project and check which areas they might wish to still contribute
-   to.
-
-1. Gather info: fas username, email address, knowledge of passwords.
-
-2. Check the following areas with the following commands:
-
-   email address in ansible
-       Check: ``git grep email@address``
-
-       Remove: ``git commit``
-
-   koji admin
-       Check: ``koji list-permissions --user=username``
-
-       Remove: ``koji revoke-permission permissionname username``
-
-   wiki pages
-       Check: look for https://fedoraproject.org/wiki/User:Username
-
-       Remove: delete the page, or modify it with info that they are no
-       longer contributing.
-
-   packages
-       Check: Download
-       https://admin.fedoraproject.org/pkgdb/lists/bugzilla?tg_format=plain
-       and grep
-
-       Remove: remove from cc, orphan packages or reassign.
-
-   fas account
-       Check: check the username in fas
-
-       Remove: set the user inactive
-
-       .. note:: If there are scripts or files needed, save the user's
-          home directory.
-
-   passwords
-       Check: whether the departing admin knew sensitive passwords.
-
-       Remove: Change the passwords.
-
-       .. note:: root pw, management interfaces, etc.
diff --git a/docs/sops/dns.rst b/docs/sops/dns.rst
deleted file mode 100644
index e8fc054..0000000
--- a/docs/sops/dns.rst
+++ /dev/null
@@ -1,328 +0,0 @@
-.. title: DNS Infrastructure SOP
-.. slug: infra-dns
-.. date: 2015-06-03
-.. taxonomy: Contributors/Infrastructure
-
-================================
-DNS repository for fedoraproject
-================================
-
-We've set this up so we can easily (and quickly) edit and deploy dns
-changes with a record of who changed what and why.  This system also lets
-us edit out proxies from rotation for our many and varied websites quickly
-and with a minimum of opportunity for error.  Finally, it checks to make
-sure that all of the zone changes will actually work before they are
-allowed.
-
-DNS Infrastructure SOP
-======================
-
-We have 5 DNS servers:
-
-ns-sb01.fedoraproject.org
-    hosted at Serverbeach
-ns02.fedoraproject.org
-    hosted at ibiblio (ipv6 enabled)
-ns03.phx2.fedoraproject.org
-    in phx2, internal to phx2.
-ns04.fedoraproject.org
-    in phx2, external.
-ns05.fedoraproject.org
-    hosted at internetx (ipv6 enabled)
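-
-(A quick ``dig +short NS fedoraproject.org`` will always show you the
-authoritative list.)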
-
-Contents
-========
-
-1. Contact Information
-2. Troubleshooting, Resolution and Maintenance
-
-   1. DNS update
-   2. Adding a new zone
-
-3. GeoDNS
-
-   1. Non geodns fedoraproject.org IPs
-   2. Adding and removing countries
-   3. IP Country Mapping
-
-4. resolv.conf
-
-   1. Phoenix
-   2. Non-Phoenix
-
-Contact Information
-===================
-
-Owner:
-    Fedora Infrastructure Team
-Contact:
-    #fedora-admin, sysadmin-main, sysadmin-dns
-Location:
-    ServerBeach and ibiblio and internetx and phx2.
-Servers:
-    ns01, ns02, ns03.phx2, ns04, ns05
-Purpose:
-    Provides DNS to our users
-
-Troubleshooting, Resolution and Maintenance
-===========================================
-
-Adding a new Host
-=================
-
-Adding a new host requires adding it to DNS and to ansible; see
-new-hosts.rst for the details.
-
-Editing the domain(s)
-=====================
-
-We have three domains which need to be able to change on demand for proxy
-rotation/removal:
-
-    fedoraproject.org.
-    getfedora.org.
-    cloud.fedoraproject.org.
-
-The other domains are edited only when we add/subtract a host or move it to
-a new ip.  Not much else.
-
-If you need to edit a domain that is NOT in the above list:
-
-- change to the 'master' subdir, edit the domain as usual
-  (remember to update the serial), save it.
-
-If you need to edit one of the domains in the above list
-(replace fedoraproject.org with the domain from above):
-
-- if you need to add/change a host in fedoraproject.org that is not '@' or
-  'wildcard' then:
-
-  - edit fedoraproject.org.template
-  - make your changes
-  - do not edit the serial or anything surrounded by {{ }} unless you
-    REALLY know what you are doing.
-
-- if you need to only add/remove a proxy during an outage or due to a
-  networking issue then run:
-
-  - ``./zone-template fedoraproject.org.cfg disable ip [ip] [ip]``
-    to disable the ip of the proxy you want removed.
-  - ``./zone-template fedoraproject.org.cfg enable ip [ip] [ip]``
-    reverses the disable
-  - ``./zone-template fedoraproject.org.cfg reset``
-    will reset to all ips enabled.
-
-- if you want to add an all-new proxy as '@' or 'wildcard' for
-  fedoraproject.org:
-
-  - edit fedoraproject.org.cfg
-  - add the ip to the correct section of the ipv4 or ipv6 in the config.
-  - save the file
-  - check the file for validity by running ``python fedoraproject.org.cfg``,
-    looking for errors or tracebacks.
-
-In all cases then run:
-
-- ``./do-domains``
-
-- if that completes successfully then run::
-
-    git add .
-    git commit -a -m 'description of your change here'
-    git push
-
-and then run this on all of the nameservers (as root)::
-
-    /usr/local/bin/update-dns
-
-
-To run this via ansible from batcave do::
-
-    sudo -i ansible ns\* -a "/usr/local/bin/update-dns"
-
-
-This will pull from the git tree, update all of the zones and reload the
-name server.
-
-
-
-DNS update
-==========
-
-DNS config files are managed by ansible on batcave01.
-
-From batcave01::
-
-    git clone /git/ansible
-    cd ansible/roles/dns/files/
-    ...make changes needed...
-    git commit -m "What you did"
-    git push
-
-It should update within a half hour.  You can test the new configs with
-dig::
-
-    dig @ns01.fedoraproject.org fedoraproject.org
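-
-A quick way to confirm that an update has landed everywhere is to compare
-the SOA serial on each public nameserver (an illustrative sketch; adjust
-the server list as appropriate)::
-
-    for ns in ns02 ns04 ns05; do
-        dig +short SOA fedoraproject.org @${ns}.fedoraproject.org
-    done
-
-The third field of each answer is the zone serial; they should all match
-once every server has run update-dns.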
-
-Adding a new zone
-=================
-
-First name the zone and generate a new set of keys for it.  Run this on
-ns01; note it could take SEVERAL minutes to run::
-
-    /usr/sbin/dnssec-keygen -a RSASHA1 -b 1024 -n ZONE c.fedoraproject.org
-    /usr/sbin/dnssec-keygen -a RSASHA1 -b 2048 -n ZONE -f KSK c.fedoraproject.org
-
-Then copy the created .key and .private files to the private git repo (you
-need to be sysadmin-main to do this).  The directory is
-``private/private/dnssec``.
-
-- add the zone in zones.conf in ``ansible/roles/dns/files/zones.conf``
-- save and commit - but do not push
-- Add the zone file to the master subdir in this repo
-- git add and commit the file
-- check the zone by running check-domains
-- if you intend to have this be a dnssec signed zone then you must
-
-  - create a new key::
-
-      /usr/sbin/dnssec-keygen -a RSASHA1 -b 1024 -n ZONE $domain.org
-      /usr/sbin/dnssec-keygen -a RSASHA1 -b 2048 -n ZONE -f KSK $domain.org
-
-  - put the files this generates into /srv/privatekeys/dnssec on batcave01
-  - edit the do-domains file in this dir and add your domain to the
-    signed_domains entry at the top
-  - edit the zone you just created and add the contents of the .key files
-    to the bottom of the zone
-
-If this is a subdomain of fedoraproject.org:
-
-- run dnssec-dsfromkey on each of the .key files generated
-- paste that output into the bottom of fedoraproject.org.template
-- commit everything to the dns tree
-- push your changes
-- push your changes to the ansible repo
-- test
-
-If you add a new child zone, such as c.fedoraproject.org or
-vpn.fedoraproject.org, you will also need to add the contents of
-dsset-childzone.fedoraproject.org (for example) to the main
-fedoraproject.org zonefile, so that DNSSEC has a valid trust path to that
-zone.
-
-You also must set the NS delegation entries near the top of the
-fedoraproject.org zone file; these are necessary to keep dnssec-signzone
-from failing with this error msg::
-
-    dnssec-signzone: fatal: 'xxxxx.example.com': found DS RRset without NS RRset
-
-Look for the "vpn IN NS" records at the top of fedoraproject.org and copy
-them for the new child zone.
-
-
-fedorahosted.org template
-=========================
-We want to create a separate entry for each fedorahosted project - but we
-do not want to have to maintain it later.  So we have a simple map that
-lets us put the ones which are different in there and know where they
-should go.  The map's format is::
-
-    projectname  short_hostname-in-fedorahosted-where-it-lives
-
-examples::
-
-    someproject  git
-    someproject  svn
-    someproject  bzr
-    someproject  hosted-super-crazy
-
-This will create CNAMEs for each of them.
-
-Running ``./do-domains`` will take care of all that and update the serial
-automatically.
-
-
-GeoDNS
-======
-
-As part of our Content Distribution Network we use geodns for certain
-zones - at the moment just the ``fedoraproject.org`` and
-``*.fedoraproject.org`` zones.  We've got proxy servers all over the US
-and in Europe, and we are now sending users to proxy servers that are near
-them.  The current list of available 'zone areas' is:
-
-* DEFAULT
-* EU
-* NA
-
-DEFAULT contains all the zones.  So someone who does not seem to be in or
-near the EU or NA would get directed to any random set.  (South Africa,
-for example, doesn't get directed to any particular server.)
-
-.. important::
-   Don't forget to increase the serial number in the fedoraproject.org zone
-   file - even if you're making a change to one of the geodns IPs.  There
-   is only one serial number for all setups and that serial number is in
-   the fedoraproject.org zone.
-
-.. note:: Non geodns fedoraproject.org IPs
-
-   If you're adding a server that is just in one location, and isn't going
-   to get geodns balanced, just add that host to the fedoraproject.org
-   zone.
-
-Adding and removing countries
------------------------------
-
-Our setup actually requires us to specify which countries go to which
-servers.  To do this, simply edit the named.conf file in ansible.  Below
-is an example of what counts as "NA" (North America)::
-
-    view "NA" {
-        match-clients { US; CA; MX; };
-        recursion no;
-        zone "fedoraproject.org" {
-            type master;
-            file "master/NA/fedoraproject.org.signed";
-        };
-        include "etc/zones.conf";
-    };
-
-IP Country Mapping
-------------------
-
-The IP -> Location mapping is done via a config file that exists on the
-dns servers themselves (it's not ansible controlled).  The file, located
-at ``/var/named/chroot/etc/GeoIP.acl``, is generated by the ``GeoIP.sh``
-script (that script is in ansible).
-
-.. warning::
-   This is known to be a less efficient means of doing geodns than the
-   patched version from kernel.org.  We're using this version at the moment
-   because it's in Fedora and works.  The level of DNS traffic we see is
-   generally low enough that the inefficiencies aren't that noticeable.
-   For example, average load on the servers before this geodns was 0.2;
-   now it's around 0.4.
-
-resolv.conf
-===========
-
-In order to make the network more transparent to the admins, we do a lot
-of search-based relative names.  Below is a list of what a resolv.conf
-should look like.
-
-.. important::
-   Any machine that is not on our vpn or has not yet joined the vpn should
-   _NOT_ have the vpn.fedoraproject.org search until after it has been
-   added to the vpn (if it ever does)
-
-Phoenix
-    ::
-
-        search phx2.fedoraproject.org vpn.fedoraproject.org fedoraproject.org
-
-Phoenix in the QA network
-    ::
-
-        search qa.fedoraproject.org vpn.fedoraproject.org phx2.fedoraproject.org fedoraproject.org
-
-Non-Phoenix
-    ::
-
-        search vpn.fedoraproject.org fedoraproject.org
-
-The idea here is that we can, when need be, set up local domains to
-contact instead of having to go over the VPN directly, but still have sane
-configs.  For example, if we tell the proxy server to hit "app1" and that
-box is in PHX, it will go directly to app1; if it's not, it will go over
-the vpn to app1.
diff --git a/docs/sops/fas-notes.rst b/docs/sops/fas-notes.rst
deleted file mode 100644
index a50860c..0000000
--- a/docs/sops/fas-notes.rst
+++ /dev/null
@@ -1,130 +0,0 @@
-.. title: Fedora Account System SOP
-.. slug: infra-fas
-.. date: 2013-04-04
-.. taxonomy: Contributors/Infrastructure
-
-=====================
-Fedora Account System
-=====================
-
-Notes about FAS and how to do things in it:
-
-- where are the certs for fas accounts for koji, etc?  On fas01 in
-  /var/lib/fedora-ca - makefile targets allow you to do things with them.
-
-Look in index.txt for certs.  Ones marked with an 'R' in the left-most
-column are 'REVOKED'.
-
-To revoke a cert::
-
-    cd /var/lib/fedora-ca
-
-Find the cert number in index.txt - the number is the 3rd column in the
-file - you can match it to the user by searching for their username.  You
-want the highest numbered cert for their account.
-
-Once you have the number you would run (as root or fas)::
-
-    make revoke cert=newcerts/$that_number.pem
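-
-For example, to pull out the most recent certificate serial for an account
-before revoking it (an illustrative one-liner that relies on the index.txt
-layout described above, with the serial in the third column)::
-
-    cd /var/lib/fedora-ca
-    grep 'username' index.txt | awk '{ print $3 }' | tail -n 1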
-
-How to gather information about a user
-======================================
-
-You'll want to have direct access to query the database for this.  The
-common way is to have someone in sysadmin-db ssh to the postgres db
-hosting FAS (currently db01).  Then access it via ident auth on the box::
-
-    sudo -u postgres psql fas2
-
-There are several tables that will have information about a user.  Some of
-it is redundant, but it's good to check all the sources; there shouldn't
-be inconsistencies::
-
-    select * from people where username = 'USERNAME';
-
-Of interest here are:
-
-:id: for later queries
-:password_changed: tells when the password was last changed
-:last_seen: last login to fas (including through jsonfas from other TG1/2
-    apps.  Maybe wiki and insight as well.  Not fedorahosted trac, shell
-    login, etc)
-:status_change: last time that the user's status was updated via the
-    website.  Usually triggered when the user was marked inactive for a
-    mass password change and then they reset their password.
-
-The next table is the log table::
-
-    select * from log where author_id = ID_FROM_PREV_QUERY or description ~ '.*USERNAME.*';
-
-FAS writes certain events to the log table.  This will get those events.
-We use both the author_id field (who made the change) and the username in
-a description regex search because a few changes are made to users by
-admins.  Fields of interest are pretty self-explanatory here:
-
-:changetime: when the log was made
-:description: description of the event that's being logged
-
-.. note:: FAS does not log every event that happens to a user.  Only
-   "important" ones.  FAS also cannot record direct changes to the database
-   here (for instance, when we mark accounts inactive administratively via
-   the db).
-
-Lastly, there are the groups and person_roles tables.  When a user joins a
-group, the person_roles table is updated to reflect the user's status in
-the group, when they applied, and when they were approved::
-
-    select groups.name, person_roles.* from person_roles, groups where person_id = ID_FROM_INITIAL_QUERY and groups.id = person_roles.group_id;
-
-This will give you the following fields to pay attention to:
-
-:name: Name of the group
-:role_status: If this is unapproved, it just means the user applied for
-    it.  If it is approved, it means they are actually in the group.
-:creation: When the user applied to the group
-:approval: When the user was approved to be in the group
-:role_type: What role the person has or wants to have in the group
-:sponsor_id: If you suspect something is suspicious with one of the roles,
-    you may want to ask the sponsor if they remember sponsoring this
-    person
-
-Account Deletion and renaming
-=============================
-
-.. note:: see also accountdeletion.rst
-   For information on how to disable, rename, and remove accounts.
-
-Pseudo Users
-============
-
-.. note:: see also nonhumanaccounts.rst
-   For information on creating pseudo user accounts for use in
-   pkgdb/bugzilla
-
-fas staging
-===========
-
-We have a staging fas db set up on db-fas01.stg.phx2.fedoraproject.org; it
-is accessed by fas01.stg.phx2.fedoraproject.org.
-
-This system is not autopopulated by production fas - it must be done
-manually.
-To do this you must:
-
-- dump the fas2 db on db-fas01.phx2.fedoraproject.org::
-
-    sudo -u postgres pg_dump -C fas2 > fas2.dump
-    scp fas2.dump db-fas01.stg.phx2.fedoraproject.org:/tmp
-
-- then on fas01.stg.phx2.fedoraproject.org::
-
-    /etc/init.d/httpd stop
-
-- then on db02.stg.phx2.fedoraproject.org::
-
-    echo "drop database fas2\;" | sudo -u postgres psql ; cat fas2.dump | sudo -u postgres psql
-
-- then on fas01.stg.phx2.fedoraproject.org::
-
-    /etc/init.d/httpd start
-
-That should do it.
diff --git a/docs/sops/fas-openid.rst b/docs/sops/fas-openid.rst
deleted file mode 100644
index 419ddd5..0000000
--- a/docs/sops/fas-openid.rst
+++ /dev/null
@@ -1,52 +0,0 @@
-.. title: FAS-OpenID
-.. slug: infra-fas-openid
-.. date: 2013-12-14
-.. taxonomy: Contributors/Infrastructure
-
-==========
-FAS-OpenID
-==========
-
-
-FAS-OpenID is the OpenID server of Fedora Infrastructure.
-
-The live instance is at https://id.fedoraproject.org/
-The staging instance is at https://id.dev.fedoraproject.org/
-
-Contact Information
-===================
-
-Owner
-    Patrick Uiterwijk (puiterwijk)
-Contact
-    #fedora-admin, #fedora-apps, #fedora-noc
-Location
-    openid0{1,2}.phx2.fedoraproject.org
-    openid01.stg.fedoraproject.org
-Purpose
-    Authentication & Authorization
-
-Trusted roots
-=============
-
-FAS-OpenID has a set of "trusted roots", which contains websites which are
-always trusted, and thus FAS-OpenID will not show the Approve/Reject form
-to the user when they log in to any such site.
-
-As a policy, we will only add websites to this list which Fedora
-Infrastructure controls.  If anyone ever asks to add a website to this
-list, just answer with this default message::
-
-    We only add websites we (Fedora Infrastructure) maintain to this list.
-
-    This feature was put in because it wouldn't make sense to ask for permission
-    to send data to the same set of servers that it already came from.
-
-    Also, if we were to add external websites, we would need to judge their
-    privacy policy etc.
-
-    Also, people might start complaining that we added site X but not their site,
-    maybe causing us "political" issues later down the road.
-
-    As a result, we do NOT add external websites.
-
diff --git a/docs/sops/fedmsg-certs.rst b/docs/sops/fedmsg-certs.rst
deleted file mode 100644
index fe99a49..0000000
--- a/docs/sops/fedmsg-certs.rst
+++ /dev/null
@@ -1,172 +0,0 @@
-.. title: fedmsg Certificates SOP
-.. slug: infra-fedmsg-certs
-.. date: 2013-04-08
-.. taxonomy: Contributors/Infrastructure
-
-===================================================
-fedmsg (Fedora Messaging) Certs, Keys, and CA - SOP
-===================================================
-
-X509 certs, private RSA keys, Certificate Authority, and Certificate
-Revocation List.
-
-Contact Information
--------------------
-
-Owner
-    Messaging SIG, Fedora Infrastructure Team
-Contact
-    #fedora-admin, #fedora-apps, #fedora-noc
-Servers
-    - app0[1-7]
-    - packages0[1-2]
-    - fas0[1-3]
-    - pkgs01
-    - busgateway01
-    - value0{1,3}
-    - releng0{1,4}
-    - relepel03
-Purpose
-    Certify fedmsg messages come from authentic sources.
-
-Description
------------
-
-fedmsg sends JSON-encoded messages from many services to a zeromq messaging
-bus.  We're not concerned with encrypting the messages, only with signing
-them so an attacker cannot spoof.
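-
-Conceptually, a signed message is just the normal JSON payload with two
-extra base64-encoded fields bolted on (field names per the fedmsg crypto
-docs linked below; values abbreviated here)::
-
-    {
-      "topic": "org.fedoraproject.prod.bodhi.update.comment",
-      "msg": {"...": "..."},
-      "signature": "<base64 RSA signature over the message>",
-      "certificate": "<base64 X509 cert of the sender>"
-    }
-
-Consumers that validate signatures check the signature against the
-embedded certificate, and the certificate against our CA and CRL.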
-
-Every instance of each service on each host has its own cert and private
-key, signed by the CA.  By convention, we name the certs
-``<service>-<fqdn>.{crt,key}``.  For instance, bodhi has the following
-certs:
-
-- bodhi-app01.phx2.fedoraproject.org
-- bodhi-app02.phx2.fedoraproject.org
-- bodhi-app03.phx2.fedoraproject.org
-- bodhi-app01.stg.phx2.fedoraproject.org
-- bodhi-app02.stg.phx2.fedoraproject.org
-- more
-
-Scripts to generate new keys, sign them, and revoke them live in the
-ansible repo in ``ansible/roles/fedmsg/files/cert-tools/``.  The keys and
-certs themselves (including ca.crt and the CRL) live in the private repo
-in ``private/fedmsg-certs/keys/``.
-
-fedmsg is locally configured to find the key it needs by looking in
-``/etc/fedmsg.d/ssl.py``, which is kept in ansible in
-``ansible/roles/fedmsg/templates/fedmsg.d/ssl.py.erb``.
-
-Each service-host has its own key.  This means:
-
-- A key is not shared across multiple instances of a service on
-  different machines.  i.e., bodhi on app01 and bodhi on app02 should have
-  different key/cert pairs.
-
-- A key is not shared across multiple services on a host.  i.e., mediawiki
-  on app01 and bodhi on app01 should have different key/cert pairs.
-
-The attempt here is to minimize the number of potential attack vectors.
-Each private key should be readable only by the service that needs it.
-bodhi runs under mod_wsgi in apache and should run as its own unique bodhi
-user (not as apache).  The permissions for its private key, when deployed
-by ansible, should be read-only for that local bodhi user.
-
-For more information on how fedmsg uses these certs see
-http://fedmsg.readthedocs.org/en/latest/crypto.html
-
-
-Configuring the Scripts
------------------------
-
-Usage of the main scripts is described in more detail below.  They are
-located in ``ansible/roles/fedmsg/files/cert-tools``.
-
-Before you use them, you'll need to point them at the right directory to
-modify.  By default, this is ``~/private/fedmsg-certs/keys/``.  You
-can change that by editing ``ansible/roles/fedmsg/files/cert-tools/vars``
-in the event that you have the private repo checked out to an alternate
-location.
-
-There are other configuration values defined in that script.  Most will
-not need to be changed.
-
-Wiping and Rebuilding Everything
---------------------------------
-
-There is a script in ``ansible/roles/fedmsg/files/cert-tools/`` named
-``rebuild-all-fedmsg-certs``.  You can run it with no arguments to wipe
-out the old and generate a new CA root certificate, a signing cert and
-key, and all key/cert pairs for all service-hosts.
-
-.. note:: Warning -- Obviously, this will wipe everything.  Do you want
-   that?
-
-Adding a new key for a new service-host
----------------------------------------
-
-First, check out the ansible private repo, as that's where the keys are
-going to be stored.  The scripts will assume this is checked out to
-~/private.
-
-In ``ansible/roles/fedmsg/files/cert-tools`` run::
-
-    $ source ./vars
-    $ ./build-and-sign-key <service>-<fqdn>
-
-For instance, if we bring up a new app host, app10.phx2.fedoraproject.org,
-we'll need to generate a new cert/key pair for each fedmsg-enabled service
-that will be running on it, so you'd run::
-
-    $ source ./vars
-    $ ./build-and-sign-key shell-app10.phx2.fedoraproject.org
-    $ ./build-and-sign-key bodhi-app10.phx2.fedoraproject.org
-    $ ./build-and-sign-key mediawiki-app10.phx2.fedoraproject.org
-
-Just creating the keys isn't quite enough; there are four more things
-you'll need to do.
-
-First, the private keys are created in your checkout of the private repo
-under ``~/private/private/fedmsg-certs/keys``.  There will be four files
-for each cert you created: ``<serial>.pem`` (ex: 5B.pem) and
-``<service>-<fqdn>.{crt,csr,key}``.  git add, commit, and push all of
-those.
-
-Second, you need to edit
-``ansible/roles/fedmsg/files/cert-tools/rebuild-all-fedmsg-certs``
-and add the arguments of the commands you just ran, so that the next time
-certs need to be blown away and recreated, the new service-hosts will be
-included.  For the examples above, you would need to add to the list::
-
-    shell-app10.phx2.fedoraproject.org
-    bodhi-app10.phx2.fedoraproject.org
-    mediawiki-app10.phx2.fedoraproject.org
-
-Third, you need to ensure that the keys are distributed to the host with
-the proper permissions.  Only the bodhi user should be able to access
-bodhi's private key.  This can be accomplished by using the
-``fedmsg::certificate`` in ansible.  It should distribute your new keys to
-the correct hosts and correctly permission them.
-
-Lastly, if you haven't already updated the global fedmsg config, you'll
-need to.  You need to add your new service-node to ``fedmsg.d/endpoints.py``
-and to ``fedmsg.d/ssl.py``.  Those can be found in
-``ansible/roles/fedmsg/templates/fedmsg.d``.  See
-http://fedmsg.readthedocs.org/en/latest/config.html for more information
-on the layout and meaning of those files.
-
-Revoking a key
---------------
-
-In ``ansible/roles/fedmsg/files/cert-tools`` run::
-
-    $ source ./vars
-    $ ./revoke-full <service>-<fqdn>
-
-This will alter ``private/fedmsg-certs/keys/crl.pem``, which should be
-picked up and served publicly, and then consumed by all fedmsg consumers
-globally.
-
-``crl.pem`` is publicly available at http://fedoraproject.org/fedmsg/crl.pem
-
-.. note:: Even though crl.pem lives in the private repo, we're just keeping
-   it there for convenience.  It really *should* be served publicly,
-   so don't panic. :)
-
-.. note:: At the time of this writing, the CRL is not actually used.  I need
-   one publicly available first so we can test it out.
diff --git a/docs/sops/fedmsg-gateway.rst b/docs/sops/fedmsg-gateway.rst
deleted file mode 100644
index e66e891..0000000
--- a/docs/sops/fedmsg-gateway.rst
+++ /dev/null
@@ -1,109 +0,0 @@
-.. title: fedmsg-gateway SOP
-.. slug: infra-fedmsg-gateway
-.. date: 2012-10-31
-.. taxonomy: Contributors/Infrastructure
-
-==================
-fedmsg-gateway SOP
-==================
-
-Outgoing raw ZeroMQ message stream.
-
-.. note:: see also: fedmsg-websocket
-
-Contact Information
-===================
-
-Owner:
-    Messaging SIG, Fedora Infrastructure Team
-Contact:
-    #fedora-apps, #fedora-admin, #fedora-noc
-Servers:
-    busgateway01, proxy0*
-Purpose:
-    Expose raw ZeroMQ messages outside the FI environment.
-
-Description
-===========
-
-Users outside of Fedora Infrastructure can listen to the production
-message bus by connecting to specific addresses.  This is required for
-local users to run their own hubs and message processors ("Consumers").
-It is also required for user-facing tools like fedmsg-notify to work.
-
-The specific public endpoints are:
-
-production
-    tcp://hub.fedoraproject.org:9940
-staging
-    tcp://stg.fedoraproject.org:9940
-
-fedmsg-gateway, the daemon running on busgateway01, is listening to the FI
-production fedmsg bus and will relay every message that it receives out to
-a special ZMQ pub endpoint bound to port 9940.  haproxy mediates
-connections to the fedmsg-gateway daemon.
-
-Connection Flow
-===============
-
-Clients connecting through haproxy on proxy0*:9940 are redirected to
-busgateway0*:9940.  This can be found in the haproxy.cfg entry for
-``listen fedmsg-raw-zmq 0.0.0.0:9940``.
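-
-For reference, such an entry can be as simple as the following sketch
-(illustrative only, not the production config; the directives are standard
-haproxy TCP-mode options)::
-
-    listen fedmsg-raw-zmq 0.0.0.0:9940
-        mode tcp
-        balance roundrobin
-        server busgateway01 busgateway01:9940 check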
- -This is different than the apache reverse proxy pass setup we have for the -app0* and packages0* machines. *That* flow looks something like this:: - - Client -> apache(proxy01) -> haproxy(proxy01) -> apache(app01) - -The flow for the raw zmq stream provided by fedmsg-gateway looks something -like this:: - - Client -> haproxy(proxy01) -> fedmsg-gateway(busgateway01) - -haproxy is listening on a public port. - -At the time of this writing, haproxy does not actually load balance zeromq -session requests across multiple busgateway0* machines, but there is nothing -stopping us from adding them. New hosts can be added in ansible and pressed -from busgateway01's template. Add them to the fedmsg-raw-zmq listen in -haproxy's config and it should Just Work. - -Increasing the Maximum Number of Concurrent Connections -======================================================= - -HTTP requests are typically very short (a few seconds at most). This -means that the number of concurrent tcp connections we require for most -of our services is quite low (1024 is overkill). ZeroMQ tcp connections, -on the other hand, are expected to live for quite a long time. -Consequently we needed to scale up the number of possible concurrent tcp -connections. - -All of this is in ansible and should be handled for us automatically if we -bring up new nodes. - -- The pam_limits user limit for the fedmsg user was increased from - 1024 to 160000 on busgateway01. -- The pam_limits user limit for the haproxy user was increased from - 1024 to 160000 on the proxy0* machines. -- The zeromq High Water Mark (HWM) was increased to 160000 on - busgateway01. -- The maximum number of connections allowed was increased in haproxy.cfg. - -Nagios -====== - -New nagios checks were added for this that check to see if the number of -concurrent connections through haproxy is approaching the maximum number -allowed. - -You can check these numbers by hand by inspecting the haproxy web interface: -https://admin.fedoraproject.org/haproxy/proxy1#fedmsg-raw-zmq - -Look at the "Sessions" section. "Cur" is the current number of sessions -versus "Max", the maximum number seen at the same time and "Limit", the -maximum number of concurrent connections allowed. - -RHIT -==== - -We had RHIT open up port 9940 special to proxy01.phx2 for this. diff --git a/docs/sops/fedmsg-introduction.rst b/docs/sops/fedmsg-introduction.rst deleted file mode 100644 index d8bb773..0000000 --- a/docs/sops/fedmsg-introduction.rst +++ /dev/null @@ -1,63 +0,0 @@ -.. title: fedmsg Intro SOP -.. slug: infra-fedmsg-intro -.. date: 2012-10-31 -.. taxonomy: Contributors/Infrastructure - -=================================== -fedmsg introduction and basics, SOP -=================================== - -General information about fedmsg - -Contact Information -------------------- - -Owner - Messaging SIG, Fedora Infrastructure Team -Contact - #fedora-apps, #fedora-admin, #fedora-noc -Servers - Almost all of them. -Purpose - Introduce sysadmins to fedmsg tools and config - -Description ------------ - -fedmsg is a system that links together most of our webapps and services into -a message mesh or net (often called a "bus"). It is built on top of the -zeromq messaging library. - -fedmsg has its own developer documentation that is a good place to check if -this or other SOPs don't provide enough information - http://fedmsg.rtfd.org - -Tools ------ - -Generally, fedmsg-tail and fedmsg-logger are the two most commonly used -tools for debugging and testing. 
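-
-fedmsg-logger can also publish structured payloads on an arbitrary topic.
-A minimal sketch (the modname/topic here are made-up examples; see
-``fedmsg-logger --help`` for the full flag list)::
-
-    $ echo '{"temperature": 42}' | fedmsg-logger --json-input \
-        --modname=test --topic=demo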
To see if bus-connectivity exists between
-two machines, log onto each of them and run the following on the first::
-
-  $ echo testing from $(hostname) | fedmsg-logger
-
-And run the following on the second::
-
-  $ fedmsg-tail --really-pretty
-
-Configuration
--------------
-
-fedmsg configuration lives in /etc/fedmsg.d/
-
-``/etc/fedmsg.d/endpoints.py`` keeps the list of every possible fedmsg endpoint.
-It acts as a global index that defines the bus.
-
-See fedmsg.readthedocs.org/en/latest/config/ for a full glossary of
-configuration values.
-
-Logs
-----
-
-fedmsg daemons keep their logs in /var/log/fedmsg. fedmsg message hooks in
-existing apps (like bodhi) will log any errors to the logs of the app
-they've been added to (like /var/log/httpd/error_log).
diff --git a/docs/sops/fedmsg-irc.rst b/docs/sops/fedmsg-irc.rst
deleted file mode 100644
index e29c1d5..0000000
--- a/docs/sops/fedmsg-irc.rst
+++ /dev/null
@@ -1,35 +0,0 @@
-.. title: fedmsg IRC SOP
-.. slug: infra-fedmsg-irc
-.. date: 2014-02-13
-.. taxonomy: Contributors/Infrastructure
-
-==============
-fedmsg-irc SOP
-==============
-
- Echo fedmsg bus activity to IRC.
-
-Contact Information
--------------------
-
-Owner
-    Messaging SIG, Fedora Infrastructure Team
-Contact
-    #fedora-apps, #fedora-fedmsg, #fedora-admin, #fedora-noc
-Servers
-    value03
-Purpose
-    Echo fedmsg bus activity to IRC
-
-Description
------------
-
-fedmsg-irc is a daemon running on value03 and value01.stg. It is listening
-to the fedmsg bus and echoing that activity to the #fedora-fedmsg channel in
-IRC.
-
-It can be configured to ignore certain messages, join certain rooms, and
-take on a different nick by editing the values in ``/etc/fedmsg.d/irc.py`` and
-restarting it with ``sudo service fedmsg-irc restart``.
-
-See http://fedmsg.readthedocs.org/en/latest/config/#term-irc for more
-information on configuration.
diff --git a/docs/sops/fedmsg-new-message-type.rst b/docs/sops/fedmsg-new-message-type.rst
deleted file mode 100644
index bc7511a..0000000
--- a/docs/sops/fedmsg-new-message-type.rst
+++ /dev/null
@@ -1,78 +0,0 @@
-.. title: Adding a new fedmsg message type
-.. slug: fedmsg-new-message-type
-.. date: 2016-05-27
-
-================================
-Adding a new fedmsg message type
-================================
-
-
-Instrumenting the program
--------------------------
-First, figure out how you're going to publish the message. Is it from a
-shell script or from a long running process?
-
-If it's from a shell script, you just need to add a `fedmsg-logger`
-statement to the script. Remember to set the `--modname` and `--topic` for
-your new message's fully-qualified topic.
-
-If it's from a python process, you just need to add a ``fedmsg.publish(..)``
-call. The same concerns about modname and topic apply here.
-
-If this is a short-lived python process, you'll want to add `active=True` to the
-call to ``fedmsg.publish(..)``. This will make the fedmsg lib "actively" reach
-out to our fedmsg-relay running on busgateway01.
-
-If it is a long-running python process (like a WSGI thread), then you don't need
-to pass any extra arguments. You don't want it to reach out to the fedmsg-relay
-if possible. Your process will require that some "endpoints" are created for it
-in ``/etc/fedmsg.d/``. More on that below.
-
-Supporting infrastructure
--------------------------
-
-
-You need to make sure that the machine this is running on has a cert and key
-that can be read by the program to sign its message.
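-
-To check whether a host already has a usable cert, look at where the global
-config points and list that directory. A minimal sketch --
-``/etc/pki/fedmsg`` is an assumption here; the ``ssldir`` value in
-``ssl.py`` is authoritative::
-
-    $ grep ssldir /etc/fedmsg.d/ssl.py
-    $ ls /etc/pki/fedmsg/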
If you don't have a cert
-already, then you need to create it in the private repo. Ask a sysadmin-main
-member.
-
-Then you need to declare those certs in the `fedmsg_certs` data structure
-stored typically in our ansible ``group_vars/`` for this service. Declare
-the name of the cert, what group and user it should be owned by, and, in
-the ``can_send:`` section, the list of topics that this cert should be
-allowed to publish.
-
-If this is a long-running python process that is *not* passing `active=True` to
-the call to `fedmsg.publish(..)`, then you have to also declare endpoints for
-it. You do that by specifying the ``fedmsg_wsgi_procs`` and
-``fedmsg_wsgi_vars`` in the ``group_vars`` for your service. The iptables rules
-and fedmsg endpoints should be automatically created for you on the next
-playbook run.
-
-Supporting code
----------------
-
-At this point, you can push the change out to production and be publishing
-messages "okay". Everything should be fine.
-
-However, your message will show up blank in datagrepper, in IRC, and in FMN, and
-everywhere else we try to render it. You *must* then follow up and write a new
-`Processor` for it in the fedmsg_meta library we maintain:
-https://github.com/fedora-infra/fedmsg_meta_fedora_infrastructure
-
-You also *must* write a test case for it there. The docs listing all topics
-we publish at http://fedora-fedmsg.rtfd.org/ are automatically generated
-from the test suite. Please don't forget this.
-
-Lastly, you should cut a release of fedmsg_meta and deploy it using the
-`playbooks/manual/upgrade/fedmsg.yml` playbook, which should update all the
-relevant hosts.
-
-Corner cases
-------------
-
-If the process publishing the new message lives *outside* our main network,
-you have to jump through more hoops. Look at abrt, koschei, and copr for
-examples of how to configure this (you need a special firewall rule, and
-they need to be configured to talk to our "inbound gateway" running on the
-proxies).
diff --git a/docs/sops/fedmsg-relay.rst b/docs/sops/fedmsg-relay.rst
deleted file mode 100644
index 2903794..0000000
--- a/docs/sops/fedmsg-relay.rst
+++ /dev/null
@@ -1,66 +0,0 @@
-.. title: fedmsg-relay SOP
-.. slug: infra-fedmsg-relay
-.. date: 2012-10-31
-.. taxonomy: Contributors/Infrastructure
-
-================
-fedmsg-relay SOP
-================
-
-Bridge ephemeral scripts into the fedmsg bus.
-
-Contact Information
--------------------
-
-Owner
-    Messaging SIG, Fedora Infrastructure Team
-Contact
-    #fedora-apps, #fedora-admin, #fedora-noc
-Servers
-    app01
-Purpose
-    Bridge ephemeral bash and python scripts into the fedmsg bus.
-
-Description
------------
-
-fedmsg-relay is running on app01, which is a bad choice. We should look to
-move it to a more isolated place in the future. busgateway01 would be a
-better choice.
-
-"Ephemeral" scripts like ``pkgdb2branch.py``, the post-receive git hook on
-pkgs01, and anywhere fedmsg-logger is used all depend on fedmsg-relay.
-Instead of emitting messages "directly" to the rest of the bus, they use
-fedmsg-relay as an intermediary.
-
-Check that fedmsg-relay is running by looking for it in the process list.
-You can restart it in the standard way with ``sudo service fedmsg-relay
-restart``. Check for its logs in ``/var/log/fedmsg/fedmsg-relay.log``
-
-Ephemeral scripts know where the fedmsg-relay is by looking for the
-relay_inbound and relay_outbound values in the global fedmsg config.
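-
-On any configured host, a quick grep shows where those values point (a
-sketch using standard tools only)::
-
-    $ grep -rn "relay_inbound\|relay_outbound" /etc/fedmsg.d/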
But What is it Doing? And Why?
-------------------------------
-
-The fedmsg bus is designed to be "passive" in its normal operation. A
-mod_wsgi process under httpd sets up its fedmsg publisher socket to
-passively emit messages on a certain port. When some other service wants
-to receive these messages, it is up to that service to know where mod_wsgi
-is emitting and to actively connect there. In this way, emitting is passive
-and listening is active.
-
-We get a problem when we have a one-off or "ephemeral" script that is not a
-long-running process -- a script like pkgdb2branch which is run when a user
-runs it and which ends shortly after. Listeners who want these scripts'
-messages will find that they are usually not available when they try to
-connect.
-
-To solve this problem, we introduced the "fedmsg-relay" daemon which is a
-kind of "passive"-to-"passive" adaptor. It binds to an outbound port on one
-end where it will publish messages (like normal) but it also binds to
-another port where it listens passively for inbound messages. Ephemeral
-scripts then actively connect to the passive inbound port of the
-fedmsg-relay to have their payloads echoed on the bus-proper.
-
-See http://fedmsg.readthedocs.org/en/latest/topology/ for a diagram.
diff --git a/docs/sops/fedmsg-websocket.rst b/docs/sops/fedmsg-websocket.rst
deleted file mode 100644
index 8dc10cd..0000000
--- a/docs/sops/fedmsg-websocket.rst
+++ /dev/null
@@ -1,76 +0,0 @@
-.. title: websocket SOP
-.. slug: infra-websocket
-.. date: 2012-10-31
-.. taxonomy: Contributors/Infrastructure
-
-=============
-websocket SOP
-=============
-
-websocket communication with Fedora apps.
-
-see-also: ``fedmsg-gateway.txt``
-
-Contact Information
--------------------
-
-Owner
-    Messaging SIG, Fedora Infrastructure Team
-Contact
-    #fedora-apps, #fedora-admin, #fedora-noc
-Servers
-    busgateway01, proxy0*, app0*
-Purpose
-    Expose a websocket server for FI apps to use
-
-Description
------------
-
-WebSocket is a protocol (an extension of HTTP/1.1) by which client web
-browsers can establish full-duplex socket communications with a server --
-the "real-time web".
-
-In our case, webapps served from app0* and packages0* will include
-javascript code instructing client browsers to establish a second connection
-to our WebSocket server. They point browsers to the following addresses:
-
-production
-    wss://hub.fedoraproject.org:9939
-staging
-    wss://stg.fedoraproject.org:9939
-
-The websocket server itself is a fedmsg-hub daemon running on busgateway01.
-It is configured to enable its websocket server component in the presence of
-certain configuration values.
-
-haproxy mediates connections to the fedmsg-hub websocket server daemon.
-An stunnel daemon provides SSL support.
-
-Connection Flow
----------------
-
-The connection flow is much the same as in the fedmsg-gateway.txt SOP, but
-is somewhat more complicated.
-
-"Normal" HTTP requests to our app servers traverse the following chain::
-
-    Client -> apache(proxy01) -> haproxy(proxy01) -> apache(app01)
-
-The flow for a websocket request looks something like this::
-
-    Client -> stunnel(proxy01) -> haproxy(proxy01) -> fedmsg-hub(busgateway01)
-
-stunnel is listening on a public port, negotiates the SSL connection, and
-redirects the connection to haproxy, which in turn hands it off to
-the fedmsg-hub websocket server listening on busgateway01.
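-
-Since stunnel terminates SSL on the proxies, a low-level way to check the
-public endpoint from outside is a plain TLS handshake. This sketch only
-exercises the stunnel hop, not the websocket protocol itself::
-
-    $ openssl s_client -connect hub.fedoraproject.org:9939 </dev/null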
-
-At the time of this writing, haproxy does not actually load balance zeromq
-session requests across multiple busgateway0* machines, but there is nothing
-stopping us from adding them. New hosts can be added in ansible and pressed
-from busgateway01's template. Add them to the fedmsg-websockets listen in
-haproxy's config and it should Just Work.
-
-RHIT
-----
-
-We had RHIT open up port 9939 special to proxy01.phx2 for this.
diff --git a/docs/sops/fedocal.rst b/docs/sops/fedocal.rst
deleted file mode 100644
index be7403c..0000000
--- a/docs/sops/fedocal.rst
+++ /dev/null
@@ -1,39 +0,0 @@
-.. title: Fedocal SOP
-.. slug: infra-fedocal
-.. date: 2016-01-04
-.. taxonomy: Contributors/Infrastructure
-
-======================
-Fedocal SOP
-======================
-
-Fedocal is a web-based group calendar application that is made available to
-the various groups within the Fedora project.
-
-Contents
-========
-
-1. Contact Information
-2. Documentation Links
-
-Contact Information
-===================
-
-Owner
-    Fedora Infrastructure Team
-Contact
-    #fedora-admin
-Location
-    https://apps.fedoraproject.org/calendar
-Servers
-
-Purpose
-    To provide links to the documentation for fedocal, which lives elsewhere
-    on the internet; a link document is a better use of resources than
-    rewriting the book.
-
-Documentation Links
-===================
-
-For information on the latest and greatest in fedocal please review:
-http://fedocal.readthedocs.org/en/latest/
-
-For documentation on the usage of fedocal please consult:
-http://fedocal.readthedocs.org/en/latest/usage.html
-
-
diff --git a/docs/sops/fedora-releases.rst b/docs/sops/fedora-releases.rst
deleted file mode 100644
index e911133..0000000
--- a/docs/sops/fedora-releases.rst
+++ /dev/null
@@ -1,399 +0,0 @@
-.. title: Fedora Release Infrastructure SOP
-.. slug: infra-releng
-.. date: 2015-03-10
-.. taxonomy: Contributors/Infrastructure
-
-=================================
-Fedora Release Infrastructure SOP
-=================================
-
-This SOP contains all of the steps required by the Fedora Infrastructure
-team in order to get a release out. Much of this work overlaps with the
-Release Engineering team (which at present shares many of the same
-members). Some work may get done by releng, some may get done by
-Infrastructure; as long as it gets done, it doesn't matter.
-
-Contact Information
-===================
-
-Owner:
-    Fedora Infrastructure Team, Fedora Release Engineering Team
-Contact:
-    #fedora-admin, #fedora-releng, sysadmin-main, sysadmin-releng
-Location:
-    N/A
-Servers:
-    All
-Purpose:
-    Releasing a new version of Fedora
-
-Preparations
-============
-
-Before a release ships, the following items need to be completed.
-
-1. New website from the websites team (typically hosted at
-   http://getfedora.org/_/)
-
-2. Verify mirror space (for all test releases as well)
-
-3. Verify with rel-eng that permissions on content are right on the
-   mirrors. Don't leak.
-
-4. Communication with Red Hat IS (Give at least 2 months notice, then
-   reminders as the time comes near) (final release only)
-
-5. Infrastructure change freeze
-
-6. Modify Template:FedoraVersion to reference new version. (Final release only)
-
-7. Move old releases to archive (post final release only)
-
-8. Switch release from development/N to normal releases/N/ tree in mirror
-   manager (post final release only)
-
-Change Freeze
-=============
-
-The rules are simple:
-
-* Hosts with the ansible variable "freezes" set to "True" are frozen.
-
-* You may make changes as normal on hosts that are not frozen.
-  (For example, staging is never frozen.)
-
-* Changes to frozen hosts require a freeze break request sent to
-  the fedora infrastructure list, containing a description of the
-  problem or issue, actions to be taken and (if possible) patches
-  to ansible that will be applied. These freeze breaks must then get
-  two approvals from sysadmin-main or sysadmin-releng group members
-  before being applied.
-
-* Changes to recover from outages are acceptable to frozen hosts if needed.
-
-Change freezes will be sent to the fedora-infrastructure-list and begin 2
-weeks before each release and the final release. The freeze will end one
-day after the release. Note, if the release slips during a change freeze,
-the freeze just extends until the day after a release ships.
-
-You can get a list of frozen/non-frozen hosts by::
-
-    git clone https://infrastructure.fedoraproject.org/infra/ansible.git
-    scripts/freezelist -i inventory/inventory
-
-Notes about release day
-=======================
-
-Release day is always an interesting and unique event. After the final
-sprint from test to the final release, a lot of the developers will be
-looking forward to a bit of time away, as well as some sleep. Once Release
-Engineering has built the final tree and synced it to the mirrors, it is
-our job to make sure everything else (except the bit flip) gets done as
-painlessly and easily as possible.
-
-.. note:: All communication is typically done in #fedora-admin. Typically these
-   channels are laid back and staying on topic isn't strictly enforced. On
-   release day this is not true. We encourage people to come, stay in the
-   room and be quiet unless they have a specific task or question related to
-   release day. It's nothing personal, but release day can get out of hand
-   quick.
-
-During normal load, our websites function as normal. This is especially
-true since we've moved the wiki to mod_fcgi. On release day our load
-spikes a great deal. During the Fedora 6 launch many services were offline
-for hours. Some (like the docs) were off for days. A large part of this
-outage was due to the wiki not being able to handle the load, part was a
-lack of planning by the Infrastructure team, and part is still a mystery.
-(There are questions as to whether or not all of the traffic was legit or
-a DDoS.)
-
-The Fedora 7 release went much better. Some services were offline for
-minutes at a time but very little of it was out longer than that. The wiki
-crashed, as it always does. We had made sure to make the fedoraproject.org
-landing page static though. This helped a great deal, though we did see
-spiky load on the proxy boxes.
-
-Recent releases have been quite smooth due to a number of changes: we have
-a good deal more bandwidth on the master mirrors, more CPUs and memory,
-and prerelease versions are much easier to come by for those interested
-before release day.
-
-Day Prior to Release Day
-========================
-
-Step 1 (Torrent)
-----------------
-Set up the torrent. All files can be synced with the torrent box
-but just not published to the world. Verify with sha1sum. Follow the
-instructions on the torrentrelease.txt sop up to and including step 4.
-
-Step 2 (Website)
-----------------
-
-Verify the website design / content has been finalized with the websites
-team. Update the Fedora version number wiki template if this is a final
-release.
It will need to be changed in https://fedoraproject.org/wiki/Template:CurrentFedoraVersion - -Additionally, there are redirects in the ansible -playbooks/include/proxies-redirects.yml file for Cloud -Images. These should be pushed as soon as the content is available. -See: https://fedorahosted.org/fedora-infrastructure/ticket/3866 for example - -Step 3 (Mirrors) ----------------- - -Verify enough mirrors are setup and have Fedora ready for release. If for -some reason something is broken it needs to be fixed. Many of the mirrors -are running a check-in script. This lets us know who has Fedora without -having to scan everyone. Hide the Alpha, Beta, and Preview releases from -the publiclist page. - -You can check this by looking at:: - - wget "http://mirrors.fedoraproject.org/mirrorlist?path=pub/fedora/linux/releases/test/20-Alpha&country=global" - - (replace 20 and Alpha with the version and release.) - -Release day -=========== - -Step 1 (Prep and wait) ----------------------- - -Verify the mirrors are ready and that the torrent has valid copies of its -files (use sha1sum) - -Do not move on to step two until the Release Engineering team has given -the ok for the release. It is the releng team's decision as to whether or -not we release and they may pull the plug at any moment. - -Step 2 (Torrent) ----------------- - -Once given the ok to release, the Infrastructure team should publish the -torrent and encourage people to seed. Complete the steps on the -http://infrastructure.fedoraproject.org/infra/docs/torrentrelease.txt -after step 4. - -Step 3 (Bit flip) ------------------ - -The mirrors sit and wait for a single permissions bit to be altered so -that they show up to their services. The bit flip (done by the releng -team) will replicate out to the mirrors. Verify that the mirrors have -received the change by seeing if it is actually available, just use a spot -check. Once that is complete move on. - -Step 4 (Taskotron) (final release only) ---------------------------------------- - -Please file a Taskotron ticket and ask for the new release support to be -added (log in to Phabricator using your FAS_account@fedoraproject.org email -address) -https://phab.qadevel.cloud.fedoraproject.org/maniphest/task/edit/form/default/?title=new%20Fedora%20release&priority=80&tags=libtaskotron - -Step 5 (Website) ----------------- - -Once all of the distribution pieces are verified (mirrors and torrent), -all that is left is to publish the website. At present this is done by -making sure the master branch of fedora-web is pulled by the syncStatic.sh -script in ansible. It will sync in an hour normally but on release day -people don't like to wait that long so do the following on sundries01 - - sudo -u apache /usr/local/bin/lock-wrapper syncStatic 'sh -x /usr/local/bin/syncStatic' - -Once that completes, on batcave01:: - - sudo -i ansible proxy\* "/usr/bin/rsync --delete -a --no-owner --no-group bapp02::getfedora.org/ /srv/web/getfedora.org/" - -Verify http://getfedora.org/ is working. - -Step 6 (Docs) -------------- - -Just as with the website, the docs site needs to be published. Just as -above follow the following steps:: - - /root/bin/docs-sync - -Step 7 (Monitor) ----------------- - -Once the website is live, keep an eye on various news sites for the -release announcement. Closely watch the load on all of the boxes, proxy, -application and otherwise. If something is getting overloaded, see -suggestions on this page in the "Juggling Resources" section. 
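-
-One low-tech way to keep an eye on a box during the rush is to loop over
-the standard tools from a shell (a sketch, not part of the official
-tooling; adjust the interval to taste)::
-
-    $ watch -n 10 'uptime; ss -s'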
-
-Step 8 (Badges) (final release only)
-------------------------------------
-
-We have some badge rules that are dependent on which release of Fedora
-we're on. As you have time, please perform the following on your local
-box::
-
-    $ git clone ssh://git.fedorahosted.org/git/badges.git
-    $ cd badges
-
-Edit ``rules/tester-it-still-works.yml`` and update the release tag to match
-the now old but stable release. For instance, if we just released fc21,
-then the tag in that badge rule should be fc20.
-
-Edit ``rules/tester-you-can-pry-it-from-my-cold-dead-hands.yml`` and update
-the release tag to match the release that is about to reach EOL. For
-instance, if we just released fc21, then the tag in that badge rule
-should be fc19. Commit the changes::
-
-    $ git commit -a -m 'Updated tester badge rule for f21 release.'
-    $ git push origin master
-
-Then, on batcave, perform the following::
-
-    $ sudo -i ansible-playbook $(pwd)/playbooks/manual/push-badges.yml
-
-Step 9 (Done)
--------------
-
-Just chill, keep an eye on everything and make changes as needed. If you
-can't keep a service up, try to redirect randomly to some of the mirrors.
-
-Priorities
-==========
-
-Priorities during release day (in order):
-
-1. Website - Anything related to a user landing at fedoraproject.org and
-   clicking through to a mirror or torrent to download something must be
-   kept up. This is distribution, and without it we can potentially lose
-   many users.
-
-2. Linked addresses - We do not have direct control over what Digg,
-   Slashdot or anyone else links to. If they link to something on the
-   wiki that is going down, or to any other site we control, a rewrite
-   should be put in place to direct them to
-   http://fedoraproject.org/get-fedora.
-
-3. Torrent - The torrent server has never had problems during a release.
-   Make sure it is up.
-
-4. Release Notes - Typically grouped with the docs site, the release
-   notes are often linked to (this is fine, no need to redirect), but keep
-   an eye on the logs and ensure that the release notes can be found where
-   we've said they are. In previous releases we sometimes had to make them
-   available in more than one spot.
-
-5. docs.fedoraproject.org - People will want to see what's new in Fedora
-   and get further documentation about it. Much of this is in the release
-   notes.
-
-6. wiki - Because it is so resource heavy, and because it is so developer
-   oriented, we have no choice but to give the wiki a lower priority.
-
-7. Everything else.
-
-Juggling Resources
-==================
-
-In our environment we're running different things on many different
-servers. Using Xen we can easily give machines more or less RAM and
-processors. We can take down builders and bring up application servers.
-The trick is to be smart and make sure you understand what is causing the
-problem. These are some tips to keep in mind:
-
-* IPTables based bandwidth and connection limiting (successful in the
-  past)
-
-* Altering the weight on the proxy balancers
-
-* Create static pages out of otherwise dynamic content
-
-* Redirect pages to a mirror
-
-* Add a server / remove un-needed servers
-
-CHECKLISTS:
-===========
-
-Alpha:
-------
-
-* Announce infrastructure freeze 2 weeks before Alpha
-* Change /topic in #fedora-admin
-* mail the infrastructure list a reminder.
-* File all tickets
-* new website, check mirror permissions, mirrormanager, check
-  mirror sizes, release day ticket.
-
-After release is a "go":
-
-* Make sure torrents are set up and ready to go.
-* fedora-web needs a branch for fN-alpha. In it:
-  * Alpha used on get-prerelease
-  * get-prerelease doesn't direct to release
-  * verify is updated with Alpha info
-  * releases.txt gets a branched entry for preupgrade
-  * bfo gets updated to have an Alpha entry.
-
-After release:
-
-* Update /topic in #fedora-admin
-* post to infrastructure list that freeze is over.
-
-Beta:
------
-
-* Announce infrastructure freeze 2 weeks before Beta
-* Change /topic in #fedora-admin
-* mail the infrastructure list a reminder.
-* File all tickets
-* new website
-* check mirror permissions, mirrormanager, check
-  mirror sizes, release day ticket.
-
-After release is a "go":
-
-* Make sure torrents are set up and ready to go.
-* fedora-web needs a branch for fN-beta. In it:
-  * Beta used on get-prerelease
-  * get-prerelease doesn't direct to release
-  * verify is updated with Beta info
-  * releases.txt gets a branched entry for preupgrade
-  * bfo gets updated to have a Beta entry.
-
-After release:
-
-* Update /topic in #fedora-admin
-* post to infrastructure list that freeze is over.
-
-Final:
-------
-
-* Announce infrastructure freeze 2 weeks before Final
-* Change /topic in #fedora-admin
-* mail the infrastructure list a reminder.
-* File all tickets
-* new website, check mirror permissions, mirrormanager, check
-  mirror sizes, release day ticket.
-
-After release is a "go":
-
-* Make sure torrents are set up and ready to go.
-* fedora-web needs a branch for fN. In it:
-  * get-prerelease does direct to release
-  * verify is updated with Final info
-  * bfo gets updated to have a Final entry.
-  * update wiki version numbers and names.
-
-After release:
-
-* Update /topic in #fedora-admin
-* post to infrastructure list that freeze is over.
-* Move MirrorManager repository tags from the development/$version/
-  Directory objects, to the releases/$version/ Directory objects. This is
-  done using the ``move-devel-to-release --version=$version`` command on bapp02.
-  This is usually done a week or two after release.
diff --git a/docs/sops/fedorahosted-fedmsg.rst b/docs/sops/fedorahosted-fedmsg.rst
deleted file mode 100644
index fd08621..0000000
--- a/docs/sops/fedorahosted-fedmsg.rst
+++ /dev/null
@@ -1,106 +0,0 @@
-.. title: Fedmsg Fedorahosted SOP
-.. slug: infra-fedorahosted-fedmsg
-.. date: 2013-08-21
-.. taxonomy: Contributors/Infrastructure
-
-======================================
-Fedorahosted FedMsg Infrastructure SOP
-======================================
-
-Publish fedmsg messages from Fedora Hosted trac instances.
-
-Contact Information
-===================
-
-Owner
-    Fedora Infrastructure Team
-Contact
-    #fedora-apps, #fedora-admin, sysadmin-hosted
-Location
-    Serverbeach
-Servers
-    hosted03, hosted04
-Purpose
-    Broadcast trac activity for select projects (opt-in)
-
-Description
-===========
-
-fedmsg activity is usually an all-or-nothing proposition. We emit messages
-for all koji jobs and all bodhi updates, or none.
-
-fedmsg activity for Fedora Hosted is another story. We provide the option
-for project owners to opt-in to fedmsg and have their activity broadcast,
-but it is off by default.
-
-This document describes how to:
-
-1. Enable the fedmsg plugin for a fedora hosted project.
-2. Set up the fedmsg plugin on a new node.
-
-Enable the fedmsg plugin for a fedora hosted project.
-=====================================================
-
-Enable the trac plugin
-----------------------
-
-The trac-fedmsg-plugin package should be installed, but disabled.
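-
-To confirm the package state on the node first (a quick sketch)::
-
-    $ rpm -q trac-fedmsg-plugin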
-
-Edit ``/srv/web/trac/projects/$PROJECT/conf/trac.ini``. Under the
-[components] section add::
-
-   trac_fedmsg_plugin.* = enabled
-
-And restart apache with ``sudo apachectl graceful``.
-
-Enable the git hook
--------------------
-
-There is an ansible playbook that does this. There is no
-need to do it by hand anymore. Run::
-
-    $ sudo -i ansible-playbook \
-        /srv/web/infra/ansible/playbooks/fedorahosted_fedmsg_git.yml \
-        --extra-vars '{"repos":["yanex.git"]}'
-
-
-Enabling by hand
-````````````````
-
-*If* you were to do it by hand, without the playbook, you could follow
-the instructions below: Make a backup of the old post-receive hook. It
-should be empty when you encounter it, but just to be safe::
-
-    $ mv /srv/git/$PROJECT.git/hooks/post-receive \
-        /srv/git/$PROJECT.git/hooks/post-receive.orig
-
-Then, symlink in the new post-receive hook with::
-
-    $ ln -s /usr/local/share/git/hooks/post-receive-fedorahosted-fedmsg \
-        /srv/git/$PROJECT.git/hooks/post-receive
-
-That hook is managed by ansible -- if you want to modify it you can do
-so there.
-
-.. note:: If there was an old post-receive hook in place, you should
-   check to see if it did something important. The 'fedora-web' git
-   repo (which was converted early on) had such a hook. See
-   /srv/git/fedora-web.git/hooks for an example of how to handle
-   multiple git hooks. Something like
-   /usr/share/git-core/post-receive-chained can be used to chain the
-   hook across multiple scripts.
-
-
-How to set up the fedmsg plugin on a new fedorahosted node.
-===========================================================
-
-1) Create certs for the new node as per the fedmsg-certs doc.
-
-2) Declare those certs in ``/etc/fedmsg.d/ssl.py`` globally.
-
-3) Declare endpoints for the new node in ``/etc/fedmsg.d/endpoints.py``.
-
-4) Use our configuration management tool to distribute that new global
-   fedmsg config to the new node and all other nodes.
-
-5) Install the trac-fedmsg-plugin package on the new node and follow the
-   steps above.
diff --git a/docs/sops/fedorahosted-project-cleanup.rst b/docs/sops/fedorahosted-project-cleanup.rst
deleted file mode 100644
index 31f2380..0000000
--- a/docs/sops/fedorahosted-project-cleanup.rst
+++ /dev/null
@@ -1,82 +0,0 @@
-.. title: Fedorahosted Cleanup SOP
-.. slug: infra-fedorahosted-cleanup
-.. date: 2011-10-10
-.. taxonomy: Contributors/Infrastructure
-
-======================================
-FH-Projects-Cleanup Infrastructure SOP
-======================================
-
-Contents
-
-1. Introduction
-2. Our first move
-3. Removing Project's git repo
-4. Removing Trac's project
-5. Removing Project's ML
-6. FAS Group Removal
-
-Introduction
-============
-
-This page will help any sysadmin get a Fedora Hosted project completely
-removed, either because the owner requested its removal or because some
-other issue requires us to remove the project. It covers git, Trac,
-Mailing List and FAS group clean-up.
-
-Our first move
-==============
-
-If you are going to remove a Fedora Hosted project, please remember to
-create a folder in /srv/tmp that follows this naming syntax::
-
-    cd /srv/tmp && mkdir $project-hold-until-xx-xx-xx
-
-where xx-xx-xx should be substituted with the date on which everything
-should be purged from there.
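-
-A sketch that computes the date and creates the folder in one go, assuming
-GNU date and the 14-day hold noted just below::
-
-    cd /srv/tmp && mkdir "$project-hold-until-$(date -d '+14 days' +%y-%m-%d)"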
(The purge happens 14 days after the delete request.)
-
-Removing Project's git repo
-===========================
-
-Having a git repository removed can be achieved with the following steps::
-
-    ssh uid@fedorahosted.org
-    cd /git
-    mv $project.git/ /srv/tmp/$project-hold-until-xx-xx-xx/
-
-We're done with git!
-
-Removing Trac's project
-=======================
-
-Steps are::
-
-    ssh uid@fedorahosted.org
-    cd /srv/web/trac/projects
-    mv $project/ /srv/tmp/$project-hold-until-xx-xx-xx/
-
-and...that's all!
-
-Removing Project's ML
-=====================
-
-We have two options here:
-
-Delete a list, but keep the archives::
-
-    sudo /usr/lib/mailman/bin/rmlist <listname>
-
-Delete a list and its archives::
-
-    sudo /usr/lib/mailman/bin/rmlist -a <listname>
-
-If you are going to completely remove the Mailing List and its archives,
-please make sure the list is empty and there are no subscribers in it.
-
-FAS Group Removal
-=================
-
-Not every Fedora sysadmin can have this done. See the Account Deletion
-SOP for information. You may want to remove the group or simply disable
-it.
-
diff --git a/docs/sops/fedorahosted-repo-setup.rst b/docs/sops/fedorahosted-repo-setup.rst
deleted file mode 100644
index 768f092..0000000
--- a/docs/sops/fedorahosted-repo-setup.rst
+++ /dev/null
@@ -1,300 +0,0 @@
-.. title: Fedorahosted Repository Setup SOP
-.. slug: infra-fedorahosted-repo-setup
-.. date: 2014-09-24
-.. taxonomy: Contributors/Infrastructure
-
-=======================
-Hosted repository setup
-=======================
-
-Fedora provides SCM repositories for open source projects.
-
-Contents
-
-1. Mercurial Repository
-
-   1. Repo Setup
-   2. Commit Mail
-
-2. Git Repository
-
-   1. Repo Setup
-   2. Commit Mail
-
-3. Bazaar Repository
-4. SVN Repository
-
-   1. Repo Setup
-   2. Commit Mail
-
-Mercurial Repository
-====================
-
-You'll need to know three things in order to start the mercurial
-repository.
-
-PROJECTNAME
-    what the project wants to be called.
-
-OLDURL
-    how to access the project's current source code in their
-    mercurial repository.
-
-PROJECTGROUP
-    the group set up in the account system for read-write
-    access to the repository.
-
-Repo Setup
-----------
-
-The Mercurial repository lives on the hosted server. Access it by logging
-into hosted1. Then follow these steps:
-
-1. Fetch latest content from the FAS Database::
-
-     $ fasClient -i -f
-
-2. Create the repo::
-
-     $ cd /hg
-     $ sudo hg clone -U $OLDURL $PROJECTNAME (or sudo mkdir $PROJECTNAME; cd $PROJECTNAME; sudo hg init)
-     $ sudo find $PROJECTNAME -type d -exec chmod g+s \{\} \;
-     $ sudo chmod -R g+w $PROJECTNAME
-     $ sudo chown -R root:$PROJECTGROUP $PROJECTNAME
-
-This should set up all the files needed for the repository.
-
-Commit Mail
------------
-
-The Mercurial Notify extension can be used to send out email when
-commits are pushed to a Mercurial repository. To enable notifications,
-create the file ``/hg/$PROJECTNAME/.hg/hgrc``::
-
-    [extensions]
-    hgext.notify =
-
-    [hooks]
-    changegroup.notify = python:hgext.notify.hook
-
-    [email]
-    from = admin@fedoraproject.org
-
-    [smtp]
-    host = localhost
-
-    [web]
-    baseurl = http://hg.fedorahosted.org/hg
-
-    [notify]
-    sources = serve push pull bundle
-    test = False
-    config = /hg/$PROJECTNAME/.hg/subscriptions
-    maxdiff = -1
-
-And the file ``/hg/$PROJECTNAME/.hg/subscriptions``::
-
-    [usersubs]
-
-    user@host = *
-
-    [reposubs]
-
-Git Repository
---------------
-
-You'll need to know several things in order to start the git repository.
-
-
-PROJECTNAME
-    what the project wants to be called.
- -OLDURL - how to access the project's current source code in their git repository. - -PROJECTGROUP - the group setup in the account system for write access to the repository. - -COMMITLIST - comma-separated list of email addresses for commits (optional) - -DESCRIPTION - description of the project (optional) - -PROJECTOWNER - the FAS username of the project owner - -Repo Setup ----------- - -The git repository lives on the hosted server. Access it by logging into -hosted1 Then follow these steps: - -Fetch latest content from the FAS Database.:: - - $ sudo fasClient -i -f - - $ cd /git - -Clone an existing repository:: - - $ sudo git clone --bare $OLDURL $PROJECTNAME.git - $ cd $PROJECTNAME.git - $ sudo git config core.sharedRepository true - $ # - $ ## or - $ # - $ # Create a new repository: - $ sudo mkdir $PROJECTNAME.git - $ cd $PROJECTNAME.git - $ sudo git init --bare --shared=true - -Give the repository a nice description for gitweb:: - - $ echo $DESCRIPTION | sudo tee description > /dev/null - -Setup and run post-update hook. - -..note:: - We symlink this because /git is on a filesystem with noexec set) - -:: - - $ sudo ln -svf /usr/share/git-core/templates/hooks/post-update.sample ./hooks/post-update - $ sudo git update-server-info - -Ensure ownership and modes are correct:: - - $ sudo find -type d -exec chmod g+s \{\} \; - $ sudo find -perm /u+w -a ! -perm /g+w -exec chmod g+w \{\} \; - $ sudo chown -R $PROJECTOWNER:$PROJECTGROUP . - -This should setup all the files needed for the repository. The repository -owner can push changes into the repo by running:: - - $ git push ssh://git.fedorahosted.org/git/$PROJECTNAME.git/ master - -from within their local git repository. - -Commit Mail ------------ - -If they want commit mail, then there are a couple of additional steps.:: - - $ cd /git/$PROJECTNAME.git - $ sudo git config hooks.mailinglist $COMMITLIST - $ sudo git config hooks.maildomain fedoraproject.org - $ sudo git config hooks.emailprefix "[$PROJECTNAME]" - $ sudo git config hooks.repouri "http://git.fedorahosted.org/cgit/$PROJECTNAME.git" - $ sudo ln -svf /usr/share/git-core/post-receive-chained ./hooks/post-receive - $ sudo mkdir ./hooks/post-receive-chained.d - $ sudo ln -svf /usr/local/bin/git-notifier ./hooks/post-receive-chained.d/post-receive-email - $ sudo ln -svf /usr/local/share/git/hooks/post-receive-fedorahosted-fedmsg ./hooks/post-receive-chained.d/post-receive-fedmsg - -Bazaar Repository -================= -You'll need to know three things in order to start a bazaar repository. - - -PROJECTNAME - what the project wants to be called. - -OLDBRANCHURL - how to access the project's current sourcecode in - their previous bazaar repository. Note that a project may have - multiple branches that they want to import. Each branch will have a - separate URL. (The project can import the new branches after the - repository is created if they want.) - -PROJECTGROUP - the group setup in the account system for readwrite - access to the repository. - -Repo Setup ----------- - -The bzr repository lives on the hosted server. Access it by logging into -hosted1 then follow these steps: - -The first stage is to create the Bazaar repository. - -Fetch latest content from the FAS Database.:: - - $ fasClient -i -f - - $ cd /srv/bzr/ - $ # This creates a Bazaar repository which has shared storage between branches - $ sudo bzr init-repo $PROJECTNAME --no-trees - $ cd $PROJECTNAME - $ sudo bzr branch $OLDURL - $ sudo bzr branch $OLDURL2 - $ # [...] - $ sudo bzr branch $OLDURLN - $ cd .. 
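-     $ # (the find/chmod/chown below fix up group ownership and
-     $ #  permissions so $PROJECTGROUP members can push new revisions)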
- $ sudo find $PROJECTNAME -type d -exec chmod g+s \{\} \; - $ sudo chmod -R g+w $PROJECTNAME - $ sudo chown -R root:$PROJECTGROUP $PROJECTNAME - -This should be all that is needed. To checkout run:: - - bzr init-repo $MYLOCALPROJECTREPO - cd $MYLOCALPROJECTREPO - bzr branch bzr+ssh://bzr.fedorahosted.org/bzr/$PROJECTNAME/$BRANCHNAME - bzr branch bzr://bzr.fedorahosted.org/bzr/$PROJECTNAME/$BRANCHNAME/ - -.. note:: - If the end user checks out a branch without creating their own - repository they will need to create a local working tree by doing the - following:: - - cd $BRANCHNAME - bzr checkout --lightweight - -SVN Repository -============== - -You'll need to know two things in order to start a svn repository. - - -PROJECTNAME - what the project wants to be called. - -PROJECTGROUP - The Fedora account system group with read-write - access. - -COMMITLIST - comma-separated list of email addresses for commits - (optional) - -Repo Setup ----------- - -SVN lives on the hosted server. Access it by logging into hosted1. Then -run the following steps: - -Fetch latest content from the FAS Database.:: - - $ fasClient -i -f - -Create the repo:: - - $ cd /svn/ - $ sudo svnadmin create $PROJECTNAME - $ cd $PROJECTNAME - $ sudo chgrp -R $PROJECTGROUP . - $ sudo chmod -R g+w . - $ sudo find -type d -exec chmod g+s \{\} \; - -This should be all that is needed. To checkout run:: - - svn co svn+ssh://svn.fedorahosted.org/svn/$PROJECTNAME - -Commit Mail ------------ - -If they want commit mail, then there are a couple of additional steps.:: - - $ echo $COMMITLIST | sudo tee ./commit-list > /dev/null - $ sudo ln -sv /usr/bin/fedora-svn-commit-mail-hook ./hooks/post-commit - diff --git a/docs/sops/fedorahosted.rst b/docs/sops/fedorahosted.rst deleted file mode 100644 index 6c1a3b7..0000000 --- a/docs/sops/fedorahosted.rst +++ /dev/null @@ -1,114 +0,0 @@ -.. title: Fedorahosted Infrastructure SOP -.. slug: infra-fedorahosted -.. date: 2014-09-22 -.. taxonomy: Contributors/Infrastructure - -=============================== -Fedorahosted Infrastructure SOP -=============================== - -Provide hosting place for open source projects. - -.. important:: - This page is for administrators only. People wishing to request a hosted - project should use the Ticketing System ; see the - new project request template. (Requires Fedora Account) - -Contact Information -=================== - -Owner - Fedora Infrastructure Team -Contact - #fedora-admin, sysadmin-hosted -Location - Serverbeach -Servers - hosted03, hosted04 -Purpose - Provide hosting place for open source projects - -Description -=========== - -fedorahosted.org can be used to host open source projects. It provides the -following facilities: - -1. An scm for maintaining the code. The currently supported SCMs include - Mercurial, Git, Bazaar, or SVN. There is no cvs. -2. A trac instance, which provides a mini-wiki for hosting information - and also provides a ticketing system. -3. A mailing list - -How to setup a new hosted project -================================= - -1. Create source group in Fedora Account System of the form - ex ``gitepel``, ``svnkernel``, etc - -2. Create source repo - -3. Log into hosted03 - -4. 
Create new project space::
-
-       sudo /usr/local/bin/hosted-setup.sh
-
-   * must use the same case as the scm repo
-   * You're likely to end up with::
-
-       'Command failed: columns username, action are not unique'
-
-     this can be safely ignored, as it only tries to tell you
-     that you are giving admin access to a person who already
-     has admin access.
-
-5. If a mailing list is desired, follow the directions for the mailman SOP.
-
-How to import data from a cvs repo into git repo
-================================================
-
-Often users request their git repos to be imported from an existing cvs
-repo. This is a two-step process, as follows::
-
-    git-cvsimport -v -d :pserver:anonymous@cvs.fedoraproject.org/cvs/docs -C <module> <module>
-
-    sudo git clone --bare --no-hardlinks <module>/ /git/<module>.git/
-
-Example::
-
-    git-cvsimport -v -d :pserver:anonymous@cvs.fedoraproject.org/cvs/docs -C translation-quick-start-guide translation-quick-start-guide
-    sudo git clone --bare --no-hardlinks translation-quick-start-guide/ /git/translation-quick-start-guide.git/
-
-.. note::
-
-   Note that our git repos disallow non-fast-forward pushes by default. This
-   default makes the most sense, but sometimes users understand the impact
-   of doing so but still wish to make such a push.
-
-   To enable this temporarily, edit the config file inside of the git repo,
-   and make sure that receive.denyNonFastforwards is set to false. Make sure
-   to re-enable this once the user has finished their push.
-
-How to allow a project to redirect parts of their release tree
-===============================================================
-
-A project may want to host parts of their release tree elsewhere (for
-instance, moving docs from hosting inside of the fedorahosted release tree
-to an external service). To do that, modify::
-
-    configs/web/fedorahosted.org/release.conf
-
-Adding a new Directory section like this::
-
-    # Allow python-fedora project to redirect documentation/release tree elsewhere
-    <Directory /srv/web/releases/p/y/python-fedora>
-        AllowOverride FileInfo
-    </Directory>
-
-Then tell the project that they can create a .htaccess file with the
-Redirect. (Note that the release tree can be reached by two URLs, so you
-need to redirect both of them)::
-
-    Redirect permanent /releases/p/y/python-fedora/doc http://pythonhosted.org/python-fedora
-    Redirect permanent /released/python-fedora/doc/ http://pythonhosted.org/python-fedora
diff --git a/docs/sops/fedorahostedrename.rst b/docs/sops/fedorahostedrename.rst
deleted file mode 100644
index 75d6da9..0000000
--- a/docs/sops/fedorahostedrename.rst
+++ /dev/null
@@ -1,88 +0,0 @@
-.. title: Fedorahosted Project Rename SOP
-.. slug: infra-fedorahosted-rename
-.. date: 2011-10-03
-.. taxonomy: Contributors/Infrastructure
-
-===============================
-FedoraHosted Project Rename SOP
-===============================
-
-This describes the steps necessary to rename a project in Fedora Hosted.
-
-Contents
-========
-
-1. Rename the Trac instance
-2. Rename the git / svn / hg / ... directory
-3. Rename any old releases directories
-4. Rename the group in FAS
-
-Rename the Trac instance
-=========================
-
-::
-
-    cd /srv/web/trac/projects
-    mv oldname newname
-    cd newname/conf
-    sed -i -e 's/oldname/newname/' trac.ini
-    cd ..
-    sudo -u apache trac-admin .
-    resync
-
-Rename the git / svn / hg / ...
directory -========================================= - -:: - - cd /git - mv oldname.git newname.git - -Rename any old releases directories -=================================== - -:: - - cd /srv/web/releases/o/l/oldname - -somehow, the newname releases dir gets created; if there were old releases, move them to the new location. - -Rename the group in FAS -======================= - -.. note:: - Don't blindly rename - fedorahosted groups are usually safe to rename. If the old group could be - present in other apps/configs, though, (like provenpackagers, perl-sig, - etc) do not rename them. The other apps would need to have the group name - updated there as well to make this safe. - -:: - - ssh db2 - sudo -u postgres psql fas2 - -:: - - BEGIN; - select * from groups where name = '$OLDNAME'; - update groups set name = '$NEWNAME' where name = '$OLDNAME'; - -* Check that only one row was modified:: - - select * from groups where name in ('$OLDNAME', '$NEWNAME'); - -* Check that there's only one row and the name == $NEWNAME - -* If incorrect, do ROLLBACK; instead of commit:: - - COMMIT; - -.. warning:: Don't delete groups - If, for some reason, you end up with a group in FAS that was a typo but it - doesn't conflict with anything else, don't delete it without talking to - other admins on fedora-infrastructure-list. The numeric group ids could be - present on a filesystem somewhere and removing the group could eventually - lead to the id being allocated to some other group which would give - unintended people access to the files. As a group we can figure out what - hosts and files need to be checked for this issue if a delete is needed. diff --git a/docs/sops/fedorapackages.rst b/docs/sops/fedorapackages.rst deleted file mode 100644 index d3e0d59..0000000 --- a/docs/sops/fedorapackages.rst +++ /dev/null @@ -1,83 +0,0 @@ -.. title: Fedora Packages SOP -.. slug: infra-fedora-packages -.. date: 2012-02-23 -.. taxonomy: Contributors/Infrastructure - -=================== -Fedora Packages SOP -=================== - -This SOP is for the Fedora Packages web application. -https://community.dev.fedoraproject.org/packages - -Contents -======== - -1. Contact Information -2. Building a new release -3. Deploying to the development server -4. Hotfixing -5. Checking for AGPL violations - -Contact Information -=================== - -Owner - Luke Macken - -Contact - lmacken@redhat.com - -Location - PHX2 - -Servers - community01.dev - -Purpose - Web interface for package information - -Building a new release -====================== -There is a helper script that lives in the fedoracommunity git repository -that automatically handles spinning up a new release, building it in mock, and -scping it to batcave. First, edit the version/release in the specfile and -setup.py, then run::: - - ./release - -Deploying to the development server: -===================================== - -There is a script in the fedoracommunity git repository called -'fcomm-dev-update' that you must first copy to the ansible server. Then you run -it with the same arguments as the release script. This tool will sign the -RPMs, copy them into the infrastructure testing repo, update the repodata, -and then run a bunch of func commands to update the package on the dev server. - -:: - - ./fcomm-dev-release - -Hotfixing -========= -If you wish to make a hotfix to the Fedora Packages application, simply -make your change in your local git repository, and then perform the building & -deployment steps above. 
This will still work even if you do not wish to commit
-& push your change back upstream.
-
-In order to ensure AGPL compliance, we DO NOT do ansible based hotfixing for
-Fedora Packages.
-
-Checking for AGPL violations
-============================
-
-To remain AGPL compliant, we must ensure that all modifications to the code
-are made available in the SRPM that we link to in the footer of the
-application. You can easily query our app servers to determine if any AGPL
-violating code modifications have been made to the package::
-
-    func-command --host="*app*" --host="community*" "rpm -V fedoracommunity"
-
-You can safely ignore any changes to non-code files in the output. If any
-violations are found, the Infrastructure Team should be notified immediately.
diff --git a/docs/sops/fedorapastebin.rst b/docs/sops/fedorapastebin.rst
deleted file mode 100644
index 7cc25f9..0000000
--- a/docs/sops/fedorapastebin.rst
+++ /dev/null
@@ -1,89 +0,0 @@
-.. title: Fedora Pastebin SOP
-.. slug: infra-fpaste
-.. date: 2013-04-15
-.. taxonomy: Contributors/Infrastructure
-
-===================
-Fedora Pastebin SOP
-===================
-
-Contents
-========
-
-1. Contact Information
-2. Introduction
-3. Installation
-4. Dashboard
-5. Add a word to censored list
-
-
-1. Contact Information
-----------------------
-
-Owner
-    Fedora Infrastructure Team
-Contact
-    #fedora-admin
-Persons
-    athmane herlo
-Sponsor
-    nirik
-Location
-    phx2
-Servers
-    paste01.stg, paste01.dev
-Purpose
-    To host Fedora Pastebin
-
-
-2. Introduction
----------------
-
-Fedora pastebin is powered by sticky-notes, which is included in EPEL.
-
-Fedora theming (skin) is included in the ansible role.
-
-
-3. Installation
----------------
-
-Sticky-notes needs a MySQL db and a user with 'select, update, delete,
-insert' privileges.
-
-It's recommended to dump and import the db from a working installation
-to save time (skipping the installation and tweaking).
-
-By default the installation is locked, i.e., you can't relaunch it.
-
-However, you can unlock the installation by commenting the line containing
-``$gsod->trigger`` in ``/etc/sticky-notes/install.php``, then pointing the
-web browser to '/install'.
-
-The configuration file containing general settings and DB credentials
-is located in ``/etc/sticky-notes/config.php``
-
-4. Dashboard
-------------
-
-Sticky-notes has a dashboard (URL: /admin/) that can be used to:
-
-- Manage pastes:
-    - deleting paste
-    - getting information about the paste author (IP/Date/time etc...)
-- Manage users (aka admins) which can log into the dashboard
-- Manage IP Bans (add / delete banned IPs).
-- Authentication (not needed)
-- Site configuration:
-    - General configuration (included in config.php).
-    - Project Honey Pot configuration (not a FOSS service)
-    - Word censor configuration: a list of words to be censored in pastes.
-
-5. Add a word to censored list
-------------------------------
-
-If a word is in the censored list, any paste containing that word will be
-rejected. To add one, edit the variable '$sg_censor' in the sticky-notes
-configuration file::
-
-    $sg_censor = "WORD1
-    WORD2
-    ...
-    ...
-    WORDn";
diff --git a/docs/sops/fedorawebsites.rst b/docs/sops/fedorawebsites.rst
deleted file mode 100644
index a850361..0000000
--- a/docs/sops/fedorawebsites.rst
+++ /dev/null
@@ -1,314 +0,0 @@
-.. title: Websites Release SOP
-.. slug: infra-websites
-.. date: 2015-08-27
-.. taxonomy: Contributors/Infrastructure
-
-====================
-Websites Release SOP
-====================
-
-
- * 1. 
Preparing the website for a release - - * 1.1 Obsolete GPG key of the EOL Fedora release - * 1.2 Update GPG key - * 1.2.1 Steps - - * 2. Update website - - * 2.1 For Alpha - * 2.2 For Beta - * 2.3 For GA - - * 3. Fire in the hole - - * 4. Tips - - * 4.1 Merging branches - - - - 1. Preparing the website for a new release cycle - - 1.1 Obsolete GPG key - - One month after a Fedora release the release number 'FXX-2' (i.e. 1 month - after F21 release, F19 will be EOL) will be EOL (End of Life). - At this point we should drop the GPG key from the list in verify/ and move - the keys to the obsolete keys page in keys/obsolete.html. - - 1.2 Update GPG key - - After another couple of weeks and as the next release approaches, watch - the fedora-release package for a new key to be added. Use the update-gpg-keys - script in the fedora-web git repository to add it to static/. Manually add it - to /keys and /verify in all websites where we use these keys: - * arm.fpo - * getfedora.org - * labs.fpo - * spins.fpo - - 1.2.1 Steps - - a) Get a copy of the new key(s) from the fedora-release repo, you will - find FXX-primary and FXX-secondary keys. Save them in ./tools to make the - update easier. - - https://pagure.io/fedora-repos - - b) Start by editing ./tools/update-gpg-keys and adding the key-ids of - any obsolete keys to the obsolete_keys list. - - c) Then run that script to add the new key(s) to the fedora.gpg block: - - fedora-web git:(master) cd tools/ - tools git:(master) ./update-gpg-keys RPM-GPG-KEY-fedora-23-primary - tools git:(master) ./update-gpg-keys RPM-GPG-KEY-fedora-23-secondary - - This will add the key(s) to the keyblock in static/fedora.gpg and - create a text file for the key in static/$KEYID.txt as well. Verify - that these files have been created properly and contain all the keys - that they should. - - * Handy checks: gpg static/fedora.gpg or gpg static/$KEYID.txt - * Adding "--with-fingerprint" option will add the fingerprint to the - output - - The output of fedora.gpg should contain only the actual keys, not the - obsolete keys. - The single text files should contain the correct information for the - uploaded key. - - d) Next, add new key(s) to the list in data/verify.html and move the new - key informations in the keys page in data/content/keys/index.html. A - script to aid in generating the HTML code for new keys is in - ./tools/make-gpg-key-html. - It will print HTML to stdout for each RPM-GPG-KEY-* file given as - arguments. This is suitable for copy/paste (or directly importing if - your editor supports this). - Check the copied HTML code and select if the key info is for a primary - or secondary key (output says 'Primary or Secondary'). - - tools git:(master) ./make-gpg-key-html RPM-GPG-KEY-fedora-23-primary - - Build the website with 'make en test' and carefully verify that the - data is correct. Please double check all keys in http://localhost:5000/en/keys - and http://localhost:5000/en/verify. - - NOTE: the tool will give you an outdated output, adapt it to the new - websites and bootstrap layout! - - - 2. 
2. Update website

2.1 For Alpha

a) Create the fXX-alpha branch from master

   fedora-web git:(master) git push origin master:refs/heads/f22-alpha

   and check out the new branch:

   fedora-web git:(master) git checkout -t -b f22-alpha origin/f22-alpha

b) Update the global variables
   Change curr_state to Alpha for all arches

c) Add the Alpha banner
   Upload the FXX-Alpha banner to static/images/banners/f22alpha.png;
   it should appear on every ${PRODUCT}/download/index.html page.
   Make sure the banner is shown in all sidebars, also on labs, spins, and arm.

d) Check all download links and paths in ${PRODUCT}/prerelease/index.html
   You can find all paths on bapp01 (sudo su - mirrormanager first) or
   you can look at the download page http://dl.fedoraproject.org/pub/alt/stage

e) Add CHECKSUM files to static/checksums and verify that the paths are
   correct. The files should be on sundries01 and you can query them with:

   $ find /pub/fedora/linux/releases/test/17-Alpha/ -type f -name \
     *CHECKSUM* -exec cp '{}' . \;

   Remember to add the right checksums to the right websites (same path).

f) Add EC2 AMI IDs for Alpha. All IDs are now in the globalvar.py file.
   We get all data from there, even the redirect path to track the AMI IDs.
   We also have a script which is useful for getting all the AMI IDs uploaded
   with fedimg. Execute it to get the latest uploads, but don't run the
   script too early, as new builds are added constantly.

   fedora-web git:(fXX-alpha) python ~/fedora-web/tools/get_ami.py

g) Add CHECKSUM files also to http://spins.fedoraproject.org in
   static/checksums. Verify the paths are correct in data/content/verify.html
   (see point e) on how to query them on sundries01). Same for labs.fpo and
   arm.fpo.

h) Verify all paths and links on http://spins.fpo, labs.fpo and arm.fpo.

i) Update the Alpha image sizes and pre_cloud_composedate in
   ./build.d/globalvar.py. Verify they are right for the Cloud images and
   the Docker image.

j) Update the new POT files and push them to Zanata (ask a maintainer to do
   so) every time you change text strings.

k) Add this build to stg.fedoraproject.org (ansible syncStatic.sh.stg) to
   test the pages online.

l) Release Date:

   * Merge the fXX-alpha branch to master and correct conflicts manually
     (see the sketch after this list)
   * Remove the redirect of the prerelease pages in ansible, edit:

     * ansible/playbooks/include/proxies-redirects.yml
     * ask a sysadmin-main to run the playbook

   * When ready, and about 90 minutes before release time, push to master
   * Tag the commit as the new release and push it too:

     $ git tag -a FXX-Alpha -m 'Releasing Fedora XX Alpha'
     $ git push --tags

   * If needed follow "Fire in the hole" below.
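As a hedged consolidation of those release-day bullets, reusing the f22
example names from step a) (not an official script; conflicts still need
manual fixing)::

    fedora-web git:(f22-alpha) git checkout master
    fedora-web git:(master) git merge f22-alpha    # fix any conflicts by hand
    fedora-web git:(master) git tag -a F22-Alpha -m 'Releasing Fedora 22 Alpha'
    fedora-web git:(master) git push && git push --tags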
2.2 For Beta

a) Create the fXX-beta branch from master

   fedora-web git:(master) git push origin master:refs/heads/f22-beta

   and check out the new branch:

   fedora-web git:(master) git checkout -t -b f22-beta origin/f22-beta

b) Update the global variables
   Change curr_state to Beta for all arches

c) Add the Beta banner
   Upload the FXX-Beta banner to static/images/banners/f22beta.png;
   it should appear on every ${PRODUCT}/download/index.html page.
   Make sure the banner is shown in all sidebars, also on labs, spins, and arm.

d) Check all download links and paths in ${PRODUCT}/prerelease/index.html
   You can find all paths on bapp01 (sudo su - mirrormanager first) or
   you can look at the download page http://dl.fedoraproject.org/pub/alt/stage

e) Add CHECKSUM files to static/checksums and verify that the paths are
   correct. The files should be on sundries and you can query them with:

   $ find /pub/fedora/linux/releases/test/17-Beta/ -type f -name \
     *CHECKSUM* -exec cp '{}' . \;

   Remember to add the right checksums to the right websites (same path).

f) Add EC2 AMI IDs for Beta. All IDs are now in the globalvar.py file.
   We get all data from there, even the redirect path to track the AMI IDs.
   We also have a script which is useful for getting all the AMI IDs uploaded
   with fedimg. Execute it to get the latest uploads, but don't run the
   script too early, as new builds are added constantly.

   fedora-web git:(fXX-beta) python ~/fedora-web/tools/get_ami.py

g) Add CHECKSUM files also to http://spins.fedoraproject.org in
   static/checksums. Verify the paths are correct in data/content/verify.html
   (see point e) on how to query them on sundries01). Same for labs.fpo and
   arm.fpo.

h) Remove static/checksums/Fedora-XX-Alpha* from all websites.

i) Verify all paths and links on http://spins.fpo, labs.fpo and arm.fpo.

j) Update the Beta image sizes and pre_cloud_composedate in
   ./build.d/globalvar.py. Verify they are right for the Cloud images and
   the Docker image.

k) Update the new POT files and push them to Zanata (ask a maintainer to do
   so) every time you change text strings.

l) Add this build to stg.fedoraproject.org (ansible syncStatic.sh.stg) to
   test the pages online.

m) Release Date:

   * Merge the fXX-beta branch to master and correct conflicts manually
   * When ready, and about 90 minutes before release time, push to master
   * Tag the commit as the new release and push it too:

     $ git tag -a FXX-Beta -m 'Releasing Fedora XX Beta'
     $ git push --tags

   * If needed follow "Fire in the hole" below.
2.3 For GA

a) Create the fXX branch from master

   fedora-web git:(master) git push origin master:refs/heads/f22

   and check out the new branch:

   fedora-web git:(master) git checkout -t -b f22 origin/f22

b) Update the global variables
   Change curr_state for all arches

c) Check all download links and paths in ${PRODUCT}/download/index.html
   You can find all paths on bapp01 (sudo su - mirrormanager first) or
   you can look at the download page http://dl.fedoraproject.org/pub/alt/stage

d) Add CHECKSUM files to static/checksums and verify that the paths are
   correct. The files should be on sundries01 and you can query them with:

   $ find /pub/fedora/linux/releases/17/ -type f -name \
     *CHECKSUM* -exec cp '{}' . \;

   Remember to add the right checksums to the right websites (same path).

e) At some point, freeze translations. Add an empty PO_FREEZE file to every
   website's directory you want to freeze.

f) Add EC2 AMI IDs for GA. All IDs are now in the globalvar.py file.
   We get all data from there, even the redirect path to track the AMI IDs.
   We also have a script which is useful for getting all the AMI IDs uploaded
   with fedimg. Execute it to get the latest uploads, but don't run the
   script too early, as new builds are added constantly.

   fedora-web git:(fXX) python ~/fedora-web/tools/get_ami.py

g) Add CHECKSUM files also to http://spins.fedoraproject.org in
   static/checksums. Verify the paths are correct in data/content/verify.html
   (see point e) on how to query them on sundries01). Same for labs.fpo and
   arm.fpo.

h) Remove static/checksums/Fedora-XX-Beta* from all websites.

i) Verify all paths and links on http://spins.fpo, labs.fpo and arm.fpo.

j) Update the GA image sizes and cloud_composedate in ./build.d/globalvar.py.
   Verify they are right for the Cloud images and the Docker image.

k) Update static/js/checksum.js and check that the paths and checksums still
   match.

l) Update the new POT files and push them to Zanata (ask a maintainer to do
   so) every time you change text strings.

m) Add this build to stg.fedoraproject.org (ansible syncStatic.sh.stg) to
   test the pages online.

n) Release Date:

   * Merge the fXX branch to master and correct conflicts manually
   * Add the redirect of the prerelease pages in ansible, edit:

     * ansible/playbooks/include/proxies-redirects.yml
     * ask a sysadmin-main to run the playbook

   * Unfreeze translations by deleting the PO_FREEZE files
   * When ready, and about 90 minutes before release time, push to master
   * Update the short links for the Cloud images for 'Fedora XX', 'Fedora
     XX-1' and 'Latest'
   * Tag the commit as the new release and push it too:

     $ git tag -a FXX -m 'Releasing Fedora XX'
     $ git push --tags

   * If needed follow "Fire in the hole" below.


3. Fire in the hole

We now use ansible for everything, and normally use a regular build to make
the websites live. If something is not happening as expected, you should get
in contact with a sysadmin-main to run the ansible playbook again.

All our tooling, such as the SyncStatic.sh and SyncTranslation.sh scripts, is
now also in ansible!

Staging server app02 and production server bapp01 do not exist anymore; our
staging websites are now on sundries01.stg and production on sundries01.
Change your scripts accordingly; as sysadmin-web you should have access to
those servers as before.


4. Tips

4.1 Merging branches

Suggested by Ricky
This can be useful if you're *sure* all new changes on the devel branch
should go into the master branch. Conflicts will be resolved by accepting
only the changes on the devel branch.
If you're not 100% sure, do a normal merge and fix the conflicts manually!

    $ git merge f22-beta
    $ git checkout --theirs [list of conflicting po files]
    $ git commit

diff --git a/docs/sops/fmn.rst b/docs/sops/fmn.rst
deleted file mode 100644
index 5466a7d..0000000
--- a/docs/sops/fmn.rst
+++ /dev/null
@@ -1,81 +0,0 @@
.. title: fedmsg Notifications SOP
.. slug: infra-fmn
.. date: 2015-03-24
.. taxonomy: Contributors/Infrastructure

==============================
fmn (fedmsg notifications) SOP
==============================

Route individualized notifications to Fedora contributors over email and IRC.

Contact Information
-------------------

Owner
    Messaging SIG, Fedora Infrastructure Team
Contact
    #fedora-apps, #fedora-fedmsg, #fedora-admin, #fedora-noc
Servers
    notifs-backend01, notifs-web0{1,2}
Purpose
    Route notifications to users

Description
-----------

fmn is a pair of systems intended to route fedmsg notifications to Fedora
contributors and users.

There is a web interface running on notifs-web01 and notifs-web02 that
allows users to log in and configure their preferences to select this or that
type of message.

There is a backend running on notifs-backend01 where most of the work is
done.

The backend process is a 'fedmsg-hub' daemon, controlled by systemd.
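A quick health check of that daemon, as a hedged example (run on
notifs-backend01)::

    $ sudo systemctl status fedmsg-hub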
Disable an account (on notifs-backend01)::

    $ sudo -u fedmsg /usr/local/bin/fmn-disable-account USERNAME

Restart::

    $ sudo systemctl restart fedmsg-hub

Watch logs::

    $ sudo journalctl -u fedmsg-hub -f

Configuration::

    $ ls /etc/fedmsg.d/
    $ sudo fedmsg-config | less

Monitor performance::

    http://threebean.org/fedmsg-health-day.html#FMN

Upgrade (from batcave)::

    $ sudo -i ansible-playbook /srv/web/infra/ansible/playbooks/manual/upgrade/fmn.yml

Mailing Lists
-------------

We use FMN as a way to forward certain kinds of messages to mailing lists so
people can read them the good old-fashioned way. To accomplish this, we
create 'bot' FAS accounts with their own FMN profiles, and we set their email
addresses to the lists in question.

If you need to change the way some set of messages is forwarded, you can do
it from the FMN web interface (if you are an FMN admin as defined in the
config file in roles/notifs/frontend/). You can navigate to
https://apps.fedoraproject.org/notifications/USERNAME.id.fedoraproject.org to
do this.

If the account already exists as a FAS user (for instance, the ``virtmaint``
user) but does not yet exist in FMN, you can add it to the FMN database by
logging in to notifs-backend01 and running ``fmn-create-user --email
DESTINATION@EMAIL.COM --create-defaults FAS_USERNAME``.

diff --git a/docs/sops/freemedia.rst b/docs/sops/freemedia.rst
deleted file mode 100644
index cb51f7c..0000000
--- a/docs/sops/freemedia.rst
+++ /dev/null
@@ -1,194 +0,0 @@
.. title: FreeMedia Infrastructure SOP
.. slug: infra-freemedia
.. date: 2014-12-18
.. taxonomy: Contributors/Infrastructure

FreeMedia Infrastructure SOP

This page defines the SOP for the Fedora FreeMedia Program. It covers
infrastructural as well as procedural matters.

Contents
========

1. Location of Resources
2. Location on Ansible
3. Opening of the form
4. Closing of the Form
5. Tentative timeline
6. How to

   1. Open
   2. Close

7. Handling of tickets

   1. Login
   2. Rejecting Invalid Tickets
   3. Accepting Valid Tickets

8. Handling of non fulfilled requests
9. How to handle membership applications

Location of Resources
=====================

* The web form is at
  https://fedoraproject.org/freemedia/FreeMedia-form.html
* The TRAC is at https://fedorahosted.org/freemedia/report

Location on ansible
===================

$PWD = ``roles/freemedia/files``

Freemedia form
    FreeMedia-form.html
Backup form
    FreeMedia-form.html.orig
Closed form
    FreeMedia-close.html
Backend processing script
    process.php
Error Document
    FreeMedia-error.html

Opening of the form
===================

The form will be opened on the first day of each month.

Closing of the Form
===================

Tentative timeline
------------------

The form will be closed after a couple of days. This may vary according to
capacity.

How to
======

* The form is available at
  ``roles/freemedia/files/FreeMedia-form.html`` and
  ``roles/freemedia/files/FreeMedia-form.html.orig``

* The closed form is at
  ``roles/freemedia/files/FreeMedia-close.html``

Open
----

* Go to roles/freemedia/tasks
* Open ``main.yml``
* Go to line 32.
* To Open: Change the line to read::

    src="FreeMedia-form.html"

* After opening the form, go to trac and grant the "Ticket Create and
  Ticket View" privileges to "Anonymous".

Close
-----

* Go to roles/freemedia/tasks
* Open main.yml
* Go to line 32.
* To Close: Change the line to read::

    src="FreeMedia-close.html"

* After closing the form, go to trac and remove the "Ticket Create and
  Ticket View" privileges from "Anonymous".

.. note::
   * Have to check about the monthly cron.
   * Have to write about changing init.pp for closing and opening.
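If you prefer a one-liner for the edit described in the Open/Close sections
above, a hedged sketch (it assumes the ``src=`` value appears exactly once in
main.yml; review the diff before committing)::

    $ sed -i 's/FreeMedia-close.html/FreeMedia-form.html/' roles/freemedia/tasks/main.yml   # open
    $ sed -i 's/FreeMedia-form.html/FreeMedia-close.html/' roles/freemedia/tasks/main.yml   # close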
Handling of tickets
===================

Login
-----

* Contributors are requested to visit
  https://fedorahosted.org/freemedia/report
* Please log in with your FAS account.

Rejecting Invalid Tickets
-------------------------

* If a ticket is invalid, don't accept the request. Go to "resolve as:",
  select "invalid" and then press "Submit Changes".

* A ticket is invalid if

  * No valid email-id is provided.
  * The region does not match the country.
  * No proper address is given.

* If a ticket is a duplicate, accept one copy and close the others as
  duplicates: go to "resolve as:", select "duplicate" and then press
  "Submit Changes".

Accepting Valid Tickets
-----------------------

* If you wish to fulfill a request, first ensure (per the section above)
  that it is not liable to be discarded.

* Now "Accept" the ticket from the "Action" field at the bottom, and
  press the "Submit Changes" button.

* These accepted tickets will be available from
  https://fedorahosted.org/freemedia/report under both "My Tickets"
  and "Accepted Tickets for XX" (XX = your region, e.g. APAC).

* When you ship the request, please go to the ticket again, go to
  "resolve as:" in the "Action" field, select "Fixed" and then
  press "Submit Changes".

* If an accepted ticket is not finalised by the end of the month, it
  should be closed with "shipping status unknown" in a comment.

Handling of non fulfilled requests
----------------------------------

We shall close all pending requests by the end of the month.

* Please check your region.

How to handle membership applications
-------------------------------------

Steps to become a member of the Free-media group:

1. Create an account in the Fedora Account System (FAS).
2. Create a user page in the Fedora wiki with contact data, like
   User:. There are templates.
3. Apply to the Free-Media group in FAS.
4. Apply for the Free-Media mailing list subscription.

Rules for deciding over membership applications
```````````````````````````````````````````````
======= ================ ========== =============== =========================
Case    Applied to       User Page  Applied to      Action
        Free-Media Group Created    Free-Media List
======= ================ ========== =============== =========================
1       Yes              Yes        Yes             Approve group and mailing
                                                    list applications
------- ---------------- ---------- --------------- -------------------------
2       Yes              Yes        No              Put on hold + write to
                                                    subscribe to list within
                                                    a week
------- ---------------- ---------- --------------- -------------------------
3       Yes              No         whatever        Put on hold + write to
                                                    make user page within a
                                                    week
------- ---------------- ---------- --------------- -------------------------
4       No               No         Yes             Reject
======= ================ ========== =============== =========================

.. note::
   1. As you need to have a FAS account for steps 2 and 3, this is not
      included in the decision rules above.
   2. The time to be on hold is one week. If no action is taken after one
      week, the application has to be rejected.
   3. When writing to ask for steps to be fulfilled, send a CC to the other
      Free-media sponsors to let them know the application has been reviewed.
diff --git a/docs/sops/freenode-irc-channel.rst b/docs/sops/freenode-irc-channel.rst
deleted file mode 100644
index 539a3b9..0000000
--- a/docs/sops/freenode-irc-channel.rst
+++ /dev/null
@@ -1,88 +0,0 @@
.. title: Freenode IRC SOP
.. slug: infra-freenode
.. date: 2013-11-08
.. taxonomy: Contributors/Infrastructure

=======================================
Freenode IRC Channel Infrastructure SOP
=======================================

Fedora uses the freenode IRC network for its IRC communications. If you
want to make a new Fedora-related IRC channel, please follow the guidelines
below.

Contents
========

1. Contact Information
2. Is a new channel needed?
3. Adding new channel
4. Recovering/fixing an existing channel

Contact Information
===================

Owner:
    Fedora Infrastructure Team
Contact:
    #fedora-admin
Location:
    freenode
Servers:
    none
Purpose:
    Provides a channel for Fedora contributors to use.

Is a new channel needed?
========================

First you should see if one of the existing Fedora channels will meet your
needs. Adding a new channel can give you a less noisy place to focus on
something, but at the cost of fewer people being involved. If your
topic/area is development related, perhaps the main #fedora-devel channel
will meet your needs?

Adding new channel
==================

* Make sure the channel is in the #fedora-* namespace. This allows the
  Fedora Group Coordinator to make changes to it if needed.

* Register the channel. You do this by /join #channelname, then /msg
  chanserv register #channelname

* Set up GUARD mode. This allows ChanServ to be in the channel for easier
  management: ``/msg chanserv set #channel GUARD on``

* Add some other operators/managers to the access list. This would allow
  them to manage the channel if you are asleep or absent::

      /msg chanserv access #channel add NICK +ARfiorstv

  You can see what the various flags mean at
  http://toxin.jottit.com/freenode_chanserv_commands#cs03

  You may want to consider adding some or all of the folks in #fedora-ops
  who manage other channels to help you with yours. You can see this list
  with ``/msg chanserv access #fedora-ops list``

* Set default modes:
  ``/msg chanserv set mlock #channel +Ccnt``
  (The t for topic lock is optional, if your channel would like
  to have people change the topic often.)

* If your channel is of general interest, add it to the main communicate
  page of IRC channels, and possibly announce it to your target
  audience.

* You may want to request that zodbot join your channel if you need its
  functions. You can request that in #fedora-admin.

Recovering/fixing an existing channel
=====================================

If there is an existing channel in the #fedora-* namespace that has a
missing founder/operator, please contact the Fedora Group Coordinator
(User:Spot) and request that it be reassigned. Follow the above procedure
on the channel once that is done, so it is set up and has enough
operators/managers to not need reassigning again.
diff --git a/docs/sops/gather-easyfix.rst b/docs/sops/gather-easyfix.rst
deleted file mode 100644
index 66b252f..0000000
--- a/docs/sops/gather-easyfix.rst
+++ /dev/null
@@ -1,49 +0,0 @@
.. title: gather-easyfix SOP
.. slug: infra-gather-easyfix
.. date: 2016-03-14
.. taxonomy: Contributors/Infrastructure

=========================
Fedora gather easyfix SOP
=========================

Fedora-gather-easyfix, as the name says, gathers tickets marked as easyfix
from multiple sources (currently pagure, github and fedorahosted), providing
a single place for newcomers to find small tasks to work on.


Contents
========

1. Contact Information
2. Documentation Links

Contact Information
===================

Owner
    Fedora Infrastructure Team
Contact
    #fedora-admin
Location
    http://fedoraproject.org/easyfix/
Servers
    sundries01, sundries02, sundries01.stg
Purpose
    Gather easyfix tickets from multiple sources.


Upstream sources are hosted on github at:
https://github.com/fedora-infra/fedora-gather-easyfix/

The files are then mirrored to our ansible repo, under the `easyfix/gather`
role.

The project is a simple script, ``gather_easyfix.py``, gathering information
from the projects listed on the `Fedora wiki `_ and outputting a single html
file. This html file is then improved via the css and javascript files
present in the sources.

The generated html file together with the css and js files are then synced to
the proxies for public consumption :)

diff --git a/docs/sops/geoip-city-wsgi.rst b/docs/sops/geoip-city-wsgi.rst
deleted file mode 100644
index cf0d486..0000000
--- a/docs/sops/geoip-city-wsgi.rst
+++ /dev/null
@@ -1,69 +0,0 @@
.. title: geoip-city-wsgi SOP
.. slug: geoip-city-wsgi
.. date: 2017-01-30
.. taxonomy: Contributors/Infrastructure


===================
geoip-city-wsgi SOP
===================

A simple web service that returns geoip information as a JSON-formatted
dictionary in utf-8. In particular, it is used by anaconda[1] to get the most
probable territory code, based on the public IP of the caller.

Contents
========

1. Contact Information
2. Basic Function
3. Ansible Roles
4. Apps depending on geoip-city-wsgi
5. Documentation Links


Contact Information
===================

Owner
    Fedora Infrastructure Team
Contact
    #fedora-apps, #fedora-admin, #fedora-noc
Location
    https://geoip.fedoraproject.org
Servers
    sundries*, sundries*-stg
Purpose
    A simple web service that returns geoip information as a JSON-formatted
    dictionary in utf-8, used by anaconda[1] to get the most probable
    territory code based on the public IP of the caller.

Basic Function
==============

- Users go to https://geoip.fedoraproject.org/city

- The website is exposed via ``/etc/httpd/conf.d/geoip-city-wsgi-proxy.conf``.

- It returns a string with geoip information, formatted as a JSON dictionary
  in utf-8.

- It also currently accepts one override: ?ip=xxx.xxx.xxx.xxx, e.g.
  https://geoip.fedoraproject.org/city?ip=18.0.0.1 which then uses the passed
  IP address instead of the determined IP address of the client.
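A quick spot-check of the service from a shell might look like this (a hedged
example; the response is the JSON dictionary described above)::

    $ curl 'https://geoip.fedoraproject.org/city?ip=18.0.0.1'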
Ansible Roles
=============

The geoip-city-wsgi role
https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/roles/geoip-city-wsgi
is present in the sundries playbook
https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/playbooks/groups/sundries.yml

The proxy tasks are present in
https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/playbooks/include/proxies-reverseproxy.yml

Apps depending on geoip-city-wsgi
=================================
unknown

Documentation Links
===================

app: https://geoip.fedoraproject.org
source: https://github.com/fedora-infra/geoip-city-wsgi
bugs: https://github.com/fedora-infra/geoip-city-wsgi/issues
Role: https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/roles/geoip-city-wsgi
[1] https://fedoraproject.org/wiki/Anaconda

diff --git a/docs/sops/github.rst b/docs/sops/github.rst
deleted file mode 100644
index 3a4f182..0000000
--- a/docs/sops/github.rst
+++ /dev/null
@@ -1,77 +0,0 @@
.. title: Fedora Infrastructure Github SOP
.. slug: infra-githup
.. date: 2014-09-26
.. taxonomy: Contributors/Infrastructure

===============================
Using github for Infra Projects
===============================

We're presently using github to host git repositories and issue tracking for
some infrastructure projects. Anything we need to know should be recorded
here.

---------------------
Setting up a new repo
---------------------

Create projects inside of the fedora-infra group:

https://github.com/fedora-infra

That will allow us to more easily track what projects we have.

[TODO] How do we create a new project and import it?

- After creating a new repo, click on the Settings tab to set up some fancy
  things.

  If using git-flow for your project:

  - Set the default branch from 'master' to 'develop'. Having the default
    branch be develop is nice: new contributors will automatically start
    committing there if they're not paying attention to what branch they're
    on. You almost never want to commit directly to the master branch.

    If a develop branch does not exist, you should create one by
    branching off of master::

        $ git clone GIT_URL
        $ git checkout -b develop
        $ git push --all

  - Set up an IRC hook for notifications. From the "settings" tab click on
    "Webhooks & Services." Under the "Add Service" dropdown, find "IRC" and
    click it. You might need to enter your password.
    In the form, you probably want the following values:

    - Server, irc.freenode.net
    - Port, 6697
    - Room, #fedora-apps
    - Nick,
    - Branch Regexes,
    - Password,
    - Ssl,
    - Message Without Join,
    - No Colors,
    - Long Url,
    - Notice,
    - Active,


Add an EasyFix label
====================

The EasyFix label is used to mark bugs that are potentially fixable by new
contributors getting used to our source code or relatively new to python
programming. GitHub doesn't provide this label automatically, so we have to
add it. You can add the label from the issues page of the repository or use
this curl command to add it::

    curl -k -u '$GITHUB_USERNAME:$GITHUB_PASSWORD' https://api.github.com/repos/fedora-infra/python-fedora/labels -H "Content-Type: application/json" -d '{"name":"EasyFix","color":"3b6eb4"}'

Please try to use the same color for consistency between Fedora Infrastructure
projects. You can then add the github repo to the list that
easyfix.fedoraproject.org scans for easyfix tickets here:

https://fedoraproject.org/wiki/Easyfix
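Note that GitHub has been phasing out password-based Basic Auth for its API;
a hedged variant of the same call using a personal access token (the
``$GITHUB_TOKEN`` variable is a hypothetical placeholder) would be::

    curl -u "$GITHUB_USERNAME:$GITHUB_TOKEN" https://api.github.com/repos/fedora-infra/python-fedora/labels -H "Content-Type: application/json" -d '{"name":"EasyFix","color":"3b6eb4"}'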
diff --git a/docs/sops/github2fedmsg.rst b/docs/sops/github2fedmsg.rst
deleted file mode 100644
index 16c6854..0000000
--- a/docs/sops/github2fedmsg.rst
+++ /dev/null
@@ -1,62 +0,0 @@
.. title: github2fedmsg SOP
.. slug: infra-github2fedmsg
.. date: 2016-04-08
.. taxonomy: Contributors/Infrastructure

=================
github2fedmsg SOP
=================

Bridge github events onto our fedmsg bus.

App: https://apps.fedoraproject.org/github2fedmsg/
Source: https://github.com/fedora-infra/github2fedmsg/

Contact Information
-------------------

Owner
    Fedora Infrastructure Team
Contact
    #fedora-apps, #fedora-admin, #fedora-noc
Servers
    github2fedmsg01
Purpose
    Bridge github events onto our fedmsg bus.

Description
-----------

github2fedmsg is a small Python Pyramid app that bridges github events onto
our fedmsg bus by way of github's "webhooks" feature. It is what allows us to
have IRC notifications of github activity via fedmsg. It has two phases of
operation:

- Infrequently, a user will log in to github2fedmsg via Fedora OpenID. They
  then push a button to also log in to github.com. They are then logged in to
  github2fedmsg with *both* their FAS account and their github account.

  They are then presented with a list of their github repositories. They can
  toggle each one: "on" or "off". When they turn a repo on, our webapp makes
  a request to github.com to install a "webhook" for that repo with a
  callback URL to our app.

- When events happen to that repo on github.com, github looks up our callback
  URL and makes an http POST request to us, informing us of the event. Our
  github2fedmsg app receives that, validates it, and then republishes the
  content to our fedmsg bus.

What could go wrong?
--------------------

- Restarting the app or rebooting the host shouldn't cause a problem. It
  should come right back up.

- Our database could die. We have a db with a list of all the repos we have
  turned on and off. We would want to restore that from backup.

- If github gets compromised, they might have to revoke all of their
  application credentials. In that case, our app would fail to work. There
  are *lots* of private secrets set in our private repo that allow our app
  to talk to github.com. There are inline comments there with instructions
  about how to generate new keys and secrets.

diff --git a/docs/sops/gitweb.rst b/docs/sops/gitweb.rst
deleted file mode 100644
index 9df9172..0000000
--- a/docs/sops/gitweb.rst
+++ /dev/null
@@ -1,38 +0,0 @@
.. title: Gitweb Infrastructure SOP
.. slug: infra-gitweb
.. date: 2011-08-23
.. taxonomy: Contributors/Infrastructure

=========================
Gitweb Infrastructure SOP
=========================

Gitweb-caching is the web interface we use to expose git to the web at
http://git.fedorahosted.org/git/

Contact Information
===================

Owner
    Fedora Infrastructure Team
Contact
    #fedora-admin, sysadmin-hosted
Location
    Serverbeach
Servers
    hosted[1-2]
Purpose
    HTTP access to git sources.

Basic Function
==============

- Users go to http://git.fedorahosted.org/git/

- Pages are generated from a cache stored in ``/var/cache/gitweb-caching/``.

- The website is exposed via ``/etc/httpd/conf.d/git.fedoraproject.org.conf``.

- The main config file is ``/var/www/gitweb-caching/gitweb_config.pl``.
  This pulls git repos from /git/.
diff --git a/docs/sops/guestdisk.rst b/docs/sops/guestdisk.rst
deleted file mode 100644
index 17dd70e..0000000
--- a/docs/sops/guestdisk.rst
+++ /dev/null
@@ -1,116 +0,0 @@
.. title: Guest Disk Resize SOP
.. slug: infra-guest-disk-resize
.. date: 2012-06-13
.. taxonomy: Contributors/Infrastructure

=====================
Guest Disk Resize SOP
=====================

Resize disks in our kvm guests

Contents
========

1. Contact Information
2. How to do it

   1. KVM/libvirt Guests

Contact Information
===================

Owner:
    Fedora Infrastructure Team
Contact:
    #fedora-admin, sysadmin-main
Location:
    PHX, Tummy, ibiblio, Telia, OSUOSL
Servers:
    All xen servers, kvm/libvirt servers.
Purpose:
    Resize guest disks

How to do it
============

KVM/libvirt Guests
------------------

1. SSH to the kvm server and resize the guest's logical volume. If you
   want to be extra careful, make a snapshot of the LV first::

       lvcreate -n [guest name]-snap -L 10G -s /dev/VolGroup00/[guest name]

   Optional, but it is always good to be careful.

2. Shut down the guest::

       sudo virsh shutdown [guest name]

3. Disable the guest's lv::

       lvchange -an /dev/VolGroup00/[guest name]

4. Resize the lv::

       lvresize -L [NEW TOTAL SIZE]G /dev/VolGroup00/[guest name]

   or

       lvresize -L +XG /dev/VolGroup00/[guest name]
       (to add X GB to the disk)

5. Enable the lv::

       lvchange -ay /dev/VolGroup00/[guest name]

6. Bring the guest back up::

       sudo virsh start [guest name]

7. Log in to the guest::

       sudo virsh console [guest name]

   You may wish to boot single user mode to avoid services coming up and
   going down again.

8. On the guest, run::

       fdisk /dev/vda

9. Delete the LVM partition on the guest you want to add space to and
   recreate it with the maximum size. Make sure to set its type to LV (8e).

10. Run partprobe::

        partprobe

11. Check the size of the partition::

        fdisk -l /dev/vdaN

    If this still reflects the old size, then reboot the guest and verify
    that its size changed correctly when it comes up again.

12. Log in to the guest again, and run::

        pvresize /dev/vdaN

13. ``vgs`` should now show the new size. Use lvresize to resize the root
    lv::

        lvresize -L [new root partition size]G /dev/GuestVolGroup00/root

    (pvs will tell you how much space is available)

14. Finally, resize the root partition::

        resize2fs /dev/GuestVolGroup00/root
        (if the root fs is ext4)

    or

        xfs_growfs /dev/GuestVolGroup00/root
        (if the root fs is xfs)

    Verify that everything worked out, and delete the snapshot you made
    if you made one.
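If you created the optional snapshot in step 1 and everything checks out, a
hedged cleanup sketch on the kvm server (the snapshot name matches step 1)::

    lvremove /dev/VolGroup00/[guest name]-snap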
diff --git a/docs/sops/guestedit.rst b/docs/sops/guestedit.rst
deleted file mode 100644
index bcca35f..0000000
--- a/docs/sops/guestedit.rst
+++ /dev/null
@@ -1,72 +0,0 @@
.. title: Guest Editing SOP
.. slug: infra-guest-editing
.. date: 2012-04-23
.. taxonomy: Contributors/Infrastructure

=================
Guest Editing SOP
=================

Various virsh commands

Contents
========

1. Contact Information
2. How to do it

   1. add/remove cpus
   2. resize memory

Contact Information
===================

Owner:
    Fedora Infrastructure Team
Contact:
    #fedora-admin, sysadmin-main
Location:
    PHX, Tummy, ibiblio, Telia, OSUOSL
Servers:
    All xen servers, kvm/libvirt servers.
Purpose:
    Edit guest configuration

How to do it
============

Add cpu
-------

1. SSH to the virthost server

2. Calculate the number of CPUs the system needs

3. ``sudo virsh setvcpus --config``, i.e.::

       sudo virsh setvcpus bapp01 16 --config

4. Shut down the virtual system

5. Start the virtual system

6. Log in and check that the cpu count matches


Resize memory
-------------

1. SSH to the virthost server

2. Calculate the amount of memory the system needs in kb

3. ``sudo virsh setmem --config``, i.e.::

       sudo virsh setmem bapp01 16777216 --config

4. Shut down the virtual system

5. Start the virtual system

6. Log in and check that the memory matches

diff --git a/docs/sops/guestmigrate.rst b/docs/sops/guestmigrate.rst
deleted file mode 100644
index 4df35ad..0000000
--- a/docs/sops/guestmigrate.rst
+++ /dev/null
@@ -1,90 +0,0 @@
.. title: Guest Migration SOP
.. slug: infra-guest-migration
.. date: 2011-10-07
.. taxonomy: Contributors/Infrastructure

==============================
Guest migration between hosts.
==============================

Move guests from one host to another.

Contact Information
===================

Owner
    Fedora Infrastructure Team

Contact
    #fedora-admin, sysadmin-main

Location
    PHX, Tummy, ibiblio, Telia, OSUOSL

Servers
    All xen servers, kvm/libvirt servers.

Purpose
    Migrate guests

How to do it
============

1. Schedule outage time if any. This will need to be long enough to copy
   the data from one host to another, so it will depend on the guest's disk
   size.

2. Turn off monitoring in nagios.

3. On the new host, create disk space for the server::

       lvcreate -n app03 -L 32G vg_guests00

4. Prepare the old guest for migration:

   a) if the system is xen, install a regular kernel
   b) look for entries for xenblk and hvc0 in /etc files

5. Shut down the guest.

6. ::

       virsh dumpxml guestname > guest.xml

7. Copy guest.xml to the new machine. You will need to make various
   edits depending on whether the system was originally xen or such. I
   normally compare an existing xml on the target system with the one we
   dumped out to make up the differences.

8. Define the guest on the new machine: 'virsh define guest.xml'.
   Depending on the changes in the xml this may not work, and you will
   need to make many manual changes, plus copy the guest.xml to
   ``/etc/libvirt/qemu`` and do a ``/sbin/service libvirtd restart``.

9. Insert an iptables rule for the nc transfer::

       iptables -I INPUT 14 -s -m tcp -p tcp --dport 11111 -j ACCEPT

10. On the destination host:

    - RHEL-5::

          nc -l -p 11111 | dd of=/dev/mapper/

    - RHEL-6::

          nc -l 11111 | dd of=/dev/mapper/

11. On the source host::

        dd if=/dev/mapper/guest-partition | nc desthost 11111

    Wait for the copy to finish. You can track how far it has gone by
    finding the dd pid and then sending a 'kill -USR1' to it.

12. Start the guest on the new host::

        virsh start guest

13. On the source host, rename the storage and undefine the guest so it is
    not started again.
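The progress trick mentioned in step 11, sketched out (this assumes only one
dd process is running on the host; dd prints its byte counts to stderr)::

    # on the source host, in another terminal
    kill -USR1 $(pidof dd)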
diff --git a/docs/sops/haproxy.rst b/docs/sops/haproxy.rst
deleted file mode 100644
index 4ed26c1..0000000
--- a/docs/sops/haproxy.rst
+++ /dev/null
@@ -1,156 +0,0 @@
.. title: haproxy Infrastructure SOP
.. slug: infra-haproxy
.. date: 2011-10-03
.. taxonomy: Contributors/Infrastructure

==========================
Haproxy Infrastructure SOP
==========================

haproxy is an application that does load balancing at the tcp layer or at
the http layer. It can do generic tcp balancing but it specializes in
http balancing. Our proxy servers are still running apache, and that is
what our users connect to. But instead of using mod_proxy_balancer and
ProxyPass balancer://, we do a ProxyPass to http://localhost:10001/ or
http://localhost:10002/. haproxy must be told to listen on an
individual port for each farm. All haproxy farms are listed in
/etc/haproxy/haproxy.cfg.

Contents
--------

1. Contact Information
2. How it works
3. Configuration example
4. Stats
5. Advanced Usage

Contact Information
-------------------

Owner:
    Fedora Infrastructure Team
Contact:
    #fedora-admin, sysadmin-main, sysadmin-web group
Location:
    Phoenix, Tummy, Telia
Servers:
    proxy1, proxy2, proxy3, proxy4, proxy5
Purpose:
    Provides load balancing from the proxy layer to our application
    layer.

How it works
------------

haproxy is a load balancer. If you're familiar with load balancers, this
section won't be that interesting. In its normal usage haproxy acts just
like a web server: it listens on a port for requests. Unlike most
webservers, though, it then sends that request to one of our back end
application servers and sends the response back. This is referred to as
reverse proxying. We typically configure haproxy to send a check to a
specific url and look for the response code. If no check url is configured,
it just does basic checks against /. In most of our configurations we're
using round robin balancing. I.e., request 1 goes to app1, request 2 goes
to app2, request 3 goes to app3, request 4 goes to app1, and the whole
process repeats.

.. warning::
   These checks do add load to the app servers, as well as additional
   connections. Be smart about which url you're checking, as it gets checked
   often. Also be sure to verify that the application servers can handle
   your new settings; monitor them closely for the hour or two after you
   make changes.

Configuration example
---------------------

The example below is how our fedoraproject wiki could be configured. Each
application should have its own farm. Even though it may have an identical
configuration to another farm, this allows easy addition and subtraction
of specific nodes when we need them::

    listen  fpo-wiki 0.0.0.0:10001
        balance roundrobin
        server  app1 app1.fedora.phx.redhat.com:80 check inter 2s rise 2 fall 5
        server  app2 app2.fedora.phx.redhat.com:80 check inter 2s rise 2 fall 5
        server  app4 app4.fedora.phx.redhat.com:80 backup check inter 2s rise 2 fall 5
        option  httpchk GET /wiki/Infrastructure

* The first line "listen ...." says to create a farm called 'fpo-wiki',
  listening on all IPs on port 10001. 'fpo-wiki' can be arbitrary, but make
  it something obvious. Aside from that, the important bit is :10001. Always
  make sure that when creating a new farm, it is listening on a unique port.
  In Fedora's case we're starting at 10001 and moving up by one. Just check
  the config file for the lowest open port above 10001.

* The next line "balance roundrobin" says to use round robin balancing.

* The server lines each add a new node to the balancer farm. In this case
  the wiki is being served from app1, app2 and app4. If the wiki is
  available at http://app1.fedora.phx.redhat.com/wiki/ then this config
  would be used in conjunction with "RewriteRule ^/wiki/(.*)
  http://localhost:10001/wiki/$1 [P,L]".

* 'server' means we're adding a new node to the farm.

* 'app1' is the worker name; it is analogous to fpo-wiki but should
  match the short hostname of the node to make it easy to follow.
* 'app1.fedora.phx.redhat.com:80' is the hostname and port to be
  contacted.

* 'check' means to check via the bottom line "option httpchk GET
  /wiki/Infrastructure", which will use /wiki/Infrastructure to verify
  the wiki is working. If that URL fails, that entire node will be taken
  out of the farm mix.

* 'inter 2s' means to check every 2 seconds. 2s is the same as 2000 in
  this case.

* 'rise 2' means to not put this node back in the mix until it has had
  two successful connections in a row. haproxy will continue to check
  every 2 seconds whether a node is up or down.

* 'fall 5' means to take a node out of the farm after 5 failures.

* 'backup': you'll notice that app4 has a 'backup' option. We don't
  actually use this for the wiki but do for other farms. It basically
  means to continue checking and treat this node like any other node, but
  not to send it any production traffic unless the other two nodes are
  down.

All of these options can be tweaked, so keep that in mind when changing or
building a new farm. There are other configuration options in this file
that are global. Please see the haproxy documentation for more info::

    /usr/share/doc/haproxy-1.3.14.6/haproxy-en.txt

Stats
-----

In order to view the stats for a farm, please see the stats page. Each
proxy server has its own stats page, since each one is running its own
haproxy server. To view the stats, point your browser to
https://admin.fedoraproject.org/haproxy/shorthostname/, so proxy1 is at
https://admin.fedoraproject.org/haproxy/proxy1/. The trailing / is
important.

* https://admin.fedoraproject.org/haproxy/proxy1/
* https://admin.fedoraproject.org/haproxy/proxy2/
* https://admin.fedoraproject.org/haproxy/proxy3/
* https://admin.fedoraproject.org/haproxy/proxy4/
* https://admin.fedoraproject.org/haproxy/proxy5/

Advanced Usage
--------------

haproxy has some more advanced usage that we've not needed to worry about
yet, but it is worth mentioning. For example, one could send users to just
one app server based on session id. If user A happened to hit app1 first and
user B happened to hit app4 first, all subsequent requests for user A would
go to app1 and all for user B would go to app4. This is handy for
applications that cannot normally be balanced because of shared storage
needs or other locking issues. This won't solve all problems though, and can
have negative effects: for example, when app1 goes down, user A would either
lose their session or be unable to work until app1 comes back up. Please do
thorough testing before looking into this option.

diff --git a/docs/sops/hosted_git_to_svn.rst b/docs/sops/hosted_git_to_svn.rst
deleted file mode 100644
index d890a8d..0000000
--- a/docs/sops/hosted_git_to_svn.rst
+++ /dev/null
@@ -1,174 +0,0 @@
.. title: Fedorahosted Repository Migration SOP
.. slug: infra-fedorahosted-migration
.. date: 2011-12-14
.. taxonomy: Contributors/Infrastructure

=======================
Fedorahosted migrations
=======================

Migrating hosted repositories to another repository type.

Contents
========

1. Contact Information
2. Description
3. SVN to GIT migration

   1. Questions left to be answered with this SOP

Contact Information
===================

Owner
    Fedora Infrastructure Team

Contact
    #fedora-admin, sysadmin-hosted

Location
    Serverbeach

Servers
    hosted1, hosted2

Purpose
    Migrate hosted SCM repositories to another SCM.
Description
===========

fedorahosted.org can be used to host open source projects. Occasionally
those projects want to change the SCM they utilize. This document describes
how to do that. Each hosted project gets:

1. An SCM for maintaining the code. The currently supported SCMs include
   Mercurial, Git, Bazaar, and SVN. Note: there is no cvs.
2. A trac instance, which provides a mini-wiki for hosting information
   and also provides a ticketing system.
3. A mailing list

.. important::
   This page is for administrators only. People wishing to request a hosted
   project should use the Ticketing System; see the new project request
   template. (Requires a Fedora Account.)

SVN to GIT migration
====================

FAS User Prep
-------------

Currently you must manually generate $PROJECTNAME-users.txt by grabbing a
list of people in the FAS group and recording them in the following
format::

    $fasusername = FirstName LastName <$emailaddress>

This is error prone, and will stop the git-svn fetch below if an author
appears that doesn't exist in the list of users.::

    svn log --quiet | awk '/^r/ {print $3}' | sort -u

The above will generate a list of users in the svn repo.

If all users are FAS users, you can use the following script to create a
users file (written by tmz (Todd Zullinger))::

    #!/bin/bash

    if [ -z "$1" ]; then
        echo "usage: $0 " >&2
        exit 1
    fi

    svnurl=file:///svn/$1

    if ! svn info $svnurl &>/dev/null; then
        echo "$1 is not a valid svn repo." >&2
    fi

    svn log -q $svnurl | awk '/^r[0-9]+/ {print $3}' | sort -u | while read user; do
        name=$( (getent passwd $user 2>/dev/null | awk -F: '{print $5}') || true )
        [ -z "$name" ] && name=$user
        email="$user@fedoraproject.org"
        echo "$user=$name <$email>"
    done

Doing the conversion
--------------------

1. Log into hosted1
2. Make a temporary directory to convert the repos in::

       $ sudo mkdir /tmp/tmp-$PROJECTNAME.git
       $ cd /tmp/tmp-$PROJECTNAME.git

3. Create a git repo ready to receive the migrated SVN data::

       $ sudo git-svn init http://svn.fedorahosted.org/svn/$PROJECTNAME --no-metadata

4. Tell git to fetch and convert the repository::

       $ git svn fetch

   .. note::
      This creation of a temporary repository is necessary because SVN
      leaves a number of items floating around that git can ignore, and we
      want those essentially ignored.

5. From here, you'll want to follow "Creating a new git repo" as if
   cloning an existing git repository to Fedorahosted.

6. After that process is done, kindly remove the temporary repo that was
   created::

       $ sudo rm -rf /tmp/tmp-$PROJECTNAME.git

Doing the conversion (alternate)
--------------------------------

Alternately, here's another way to do this (tmz):

Set up a working dir::

    [tmz@hosted1 tmp (master)]$ mkdir im-chooser-conversion && cd im-chooser-conversion

Create an authors file mapping svn usernames to the 'Name <email>' form git
uses::

    [tmz@hosted1 im-chooser-conversion (master)]$ ~tmz/svn-to-git-authors im-chooser > authors

Convert svn to git::

    [tmz@hosted1 im-chooser-conversion (master)]$ git svn clone -s -A authors --no-metadata file:///svn/im-chooser

Move svn branches and tags into the proper locations for the new git repo
(git-svn leaves them as 'remote' branches/tags.)::

    [tmz@hosted1 im-chooser-conversion (master)]$ cd im-chooser
    [tmz@hosted1 im-chooser (master)]$ mv .git/refs/remotes/tags/* .git/refs/tags/ && rmdir .git/refs/remotes/tags
    [tmz@hosted1 im-chooser (master)]$ mv .git/refs/remotes/* .git/refs/heads/

Now 'git branch' and 'git tag' should display the branches/tags.

Create a bare repo from the converted git repo.
Using ``file://$(pwd)`` here ensures that git copies all objects to the new
bare repo::

    [tmz@hosted1 im-chooser-conversion (master)]$ git clone --bare --shared file://$(pwd)/im-chooser im-chooser.git

Follow the steps in https://fedoraproject.org/wiki/Hosted_repository_setup to
finish setting proper modes and permissions for the repo. Don't forget to
update the description file.

.. note::
   This still leaves moving the converted bare repo (im-chooser.git) to /git
   and fixing up the user/group.
Questions left to be answered with this SOP
===========================================

* Obviously we need to have the requestor review the migration and confirm
  it's ok.
* Do we then delete the old SCM contents?
* Do we need to change the FAS group type to grant them access to
  pull/push from it?

diff --git a/docs/sops/hotfix.rst b/docs/sops/hotfix.rst
deleted file mode 100644
index 4a78987..0000000
--- a/docs/sops/hotfix.rst
+++ /dev/null
@@ -1,58 +0,0 @@
.. title: Hotfixes SOP
.. slug: infra-hotfix
.. date: 2015-02-24
.. taxonomy: Contributors/Infrastructure

============
HOTFIXES SOP
============

From time to time we have to quickly patch a problem or issue
in applications in our infrastructure. This process allows
us to do that, track what changed, and be ready to remove
the fix when the issue is resolved upstream.


Ansible based items:
====================

For ansible, hotfixes should be placed after the task that installs
the package to be changed or modified, either in roles or tasks.

Hotfix tasks should be called "HOTFIX description".
They should also link in comments to any upstream bug or ticket.
They should also have tags of 'hotfix'.

The process is:

- Create a diff of any files changed in the fix.
- Check in the _original_ files unchanged to the role/task.
- Now check in your diffs of those same files.
- ansible will replace the files on the affected machines
  completely with the fixed versions.
- If you need to back it out, you can revert the diff step,
  wait, and then remove the first checkin (see the sketch at the
  end of this SOP).

Example::

    #
    # install hash randomization hotfix
    # See bug https://bugzilla.redhat.com/show_bug.cgi?id=812398
    #
    - name: hotfix - copy over new httpd init script
      copy: src="{{ files }}/hotfix/httpd/httpd.init" dest=/etc/init.d/httpd
            owner=root group=root mode=0755
      notify:
      - restart apache
      tags:
      - config
      - hotfix
      - apache

Upstream changes
================

Also, if at all possible a bug should be filed with the upstream
application to get the fix into the next version. Hotfixes are something
we should strive to carry only for a short time.
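A hedged sketch of that back-out procedure, assuming each step above landed
as its own commit (the bracketed names are placeholder commit hashes)::

    $ git revert <diff-commit>            # back out the hotfix diff
    $ # later, once upstream ships the fix:
    $ git revert <original-files-commit>  # drop the carried copies as well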
diff --git a/docs/sops/hotness.rst b/docs/sops/hotness.rst
deleted file mode 100644
index 2992cee..0000000
--- a/docs/sops/hotness.rst
+++ /dev/null
@@ -1,67 +0,0 @@
.. title: The New Hotness SOP
.. slug: hotness-sop
.. date: 2017-01-31
.. taxonomy: Contributors/Infrastructure

.. _hotness-sop:

The New Hotness
===============
`the-new-hotness `_ is a
`fedmsg consumer `_
that subscribes to `release-monitoring.org `_ fedmsg
notifications to determine when a package in Fedora should be updated. For
more details on the-new-hotness, consult the `project documentation `_.


Contact Information
-------------------
Owner
    Fedora Infrastructure Team
Contact
    #fedora-admin
Location
    phx2.fedoraproject.org
Servers
    hotness01.phx2.fedoraproject.org
    hotness01.stg.phx2.fedoraproject.org
Purpose
    File issues when upstream projects release new versions of a package


Deploying a New Version
-----------------------
As of January 31, 2017, the-new-hotness is not packaged for Fedora or EPEL.
When upstream tags a new version in Git and you are building a new version
(from the specfile in the upstream repository), you will need to build it
into the :ref:`infra-repo`.

1. Build the SRPM with ``koji build epel7-infra the-new-hotness--.src.rpm``.
   If you do not have permission to perform this build (it fails with
   permission denied), ask for help in #fedora-admin.

2. Consult the upstream changelog. If necessary, adjust the Ansible
   configuration for the-new-hotness.

3. Update the host. At the moment this is done with shell access to the host
   and running::

       $ sudo -i yum clean all
       $ sudo -i yum update the-new-hotness

4. Ensure the configuration is up-to-date by running this on batcave01::

       $ sudo rbac-playbook -l staging groups/hotness.yml # remove the "-l staging" to update prod

All done!


Monitoring Activity
-------------------
It can be nice to check up on the-new-hotness to make sure it's behaving
correctly. You can see all the Bugzilla activity using the
`user activity query `_ (staging uses
`partner-bugzilla.redhat.com `_)
and querying for the ``upstream-release-monitoring@fedoraproject.org`` user.

You can also view all the Koji tasks dispatched by the-new-hotness. For
example, you can see the `failed tasks `_ it has created.
diff --git a/docs/sops/ibm-drive-replacement.rst b/docs/sops/ibm-drive-replacement.rst
deleted file mode 100644
index afafe0c..0000000
--- a/docs/sops/ibm-drive-replacement.rst
+++ /dev/null
@@ -1,342 +0,0 @@
.. title: Drive Replacement SOP
.. slug: infra-drive-replacement
.. date: 2012-07-13
.. taxonomy: Contributors/Infrastructure

====================================
Drive Replacement Infrastructure SOP
====================================

At present this SOP only works for the X series IBM servers.

We have multiple machines with lots of different drives in them. For the
most part now, though, we are trying to standardise on IBM X series
servers. At present I've not figured out how to disable the onboard raid;
as a result of this, many of our servers have two raid 0 arrays on top of
which we do software raid.

The system xen11 is currently an HP ProLiant DL180 G5 with its own
interesting RAID system (using the Compaq Smart Array cciss driver). Like
the IBM X series, each drive is considered a single RAID-0 instance which
is then accessed through a logical drive.

Contents
========

1. Contact Information
2. Verify the drive is dead

   1. Re-adding a drive (poor man's fix)

3. Actually replacing the drive (IBM)

   1. Collecting Data
   2. Call IBM
   3. Get the package, give access to the tech
   4. Prepwork before the tech arrives
   5. Tech on site
   6. Rebuild the array

4. Actually Replacing the Drive (HP)

   1. Collecting data
   2. Call HP
   3. Get the package, give access to the tech
   4. Prepwork before the tech arrives
   5. Tech on site
   6. Rebuild the array

5. Installing RaidMan (IBM Only)

Database - DriveReplacement

Contact Information
===================

Owner
    Fedora Infrastructure Team
Contact
    #fedora-admin, sysadmin-main
Location
    All
Servers
    All
Purpose
    Steps for drive replacement.

Verify the drive is dead
========================

::

    $ cat /proc/mdstat
    Personalities : [raid1]
    md0 : active raid1 sdb1[1] sda1[0]
          513984 blocks [2/2] [UU]

    md1 : active raid1 sdb2[2](F) sda2[0]
          487717248 blocks [2/1] [U_]

This indicates that md1 is in a degraded state and that /dev/sdb2 is the
failed drive. Notice that /dev/sdb1 (same physical drive as /dev/sdb2) is
not failed. /dev/md0 (not yet degraded) is showing a good state. This is
because /dev/md0 is /boot. If you run::

    touch /boot/t
    sync
    rm /boot/t

that should make /dev/md0 notice that its drive is also failed. If it does
not fail, it's possible the drive is fine and that some blip happened that
caused it to get flagged as dead. It is also worthwhile to log in to
xenX-mgmt to determine if the RSAII adapter has noticed the drive is dead.

If you think the drive just had a blip and is fine, see "Re-adding" below.

Re-adding a drive (poor man's fix)
----------------------------------

Basically what we're doing here is making sure the drive is, in fact, dead.
Obviously you don't want to do this more than once on a drive; if it
continues to fail, replace it.

::

    # cat /proc/mdstat
    Personalities : [raid1]
    md0 : active raid1 sdb1[1] sda1[0]
          513984 blocks [2/2] [UU]

    md1 : active raid1 sdb2[2](F) sda2[0]
          487717248 blocks [2/1] [U_]
    # mdadm /dev/md1 --remove /dev/sdb2
    # mdadm /dev/md1 --add /dev/sdb2
    # cat /proc/mdstat
    md0 : active raid1 sdb1[1] sda1[0]
          513984 blocks [2/1] [U_]
            resync=DELAYED

    md1 : active raid1 sdb2[2] sda2[0]
          487717248 blocks [2/1] [U_]
          [=>...................]  recovery =  9.2% (45229120/487717248) finish=145.2min speed=50771K/sec

So we removed the bad drive, added it again, and you can now see the
recovery status. Watch it carefully. If it fails again, it is time for a
drive replacement.
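Beyond /proc/mdstat, a more detailed view of a suspect array can be had with
mdadm itself (a hedged example using the md1 device from above)::

    mdadm --detail /dev/md1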
You can get some of this - information via hal, but for the full complete information you need to - either have someone physically go look at the drive (some of which is in - inventory) or use RaidMan. See "Installing RaidMan" below for more - information on how to install RaidMan. - - Specifically you need: - - - Drive Size (in G) - - Drive Type (SAS or SATA?) - - Drive Model - - Drive Vendor - - To get this information run:: - - # cd /usr/RaidMan/ - # ./arcconf GETCONFIG 1 - -4) The phone number and address of the building where the drive is - currently located. This will go to the RH cage. - - This information is located in the contacts.txt of private git repo on - batcave01 (only available to sysadmin-main people) - - Call IBM - - Call 1-800-426-7378 and follow the directions they give you. You'll need - to use the M/T above to get to the correct rep. They will ask you for the - information above (you wrote it down, right?) - - When they agree to replace the drive, make sure to tell them you need the - shipping number of the drive as well as the name of the tech who will do - the drive replacement. Sometimes the tech will just bring the drive. If - not though, you need to open a ticket with the colo to let them know a - drive is coming. - - Get the package, give access to the tech - - As SOON as you get this information, open a ticket with RH. at - is-ops-tickets redhat.com. Request a ticket ID from RH. If the tech has - any issues getting into the colo, you can give the AT&T ticket request to - the tech to get them in. - - NOTE: this can often take hours. We have 4 hour on site response time from - IBM. This time goes very quickly, sometimes you may need to page out - someone in IS to ensure it gets created quickly. To get this pager - information see contacts.txt in batcave01's private repo (if batcave01 is down - for some reason see the dr copy on backup2.fedoraproject.org:/srv/ - - Prepwork before the tech arrives - - Really the big thing here is to remove the broken drive from the array. In - our earlier example we found /dev/sdb failed. We'll want to remove it from - both arrays: - - # mdadm /dev/md0 --remove /dev/sdb1 - # mdadm /dev/md1 --remove /dev/sdb2 - - Next get the current state of the drives and save it somewhere. See - "Installing RaidMan" for more information if RaidMan is not installed. - - # cd /usr/RaidMan - # ./arcconf GETCONFIG 1 > /tmp/raid1.txt - - Copy /tmp/raid1.txt off to some other device and save it until the tech is - on site. It should contain information about the failed drive. - - Tech on site - - When the tech is on site you may have to give him the rack location. All - of our Mesa servers are in one location, "the same room that the desk is - in". You may have to give him the serial number of the server, or possibly - make it blink. It's either the first rack on the left labeled: "01 2 55" - or "01 2 58". - - Once he's replaced the drive, he'll have you verify. Use the RaidMan tools - to do the following: - - # cd /usr/RaidMan - # ./arcconf RESCAN 1 - # ./arcconf GETCONFIG 1 > /tmp/raid2.txt - # # arcconf CREATE LOGICALDRIVE [Options] - # ./arcconf create 1 LOGICALDRIVE 476790 Simple_volume 0 1 - - First we're going to re-scan the array for the new drive. Then we'll - re-get the configs. Compare /tmp/raid2.txt to /tmp/raid1.txt and verify - the bad drive is fixed and that it has a different serial number. Also - make sure its the correct size. Thank the tech and send him on his way. 
The last line there creates a new logical drive from the physical drive.
"Simple_volume" tells it to create a raid0 array of one drive. The size was
pulled out of our initial /tmp/raid1.txt (it should match the other drive).
The last two numbers are the Channel and ID of the new drive.

Rebuild the array
-----------------

Now that the disk has been replaced we need to put a partition table on the
new drive and add it to the array:

* /dev/sdGOOD is the *GOOD* drive
* /dev/sdBAD is the *BAD* drive

::

    # dd if=/dev/sdGOOD of=/tmp/sda-mbr.bin bs=512 count=1
    # dd if=/tmp/sda-mbr.bin of=/dev/sdBAD
    # partprobe

Next re-add the drives to the array:

* /dev/sdBAD1 and /dev/sdBAD2 are the partitions on the new drive which is no
  longer bad.

::

    # mdadm /dev/md0 --add /dev/sdBAD1
    # mdadm /dev/md1 --add /dev/sdBAD2
    # cat /proc/mdstat

This starts rebuilding the arrays; the last line checks the status.

Actually Replacing the Drive (HP)
=================================

Replacing the drive on the HPs is similar to the IBMs. First you will need to
contact HP, then you will need to open a ticket with Red Hat's Helpdesk to
get into the PHX2 facility. Then you will need to coordinate with the
technician on the colocation's rules for entry and who to call/talk with.

Collecting data
---------------

Call HP
-------

Get the package, give access to the tech
----------------------------------------

Prepwork before the tech arrives
--------------------------------

Tech on site
------------

Rebuild the array
-----------------

Now that the disk has been replaced we need to put a partition table on the
new drive and add it to the array:

* /dev/cciss/c0dGOOD is the *GOOD* drive. The HP utilities will have a code
  like 1I:1:1
* /dev/cciss/c0dBAD is the *BAD* drive. The HP utilities will have a code
  like 2I:1:1

First we need to create the logical drive on the system::

    # hpacucli controller serialnumber=P61630H9SVU4JF create type=ld sectors=63 drives=2I:1:1 raid=0

    # dd if=/dev/cciss/c0dGOOD of=/tmp/sda-mbr.bin bs=512 count=1
    # dd if=/tmp/sda-mbr.bin of=/dev/cciss/c0dBAD
    # partprobe

Next re-add the drives to the array:

* /dev/sdBAD1 and /dev/sdBAD2 are the partitions on the new drive which is no
  longer bad.

::

    # mdadm /dev/md0 --add /dev/sdBAD1
    # mdadm /dev/md1 --add /dev/sdBAD2
    # cat /proc/mdstat

This starts rebuilding the arrays; the last line checks the status.

Installing RaidMan (IBM Only)
=============================

Unfortunately there is no feasible way to manage the IBM RAID arrays without
RaidMan that does not cause downtime. You can do this via the pre-POST
interface, but that requires downtime, and if the first drive is the failed
drive, it may result in a non-booting system. So for now RaidMan it is, until
we can figure out how to get rid of the raid controllers in these boxes
completely. ::

    yum -y install compat-libstdc++-33.i686
    rpm -ihv https://infrastructure.fedoraproject.org/rhel/RaidMan/RaidMan-9.00.i386.rpm

To verify installation has completed successfully::

    # cd /usr/RaidMan/
    # ./arcconf GETCONFIG 1

This should print the current configuration of the raid controller and its
logical drives.

diff --git a/docs/sops/ibm_rsa_ii.rst b/docs/sops/ibm_rsa_ii.rst
deleted file mode 100644
index f037d98..0000000
--- a/docs/sops/ibm_rsa_ii.rst
+++ /dev/null
@@ -1,61 +0,0 @@
.. title: IBM RSA II Remote Management SOP
.. slug: infra-ibm-rsa-ii
.. date: 2011-08-23
.. taxonomy: Contributors/Infrastructure

=============================
IBM RSA II Infrastructure SOP
=============================

Many of our physical machines use RSA II cards for remote management.
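Before working through the restart procedure below, it can help to confirm
the card is actually unresponsive from the network side. A minimal sketch,
using the xenX-mgmt naming convention mentioned elsewhere in these SOPs
(substitute the real management hostname; the login name depends on how the
card was provisioned)::

    # does the management interface answer at all?
    ping -c 3 xenX-mgmt
    # RSA II cards normally expose an ssh interface as well
    ssh USERID@xenX-mgmt

If both of these respond, restarting from the web/ssh interface may be all
you need.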
- -Contact Information -=================== - -Owner - Fedora Infrastructure Team -Contact - #fedora-admin, sysadmin-main -Location - PHX, ibiblio -Servers - All physical IBM machines -Purpose - Provide remote management for our physical IBM machines - -Restarting the RSA II card -========================== - -Normally, the RSA II can be restarted from the web/ssh interface. If you -are locked out of any outside access to the RSA II, follow these -instructions on the physical machine. - -If the machine can be rebooted without issue, cut off all power to the -machine, wait a few seconds, and restart everything. - -Otherwise, to restart the card without rebooting the machine: - -1. Download and install the IBM Remote Supervisor Adapter II Daemon - - 1. ``yum install usbutils libusb-devel`` # (needed by the RSA II daemon) - - 2. Download the correct tarball from - http://www-947.ibm.com/systems/support/supportsite.wss/docdisplay?lndocid=MIGR-5071676&brandind=5000008 - (TODO: check if this can be packaged in Fedora) - - 3. Extract the tarball and run ``sudo ./install.sh --update`` - -2. Download and extract the IBM Advanced Settings Utility - http://www-947.ibm.com/systems/support/supportsite.wss/docdisplay?lndocid=TOOL-ASU&brandind=5000016 - - .. warning:: this tarball dumps files in the current working directory - -3. Issue a ``sudo ./asu64 rebootrsa`` to reboot the RSA II. - -4. Clean up: ``yum remove ibmusbasm64`` - -Other Resources -=============== - -http://www.redbooks.ibm.com/abstracts/sg246495.html may be a useful -resource to refer to when working with this. diff --git a/docs/sops/index.rst b/docs/sops/index.rst deleted file mode 100644 index 89fd5b6..0000000 --- a/docs/sops/index.rst +++ /dev/null @@ -1,142 +0,0 @@ -.. Fedora Infrastructure Best Practices documentation master file, created by - sphinx-quickstart on Wed Jan 25 17:17:34 2017. - You can adapt this file completely to your liking, but it should at least - contain the root `toctree` directive. - -.. _sops: - -Standard Operating Procedures -============================= - -Below is a table of contents containing all the standard operating procedures -for Fedora Infrastructure applications. For information on how to write a new -standard operating procedure, consult the guide on :ref:`develop-sops`. - -.. 
toctree:: - :maxdepth: 2 - :caption: Contents: - - 2-factor - accountdeletion - anitya - ansible - apps-fp-o - archive-old-fedora - arm - askbot - badges - basset - bastion-hosts-info - bladecenter - blockerbugs - bodhi - bugzilla2fedmsg - bugzilla - cloud - collectd - contenthosting - copr - cyclades - darkserver - database - datanommer - denyhosts - departing-admin - dns - fas-notes - fas-openid - fedmsg-certs - fedmsg-gateway - fedmsg-introduction - fedmsg-irc - fedmsg-new-message-type - fedmsg-relay - fedmsg-websocket - fedocal - fedorahosted-fedmsg - fedorahosted-project-cleanup - fedorahostedrename - fedorahosted-repo-setup - fedorahosted - fedorapackages - fedorapastebin - fedora-releases - fedorawebsites - fmn - freemedia - freenode-irc-channel - gather-easyfix - github2fedmsg - github - gitweb - guestdisk - guestedit - guestmigrate - haproxy - hosted_git_to_svn - hotfix - ibm-drive-replacement - ibm_rsa_ii - infra-git-repo - infra-hostrename - infra-raidmismatch - infra-repo - infra-retiremachine - infra-yubikey - ipsilon - iscsi - jenkins-fedmsg - kerneltest-harness - kickstarts - koji-builder-setup - koji - koschei - layered-image-buildsys - linktracking - loopabull - mailman - making-ssl-certificates - massupgrade - mastermirror - memcached - mirrorhiding - mirrormanager - mirrormanager-S3-EC2-netblocks - mote - nagios - netapp - new-hosts - nonhumanaccounts - nuancier - openvpn - orientation - outage - packagedatabase - pdc - pesign-upgrade - planetsubgroup - privatefedorahosted - publictest-dev-stg-production - rdiff-backup - requestforresources - resultsdb - reviewboard - scmadmin - selinux - sigul-upgrade - sshaccess - sshknownhosts - staging-infra - staging - stagingservers - status-fedora - syslog - taskotron - torrentrelease - unbound - virt-image - virtio - virt-notes - voting - wiki - zodbot diff --git a/docs/sops/infra-git-repo.rst b/docs/sops/infra-git-repo.rst deleted file mode 100644 index d4ce5d5..0000000 --- a/docs/sops/infra-git-repo.rst +++ /dev/null @@ -1,62 +0,0 @@ -.. title: Fedora Infrastructure Git Repo SOP -.. slug: infra-git -.. date: 2013-06-17 -.. taxonomy: Contributors/Infrastructure - -======================== -Infrastructure Git Repos -======================== - -Setting up an infrastructure git repo - and the push mechanisms for the -magicks - -We have a number of git repos (in /git on batcave) that manage files -for ansible, our docs, our common host info database and our kickstarts -This is a doc on how to setup a new one of these, if it is needed. 
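Before creating a new repo, it can help to look at how the existing ones are
set up. A quick read-only sketch, assuming shell access to batcave01 (the
repo name below is just an example)::

    # list the existing repos managed this way
    ls /git
    # inspect the ACLs on one of them for reference
    getfacl /git/infra-docs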
Contact Information
===================

Owner
    Fedora Infrastructure Team
Contact
    #fedora-admin, sysadmin-main
Location
    Phoenix
Servers
    batcave01.phx2.fedoraproject.org,
    batcave-comm01.qa.fedoraproject.org


Steps
=====
Create the bare repo::

    mkdir $git_dir
    setfacl -m d:g:$yourgroup:rwx -m d:g:$othergroup:rwx \
     -m g:$yourgroup:rwx -m g:$othergroup:rwx $git_dir

    cd $git_dir
    git init --bare


Edit up the config - add these lines to the bottom::

    [hooks]
    # (normally sysadmin-members@fedoraproject.org)
    mailinglist = emailaddress@yourdomain.org
    emailprefix =
    maildomain = fedoraproject.org
    reposource = /path/to/this/dir
    repodest = /path/to/where/you/want/the/files/dumped


Edit up the description - make it something useful::

    cd hooks
    rm -f *.sample
    cp hooks from /git/infra-docs/hooks/ on batcave01 to this path

Modify sudoers so that users in whatever groups can commit to this repo can
run /usr/local/bin/syncgittree.sh without entering a password.

diff --git a/docs/sops/infra-hostrename.rst b/docs/sops/infra-hostrename.rst
deleted file mode 100644
index d8b59ca..0000000
--- a/docs/sops/infra-hostrename.rst
+++ /dev/null
@@ -1,110 +0,0 @@
.. title: Infrastructure Host Rename SOP
.. slug: infra-host-rename
.. date: 2011-10-03
.. taxonomy: Contributors/Infrastructure

==============================
Infrastructure Host Rename SOP
==============================

This page is intended to guide you through the process of renaming a virtual
node.

Contents
========

1. Introduction
2. Finding out where the host is
3. Preparation
4. Renaming the Logical Volume
5. Doing the actual rename
6. Telling ansible about the new host
7. VPN Stuff

Introduction
============

Throughout this SOP, we will refer to the old hostname as $oldhostname and
the new hostname as $newhostname. We will refer to the Dom0 host that the vm
resides on as $vmhost.

If this process is being followed so that a temporary-named host can replace
a production host, please be sure to follow the [51]Infrastructure retire
machine SOP to properly decommission the old host before continuing.

Finding out where the host is
=============================

In order to rename the host, you must have access to the Dom0 (host) on
which the virtual server resides. To find out which host that is, log in to
batcave01 and run::

    grep $oldhostname /var/log/virthost-lists.out

The first column of the output will be the Dom0 of the virtual node.

Preparation
===========

SSH to $oldhostname. If the new name is replacing a production box, change
the IP address that it binds to in
``/etc/sysconfig/network-scripts/ifcfg-eth0``.

Also change the hostname in ``/etc/sysconfig/network``.

At this point, you can ``sudo poweroff`` $oldhostname.

Open an ssh session to $vmhost, and make sure that the node is listed as
``shut off``. If it is not, you can force it off with::

    virsh destroy $oldhostname

Renaming the Logical Volume
===========================
Find out the name of the logical volume (on $vmhost)::

    virsh dumpxml $oldhostname | grep 'source dev'

This will give you a line that looks like
``<source dev='/dev/VolGroup00/$oldhostname'/>``, which tells you that
``/dev/VolGroup00/$oldhostname`` is the path to the logical volume.
Run ``/usr/sbin/lvrename`` with the path that you found above as the first
argument, and the same path with $newhostname at the end instead of
$oldhostname as the second argument.

For example::

    /usr/sbin/lvrename /dev/VolGroup00/noc03-tmp /dev/VolGroup00/noc01

Doing the actual rename
=======================
Now that the logical volume has been renamed, we can rename the host in
libvirt.

Dump the configuration of $oldhostname into an xml file, by running::

    virsh dumpxml $oldhostname > $newhostname.xml

Open up $newhostname.xml, and change all instances of $oldhostname to
$newhostname.

Save the file and run::

    virsh define $newhostname.xml

If there are no errors above, you can undefine $oldhostname::

    virsh undefine $oldhostname

Power on $newhostname, with::

    virsh start $newhostname

And remember to set it to autostart::

    virsh autostart $newhostname


VPN Stuff
=========

TODO

diff --git a/docs/sops/infra-raidmismatch.rst b/docs/sops/infra-raidmismatch.rst
deleted file mode 100644
index d5cf5a9..0000000
--- a/docs/sops/infra-raidmismatch.rst
+++ /dev/null
@@ -1,75 +0,0 @@
.. title: Infrastructure Raid Mismatch Count SOP
.. slug: infra-raid-mismatch
.. date: 2011-10-03
.. taxonomy: Contributors/Infrastructure

======================================
Infrastructure/SOP/Raid Mismatch Count
======================================

What to do when a raid device has a mismatch count.

Contents
========
1. Contact Information
2. Description
3. Correction

   1. Step 1
   2. Step 2

Contact Information
===================

Owner
    Fedora Infrastructure Team

Contact
    #fedora-admin, sysadmin-main

Location
    All

Servers
    Physical hosts

Purpose
    Correct mismatch counts on software raid devices.

Description
===========
In some situations a raid device may indicate there is a count mismatch as
listed in::

    /sys/block/mdX/md/mismatch_cnt

Anything other than 0 is considered not good, though if the number is low
it's probably nothing to worry about. To correct this situation try the
directions below.

Correction
==========

More than anything these steps are to A) verify there is no problem and B)
make the error go away. If step 1 and step 2 don't correct the problems,
PROCEED WITH CAUTION. The steps below, however, should be relatively safe.


Issue a repair (replace mdX with the questionable raid device)::

    echo repair > /sys/block/mdX/md/sync_action

Depending on the size of the array and disk speed this can take a while.
Watch the progress with::

    cat /proc/mdstat

Issue a check. It's this check that will reset the mismatch count if there
are no problems. Again, replace mdX with your actual raid device::

    echo check > /sys/block/mdX/md/sync_action

Just as before, you can watch the progress with::

    cat /proc/mdstat

diff --git a/docs/sops/infra-repo.rst b/docs/sops/infra-repo.rst
deleted file mode 100644
index 535c995..0000000
--- a/docs/sops/infra-repo.rst
+++ /dev/null
@@ -1,109 +0,0 @@
.. title: Infrastructure RPM Repository SOP
.. slug: infra-repo
.. date: 2016-10-12
.. taxonomy: Contributors/Infrastructure

===========================
Infrastructure Yum Repo SOP
===========================

In some cases RPMs in Fedora need to be rebuilt for the Infrastructure team
to suit our needs. This repo is provided to the public (except for the RHEL
RPMs). Rebuilds go into this repo, which is stored on the netapp and shared
via the proxy servers after being built on koji.
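To see what a host is actually picking up from this repo, a quick read-only
sketch (the package name below is just an example)::

    # refresh metadata, then see which versions/repos provide the package
    yum clean all
    yum --showduplicates list the-new-hotness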
For basic instructions, read the standard documentation on the Fedora wiki:

- https://fedoraproject.org/wiki/Using_the_Koji_build_system

This document will only outline the differences between the "normal" repos
and the infra repos.


Contents
========

1. Contact Information
2. Building an RPM
3. Tagging an existing build
4. Koji package list

Contact Information
===================

Owner
    Fedora Infrastructure Team
Contact
    #fedora-admin
Location
    PHX [53]http://infrastructure.fedoraproject.org/
Servers
    koji
    batcave01 / Proxy Servers
Purpose
    Provides the infrastructure repo for custom Fedora Infrastructure rebuilds

Building an RPM
===============

Building an RPM for Infrastructure is significantly easier than building an
RPM for Fedora. Basically, get your SRPM ready, then submit it to koji for
building to the $repo-infra target (e.g. epel7-infra).

Example::

    rpmbuild --define "dist .el7" -bs test.spec
    koji build epel7-infra test-1.0-1.el7.src.rpm

.. note::
    Remember to build it for every dist / arch you need to deploy it on.

After it has been built, you will see it is tagged as $repo-infra-candidate;
this means that it is a candidate for being signed. The automatic signing
system will pick it up and sign the package for you without any further
intervention. You can track when this is done by checking the build info:
when it is moved from $repo-infra-candidate to $repo-infra, it has been
signed. You can check this on the web interface (look under "Tags"), or via::

    koji buildinfo test-1.0-1.el7

For importing it into the live repositories, you can just wait a few minutes.
There's a cronjob that runs every :00, :15, :30 and :45 that refreshes the
infrastructure repository with all packages that have been tagged. After this
time, you can ``yum clean all`` and then install the packages via yum install
or yum update.

Admins can also manually trigger that script via::

    /mnt/fedora/app/fi-repo/infra/update.sh


Tagging existing builds
=======================

If you already have a real build and want to use it in the infrastructure
before it has landed in stable, you can tag it into the respective
infra-candidate tag. For example, if you have an epel7 build of
test2-1.0-1.el7, run::

    koji tag epel7-infra-candidate test2-1.0-1.el7

And then the same autosigning and cronjob from the previous section applies.


Koji package list
=================

If you try to build a package into the infra tags, and koji says something
like::

    BuildError: package test not in list for tag epel7-infra-candidate

that means that the package has not been added to the list for building in
that particular tag. Either add the package to the respective Fedora/EPEL
branches (this is the preferred method, since we should always aim to get
everything packaged for Fedora/EPEL), or ask a koji admin to add the package
to the listing for the respective tag.

To list koji admins::

    koji list-history --permission=admin --active | grep grant

For koji admins, they can run::

    koji add-pkg $tag $package --owner=$user

diff --git a/docs/sops/infra-retiremachine.rst b/docs/sops/infra-retiremachine.rst
deleted file mode 100644
index b5dbbb8..0000000
--- a/docs/sops/infra-retiremachine.rst
+++ /dev/null
@@ -1,54 +0,0 @@
.. title: Infrastructure Machine Retirement SOP
.. slug: infra-machine-retirement
.. date: 2011-08-23
.. taxonomy: Contributors/Infrastructure
=================================
Infrastructure retire machine SOP
=================================

Owner:
    Fedora Infrastructure Team
Contact:
    #fedora-admin
Location:
    anywhere
Servers:
    any
Purpose:
    Makes sure decommissioning machines is done correctly

Introduction
============

When a machine (be it a virtual instance or real physical hardware) is
decommissioned, a set of steps must be followed to ensure that the machine is
properly removed from the set of machines we manage and doesn't cause
problems down the road.

Retire process
==============

1. Ensure that the machine is no longer used for anything. Use git-grep, stop
   services, etc.

2. Remove the machine from ansible. Make sure you not only remove the main
   machine name, but also any aliases it might have (or move them to an
   active server if they are active services). Make sure to search for the IP
   address(es) of the machine as well. Ensure dns is updated to remove the
   machine.

3. Remove the machine from any labels in hardware devices like consoles or
   the like.

4. Revoke the ansible cert for the machine.

5. Move the machine xml definition to ensure it does NOT start on boot. You
   can move it to 'name-retired-YYYY-MM-DD'.

6. Ensure any backend storage the machine was using is freed or renamed to
   name-retired-YYYY-MM-DD.

TODO
====
fill in commands

diff --git a/docs/sops/infra-yubikey.rst b/docs/sops/infra-yubikey.rst
deleted file mode 100644
index a11c994..0000000
--- a/docs/sops/infra-yubikey.rst
+++ /dev/null
@@ -1,147 +0,0 @@
.. title: Infrastructure Yubikey SOP
.. slug: infra-yubikey
.. date: 2011-10-03
.. taxonomy: Contributors/Infrastructure

==========================
Infrastructure/SOP/Yubikey
==========================

This document describes how yubikey authentication works.

Contents
========

1. Contact Information
2. User Information
3. Host Admins

   1. pam_yubico

4. Server Admins

   1. Basic architecture
   2. ykval
   3. ykksm
   4. Physical Yubikey info

5. fas integration

Contact Information
===================

Owner
    Fedora Infrastructure Team

Contact
    #fedora-admin, sysadmin-main

Location
    Phoenix

Servers
    fas*, db02

Purpose
    Provides yubikey authentication in Fedora

Config Files
============
* ``/etc/httpd/conf.d/yk-ksm.conf``
* ``/etc/httpd/conf.d/yk-val.conf``
* ``/etc/ykval/ykval-config.php``
* ``/etc/ykksm/ykksm-config.php``
* ``/etc/fas.cfg``

User Information
================

See [57]Infrastructure/Yubikey

Host Admins
===========

pam_yubico
----------

Generated from fas, the /etc/yubikeyid works like an authorized_keys file
and maps valid keys to users. It is downloaded from FAS:

[58]https://admin.fedoraproject.org/accounts/yubikey/dump

Server Admins
=============
Basic architecture
------------------
Yubikey authentication takes place in 3 basic phases.

1. User presses yubikey which generates a one time password
2. The one time password makes its way to the yk-val application which
   verifies it is not a replay
3. yk-val passes that otp on to the yk-ksm application which verifies the
   key itself is a valid key

If all of those steps succeed, the ykval application sends back an OK and
authentication is considered successful. The two applications are defined
below; if either of them is unavailable, yubikey authentication will fail.

ykval
`````

Database: db02:ykval

The database contains 3 tables:
clients
    just a valid client. These are not users; these are systems able to
    authenticate against ykval. In our case Fedora is the only client, so
    there's just one entry here
queue
    used for distributed setups (we don't do this)
yubikeys
    maps which yubikey belongs to which user

ykval is installed on fas* and is located at:
[59]http://localhost/yk-val/verify

Purpose: to map keys to users and protect against replay attacks.

ykksm
`````
Database: db02:ykksm

The database contains one table, yubikeys: it maps who created keys, what
key was created, when, the public name and serial number, whether it's
active, etc.

ykksm is installed on fas* at [60]http://localhost/yk-ksm

Purpose: verify if a key is a valid known key or not. Nothing contacts this
service directly except for ykval. This should be considered the "high
security" portion of the system, as access to this table would allow users
to make their own yubikeys.

Physical Yubikey info
`````````````````````

The actual yubikey contains information to generate a one time password. The
important bits to know are that the beginning of the otp contains the
identifier of the key (used similar to how ssh uses authorized_keys), and
that the rest of it contains lots of bits of information, including a serial
incremental.

Sample key: ``ccccfcdaivjrvdhvzfljbbievftnvncljhibkulrftt``

Breaking this up, the first 12 characters are the identifier. This can be
considered 'public':

ccccfcdaivj rvdhvzfljbbievftnvncljhibkulrftt

The second half is the otp part.

fas integration
===============
Fas integration has two main parts. First is key generation, the next is
activation. The fas-plugin-yubikey contains the bits for both, and
verification. Users call on this page to generate the key info:

[61]https://admin.fedoraproject.org/accounts/yubikey/genkey

The fas password field automatically detects whether someone is using an otp
or a regular password. It then sends otp requests to yk-val for verification.

diff --git a/docs/sops/ipsilon.rst b/docs/sops/ipsilon.rst
deleted file mode 100644
index a1c4d70..0000000
--- a/docs/sops/ipsilon.rst
+++ /dev/null
@@ -1,80 +0,0 @@
.. title: Ipsilon Infrastructure SOP
.. slug: infra-ipsilon
.. date: 2016-03-21
.. taxonomy: Contributors/Infrastructure

==========================
Ipsilon Infrastructure SOP
==========================



Contents
========

1. Contact Information
2. Description
3. Known Issues
4. Restarting
5. Configuration
6. Common actions

   6.1. Registering OpenID Connect Scopes

Contact Information
===================

Owner
    Fedora Infrastructure Team
Contact
    #fedora-admin
Location
    Phoenix
Servers
    ipsilon01.phx2.fedoraproject.org ipsilon02.phx2.fedoraproject.org
    ipsilon01.stg.phx2.fedoraproject.org

Purpose
    Ipsilon is our central authentication service, used to authenticate
    users against FAS. It is separate from FAS.

Description
===========

Ipsilon is our central authentication agent, used to authenticate users
against FAS. It is separate from FAS. The only service that is not using
this currently is the wiki. It is a web service that is presented via httpd
and is load balanced by our standard haproxy setup.

Known issues
============

No known issues at this time. There is not currently a logout option for
ipsilon, but it is not considered an issue. If group memberships are updated
in ipsilon, the user will need to wait a few minutes for them to replicate
to all the systems.
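Before restarting anything, it can help to confirm which backend is actually
unhealthy. A minimal ad-hoc check from batcave01 (the inventory group name
here is an assumption; use whatever group ansible defines for these hosts)::

    sudo ansible ipsilon -m command -a "systemctl is-active httpd"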
Restarting
==========

To restart the application you simply need to ssh to the servers for the
problematic region and issue a ``service httpd restart``. This should rarely
be required.

Configuration
=============

Configuration is handled by the ipsilon.yaml playbook in Ansible. This can
also be used to reconfigure the application, if that becomes necessary.

Common actions
==============
This section describes some common configuration actions.

OpenID Connect Scope Registration
---------------------------------
As documented on https://fedoraproject.org/wiki/Infrastructure/Authentication,
application developers can request their own scopes. When a request for this
comes in, look in ansible/roles/ipsilon/files/oidc_scopes/ and copy an
example module. Copy this to a new file, so we have a file per scope set.

Fill in the information:

- name is an Ipsilon-internal name. This should not include any spaces
- display_name is the name that is displayed for the category of scopes to
  the user
- scopes is a dictionary with the full scope identifier (with namespace) as
  keys. The values are dicts with the following keys:

  display_name
      The complete display name for this scope. This is what the user gets
      shown to accept/reject
  claims
      A list of additional "claims" (pieces of user information) an
      application will get when the user consents to this scope. For most
      scopes, this will be the empty list.

In ansible/roles/ipsilon/tasks/main.yml, add the name of the new file
(without .py) to the with_items of "Copy OpenID Connect scope registrations".

To enable, open ansible/roles/ipsilon/templates/configuration.conf, and look
for the lines starting with "openidc enabled extensions". Add the name of
the plugin (in the "name" field of the file) to the environment this
scopeset has been requested for.

Run the ansible ipsilon.yml playbook.

diff --git a/docs/sops/iscsi.rst b/docs/sops/iscsi.rst
deleted file mode 100644
index 5b9abf6..0000000
--- a/docs/sops/iscsi.rst
+++ /dev/null
@@ -1,132 +0,0 @@
.. title: Infrastructure iSCSI SOP
.. slug: infra-iscsi
.. date: 2011-08-23
.. taxonomy: Contributors/Infrastructure

=====
iSCSI
=====

iscsi allows one to share and mount block devices using the scsi protocol
over a network. Fedora currently connects to a netapp that has an iscsi
export.

Contents
========

1. Contact Information
2. Typical uses
3. iscsi basics

   1. Terms
   2. iscsi's basic login / logout procedure

4. Logging in
5. Logging out
6. Important note about creating new logical volumes

Contact Information
===================

Owner
    Fedora Infrastructure Team
Contact
    #fedora-admin, sysadmin-main
Location
    Phoenix
Servers
    xen[1-15]
Purpose
    Provides iscsi connectivity to our netapp.

Typical uses
============

The best uses for Fedora are for servers that are not part of a farm or live
replicated. For example, we wouldn't put app1 on the iscsi share because we
don't gain anything from it. Shutting down app1 to move it isn't an issue
because app1 is part of our application server farm.

noc1, however, is not replicated. It's a stand alone box that, at best,
would have a non-live failover. By placing this host on an iscsi share, we
can make it more highly available, as it allows us to move that box around
our virtualization infrastructure without rebooting it or even taking it
down.
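Before changing anything with the procedures below, the current state can be
inspected non-destructively from the initiator::

    # targets the initiator knows about
    iscsiadm --mode node
    # sessions that are currently logged in
    iscsiadm --mode session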
- -iscsi basics -============ - -Terms -------- - -* initiator means client -* target means server -* swab means mop -* deck means floor - -iscsi's basic login / logout procedure is -------------------------------------------- -1. Notify your client that a new target is available (similar to editing - /etc/fstab for a new nfs mount) -2. Login to the iscsi target (similar to running "mount /my/nfs" -3. Logout from the iscsi target (similar to running "umount /my/nfs" -4. Delete the target from the client (similar to removing the nfs mount - from /etc/fstab) - -Logging in -``````````` -Most mounts are covered by ansible so this should be automatic. In the -event that something goes wrong though, the best way to fix this is: - -- Notify the client of the target:: - - iscsiadm --mode node --targetname iqn.1992-08.com.netapp:sn.118047036 --portal 10.5.88.21:3260 -o new - -- Log in to the new target:: - - iscsiadm --mode node --targetname iqn.1992-08.com.netapp:sn.118047036 --portal 10.5.88.21:3260 --login - -- Scan and activate lvm:: - - pvscan - vgscan - vgchange -ay xenGuests - -Once this is done, one should be able to run "lvs" to see the logical -volumes - -Logging out -``````````` -Logging out isn't normally needed, for example rebooting a machine -automatically logs the initiator out. Should a problem arise though here -are the steps: - -- Disable the logical volume:: - - vgchange -an xenGuests - -- log out:: - - iscsiadm --mode node --targetname iqn.1992-08.com.netapp:sn.118047036 --portal 10.5.88.21:3260 --logout - -.. note:: ``Cannot deactivate volume group`` - - If the vgchange command fails with an error about not being able to - deactivate the volume group, this means that one of the logical volumes is - still in use. By running "lvs" you can get a list of volume groups. Look - in the Attr column. There are 6 attrs listed. The 5th column usually has a - '-' or an 'a'. 'a' means its active, - means it is not. To the right of - that (the last column) you will see an '-' or an 'o'. If you see an 'o' - that means that logical volume is still mounted and in use. - -.. important:: Note about creating new logical volumes - - At present we do not have logical volume locking on the xen servers. This - is dangerous and being worked on. Basically when you create a new volume - on a host, you need to run:: - - pvscan - vgscan - lvscan - - on the other virtualization servers. diff --git a/docs/sops/jenkins-fedmsg.rst b/docs/sops/jenkins-fedmsg.rst deleted file mode 100644 index f45324b..0000000 --- a/docs/sops/jenkins-fedmsg.rst +++ /dev/null @@ -1,49 +0,0 @@ -.. title: Jenkins Fedmsg SOP -.. slug: infra-jenkins-fedmsg -.. date: 2016-05-11 -.. taxonomy: Contributors/Infrastructure - -================== -Jenkins Fedmsg SOP -================== - -Send information about Jenkins builds to fedmsg. - -Contact Information -------------------- - -Owner - Ricky Elrod, Fedora Infrastructure Team -Contact - #fedora-apps - -Reinstalling when it disappears -------------------------------- - -For an as-of-yet unknown reason, the plugin sometimes seems to disappear, -though it still shows as "installed" on Jenkins. - -To re-install it, grab `fedmsg.hpi` from `/srv/web/infra/bigfiles/jenkins`. -Go to the Jenkins web interface and log in. Click `Manage Jenkins` -> -`Manage Plugins` -> `Advanced`. Upload the plugin and on the page that comes -up, check the box to have Jenkins restart when running jobs are finished. 
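One way to confirm the plugin is emitting again after the reinstall is to
watch the bus from any fedmsg-configured host; a sketch (the grep pattern is
just an example)::

    fedmsg-tail --really-pretty | grep -i jenkins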
- -Configuration Values --------------------- - -These are written here in case the Jenkins configuration ever gets lost. -This is how to configure the jenkins-fedmsg-emit plugin. - -Assume the plugin is already installed. - -Go to "Configure Jenkins" -> "System Configuration" - -Towards the bottom, look for "Fedmsg Emitter" - -Values: - -Signing: Checked -Fedmsg Endpoint: tcp://209.132.181.16:9941 -Environment Shortname: prod -Certificate File: /etc/pki/fedmsg/jenkins-jenkins.fedorainfracloud.org.crt -Keystore File: /etc/pki/fedmsg/jenkins-jenkins.fedorainfracloud.org.key diff --git a/docs/sops/kerneltest-harness.rst b/docs/sops/kerneltest-harness.rst deleted file mode 100644 index 49d9959..0000000 --- a/docs/sops/kerneltest-harness.rst +++ /dev/null @@ -1,79 +0,0 @@ -.. title: Kerneltest-harness SOP -.. slug: infra-kerneltest-harness -.. date: 2016-03-14 -.. taxonomy: Contributors/Infrastructure - -====================== -Kerneltest-harness SOP -====================== - -The kerneltest-harness is the web application used to gather and present -statistics about kernel test results. - -Contents -======== - -1. Contact Information -2. Documentation Links - -Contact Information -=================== - -Owner - Fedora Infrastructure Team -Contact - #fedora-admin -Location - https://apps.fedoraproject.org/kerneltest/ -Servers - kerneltest01, kerneltest01.stg -Purpose - Provide a system to gather and present kernel tests results - - -Add a new Fedora release -======================== - -* Login - -* On the front page, in the menu on the left side, if there is a `Fedora - Rawhide` release, click on `(edit)`. - -* Bump the `Release number` on `Fedora Rawhide` to avoid conflicts with the new - release you're creating - -* Back on the index page, click on `New release` - -* Complete the form: - - Release number - This would be the integer version of the Fedora release, for example 24 for - Fedora 24. - - Support - The current status of the Fedora release - - Rawhide for Fedora Rawhide - - Test for branched release - - Release for released Fedora - - Retired for retired release of Fedora - - -Upload new test results -======================= - -The kernel tests are available on the `kernel-test -`_ git repository. - -Once ran with `runtests.sh`, you can upload the resulting file either using -`fedora_submit.py` or the UI. - -If you choose the UI the steps are simply: - -* Login - -* Click on `Upload` in the main menu on the top - -* Select the result file generated by running the tests - -* Submit - diff --git a/docs/sops/kickstarts.rst b/docs/sops/kickstarts.rst deleted file mode 100644 index e06b6d9..0000000 --- a/docs/sops/kickstarts.rst +++ /dev/null @@ -1,169 +0,0 @@ -.. title: Infrastructure Kickstart SOP -.. slug: infra-kickstart -.. date: 2016-02-08 -.. taxonomy: Contributors/Infrastructure - -============================ -Kickstart Infrastructure SOP -============================ - -Kickstart scripts provide our install infrastructure. We have a -plethora of different kickstarts to best match the system you are trying -to install. - -Contact Information -=================== - -Owner - Fedora Infrastructure Team -Contact - #fedora-admin, sysadmin-main -Location - Everywhere we have machines. -Servers - batcave01 (stores kickstarts and install media) -Purpose - Provides our install infrastructure - -Introduction -============ - -Our kickstart infrastructure lives on batcave01. All -install media and kickstart scripts are located on batcave01. 
Because the -RHEL binaries are not public we have these bits blocked. You can add -needed IPs to (from batcave01):: - - ansible/roles/batcave/files/allows - -Physical Machine (kvm virthost) -====================================== - -.. note:: PXE Booting - - If PXE booting just follow the prompt after doing the pxe boot (most hosts - will pxeboot via console hitting f12). - -Prep ----- - -This only works on an already booted box, many boxes at our colocations -may have to be rebuilt by the people in those locations first. Also make -sure the IP you are about to boot to install from is allowed to our IP -restricted infrastructure.fedoraproject.org as noted above (in -Introduction). - -Download the vmlinuz and initrd images. - -for a rhel6 install:: - - wget https://infrastructure.fedoraproject.org/repo/rhel/RHEL6-x86_64/images/pxeboot/vmlinuz \ - -O /boot/vmlinuz-install - wget https://infrastructure.fedoraproject.org/repo/rhel/RHEL6-x86_64/images/pxeboot/initrd.img \ - -O /boot/initrd-install.img - - grubby --add-kernel=/boot/vmlinuz-install \ - --args="ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-6-nohd \ - repo=https://infrastructure.fedoraproject.org/repo/rhel/RHEL6-x86_64/ \ - ksdevice=link ip=$IP gateway=$GATEWAY netmask=$NETMASK dns=$DNS" \ - --title="install el6" --initrd=/boot/initrd-install.img - -for a rhel7 install:: - - wget https://infrastructure.fedoraproject.org/repo/rhel/RHEL7-x86_64/images/pxeboot/vmlinuz -O /boot/vmlinuz-install - wget https://infrastructure.fedoraproject.org/repo/rhel/RHEL7-x86_64/images/pxeboot/initrd.img -O /boot/initrd-install.img - -For phx2 hosts:: - - grubby --add-kernel=/boot/vmlinuz-install \ - --args="ks=http://10.5.126.23/repo/rhel/ks/hardware-rhel-7-nohd \ - repo=http://10.5.126.23/repo/rhel/RHEL7-x86_64/ \ - net.ifnames=0 biosdevname=0 bridge=br0:eth0 ksdevice=br0 \ - ip={{ br0_ip }}::{{ gw }}:{{ nm }}:{{ hostname }}:br0:none" \ - --title="install el7" --initrd=/boot/initrd-install.img - -(You will need to setup the br1 device if any after install) - -For non phx2 hosts:: - - grubby --add-kernel=/boot/vmlinuz-install \ - --args="ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-ext \ - repo=https://infrastructure.fedoraproject.org/repo/rhel/RHEL7-x86_64/ \ - net.ifnames=0 biosdevname=0 bridge=br0:eth0 ksdevice=br0 \ - ip={{ br0_ip }}::{{ gw }}:{{ nm }}:{{ hostname }}:br0:none" \ - --title="install el7" --initrd=/boot/initrd-install.img - -Fill in the br0 ip, gateway, etc - -The default here is to use the hardware-rhel-7-nohd config which requires -you to connect via VNC to the box and configure its drives. If this is a -new machine or you are fine with blowing everything away, you can instead -use https://infrastructure.fedoraproject.org/rhel/ks/hardware-rhel-6-minimal -as your kickstart - -If you know the number of hard drives the system has there are other -kickstarts which can be used. 
2 disk system::

    ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-02disk

or external::

    ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-02disk-ext

4 disk system::

    ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-04disk

or external::

    ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-04disk-ext

6 disk system::

    ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-06disk

or external::

    ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-06disk-ext

8 disk system::

    ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-08disk

or external::

    ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-08disk-ext

10 disk system::

    ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-10disk

or external::

    ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-10disk-ext


Double and triple check your configuration settings (on RHEL-6 ``cat
/boot/grub/menu.lst`` and on RHEL-7 ``cat /boot/grub2/grub.cfg``), especially
your IP information. In places like ServerBeach not all hosts have the same
netmask or gateway. Once everything is ready, run the commands to set it up
to boot on the next boot.

RHEL-6::

    echo "savedefault --default=0 --once" | grub --batch
    shutdown -r now

RHEL-7::

    grub2-reboot 0
    shutdown -r now

Installation
------------

Once the box logs you out, start pinging the IP address. It will disappear
and come back. Once you can ping it again, try to open up a VNC session. It
can take a couple of minutes after the box is back up for it to actually
allow vnc sessions. The VNC password is in the kickstart script on
batcave01::

    grep vnc /mnt/fedora/app/fi-repo/rhel/ks/hardware-rhel-7-nohd

    vncviewer $IP:1

If using the standard kickstart script, one can watch as the install
completes itself; there should be no need to do anything. If using the
hardware-rhel-6-nohd script, one will need to configure the drives. The
password is in the kickstart file in the kickstart repo.

Post Install
------------
Run ansible on the box asap to set root passwords and other security
features. Don't leave a newly installed box sitting around.

diff --git a/docs/sops/koji-builder-setup.rst b/docs/sops/koji-builder-setup.rst
deleted file mode 100644
index 841955e..0000000
--- a/docs/sops/koji-builder-setup.rst
+++ /dev/null
@@ -1,138 +0,0 @@
.. title: Infrastructure Koji Builder SOP
.. slug: infra-koji-builder
.. date: 2012-11-29
.. taxonomy: Contributors/Infrastructure

======================
Setup Koji Builder SOP
======================

Contents
========

- Setting up a new koji builder
- Resetting/installing an old koji builder

Builder Setup
=============
Setting up a new koji builder involves a goodly number of steps:

Network Overview
----------------

1. First get an instance spun up following the kickstart sop.

2. Define a hostname for it on the .125 network and a $hostname-nfs name for
   it on the .127 network.

3. Make sure the instance has 2 network connections:

   - eth0 should be on the .125 network
   - eth1 should be on the .127 network

   For a VM, eth0 should be on br0, eth1 on br1 on the vmhost.
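Before starting the install, it's worth a quick sanity check that both
bridges actually exist on the vmhost (bridge names as above)::

    # both bridges should be listed with their uplink interfaces
    brctl show br0
    brctl show br1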
Setup Overview
--------------

- install the system as normal::

    virt-install -n $builder_fqdn -r $memsize \
     -f $path_to_lvm --vcpus=$numprocs \
     -l http://10.5.126.23/repo/rhel/RHEL6-x86_64/ \
     -x "ksdevice=eth0 ks=http://10.5.126.23/repo/rhel/ks/kvm-rhel-6 \
     ip=$ip netmask=$netmask gateway=$gw dns=$dns \
     console=tty0 console=ttyS0" \
     --network=bridge=br0 --network=bridge=br1 \
     --vnc --noautoconsole

- run ``python /root/tmp/setup-nfs-network.py`` - this should print out the
  nfs hostname that you made above

- change root pw

- disable selinux on the machine in /etc/sysconfig/selinux

- reboot

- setup ssl cert into private/builders - use fqdn of host as DN

  - login to fas01 as root
  - ``cd /var/lib/fedora-ca``
  - run::

        ./kojicerthelper.py normal --outdir=/tmp/ \
            --name=$fqdn_of_the_new_builder --cadir=. --caname=Fedora

  - info for the cert should be like this::

        Country Name (2 letter code) [US]:
        State or Province Name (full name) [North Carolina]:
        Locality Name (eg, city) [Raleigh]:
        Organization Name (eg, company) [Fedora Project]:
        Organizational Unit Name (eg, section) []:Fedora Builders
        Common Name (eg, your name or your servers hostname) []:$fqdn_of_new_builder
        Email Address []:buildsys@fedoraproject.org

  - scp the file in ``/tmp/${fqdn}_key_and_cert.pem`` over to batcave01

  - put the file in the private repo under ``private/builders/${fqdn}.pem``

  - ``git add`` + ``git commit``

  - ``git push``


- run ``./sync-hosts`` in the infra-hosts repo; ``git commit; git push``

- as a koji admin run::

    koji add-host $fqdn i386 x86_64

  (note: those are yum basearchs on the end - season to taste)


Resetting/installing an old koji builder
----------------------------------------

- disable the builder in koji (ask a koji admin)
- halt the old system (halt -p)
- undefine the vm instance on the buildvmhost::

    virsh undefine $builder_fqdn

- reinstall it - from the buildvmhost run::

    virt-install -n $builder_fqdn -r $memsize \
     -f $path_to_lvm --vcpus=$numprocs \
     -l http://10.5.126.23/repo/rhel/RHEL6-x86_64/ \
     -x "ksdevice=eth0 ks=http://10.5.126.23/repo/rhel/ks/kvm-rhel-6 \
     ip=$ip netmask=$netmask gateway=$gw dns=$dns \
     console=tty0 console=ttyS0" \
     --network=bridge=br0 --network=bridge=br1 \
     --vnc --noautoconsole

- watch the install via vnc::

    vncviewer -via bastion.fedoraproject.org $builder_fqdn:1

- when the install finishes:

  - start the instance on the buildvmhost::

      virsh start $builder_fqdn

  - set it to autostart on the buildvmhost::

      virsh autostart $builder_fqdn

- when the guest comes up

  - login via ssh using the temp root password
  - python /root/tmp/setup-nfs-network.py
  - change root password
  - disable selinux in /etc/sysconfig/selinux
  - reboot
  - ask a koji admin to re-enable the host

diff --git a/docs/sops/koji.rst b/docs/sops/koji.rst
deleted file mode 100644
index 7de7eea..0000000
--- a/docs/sops/koji.rst
+++ /dev/null
@@ -1,212 +0,0 @@
.. title: Koji Infrastructure SOP
.. slug: infra-koji
.. date: 2011-10-03
.. taxonomy: Contributors/Infrastructure

=======================
Koji Infrastructure SOP
=======================

.. note::
    We are transitioning from two buildsystems, koji for Fedora and plague
    for EPEL, to just using koji. This page documents both.

Koji and plague are our buildsystems. They share some of the same machines
to do their work.

Contents
========

1. Contact Information
2. Description
3. Add packages into Buildroot
4. Troubleshooting and Resolution

   1. Restarting Koji
   2. kojid won't start or some builders won't connect
   3. OOM (Out of Memory) Issues

      1. Increase Memory
      2. Decrease weight

   4. Disk Space Issues

5. Should there be mention of being sure filesystems in chroots are
   unmounted before you delete the chroots?

Contact Information
===================

Owner
    Fedora Infrastructure Team

Contact
    #fedora-admin, sysadmin-build group

Persons
    mbonnet, dgilmore, f13, notting, mmcgrath, SmootherFrOgZ

Location
    Phoenix

Servers
    - koji.fedoraproject.org
    - buildsys.fedoraproject.org
    - xenbuilder[1-4]
    - hammer1, ppc[1-4]

Purpose
    Build packages for Fedora.

Description
===========

Users submit builds to koji.fedoraproject.org or buildsys.fedoraproject.org.
From there it gets passed on to the builders.

.. important::
    At present plague and koji are unaware of each other. A result of this
    may be an overloaded builder. An easy fix for this is not clear at this
    time.

Add packages into Buildroot
===========================

Some contributors may have the need to build packages against freshly built
packages which are not in the buildroot yet. Koji has override tags as an
inheritance to the build tag in order to include them in the buildroot,
which can be set by::

    koji tag-pkg dist-$release-override

Troubleshooting and Resolution
==============================

Restarting Koji
---------------

If for some reason koji needs to be restarted, make sure to restart the koji
master first, then the builders. If the koji master has been down for a
short enough time, the builders do not need to be restarted::

    service httpd restart
    service kojira restart
    service kojid restart

.. important::
    If postgres becomes interrupted in some way, koji will need to be
    restarted. As long as the koji master daemon gets restarted the builders
    should reconnect automatically. If the db server has been restarted and
    the builders don't seem to be building, restart their daemons as well.

kojid won't start or some builders won't connect
------------------------------------------------

In the event that some builders are able to connect to koji while some are
not, please make sure that the database is not filled up on connections.
This is common if koji crashes and the db connections aren't properly
cleared. Upon restart many of the connections are full so koji cannot
reconnect. Clearing old connections is easy: guess about how long the new
koji has been up, pick a number of minutes larger than that, and kill those
queries. From db3 as postgres run::

    echo "select procpid from pg_stat_activity where usename='koji' and now() - query_start \
    >= '00:40:00' order by query_start;" | psql koji | grep "^ " | xargs kill

OOM (Out of Memory) Issues
--------------------------

Out of memory issues occur from time to time on the build machines. There
are a couple of options for correction. The first fix is to just restart the
machine and hope it was a one time thing. If the problem continues please
choose from one of the following options.

Increase Memory
```````````````

The xen machines can have memory increased on their corresponding xen hosts.
At present this is the table:

+----------+-------------+
| xen3     | xenbuilder1 |
+----------+-------------+
| xen4     | xenbuilder2 |
+----------+-------------+
| disabled | xenbuilder3 |
+----------+-------------+
| xen8     | xenbuilder4 |
+----------+-------------+

Edit ``/etc/xen/xenbuilder[1-4]`` and add more memory.

Decrease weight
```````````````

Each builder has a weight as to how much work can be given to it. Presently
the only way to alter the weight is actually changing the database on db3::

    $ sudo su - postgres
    -bash-2.05b$ psql koji
    koji=# select * from host limit 1;
     id | user_id | name                   | arches    | task_load | capacity | ready | enabled
    ----+---------+------------------------+-----------+-----------+----------+-------+---------
      6 |     130 | ppc3.fedora.redhat.com | ppc ppc64 |       1.5 |        4 | t     | t
    (1 row)
    koji=# update host set capacity=2 where name='ppc3.fedora.redhat.com';

Simply update capacity to a lower number.

Disk Space Issues
-----------------

The builders use a lot of temporary storage. Failed builds also get left on
the builders; most should get cleaned, but plague does not. The easiest
thing to do is remove some older cache dirs.

Step one is to turn off both koji and plague::

    /etc/init.d/plague-builder stop
    /etc/init.d/kojid stop

Next check to see what file system is full::

    df -h

.. important::
    If any one of the following directories is full, send an outage
    notification as outlined in [62]Infrastructure/OutageTemplate to the
    fedora-infrastructure-list and fedora-devel-list, then contact Mike
    McGrath

    - /mnt/koji
    - /mnt/ntap-fedora1/scratch
    - /pub/epel
    - /pub/fedora

Typically just / will be full. The next thing to do is determine if we have
any extremely large builds left on the builder. Typical locations include
/var/lib/mock and /mnt/build (/mnt/build actually is on the local
filesystem)::

    du -sh /var/lib/mock/* /mnt/build/*

``/var/lib/mock/dist-f8-build-10443-1503``
    classic koji build
``/var/lib/mock/fedora-6-ppc-core-57cd31505683ef1afa533197e91608c5a2c52864``
    classic plague build

If nothing jumps out immediately, just start deleting files older than one
week. Once enough space has been freed start koji and plague back up::

    /etc/init.d/plague-builder start
    /etc/init.d/kojid start

Unmounting
----------

.. warning::
    Should there be mention of being sure filesystems in chroots are
    unmounted before you delete the chroots?

    Res ipsa loquitur.

diff --git a/docs/sops/koschei.rst b/docs/sops/koschei.rst
deleted file mode 100644
index b7b5418..0000000
--- a/docs/sops/koschei.rst
+++ /dev/null
@@ -1,219 +0,0 @@
.. title: Koschei SOP
.. slug: infra-koschei
.. date: 2016-09-29
.. taxonomy: Contributors/Infrastructure

===========
Koschei SOP
===========

Koschei is a continuous integration system for RPM packages. Koschei runs
package scratch builds after a dependency change or after time elapses and
reports package buildability status to interested parties.

Production instance: https://apps.fedoraproject.org/koschei
Staging instance: https://apps.stg.fedoraproject.org/koschei

Contents
--------
1. Contact information
2. Deployment
3. Description
4. Configuration
5. Disk usage
6. Database
7. Managing koschei services
8. Suspending koschei operation
9. Limiting Koji usage
10. Fedmsg notifications
11. Setting admin announcement
12. Adding package groups
13. 
-
-Contact Information
--------------------
-Owner
-    mizdebsk, msimacek
-Contact
-    #fedora-admin
-Location
-    Fedora Cloud
-Purpose
-    continuous integration system
-
-
-Deployment
-----------
-::
-
-    sudo rbac-playbook groups/koschei-backend.yml
-    sudo rbac-playbook groups/koschei-web.yml
-
-Description
------------
-Koschei is deployed on two separate machines - koschei-backend and koschei-web
-
-Frontend (koschei-web) is a Flask WSGI application running with httpd.
-It displays information to users and allows editing package groups and
-changing priorities.
-
-Backend (koschei-backend) consists of multiple services:
-
-- koschei-watcher - listens to fedmsg events for complete builds and
-  changes build states in the database. It additionally listens to
-  repo-done events, which are enqueued to be processed by
-  koschei-resolver
-
-- koschei-resolver - resolves package dependencies in a given repo using
-  hawkey and compares them with the previous iteration to get a dependency
-  diff. There are two types of resolutions:
-
-  - build resolution - resolves a complete build in the repo in which it
-    was done on Koji. Produces the dependency differences visible in the
-    frontend.
-  - new repo resolution - resolves all packages in the newest repo available
-    in Koji. The output is a base for scheduling new builds.
-
-- koschei-scheduler - schedules new builds based on multiple criteria:
-
-  - dependency priority - dependency changes since the last build, valued
-    by their distance in the dependency graph.
-  - manual and static priorities - set manually in the frontend. Manual
-    priority is reset after each build; static priority persists.
-  - time priority - time since the last build (logarithmic formula)
-
-- koschei-polling - polls the same types of events as koschei-watcher
-  without reliance on fedmsg
-
-
-Configuration
--------------
-Koschei configuration is in ``/etc/koschei/config-backend.cfg`` and
-``/etc/koschei/config-frontend.cfg``, and is merged with the default
-configuration in ``/usr/share/koschei/config.cfg`` (the files in etc
-override the defaults in usr). Note the merge is recursive. The
-configuration contains all configurable items for all Koschei services
-and the frontend. Alterations to the configuration that aren't
-temporary should be done through the ansible playbook. Configuration
-changes have no effect on already running services -- they need to be
-restarted, which happens automatically when using the playbook.
-
-
-Disk usage
-----------
-Koschei doesn't keep anything on disk that couldn't be recreated
-easily - all important data is stored in the PostgreSQL database,
-configuration is managed by Ansible, code is installed by RPM, and so on.
-
-To speed up operation and reduce load on external servers, Koschei
-caches some data obtained from the services it integrates with. Most
-notably, YUM repositories downloaded from Koji are kept in
-``/var/cache/koschei/repodata``. Each repository takes about 100 MB
-of disk space. The maximum number of repositories kept at a time is
-controlled by the ``cache_l2_capacity`` parameter in
-``config-backend.cfg`` (``config-backend.cfg.j2`` in Ansible). If the
-repodata cache starts to consume too much disk space, that value can
-be decreased - after a restart, koschei-resolver will remove the least
-recently used cache entries to respect the configured cache capacity.
-
-
-Database
---------
-Koschei needs to connect to a PostgreSQL database; other database
-systems are not supported. The database connection is specified in the
-configuration under the "database_config" key, which can contain the
-following keys: username, password, host, port, database.
-
-After an update of koschei, the database needs to be migrated to the new
-schema. This is handled using alembic::
-
-    alembic -c /usr/share/koschei/alembic.ini upgrade head
-
-The backend services need to be stopped during the migration.
-
-
-Managing koschei services
--------------------------
-Koschei services are systemd units managed through systemctl. They can
-be started and stopped independently in any order. The frontend is run
-using httpd.
-
-
-Suspending koschei operation
-----------------------------
-To stop builds from being scheduled, stopping the koschei-scheduler
-service is enough. For planned Koji outages, it's recommended to stop
-koschei-scheduler. It is not strictly necessary, as koschei can recover
-from Koji errors and network errors automatically, but when Koji
-builders are stopped, it may cause unexpected build failures that would
-be reported to users. Other services can be left running, as they
-automatically restart themselves on Koji and network errors.
-
-
-Limiting Koji usage
--------------------
-Koschei is by default limited to 30 concurrently running builds. This
-limit can be changed in the configuration under the
-"koji_config"/"max_builds" key. There's also Koji load monitoring, which
-prevents builds from being scheduled when the Koji load is higher than a
-certain threshold. That should prevent scheduling builds during mass
-rebuilds, so it's not necessary to stop scheduling during those.
-
-
-Fedmsg notifications
---------------------
-Koschei optionally supports sending fedmsg notifications for package
-state changes. The fedmsg dispatch can be turned on and off in the
-configuration (key "fedmsg-publisher"/"enabled"). Koschei doesn't supply
-configuration for fedmsg; it lets the library load its own (in
-/etc/fedmsg.d/).
-
-
-Setting admin announcement
---------------------------
-Koschei can display an announcement in the web UI. This is mostly useful
-to inform users about outages or other problems.
-
-To set an announcement, run as the koschei user::
-
-    koschei-admin set-notice "Koschei operation is currently suspended due to scheduled Koji outage"
-
-or::
-
-    koschei-admin set-notice "Submitting scratch builds by Koschei is currently disabled due to Fedora 23 mass rebuild"
-
-To clear the announcement, run as the koschei user::
-
-    koschei-admin clear-notice
-
-
-Adding package groups
----------------------
-Packages can be added to one or more groups. Currently, only Koschei
-admins can add new groups.
-
-To add a new group named "mynewgroup", run as the koschei user::
-
-    koschei-admin add-group mynewgroup
-
-To add a new group named "mynewgroup" and populate it with some
-packages, run as the koschei user::
-
-    koschei-admin add-group mynewgroup pkg1 pkg2 pkg3
-
-
-Set package static priority
----------------------------
-Some packages are more or less important and can have a higher or lower
-priority. Any user can change the manual priority, which is reset after
-the package is rebuilt. Admins can additionally set a static priority,
-which is not affected by package rebuilds.
-
-To set the static priority of package "foo" to the value "100", run as
-the koschei user::
-
-    koschei-admin set-priority --static foo 100
diff --git a/docs/sops/layered-image-buildsys.rst b/docs/sops/layered-image-buildsys.rst
deleted file mode 100644
index 650cf3d..0000000
--- a/docs/sops/layered-image-buildsys.rst
+++ /dev/null
@@ -1,283 +0,0 @@
-.. title: Layered Image Build System
-.. slug: layered-image-buildsys
-.. date: 2016-12-15
-.. taxonomy: Contributors/Infrastructure
-
-==========================
-Layered Image Build System
-==========================
-
-The `Fedora Layered Image Build System`_, often referred to as `OSBS`_
-(OpenShift Build Service) after the upstream project it is based on,
-is used to build Layered Container Images in the Fedora Infrastructure
-via Koji.
-
-
-Contents
-========
-
-1. Contact Information
-2. Overview
-3. Setup
-4. Outage
-
-
-Contact Information
-===================
-
-Owner
-    Adam Miller (maxamillion)
-
-Contact
-    #fedora-admin, #fedora-releng, #fedora-noc, sysadmin-main, sysadmin-releng
-
-Location
-    osbs-control01, osbs-master01, osbs-node01, osbs-node02
-    registry.fedoraproject.org, candidate-registry.fedoraproject.org
-
-    osbs-control01.stg, osbs-master01.stg, osbs-node01.stg, osbs-node02.stg
-    registry.stg.fedoraproject.org, candidate-registry.stg.fedoraproject.org
-
-    x86_64 koji buildvms
-
-Purpose
-    Layered Container Image Builds
-
-
-Overview
-========
-
-The build system is set up such that Fedora Layered Image maintainers
-submit a build to Koji via the ``fedpkg container-build`` command in a
-``docker`` namespace within `DistGit`_. This triggers the build to be
-scheduled in `OpenShift`_ via the `osbs-client`_ tooling, which creates a
-custom `OpenShift Build`_ using the pre-made buildroot `Docker`_ image that
-we have created. The `Atomic Reactor`_ (``atomic-reactor``) utility runs
-within the buildroot and preps the build container where the actual build
-action executes; it also uploads the `Content Generator`_ metadata back to
-`Koji`_ and uploads the built image to the candidate docker registry. This
-runs on a host with iptables rules restricting access to the docker bridge,
-which is how we further limit the access of the buildroot to the outside
-world, verifying that all sources of information come from Fedora.
-
-Completed layered image builds are hosted in a candidate docker registry,
-which is then used to pull the image and perform tests with `Taskotron`_.
-The taskotron tests are triggered by a `fedmsg`_ message that is emitted
-from `Koji`_ once the build is complete. Once the test is complete,
-taskotron sends a fedmsg which is then caught by the `RelEng Automation`_
-Engine, which runs the Automatic Release tasks in order to push the layered
-image into a stable docker registry in the production space for end users
-to consume.
-
-For more information, please consult the `RelEng Architecture Document`_.
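-
-As a rough illustration of the submission step described above, a
-maintainer's session might look like the following sketch (the
-``docker/cockpit`` repository name is purely illustrative)::
-
-    # Clone the container's dist-git repository from the docker namespace,
-    # then submit a layered image build to Koji, which schedules it in OSBS
-    fedpkg clone docker/cockpit
-    cd cockpit
-    fedpkg container-build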
-
-
-Setup
-=====
-
-The Layered Image Build System setup is currently as follows (a more
-detailed view is available in the `RelEng Architecture Document`_):
-
-::
-
-    === Layered Image Build System Overview ===
-
-    +--------------+                  +-----------+
-    |              |                  |           |
-    |   koji hub   |                  |  batcave  |
-    |              |                  |           |
-    +------+-------+                  +-----+-----+
-           |                                |
-           V                                V
-    +--------------+               +----------------+
-    |              |               |                |
-    | koji builder |               | osbs-control01 |
-    |              |               |                |
-    +------+-------+               +---------+------+
-           |                                 |
-           V                             [ansible]
-    +------------------+                     |
-    |                  |<--------------------+
-    |  osbs-master01   |                     |
-    |                  |                     |
-    +---+-----------+--+                     |
-        |           |                        |
-        V           V                        |
-    +-------------+ +-------------+          |
-    |             | |             |          |
-    | osbs-node01 | | osbs-node02 |          |
-    |             | |             |          |
-    +------+------+ +------+------+          |
-           ^               ^                 |
-           |               |                 |
-           +---------------+-----------------+
-
-
-Deployment
-----------
-The OpenShift cluster that OSBS runs on top of is configured by the
-`ansible-ansible-openshift-ansible`_ role, which is called from the
-`osbs-cluster`_ playbook on the osbs-control01 host.
-
-
-Operation
----------
-Koji Hub schedules the containerBuild on a koji builder via the
-koji-containerbuild-hub plugin. The builder then submits the build to
-OpenShift via the koji-containerbuild-builder plugin, which uses the
-osbs-client python API that wraps the OpenShift API, along with a custom
-OpenShift Build JSON payload.
-
-The Build is then scheduled in OpenShift and its logs are captured by the
-koji plugins. Inside the buildroot, atomic-reactor will upload the built
-container image as well as provide the metadata to koji's content
-generator.
-
-
-Outage
-======
-
-If Koji is down, builds can't be scheduled, but repairing Koji is outside
-the scope of this document.
-
-If either the candidate-registry.fedoraproject.org or
-registry.fedoraproject.org Container Registry is unavailable, builds will
-be affected, but repairing those is also outside the scope of this
-document.
-
-OSBS Failures
--------------
-
-The OpenShift Build System itself can have various types of failures that
-are known about, and the recovery procedures are listed below.
-
-Ran out of disk space
-~~~~~~~~~~~~~~~~~~~~~
-
-Docker uses a lot of disk space, and while the osbs-nodes have been
-allotted what is considered to be ample disk space for builds (since they
-are automatically cleaned up periodically), it is possible this will run
-out.
-
-To resolve this, run the following commands:
-
-::
-
-    # These commands will clean up old/dead docker containers from old
-    # OpenShift Pods
-
-    $ for i in $(sudo docker ps -a | awk '/Exited/ { print $1 }'); do sudo docker rm $i; done
-
-    $ for i in $(sudo docker images -q -f 'dangling=true'); do sudo docker rmi $i; done
-
-
-    # This command should only be run on osbs-master01 (it won't work on the
-    # nodes)
-    #
-    # This command will clean up old builds and related artifacts in OpenShift
-    # that are older than 30 days (We can get more aggressive about this if
-    # necessary, the main reason these still exist is in the event we need to
-    # debug something. All build info we care about is stored in Koji.)
-
-    $ oadm prune builds --orphans --keep-younger-than=720h0m0s --confirm
-
-A node is broken, how to remove it from the cluster?
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-If a node is having an issue, the following command will effectively remove
-it from the cluster temporarily by marking it unschedulable.
-
-In this example, we are removing osbs-node01
-
-::
-
-    $ oadm manage-node osbs-node01.phx2.fedoraproject.org --schedulable=false
-
-
-Container Builds are unable to access resources on the network
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Sometimes the Container Builds will fail and the logs will show that the
-buildroot is unable to access networked resources (docker registry, dnf
-repos, etc).
-
-This is because of a bug in OpenShift v1.3.1 (the current upstream release
-at the time of this writing) where an OpenVSwitch flow is left behind when
-a Pod is destroyed, instead of the flow being deleted along with the Pod.
-
-Confirming the issue is unfortunately multi-step, since it's not
-a cluster-wide issue but isolated to the node experiencing the problem.
-
-First, in the koji createContainer task there is a log file called
-openshift-incremental.log, and in there you will find a key:value pair in
-some JSON output similar to the following:
-
-::
-
-    'openshift_build_selflink': u'/oapi/v1/namespaces/default/builds/cockpit-f24-6'
-
-
-The last field of the value, in this example ``cockpit-f24-6``, is the
-OpenShift build identifier. We need to ssh into ``osbs-master01`` and get
-information about which node that ran on.
-
-::
-
-    # On osbs-master01
-    # Note: the output won't be pretty, but it gives you the info you need
-
-    $ sudo oc get build cockpit-f24-6 -o yaml | grep osbs-node
-
-
-Once you know what machine you need, ssh into it and run the following:
-
-::
-
-    $ sudo docker run --rm -ti buildroot /bin/bash
-
-    # now attempt to run a curl command
-
-    $ curl https://google.com
-    # This should get refused, but if this node is experiencing the networking
-    # issue then this command will hang and eventually time out
-
-How to fix:
-
-Reboot the affected node that's experiencing the issue. When the node comes
-back up, OpenShift will rebuild the flow tables on OpenVSwitch and things
-will be back to normal.
-
-::
-
-    systemctl reboot
-
-
-
-
-
-.. CITATIONS/LINKS
-.. _fedmsg: http://www.fedmsg.com/en/latest/
-.. _Koji: https://fedoraproject.org/wiki/Koji
-.. _Docker: https://github.com/docker/docker/
-.. _OpenShift: https://www.openshift.org/
-.. _Taskotron: https://taskotron.fedoraproject.org/
-.. _docker-registry: https://docs.docker.com/registry/
-.. _RelEng Automation: https://pagure.io/releng-automation
-.. _osbs-client: https://github.com/projectatomic/osbs-client
-.. _docker-distribution: https://github.com/docker/distribution/
-.. _Atomic Reactor: https://github.com/projectatomic/atomic-reactor
-.. _DistGit:
-   https://fedoraproject.org/wiki/Infrastructure/VersionControl/dist-git
-.. _OpenShift Build:
-   https://docs.openshift.org/latest/dev_guide/builds.html
-.. _Content Generator:
-   https://fedoraproject.org/wiki/Koji/ContentGenerators
-.. _RelEng Architecture Document:
-   https://docs.pagure.org/releng/layered_image_build_service.html
-.. _osbs-cluster:
-   https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/playbooks/groups/osbs-cluster.yml
-.. _ansible-ansible-openshift-ansible:
-   https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/roles/ansible-ansible-openshift-ansible
diff --git a/docs/sops/linktracking.rst b/docs/sops/linktracking.rst
deleted file mode 100644
index 67470aa..0000000
--- a/docs/sops/linktracking.rst
+++ /dev/null
@@ -1,79 +0,0 @@
-.. title: Link Tracking SOP
-.. slug: infra-link-tracking
-.. date: 2011-10-03
-.. taxonomy: Contributors/Infrastructure
-
-=============
-Link tracking
-=============
-
-Using link tracking is an easy way for us to find out how people are
-getting to our download page. People might click over to our download page
-from any of a number of areas, and knowing the relative usage of those
-links can help us understand which of the materials we produce are more
-effective than others.
-
-Adding links
-============
-
-Each link should be constructed by adding ? to the URL, followed by a
-short code that includes:
-
-* an indicator for the link source (such as the wiki release notes)
-* an indicator for the specific Fedora release (such as F15 for the
-  final, or F15a for the Alpha test release)
-
-So a link to get.fp.o from the one-page release notes would become
-http://get.fedoraproject.org/?opF15.
-
-FAQ
-===
-I want to copy a link to my status update for social networking, or my blog.
-    If you're posting a status update to identi.ca, for example, use
-    the link tracking code for status updates. Don't copy a link
-    straight from an announcement that already carries the announcement's
-    link tracking code. You can copy the link itself, but remember to
-    change the portion after the ? to instead use the st code for status
-    updates and blogs, followed by the Fedora release version (such as
-    F16a, F16b, or F16), like this::
-
-        http://fedoraproject.org/get-prerelease?stF16a
-
-I want to point people to the announcement from my blog. Should I use the announcement link tracking code?
-    The actual URL link itself is the announcement URL. Add the link
-    tracking code for blogs, which would start with ?st and end with
-    the Fedora release version, like this::
-
-        http://fedoraproject.org/wiki/F16_release_announcement?stF16a
-
-The codes
-=========
-
-.. note::
-    Additions to this table are welcome.
-
-=============================================== ==========
-Link source                                     Code
-=============================================== ==========
-Email announcements                             an
-Wiki announcements                              wkan
-Front page                                      fp
-Front page of wiki                              wkfp
-The press release Red Hat makes                 rhpr
-http://redhat.com/fedora                        rhf
-Test phase release notes on the wiki            wkrn
-Official release notes                          rn
-Official installation guide                     ig
-One-page release notes                          op
-Status links (blogs, social media)              st
-=============================================== ==========
-
diff --git a/docs/sops/loopabull.rst b/docs/sops/loopabull.rst
deleted file mode 100644
index 4f8a12a..0000000
--- a/docs/sops/loopabull.rst
+++ /dev/null
@@ -1,122 +0,0 @@
-.. title: Loopabull
-.. slug: loopabull
-.. date: 2017-01-17
-.. taxonomy: Contributors/Infrastructure
-
-
-.. ##########################################################################
-.. NOTE: This document is currently under construction. The service described
-   herein is not yet in production.
-.. ##########################################################################
-
-
-=========
-Loopabull
-=========
-
-`Loopabull`_ is an event-driven `Ansible`_-based automation engine. It is
-used for various tasks, originally slated for `Release Engineering
-Automation`_.
-
-Contents
-========
-
-1. Contact Information
-2. Overview
-3. Setup
-4. Outage
-
-
-Contact Information
-===================
-
-Owner
-    Adam Miller (maxamillion)
-
-Contact
-    #fedora-admin, #fedora-releng, #fedora-noc, sysadmin-main, sysadmin-releng
-
-Location
-    TBD
-
-Purpose
-    Event Driven Automation of tasks within the Fedora Infrastructure and
-    Fedora Release Engineering
-
-
-Overview
-========
-
-The `loopabull`_ system is set up such that when an event takes place
-within the infrastructure and a `fedmsg`_ is sent, loopabull consumes that
-message, triggers an `Ansible`_ `playbook`_ that shares a name with the
-fedmsg topic, and provides the payload of the fedmsg to the playbook as
-`extra variables`_.
-
-
-Setup
-=====
-
-The setup is relatively simple; the Overview above describes it, and a more
-detailed version can be found in the `releng docs`_.
-
-::
-
-    +-----------------+          +-------------------------------+
-    |                 |          |                               |
-    |     fedmsg      +--------->|            Looper             |
-    |                 |          |    (fedmsg handler plugin)    |
-    +-----------------+          +-------------------------------+
-                                                 |
-    +-------------------+                        |
-    |                   |                        |
-    |     Loopabull     |<-----------------------+
-    |    (Event Loop)   |
-    +---------+---------+
-              |
-              V
-    +----------------------+
-    |                      |
-    |   ansible-playbook   |
-    |                      |
-    +----------------------+
-
-Deployment
-----------
-
-TBD
-
-
-Outage
-======
-
-In the event that loopabull isn't responding or isn't running playbooks as
-it should be, the following scenarios should be approached.
-
-Network Interruption
---------------------
-
-Sometimes if the network is interrupted, the loopabull service will hang
-because the fedmsg listener will hold a dead socket open. The service
-simply needs to be restarted at that point.
-
-::
-
-    systemctl restart loopabull.service
-
-.. CITATIONS/LINKS
-.. _Ansible: https://www.ansible.com/
-.. _fedmsg: http://www.fedmsg.com/en/latest/
-.. _loopabull: https://github.com/maxamillion/loopabull
-.. _playbook: http://docs.ansible.com/ansible/playbooks.html
-.. _Release Engineering Automation: https://pagure.io/releng-automation
-.. _releng docs: https://docs.pagure.org/releng/automation_engine.html
-.. _extra variables:
-   https://github.com/ansible/ansible/blob/devel/docs/man/man1/ansible-playbook.1.asciidoc.in
diff --git a/docs/sops/mailman.rst b/docs/sops/mailman.rst
deleted file mode 100644
index b171357..0000000
--- a/docs/sops/mailman.rst
+++ /dev/null
@@ -1,119 +0,0 @@
-.. title: Mailman Infrastructure SOP
-.. slug: infra-mailmain
-.. date: 2016-10-07
-.. taxonomy: Contributors/Infrastructure
-
-==========================
-Mailman Infrastructure SOP
-==========================
-
-Contact Information
-===================
-
-Owner
-    Fedora Infrastructure Team
-
-Contact
-    #fedora-admin, sysadmin-main, sysadmin-tools, sysadmin-hosted
-
-Location
-    phx2
-
-Servers
-    mailman01, mailman02, mailman01.stg
-
-Purpose
-    Provides mailing list services.
-
-Description
-===========
-
-Mailing list services for Fedora projects are located on the
-mailman01.phx2.fedoraproject.org server.
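-
-A quick generic sanity check from the host itself is to confirm that both
-the mailman core and the MTA are running (a sketch, not an official
-monitoring procedure)::
-
-    # on mailman01: verify the mailman and postfix services are up
-    sudo service mailman3 status
-    sudo service postfix status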
-
-Common Tasks
-============
-
-Creating a new mailing list
----------------------------
-
-* Log into mailman01
-* ``sudo -u mailman mailman3 create <listname>@lists.fedora(project|hosted).org --owner <owner>@fedoraproject.org --notify``
-
-  .. note ::
-     Note that list names should make sense, and not contain the words 'fedora'
-     or 'list' - the fact that it has to do with Fedora and that it's a list
-     are both obvious from the domain of the email address.
-
-  .. important::
-     Please make sure to add a valid description to the newly
-     created list (to avoid [no description available] on the listinfo index).
-
-Removing content from archives
-==============================
-
-We don't.
-
-It's not easy to remove content from the archives, and it's generally
-useless as well because the archives are often mirrored by third parties
-as well as being in the INBOXs of all of the people on the mailing list at
-that time. Here's an example message to send to someone who requests
-removal of archived content::
-
-    Greetings,
-
-    We're sorry to say that we don't remove content from the mailing list archives.
-    Doing so is a non-trivial amount of work and usually doesn't achieve anything
-    because the content has already been disseminated to a wide audience that we do
-    not control. The emails have gone out to all of the subscribers of the mailing
-    list at that time and also (for a great many of our lists) been copied by third
-    parties (for instance: http://markmail.org and http://gmane.org).
-
-    Sorry we cannot help further,
-
-    Mailing lists and their owners
-
-Checking Ownership
-==================
-
-If you need to check who owns a certain mailing list without having to
-search around on the lists' front pages, mailman has a tool that will get
-you the answer in a few seconds.
-
-Get a full list of all the mailing lists hosted on the server (either
-fedoraproject.org or fedorahosted.org)::
-
-    sudo /usr/lib/mailman/bin/list_admins -a
-
-See which lists are owned by example@example.com::
-
-    sudo /usr/lib/mailman/bin/list_admins -a | grep example@example.com
-
-Troubleshooting and Resolution
-==============================
-
-List Administration
--------------------
-
-Specific users are marked as 'site admins' in the database.
-
-Please file an issue if you feel you need to have this access.
-
-Restart Procedure
------------------
-
-If the server needs to be restarted, mailman should come back on its own.
-Otherwise, each service on it can be restarted::
-
-    sudo service mailman3 restart
-    sudo service postfix restart
-
-How to delete a mailing list
-============================
-
-Delete a list, but keep the archives::
-
-    sudo -u mailman mailman3 remove <listname>
-
diff --git a/docs/sops/making-ssl-certificates.rst b/docs/sops/making-ssl-certificates.rst
deleted file mode 100644
index 6b39f52..0000000
--- a/docs/sops/making-ssl-certificates.rst
+++ /dev/null
@@ -1,57 +0,0 @@
-.. title: Infrastructure SSL Certificate Creation SOP
-.. slug: infra-ssl-create
-.. date: 2012-07-17
-.. taxonomy: Contributors/Infrastructure
-
-============================
-SSL Certificate Creation SOP
-============================
-
-Every now and then you will need to create an SSL certificate for a
-Fedora Service.
-
-Creating a CSR for a new server
-===============================
-
-Know your hostname, e.g. ``lists.fedoraproject.org``::
-
-    export ssl_name=<hostname>
-
-Create the cert. 8192-bit keys do not work with various boxes, so we use
-4096 currently::
-
-    openssl genrsa -out ${ssl_name}.pem 4096
-    openssl req -new -key ${ssl_name}.pem -out ${ssl_name}.csr
-
-    Country Name (2 letter code) [XX]:US
-    State or Province Name (full name) []:NM
-    Locality Name (eg, city) [Default City]:Raleigh
-    Organization Name (eg, company) [Default Company Ltd]:Red Hat
-    Organizational Unit Name (eg, section) []:Fedora Project
-    Common Name (eg, your name or your server's hostname) []:lists.fedorahosted.org
-    Email Address []:admin@fedoraproject.org
-
-    Please enter the following 'extra' attributes
-    to be sent with your certificate request
-    A challenge password []:
-    An optional company name []:
-
-Send the CSR to the signing authority and wait for a cert. Place all three
-files into the private directory so that you can make certs in the future.
-
-Creating a temporary self-signed certificate
-============================================
-
-Repeat the steps above but add in the following::
-
-    openssl x509 -req -days 30 -in ${ssl_name}.csr -signkey ${ssl_name}.pem -out ${ssl_name}.cert
-    Signature ok
-    subject=/C=US/ST=NM/L=Raleigh/O=Red Hat/OU=Fedora Project/CN=lists.fedorahosted.org/emailAddress=admin@fedoraproject.org
-    Getting Private key
-
-We only want a self-signed certificate to be good for a short time, so 30
-days sounds good.
diff --git a/docs/sops/massupgrade.rst b/docs/sops/massupgrade.rst
deleted file mode 100644
index 40f491a..0000000
--- a/docs/sops/massupgrade.rst
+++ /dev/null
@@ -1,438 +0,0 @@
-.. title: Mass Upgrade Infrastructure SOP
-.. slug: infra-mass-upgrade
-.. date: 2013-07-29
-.. taxonomy: Contributors/Infrastructure
-
-===============================
-Mass Upgrade Infrastructure SOP
-===============================
-
-Every once in a while, we need to apply mass upgrades to our servers for
-various security and other upgrades.
-
-Contents
---------
-
-1. Contact Information
-2. Preparation
-3. Staging
-4. Special Considerations
-
-   * Disable builders
-   * Post reboot action
-   * Schedule autoqa01 reboot
-   * Bastion01 and Bastion02 and openvpn server
-   * Special yum directives
-
-5. Update Leader
-6. Group A reboots
-7. Group B reboots
-8. Group C reboots
-9. Doing the upgrade
-10. Doing the reboot
-11. Aftermath
-
-Contact Information
--------------------
-
-Owner:
-    Fedora Infrastructure Team
-Contact:
-    #fedora-admin, sysadmin-main,
-    infrastructure@lists.fedoraproject.org, #fedora-noc
-Location:
-    All over the world.
-Servers:
-    all
-Purpose:
-    Apply kernel/other upgrades to all of our servers
-
-Preparation
-===========
-
-1. Determine which host group you are going to be doing updates/reboots
-   on.
-
-   Group "A" - servers that end users will see or note being down
-   and anything that depends on them.
-   Group "B" - servers that contributors will see or note being
-   down and anything that depends on them.
-   Group "C" - servers that infrastructure will notice are down,
-   or are redundant enough to reboot some with others taking the
-   load.
-
-2. Appoint an 'Update Leader' for the updates.
-3. Follow the Outage Infrastructure SOP and send advance notification
-   to the appropriate lists. Try to schedule the update at a time when
-   many admins are around to help/watch for problems and when impact for
-   the group affected is less. Do NOT do multiple groups on the same day
-   if possible.
-4. Plan an order for rebooting the machines considering two factors:
-
-   * Location of systems on the kvm or xen hosts. [You will normally
-     reboot all systems on a host together]
-   * Impact of systems going down on other services, operations and
-     users. Since the database servers and nfs servers are the
-     backbone of many other systems, they and the systems that are on the
-     same xen boxes would be rebooted before other boxes.
-
-5. To aid in organizing a mass upgrade/reboot with many people helping,
-   it may help to create a checklist of machines in a gobby document.
-6. Schedule downtime in nagios.
-7. Make doubly sure that various app owners are aware of the reboots.
-
-Staging
-=======
-
-Any updates that can be tested in staging or a pre-production environment
-should be tested there first, including new kernels, updates to core
-database applications / libraries, web applications, libraries, etc.
-
-Special Considerations
-======================
-
-While this may not be a complete list, here are some special things that
-must be taken into account before rebooting certain systems:
-
-Disable builders
-----------------
-
-Before the following machines are rebooted, all koji builders should be
-disabled and all running jobs allowed to complete:
-
-    * db04
-    * nfs01
-    * kojipkgs02
-
-Builders can be removed from koji, updated and re-added. Use::
-
-    koji disable-host NAME
-
-and::
-
-    koji enable-host NAME
-
-.. note:: you must be a koji admin
-
-Additionally, rel-eng and builder boxes may need a special version of rpm.
-Make sure to check with rel-eng on any rpm upgrades for them.
-
-Post reboot action
-------------------
-
-The following machines require post-boot actions (mostly entering
-passphrases). Make sure admins that have the passphrases are on hand for
-the reboot:
-
-    * backup-2 (LUKS passphrase on boot)
-    * sign-vault01 (NSS passphrase for sigul service)
-    * sign-bridge01 (NSS passphrase for sigul bridge service)
-    * serverbeach* (requires fixing firewall rules):
-
-Each serverbeach host needs 3 or 4 iptables rules added anytime it's
-rebooted or libvirt is upgraded::
-
-    iptables -I FORWARD -o virbr0 -j ACCEPT
-    iptables -I FORWARD -i virbr0 -j ACCEPT
-    iptables -t nat -I POSTROUTING -s 192.168.122.3/32 -j SNAT --to-source 66.135.62.187
-
-.. note:: The source is the internal guest ip; the to-source is the
-   external ip that maps to that guest ip. If there are multiple guests,
-   each one needs the above SNAT rule inserted.
-
-Schedule autoqa01 reboot
-------------------------
-There is currently an autoqa01.c host on cnode01. Check with QA folks
-before rebooting this guest/host.
-
-Bastion01 and Bastion02 and openvpn server
-------------------------------------------
-
-We need one of the bastion machines to be up to provide openvpn for all
-machines. Before rebooting bastion02, modify the
-``manifests/nodes/bastion0*.phx2.fedoraproject.org.pp`` files to start the
-openvpn server on bastion01, wait for all clients to re-connect, reboot
-bastion02 and then revert back to it as the openvpn hub.
-
-Special yum directives
-----------------------
-
-Sometimes we will wish to exclude or otherwise modify the yum.conf on a
-machine. For this purpose, all machines have an include, making them read
-http://infrastructure.fedoraproject.org/infra/hosts/FQHN/yum.conf.include
-from the infrastructure repo. If you need to make such changes, add them
-to the infrastructure repo before doing updates.
-
-Update Leader
-=============
-
-Each update should have a Leader appointed. This person will be in charge
-of doing any read-write operations, and delegating to others to do tasks.
-If you aren't specifically asked by the Leader to reboot or change
-something, please don't. The Leader will assign out machine groups to
-reboot, or ask specific people to look at machines that didn't come back
-up from reboot or aren't working right after reboot. It's important to
-avoid multiple people operating on a single machine in a read-write manner
-and interfering with changes.
-
-Group A reboots
-===============
-
-Group A machines are end user critical ones. Outages here should be
-planned at least a week in advance and announced to the announce list.
-
-List of machines currently in the A group (note: this is going to be
-automated):
-
-These hosts are grouped based on the virt host they reside on:
-
-* torrent02.fedoraproject.org
-* ibiblio02.fedoraproject.org
-
-* people03.fedoraproject.org
-* ibiblio03.fedoraproject.org
-
-* collab01.fedoraproject.org
-* serverbeach09.fedoraproject.org
-
-* db05.phx2.fedoraproject.org
-* virthost03.phx2.fedoraproject.org
-
-* db01.phx2.fedoraproject.org
-* virthost04.phx2.fedoraproject.org
-
-* db-fas01.phx2.fedoraproject.org
-* proxy01.phx2.fedoraproject.org
-* virthost05.phx2.fedoraproject.org
-
-* ask01.phx2.fedoraproject.org
-* virthost06.phx2.fedoraproject.org
-
-These are the rest:
-
-* bapp02.phx2.fedoraproject.org
-* bastion02.phx2.fedoraproject.org
-* app05.fedoraproject.org
-* backup02.fedoraproject.org
-* bastion01.phx2.fedoraproject.org
-* fas01.phx2.fedoraproject.org
-* fas02.phx2.fedoraproject.org
-* log02.phx2.fedoraproject.org
-* memcached03.phx2.fedoraproject.org
-* noc01.phx2.fedoraproject.org
-* ns02.fedoraproject.org
-* ns04.phx2.fedoraproject.org
-* proxy04.fedoraproject.org
-* smtp-mm03.fedoraproject.org
-* batcave02.phx2.fedoraproject.org
-* mm3test.fedoraproject.org
-* packages02.phx2.fedoraproject.org
-
-Group B reboots
----------------
-This group contains machines that contributors use. Announcements of
-outages here should be made at least a week in advance and sent to the
-devel-announce list.
-
-These hosts are grouped based on the virt host they reside on:
-
-* db04.phx2.fedoraproject.org
-* bvirthost01.phx2.fedoraproject.org
-
-* nfs01.phx2.fedoraproject.org
-* bvirthost02.phx2.fedoraproject.org
-
-* pkgs01.phx2.fedoraproject.org
-* bvirthost03.phx2.fedoraproject.org
-
-* kojipkgs02.phx2.fedoraproject.org
-* bvirthost04.phx2.fedoraproject.org
-
-These are the rest:
-
-* koji04.phx2.fedoraproject.org
-* releng03.phx2.fedoraproject.org
-* releng04.phx2.fedoraproject.org
-
-Group C reboots
----------------
-Group C are machines that infrastructure uses, or that can be rebooted in
-such a way as to continue to provide services to others via multiple
-machines. Outages here should be announced on the infrastructure list.
-
-Group C hosts that have proxy servers on them:
-
-* proxy02.fedoraproject.org
-* ns05.fedoraproject.org
-* hosted-lists01.fedoraproject.org
-* internetx01.fedoraproject.org
-
-* app01.dev.fedoraproject.org
-* darkserver01.dev.fedoraproject.org
-* fakefas01.fedoraproject.org
-* proxy06.fedoraproject.org
-* osuosl01.fedoraproject.org
-
-* proxy07.fedoraproject.org
-* bodhost01.fedoraproject.org
-
-* proxy03.fedoraproject.org
-* smtp-mm02.fedoraproject.org
-* tummy01.fedoraproject.org
-
-* app06.fedoraproject.org
-* noc02.fedoraproject.org
-* proxy05.fedoraproject.org
-* smtp-mm01.fedoraproject.org
-* telia01.fedoraproject.org
-
-* app08.fedoraproject.org
-* proxy08.fedoraproject.org
-* coloamer01.fedoraproject.org
-
-Other Group C hosts:
-
-* ask01.stg.phx2.fedoraproject.org
-* app02.stg.phx2.fedoraproject.org
-* proxy01.stg.phx2.fedoraproject.org
-* releng01.stg.phx2.fedoraproject.org
-* value01.stg.phx2.fedoraproject.org
-* virthost13.phx2.fedoraproject.org
-
-* db-fas01.stg.phx2.fedoraproject.org
-* pkgs01.stg.phx2.fedoraproject.org
-* packages01.stg.phx2.fedoraproject.org
-* virthost11.phx2.fedoraproject.org
-
-* app01.stg.phx2.fedoraproject.org
-* koji01.stg.phx2.fedoraproject.org
-* db02.stg.phx2.fedoraproject.org
-* fas01.stg.phx2.fedoraproject.org
-* virthost10.phx2.fedoraproject.org
-
-
-* autoqa01.qa.fedoraproject.org
-* autoqa-stg01.qa.fedoraproject.org
-* bastion-comm01.qa.fedoraproject.org
-* batcave-comm01.qa.fedoraproject.org
-* virthost-comm01.qa.fedoraproject.org
-
-* compose-x86-01.phx2.fedoraproject.org
-
-* compose-x86-02.phx2.fedoraproject.org
-
-* download01.phx2.fedoraproject.org
-* download02.phx2.fedoraproject.org
-* download03.phx2.fedoraproject.org
-* download04.phx2.fedoraproject.org
-* download05.phx2.fedoraproject.org
-
-* download-rdu01.vpn.fedoraproject.org
-* download-rdu02.vpn.fedoraproject.org
-* download-rdu03.vpn.fedoraproject.org
-
-* fas03.phx2.fedoraproject.org
-* secondary01.phx2.fedoraproject.org
-* memcached04.phx2.fedoraproject.org
-* virthost01.phx2.fedoraproject.org
-
-* app02.phx2.fedoraproject.org
-* value03.phx2.fedoraproject.org
-* virthost07.phx2.fedoraproject.org
-
-* app03.phx2.fedoraproject.org
-* value04.phx2.fedoraproject.org
-* ns03.phx2.fedoraproject.org
-* darkserver01.phx2.fedoraproject.org
-* virthost08.phx2.fedoraproject.org
-
-* app04.phx2.fedoraproject.org
-* packages02.phx2.fedoraproject.org
-* virthost09.phx2.fedoraproject.org
-
-* hosted03.fedoraproject.org
-* serverbeach06.fedoraproject.org
-
-* hosted04.fedoraproject.org
-* serverbeach07.fedoraproject.org
-
-* collab02.fedoraproject.org
-* serverbeach08.fedoraproject.org
-
-* dhcp01.phx2.fedoraproject.org
-* relepel01.phx2.fedoraproject.org
-* sign-bridge02.phx2.fedoraproject.org
-* koji03.phx2.fedoraproject.org
-* bvirthost05.phx2.fedoraproject.org
-
-* (disable each builder in turn, update and re-enable)
-* ppc11.phx2.fedoraproject.org
-* ppc12.phx2.fedoraproject.org
-
-* backup03
-
-Doing the upgrade
-=================
-
-If possible, system upgrades should be done in advance of the reboot (with
-relevant testing of new packages on staging). To do the upgrades, make
-sure that the Infrastructure RHEL repo is updated as necessary to pull in
-the new packages (see the Infrastructure Yum Repo SOP).
-
-On batcave01, as root run::
-
-    func-yum [--host=hostname] update
-
-.. note:: --host can be specified multiple times and takes wildcards.
-
-Ping people as necessary if you are unsure about any packages.
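-
-For example, to update one specific host plus a wildcarded set of hosts in
-a single pass (the hostnames here are purely illustrative)::
-
-    # --host may be repeated and may contain wildcards
-    func-yum --host=noc01.phx2.fedoraproject.org --host='app0*' update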
-
-Additionally you can see which machines still need to be rebooted with::
-
-    sudo func-command --timeout=10 --oneline /usr/local/bin/needs-reboot.py | grep yes
-
-You can also see which machines would need a reboot if updates were all
-applied with::
-
-    sudo func-command --timeout=10 --oneline /usr/local/bin/needs-reboot.py after-updates | grep yes
-
-Doing the reboot
-================
-
-In the order determined above, reboots will usually be grouped by the
-virtualization hosts that the servers are on. You can see the guests per
-virt host on batcave01 in /var/log/virthost-lists.out
-
-To reboot sets of boxes based on which virthost they are on, we've written
-a special script to facilitate it::
-
-    func-vhost-reboot virthost-fqdn
-
-ex::
-
-    sudo func-vhost-reboot virthost13.phx2.fedoraproject.org
-
-Aftermath
-=========
-
-1. Make sure that everything's running fine
-2. Re-enable nagios notifications as needed
-3. Make sure to perform any manual post-boot setup (such as entering
-   passphrases for encrypted volumes)
-4. Close the outage ticket.
-
-
-Non-virthost reboots
---------------------
-
-If you need to reboot specific hosts and make sure they recover, consider
-using::
-
-    sudo func-host-reboot hostname hostname1 hostname2 ...
-
-If you want to reboot the hosts one at a time, waiting for each to come
-back before rebooting the next, pass -o to func-host-reboot.
-
-
-
diff --git a/docs/sops/mastermirror.rst b/docs/sops/mastermirror.rst
deleted file mode 100644
index fced0fc..0000000
--- a/docs/sops/mastermirror.rst
+++ /dev/null
@@ -1,81 +0,0 @@
-.. title: Master Mirror Infrastructure SOP
-.. slug: infra-master-mirror
-.. date: 2011-12-22
-.. taxonomy: Contributors/Infrastructure
-
-================================
-Master Mirror Infrastructure SOP
-================================
-
-Contents
-========
-
-1. Contact Information
-2. PHX Master Mirror Setup
-3. RDU I2 Master Mirror Setup
-4. Raising Issues
-
-
-Contact Information
-===================
-
-Owner:
-    Red Hat IS
-Contact:
-    #fedora-admin, Red Hat ticket
-Location:
-    PHX
-Servers:
-    server[1-5].download.phx.redhat.com
-Purpose:
-    Provides the master mirrors for Fedora distribution
-
-
-PHX Master Mirror Setup
-=======================
-
-The master mirrors are accessible from the outside as::
-
-    download1.fedora.redhat.com -> CNAME to download3.fedora.redhat.com
-    download2.fedora.redhat.com -> currently no DNS entry
-    download3.fedora.redhat.com -> 209.132.176.20
-    download4.fedora.redhat.com -> 209.132.176.220
-    download5.fedora.redhat.com -> 209.132.176.221
-
-download.fedora.redhat.com is a round robin to the above IPs.
-
-The external IPs correspond to internal load balancer IPs that balance
-between server[1-5]::
-
-    209.132.176.20  -> 10.9.24.20
-    209.132.176.220 -> 10.9.24.220
-    209.132.176.221 -> 10.9.24.221
-
-The load balancers then balance between the below Fedora IPs on the rsync
-servers::
-
-    10.8.24.21 (fedora1.download.phx.redhat.com) - server1.download.phx.redhat.com
-    10.8.24.22 (fedora2.download.phx.redhat.com) - server2.download.phx.redhat.com
-    10.8.24.23 (fedora3.download.phx.redhat.com) - server3.download.phx.redhat.com
-    10.8.24.24 (fedora4.download.phx.redhat.com) - server4.download.phx.redhat.com
-    10.8.24.25 (fedora5.download.phx.redhat.com) - server5.download.phx.redhat.com
-
-
-RDU I2 Master Mirror Setup
-==========================
-
-.. note:: This section is awaiting confirmation from RH - information here
-   may not be 100% accurate yet.
-
-download-i2.fedora.redhat.com (rhm-i2.redhat.com) is a round robin
-between::
-
-    204.85.14.3 - 10.11.45.3
-    204.85.14.5 - 10.11.45.5
-
-
-Raising Issues
-==============
-
-Issues with any of this setup should be raised in a helpdesk ticket.
diff --git a/docs/sops/memcached.rst b/docs/sops/memcached.rst
deleted file mode 100644
index 91693a0..0000000
--- a/docs/sops/memcached.rst
+++ /dev/null
@@ -1,79 +0,0 @@
-.. title: Memcached Infrastructure SOP
-.. slug: infra-memcached
-.. date: 2013-06-29
-.. taxonomy: Contributors/Infrastructure
-
-============================
-Memcached Infrastructure SOP
-============================
-
-Our memcached setup is currently only used for wiki sessions. With
-mediawiki, sessions stored in files over NFS or in the DB are very slow.
-Memcached is a non-blocking solution for our session storage.
-
-Contents
-========
-
-1. Contact Information
-2. Checking Status
-3. Flushing Memcached
-4. Restarting Memcached
-5. Configuring Memcached
-
-Contact Information
-===================
-Owner
-    Fedora Infrastructure Team
-
-Contact
-    #fedora-admin, sysadmin-main, sysadmin-web groups
-
-Location
-    PHX
-
-Servers
-    memcached03, memcached04
-
-Purpose
-    Provide caching for Fedora web applications.
-
-Checking Status
-===============
-
-Our memcached instances are currently firewalled to only allow access from
-wiki application servers. To check the status of an instance, use::
-
-    echo stats | nc memcached0{3,4} 11211
-
-from an allowed host.
-
-
-Flushing Memcached
-==================
-Sometimes wrong content gets cached, and the cache should be flushed.
-To do this, use::
-
-    echo flush_all | nc memcached0{3,4} 11211
-
-from an allowed host.
-
-
-Restarting Memcached
-====================
-Note that restarting a memcached instance will drop all sessions stored
-on that instance. As mediawiki uses hashing to distribute sessions across
-multiple instances, restarting one out of two instances will result in
-about half of the total sessions being dropped.
-
-To restart memcached::
-
-    sudo /etc/init.d/memcached restart
-
-Configuring Memcached
-=====================
-Memcached is currently set up as a role in the ansible git repo. The main
-two tunables are MAXCONN (the maximum number of concurrent
-connections) and CACHESIZE (the amount of memory to use for storage). These
-variables can be set through $memcached_maxconn and $memcached_cachesize
-in ansible. Additionally, other options (as described in the memcached
-manpage) can be set via $memcached_options.
diff --git a/docs/sops/mirrorhiding.rst b/docs/sops/mirrorhiding.rst
deleted file mode 100644
index 9158591..0000000
--- a/docs/sops/mirrorhiding.rst
+++ /dev/null
@@ -1,45 +0,0 @@
-.. title: Mirror Hiding Infrastructure SOP
-.. slug: infra-mirror-hiding
-.. date: 2011-08-23
-.. taxonomy: Contributors/Infrastructure
-
-================================
-Mirror hiding Infrastructure SOP
-================================
-
-At times, such as release day, there may be a conflict between Red Hat
-trying to release content for RHEL, and Fedora trying to release Fedora.
-One way to limit the pain to Red Hat on release day is to hide
-download.fedora.redhat.com from the publiclist and mirrorlist redirector,
-which will keep most people from downloading the content from Red Hat
-directly.
-
-Contact Information
-===================
-
-Owner
-    Fedora Infrastructure Team
-Contact
-    #fedora-admin, sysadmin-main, sysadmin-web group
-Location
-    Phoenix
-Servers
-    app3, app4
-Purpose
-    Hide Public Mirrors from the publiclist / mirrorlist redirector
-
-Description
-===========
-To hide a public mirror so it doesn't appear on the publiclist or the
-mirrorlist, simply go into the MirrorManager administrative web user
-interface at https://admin.fedoraproject.org/mirrormanager. Fedora
-sysadmins can see all Sites and Hosts. For each Site and Host, there is a
-checkbox marked "private" which, if set, will hide that Site (and all its
-Hosts), or just that single Host, so that it won't appear on the public
-lists.
-
-To make a private-marked mirror public, simply clear the "private"
-checkbox again.
-
-This change takes effect at the top of each hour.
-
diff --git a/docs/sops/mirrormanager-S3-EC2-netblocks.rst b/docs/sops/mirrormanager-S3-EC2-netblocks.rst
deleted file mode 100644
index c3e2eef..0000000
--- a/docs/sops/mirrormanager-S3-EC2-netblocks.rst
+++ /dev/null
@@ -1,28 +0,0 @@
-.. title: Infrastructure AWS Mirroring SOP
-.. slug: infra-aws-mirror
-.. date: 2014-12-05
-.. taxonomy: Contributors/Infrastructure
-
-===========
-AWS Mirrors
-===========
-
-Fedora Infrastructure mirrors EPEL content (/pub/epel) into Amazon
-Simple Storage Service (S3) in multiple regions, to make it fast for
-EC2 CentOS/RHEL users to get EPEL content from an effectively local
-mirror.
-
-For this to work, we have private mirror entries in MirrorManager, one
-for each region, which include the EC2 netblocks for that region.
-
-Amazon updates their list of network blocks roughly monthly, as they
-consume additional address space. Therefore, we need to make the
-corresponding changes to MirrorManager's entries for same.
-
-Amazon publishes their list of network blocks on their forum site,
-with the subject "Announcement: Amazon EC2 Public IP Ranges". As of
-November 2014, this was
-https://forums.aws.amazon.com/ann.jspa?annID=1701
-
-As of November 19, 2014, Amazon publishes it as a JSON file we can download:
-http://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html
diff --git a/docs/sops/mirrormanager.rst b/docs/sops/mirrormanager.rst
deleted file mode 100644
index 8872236..0000000
--- a/docs/sops/mirrormanager.rst
+++ /dev/null
@@ -1,104 +0,0 @@
-.. title: MirrorManager Infrastructure SOP
-.. slug: infra-mirrormanager
-.. date: 2012-04-24
-.. taxonomy: Contributors/Infrastructure
-
-================================
-MirrorManager Infrastructure SOP
-================================
-
-Mirrormanager manages mirrors for Fedora distribution.
-
-Contents
-
-1. Contact Information
-2. Description
-
-   1. Release Preparation
-
-3. Troubleshooting and Resolution
-
-   1. Regenerating the Publiclist
-   2. Hung admin.fedoraproject.org/mirrormanager
-
-Contact Information
-===================
-
-Owner
-    Fedora Infrastructure Team
-
-Contact
-    #fedora-admin, sysadmin-main, sysadmin-web
-
-Location
-    Phoenix
-
-Servers
-    app01, app02, app03, app04, app05, app06, bapp02
-
-Purpose
-    Manage mirrors for Fedora distribution
-
-Description
-===========
-
-Mirrormanager handles our mirroring system. It keeps track of lists of
-valid mirrors and handles handing out metalink urls from which end users
-download packages.
-
-On the backend app server (bapp01 or bapp02), mirrormanager runs crawlers
-to check mirror contents, a job to update the public lists, and other
-housekeeping jobs.
-This data is then synced to the app* servers to serve to end users.
-
-Release Preparation
-===================
-
-MirrorManager should automatically detect the new release version, and
-will create a new Version() object in the database. This is visible on the
-Version page in the web UI, and on mirrors.fp.o.
-
-If the versioning scheme changes, it's possible this will fail. If so,
-contact the Mirror Wrangler.
-
-Troubleshooting and Resolution
-==============================
-
-Regenerating the Publiclist
----------------------------
-
-On bapp02::
-
-    sudo -u mirrormanager -i
-
-then::
-
-    /usr/share/mirrormanager/server/update-mirrorlist-server > /tmp/mirrormanager-mirrorlist.log 2>&1 && \
-    /usr/share/mirrormanager/mm_sync_out
-
-To make this take effect immediately, you may need to remove the cache on
-the proxies::
-
-    # As root on proxy0[1-7]
-    rm -rf /srv/cache/mod_cache/*
-
-Hung admin.fedoraproject.org/mirrormanager
-------------------------------------------
-
-This generally happens when an app server loses its connection to db2.
-
-1. On bapp02 and app[1-6], su up, and restart apache.
-
-2. On bapp02, if crawlers and update-master-directory-list are likewise
-   hung, kill them too. You may need to delete stale
-   ``/var/lock/mirrormanager/*`` lockfiles as well.
-
-Restarting mirrorlist_server
-----------------------------
-
-mirrorlist_server on the app* machines is managed via supervisord. If you
-want to restart it, use::
-
-    supervisorctl restart
-
-
diff --git a/docs/sops/mote.rst b/docs/sops/mote.rst
deleted file mode 100644
index 7914c5b..0000000
--- a/docs/sops/mote.rst
+++ /dev/null
@@ -1,107 +0,0 @@
-.. title: mote SOP
-.. slug: infra-mote
-.. date: 2015-06-13
-.. taxonomy: Contributors/Infrastructure
-
-===========
-mote SOP
-===========
-
-mote is a MeetBot log wrangler, providing
-a user-friendly interface for viewing logs produced
-by Fedora's IRC meetings.
-
-Production instance: http://meetbot.fedoraproject.org/
-Staging instance: http://meetbot.stg.fedoraproject.org
-
-Contents
---------
-1. Contact information
-2. Deployment
-3. Description
-4. Configuration
-5. Database
-6. Managing mote
-7. Suspending mote operation
-8. Changing mote's name and category definitions
-
-Contact Information
--------------------
-Owner
-    cydrobolt
-Contact
-    #fedora-admin
-Location
-    Fedora Infrastructure
-Purpose
-    IRC meeting coordination
-
-
-Deployment
-----------
-If you have access to rbac-playbook::
-
-    sudo rbac-playbook groups/value.yml
-
-Forcing Reload
---------------
-
-There is a playbook that can force mote to update its cache
-in case it gets stuck somehow::
-
-    sudo rbac-playbook manual/rebuild/mote.yml
-
-Doing Upgrades
---------------
-
-Put a new copy of the mote rpm in the infra repo and run::
-
-    sudo rbac-playbook manual/upgrade/mote.yml
-
-Description
------------
-mote is a Python webapp running on Flask with mod_wsgi.
-It can be used to view past logs, browse meeting minutes, or
-glean other information relevant to Fedora's IRC meetings.
-It employs a JSON file store cache, in addition to a
-memcached store which is currently not in use in
-Fedora infrastructure.
-
-
-Configuration
--------------
-mote configuration is located in ``/etc/mote/config.py``. The
-configuration contains all configurable items for all mote services.
-Alterations to configuration that aren't temporary should be done
-through ansible playbooks. Configuration changes have no effect on
-running services -- they need to be restarted, which can be done
-using the playbook.
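-
-Since the configuration is templated through ansible, a non-temporary
-change is typically made in the role's template and then rolled out with
-the deployment playbook shown above (the template directory is the one
-referenced later in this document; the exact file name is illustrative)::
-
-    # edit the configuration template in your ansible checkout, then redeploy
-    vim ansible/roles/mote/templates/config.py
-    sudo rbac-playbook groups/value.yml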
-
-
-Database
---------
-mote does not currently utilise any databases, although it uses a
-file store in Fedora Infrastructure and has an optional memcached store
-which is currently unused.
-
-Managing mote
--------------
-mote is run using mod_wsgi and httpd; hence, you must
-manage the ``httpd`` service to change mote's status.
-
-Suspending mote operation
--------------------------
-mote can be stopped by stopping the ``httpd`` service::
-
-    service httpd stop
-
-Changing mote's name and category definitions
----------------------------------------------
-mote uses a set of JSON name and category definitions to provide
-friendly names, aliases, and listings on its interface.
-These definitions are located in mote's GitHub repository,
-and need to be pulled into ansible in order to be deployed.
-
-These files are ``name_mappings.json`` and ``category_mappings.json``.
-To deploy an update to these definitions, place the updated name and
-category mapping files in ``ansible/roles/mote/templates``. Run
-the playbook in order to deploy your changes.
diff --git a/docs/sops/nagios.rst b/docs/sops/nagios.rst
deleted file mode 100644
index 7afebb0..0000000
--- a/docs/sops/nagios.rst
+++ /dev/null
@@ -1,94 +0,0 @@
-.. title: Infrastructure Nagios SOP
-.. slug: infra-nagios
-.. date: 2012-07-09
-.. taxonomy: Contributors/Infrastructure
-
-============================
-Fedora Infrastructure Nagios
-============================
-
-Contact Information
-===================
-
-Owner
-    sysadmin-main, sysadmin-noc
-Contact
-    #fedora-admin, #fedora-noc
-Location
-    Anywhere
-Servers
-    noc01, noc02, noc01.stg, batcave01
-Purpose
-    This SOP is to describe nagios configurations
-
-Configuration
-=============
-
-Fedora Project runs two nagios instances: nagios (noc01),
-https://admin.fedoraproject.org/nagios, and nagios-external (noc02),
-http://admin.fedoraproject.org/nagios-external. You must be in
-the 'sysadmin' group to access them.
-
-Apart from the two production instances, we are currently running a
-staging instance for testing purposes, available through SSH at noc01.stg.
-
-nagios (noc01)
-    The nagios configuration on noc01 should only monitor general host
-    statistics: ansible status, uptime, apache status (up/down), SSH, etc.
-
-    The configurations are found in the nagios ansible module: ansible/roles/nagios
-
-nagios-external (noc02)
-    The nagios configuration on noc02 is located outside of our main
-    datacenter and should monitor our user websites/applications
-    (fedoraproject.org, FAS, PackageDB, Bodhi/Updates).
-
-    The configurations are found in the nagios ansible role: roles/nagios
-
-
-.. note::
-    Production and staging instances through SSH:
-    Please make sure you are in the 'sysadmin' and 'sysadmin-noc' FAS groups
-    before trying to access these hosts.
-
-    See the SSH Access SOP
-
-NRPE
-----
-
-We are currently using NRPE to execute remote Nagios plugins on any host
-of our network.
-
-A great guide to NRPE and its usage, along with some nice images of its
-structure, can be found at:
-http://nagios.sourceforge.net/docs/nrpe/NRPE.pdf
-
-Understanding the Messages
-==========================
-
-General:
---------
-
-Nagios notifications are generally easy to read, and follow this
-consistent format::
-
-    ** PROBLEM/ACKNOWLEDGEMENT/RECOVERY alert - hostname/Check is WARNING/CRITICAL/OK **
-    ** HOST DOWN/UP alert - hostname **
-
-Reading the message will provide extra information on what is wrong.
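-
-For instance, a disk check recovering on a hypothetical host would arrive
-as something like::
-
-    ** RECOVERY alert - noc01.phx2.fedoraproject.org/Disk Space is OK **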
- -Disk Space Warning/Critical: ----------------------------- - -Disk space warnings normally include the following information:: - - DISK WARNING/CRITICAL/OK - free space: mountpoint freespace(MB) (freespace(%) inode=freeinodes(%)): - -A message stating "(1% inode=99%)" means that the diskspace is critical not -the inode usage and is a sign that more diskspace is required. - -Further Reading ---------------- - -* Ansible SOP -* Outages SOP diff --git a/docs/sops/netapp.rst b/docs/sops/netapp.rst deleted file mode 100644 index e83c7ef..0000000 --- a/docs/sops/netapp.rst +++ /dev/null @@ -1,143 +0,0 @@ -.. title: Infrastructure Netapp SOP -.. slug: infra-netapp -.. date: 2011-10-03 -.. taxonomy: Contributors/Infrastructure - -========================= -Netapp Infrastructure SOP -========================= - -Provides primary mirrors and additional storage in PHX2 - -Contents -======== - -1. Contact Information -2. Description -3. Public Mirrors - - 1. Snapshots - -4. PHX NFS Storage - - 1. Access - 2. Snapshots - -5. iscsi - - 1. Updating LVM - 2. Mounting ISCSI - -Contact Information -=================== - -Owner - Fedora Infrastructure Team - -Contact - #fedora-admin, sysadmin-main, releng - -Location - Phoenix, Tampa Bay, Raleigh - -Servers - batcave01, virt servers, application servers, builders, releng boxes - -Purpose - Provides primary mirrors and additional storage in PHX2 - -Description -=========== - -At present we have three netapps in our infrastructure. One in TPA, RDU -and PHX. For purposes of visualization its easiest to think of us as -having 4 netapps, 1 TPA, 1 RDU and 1 PHX for public mirrors. And an -additional 1 in PHX used for additional storage not related to the public -mirrors. - -Public Mirrors -============== - -The netapps are our primary public mirrors. The canonical location for the -mirrors is currently in PHX. From there it gets synced to RDU and TPA. - -Snapshots ---------- - -Snapshots on the PHX netapp are taken hourly. Unfortunately the way it is -setup only Red Hat employees can access this mirror (this is scheduled to -change when PHX becomes the canonical location but that will take time to -setup and deploy). The snapshots are available, for example, on wallace in:: - - /var/ftp/download.fedora.redhat.com/.snapshot/hourly.0 - -PHX NFS Storage -=============== - -There is a great deal of storage in PHX over NFS from the netapp there. -This storage includes the public mirror. The majority of this storage is -koji however there are a few gig worth of storage that goes to wiki -attachments and other storage needs we have in PHX. - -You can access all of the nfs share shares at:: - - batcave01:/mnt/fedora - -or:: - - ntap-fedora-a.storage.phx2.redhat.com:/vol/fedora/ - -Access --------- - The netapp is provided by RHIS and as a result they also control access. - Access is controlled by IP mostly and some machines have root squashed. - Worst case scenario if batcave01 is not accessible, just bring another box - up under its IP address and use that for an emergency. - -Snapshots ---------- - -There are hourly and nightly snapshots on the netapp. They are available in:: - - batcave01:/mnt/fedora/.snapshot - -iscsi -===== - -We have iscsi deployed in a number of locations in our infrastructure for -xen machines. To get a list of what xen machines are deployed with iscsi, -just run lvs:: - - lvs /dev/xenGuests - -Live migration is possible though not fully supported at this time. Please -shut a xen machine down and bring it up on another host. 
Updating LVM
------------

iscsi is mounted all over the place, and if one xen machine creates a
logical volume the other xen machines will have to pick up those changes.
To do this run::

    pvscan
    vgscan
    lvscan
    vgchange -a y

Mounting ISCSI
--------------

On reboots sometimes the iscsi share is not remounted. This should be
automated in the future, but for now run::

    iscsiadm -m discovery -tst -p ntap-fedora-b.storage.phx2.redhat.com:3260
    sleep 1
    iscsiadm -m node -T iqn.1992-08.com.netapp:sn.118047036 -p 10.5.88.21:3260 -l
    sleep 1
    pvscan
    vgscan
    lvscan
    vgchange -a y

diff --git a/docs/sops/new-hosts.rst b/docs/sops/new-hosts.rst
deleted file mode 100644
index 66ca64b..0000000
--- a/docs/sops/new-hosts.rst
+++ /dev/null
@@ -1,296 +0,0 @@
.. title: Infrastructure DNS Host Addition SOP
.. slug: infra-dns-add
.. date: 2014-05-22
.. taxonomy: Contributors/Infrastructure

=====================
DNS Host Addition SOP
=====================

You should be able to follow these steps in order to create a new set of
hosts in infrastructure.

Walkthrough
===========

Get a DNS repo checkout on batcave01
------------------------------------
::

    git clone /git/dns
    cd dns

An example always helps, so you can use git grep for something that has
been recently added to the data center/network that you want::

    git grep badges-web01
    built/126.5.10.in-addr.arpa:69 IN PTR badges-web01.stg.phx2.fedoraproject.org.
    [...lots of other stuff in built/ ignore these as they'll be generated later...]
    master/126.5.10.in-addr.arpa:69 IN PTR badges-web01.stg.phx2.fedoraproject.org.
    master/126.5.10.in-addr.arpa:101 IN PTR badges-web01.phx2.fedoraproject.org.
    master/126.5.10.in-addr.arpa:102 IN PTR badges-web02.phx2.fedoraproject.org.
    master/168.192.in-addr.arpa:109.1 IN PTR badges-web01.vpn.fedoraproject.org
    master/168.192.in-addr.arpa:110.1 IN PTR badges-web02.vpn.fedoraproject.org
    master/phx2.fedoraproject.org:badges-web01.stg IN A 10.5.126.69
    master/phx2.fedoraproject.org:badges-web01 IN A 10.5.126.101
    master/phx2.fedoraproject.org:badges-web02 IN A 10.5.126.102
    master/vpn.fedoraproject.org:badges-web01 IN A 192.168.1.109
    master/vpn.fedoraproject.org:badges-web02 IN A 192.168.1.110

So those are the files we need to edit. In the above example, two of
those files are for the host on the PHX network. The other two are for
the host to be able to talk over the VPN. Although the VPN is not
always needed, the common case is that the host will need it. (If any
clients *need to connect to it via the proxy servers* or it is not
hosted in PHX2, it will need a VPN connection.) A common exception here
is the staging environment: since we only have one proxy server in
staging and it is in PHX2, a VPN connection is not typically needed for
staging hosts.

Edit the zone file for the reverse lookup first (the \*in-addr.arpa file)
and find IPs to use. The IPs will be listed with a domain name of
"unused." If you're configuring a web application server, you probably
want two hosts for stg and at least two for production. Two in
production means that we don't need downtime for reboots and updates.
Two in stg means that we'll be less likely to encounter problems related
to having multiple web application servers when we take a change tested
in stg into production::

    -105 IN PTR unused.
    -106 IN PTR unused.
    -107 IN PTR unused.
    -108 IN PTR unused.
    +105 IN PTR elections01.stg.phx2.fedoraproject.org.
    +106 IN PTR elections02.stg.phx2.fedoraproject.org.
    +107 IN PTR elections01.phx2.fedoraproject.org.
    +108 IN PTR elections02.phx2.fedoraproject.org.

Edit the forward domain (phx2.fedoraproject.org in our example) next::

    elections01.stg IN A 10.5.126.105
    elections02.stg IN A 10.5.126.106
    elections01 IN A 10.5.126.107
    elections02 IN A 10.5.126.108

Repeat these two steps if you need to make them available on the VPN.
Note: if your stg hosts are in PHX2, you don't need to configure VPN for
them, as all our stg proxy servers are in PHX2.

Also remember to update the Serial at the top of all zone files.

Once the files are edited, you need to run a script to build the zones.
But first, commit the changes you just made to the "source"::

    git add .
    git commit -a -m 'Added staging and production elections hosts.'

Once that is committed, you need to run a script to build the zones and
then push them to the dns servers::

    ./do-domains  # This builds the files
    git add .
    git commit -a -m 'done build'
    git push

    $ sudo -i ansible ns\* -a '/usr/local/bin/update-dns' # This tells the dns servers to load the new files

Make certs
==========

WARNING: If you already had a clone of private, make VERY sure to do a
git pull first! It's quite likely somebody else added a new host without
you noticing it, and you cannot merge the keys repos manually. (Seriously,
don't: the index and serial files just wouldn't match up with the certificate,
and you would revoke the wrong certificate upon revocation.)

When doing 2 factor auth for sudo, the hosts that we connect from need
to have valid SSL certs. These are currently stored in the private repo::

    git clone /git/ansible-private && chmod 0700 ansible-private
    cd ansible-private/files/2fa-certs
    . ./vars
    ./build-and-sign-key $FQDN  # ex: elections01.stg.phx2.fedoraproject.org

The $FQDN should be the phx2 domain name if the host is in phx2, the vpn
one if not in phx2, and if it has no vpn and is not in phx2 we should add
it to the vpn::

    git add .
    git commit -a
    git push

NOTE: Make sure to re-run vars from the vpn repo. If you forget to do that,
you will just (try to) generate a second pair of 2fa certs, since the
./vars script creates an environment variable pointing to the root key
directory, which is different.

Servers that are on the VPN also need certs for that. These are also stored
in the private repo::

    cd ansible-private/files/vpn/openvpn
    . ./vars
    ./build-and-sign-key $FQDN  # ex: elections01.phx2.fedoraproject.org
    ./build-and-sign-key $FQDN  # ex: elections02.phx2.fedoraproject.org

The $FQDN should be the phx2 domain name if the host is in phx2, and just
fedoraproject.org if it's not in PHX2 (note that there is never .vpn
in the FQDN in the openvpn keys). Now commit and push::

    git add .
    git commit -a
    git push


ansible
=======
::

    git clone /git/ansible
    cd ansible

To see an example::

    git grep badges-web01
    find . -name 'badges-web01*'
    find . -name 'badges-web*'

inventory
---------

The ansible inventory file lists all the hosts that ansible knows about
and also allows you to create sets of hosts that you can refer to via a
group name.
For a typical web application server set of hosts we'd
create things like this::

    [elections]
    elections01.phx2.fedoraproject.org
    elections02.phx2.fedoraproject.org

    [elections-stg]
    elections01.stg.phx2.fedoraproject.org
    elections02.stg.phx2.fedoraproject.org

    [... find the staging group and add there: ...]

    [staging]
    db-fas01.stg.phx2.fedoraproject.org
    elections01.stg.phx2.fedoraproject.org
    elections02.stg.phx2.fedoraproject.org

The hosts should use their fully qualified domain names here. The rules
are slightly different than for 2fa certs. If the host is in PHX2, use
the .phx2.fedoraproject.org domain name. If they aren't in PHX2, then
they usually just have .fedoraproject.org as their domain name. (If in
doubt about a not-in-PHX2 host, just ask.)


VPN config
----------

If the machine is on the VPN, create a file in ansible at
roles/openvpn/server/files/ccd/$FQDN with contents like::

    ifconfig-push 192.168.1.X 192.168.0.X

where X is the last octet of the DNS IP address assigned to the host;
for example, for elections01.phx2.fedoraproject.org that would be::

    ifconfig-push 192.168.1.44 192.168.0.44


Work in progress
================
From here to the end of file is still being worked on.

host_vars and group_vars
------------------------

ansible consults files in inventory/group_vars and inventory/host_vars to
set parameters that can be used in templates and playbooks. You may need
to edit these.

It's usually easy to copy the host_vars and group_vars from an existing
host that's similar to the one you are working on and then modify a few
names to make it work. For instance, for a web application server::

    cd ~/ansible/inventory/group_vars
    cp badges-web elections

Change the following::

    - fas_client_groups: sysadmin-noc,sysadmin-badges
    + fas_client_groups: sysadmin-noc,sysadmin-web

(You can change disk size, mem_size, number of cpus, and ports too if you
need them.)

Some things will definitely need to be defined differently for each host in a
group -- notably, ip_address. You should use the ip_address you claimed in
the dns repo::

    cd ~/ansible/inventory/host_vars
    cp badges-web01.stg.phx2.fedoraproject.org elections01.stg.phx2.fedoraproject.org


The host will need a vmhost declaration. There is a script in
``ansible/scripts/vhost-info`` that will report how much free memory and how many
free cpus each vmhost has. You can use that to inform your decision.
By convention, staging hosts go on virthost12.

Each vmhost has a different volume group. To figure out what volume group
that is, execute the following command on the virthost::

    vgdisplay

You may want to run "lsblk" to check that the volume group you expect is
the one actually used for virtual guests.


.. note::
    | 19:16:01 3. add ./inventory/host_vars/FQDN host_vars for the new host.
    | 19:16:56 that will have in it ip addresses, dns resolv.conf, ks url/repo, volume group to make the host lv in, etc etc.
    | 19:17:10 4. add any needed vars to inventory/group_vars/ for the group
    | 19:17:33 this has memory size, lvm size, cpus, etc
    | 19:17:45 5. add tasks/virt_instance_create.yml task to top of group/host playbook
    | 19:18:10 6. run the playbook and it will go to the virthost you set, create the lv, guest, install it, wait for it to come up, then continue configuring it.

mailman.yml - copy it from another file.
::

    ./ans-vhost-freemem --hosts=virthost\*


group vars:

- vmhost (of the host that will host the VM)
- kickstart info (url of the kickstart itself and the repo)
- datacenter (although most likely won't change)

The host playbook is rather basic:

- Change the name
- Most things won't change much

::

    ansible-playbook /srv/web/infra/ansible/playbooks/groups/mailman.yml

Adding a new proxy or webserver
===============================

When adding a new web server, other files must currently be edited by hand
until templates replace them. These files cover getting httpd
logs from the server onto log01 so that log analysis can be done::

    roles/base/files/syncHttpLogs.sh
    roles/epylog/files/merged/modules.d/rsyncd.conf
    roles/hosts/files/staging-hosts
    roles/mediawiki123/templates/LocalSettings.php.fp.j2

There are also nagios files which will need to be edited, but that should
be done following the nagios document.

References
==========

* The making a new instance section of: http://meetbot.fedoraproject.org/meetbot/fedora-meeting-1/2013-07-17/infrastructure-ansible-meetup.2013-07-17-19.00.html
diff --git a/docs/sops/nonhumanaccounts.rst b/docs/sops/nonhumanaccounts.rst
deleted file mode 100644
index 338ead3..0000000
--- a/docs/sops/nonhumanaccounts.rst
+++ /dev/null
@@ -1,156 +0,0 @@
.. title: Non-human Accounts Infrastructure SOP
.. slug: infra-nonhuman-accounts
.. date: 2015-03-23
.. taxonomy: Contributors/Infrastructure

=====================================
Non-human Accounts Infrastructure SOP
=====================================

We have many non-human accounts for various services, used by our web
applications and certain automated scripts.

Contents
========

1. Contact Information
2. FAS Accounts
3. Bugzilla Accounts
4. PackageDB Owners
5. Koji Accounts

Contact Information
===================

Owner:
    Fedora Infrastructure Team
Contact:
    #fedora-admin
Persons:
    sysadmin-main
Purpose:
    Provide non-human accounts to our various services

FAS Accounts
============

A FAS account should be created when a script or application needs...

* to query FAS information
* filesystem privileges associated with a group in FAS
* bugzilla privileges associated with the "fedorabugs" group.

Be sure to check if Infrastructure already has a general-purpose account
that can be used before creating a new one.

Creating a FAS account
----------------------

1. Go through the normal user creation process at
   https://admin.fedoraproject.org/accounts/

   1. Set the name to: (naming convention here)
   2. Set the email to the contact email for the account (this may need
      to be done manually if the contact email is an @fedoraproject.org
      address)

2. Have a FAS admin set the account status to "bot" and set its UID below
   10000. Make sure to check that this does not break any group
   references or file ownerships first.
   * On db-fas01, using ``$ sudo -u postgres psql fas2``

     - Set it to a bot account so it's not inactivated::

         => UPDATE people SET status='bot' WHERE username='username';

     - Delete references to the current uid::

         => delete from visit_identity where user_id in (select id from
            people where username = 'username');

     - Find the last used id in the range we use for bots::

         => select id, username from people where id < 10000 order by id;

     - Set the account to use the new id. This should be one more than
       the largest id returned by the previous query::

         => UPDATE people SET id=NEWID WHERE username='username';

3. Get the account into any necessary groups for permissions that it may
   need. Common ones include:

   * Wiki editing: cla_done
   * Access to SSH keys for third party users: thirdparty
   * Access to SSH keys and password hashes for _internal_ fasClient
     runs: fas-systems

4. Document this account at:
   https://fedoraproject.org/wiki/PackageDB_admin_requests#Pseudo-users_and_Groups_for_SIGs


Alternative
-----------

This can also be achieved using SQL statements directly:

- Find the last used id in the range we use for bots::

    => select id, username from people where id < 10000 order by id;

- Insert the new user::

    => insert into people (id,username,human_name,password,email,status)
       values (id, 'name','small description', 'something',
       'contact email', 'bot');

- Find your own user id::

    => select id, username from people where username='your username';

- Find the id of the most used groups::

    => select id, name from groups where name
       in ('cla_done', 'packager', 'fedorabugs');

- Add the groups required::

    => insert into person_roles(person_id, group_id, role_type, sponsor_id)
       values (new_user_id, group_id, 'user', your_own_user_id);

The final step remains the same though: document this account at
https://fedoraproject.org/wiki/PackageDB_admin_requests#Pseudo-users_and_Groups_for_SIGs


Bugzilla Accounts
=================

A Bugzilla account should be created when a script or application needs...

* to query or file Fedora bugs automatically

Please make sure to coordinate with the QA and Bug Triaging teams if the
script or application involves making mass changes to bugs.

If a bugzilla account needs "fedorabugs" permissions, follow the above
steps for a FAS account first, then follow these instructions with the
email address you entered above. If the bugzilla account will not need
"fedorabugs" permissions but will still require an @fedoraproject.org
email, create an alias for that account first.

1. Create a bugzilla account as normal at
   https://bugzilla.redhat.com/, using the proper contact email for the
   account.
2. Document this account at (insert location here)

PackageDB Owners
================

Tie together FAS account and Bugzilla account info here

Koji Accounts
=============

TODO

diff --git a/docs/sops/nuancier.rst b/docs/sops/nuancier.rst
deleted file mode 100644
index 7173eba..0000000
--- a/docs/sops/nuancier.rst
+++ /dev/null
@@ -1,159 +0,0 @@
.. title: Nuancier SOP
.. slug: infra-nuancier
.. date: 2016-03-11
.. taxonomy: Contributors/Infrastructure

=============
Nuancier SOP
=============

Nuancier is the web application used by the design team and the community to
submit and vote on the supplemental wallpapers provided with each version of
Fedora.

Contents
========

1. Contact Information
2. Documentation Links
Contact Information
===================

Owner
    Fedora Infrastructure Team
Contact
    #fedora-admin
Location
    https://apps.fedoraproject.org/nuancier
Servers
    nuancier01, nuancier02, nuancier01.stg, nuancier02.stg
Purpose
    Provide a system to submit and vote on supplemental wallpapers


Create a new election
=====================

* Login

* Go to the `Admin` panel via the menu at the top

* Click on `Create a new election`.

* Complete the form:

  Election name
      A short name used in all the pages. Since we usually have one election
      per release, it has been of the form `Fedora XX`.

  Name of the folder containing the pictures
      This just links the election with the folder where the images will be
      uploaded on disk. Keep it simple and safe; something like `fXX` will do.

  Year
      The year when the election will be happening. This just gives a quick
      sorting option.

  Submission start date (in UTC)
      The date from which people will be able to submit wallpapers for the
      election. The submission starts on the exact day at midnight UTC.

  Start date (in UTC)
      The date when the election starts (and thus the submissions end). There is
      no buffer between when the submissions end and when the votes start, which
      means admins have to keep up with the submissions as they are made.

  End date (in UTC)
      The date when the election ends. There is no embargo on the results; they
      are available right after the election ends.

  URL to claim a badge for voting
      The URL at which someone can claim a badge. This URL is displayed on the
      voting page as well as once people have voted. This means that having the
      badge does not ensure people voted; at most it ensures people visited
      nuancier during a voting phase.

  Number of votes a user can make
      The number of wallpapers a user can choose/vote on. This was made
      configurable as there was a debate in the design team about whether
      having everyone vote on all 16 wallpapers was a good idea or not.

  Number of candidates a user can upload
      Restricts the number of wallpapers a user can submit for an election, to
      prevent people from uploading tens of wallpapers in one election.

Review an election
==================

Admins must do this regularly during a submission phase to keep candidates
from piling up.

* Login

* Go to the `Admin` panel via the menu at the top

* Find the election of interest in the list and click on `Review`

If the images are not showing, you can generate the thumbnails using the button
`(Re-)generate cache`.

On the review page, you will be able to filter the candidates by `Approved`,
`Pending`, `Rejected` or see them `All` (default).

You can then check the images one by one, select their checkbox and then either
`Approve` or `Deny` all the ones you selected.

.. note:: Rejections must be motivated in the `Reason for rejection / Comments`
          input field. This motivation is then sent by email to the user
          explaining why a wallpaper they submitted was not accepted into the
          election.


Vote on an election
===================

Once an election is opened, a link announcing it will be available from the front
page, and in the page listing the elections (`Elections` tab in the menu) a green
check-mark will appear in the `Votes` column while a red forbidden sign will
appear in the `Submissions` column.

You can then click on the election name, which will take you to the voting page.
There, enlarge the images by clicking on them and make your choice by clicking on
the bottom right corner of the image.

In the column on the right, the total number of votes available will appear.
If you need to remove a wallpaper from your selection, simply click on it
in the right column.

As long as you have not picked the maximum number of candidates allowed, you can
cast your vote multiple times (but not on the same candidates, of course).


View all the candidates of an election
======================================

All the candidates of an election are only accessible once the election is over.
If you wish to see all the images uploaded, simply go to the `Elections` tab and
click on the election name.


View the results of an election
===============================

The results of an election are accessible immediately after the end of it.
To see them, simply click the `Results` tab in the menu.

There you can click on the name of the election to see the wallpapers ordered by
their number of votes, or on `stats` to view some statistics about the election
(such as the number of participants, the number of voters, the number of votes,
or the evolution of the votes over time).


Miscellaneous
=============

Nuancier uses a gluster volume shared between the two hosts (in prod and in stg)
where the images are stored, making sure they are available to both frontends.
This may make things a little trickier sometimes; be aware of it.
diff --git a/docs/sops/openvpn.rst b/docs/sops/openvpn.rst
deleted file mode 100644
index a16d523..0000000
--- a/docs/sops/openvpn.rst
+++ /dev/null
@@ -1,137 +0,0 @@
.. title: OpenVPN SOP
.. slug: infra-openvpn
.. date: 2011-12-16
.. taxonomy: Contributors/Infrastructure

===========
OpenVPN SOP
===========

OpenVPN is our server-to-server VPN solution. It is deployed in a routeless
manner and uses ansible-managed keys for authentication. All hosts should
be given static IPs and a hostname.vpn.fedoraproject.org DNS address.

Contact Information
===================

Owner
    Fedora Infrastructure Team

Contact
    #fedora-admin, sysadmin-main

Location
    Phoenix

Servers
    bastion (vpn.fedoraproject.org)

Purpose
    Provides the VPN solution for our infrastructure.

Add a new host
==============

Create/sign the keys
--------------------
From batcave01 check out the private repo::

    # This is to ensure that the clone is not world-readable at any point.
    RESTORE_UMASK=$(umask -p)
    umask 0077
    git clone /git/private
    $RESTORE_UMASK
    cd private/vpn/openvpn

Next prepare your environment and run the build-key script. This example
is for host "proxy4.fedora.phx.redhat.com"::

    . ./vars
    ./build-key $FQDN  # ./revoke-full $FQDN to revoke keys that are no longer used.
    git add .
    git commit -a
    git push

Create Static IP
----------------

Giving static IPs out in openvpn is mostly painless. Take a look at other
examples, but each host gets a file and 2 IPs::

    git clone /git/ansible
    vi ansible/roles/openvpn/server/files/ccd/$FQDN

The file format should look like this::

    ifconfig-push 192.168.1.314 192.168.0.314

Basically, the first IP is the IP that is contactable over the vpn and
should always take the format "192.168.1.x"; the point-to-point IP is the
same IP on a different network: "192.168.0.x".

Commit and install::

    git add .
    git commit -m "What have you done?"
    git push

And then push that out to bastion::

    sudo -i ansible-playbook $(pwd)/playbooks/groups/bastion.yml -t openvpn

Create DNS entry
----------------

After you have your static IP ready, just add the entry to DNS::

    git clone /git/dns && cd dns
    vi master/168.192.in-addr.arpa
    # pick out an ip that's unused
    vi master/vpn.fedoraproject.org
    git commit -m "What have you done?"
    ./do-domains
    git commit -m "done build."
    git push

And push that out to the name servers with::

    sudo -i ansible ns\* -a "/usr/local/bin/update-dns"

Update resolv.conf on the client
--------------------------------
To make sure traffic actually goes over the VPN, make sure the search line
in /etc/resolv.conf looks like::

    search vpn.fedoraproject.org fedoraproject.org

for external hosts, and::

    search phx2.fedoraproject.org vpn.fedoraproject.org fedoraproject.org

for PHX2 hosts.

Remove a host
=============
::

    # This is to ensure that the clone is not world-readable at any point.
    RESTORE_UMASK=$(umask -p)
    umask 0077
    git clone /git/private
    $RESTORE_UMASK
    cd private/vpn/openvpn

Next prepare your environment and run the revoke-full script. This example
is for host "proxy4.fedora.phx.redhat.com"::

    . ./vars
    ./revoke-full $FQDN
    git add .
    git commit -a
    git push


TODO
====
Deploy an additional VPN server outside of PHX. OpenVPN does support
failover automatically, so if configured properly, when the primary VPN
server goes down all hosts should connect to the next host in the list.
diff --git a/docs/sops/orientation.rst b/docs/sops/orientation.rst
deleted file mode 100644
index 09081ec..0000000
--- a/docs/sops/orientation.rst
+++ /dev/null
@@ -1,170 +0,0 @@
.. title: Infrastructure Orientation SOP
.. slug: infra-orientation
.. date: 2016-10-20
.. taxonomy: Contributors/Infrastructure

==============================
Orientation Infrastructure SOP
==============================

Basic orientation and introduction to the sysadmin group. Welcome aboard!

Contents
========

1. Contact Information
2. Description
3. Welcome to the team

   1. Time commitment
   2. Prove Yourself

4. Doing Work

   1. Ansible

5. Our Setup
6. Our Rules

Contact Information
===================

Owner
    Fedora Infrastructure Team
Contact
    #fedora-admin, sysadmin-main
Purpose
    Provide basic orientation and introduction to the sysadmin group

Description
===========

Fedora's Infrastructure team is charged with keeping all the lights on,
improving pain points, expanding services, designing new services and
partnering with other teams to help with their needs. The team is highly
dynamic and primarily based in the US. This is only significant in that
most of us work during the day in US time. We do have team members all
over the globe, though, and generally have decent coverage. If you happen
to be one of those who is not in a traditional US time zone, you are
encouraged to be around, especially in #fedora-admin, during those times
when we have less coverage. Even if it is just to say "I can't help with
that but $ADMIN will be able to, and they should be here in about 3 hours".

The team itself is generally friendly and honest. Don't be afraid to
disagree with someone, even if you're new and they're an old timer. Just
make sure you ask yourself what is important to you, and make sure to
provide data; we like that. We generally communicate on irc.freenode.net
in #fedora-admin.
We have our weekly meetings on IRC, and it's the quickest
way to get in touch with everyone. Secondary to that we use the mailing
list. After that it's our ticketing system and talk.fedoraproject.org.

*Welcome to the team!*

Time commitment
===============

Oftentimes this is the biggest reason for turnover in our group. Some
groups, like sysadmin-web and certainly sysadmin-main, require a huge time
commitment. Don't be surprised if you see people working between 10-30
hours a week on various tasks, and that's the volunteers. Your time
commitment is something personal to each individual, and it's something
you should think seriously about. In general it's almost
impossible to be a regular part of the team without at least 5-10 hours a
week dedicated to the Infrastructure team.

Also note, if you are going to be away, let us know. As a volunteer we
can't possibly ask you to always be around all the time. Even if you're in
the middle of a project and have to stop, let us know. Nothing is worse
than thinking someone is working on something or will be around and
they're just not. Really, we all understand: got a test coming up? Busier
at work than normal? Going on a vacation? It doesn't matter, just let us
know when you're going to be gone and what you're working on so it doesn't
get forgotten.

Additionally, don't forget that it's worth discussing with your employer
giving time during work. They may be all for it.

Prove Yourself
==============

This is one of the most difficult aspects of getting involved with our
team. We can't just give access to everyone who asks for it, and often
actually doing work without access is difficult. Some of the best things
you can do are:

* Keep bugging people for work. It shows you're committed.
* Go through bugs, look at stale bugs and close bugs that have been fixed
* Try to duplicate bugs on your workstation and fix them there

Above all, stick with it. Part of proving yourself is also showing the
time commitment it actually does take.

Doing Work
==========
Once you've been sponsored for a team, it's generally your job to find what
work needs to be done in the ticketing system. Be proactive about this.
The tickets can be found at:

https://pagure.io/fedora-infrastructure/issues

When you find a ticket that interests you, contact your sponsor or the
ticket owner and offer help. While you're getting used to the way things
work, don't be put off by someone saying no or "you can't work on that". It
happens; sometimes it's a security thing, sometimes it's an "I'm halfway
through it and I'm not happy with where it is" thing. Just move on to the
next ticket and go from there.

Also don't be surprised if some of the work involved includes testing on
your own workstation. Just set up a virtual environment and get to work!
There's a lot of work that can be done to prove yourself that involves no
access at all. Doing this kind of work is a surefire way to get in to
more groups and get more involved. Don't be afraid to take on tasks you
don't already know how to do. But don't take on something you know you
won't be able to do. Ask for help when you need it, and keep in contact
with your sponsor.

Ansible
=======

The work we do gets done in Ansible. It is important that you not make
changes directly on servers. This is for many reasons, but just always make
changes in Ansible. If you want to get more familiar with Ansible, set it
up yourself and give it a try. The docs are available at
https://docs.ansible.com/
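If you just want to confirm a local Ansible install works before digging
into our playbooks, a quick smoke test (not specific to our environment)
is::

    # run the ping module against the implicit localhost inventory
    ansible localhost -c local -m ping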
Our Setup
=========

Most of our work is done via bastion.fedoraproject.org. That host has
access to our other hosts, many of which are all over the globe. We have a
VPN solution set up so that knowing where the servers physically are is
only important when troubleshooting things. When you first get granted
access to one of the sysadmin-* groups, the first place you should turn is
bastion.fedoraproject.org; from there, ssh to batcave01.

We also have an architecture repo available in our git repo. To get a copy
of this repo just::

    dnf install git
    git clone https://pagure.io/fedora-infrastructure.git

This will allow you to look through (and help fix) some of our scripts as
well as have access to our architectural documentation. Become familiar
with those docs if you're curious. There's always room to do better
documentation, so if you're interested just ping your sponsor and ask about
it.

Our Rules
=========
The Fedora Infrastructure Team does have some rules. First is the security
policy. Please ensure you are compliant with:

https://infrastructure.fedoraproject.org/csi/security-policy/

before logging in to any of our servers. Many of those items rely on the
honor system.

Additionally, note that any of the software we deploy must be available in
Fedora. There are some rare exceptions to this (particularly for
applications specific to Fedora), but each exception is taken on a case
by case basis.
diff --git a/docs/sops/outage.rst b/docs/sops/outage.rst
deleted file mode 100644
index bdccf2c..0000000
--- a/docs/sops/outage.rst
+++ /dev/null
@@ -1,282 +0,0 @@
.. title: Outage Infrastructure SOP
.. slug: infra-outage
.. date: 2015-04-23
.. taxonomy: Contributors/Infrastructure

=========================
Outage Infrastructure SOP
=========================

What to do when there's an outage or when you're planning to take an
outage.

Contents
========

1. Contact Information
2. Users (No Access)

   1. Planned Outage

      1. Contacts

   2. Unplanned Outage

      1. Check first
      2. Reporting or participating in an outage

3. Infrastructure Members (Admin Access)

   1. Planned Outage

      1. Planning
      2. Preparations
      3. Outage
      4. Post outage cleanup

   2. Unplanned Outage

      1. Determine Severity
      2. First Steps
      3. Fix it
      4. Escalate
      5. The Resolution
      6. The Aftermath

Contact Information
===================

Owner
    Fedora Infrastructure Team
Contact
    #fedora-admin, sysadmin-main group
Location
    Anywhere
Servers
    Any
Purpose
    This SOP is generic for any outage
Emergency:
    https://admin.fedoraproject.org/pager


Users (No Access)
=================

.. note::
    Don't have shell access? Doesn't matter. Stop by and stay in #fedora-admin;
    if you have any expertise in what is going on, please assist. Random users
    have helped the team out countless times. Any time the team
    doesn't have to go to the docs to look up an answer is a time they can be
    spending fixing what's busted.

Planned Outage
--------------

If a planned outage comes at a terrible time, just let someone know. The
Infrastructure Team does its best to keep outages out of the way, but if
there's a mass rebuild going on that we don't know about and we schedule a
koji outage, let someone know.

Contacts
````````

Pretty much all coordination occurs in #fedora-admin on irc.freenode.net.
Stop by there to see more about what's going on.
Just stay on topic.

Unplanned Outage
----------------

Check first
```````````

Think something is busted? Please check with others to see if they are
also having issues. This could even include checking on another computer.
When reporting an outage, remember that the admins will typically drop
everything they are doing to check what the problem is. They won't be
happy to find out your cert has expired or you're using the wrong
username. Additionally, check the status dashboard
(http://status.fedoraproject.org) to verify that there is no previously
reported outage that may be causing and/or related to your issue.

Reporting or participating in an outage
```````````````````````````````````````

If you think you've found an outage, get as much information as you can
about it at a glance. Copy any errors you get to http://pastebin.ca/.
Use the following guidelines:

Don't be general.
    * BAD: "The wiki is acting slow"
    * Good: "Whenever I try to save https://fedoraproject.org/wiki/Infrastructure,
      I get a proxy error after 60 seconds"

Don't report an outage that's already been reported.
    * BAD: "/join #fedora-admin; Is the build system broken?"
    * Good: "/join #fedora-admin; wait a minute or two; I noticed I
      can't submit builds, here's the error I get:"

Don't suggest drastic or needless changes during an outage (send it to the list)
    * "Why don't you just use lighttpd?"
    * "You could try limiting MaxRequestsPerChild in Apache"

Don't get off topic or be too chatty
    * "Transformers was awesome, but yeah, I think you guys know what to do next"

Do research the technologies we're using and answer questions that may come up.
    * BAD: "Can't you just fix it?"
    * Good: "Hey guys, I think this is what you're looking for:
      http://httpd.apache.org/docs/2.2/mod/mod_mime.html#addencoding"

If no one can be contacted after 10 minutes or so, please see the section
below called Determine Severity to determine whether or not someone
should get paged.


Infrastructure Members (Admin Access)
=====================================

The Infrastructure Members section is specifically written for members
with access to the servers. This could be admin access to a box or even to a
specific web application. Basically, anyone with access to fix the problem.

Planned Outage
--------------

Any outage that is intentionally caused by a team member is a planned
outage, even if it has to happen in the next 5 minutes.

Planning
````````

All major planned outages should occur with at least 1 week's notice. This
is not always possible; use best judgment. Please use our standard outage
template at: https://fedoraproject.org/wiki/Infrastructure/OutageTemplate.
Make sure to have another person review your template/announcement to
check times and services affected. Make sure to send the announcement to
the lists that are affected by the outage: announce, devel-announce, etc.

Always create a ticket in the ticketing system:
https://fedoraproject.org/wiki/Infrastructure/Tickets
Send an email to the fedora-infrastructure-list with more details if
warranted.

Remember to follow an existing SOP as much as possible. If anything is
missing from the SOP, please add it.

Preparations
````````````

Remember to schedule an outage in nagios. This is important not just so
notifications don't get sent, but also for trending and reporting.
https://admin.fedoraproject.org/nagios/

Outage
``````

Prior to beginning an outage of any monitored service on
http://status.fedoraproject.org, please push an update to reflect the outage
(see the status-fedora SOP).

Report all information in #fedora-admin. Coordination is extremely
important; it's rare for our group to meet in person, and IRC is our only
real-time communication device. If a web site is out, please put up some
sort of outage page in its place.

Post outage cleanup
```````````````````

Once the outage is over, ensure that all services are up and running.
Ensure all nagios services are back to green. Notify everyone in
#fedora-admin to scan our services for issues. Once all services are
cleared, update the status.fp.o dashboard. If the outage included a
new feature or major change for a group, please notify that group that the
change is ready. Make sure to close the ticket for the outage when it's
over.

Once the services are restored, an update to the status dashboard should be
pushed to show the services are restored.

.. important::
    Additionally, update any SOPs that may have changed in the course of the
    outage.

Unplanned Outage
----------------
Unplanned outages happen; stay cool. As a team member, never be afraid to
do something because you think you'll get in trouble over it. Be smart,
don't be reckless, and never say "I shouldn't do this". If an unorthodox
method or drastic change will fix the problem, do it, document it, and let
the team know. Messes can always be cleaned up after the outage.

Determine Severity
``````````````````

Some outages require immediate fixing, some don't. A page should never go
out because someone can't sign the CLA. Most of our admins are in US time;
use your best judgment. If it's bad enough to warrant an emergency page,
page one of the admins at: https://admin.fedoraproject.org/pager

Use the following as loose guidelines; just use your best judgment.

* BAD: "I can't see the Recent Changes on the wiki."
* Good: "The entire wiki is not viewable"

* BAD: I cannot sign the CLA
* Good: I can't change my password in the account system,
  I have admin access and my laptop was just stolen

* BAD: I can't access awstats for fedoraproject.org
* Good: The mirrors list is down.

* BAD: I think someone misspelled some words on the webpage
* Good: The web page has been hacked and I think someone
  notified slashdot.

First Steps
```````````

After an outage has been verified, acknowledge the outage in nagios:
https://admin.fedoraproject.org/nagios/, update the related system on the
status dashboard (see the status-fedora SOP) and verify changes at
http://status.fedoraproject.org, then head in to #fedora-admin
to figure out who is around and coordinate the next course of action.
Consult any relevant SOPs for corrective actions.

Fix it
``````
Fix it, Fix it, Fix it! Do whatever needs to be done to fix the problem,
just don't be stupid about it.

Escalate
````````
Can't fix it? Don't wait, escalate! All of the team members have expertise
with some areas of our environment and weaknesses in other areas. Never be
afraid to tap another team member. Sometimes it's required, sometimes it's
not. The last layer of defense is to page someone. At present our team is
small enough that a full escalation path wouldn't do much good. Consult
the contact information on each SOP for more information.
The Resolution
``````````````
Once the services are restored, an update to the status dashboard should be
pushed to show the services are restored.

The Aftermath
`````````````
With any outage there will be questions. Please try as hard as possible to
answer the following questions and send them to the
fedora-infrastructure-list.

1. What happened?
2. What was affected?
3. How long was the outage?
4. What was the root cause?

.. important:: Number 4 is especially important. If a kernel build keeps failing
    because of issues with koji caused by a database failure caused by a full
    filesystem on db1, don't just say koji died because of a db failure. Any
    time a root cause is discovered and not being monitored by nagios, add it
    if possible. Most failures can be prevented or mitigated with proper
    monitoring.

diff --git a/docs/sops/packagedatabase.rst b/docs/sops/packagedatabase.rst
deleted file mode 100644
index 907f78a..0000000
--- a/docs/sops/packagedatabase.rst
+++ /dev/null
@@ -1,323 +0,0 @@
.. title: Package Database Infrastructure SOP
.. slug: infra-packagedb
.. date: 2013-04-30
.. taxonomy: Contributors/Infrastructure

===================================
Package Database Infrastructure SOP
===================================


The PackageDB is used by Fedora developers to manage package ownership and
acls. It controls who is allowed to commit to a package and who gets
notification of changes to packages.

PackageDB project Trac: https://fedorahosted.org/packagedb/

Contents
========

1. Contact Information
2. Troubleshooting and Resolution
3. Common Actions

   1. Adding a new Pseudo User as a package owner
   2. Renaming a package
   3. Removing a package
   4. Add a new release
   5. Update App DB for a release going final
   6. Orphaning all the packages for a user

Contact Information
===================

Owner
    Fedora Infrastructure Team

Contact
    #fedora-admin

Persons
    abadger1999

Location
    Phoenix

Servers
    admin.fedoraproject.org -- click on one of the current haproxy
    servers to see the physical servers

Purpose
    Manage package ownership

Troubleshooting and Resolution
==============================

Common Actions
==============

Adding a new Pseudo User as a package owner
-------------------------------------------

Sometimes you want to have a mailing list own a package so that bugzilla
email is assigned to the mailing list. Doing this requires adding a new
pseudo user to the account system and assigning that person as the package
maintainer.

.. warning:: Pseudo users often have a dash in their name. We create email
    aliases via ansible that have dashes in their name in order to not
    collide with fas usernames (users cannot create usernames with dashes
    via the webui). Make sure that any pseudo-users you create do not
    clash with existing email aliases.

In the following examples, replace ("xen", "kernel-xen-2.6") with the
packages you are assigning to the new user and 9902 with the userid you
select in step 2.

* Log into fas-db01.
* Log into the db as a user that can make changes::

    $ psql -U postgres fas2
    fas2>

  * Find the current pseudo-users::

      fas2> select id, username from people where id < 10000 order by id;
      id   | username
      -----+------------------
      9900 | orphan
      9901 | anaconda-maint

  * Create a new account with the next available id after 9900::

      fas2> insert into people (id, username, human_name, password, email)
            values (9902, 'xen-maint', 'Xen Maintainers', '*', 'xen-maint@redhat.com');

* Connect to the pkgdb as a user that can make changes::

    $ psql -U pkgdbadmin -h db01 pkgdb
    pkgdb>

* Add the current package owner as a comaintainer of the package. If
  this user is not currently on the acls for the package, you can use the
  following database queries::

    insert into personpackagelisting (username, packagelistingid)
      select pl.owner, pl.id from packagelisting as pl, package as p
      where p.id = pl.packageid and p.name in ('xen', 'kernel-xen-2.6');
    insert into personpackagelistingacl (personpackagelistingid, acl, statuscode)
      select ppl.id, 'build', 3 from personpackagelisting as ppl, packagelisting as pl, package as p
      where p.id = pl.packageid and pl.id = ppl.packagelistingid and pl.owner = ppl.username
      and p.name in ('xen', 'kernel-xen-2.6');
    insert into personpackagelistingacl (personpackagelistingid, acl, statuscode)
      select ppl.id, 'commit', 3 from personpackagelisting as ppl, packagelisting as pl, package as p
      where p.id = pl.packageid and pl.id = ppl.packagelistingid
      and pl.owner = ppl.username
      and p.name in ('xen', 'kernel-xen-2.6');
    insert into personpackagelistingacl (personpackagelistingid, acl, statuscode)
      select ppl.id, 'approveacls', 3 from personpackagelisting as ppl, packagelisting as pl, package as p
      where p.id = pl.packageid and pl.id = ppl.packagelistingid
      and pl.owner = ppl.username
      and p.name in ('xen', 'kernel-xen-2.6');

  If the owner is already in the acls, you will need to figure out which
  acls the packages already have and only add the new acls that are missing.

* Reassign the pseudo-user to be the new owner::

    update packagelisting set owner = 'xen-maint' from package as p
      where packagelisting.packageid = p.id and p.name in ('xen', 'kernel-xen-2.6');

Renaming a package
------------------

On db2::

    sudo -u postgres psql pkgdb
    select * from package where name = 'OLDNAME';
    [Make sure only the package you want is selected]
    update package set name = 'NEWNAME' where name = 'OLDNAME';

On cvs-int::

    CVSROOT=/cvs/pkgs cvs co CVSROOT
    sed -i 's/OLDNAME/NEWNAME/g' CVSROOT/modules
    cvs commit -m 'Rename OLDNAME => NEWNAME'
    cd /cvs/pkgs/rpms
    mv OLDNAME NEWNAME
    cd NEWNAME
    find . -name 'Makefile,v' -exec sed -i 's/NAME := OLDNAME/NAME := NEWNAME/' \{\} \;
    cd ../../devel
    rm OLDNAME
    ln -s ../rpms/NEWNAME/devel .

If the package has existed long enough to have been added to koji, run
something like the following to "retire" the old name in koji::

    koji block-pkg dist-f12 OLDNAME

Removing a package
==================

.. warning::
    Do not remove a package if it has been built for a fedora release or if
    you are not also willing to remove the cvs directory.

When a package has been added due to a typo, it can be removed in one of
two ways: marking it as a mistake with the "removed" status or deleting it
from the db entirely. Marking it as removed is easier and is explained
below.
On db2::

    sudo -u postgres psql pkgdb
    pkgdb=# select id, name, summary, statuscode from package where name = 'b';
     id   | name | summary                                          | statuscode
    ------+------+--------------------------------------------------+-----------
     6618 | b    | A simple database interface to MS-SQL for Python | 3
    (1 row)

- Make sure there is only one package returned and it is the correct one.
- Statuscode 3 is "approved" and it's what we're changing from.
- You'll also need the id for later::

    pkgdb=# BEGIN;
    pkgdb=# update package set statuscode = 17 where name = 'b';
    UPDATE 1

- Make sure only a single package was changed::

    pkgdb=# COMMIT;

    pkgdb=# select id, packageid, collectionid, owner, statuscode from packagelisting where packageid = 6618;
     id    | packageid | collectionid | owner  | statuscode
    -------+-----------+--------------+--------+-----------
     42552 | 6618      | 19           | 101437 | 3
     38845 | 6618      | 15           | 101437 | 3
     38846 | 6618      | 14           | 101437 | 3
     38844 | 6618      | 8            | 101437 | 3
    (4 rows)

- Make sure the output here looks correct (packageid is all the same, etc).
- You'll also need the ids for later::

    pkgdb=# BEGIN;
    pkgdb=# update packagelisting set statuscode = 17 where packageid = 6618;
    UPDATE 4
    -- Make sure the same number of rows were committed as you saw before.
    pkgdb=# COMMIT;

    pkgdb=# select * from personpackagelisting where packagelistingid in (38844, 38846, 38845, 42552);
     id | userid | packagelistingid
    ----+--------+------------------
    (0 rows)

- In this case there are no comaintainers, so we don't have to do any more.
  If there were, we'd have to treat them like the groups handled next::

    pkgdb=# select * from grouppackagelisting where packagelistingid in (38844, 38846, 38845, 42552);
     id    | groupid | packagelistingid
    -------+---------+------------------
     39229 | 100300  | 38844
     39230 | 107427  | 38844
     39231 | 100300  | 38845
     39232 | 107427  | 38845
     39233 | 100300  | 38846
     39234 | 107427  | 38846
     84481 | 107427  | 42552
     84482 | 100300  | 42552
    (8 rows)

    pkgdb=# select * from grouppackagelistingacl where grouppackagelistingid in (39229, 39230, 39231, 39232, 39233, 39234, 84481, 84482);
    -- The results of this are usually pretty long, so I've omitted everything but the row count.
    (24 rows)

- For groups it's typically 3 (one for each of commit, build, and checkout) *
  number of grouppackagelistings.
  In this case, that's 24, so this matches our expectations::

    pkgdb=# BEGIN;
    pkgdb=# update grouppackagelistingacl set statuscode = 13 where grouppackagelistingid in (39229, 39230, 39231, 39232, 39233, 39234, 84481, 84482);
    -- Make sure only the number of rows you saw before were updated:
    pkgdb=# COMMIT;

If the package has existed long enough to have been added to koji, run
something like the following to "retire" it in koji::

    koji block-pkg dist-f12 PKGNAME

Add a new release
=================

To add a new Fedora release, ssh to db02 and do this::

    sudo -u postgres psql pkgdb

This adds the release for Package ACLs::

    insert into collection (name, version, statuscode, owner, koji_name) values('Fedora', '13', 1, 'jkeating', 'dist-f13');
    insert into branch select id, 'f13', '.fc13', Null, 'f13' from collection where name = 'Fedora' and version = '13';

If this is for mass branching, we probably need to advance the branch
information for devel as well::

    update branch set disttag = '.fc14' where collectionid = 8;

This adds the new release's repos for the App DB::

    insert into repos (shortname, name, url, mirror, active, collectionid) select 'F-13-i386', 'Fedora 13 - i386', 'development/13/i386/os', 'http://download.fedoraproject.org/pub/fedora/linux/', true, c.id from collection as c where c.name = 'Fedora' and c.version = '13';

    insert into repos (shortname, name, url, mirror, active, collectionid) select 'F-13-i386-d', 'Fedora 13 - i386 - Debug', 'development/13/i386/debug', 'http://download.fedoraproject.org/pub/fedora/linux/', true, c.id from collection as c where c.name = 'Fedora' and c.version = '13';

    insert into repos (shortname, name, url, mirror, active, collectionid) select 'F-13-i386-tu', 'Fedora 13 - i386 - Test Updates', 'updates/testing/13/i386/', 'http://download.fedoraproject.org/pub/fedora/linux/', true, c.id from collection as c where c.name = 'Fedora' and c.version = '13';

    insert into repos (shortname, name, url, mirror, active, collectionid) select 'F-13-i386-tud', 'Fedora 13 - i386 - Test Updates Debug', 'updates/testing/13/i386/debug/', 'http://download.fedoraproject.org/pub/fedora/linux/', true, c.id from collection as c where c.name = 'Fedora' and c.version = '13';

    insert into repos (shortname, name, url, mirror, active, collectionid) select 'F-13-x86_64', 'Fedora 13 - x86_64', 'development/13/x86_64/os', 'http://download.fedoraproject.org/pub/fedora/linux/', true, c.id from collection as c where c.name = 'Fedora' and c.version = '13';

    insert into repos (shortname, name, url, mirror, active, collectionid) select 'F-13-x86_64-d', 'Fedora 13 - x86_64 - Debug', 'development/13/x86_64/debug', 'http://download.fedoraproject.org/pub/fedora/linux/', true, c.id from collection as c where c.name = 'Fedora' and c.version = '13';

    insert into repos (shortname, name, url, mirror, active, collectionid) select 'F-13-x86_64-tu', 'Fedora 13 - x86_64 - Test Updates', 'updates/testing/13/x86_64/', 'http://download.fedoraproject.org/pub/fedora/linux/', true, c.id from collection as c where c.name = 'Fedora' and c.version = '13';

    insert into repos (shortname, name, url, mirror, active, collectionid) select 'F-13-x86_64-tud', 'Fedora 13 - x86_64 - Test Updates Debug', 'updates/testing/13/x86_64/debug/', 'http://download.fedoraproject.org/pub/fedora/linux/', true, c.id from collection as c where c.name = 'Fedora' and c.version = '13';

Update App DB for a release going final
=======================================

When a Fedora release goes final, the repositories for it change where
they live. The repo definitions allow the App browser to sync information
from the yum repositories. The PackageDB needs to be updated for the new
areas::

    BEGIN;
    insert into repos (shortname, name, url, mirror, active, collectionid) select 'F-14-i386-u', 'Fedora 14 - i386 - Updates', 'updates/14/i386/', 'http://download.fedoraproject.org/pub/fedora/linux/', true, c.id from collection as c where c.name = 'Fedora' and c.version = '14';
    insert into repos (shortname, name, url, mirror, active, collectionid) select 'F-14-i386-ud', 'Fedora 14 - i386 - Updates Debug', 'updates/14/i386/debug/', 'http://download.fedoraproject.org/pub/fedora/linux/', true, c.id from collection as c where c.name = 'Fedora' and c.version = '14';
    update repos set url='releases/14/Everything/i386/os/' where shortname = 'F-14-i386';
    update repos set url='releases/14/Everything/i386/debug/' where shortname = 'F-14-i386-d';

    insert into repos (shortname, name, url, mirror, active, collectionid) select 'F-14-x86_64-u', 'Fedora 14 - x86_64 - Updates', 'updates/14/x86_64/', 'http://download.fedoraproject.org/pub/fedora/linux/', true, c.id from collection as c where c.name = 'Fedora' and c.version = '14';
    insert into repos (shortname, name, url, mirror, active, collectionid) select 'F-14-x86_64-ud', 'Fedora 14 - x86_64 - Updates Debug', 'updates/14/x86_64/debug/', 'http://download.fedoraproject.org/pub/fedora/linux/', true, c.id from collection as c where c.name = 'Fedora' and c.version = '14';
    update repos set url='releases/14/Everything/x86_64/os/' where shortname = 'F-14-x86_64';
    update repos set url='releases/14/Everything/x86_64/debug/' where shortname = 'F-14-x86_64-d';
    COMMIT;

Orphaning all the packages for a user
=====================================

This can be done in the database if you don't want to send email::

    $ ssh db02
    $ sudo -u postgres psql pkgdb
    pkgdb> select * from packagelisting where owner = 'xulchris';
    pkgdb> -- Check that the list doesn't look suspicious.... There should be a record for every fedora release * package
    pkgdb> BEGIN;
    pkgdb> update packagelisting set owner = 'orphan', statuscode = 14 where owner = 'xulchris';
    pkgdb> -- If the right number of rows were changed
    pkgdb> COMMIT;

.. note::
    If you do it via pkgdb-client or the python-fedora API instead, you'll
    want to orphan only the packages on non-EOL branches that exist, to cut
    down on the amount of email that's sent. That entails figuring out what
    branches you need to do this on.
diff --git a/docs/sops/pdc.rst b/docs/sops/pdc.rst
deleted file mode 100644
index 0de23c9..0000000
--- a/docs/sops/pdc.rst
+++ /dev/null
@@ -1,133 +0,0 @@
.. title: PDC SOP
.. slug: infra-pdc
.. date: 2016-04-07
.. taxonomy: Contributors/Infrastructure

=======
PDC SOP
=======

Store metadata about composes we produce and "component groups".
App: https://pdc.fedoraproject.org/
Source for frontend: https://github.com/product-definition-center/product-definition-center
Source for backend: https://github.com/fedora-infra/pdc-updater

Contact Information
-------------------

Owner
    Release Engineering, Fedora Infrastructure Team
Contact
    #fedora-apps, #fedora-releng, #fedora-admin, #fedora-noc
Servers
    pdc-web0{1,2}, pdc-backend01
Purpose
    Store metadata about composes and "component groups"

Description
-----------

The Product Definition Center (PDC) is a webapp and API designed for storing and
querying product metadata. We automatically populate our instance with data
from our existing releng tools/processes. It doesn't do much on its own, but
the goal is to enable us to develop saner tooling down the road for future
releases.

The webapp is a django app running on pdc-web0{1,2}. Unlike most of our other
apps, it does not use OpenID for authentication, but instead uses SAML2. It
uses `mod_auth_mellon` to achieve this (in cooperation with ipsilon). The
webapp allows new data to be POST'd to it by admin users.

The backend is a `fedmsg-hub` process running on pdc-backend01. It listens for
new composes over fedmsg and then POSTs data about those composes to PDC. It
also listens for changes to the fedora atomic host git repo in pagure and
updates "component groups" in PDC to reflect what rpm components constitute
fedora atomic host.


For a long-winded history and explanation, see the original Change document:
https://fedoraproject.org/wiki/Changes/ProductDefinitionCenter

Upgrading the Software
----------------------

There is an upgrade playbook in ``playbooks/manual/upgrade/pdc.yml`` which will
upgrade both the frontend and the backend if new packages are available.
Database schema upgrades should be handled automatically with a run of that
playbook.

Logs
----

Logs for the frontend are in `/var/log/httpd/error_log` on pdc-web0{1,2}.

Logs for the backend can be accessed with `journalctl -u fedmsg-hub -f` on
pdc-backend01.

Restarting Services
-------------------

The frontend runs under apache, so either `apachectl graceful` or `systemctl
restart httpd` should do it.

The backend runs as a fedmsg-hub, so `systemctl restart fedmsg-hub` should
restart it.

Scripts
-------

The pdc-updater package (installed on pdc-backend01) provides three scripts:

- pdc-updater-audit
- pdc-updater-retry
- pdc-updater-initialize

A possible failure scenario is that we will lose a fedmsg message and the
backend will not update the frontend with info about that compose. To detect
this, we provide the `pdc-updater-audit` command (which gets run once daily by
cron, with emails sent to the releng-cron list). It compares all of the entries
in PDC with all of the entries in kojipkgs and then raises an alert if there is
a discrepancy.

Another possible failure scenario is that the fedmsg message is published and
received correctly, but there is some processing error while handling it. The
event occurred, but the import to the PDC db failed. The `pdc-updater-audit`
script should detect this discrepancy, and then an admin will need to manually
repair the problem and retry the event with the `pdc-updater-retry` command.

If doomsday occurs and the whole thing is totally hosed, you can delete the db
and re-ingest all information available from releng with the
``pdc-updater-initialize`` tool.
-on pdc-web01 with the standard django settings.py commands.)
-
-Manually Updating Information
------------------------------
-
-In general, you shouldn't have to do these things. pdc-updater will
-automatically create new releases and update information, but if you ever need
-to manipulate PDC data, you can do it with the pdc-client tool. A copy is
-installed on pdc-backend01 and there are some credentials there you'll need, so
-ssh there first.
-
-Make sure that you are root so that you can read `/etc/pdc.d/fedora.json`.
-
-Try listing all of the releases::
-
-    $ pdc -s fedora release list
-
-Deactivating an EOL release::
-
-    $ pdc -s fedora release update fedora-21-updates --deactivate
-
-.. note:: There are lots more attributes you can manipulate on a release (you
-   can change the type, rename them, etc.). See `pdc --help` and
-   `pdc release --help` for more information.
-
-Listing all composes::
-
-    $ pdc -s fedora compose list
-
-We're not sure yet how to flag a compose as the Gold compose, but when we do,
-the answer should appear here:
-https://github.com/product-definition-center/product-definition-center/issues/428
diff --git a/docs/sops/pesign-upgrade.rst b/docs/sops/pesign-upgrade.rst
deleted file mode 100644
index 575682a..0000000
--- a/docs/sops/pesign-upgrade.rst
+++ /dev/null
@@ -1,66 +0,0 @@
-.. title: Pesign Upgrades and Reboots
-.. slug: infra-pesign-maintenance
-.. date: 2013-05-29
-.. taxonomy: Contributors/Infrastructure
-
-=======================
-Pesign upgrades/reboots
-=======================
-
-Fedora has (currently) 2 special builders. These builders are used to
-build a small set of packages that need to be signed for secure boot.
-These packages include: grub2, shim, kernel, pesign-test-app
-
-When rebooting or upgrading pesign on these machines, you have to
-follow a special process to unlock the signing keys.
-
-Contact Information
-===================
-
-Owner
-    Fedora Release Engineering, Kernel/grub2/shim/pesign maintainers
-Contact
-    #fedora-admin, #fedora-kernel
-Servers
-    bkernel01, bkernel02
-Purpose
-    Upgrade or restart signing keys on kernel/grub2/shim builders
-
-Procedure
-=========
-
-0. Coordinate with pesign maintainers or pesign-test-app committers, as well
-   as the releng folks who have the pin to unlock the signing key.
-
-1. Remove the builder from koji::
-
-    koji disable-host bkernel01.phx2.fedoraproject.org
-
-2. Make sure all builds have completed.
-
-3. Stop existing processes::
-
-    service pcscd stop
-    service pesign stop
-
-4. Perform updates or reboots.
-
-5. Restart services (if you didn't reboot)::
-
-    service pcscd start
-    service pesign start
-
-6. Unlock the signing key (enter the pin when prompted)::
-
-    pesign-client -t "OpenSC Card (Fedora Signer)" -u
-
-7. Make sure no builds are in progress, then re-add the builder to koji and
-   remove the other builder::
-
-    koji enable-host bkernel01.phx2.fedoraproject.org
-    koji disable-host bkernel02.phx2.fedoraproject.org
-
-8. Have a committer send a build of pesign-test-app and make sure it's signed
-   correctly.
-
-9. If so, repeat the process with the second builder.
-
diff --git a/docs/sops/planetsubgroup.rst b/docs/sops/planetsubgroup.rst
deleted file mode 100644
index fcec6c7..0000000
--- a/docs/sops/planetsubgroup.rst
+++ /dev/null
@@ -1,68 +0,0 @@
-.. title: Fedora Planet Subgroup SOP
-.. slug: infra-planet-subgroup
-.. date: 2011-10-03
-.. taxonomy: Contributors/Infrastructure
-
-==================================
-Planet Subgroup Infrastructure SOP
-==================================
-
-Fedora's planet infrastructure produces planet configs out of users'
-``~/.planet`` files in their homedirs on fedorapeople.org. You can also create
-subgroups of users into other planets. This document explains how to set up
-new subgroups.
-
-Contact Information
-===================
-
-Owner
-    Fedora Infrastructure Team
-Contact
-    #fedora-admin
-Servers
-    batcave01 / planet.fedoraproject.org
-Purpose
-    Provide easy setup of new planet groups on
-    planet.fedoraproject.org
-
-The Setup
-=========
-
-1. On batcave01::
-
-    cp -a configs/system/planet/grouptmpl configs/system/planet/newgroupname
-
-2. cd to the new directory.
-
-3. Run the following, replacing newgroupname with the group name you want::
-
-    perl -pi -e "s/%%groupname/newgroupname/g" fpbuilder.conf base_config planet-group.cron templates/*
-
-4. git add the whole dir.
-
-5. Edit ``manifests/services/planet.pp``.
-
-6. Copy and paste everything from beginning to end of the design team group,
-   to use as a template.
-
-7. Modify what you copied, replacing design with the new group name.
-
-8. Save it.
-
-9. Check everything in.
-
-10. Run ansible on planet and check if it works.
-
-Use
-===
-
-Tell the requester to then copy their current .planet file to
-.planet.newgroupname. For example, with the design team::
-
-    cp ~/.planet ~/.planet.design
-
-This will then show up on the new feed:
-
-http://planet.fedoraproject.org/design/
-
diff --git a/docs/sops/privatefedorahosted.rst b/docs/sops/privatefedorahosted.rst
deleted file mode 100644
index 24ea3aa..0000000
--- a/docs/sops/privatefedorahosted.rst
+++ /dev/null
@@ -1,69 +0,0 @@
-.. title: Fedorahosted Private Tickets SOP
-.. slug: infra-fedorahosted-private-tickets
-.. date: 2011-10-03
-.. taxonomy: Contributors/Infrastructure
-
-===============================================
-Private fedorahosted tickets Infrastructure SOP
-===============================================
-
-Provides for users only viewing tickets they are involved with.
-
-Contact Information
-===================
-
-Owner
-    Fedora Infrastructure Team
-
-Contact
-    #fedora-admin, sysadmin-hosted
-
-Location
-
-Servers
-    hosted1
-
-Purpose
-    Provides for users only viewing tickets they are involved with.
-
-Description
-===========
-
-Fedora Hosted Projects have the option of setting ticket permissions so
-that only users involved with tickets can see them. This plugin requires
-someone in sysadmin-hosted to set it up, and requires justification to
-use. The only current implementation is a request tracking system at
-https://fedorahosted.org/famnarequests for tracking requests for North
-American ambassadors, since mailing addresses, etc. will be put in there.
-
-Implementation
-==============
-
-On hosted1::
-
-    sudo -u apache vim /srv/web/trac/projects/<project_name>/conf/trac.ini
-
-Add the following to the appropriate sections of ``trac.ini``::
-
-    [privatetickets]
-    group_blacklist = anonymous, authenticated
-
-    [components]
-    privatetickets.* = enabled
-
-    [trac]
-    permission_policies = PrivateTicketsPolicy, DefaultPermissionPolicy, LegacyAttachmentPolicy
-
-.. note:: For projects not currently using plugins, you'll have to add the
-   [components] section, and you'll need to add the permission_policies to
-   the [trac] section.
-
-Next, someone with TRAC_ADMIN needs to grant TICKET_VIEW_SELF (a new
-permission) to authenticated. This permission allows users to view tickets
-that they are either owner, CC, or reporter on. There are other options,
-more fully described at the upstream site.
-
-Make sure that TICKET_VIEW is removed from anonymous, or else this plugin
-will have no effect.
-
diff --git a/docs/sops/publictest-dev-stg-production.rst b/docs/sops/publictest-dev-stg-production.rst
deleted file mode 100644
index 5bda0f5..0000000
--- a/docs/sops/publictest-dev-stg-production.rst
+++ /dev/null
@@ -1,88 +0,0 @@
-.. title: Infrastructure Machine Classifications
-.. slug: infra-machine-classes
-.. date: 2011-10-30
-.. taxonomy: Contributors/Infrastructure
-
-=====================================
-Fedora Infrastructure Machine Classes
-=====================================
-
-Contact Information
-===================
-
-Owner
-    sysadmin-main, application developers
-Contact
-    sysadmin-main
-Location
-    Everywhere we have machines.
-Servers
-    publictest, dev, staging, production
-Purpose
-    Explain our use of various types of machines.
-
-Introduction
-============
-
-This document explains what the various types of machines are used for in
-the life cycle of providing an application or resource.
-
-Public Test machines
-====================
-
-publictest instances are used for early investigation into a resource or application.
-At this stage the application might not be packaged yet, and we want to see if it's
-worth packaging and starting it on the path to being available in production.
-These machines are accessible to anyone in the sysadmin-test group, and coordination
-of use of instances is done on an ad-hoc basis. These machines are re-installed
-every cycle cleanly, so all work must be saved before this occurs.
-
-Authentication must not be against the production fas server. We have
-fakefas.fedoraproject.org set up for these systems instead.
-
-.. note:: We're planning on merging publictest into the development servers.
-   Environment-wise they'll be mostly the same (one service per machine, a
-   group to manage them, no proxy interaction, etc.). Service by service we'll
-   assign timeframes to the machines before being rebuilt, decommissioned if
-   no progress, etc.
-
-Development
-===========
-
-These instances are for applications that are packaged and being investigated for
-deployment. Typically packages and config files are modified locally to get the
-application or resource working. No caching or proxies are used. Access is to a
-specific sysadmin group for that application or resource. These instances can
-be re-installed on request to 'start over' getting configuration ready.
-
-Some services hosted on dev systems are for testing new programs. These will
-usually be associated with an RFR and have a limited lifetime before the new
-service has to prove itself worthy of continued testing, to be moved on to
-stg, or have the machine decommissioned. Other services are for developing
-existing services. They are handy if the setup of the service is tricky or
-lengthy and the person in charge wants to maintain the .dev server so that
-newer contributors don't have to perform that setup in order to work on the
-service.
-
-Authentication must not be against the production fas server. We have
-fakefas.fedoraproject.org set up for these systems instead.
-
-.. note:: fakefas will be renamed fas01.dev at some point in the future.
-
-Staging
-=======
-
-These instances are used to integrate the application or resource into ansible
-as well as proxy and caching setups. These instances should use ansible to deploy
-all parts of the application or resource possible. Access to these instances
-is only to a sysadmin group for that application, who may or may not have sudo
-access. Permissions on stg mirror permissions on production (for instance,
-sysadmin-web would have access to the app servers in stg the same as in
-production).
-
-Production
-==========
-
-These instances are used to serve the ready-for-deployment application to the public.
-All changes are done via ansible and access is restricted. Changes should be done
-here only after testing in staging.
diff --git a/docs/sops/rdiff-backup.rst b/docs/sops/rdiff-backup.rst
deleted file mode 100644
index 1b92026..0000000
--- a/docs/sops/rdiff-backup.rst
+++ /dev/null
@@ -1,108 +0,0 @@
-.. title: rdiff-backup Infrastructure SOP
-.. slug: infra-rdiff
-.. date: 2013-11-01
-.. taxonomy: Contributors/Infrastructure
-
-================
-rdiff-backup SOP
-================
-
-Contact Information
-===================
-
-Owner
-    Fedora Infrastructure Team
-Contact
-    #fedora-admin
-Location
-    Phoenix
-Servers
-    backup03 and others
-Purpose
-    backups of critical data
-
-Description
-===========
-
-We are now running rdiff-backup against all our critical data on a daily basis.
-This allows us to keep incremental changes over time as well as have a recent copy
-in case of disaster recovery.
-
-The backups are run from backup03 every day at 22:10 UTC as root.
-All config is in ansible.
-
-The cron job checks out the ansible repo from git, then runs ansible-playbook with
-the rdiff-backup playbook. This playbook looks at variables to decide which
-machines and partitions to back up.
-
-- First, machines in the backup_clients group in inventory are operated on.
-  If a host is not in that group it is not backed up via rdiff-backup.
-
-- Next, any machines in the backup_clients group will have their /etc and /home
-  directories backed up by the server running rdiff-backup and using the rdiff-backup
-  ssh key to access the client.
-
-- Next, if any of the hosts in backup_clients have a variable set for
-  host_backup_targets, those directories will also be backed up in the same
-  manner as above with the rdiff-backup ssh key.
-
-For each backup an email will be sent to sysadmin-backup-members with a summary.
-
-Backups are stored on a netapp volume, so in addition to the incrementals
-that rdiff-backup provides there are netapp snapshots. This netapp volume is
-mounted on /fedora_backups and is running dedup on the netapp side.
-
-Rebooting backup03
-==================
-
-When backup03 is rebooted, you must restart the ssh-agent and reload the
-rdiff-backup ssh key into that agent so backups can take place.
-
-::
-
-    sudo -i
-    ssh-agent -s > sshagent
-    source sshagent
-    ssh-add .ssh/rdiff-backup-key
-
-Adding a new host to backups
-============================
-
-1. Add the host to the backup_clients inventory group in ansible.
-
-2. If you wish to back up more than /etc and /home, add a variable to
-   ``inventory/host_vars/fqdn`` like::
-
-       host_backup_targets: ['/srv']
-
-3. On the client to be backed up, install rdiff-backup.
-
-4. On the client to be backed up, install the rdiff-backup ssh public key in
-   ``/root/.ssh/authorized_keys``. It should be restricted with a from option
-   (a combined example follows this list)::
-
-       from="10.5.126.161,192.168.1.64"
-
-   and the command can be restricted to::
-
-       command="rdiff-backup --server --restrict-update-only"
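-
-Putting those two restrictions together, a complete ``authorized_keys`` entry
-looks roughly like this, all on one line (key material abbreviated here; the
-full key is listed under "Public key" below)::
-
-    from="10.5.126.161,192.168.1.64",command="rdiff-backup --server --restrict-update-only" ssh-dss AAAAB3... root@backup03-rdiff-backup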
-
-Restoring from backups
-======================
-
-rdiff-backup keeps a copy of the most recent version of files on disk, so if you
-wish to restore the last backup copy, simply rsync from backup03. If you want an
-older incremental, see the rdiff-backup man page for how to specify the exact time.
-
-Retention
-=========
-
-Backups are currently kept forever, but likely down the road we will look at
-pruning them some to match available space.
-
-Public key
-==========
-
-::
-
-    ssh-dss
-    AAAAB3NzaC1kc3MAAACBAJr3xqn/hHIXeth+NuXPu9P91FG9jozF3Q1JaGmg6szo770rrmhiSsxso/Ibm2mObqQLCyfm/qSOQRynv6tL3tQVHA6EEx0PNacnBcOV7UowR5kd4AYv82K1vQhof3YTxOMmNIOrdy6deDqIf4sLz1TDHvEDwjrxtFf8ugyZWNbTAAAAFQCS5puRZF4gpNbaWxe6gLzm3rBeewAAAIBcEd6pRatE2Qc/dW0YwwudTEaOCUnHmtYs2PHKbOPds0+Woe1aWH38NiE+CmklcUpyRsGEf3O0l5vm3VrVlnfuHpgt/a/pbzxm0U6DGm2AebtqEmaCX3CIuYzKhG5wmXqJ/z+Hc5MDj2mn2TchHqsk1O8VZM+1Ml6zX3Hl4vvBsQAAAIALDt5NFv6GLuid8eik/nn8NORd9FJPDBJxgVqHNIm08RMC6aI++fqwkBhVPFKBra5utrMKQmnKs/sOWycLYTqqcSMPdWSkdWYjBCSJ/QNpyN4laCmPWLgb3I+2zORgR0EjeV2e/46geS0MWLmeEsFwztpSj4Tv4e18L8Dsp2uB2Q==
-    root@backup03-rdiff-backup
diff --git a/docs/sops/requestforresources.rst b/docs/sops/requestforresources.rst
deleted file mode 100644
index e1e4852..0000000
--- a/docs/sops/requestforresources.rst
+++ /dev/null
@@ -1,158 +0,0 @@
-.. title: Infrastructure Request for Resources SOP
-.. slug: infra-rfr
-.. date: 2015-04-23
-.. taxonomy: Contributors/Infrastructure
-
-=========================
-Request for resources SOP
-=========================
-
-Contents
-========
-
-1. Contact Information
-2. Introduction
-3. Pre sponsorship
-4. Planning
-5. Development Instance
-6. Staging Instance
-7. Production deployment
-8. Maintenance
-
-Contact Information
-===================
-
-Owner
-    Fedora Infrastructure Team
-Contact
-    #fedora-admin
-Location
-    fedoraproject.org/wiki
-Servers
-    dev, stg, production
-Purpose
-    Explains the technical part of Request for Resources
-
-Introduction
-============
-
-Once an RFR has a sponsor and has been generally agreed to move forward,
-this SOP describes the technical parts of moving an RFR through the
-various steps it needs from idea to implementation. Note that for high
-level and non-technical requirements, please see the main RFR page.
-
-An RFR will go through (at least) the following steps, but note that it can
-be dropped, removed or reverted at any time in the process, and that MUST
-items MUST be provided before the next step is possible.
-
-Pre sponsorship
-===============
-
-Until an RFR has a sysadmin-main person who is sponsoring and helping with
-the request, no further technical action should take place with this SOP.
-Please see the main RFR SOP to acquire a sponsor and do the steps needed
-before implementation starts. If your resource requires packages to be
-complete, please finish your packaging work before moving forward with the
-RFR (accepted/approved packages in Fedora/EPEL). If your RFR only has a
-single person working on it, please gather at least one other person before
-moving forward. Single points of failure are to be avoided.
-
-Requirements for continuing:
-----------------------------
-
-* MUST have an RFR ticket.
-
-* MUST have the ticket assigned and accepted by someone in the
-  infrastructure sysadmin-main group.
-
-Planning
-========
-
-Once a sponsor is acquired and all needed packages have been packaged and
-are available in EPEL, we move on to the planning phase. In this phase
-discussion should take place about the application/resource on the
-infrastructure list and IRC. Questions about how the resource could be
-deployed should be considered:
-
-* Should the resource be load balanced?
-
-* Does the resource need caching?
-
-* Can the resource live on its own instance to separate it from more
-  critical services?
-
-* Who is involved in maintaining and deploying the instance?
-
-Requirements for continuing:
-----------------------------
-
-* MUST discuss/note the app on the infrastructure mailing list and
-  answer feedback there.
-
-* MUST determine who is involved in the deployment and maintenance of the
-  resource.
-
-Development Instance
-====================
-
-In this phase a development instance is set up for the resource. This
-instance is a single virtual host running the needed OS. The RFR sponsor
-will create this instance and also create a group 'sysadmin-resource' for
-the resource, adding all responsible parties to the group. It's then up to
-sysadmin-resource members to set up the resource and test it. Questions
-asked in the planning phase should be investigated once the instance is
-up. Load testing and other testing should be performed. Issues like
-expiring old data, log files, acceptable content, packaging issues,
-configuration, general bugs, security profile, and others should be
-investigated. At the end of this step an email should be sent to the
-infrastructure list explaining the testing done and inviting comment.
-
-Requirements for continuing:
-----------------------------
-
-* MUST have the RFR sponsor sign off that the resource is ready to move to
-  the next step.
-
-* MUST have answered any outstanding questions on the infrastructure
-  list about the resource. Decisions about caching, load balancing and
-  how the resource would be best deployed should be determined.
-
-* MUST add any needed SOPs for the service. Should there be an update
-  SOP? A troubleshooting SOP? Any other tasks that might need to be done
-  to the instance when those who know it well are not available?
-
-Staging Instance
-================
-
-The next step is to create a staging instance for the resource. In this
-step the resource is fully added to Ansible/configuration management. The
-resource is added to caching/load balancing/databases and tested in this
-new env. Once initial deployment is done and tested, another email to the
-infrastructure list is sent to note that the resource is available in
-staging.
-
-Requirements for continuing:
-----------------------------
-
-* MUST have sign off of the RFR sponsor that the resource is fully
-  configured in Ansible and ready to be deployed.
-
-* MUST have a deployment schedule for going to production. This will
-  need to account for things like freezes and availability of
-  infrastructure folks.
-
-Production deployment
-=====================
-
-Finally the staging changes are merged over to production and the resource
-is deployed.
-
-Monitoring of the resource is added and confirmed to be effective.
-
-Maintenance
-===========
-
-The resource will then follow the normal rules for production: honoring
-freezes, updating for issues or security bugs, adjusting for capacity,
-etc.
-
diff --git a/docs/sops/resultsdb.rst b/docs/sops/resultsdb.rst
deleted file mode 100644
index b4d917d..0000000
--- a/docs/sops/resultsdb.rst
+++ /dev/null
@@ -1,56 +0,0 @@
-.. title: Infrastructure resultsdb SOP
-.. slug: infra-resultsdb
-.. date: 2014-09-24
-.. taxonomy: Contributors/Infrastructure
-
-=============
-resultsdb SOP
-=============
-
-Store results from taskotron tasks.
-
-Contact Information
-===================
-
-Owner
-    Fedora QA Devel, Fedora Infrastructure Team
-Contact
-    #fedora-qa, #fedora-admin, #fedora-noc
-Location
-    PHX2
-Servers
-    resultsdb-dev01.qa, resultsdb-stg01.qa, resultsdb01.qa
-Purpose
-    store results from taskotron tasks
-
-Architecture
-============
-
-ResultsDB as a system is made up of two parts - a results storage API and a
-simple HTML-based frontend for humans to view the results accessible through
-that API (resultsdb and resultsdb_frontend).
-
-Deployment
-==========
-
-The only part of resultsdb deployment that isn't currently in the ansible
-playbooks is database initialization (disabled due to a bug).
-
-Once the resultsdb app has been installed, initialize the database by running::
-
-    resultsdb init_db
-
-Updating
-========
-
-Database schema changes are not currently supported with resultsdb, and the app
-can be updated like any other web application (a minimal sketch follows the
-list):
-
-- update the app
-- restart httpd
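-
-As a rough sketch of that cycle (assuming the app is packaged as ``resultsdb``;
-the exact package name may differ)::
-
-    yum update resultsdb
-    service httpd restart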
-
-Backup
-======
-
-All important information in ResultsDB is stored in its database - backing up
-that database is sufficient for backup, and restoring that database from a
-snapshot is sufficient for restoring.
diff --git a/docs/sops/reviewboard.rst b/docs/sops/reviewboard.rst
deleted file mode 100644
index a257f71..0000000
--- a/docs/sops/reviewboard.rst
+++ /dev/null
@@ -1,161 +0,0 @@
-.. title: ReviewBoard Infrastructure SOP
-.. slug: infra-reviewboard
-.. date: 2011-10-03
-.. taxonomy: Contributors/Infrastructure
-
-==============================
-ReviewBoard Infrastructure SOP
-==============================
-
-Review Board is a powerful web-based code review tool that offers
-developers an easy way to handle code reviews. It scales well from small
-projects to large companies and offers a variety of tools to take much of
-the stress and time out of the code review process.
-
-Contents
---------
-
-1. Contact Information
-2. File Locations
-3. Troubleshooting and Resolution
-
-   * Restarting
-
-4. Create a new repository in ReviewBoard
-
-   * Creating a new git repository
-   * Creating a new bzr repository
-   * Create a default reviewer for a repository
-
-Contact Information
--------------------
-
-Owner:
-    Fedora Infrastructure Team
-
-Contact:
-    #fedora-admin, sysadmin-main, sysadmin-hosted
-
-Location:
-    ServerBeach
-
-Servers:
-    hosted[1-2]
-
-Purpose:
-    Provide our fedorahosted users a way to review code.
-
-File Locations
-==============
-
-Main Config File:
-    hosted[1-2]:/srv/reviewboard/conf/settings_local.py
-
-ReviewBoard:
-    hosted[1-2]:/etc/httpd/conf.d/fedorahosted.org/reviewboard.conf
-
-Upstream:
-    https://fedorahosted.org/reviewboard/
-
-Troubleshooting and Resolution
-==============================
-
-Restarting
-----------
-
-After an update, to restart reviewboard just restart apache. Doing a
-``service httpd stop`` and then a ``service httpd start`` should do it.
-
-Create a new repository in ReviewBoard
-======================================
-
-Creating a new git repository
------------------------------
-
-1. Enter the admin interface. If you have admin privilege, a link will be
-   visible in the upper-right corner of the dashboard.
-2. In the admin dashboard click "Add" next to "Repositories"
-3. For the name, enter the Fedora Hosted project short name. (e.g. if the
-   project is https://fedorahosted.org/sssd, then the repository name
-   should be sssd)
-4. "Show this repository" must be checked.
-5. Hosting service is "Custom"
-6. Repository type is Git
-7. Path should be /srv/git/project_short_name.git
-   (e.g. /srv/git/sssd.git)
-8. Mirror path should be
-   git://git.fedorahosted.org/git/project_short_name.git
-
-.. note:: Mirror path is used by client tools such as post-review to
-   determine to which repository a submission belongs
-
-9. Raw file URL mask should be left blank
-10. Username and Password should both be left blank
-11. The bug tracker URL may vary from project to project, but if they are
-    using the Fedora Hosted Trac bugtracker, it should be
-
-    * Type: Trac
-    * Bug Tracker URL: https://fedorahosted.org/project_short_name
-      (e.g. https://fedorahosted.org/sssd)
-
-12. Do not set a Bug Tracker URL
-
-Creating a new bzr repository
------------------------------
-
-1. Go to the admin dashboard to add a new repository.
-2. For the name, enter the Fedora Hosted project short name. (e.g. if the
-   project is https://fedorahosted.org/kitchen, then the repository
-   name should be kitchen)
-3. "Show this repository" must be checked.
-4. Hosting service is "Custom"
-5. Repository type is Bazaar
-6. Path should be /srv/bzr/project_short_name/branch_name
-   (e.g. /srv/bzr/kitchen/devel) -- reviewboard doesn't understand how to work
-   with repository conventions; it just works on branches.
-7. Mirror path should be
-   bzr://bzr.fedorahosted.org/bzr/project_short_name/branch_name
-
-.. note:: Mirror path is used by client tools such as post-review to
-   determine to which repository a submission belongs
-
-8. Username and Password should both be left blank
-9. The bug tracker URL may vary from project to project, but if they are
-   using the Fedora Hosted Trac bugtracker, it should be
-
-   * Type: Trac
-   * Bug Tracker URL: https://fedorahosted.org/project_short_name
-     (e.g. https://fedorahosted.org/kitchen)
-
-10. Do not set a Bug Tracker URL
-
-Create a default reviewer for a repository
-------------------------------------------
-
-Reviews should be sent to the project development mailing list unless
-otherwise requested.
-
-1. Enter the admin interface. If you have admin privilege, a link will be
-   visible in the upper-right corner of the dashboard.
-2. In the admin dashboard click "Add" next to "Review Groups"
-3. Enter the following values:
-
-   * Name: The project short name
-   * Display Name: project_short_name Review Group
-   * Mailing List: Development discussion list for the project
-
-4. Do not select any users
-5. Return to the main admin dashboard and click on "Add" next to "Default
-   Reviewers"
-6. Enter the following values:
-
-   * Name: Something unique and sensible
-   * File Regular Expression: enter '.*' (without the quotes)
-
-.. note:: This means that by default, the mailing list should receive
-   email for reviews of all files in the repository
-
-7. Under "Default groups", select the group you created above and click
-   the arrow pointing right.
-8. Do not select any default people
-9. Under "Repositories", select the repository added above and click the
-   arrow pointing right.
-10. Save your changes.
diff --git a/docs/sops/scmadmin.rst b/docs/sops/scmadmin.rst
deleted file mode 100644
index 1ce632d..0000000
--- a/docs/sops/scmadmin.rst
+++ /dev/null
@@ -1,294 +0,0 @@
-.. title: Infrastructure SCM Admin SOP
-.. slug: infra-scm-admin
-.. date: 2015-01-01
-.. taxonomy: Contributors/Infrastructure
-
-=============
-SCM Admin SOP
-=============
-
-.. warning:: Most information here (probably 1.4 and later) is not updated for
-   pkgdb2 and therefore not correct anymore.
-
-Contents
-========
-
-1. Creating New Packages
-
-   1. Obtaining process-git-requests
-   2. Prerequisites
-   3. Running the script
-   4. Steps for manual processing
-
-      1. Using pkgdb-client
-      2. Using pkgdb2branch
-      3. Update Koji
-
-   5. Helper Scripts
-
-      1. mkbranchwrapper
-      2. setup_package
-
-   6. Pseudo Users for SIGs
-
-2. Deprecate Packages
-3. Undeprecate Packages
-4. Performing mass comaintainer requests
-
-Creating New Packages
-=====================
-
-Package creation is mostly automatic and most details are handled by a script.
-
-Obtaining process-git-requests
-------------------------------
-
-The script is not currently packaged; it lives in the rel-eng
-git repository. You can check it out with::
-
-    git clone https://git.fedorahosted.org/git/releng
-
-and keep it up to date by running::
-
-    git pull
-
-somewhere in the checked-out tree occasionally, before processing new requests.
-
-The script lives in ``scripts/process-git-requests``.
-
-Prerequisites
--------------
-
-You must have the python-bugzilla and python-fedora packages installed.
-
-Before running process-git-requests, you should run::
-
-    bugzilla login
-
-The "Username" you will be prompted for is the email address attached to
-your bugzilla account. This will obtain a cookie so that the script can
-update bugzilla tickets. The cookie is good for quite some time (at least
-a month); if you wish to remove it, delete the ``~/.bugzillacookies`` file.
-
-It is also advantageous to have your Fedora ssh key loaded so that you can
-ssh into pkgs.fedoraproject.org without being prompted for a password.
-
-It perhaps goes without saying that you will need unfirewalled and
-unproxied access to ports 22, 80 and 443 on various Fedora machines.
-
-Running the script
-------------------
-
-Simply execute the process-git-requests script and follow the prompts. It
-can provide the text of all comments in the bugzilla ticket for inspection
-and will perform various useful checks on the ticket and the included SCM
-request. If there are warnings present, you will need to accept them
-before being allowed to process the request.
-
-Note that the script only looks at the final request in a ticket; this
-permits users to tack on a new request at any time and re-raise the
-fedora-cvs flag. Packagers do not always understand this, though, so it is
-necessary to read through the ticket contents to make sure that the
-request matches reality.
-
-After a request has been accepted, the script will create the package in
-pkgdb (which may require your password) and attempt to log into the SCM
-server to create the repository. If this does not succeed, the package
-name is saved, and when you finish processing, a command line will be output
-with instructions on creating the repositories manually. If you hit Ctrl-C
-or the script otherwise aborts, you may miss this information. If so, see
-below for information on running pkgdb2branch.py on the SCM server; you
-will need to run it for each package you created.
-
-Steps for manual processing
----------------------------
-
-It is still useful to document the process of handling these requests
-manually in the case that process-git-requests has issues.
-
-1. Check the Bugzilla ticket to make sure it looks OK.
-2. Add the package information to the packagedb with pkgdb-client.
-3. Use pkgdb2branch to create the branches on the cvs server.
-
-   .. warning:: Do not run multiple instances of pkgdb2branch in parallel!
-      This will cause them to fail due to mismatching 'modules' files. It's not
-      a good idea to run addpackage, mkbranchwrapper, or setup_package by
-      themselves as it could lead to packages that don't match their packagedb
-      entry.
-
-4. Update koji.
-
-Using pkgdb-client
-``````````````````
-
-Use pkgdb-client to update the pkgdb with new information. For instance,
-to add a new package::
-
-    pkgdb-client edit -u toshio -o terjeros \
-    -d 'Python module to extract EXIF information' \
-    -b F-10 -b F-11 -b devel python-exif
-
-To update that package later and add someone to the initialcclist do::
-
-    pkgdb-client edit -u toshio -c kevin python-exif
-
-To add a new branch for a package::
-
-    pkgdb-client edit -u toshio -b F-10 -b EL-5 python-exif
-
-To allow provenpackager to edit a branch::
-
-    pkgdb-client edit -u toshio -b devel -a provenpackager python-exif
-
-To remove provenpackager commit rights on a branch::
-
-    pkgdb-client edit -u toshio -b EL-5 -b EL-4 -r provenpackager python-exif
-
-More options can be found by running ``pkgdb-client --help``.
-
-You must be in the cvsadmin group to use pkgdb-client. It can be run on a
-non-Fedora Infrastructure box if you set the PACKAGEDBURL environment
-variable to the public URL::
-
-    export PACKAGEDBURL=https://admin.fedoraproject.org/pkgdb
-
-.. note::
-   You may be asked to CC fedora-perl-devel-list on a perl package. This can
-   be done with the username "perl-sig". This is presently a user, not a
-   group, so it cannot be used as an owner or comaintainer, only for CC.
-
-Using pkgdb2branch
-------------------
-
-Use pkgdb2branch.py to create branches for a package. pkgdb2branch.py
-takes a list of package names on the command line and creates the branches
-that are specified in the packagedb. The script lives in /usr/local/bin on
-the SCM server (pkgs.fedoraproject.org) and must be run there.
-
-For instance, ``pkgdb2branch.py python-exif qa-assistant`` will create branches
-specified in the packagedb for python-exif and qa-assistant.
-
-pkgdb2branch can only be run from pkgs.fedoraproject.org.
-
-Update Koji
------------
-
-Optionally you can synchronize pkgdb and koji by hand; it is done
-automatically hourly by a cronjob. There is a script for this in the
-admin/ directory of the CVSROOT module.
-
-Since dist-f13 and later inherit from dist-f12, and currently dist-f12 is
-the basis of our stack, it's easiest to just call::
-
-    ./owner-sync-pkgdb dist-f12
-
-Just run ``./owner-sync-pkgdb`` for usage output.
-
-This script requires that you have a properly configured koji client
-installed.
-
-owner-sync-pkgdb requires the koji client libraries, which are not
-available on the cvs server. So you need to run this from one of your
-machines.
-
-Helper Scripts
-==============
-
-These scripts are invoked by the scripts above, doing some of the heavy
-lifting. They should not ordinarily be called on their own.
-
-mkbranchwrapper
----------------
-
-``/usr/local/bin/mkbranchwrapper`` is a shell script which takes a list of
-packages and branches. For instance::
-
-    mkbranchwrapper foo bar EL-5 F-11
-
-will create modules foo and bar for devel if they don't exist and branch
-them for the branches passed to the script. If the devel branch
-exists, it just branches. If no branches are passed, the module is
-created in devel only.
-
-``mkbranchwrapper`` has to be run from cvs-int.
-
-.. important:: mkbranchwrapper is not used by any current programs. Use pkgdb2branch instead.
-
-setup_package
--------------
-
-``setup_package`` creates a new blank module in devel only. It can be run from
-any host. To create a new package run::
-
-    setup_package foo
-
-setup_package needs to be called once for each package. It could be
-wrapped in a shell script similar to::
-
-    #!/bin/bash
-
-    # collect all of the command line arguments as the list of packages
-    PACKAGES=""
-
-    for arg in $@; do
-        PACKAGES="$PACKAGES $arg"
-    done
-
-    echo "packages=$PACKAGES"
-
-    # create each package in turn
-    for package in $PACKAGES; do
-        ~/bin/setup_package $package
-    done
-
-then call the script with all the packages after it.
-
-.. note:: setup_package is currently called from pkgdb2branch.
-
-Pseudo Users for SIGs
----------------------
-
-See Package_SCM_admin_requests#Pseudo-users_for_SIGs for the current list.
-
-Deprecate Packages
-------------------
-
-Any packager can deprecate a package: click on the "deprecate package"
-button for the package in the webui. There's currently no ``pkgdb-client``
-command to deprecate a package.
-
-Undeprecate Packages
---------------------
-
-Any cvsadmin can undeprecate a package. Simply use pkgdb-client to assign
-an owner and the package will be undeprecated::
-
-    pkgdb-client -o toshio -b devel qa-assistant
-
-As a cvsadmin you can also log into the pkgdb webui and click on the
-"unretire package" button. Once clicked, the package will be orphaned rather
-than deprecated.
-
-Performing mass comaintainer requests
--------------------------------------
-
-* Confirm that the requestor has 'approveacls' on all packages they wish
-  to operate on. If they do not, they MUST request the change via FESCo.
-
-* Mail maintainers/co-maintainers affected by the change to inform them
-  of who requested the change and why.
-
-* Download a copy of this script:
-  http://git.fedorahosted.org/git/?p=fedora-infrastructure.git;a=blob;f=scripts/pkgdb_bulk_comaint/comaint.py;hb=HEAD
-
-* Edit the script to have the proper package owners and package name
-  pattern.
-
-* Edit the script to have the proper new comaintainers.
-
-* Ask someone in ``sysadmin-web`` to disable email sending on bapp01 for the
-  pkgdb (following the instructions in comments in the script).
-
-* Copy the script to an infrastructure host (like cvs01) that can
-  contact bapp01 and run it.
-
diff --git a/docs/sops/selinux.rst b/docs/sops/selinux.rst
deleted file mode 100644
index e9362c3..0000000
--- a/docs/sops/selinux.rst
+++ /dev/null
@@ -1,124 +0,0 @@
-.. title: SELinux Infrastructure SOP
-.. slug: infra-selinux
-.. date: 2012-03-19
-.. taxonomy: Contributors/Infrastructure
-
-==========================
-SELinux Infrastructure SOP
-==========================
-
-SELinux is a fundamental part of our Operating System but still has a
-large learning curve and remains quite intimidating to both developers and
-system administrators. Fedora's Infrastructure has been growing at an
-unfathomable rate, and is full of custom software that needs to be locked
-down. The goal of this SOP is to make it simple to track down and fix
-SELinux policy related issues within Fedora's Infrastructure.
-
-Fully deploying SELinux is still an ongoing task, and can be tracked in
-fedora-infrastructure ticket #230.
-
-Contents
-========
-
-1. Contact Information
-2. Step One: Realizing you have a problem
-3. Step Two: Tracking down the violation
-4. Step Three: Fixing the violation
-
-   1. Allowing ports
-   2. Toggling an SELinux boolean
-   3. Setting custom context
-   4. Deploying custom policy modules
-
-Contact Information
-===================
-
-Owner
-    Fedora Infrastructure Team
-Contact
-    #fedora-admin, sysadmin-main, sysadmin-web groups
-Purpose
-    To ensure that we are able to fully wield the power of SELinux
-    within our infrastructure.
-
-Step One: Realizing you have a problem
-======================================
-
-If you are trying to find a specific problem on a host, go look in the
-audit.log per-host on our central log server. See the syslog SOP for
-more information.
-
-Step Two: Tracking down the violation
-=====================================
-
-Generate SELinux policy allow rules from logs of denied operations. This
-is useful for getting a quick overview of what has been getting denied on
-the local machine::
-
-    audit2allow -la
-
-You can obtain more detailed audit messages by using ausearch to get the
-most recent violations::
-
-    ausearch -m avc -ts recent
-
-Again, see the syslog SOP for more information here.
-
-Step Three: Fixing the violation
-================================
-
-Below are examples of using our current ansible configuration to make
-SELinux deployment changes. These constructs are currently home-brewed,
-and do not exist in upstream Ansible. For these functions to work, you must
-ensure that the host or servergroup is configured with 'include selinux',
-which will enable SELinux in permissive mode. Once a host is properly
-configured, this can be changed to 'include selinux-enforcing' to enable
-SELinux Enforcing mode.
-
-.. note::
-   Most services have $service_selinux manpages that are automatically generated from policy.
-
-Toggling an SELinux boolean
----------------------------
-
-SELinux booleans, which can be viewed by running `semanage boolean -l`,
-can easily be configured using the following syntax within your ansible
-configuration::
-
-    seboolean: name=httpd_can_network_connect_db state=yes persistent=yes
-
-Setting custom context
-----------------------
-
-Our infrastructure contains many custom applications, which may utilize
-non-standard file locations. These issues can lead to trouble with
-SELinux, but they can easily be resolved by setting custom file context::
-
-    file: path=/var/tmp/l10n-data recurse=yes setype=httpd_sys_content_t
-
-Fixing odd errors from the logs
--------------------------------
-
-If you see messages like this in the log reports::
-
-    restorecon: /etc/selinux/targeted/contexts/files/file_contexts: Multiple same specifications for /home/fedora.
-    matchpathcon: /etc/selinux/targeted/contexts/files/file_contexts: Multiple same specifications for /home/fedora.
-
-Then it is likely you have an overlapping filecontext in your local selinux
-context configuration - in this case likely one added by ansible accidentally.
-
-To find it, run this::
-
-    semanage fcontext -l | grep /path/being/complained/about
-
-Sometimes it is just an ordering problem, and reversing the entries solves it;
-other times it is a genuine overlap. Look at the context and delete the one
-you do not want, or reorder the entries.
-
-To delete an entry, run::
-
-    semanage fcontext -d '/entry/you/wish/to/delete'
-
-This just removes that filecontext - no need to worry about files being deleted.
-
-Then rerun the triggering command and see if the problem is solved.
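-
-Putting the above together, a typical cleanup session looks like this (the
-path here is purely illustrative, not a real entry)::
-
-    # list any rules that match the path being complained about
-    semanage fcontext -l | grep /srv/exampleapp
-
-    # delete the duplicate rule you do not want (no files are touched)
-    semanage fcontext -d '/srv/exampleapp(/.*)?'
-
-    # re-apply the remaining context and confirm the complaint is gone
-    restorecon -Rv /srv/exampleapp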
diff --git a/docs/sops/sigul-upgrade.rst b/docs/sops/sigul-upgrade.rst
deleted file mode 100644
index 14b8897..0000000
--- a/docs/sops/sigul-upgrade.rst
+++ /dev/null
@@ -1,73 +0,0 @@
-.. title: Sigul Servers Maintenance SOP
-.. slug: infra-sigul-mainenance
-.. date: 2015-02-04
-.. taxonomy: Contributors/Infrastructure
-
-==============================
-Sigul servers upgrades/reboots
-==============================
-
-Fedora currently has 1 sign-bridge and 2 sign-vault machines for primary;
-there is a similar setup for secondary architectures. When upgrading or
-rebooting these machines, some special steps must be taken to ensure
-everything is working as expected.
-
-Contact Information
--------------------
-
-Owner
-    Fedora Release Engineering
-Contact
-    #fedora-admin, #fedora-noc
-Servers
-    sign-vault03, sign-vault04, sign-bridge02, secondary-bridge01.qa
-Purpose
-    Upgrade or restart sign servers
-
-Description
------------
-
-0. Coordinate with releng on timing. Make sure no signing is happening, and
-   none is planned for a bit.
-
-Sign-bridge02, secondary-bridge01.qa:
-
-  1. Apply updates or changes.
-
-  2. Reboot the virtual instance.
-
-  3. Once it comes back, start the sigul_bridge service and enter an empty
-     password.
-
-Sign-vault03/04:
-
-  1. Determine which server is currently primary. It's the one that has the
-     floating ip address for sign-vault02 on it.
-
-  2. Login to the non-primary server via serial or management console.
-     (There is no ssh access to these servers.)
-
-  3. Take an LVM snapshot::
-
-       lvcreate --size 5G --snapshot --name YYYYMMDD /dev/mapper/vg_signvault04-lv_root
-
-     Replace YYYYMMDD with today's year, month and day, and the vg with the
-     correct name. Then apply updates.
-
-  4. Confirm the server comes back up ok, login to the serial or management
-     console and start the sigul_server process. Enter the password when
-     prompted.
-
-  5. On the primary server, down the floating ip address::
-
-       ip addr del 10.5.125.75 dev eth0
-
-  6. On the secondary server, up the floating ip address::
-
-       ip addr add 10.5.125.75 dev eth0
-
-  7. Have rel-eng folks sign some packages to confirm all is working.
-
-  8. Update/reboot the old primary server and confirm it comes back up ok.
-
-.. note:: Changes to database
-
-   When making any changes to the database (new keys, etc), it's important to
-   sync the data from the primary to the secondary server. This process is
-   currently manual.
diff --git a/docs/sops/sshaccess.rst b/docs/sops/sshaccess.rst
deleted file mode 100644
index 053f2f6..0000000
--- a/docs/sops/sshaccess.rst
+++ /dev/null
@@ -1,148 +0,0 @@
-.. title: SSH Access SOP
-.. slug: infra-ssh-access
-.. date: 2012-09-24
-.. taxonomy: Contributors/Infrastructure
-
-=============================
-SSH Access Infrastructure SOP
-=============================
-
-Contents
-========
-
-1. Contact Information
-2. Introduction
-3. SSH configuration
-4. SSH Agent forwarding
-5. Troubleshooting
-
-Contact Information
-===================
-
-Owner
-    sysadmin-main
-Contact
-    #fedora-admin or admin@fedoraproject.org
-Location
-    PHX2
-Servers
-    All PHX2 and VPN Fedora machines
-Purpose
-    Access via ssh to Fedora project machines.
-
-Introduction
-============
-
-This page contains some useful instructions about how you can safely
-log in to Fedora PHX2 machines using public key authentication. As of
-2011-05-27, all machines require an SSH key to access. Password
-authentication will no longer work. Note that this SOP
-has nothing to do with actually gaining access to specific machines. For
-that you MUST be in the correct group for shell access to that machine.
-This SOP simply describes the process once you do have valid and
-appropriate shell access to a machine.
-
-SSH configuration
-=================
-
-First of all, on your local machine::
-
-    vi ~/.ssh/config
-
-.. note::
-   This file, and any keys, need to be chmod 600, or you will get a "Bad owner or
-   permissions" error. The .ssh directory must be mode 700.
-
-then, add the following::
-
-    Host bastion.fedoraproject.org
-        User FAS_USERNAME
-        ProxyCommand none
-        ForwardAgent no
-    Host *.phx2.fedoraproject.org *.qa.fedoraproject.org 10.5.125.* 10.5.126.* 10.5.127.* *.vpn.fedoraproject.org
-        User FAS_USERNAME
-        ProxyCommand ssh -W %h:%p bastion.fedoraproject.org
-
-One slight annoyance with this method is that you must include the
-.phx2.fedoraproject.org part when you SSH to Fedora machines in order for
-the connection to be tunneled through bastion.
-
-To avoid this, you can add aliases for each of the Fedora machines you log
-in to by modifying the second Host line::
-
-    Host *.phx2.fedoraproject.org 10.5.125.* 10.5.126.* 10.5.127.* *.vpn.fedoraproject.org batcave01 noc01 # list all hosts here
-
-How does ProxyCommand work?
-
-First, a connection is established to the bastion host::
-
-    +-----+            +--------------+
-    | you | ---ssh---> | bastion host |
-    +-----+            +--------------+
-
-The bastion host then establishes a connection to the target server::
-
-    +--------------+            +--------+
-    | bastion host | ---------> | server |
-    +--------------+            +--------+
-
-Your client then connects through the bastion and reaches the target server::
-
-    +-----+     +--------------+     +--------+
-    | you |     | bastion host |     | server |
-    |     | ===ssh=over=bastion===> |        |
-    +-----+     +--------------+     +--------+
-
-PuTTY SSH configuration
-=======================
-
-You can configure PuTTY the same way by doing this:
-
-1. In the Session section, type batcave01.phx2.fedoraproject.org, port 22.
-2. In Connection:Data, enter your FAS_USERNAME.
-3. In Connection:Proxy, add the proxy settings:
-
-   - Proxy hostname is bastion.fedoraproject.org
-   - Port 22
-   - Username FAS_USERNAME
-   - Proxy command: plink %user@%proxyhost %host:%port
-
-4. In Connection:SSH:Auth, remember to insert the same key file for
-   authentication you have used on your FAS profile.
-
-SSH Agent forwarding
-====================
-
-You should normally have::
-
-    ForwardAgent no
-
-for Fedora hosts (this is the default in OpenSSH). You can override this
-on a per-session basis by using '-A' with ssh. SSH agents could be misused
-if you connect to a compromised host with forwarding on (the attacker can
-use your agent to authenticate them to anything you have access to as long
-as you are logged in). Additionally, if you do need SSH agent forwarding
-(say for copying files between machines), you should remember to log out as
-soon as you are done so as not to leave your agent exposed.
-
-Troubleshooting
-===============
-
-* 'channel 0: open failed: administratively prohibited: open failed'
-
-  If you receive this message for a machine proxied through bastion, then
-  bastion was unable to connect to the host. This most likely means that you
-  tried to SSH to a nonexistent machine. You can debug this by trying to
-  connect to that machine from bastion.
-
-* If your local username is different from the one registered in FAS,
-  please remember to set up a User variable (like above) where you
-  specify your FAS username. If that's missing, SSH will try to log in
-  using your local username, and it will fail.
-
-* ssh -vv is very handy for debugging which sections are matching and
-  which are not.
-
-* If you get access denied several times in a row, please consult with
-  #fedora-admin. If you try too many times with an invalid config, your
-  IP could be added to denyhosts.
-
-* If you are running an OpenSSH version less than 5.4, then the -W
-  option is not available. In that case, use the following ProxyCommand
-  line instead::
-
-      ProxyCommand ssh -q bastion.fedoraproject.org exec nc %h %p
diff --git a/docs/sops/sshknownhosts.rst b/docs/sops/sshknownhosts.rst
deleted file mode 100644
index 380cd90..0000000
--- a/docs/sops/sshknownhosts.rst
+++ /dev/null
@@ -1,34 +0,0 @@
-.. title: SSH Known Hosts Infrastructure SOP
-.. slug: infra-ssh-known-hosts
-.. date: 2015-04-23
-.. taxonomy: Contributors/Infrastructure
-
-==================================
-SSH known hosts Infrastructure SOP
-==================================
-
-Provides a known hosts file that is globally deployed and publicly available at
-https://admin.fedoraproject.org/ssh_known_hosts
-
-Contact Information
-===================
-
-Owner:
-    Fedora Infrastructure Team
-Contact:
-    #fedora-admin, sysadmin group
-Location:
-    all
-Servers:
-    all
-Purpose:
-    Provides a known hosts file that is globally deployed.
-
-Adding a host alias to the ssh_known_hosts
-==========================================
-
-If you need to add a host alias to a host in ssh_known_hosts, simply
-go to the dir for the host in infra-hosts and add a file named host_aliases
-to the git repo in that dir. Put one alias per line and save.
-
-Then the next time fetch-ssh-keys runs, it will add those aliases to known hosts.
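-
-For example, a ``host_aliases`` file is just a plain list of names, one per
-line (the hostnames here are only an illustration)::
-
-    proxy01.fedoraproject.org
-    proxy01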
diff --git a/docs/sops/staging-infra.rst b/docs/sops/staging-infra.rst
deleted file mode 100644
index b813815..0000000
--- a/docs/sops/staging-infra.rst
+++ /dev/null
@@ -1,142 +0,0 @@
-.. title: Infrastructure Staging SOP
-.. slug: infra-staging
-.. date: 2012-04-18
-.. taxonomy: Contributors/Infrastructure
-
-===========
-Staging SOP
-===========
-
-Owner
-    Fedora Infrastructure Team
-Contact
-    #fedora-admin, sysadmin-main
-Location
-    Mostly in PHX2
-Servers
-    *stg*
-Purpose
-    Staging environment to test changes to apps and create initial Ansible configs.
-
-Introduction
-============
-
-Fedora uses a set of staging servers for several purposes:
-
-* When applications are initially being deployed, the staging versions of
-  those applications are set up with a staging server that is used to create the
-  initial Ansible configuration for the application/service.
-
-* Established applications/services use staging for testing. This testing includes:
-
-  - Bugfix updates
-  - Configuration changes managed by Ansible
-  - Upstream updates to dependent packages (httpd changes for example)
-
-Goals
-=====
-
-The staging servers should be self contained and have all the needed databases and such
-to function. At no time should staging resources talk to production instances. We use firewall
-rules on our production servers to make sure no access is made from staging.
-
-Staging instances do often use dumps of production databases and data, and
-thus access to resources in staging should be controlled as it is in
-production.
-
-DNS and naming
-==============
-
-All staging servers should be in the ``stg.phx2.fedoraproject.org`` domain.
-``/etc/hosts`` files are used on stg servers to override DNS in cases where staging resources
-should talk to the staging version of a service instead of the production one.
-In some cases, one staging server may be aliased to several services or applications that
-are on different machines in production.
-
-Syncing databases
-=================
-
-Syncing FAS
------------
-
-Sometimes you want to resync the staging fas server with what's on
-production. To do that, dump what's in the production db and then import
-it into the staging db. Note that resyncing the information will remove
-any of the information that has been added to the staging fas servers. So
-it's good to mention that you're doing this on the infra list or to people
-who you know are working on the staging fas servers so they can either
-save their changes or ask you to hold off for a while.
-
-On db01::
-
-    ssh db01
-    sudo -u postgres pg_dump -C fas2 | xz -c > fas2.dump.xz
-    scp fas2.dump.xz db02.stg:
-
-On fas01.stg (postgres won't drop the database if something is accessing it)
-(ATM, fas in staging is not load balanced so we only have to do this on one server)::
-
-    $ sudo /etc/init.d/httpd stop
-
-On db02.stg::
-
-    $ echo 'drop database fas2' | sudo -u postgres psql
-    $ xzcat fas2.dump.xz | sudo -u postgres psql
-
-On fas01.stg::
-
-    $ sudo /etc/init.d/httpd start
-
-Other databases behave similarly.
-
-External access
-===============
-
-There is http/https access from the internet to staging instances to allow testing.
-Simply replace the production resource domain with stg.fedoraproject.org and
-it should go to the staging version (if any) of that resource.
-
-Ansible and Staging
-===================
-
-All staging machine configurations are now in the same branch
-as master/production.
-
-There is a 'staging' environment - the Ansible variable "env" is equal to
-"staging" in playbooks for staging things. This variable can be used
-to differentiate between production and staging systems.
-
-Workflow for staging changes
-============================
-
-1. If you don't need to make any Ansible related config changes, don't
-   do anything. (i.e., a new version of an app that uses the same config
-   files, etc.) Just update on the host and test.
-
-2. If you need to make Ansible changes, either in the playbook of the
-   application or outside of your module:
-
-   - Make use of files ending with .staging (see resolv.conf in global for
-     an example). So, if there are persistent changes in staging from
-     production, like a different config file, use this.
-
-   - Conditionalize on environment::
-
-       - name: your task
-         ...
-         when: env == "staging"
-
-       - name: production-only task
-         ...
-         when: env != "staging"
-
-   - These changes can stay if they are helpful for further testing down
-     the road. Ideally, the normal case is that staging and production are
-     configured in the same host group from the same Ansible playbook.
-
-Time limits on staging changes
-==============================
-
-There is no hard limit on time spent in staging, but where possible we should
-limit the time in staging so we are not carrying changes from production for a
-long time and possibly affecting other staging work.
diff --git a/docs/sops/staging.rst b/docs/sops/staging.rst
deleted file mode 100644
index 3002414..0000000
--- a/docs/sops/staging.rst
+++ /dev/null
@@ -1,146 +0,0 @@
-.. title: Infrastructure Ansible Staging SOP
-.. slug: infra-staging-ansible
-.. date: 2015-04-23
-.. taxonomy: Contributors/Infrastructure
-
-===================
-Ansible Staging SOP
-===================
-
-Owner
-    Fedora Infrastructure Team
-
-Contact
-    #fedora-admin, sysadmin-main
-
-Location
-    Mostly in PHX2
-
-Servers
-    *stg*
-
-Purpose
-    Staging environment to test changes to apps and create initial Ansible configs.
-
-Introduction
-============
-
-Fedora uses a set of staging servers for several purposes:
-
-* When applications are initially being deployed, the staging versions of
-  those applications are set up with a staging server that is used to create the
-  initial Ansible configuration for the application/service.
-
-* Established applications/services use staging for testing. This testing includes:
-
-  - Bugfix updates
-  - Configuration changes managed by Ansible
-  - Upstream updates to dependent packages (httpd changes for example)
-
-Goals
-=====
-
-The staging servers should be self contained and have all the needed databases and such
-to function. At no time should staging resources talk to production instances. We use firewall
-rules on our production servers to make sure no access is made from staging.
-
-Staging instances do often use dumps of production databases and data, and
-thus access to resources in staging should be controlled as it is in
-production.
-
-DNS and naming
-==============
-
-All staging servers should be in the ``stg.phx2.fedoraproject.org`` domain.
-/etc/hosts files are used on stg servers to override DNS in cases where staging resources
-should talk to the staging version of a service instead of the production one.
-In some cases, one staging server may be aliased to several services or applications that
-are on different machines in production.
-
-Syncing databases
-=================
-
-Syncing FAS
------------
-
-Sometimes you want to resync the staging fas server with what's on
-production. To do that, dump what's in the production db and then import
-it into the staging db. Note that resyncing the information will remove
-any of the information that has been added to the staging fas servers. So
-it's good to mention that you're doing this on the infra list or to people
-who you know are working on the staging fas servers so they can either
-save their changes or ask you to hold off for a while.
-
-On db01::
-
-    $ ssh db01
-    $ sudo -u postgres pg_dump -C fas2 | xz -c > fas2.dump.xz
-    $ scp fas2.dump.xz db02.stg:
-
-On fas01.stg (postgres won't drop the database if something is accessing it)
-(ATM, fas in staging is not load balanced so we only have to do this on one server)::
-
-    $ sudo /etc/init.d/httpd stop
-
-On db02.stg::
-
-    $ echo 'drop database fas2' | sudo -u postgres psql
-    $ xzcat fas2.dump.xz | sudo -u postgres psql
-
-On fas01.stg::
-
-    $ sudo /etc/init.d/httpd start
-
-Other databases behave similarly.
-
-External access
-===============
-
-There is http/https access from the internet to staging instances to allow testing.
-Simply replace the production resource domain with stg.fedoraproject.org and
-it should go to the staging version (if any) of that resource.
-
-Ansible and Staging
-===================
-
-All staging machine configurations are now in the same branch
-as master/production.
-
-There is a 'staging' environment - the Ansible variable "env" is equal to
-"staging" in playbooks for staging things. This variable can be used
-to differentiate between production and staging systems.
-
-Workflow for staging changes
-============================
-
1. If you don't need to make any Ansible-related config changes, don't
   do anything (i.e., a new version of an app that uses the same config
   files, etc). Just update on the host and test.

2. If you need to make Ansible changes, either in the playbook of the
   application or outside of your module:

   - Make use of files ending with .staging (see resolv.conf in global for
     an example). If there are persistent differences between staging and
     production, like a different config file, use this.

   - Conditionalize on environment::

       - name: your task
         ...
         when: env == "staging"

       - name: production-only task
         ...
         when: env != "staging"

   - These changes can stay if they are helpful for further testing down
     the road. Ideally, the normal case is that staging and production are
     configured in the same host group from the same Ansible playbook.

Time limits on staging changes
==============================

There is no hard limit on time spent in staging, but where possible we should
limit the time in staging so we are not carrying changes from production for a
long time and possibly affecting other staging work.

diff --git a/docs/sops/stagingservers.rst b/docs/sops/stagingservers.rst
deleted file mode 100644
index d206e07..0000000
--- a/docs/sops/stagingservers.rst
+++ /dev/null
@@ -1,144 +0,0 @@
.. title: Infrastructure Staging Server SOP
.. slug: infra-staging-sop
.. date: 2012-04-18
.. taxonomy: Contributors/Infrastructure

===================================
Fedora Infrastructure Staging Hosts
===================================

Owner
    Fedora Infrastructure Team

Contact
    #fedora-admin, sysadmin-main

Location
    Mostly in PHX2

Servers
    *stg*

Purpose
    Staging environment to test changes to apps and create initial Ansible configs.

Introduction
============
Fedora uses a set of staging servers for several purposes:

* When applications are initially being deployed, the staging versions of
  those applications are set up on a staging server that is used to create the
  initial Ansible configuration for the application/service.

* Established applications/services use staging for testing. This testing includes:

  - Bugfix updates
  - Configuration changes managed by Ansible
  - Upstream updates to dependent packages (httpd changes, for example)

Goals
=====

The staging servers should be self-contained and have all the needed databases and such
to function. At no time should staging resources talk to production instances. We use firewall
rules on our production servers to make sure no access is made from staging.

Staging instances do often use dumps of production databases and data, and
thus access to resources in staging should be controlled as it is in
production.

DNS and naming
==============

All staging servers should be in the ``stg.phx2.fedoraproject.org`` domain.
/etc/hosts files are used on stg servers to override dns in cases where staging resources
should talk to the staging version of a service instead of the production one.
In some cases, one staging server may be aliased to several services or applications that
are on different machines in production.

Syncing databases
=================

Syncing FAS
-----------
Sometimes you want to resync the staging fas server with what's on
production. To do that, dump what's in the production db and then import
it into the staging db. Note that resyncing will remove any of the
information that has been added to the staging fas servers.
So it's good to mention that you're doing this on the infra list or to people
who you know are working on the staging fas servers so they can either
save their changes or ask you to hold off for a while.

On db01::

    $ ssh db01
    $ sudo -u postgres pg_dump -C fas2 | xz -c > fas2.dump.xz
    $ scp fas2.dump.xz db02.stg:

On fas01.stg (postgres won't drop the database if something is accessing it)
(ATM, fas in staging is not load balanced, so we only have to do this on one server)::

    $ sudo /etc/init.d/httpd stop

On db02.stg::

    $ echo 'drop database fas2' | sudo -u postgres psql
    $ xzcat fas2.dump.xz | sudo -u postgres psql

On fas01.stg::

    $ sudo /etc/init.d/httpd start

Other databases behave similarly.

External access
===============

There is http/https access from the internet to staging instances to allow testing.
Simply replace the production resource domain with stg.fedoraproject.org and
it should go to the staging version (if any) of that resource.

Ansible and Staging
===================

All staging machine configuration is now in the same branch
as master/production.

There is a 'staging' environment - the Ansible variable "env" is equal to
"staging" in playbooks for staging things. This variable can be used
to differentiate between production and staging systems.

Workflow for staging changes
============================

1. If you don't need to make any Ansible-related config changes, don't
   do anything (i.e., a new version of an app that uses the same config
   files, etc). Just update on the host and test.

2. If you need to make Ansible changes, either in the playbook of the
   application or outside of your module:

   - Make use of files ending with .staging (see resolv.conf in global for
     an example). If there are persistent differences between staging and
     production, like a different config file, use this.

   - Conditionalize on environment::

       - name: your task
         ...
         when: env == "staging"

       - name: production-only task
         ...
         when: env != "staging"

   - These changes can stay if they are helpful for further testing down
     the road. Ideally, the normal case is that staging and production are
     configured in the same host group from the same Ansible playbook.

Time limits on staging changes
==============================

There is no hard limit on time spent in staging, but where possible we should
limit the time in staging so we are not carrying changes from production for a
long time and possibly affecting other staging work.

diff --git a/docs/sops/status-fedora.rst b/docs/sops/status-fedora.rst
deleted file mode 100644
index 4d91874..0000000
--- a/docs/sops/status-fedora.rst
+++ /dev/null
@@ -1,87 +0,0 @@
.. title: Fedora Status Service SOP
.. slug: infra-fedora-status
.. date: 2015-04-23
.. taxonomy: Contributors/Infrastructure

===========================
Fedora Status Service - SOP
===========================

Fedora-Status is the software that generates the page at
http://status.fedoraproject.org/. This page should be kept
up to date with the current status of the services run by
Fedora Infrastructure.

This page is hosted on an OpenShift instance.
The upstream repository is fedora-status on FedoraHosted.org.

Contact Information
-------------------

Owner:
    Fedora Infrastructure Team
Contact:
    #fedora-admin, #fedora-noc
Servers:
    An OpenShift instance
Purpose:
    Give status information to users about the current
    status of our public services.
Upstream:
    http://git.fedorahosted.org/git/fedora-status.git

How it works
------------
To keep this website as stable as can be, the page is
generated at the time of upload by OpenShift.

As soon as you push to the OpenShift repo, a build hook
will create the HTML page.

Only members of sysadmin-noc and sysadmin-main can update
the status website.

Updating the page
-----------------
1. Check out the repo at::

       ssh://bab5ba6eb9b94f2083fdeefc5e87309b@status-fedora2.rhcloud.com/~/git/status.git/

2. cd status
3. Run ./manage.py

manage.py takes 3+ arguments::

    [status] "[short summary message]" [service] ([service] .....)

[service] values can be found on http://status.fedoraproject.org/ on the RIGHT
SIDE of the header of each box. Examples are: 'wiki', 'pkgdb', and 'fedmsg'.

You can use "-" (dash) to imply "All services".

It accepts any number of additional services.

[status] should be one of::

    'major' - A major service outage.
    'minor' - A minor service outage (e.g. limited/geographical outage)
    'scheduled' - The current downtime to this service is related to a scheduled outage
    'good' - Everything is fine and the service is functioning 100%.

[short summary message] is what appears in the body of the box and should tell
users what is happening/why the service is down.

You can use "-" (dash) to imply "Everything seems to be working." as the
status.

Examples::

    ./manage.py major "We're performing maintenance on the wiki database" wiki
    ./manage.py minor "Some IRC channels are having issues doing XYZ." zodbot
    ./manage.py good - -            # Set all services to good/default.
    ./manage.py good - wiki         # Set wiki status to 'good' with the default message.
    ./manage.py good - wiki zodbot  # Set both wiki and zodbot to good with default message.

You can use the --general-info flag to set a "global" message, which appears
under the main status bar at the top of the page. Use this for big events that
affect all services, or to announce things like upcoming outages.

diff --git a/docs/sops/syslog.rst b/docs/sops/syslog.rst
deleted file mode 100644
index 16e716b..0000000
--- a/docs/sops/syslog.rst
+++ /dev/null
@@ -1,163 +0,0 @@
.. title: Log Infrastructure SOP
.. slug: infra-syslog
.. date: 2014-09-01
.. taxonomy: Contributors/Infrastructure

======================
Log Infrastructure SOP
======================

Logs are centrally collected on our loghost and managed from there by
rsyslog to create several log outputs.

Epylog provides twice-daily log reports of activities on our systems.
It runs on our central loghost and generates reports on all systems
that log centrally.

Contact Information
===================

Owner:
    Fedora Infrastructure Team
Contact:
    #fedora-admin, sysadmin-main
Location:
    Phoenix
Servers:
    log01.phx2.fedoraproject.org
Purpose:
    Provides our central logs and reporting


Essential data/locations:
=========================

* Logs compiled using rsyslog on log01 into a single set of logs for all
  systems::

      /var/log/merged/

  These logs are rotated every day and kept for only 2 days. This set of logs
  is only used for immediate analysis and more trivial 'tailing' of
  the log file to watch for events.

* Logs for each system separately in ``/var/log/hosts``

  These logs are maintained forever, practically, or for as long as we
  possibly can. They are broken out into a ``$hostname/$YEAR/$MON/$DAY`` directory
  structure so we can locate a specific day's log immediately.
* Log reports generated by epylog:

  Log reports generated by epylog are written to /srv/web/epylog/merged

  The reports are accessible via a web browser from https://admin.fedoraproject.org/epylog/merged/

  This path requires a username and a password to access. To add your username
  and password you must first join the sysadmin-logs group, then login to
  ``log01.phx2.fedoraproject.org`` and run this command::

      htpasswd -m /srv/web/epylog/.htpasswd $your_username

  When prompted for a password, please input a password which is NOT YOUR
  FEDORA ACCOUNT SYSTEM PASSWORD.

.. important::

   Let's say that again to be sure you got it:

   DO _NOT_ HAVE THIS BE THE SAME AS YOUR FAS PASSWORD

Configs:
========

Epylog configs are controlled by ansible - please see the ansible epylog
module for more details. Specifically the files in ``roles/epylog/files/merged/``


Generating a one-off epylog report:
-----------------------------------
If you wish to generate a specific log report you will need to run the
following command on log01::

    sudo /usr/sbin/epylog -c /etc/epylog/merged/epylog.conf --last 5h

You can replace '5h' with other time measurements to control the amount of
time you want to view from the merged logs. This will mail a report
notification to all the people in the sysadmin-logs group.


Audit logs, centrally:
----------------------
We've taken the audit logs and enabled our rsyslogd on the hosts to relay
the audit log contents to our central log server.

Here's how we did that:

1. modify the selinux policy so that rsyslogd can read the file(s) in
   ``/var/log/audit/audit.log``

   BEGIN Selinux policy module::

       module audit_via_syslog 1.0;

       require {
           type syslogd_t;
           type auditd_log_t;
           class dir { search };
           class file { getattr read open };
       }

       #============= syslogd_t ==============
       allow syslogd_t auditd_log_t:dir search;
       allow syslogd_t auditd_log_t:file { getattr read open };

   END selinux policy module

2. add config to rsyslog on the clients to repeatedly send all changes
   to their audit.log file to the central syslog server as local6::

       # monitor auditd log and send out over local6 to central loghost
       $ModLoad imfile.so

       # auditd audit.log
       $InputFileName /var/log/audit/audit.log
       $InputFileTag tag_audit_log:
       $InputFileStateFile audit_log
       $InputFileSeverity info
       $InputFileFacility local6
       $InputRunFileMonitor

   then modify your emitter to the syslog server to send local6.* there

3. on the syslog server - setup log destinations for:

   - merged audit logs of all hosts (explicitly drop any non-AVC audit
     message here) - the magic exclude line is::

         :msg, !contains, "type=AVC" ~

     - that line must be directly above the log entry you want to filter
       and it has a cascade effect on everything below it unless you
       disable the filter

   - per-host audit logs - this is everything from audit.log

4. On the syslog server - we can run audit2allow/audit2why on the audit logs
   sent there by doing this::

       grep 'hostname' /var/log/merged/audit.log | sed 's/^.*tag_audit_log: //' | audit2allow

   the sed is to remove the log prefix garbage added by syslog when
   transferring the message


Future:
=======

- additional log reports for errors from http processes or servers
- SEC - Simple Event Correlator - to report, immediately, on events from a
  log stream - available in fedora/epel.
- New report modules within epylog

diff --git a/docs/sops/taskotron.rst b/docs/sops/taskotron.rst
deleted file mode 100644
index f5f8724..0000000
--- a/docs/sops/taskotron.rst
+++ /dev/null
@@ -1,142 +0,0 @@
.. title: Taskotron SOP
.. slug: infra-taskotron
.. date: 2014-12-16
.. taxonomy: Contributors/Infrastructure

=============
Taskotron SOP
=============

run automated tasks to check items in Fedora

Contact Information
===================

Owner
    Fedora QA Devel, Fedora Infrastructure Team

Contact
    #fedora-qa, #fedora-admin, #fedora-noc

Location
    PHX2

Servers
    - taskotron-dev01.qa
    - taskotron-stg01.qa
    - taskotron01.qa
    - taskotron-client*.qa

Purpose
    run automated tasks on fedora components and report results
    of those task runs

Architecture
============

Taskotron is a system for running automated tasks in Fedora based on incoming
signals (fedmsgs in our case).

The system is made up of several components:

- trigger

- task execution system - this is a master/slave system, currently using buildbot

- results storage (covered in the resultsdb SOP)

- mirror task git repos

Deploying the Taskotron Master
==============================

The Taskotron master node is responsible for:

1) listening for fedmsgs and scheduling tasks as appropriate

2) coordinating execution of tasks on client nodes

Before doing the initial deployment, a ssh keypair is needed for the taskotron
clients to check tasks out from the git mirror. Generate a password-less
keypair (needed for deploying clients) and put the contents of the public key
in the 'buildslave_ssh_pubkey' variable.

When running the ``taskotron-[dev,stg,prod]`` group playbook for the first time,
it will fail partway through during buildmaster configuration because
buildmaster initialization is not part of the playbook (it fails if re-run and
no upgrade is needed). Once you hit that failure, run the following on the
taskmaster node as the buildmaster user::

    buildbot upgrade-master /home/buildmaster/master

After running the ``upgrade-master`` command, continue the playbook and it
should run to completion.


Deploying the Taskotron Clients
===============================

Before deploying the taskotron clients, get the host key of the taskotron
master and populate ``buildmaster_pubkey`` in the client's group_vars file. This
will make sure that git checkouts from the master node work without human
intervention.

Deploying the Taskotron clients is a matter of running the proper group
playbook once the variable files are properly filled in. No additional
configuration is needed.

Updating
========

This part of the SOP can also be used to idle taskotron - just skip the update
and reboot steps but turn off fedmsg-hub and shut down the buildslave
services. The buildslave and fedmsg-hub processes will need to be restarted to
un-idle the system, but buildbot will restart anything that was running once the
buildslaves come back up.

.. note:: it would be wise to update resultsdb while the taskotron system is not
   processing jobs - that is covered in a separate SOP.

There are multiple parts to updating Taskotron: clients, master and git mirrors.

1. on a non-affected machine, run taskotron-trigger such that it records the
   jobs that have been triggered
2. stop fedmsg-hub on the taskotron master so that no new jobs come in
3. wait for buildbot to become idle
4. run ``systemctl stop buildslave`` on all affected clients
5. run the ``update_grokmirror_repos.yml`` playbook on the system to update
6. update and reboot the master node
7. update and reboot the client nodes
8. start the buildslave process on all client nodes (they aren't set to start at boot)

Once all affected machines are back up, verify that all services have come
back up cleanly and start the process of running any jobs which may have been
missed during the downtime:

1. there will be a /var/log/taskotron-trigger/jobs.csv file containing jobs
   which need to be run on the non-affected machine running taskotron-trigger
   mentioned above. Copy the relevant contents of that file to the taskotron
   master node as "newjobs.csv" (the filename isn't important)
2. on the master node, run 'jobrunner newjobs.csv'

If the jobs are submitted without error, the update process is done.


Backup
======

There are two major things which need to be backed up for Taskotron: job data
and the buildmaster database.

The buildmaster database is a normal postgres dump from the database server.
The job data is stored on the taskotron master node in the
/home/buildmaster/master/ directory. The files in 'master/' are not important,
but all subdirectories outside of 'templates/' and 'public_html/' are.

Restore from Backup
===================

To restore from backup, load the database dump and restore backed up files to
the provisioned master before starting the buildmaster service.

diff --git a/docs/sops/torrentrelease.rst b/docs/sops/torrentrelease.rst
deleted file mode 100644
index 4e071a3..0000000
--- a/docs/sops/torrentrelease.rst
+++ /dev/null
@@ -1,70 +0,0 @@
.. title: Torrent Releases Infrastructure SOP
.. slug: infra-torrent-releases
.. date: 2011-10-03
.. taxonomy: Contributors/Infrastructure

===================================
Torrent Releases Infrastructure SOP
===================================

http://torrent.fedoraproject.org/ is our master torrent server for
Fedora distribution. It runs out of ServerBeach.

Contact Information
===================

Owner
    Fedora Infrastructure Team
Contact
    #fedora-admin, sysadmin-torrent group
Location
    ibiblio
Servers
    torrent.fedoraproject.org
Purpose
    Provides the torrent master server for Fedora distribution

Torrent Release
===============

When you want to add a new torrent to the tracker at
http://torrent.fedoraproject.org you need to take the following steps
to have it listed correctly:

1. login to torrent.fedoraproject.org. If you are unable to do so please
   contact the fedora infrastructure group about access. This procedure
   requires membership in the torrentadmin group.

2. upload the files you want to add to the torrent to
   torrent.fedoraproject.org:/srv/torrent/new/$yourOrg/

3. use sha1sum and verify the file you have uploaded matches the source
   (see the example after this list)

4. organize the files into subdirs (or not) as you would like

5. run /srv/torrent/new/maketorrent [file-or-dir-to-torrent]
   ([file-or-dir-to-torrent]) to generate a .torrent file or files

6. copy the .torrent file(s) to: /srv/torrent/www/torrents/$yourOrg/

7. cd to /srv/torrent/torrent-generator/ or /srv/torrent/spins-generator/
   (depending on if it is an official release or spins release)

8. add a .ini file in this directory for the content you'll be
   torrenting. If you're not doing a normal Fedora release, the filename
   in the brackets should be [$yourOrg/File-Of-1.1.torrent]; the
   format of each section should be as follows::

       [Zod-livecd-1-i386.torrent]
       description=Fedora Core 6 Zod LiveCD 1 iso image for i386.
       size=683M
       releasedate=2006-12-22
       group=Fedora Core 6 Zod LiveCD 1

9. mv all files from /srv/torrent/new/$yourOrg into
   /srv/torrent/btholding/ - this includes the files you uploaded as well
   as the .torrent files you've created.
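As a minimal sketch of the verification in step 3 (the file name here is
hypothetical), compare the digest of the uploaded copy against the checksum
published with the source::

    $ cd /srv/torrent/new/yourOrg
    $ sha1sum Fedora-Example-Live-x86_64.iso
    # compare the printed digest against the SHA1 value published for the source file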
Your files will be linked on the website and available on the tracker
after this.

diff --git a/docs/sops/unbound.rst b/docs/sops/unbound.rst
deleted file mode 100644
index 430f95a..0000000
--- a/docs/sops/unbound.rst
+++ /dev/null
@@ -1,19 +0,0 @@
.. title: Infrastructure Unbound SOP
.. slug: infra-unbound
.. date: 2013-11-22
.. taxonomy: Contributors/Infrastructure

==========================
Fedora Infra Unbound Notes
==========================

Sometimes, especially after updates/reboots, you will see alerts like this::

    18:46:55 < zodbot> PROBLEM - unbound-tummy01.fedoraproject.org/Unbound 443/tcp is WARNING: DNS WARNING - 0.037 seconds response time (dig returned an error status) (noc01)
    18:51:06 < zodbot> PROBLEM - unbound-tummy01.fedoraproject.org/Unbound 80/tcp is WARNING: DNS WARNING - 0.035 seconds response time (dig returned an error status) (noc01)

To correct this, restart unbound on the relevant node (in the example case
above, unbound-tummy01) by running the restart_unbound Ansible playbook from
batcave01::

    sudo -i ansible-playbook /srv/web/infra/ansible/playbooks/restart_unbound.yml --extra-vars="target=unbound-tummy01.fedoraproject.org"

diff --git a/docs/sops/virt-image.rst b/docs/sops/virt-image.rst
deleted file mode 100644
index e2f7cd8..0000000
--- a/docs/sops/virt-image.rst
+++ /dev/null
@@ -1,72 +0,0 @@
.. title: Infrastructure
.. slug: no-idea
.. date: 2015-07-09
.. taxonomy: Contributors/Infrastructure

==================================
Fedora Infrastructure Kpartx Notes
==================================

How to mount virtual partitions
===============================

There can be multiple reasons you need to work with the contents of a
virtual machine without that machine running:

1. You have decommissioned the system and found you need to get something
   that was not backed up.

2. The system is for some reason unbootable and you need to change some
   file to make it work.

3. Forensics work of some sort.

In cases 1 and 2 the following commands and tools are invaluable. In
case 3, you should work with the Fedora Security Team and follow their
instructions completely.

Steps to Work With Virtual System
=================================

1. Find out what physical server the virtual machine image is on.

   A. Log into batcave01.phx2.fedoraproject.org

   B. search for the hostname in the file /var/log/virthost-lists.out::

          $ grep proxy01.phx2.fedoraproject.org /var/log/virthost-lists.out
          virthost05.phx2.fedoraproject.org:proxy01.phx2.fedoraproject.org:running:1

   C. If the image does not show up in the list then most likely it is
      an image which has been decommissioned. You will need to search
      the virtual hosts more directly::

          # for i in `awk -F: '{print $1}' /var/log/virthost-lists.out | sort -u`; do
                ansible $i -m shell -a 'lvs | grep proxy01.phx2'
            done

2. Log into the virtual server and make sure the image is shut down.
   Even in cases where the system is not working correctly it may still
   have a running qemu on the physical server. It is best to confirm that
   the box is dead::

       # virsh destroy <guestname>

3. We will be using the kpartx command to make the guest image ready for
   mounting::

       # lvs | grep <guestname>
       # kpartx -l /dev/mapper/<volumegroup>-<guestname>
       # kpartx -a /dev/mapper/<volumegroup>-<guestname>
       # vgscan
       # vgchange -ay /dev/mapper/<guest-volumegroup>
       # mount /dev/mapper/<guest-logicalvolume> /mnt

4. Edit the files as needed.

5. Tear down the tree::

       # umount /mnt
       # vgchange -an <guest-volumegroup>
       # vgscan
       # kpartx -d /dev/mapper/<volumegroup>-<guestname>

diff --git a/docs/sops/virt-notes.rst b/docs/sops/virt-notes.rst
deleted file mode 100644
index d060c40..0000000
--- a/docs/sops/virt-notes.rst
+++ /dev/null
@@ -1,52 +0,0 @@
.. title: Infrastructure libvirt tools SOP
.. slug: infra-libvirt
.. date: 2012-04-30
.. taxonomy: Contributors/Infrastructure

===================================
Fedora Infrastructure Libvirt Notes
===================================

Notes/FAQ on using libvirt/virsh/virt-manager in our environment

How do I migrate a guest from one virthost to another
=====================================================

Multiple steps:

1. setup a passwordless root ssh key to allow communication between
   the two virthosts as root. This is only temporary, so, while scary,
   it is not a big deal. Right now, this also means setting
   ``PermitRootLogin without-password`` in ``/etc/ssh/sshd_config``.

2. setup storage on the destination end to match the source storage.
   If the path to the storage is not the same on both systems
   (ie: not the same path into ``/dev/Guests00/myguest``) then take a copy
   of the guest xml file from ``/etc/libvirt/qemu`` and modify it so it has
   the right path. If you need to do this you need to add ``--xml thisfile.xml``
   to the arguments below AFTER the word 'migrate'.

3. as root on the source location::

       virsh -c qemu:///system migrate --p2p --tunnelled \
             --copy-storage-all myguest \
             qemu+ssh://root@destinationvirthost/system

   This should start the migration process and it will output absolutely
   jack-squat on the cli for you to know this. On the destination system
   go look in /var/log/libvirt/qemu/myguest.log (tail -f will show you the
   progress results as a percentage completed).

   .. note::
      --p2p and --tunnelled are so it goes direct from one host to the other
      but uses ssh.

4. Once the migration is complete you will probably need to run this
   on the new virthost::

       virsh dumpxml myguest > /etc/libvirt/qemu/myguest.xml
       virsh destroy myguest
       virsh define /etc/libvirt/qemu/myguest.xml
       virsh autostart myguest
       virsh start myguest

diff --git a/docs/sops/virtio.rst b/docs/sops/virtio.rst
deleted file mode 100644
index e7ad09a..0000000
--- a/docs/sops/virtio.rst
+++ /dev/null
@@ -1,24 +0,0 @@
.. title: Infrastructure virtio SOP
.. slug: infra-virtio
.. date: 2014-05-01
.. taxonomy: Contributors/Infrastructure

============
virtio notes
============

We have found that virtio is faster/more stable than emulating other cards
on our VMs.
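Before switching, it can help to confirm what a guest is using now. A quick
check like the following (the guest name is hypothetical) prints the guest's
current NIC configuration; a ``<model type='virtio'/>`` line means it is
already using virtio::

    $ sudo virsh dumpxml guest01 | grep -A 4 '<interface'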
To switch a VM to virtio:

- Remove from DNS if it's a proxy
- Log into the vm and shut it down
- Log into the virthost that the VM is on, and run ``sudo virsh edit <guestname>``
- Add this line to the appropriate bridge interface(s)::

      <model type='virtio'/>

- Save/quit the editor
- Run ``sudo virsh start <guestname>``
- Re-add to DNS if it's a proxy

diff --git a/docs/sops/voting.rst b/docs/sops/voting.rst
deleted file mode 100644
index 42e6cb2..0000000
--- a/docs/sops/voting.rst
+++ /dev/null
@@ -1,214 +0,0 @@
.. title: Voting and Elections Infrastructure SOP
.. slug: infra-voting
.. date: 2014-07-10
.. taxonomy: Contributors/Infrastructure

=========================
Voting Infrastructure SOP
=========================

The live voting instance can be found at
https://admin.fedoraproject.org/voting and the staging instance at
https://admin.stg.fedoraproject.org/voting/

The code base can be found at
http://git.fedorahosted.org/git/?p=elections.git

Contents
========

1. Contact Information
2. Creating a new election

   1. Creating the election
   2. Adding Candidates
   3. Who can vote

3. Modifying an Election

   1. Changing the details of an Election
   2. Removing a candidate
   3. Releasing the results of an embargoed election

Contact Information
===================

Owner
    Fedora Infrastructure Team
Contact
    #fedora-admin, elections
Location
    PHX
Servers
    elections0{1,2}, elections01.stg, db02
Purpose
    Provides a system for voting on Fedora matters

Creating a new election
=======================

Creating the election
---------------------

* Log in

* Go to "Admin" in the menu at the top, select "Create new election" and fill
  in the form.

* The "usefas" option results in candidate names being looked up as FAS
  usernames and displayed as their real names.

* An alias should be added when creating a new election as this is used in
  the link on the page of listed elections on the frontpage.

* Complete the election form:

  Alias
      A short name for the election. It is the name that will be
      used in the templates.

      ``Example: FESCo2014``

  Summary
      A simple name that will be used in the URLs and in the
      links in the application

      ``Example: FESCo elections 2014``

  Description
      A short description about the elections that will be
      displayed above the choices in the voting page

  Type
      Allows setting the type of election (more on that below)

  Maximum Range/Votes
      Allows setting options for some election types
      (more on that below)

  URL
      A URL pointing to more information about the election

      ``Example: the wiki page presenting the election``

  Start Date
      The start of the elections (UTC)

  End Date
      The close of the elections (UTC)

  Number Elected
      The number of seats that will be selected among the candidates after the election

  Candidates are FAS users?
      Checkbox allowing integration between FAS accounts
      and their names retrieved from FAS.

  Embargo results
      If this is set then it will require manual intervention
      to release the results of the election

  Legal voters groups
      Used to restrict the votes to one or more FAS groups.

  Admin groups
      Give admin rights on that election to one or more FAS groups


Adding Candidates
=================

The list of all the elections can be found at voting/admin/

Click on the election of interest and select "Add a candidate".

Each candidate is added with a name and a URL.
The name can be his/her FAS username (interesting if the "Candidates are FAS
users?" checkbox was checked when creating the election) or something else.

The URL can be a reference to the wiki page where they nominated themselves.

This will add extra candidates to the available list.

Who can vote
============

If no 'Legal voters groups' have been defined when creating the election, the
election will be open to anyone having signed the CLA and being in at least one
other group (commonly referred to as CLA+1).

Modifying an Election
=====================

Changing the details of an Election
-----------------------------------

.. note::
   this page can also be used to verify details of an election before it opens
   for voting.

The list of all the elections can be found at ``/voting/admin/``

After finding the right election, click on it to see the overview and select
"Edit election" under the description.

Edit a candidate
================

On the election overview page found via ``/voting/admin/`` (and clicking on the
election of interest), next to each candidate is an `[edit]` button allowing
the admins to edit the information relative to the candidate.

Removing a candidate
====================

On the election overview page found via ``/voting/admin/`` (and clicking on the
election of interest), next to each candidate is an `[x]` button allowing
the admins to remove the candidate from the election.


Releasing the results of an embargoed election
==============================================

Visit the elections admin interface and edit the election to uncheck the
'Embargo results?' checkbox.

Results
=======

Admins have early access to the results of the elections (regardless of the
embargo status).

The list of the closed elections can be found at /voting/archives.

Find the election of interest there and click on the "Results" link in the
last column of the table.
This will show you the Results page, including who was elected based on the
number of seats entered when creating the election.

You may use this information to send out the results email.

Legacy
======

.. note::
   The information below should now be included in the Results page (see above)
   but is left here just in case.

Other things you might need to query
------------------------------------

The current election software doesn't retrieve all of the information that
we like to include in our results emails, so we have to query the database
for the extra information. You can use something like this to retrieve the
total number of voters for the election::

    SELECT e.id, e.shortdesc, COUNT(distinct v.voter) FROM elections AS e LEFT
    JOIN votes AS v ON e.id=v.election_id WHERE e.shortdesc in ('FAmSCo - February
    2014') GROUP BY e.id, e.shortdesc;


You may also want to include the vote tally per candidate for convenience
when the FPL emails the election results::

    SELECT e.id, e.shortdesc, c.name, c.novotes FROM elections AS e LEFT JOIN
    fvotecount AS c ON e.id=c.election_id WHERE e.shortdesc in ('FAmSCo - February
    2014', 'FESCo - February 2014') ;

diff --git a/docs/sops/wiki.rst b/docs/sops/wiki.rst
deleted file mode 100644
index bd32d39..0000000
--- a/docs/sops/wiki.rst
+++ /dev/null
@@ -1,41 +0,0 @@
.. title: Wiki Infrastructure SOP
.. slug: infra-wiki
.. date: 2012-09-13
.. taxonomy: Contributors/Infrastructure

=======================
Wiki Infrastructure SOP
=======================

Managing our wiki.
Contact Information
===================

Owner
    Fedora Infrastructure Team / Fedora Website Team
Contact
    #fedora-admin or #fedora-websites on irc.freenode.net
Location
    http://fedoraproject.org/wiki/
Servers
    proxy[1-3] app[1-2,4]
Purpose
    Provides our production wiki

Description
===========
Our wiki currently runs mediawiki.

.. important::
   Whenever you change anything on the wiki (bugfix, configuration, plugins,
   ...), please update the page at https://fedoraproject.org/wiki/WikiChanges .

Dealing with Spammers:
======================
If you find a spammer is editing pages in the wiki, do the following:

1. admin disable their account in fas, add 'wiki spammer' as the comment
2. block their account in the wiki from editing any additional pages
3. go to the list of pages they've edited and roll back their changes
   one by one. If there are many, get someone to help you.

diff --git a/docs/sops/zodbot.rst b/docs/sops/zodbot.rst
deleted file mode 100644
index 97d2fd8..0000000
--- a/docs/sops/zodbot.rst
+++ /dev/null
@@ -1,108 +0,0 @@
.. title: Zodbot Infrastructure SOP
.. slug: infra-zodbot
.. date: 2014-12-18
.. taxonomy: Contributors/Infrastructure

=========================
Zodbot Infrastructure SOP
=========================

zodbot is a supybot-based IRC bot that we use in our #fedora channels.

Contents
========

1. Contact Information
2. Description
3. shutdown
4. startup
5. Processing interrupted meeting logs
6. Becoming an admin

Contact Information
===================

Owner
    Fedora Infrastructure Team
Contact
    #fedora-admin
Location
    Phoenix
Servers
    value01
Purpose
    Provides our IRC bot

Description
===========

zodbot is a supybot-based IRC bot that we use in our #fedora channels.
It runs on value01 as the daemon user. We do not config-manage the
zodbot.conf because supybot makes changes to it on its own. Therefore it
gets backed up and is treated as data.

shutdown
    ``killall supybot``

startup
    ``cd /srv/web/meetbot``
    (zodbot currently needs to be started in the meetbot directory;
    this requirement will go away in a later meetbot release)

    ``sudo -u daemon supybot -d /var/lib/zodbot/conf/zodbot.conf``

Startup issues
==============

If the bot won't connect, with an error like::

    "Nick/channel is temporarily unavailable"

found in ``/var/lib/zodbot/logs/messages.log``, hop on Freenode (with your own
IRC client) and do the following::

    /msg nickserv release zodbot [the password]

The password can be found on the bot's host in
``/var/lib/zodbot/conf/zodbot.conf``

This should allow the bot to connect again.

Processing interrupted meeting logs
===================================

zodbot forgets about meetings if they are in progress when the bot goes
down; therefore, the meetings never get processed. Users may file a
ticket in our Trac instance to have meeting logs processed.

Trac tickets for meeting log processing should consist of a URL where
zodbot had saved the log so far and an uploaded file containing the rest
of the log. The logs are stored in /srv/web/meetbot.
Append the remainder of the log uploaded to Trac (don't worry too much about
formatting; meeting.py works well with irssi- and XChat-like logs), then run::

    sudo python /usr/lib/python2.7/site-packages/supybot/plugins/MeetBot/meeting.py replay /path/to/fixed.log.txt

Close the Trac ticket, letting the user know that the logs are processed
in the same directory as the URL they gave you.

Becoming an admin
=================

Register with zodbot on IRC::

    /msg zodbot misc help register

You have to identify to the bot to do any admin-type commands, and you
need to have done so before anyone can give you privs.

After doing this, ask in #fedora-admin on IRC and someone will grant you
privs if you need them. You'll likely be added to the admin group, which
has the following capabilities (the below snippet is from an IRC log
illustrating how to get the list of capabilities)::

    21:57 < nirik> .list admin
    21:57 < zodbot> nirik: capability add, capability remove, channels, ignore add,
    ignore list, ignore remove, join, nick, and part

diff --git a/docs/sysadmin-guide/index.rst b/docs/sysadmin-guide/index.rst
new file mode 100644
index 0000000..2e41183
--- /dev/null
+++ b/docs/sysadmin-guide/index.rst
@@ -0,0 +1,13 @@

.. _sysadmin-guide:

==========================
System Administrator Guide
==========================
Welcome to the Fedora Infrastructure system administration guide.

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   sops/index

diff --git a/docs/sysadmin-guide/sops/2-factor.rst b/docs/sysadmin-guide/sops/2-factor.rst
new file mode 100644
index 0000000..c127df6
--- /dev/null
+++ b/docs/sysadmin-guide/sops/2-factor.rst
@@ -0,0 +1,102 @@
.. title: Two Factor Auth
.. slug: fas-two-factor
.. date: 2013-09-19 updated: 2016-03-11
.. taxonomy: Contributors/Infrastructure

===============
Two factor auth
===============

Fedora Infrastructure has implemented a form of two-factor auth for people who
have sudo access on Fedora machines. In the future we may expand this to
include more than sudo, but this was deemed to be high-value, low-hanging
fruit.

----------------
Using two factor
----------------

http://fedoraproject.org/wiki/Infrastructure_Two_Factor_Auth

To enroll a Yubikey, use the fedora-burn-yubikey script like normal.
To enroll using FreeOTP or Google Authenticator, go to
https://admin.fedoraproject.org/totpcgiprovision

What's enough authentication?
=============================
FAS Password+FreeOTP or FAS Password+Yubikey.
Note: don't actually enter a +, simply enter your FAS Password and press your
yubikey or enter your FreeOTP code.

---------------------------------------------
Administrating and troubleshooting two factor
---------------------------------------------

Two factor auth is implemented by a modified copy of the
https://github.com/mricon/totp-cgi project doing the authentication and
pam_url submitting the authentication tokens.

totp-cgi runs on the fas servers (currently fas01.stg and fas01/fas02/fas03 in
production), listening on port 8443 for pam_url requests.

FreeOTP, Google authenticator and yubikeys are supported as tokens to use with
your password.

FreeOTP, Google authenticator:
==============================

The FreeOTP application is preferred, however Google authenticator works as well.
(Note that Google authenticator is not open source.)

This is handled via totpcgi. There's a command line tool to manage users,
totpprov.
See 'man totpprov' for more info. Admins can use this tool to revoke
lost tokens (google authenticator only) with 'totpprov delete-user username'.

To enroll using FreeOTP or Google Authenticator for production machines, go to
https://admin.fedoraproject.org/totpcgiprovision

To enroll using FreeOTP or Google Authenticator for staging machines, go to
https://admin.stg.fedoraproject.org/totpcgiprovision/

You'll be prompted to login with your fas username and password.

Note that staging and production differ.

YubiKeys:
=========

Yubikeys are enrolled and managed in FAS. Users can self-enroll using the
fedora-burn-yubikey utility included in the fedora-packager package.

What do I do if I lose my token?
================================
Send an email to admin@fedoraproject.org that is encrypted/signed with your
gpg key from FAS, or otherwise identifies that you are you.

How to remove a token (so the user can re-enroll)?
==================================================
First we MUST verify that the user is who they say they are, using any of the
following:

- Personal contact where the person can be verified by a member of
  sysadmin-main.

- Correct answers to security questions.

- Email request to admin@fedoraproject.org that is gpg encrypted by the key
  listed for the user in fas.

Then:

1. For google authenticator, login to one of the fas machines and run::

       sudo totpprov delete-user username

2. For yubikey: login to one of the fas machines and run::

       /usr/local/bin/yubikey-remove.py username

The user can then go to https://admin.fedoraproject.org/totpcgiprovision/
and reprovision a new device.

If the user emails admin@fedoraproject.org with the signed request, make sure
to reply-all indicating that a reset was performed. This is so that other
admins don't step in and reset it again after it's been reset once.

diff --git a/docs/sysadmin-guide/sops/accountdeletion.rst b/docs/sysadmin-guide/sops/accountdeletion.rst
new file mode 100644
index 0000000..7d31123
--- /dev/null
+++ b/docs/sysadmin-guide/sops/accountdeletion.rst
@@ -0,0 +1,278 @@
.. title: Account Deletion SOP
.. slug: infra-fas-account-deletion
.. date: 2013-05-08
.. taxonomy: Contributors/Infrastructure

====================
Account Deletion SOP
====================

For the most part we do not delete accounts. In the case that a deletion
is paramount, it will need to be coordinated with the appropriate entities.

Disabling accounts is another story, but is limited to those with the
appropriate privileges. Reasons for an account to be disabled can be one of
the following:

* Person has placed SPAM on the wiki or other sites.
* It is seen that the account has been compromised by a third party.
* A person wishes to leave the Fedora Project and wants the account
  disabled.

Contents
--------

* Disabling

  - Disable Accounts
  - Disable Groups

* User Requested disables

* Renames

  - Rename Accounts
  - Rename Groups

* Deletion

  - Delete Accounts
  - Delete Groups


Disable
=======

Disabling accounts is the easiest to accomplish as it just blocks people
from using their account. It does not remove the account name and
associated UID, so we don't have to worry about future, unintentional
collisions.

Disable Accounts
----------------

To begin with, accounts should not be disabled until there is a ticket in
the Infrastructure ticketing system.
After that, the contents of the
ticket need to be verified (to make sure people aren't playing pranks or
someone is in a crappy mood). This needs to be logged in the ticket (who
looked, what they saw, etc). Then the account can be disabled::

    ssh db02
    sudo -u postgres psql fas2

    fas2=# begin;
    fas2=# select * from people where username = 'FOOO';


Here you need to verify that the account looks right, that there is only
one match, or other issues. If there are multiple matches you need to
contact one of the main sysadmin-db's on how to proceed::

    fas2=# update people set status = 'admin_disabled' where username = 'FOOO';
    fas2=# commit;
    fas2=# \q

Disable Groups
--------------

There is no explicit way to disable groups in FAS2. Instead, we close the
group to new members and optionally remove existing members from
it. This can be done from the web UI if you are an administrator of the
group or you are in the accounts group. First, go to the group info page.
Then click the (edit) link next to Group Details. Make sure that the
Invite Only box is checked. This will prevent other users from requesting
the group on their own.

If you want to remove the existing users, view the Group info, then click
on the View Member List link. Click on All under the Results heading. Then
go through and click on Remove for each member.

Doing this in the database instead can be quicker if you have a lot of
people to remove. Once again, this requires someone in sysadmin-db to do
the work::

    ssh db02
    sudo -u postgres psql fas2

    fas2=# begin;
    fas2=# update groups set invite_only = true where name = 'FOOO';
    fas2=# commit;
    fas2=# begin;
    fas2=# select p.name, g.name, r.role_status from people as p, person_roles as r, groups as g
           where p.id = r.person_id and g.id = r.group_id
           and g.name = 'FOOO';
    fas2=# -- Make sure that the list of users in the group looks correct
    fas2=# delete from person_roles where person_roles.group_id = (select id from groups where name = 'FOOO');
    fas2=# -- number of rows in both of the above should match
    fas2=# commit;
    fas2=# \q

User Requested Disables
=======================

According to our Privacy Policy, a user may request that their personal
information be removed from FAS if they want to disable their account. We can
do this, but we need to do some extra work beyond simply setting the account
status to disabled.

Record User's CLA information
-----------------------------

If the user has signed the CLA/FPCA, then they may have contributed something
to Fedora that we'll need to contact them about at a later date. For that, we
need to keep at least the following information:

* Fedora username
* human name
* email address

All of this information should be on the CLA email that is sent out when a
user signs up. We need to verify with spot (Tom Callaway) that he has that
record. If not, we need to get it to him. Something like::

    select id, username, human_name, email, telephone, facsimile, postal_address from people where username = 'USERNAME';

and send it to spot to keep.

Remove the personal information
-------------------------------

The following sequence of db commands should do it::

    fas2=# begin;
    fas2=# select * from people where username = 'USERNAME';

Here you need to verify that the account looks right, that there is only
one match, or other issues.
If there are multiple matches you need to
contact one of the main sysadmin-db's on how to proceed::

    fas2=# update people set human_name = '', gpg_keyid = null, ssh_key = null, unverified_email = null, comments = null, postal_address = null, telephone = null, facsimile = null, affiliation = null, ircnick = null, status = 'inactive', locale = 'C', timezone = null, latitude = null, longitude = null, country_code = null, email = 'disabled1@fedoraproject.org' where username = 'USERNAME';

Make sure only one record was updated::

    fas2=# select * from people where username = 'USERNAME';

Make sure the correct record was updated::

    fas2=# commit;

.. note:: The email address is both not null and unique in the database. Due
   to this, you need to set it to a new string for every user who requests
   deletion like this.

Renames
=======
In general, renames do not require as much work as deletions, but they
still require coordination. This is because renames do not change the
UID/GID, but some of our applications save information based on
username/groupname rather than UID/GID.

Rename Accounts
---------------

.. warning:: Needs more eyes
   This list may not be complete.

* Check the databases for koji, pkgdb, and bodhi for occurrences of the
  old username and update them to the new username.
* Check fedorapeople.org for home directories and yum repositories under
  the old username that would need to be renamed.
* Check (or ask the user to check and update) mailing list subscriptions
  on fedorahosted.org and lists.fedoraproject.org under the old
  username@fedoraproject.org email alias.
* Check whether the user has a username@fedoraproject.org bugzilla
  account in python-fedora and update that. Also ask the user to update
  that in bugzilla.
* If the user is in a sysadmin-* group, check for home directories on
  bastion and other infrastructure boxes that are owned by them and
  need to be renamed (could also just tell the user to backup any files
  there themselves b/c they're getting a new home directory).
* grep through ansible for occurrences of the username.
* Check for entries in trac on fedorahosted.org for the username as an
  "Assigned to" or "CC" entry.
* Add other places to check here

Rename Groups
-------------

.. warning:: Needs more eyes
   This list may not be complete.

* grep through ansible for occurrences of the group name.
* Check for group-members,group-admins,group-sponsors@fedoraproject.org
  email alias presence in any fedorahosted.org or
  lists.fedoraproject.org mailing list.
* Check for entries in trac on fedorahosted.org for the username as an
  "Assigned to" or "CC" entry.
* Add other places to check here

Deletion
========

Deletion is the toughest one to audit because it requires that we look
through our systems looking for the UID and GID in addition to looking for
the username and password. The UID and GID are used for things like
filesystem permissions, so we have to look there as well. Not catching
these places may lead to security issues should the UID/GID ever be reused.

.. note:: Recommended to rename instead
   When not strictly necessary to purge all traces of an account, it's
   highly recommended to rename the user or group to something like
   DELETED_oldusername instead of deleting. This avoids the problems and
   additional checking that we have to do below.

Delete Accounts
---------------

.. warning:: Needs more eyes
   This list may be incomplete.
   Needs more people to look at this and find
   places that may need to be updated.

* Check everything for the #Rename Accounts case.
* Figure out what boxes a user may have had access to in the past. This
  means you need to look at all the groups a user may ever have been
  approved for (even if they are not approved for those groups now). For
  instance, any git*, svn*, bzr*, hg* groups would have granted access
  to hosted03 and hosted04. packager would have granted access to
  pkgs.fedoraproject.org. Pretty much any group grants access to
  fedorapeople.org.
* For those boxes, run a find over the files there to see if the UID
  owns any files on the system::

      # find / -uid 100068 -print

  Any files owned by that UID must be reassigned to another user or
  removed.

.. warning:: What to do about backups?
   Backups pose a special problem as they may contain the UID that's being
   removed. Need to decide how to handle this.

* Add other places to check here

Delete Groups
-------------

.. warning:: Needs more eyes
   This list may be incomplete. Needs more people to look at this and find
   places that may need to be updated.

* Check everything for the #Rename Groups case.
* Figure out what boxes may have had files owned by that group. This
  means that you'd need to look at the users in that group, what boxes
  they have shell accounts on, and then look at those boxes. Groups used
  for hosted would also need to add hosted03 and hosted04 to that list
  and the box that serves the hosted mailing lists.
* For those boxes, run a find over the files there to see if the GID
  owns any files on the system::

      # find / -gid 100068 -print

  Any files owned by that GID must be reassigned to another group or
  removed.

.. warning:: What to do about backups?
   Backups pose a special problem as they may contain the GID that's being
   removed. Need to decide how to handle this.

* Add other places to check here

diff --git a/docs/sysadmin-guide/sops/anitya.rst b/docs/sysadmin-guide/sops/anitya.rst
new file mode 100644
index 0000000..7e01e29
--- /dev/null
+++ b/docs/sysadmin-guide/sops/anitya.rst
@@ -0,0 +1,155 @@
.. title: Anitya Infrastructure SOP
.. slug: infra-anitya
.. date: 2016-11-30
.. taxonomy: Contributors/Infrastructure

=========================
Anitya Infrastructure SOP
=========================

Anitya is used by Fedora to track upstream project releases and map them
to downstream distribution packages, including (but not limited to) Fedora.

Anitya production instance: https://release-monitoring.org

Anitya project page: https://github.com/fedora-infra/anitya

Contents
========

1. Contact Information
2. Building and Deploying a Release
3. Administrating release-monitoring.org


Contact Information
===================

Owner
    Fedora Infrastructure Team
Contact
    #fedora-admin, #fedora-apps
Persons
    pingou, jcline
Location
    ?
Servers
    anitya-backend01.vpn.fedoraproject.org
    anitya-frontend01.vpn.fedoraproject.org
Purpose
    Map upstream releases to Fedora packages.

Hosts
=====
The current deployment is made up of two hosts, anitya-backend01 and
anitya-frontend01.

anitya-frontend01
-----------------
This host runs:

- The apache/mod_wsgi application for release-monitoring.org

- A fedmsg-relay instance for anitya's local fedmsg bus

This host relies on:

- A postgres db server running on anitya-backend01

- Lots of external third-party services.
  The anitya webapp can scrape
  pypi, rubygems.org, sourceforge and many others on command.

Things that rely on this host:

- The Fedora Infrastructure bus subscribes to the anitya bus published
  here by the local fedmsg-relay daemon at tcp://release-monitoring.org:9940

- the-new-hotness is a fedmsg-hub plugin running in FI on hotness01. It
  listens for anitya messages from here and performs actions on koji and
  bugzilla.

- anitya-backend01 expects to publish fedmsg messages via
  anitya-frontend01's fedmsg-relay daemon. Access should be restricted by
  firewall.

anitya-backend01
----------------
This host is responsible for running the anitya backend cronjobs. It is also
the host for the Anitya PostgreSQL database server.

The services and jobs on this host are:

- A cronjob that retrieves all projects from the PostgreSQL database and
  checks the upstream project to see if there's a new version. This is run
  every 12 hours.

- A PostgreSQL database server to be used by that cron job and by
  anitya-frontend01.

- A database backup job that runs daily. Database dumps are available at
  `the normal database dump location
  `_.

This host relies on:

- The fedmsg-relay daemon running on anitya-frontend01.

- Lots of external third-party services. The cronjob makes all kinds of
  requests out to the Internet, and those can fail in various ways.

Things that rely on this host:

- The webapps running on anitya-frontend01 rely on the postgres db
  server running on this node.


Releasing
=========

The first step to making a new release is creating a Git tag for the release.

Building
--------
After `upstream `_ tags a new release in Git, a new
release can be built. The specfile is stored in the `Anitya repository
`_. Refer to the
`Infrastructure repo SOP `_
to learn how to build the RPM.

Deploying
---------
At the moment, there is no staging deployment of Anitya.

Once the new version is built, it needs to be deployed. To deploy the new version, you need
`ssh access `_ to
batcave01.phx2.fedoraproject.org and `permissions to run the Ansible playbook
`_.

All the following commands should be run from batcave01.

Configuration
^^^^^^^^^^^^^
First, ensure there are no configuration changes required for the new update. If there are,
update the Ansible anitya role(s) and run the deployment playbook::

    $ sudo rbac-playbook groups/anitya.yml

Packages
^^^^^^^^
Both anitya-backend01 and anitya-frontend01 need the new package. To upgrade, run
the upgrade playbook::

    $ sudo rbac-playbook manual/upgrade/anitya.yml

This will upgrade the anitya package, perform any database migrations with Alembic,
and restart the Apache web server.

Congratulations! The new version should now be deployed.


Administrating release-monitoring.org
=====================================
Anitya offers some tools to administer the web application itself. These are useful
when users accidentally create duplicate projects, when detected versions get
messed up, etc.

Flags
^^^^^
Anitya lets users flag projects for administrator attention. This is accessible to
administrators in the `flags tab `_.

diff --git a/docs/sysadmin-guide/sops/ansible.rst b/docs/sysadmin-guide/sops/ansible.rst
new file mode 100644
index 0000000..f3ecc96
--- /dev/null
+++ b/docs/sysadmin-guide/sops/ansible.rst
@@ -0,0 +1,175 @@
.. title: Ansible Infrastructure SOP
.. slug: infra-ansible
.. date: 2015-03-03
taxonomy: Contributors/Infrastructure
+
+=======================================
+Ansible infrastructure SOP/Information.
+=======================================
+
+Background
+==========
+
+Fedora infrastructure used to use func and puppet for system change management.
+We are now using ansible for all system change management and ad-hoc tasks.
+
+Overview
+========
+
+Ansible runs from batcave01 or backup01. These hosts run an ssh-agent that
+has unlocked the ansible root ssh private key. (This is unlocked manually
+by a human with the passphrase each reboot; the passphrase itself is not
+stored anywhere on the machines.) Using 'sudo -i', sysadmin-main members
+can use this agent to access any machines with the ansible root ssh public
+key set up, either with 'ansible' for one-off commands or 'ansible-playbook'
+to run playbooks.
+
+Playbooks are (or should be) idempotent, meaning you should be able to re-run
+the same playbook over and over and it should reach a state where 0 items
+are changing.
+
+Additionally (see below) there is an rbac wrapper that allows members of some
+other groups to run playbooks against specific hosts.
+
+git repo(s)
+-----------
+
+There are 2 git repositories associated with ansible:
+
+/git/ansible on batcave01.
+    This is a public repository. Never commit private data to this repo.
+    You can access it also via a cgit web interface at:
+    https://infrastructure.fedoraproject.org/cgit/ansible.git/
+    You can check it out on batcave01 with: 'git clone /git/ansible'
+    You can also use it remotely if you have your ssh set to proxy your access
+    via bastion01: ``git clone ssh://batcave01/git/ansible``
+
+    Users in the 'sysadmin' group have commit access to this repo.
+    All commits are emailed to 'sysadmin-members' as well as announced
+    on IRC in #fedora-noc.
+
+/git/ansible-private on batcave01.
+    This is a private repository for passwords and other sensitive data.
+    It is not available in cgit, nor should it be cloned or copied remotely.
+    It's only available to members of 'sysadmin-main'.
+
+Cron job/scheduled runs
+-----------------------
+
+A daily cron job runs run_ansible-playbook_cron.py, which walks through the
+playbooks and runs them with `--check --diff` params to perform a dry run.
+
+This way we make sure all the playbooks are idempotent and there are no
+unexpected changes on servers (or in playbooks).
+
+Logging
+-------
+
+We have in place a callback plugin that stores history for any ansible-playbook runs
+and then sends a report each day to sysadmin-logs-members with any CHANGED or FAILED
+actions. Additionally, there's a fedmsg plugin that reports start and end of ansible
+playbook runs to the fedmsg bus. Ansible also logs to syslog verbose reporting of when
+and what commands and playbooks were run.
+
+role based access control for playbooks
+---------------------------------------
+
+There's a wrapper script on batcave01 called 'rbac-playbook' that allows non-sysadmin-main
+members to run specific playbooks against specific groups of hosts. This is part of the
+ansible_utils package. The upstream for ansible_utils is: https://bitbucket.org/tflink/ansible_utils
+
+To add a new group:
+
+1. add the playbook name and sysadmin group to the rbac-playbook (ansible-private repo)
+2. 
add that sysadmin group to sudoers on batcave01 (also in ansible-private repo)
+
+To use the wrapper::
+
+    sudo rbac-playbook playbook.yml
+
+Directory setup
+================
+
+Inventory
+---------
+
+The inventory directory tells ansible all the hosts that are managed by it and
+the groups they are in. All files in this dir are concatenated together, so you
+can split out groups/hosts into separate files for readability. They are in ini
+file format.
+
+Additionally under the inventory directory are host_vars and group_vars subdirectories.
+These are files named for the host or group and containing variables to set
+for that host or group. You should strive to set variables at the highest level
+possible, and precedence is in: global, group, host order.
+
+Vars
+----
+
+This directory contains global variables as well as OS-specific variables. Note that
+in order to use the OS-specific ones you must have 'gather_facts' set to 'True' or ansible
+will not have the facts it needs to determine the OS.
+
+Roles
+-----
+
+Roles are a collection of tasks/files/templates that can be used on any host or group
+of hosts that all share that role. In other words, roles should be used except in cases
+where configuration only applies to a single host. Roles can be reused between hosts and
+groups and are more portable/flexible than tasks or specific plays.
+
+Scripts
+-------
+
+In the ansible git repo under scripts are a number of utility scripts for sysadmins.
+
+Playbooks
+---------
+
+In the ansible git repo there's a directory for playbooks. The top level contains
+utility playbooks for sysadmins. These playbooks perform one-off functions or gather
+information. Under this directory are hosts and groups playbooks. These playbooks are
+for specific hosts and groups of hosts, from provisioning to fully configured. You should
+only use a host playbook in cases where there will never be more than one of that thing.
+
+Tasks
+-----
+
+This directory contains one-off tasks that are used in playbooks. Some of these should
+be migrated to roles (we had this setup before roles existed in ansible). Those that
+are truly only used on one host/group could stay as isolated tasks.
+
+Syntax
+------
+
+Ansible now warns about deprecated syntax. Please fix any cases you see related to
+deprecation warnings.
+
+Templates use the jinja2 syntax.
+
+Libvirt virtuals
+================
+* TODO: add steps to make new libvirt virtuals in staging and production
+* TODO: merge in new-hosts.txt
+
+Cloud Instances
+===============
+* TODO: add how to make new cloud instances
+* TODO: merge in from ansible README file.
+
+rdiff-backups
+=============
+see: https://infrastructure.fedoraproject.org/infra/docs/rdiff-backup.rst
+
+Additional Reading/Resources
+============================
+
+Upstream docs:
+    https://docs.ansible.com/
+
+Example repo with all kinds of examples:
+  * https://github.com/ansible/ansible-examples
+  * https://gist.github.com/marktheunissen/2979474
+
+Jinja2 docs:
+    http://jinja.pocoo.org/docs/
diff --git a/docs/sysadmin-guide/sops/apps-fp-o.rst b/docs/sysadmin-guide/sops/apps-fp-o.rst
new file mode 100644
index 0000000..571d2cf
--- /dev/null
+++ b/docs/sysadmin-guide/sops/apps-fp-o.rst
@@ -0,0 +1,38 @@
+.. title: apps.fedoraproject.org SOP
+.. slug: infra-apps-fp-o
+.. date: 2014-06-29
+.. 
taxonomy: Contributors/Infrastructure
+
+apps-fp-o SOP
+=============
+
+Updating and maintaining the landing page at https://apps.fedoraproject.org/
+
+Contact Information
+-------------------
+
+Owner:
+  Fedora Infrastructure Team
+Contact:
+  #fedora-apps, #fedora-admin
+Servers:
+  proxy0*
+Purpose:
+  Have a nice landing page for all our webapps.
+
+Description
+-----------
+
+We have a number of webapps, many of which our users don't know about. This
+page was created so there was a central place where users could stumble
+through them and learn.
+
+The page is generated by an ansible role in ansible/roles/apps-fp-o/
+It makes use of an RPM package, the source code for which is at
+https://github.com/fedora-infra/apps.fp.o
+
+You can update the page by updating the apps.yaml file in that ansible
+role.
+
+When ansible is next run, the two ansible handlers should see your
+changes and regenerate the static html and json data for the page.
diff --git a/docs/sysadmin-guide/sops/archive-old-fedora.rst b/docs/sysadmin-guide/sops/archive-old-fedora.rst
new file mode 100644
index 0000000..f23f363
--- /dev/null
+++ b/docs/sysadmin-guide/sops/archive-old-fedora.rst
@@ -0,0 +1,81 @@
+.. title: How to Archive Old Fedora Releases.
+.. slug: archive-old-fedora
+.. date: 2016-04-08 updated: 2016-04-08
+.. taxonomy: Releng/Infrastructure
+
+====================================
+ How to Archive Old Fedora Releases
+====================================
+
+The Fedora download servers contain terabytes of data, and so that mirrors
+do not have to carry all of that data, infrastructure regularly moves the
+data of end-of-life releases (from /pub/fedora/linux) to the archives
+section (/pub/archive/fedora/linux).
+
+Steps Involved
+==============
+
+1. Log into batcave01.phx2.fedoraproject.org and ssh to bodhi-backend01.
+
+2. Change into the releases directory.
+
+     cd /pub/fedora/linux/releases
+
+3. Check that the target directory doesn't already exist.
+
+     ls /pub/archive/fedora/linux/releases/
+
+4. If the target directory does not already exist, do a recursive link
+   copy of the tree you want to the target.
+
+     cp -lvpnr 21 /pub/archive/fedora/linux/releases/21
+
+5. If the target directory already exists, then we need to do a
+   recursive rsync to update any changes in the trees since the
+   previous copy.
+
+     rsync -avSHP --delete ./21/ /pub/archive/fedora/linux/releases/21/
+
+6. We now do the updates and updates/testing in similar ways.
+
+     cd ../updates/
+     cp -lpnr 21 /pub/archive/fedora/linux/updates/21
+     cd testing
+     cp -lpnr 21 /pub/archive/fedora/linux/updates/testing/21
+
+     cd ../updates/
+     rsync -avSHP 21/ /pub/archive/fedora/linux/updates/21/
+     cd testing
+     rsync -avSHP 21/ /pub/archive/fedora/linux/updates/testing/21/
+
+7. Announce to the mirror list that this has been done and that in 2 weeks
+   you will move the old trees to archives.
+
+8. In two weeks, log into mm-backend01 and run the archive script.
+
+     sudo -u mirrormanager mm2_move-to-archive --originalCategory="Fedora Linux" \
+       --archiveCategory="Fedora Archive" --directoryRe='/21/Everything'
+
+9. If there are problems, the postgres DB may have issues and so you need to
+   get a DBA to update the backend to fix items.
+
+10. Wait an hour or so, then you can remove the files from the main tree.
+
+     ssh bodhi-backend01
+     cd /pub/fedora/linux
+     cd releases/21
+     ls # make sure you have stuff here
+     rm -rf *
+     ln ../20/README . 
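+     # (assumption: the hard-linked README from the previous release points
+     # users at the archives; each emptied tree keeps one so the path isn't bare)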
+ cd ../testing/21 + ls # make sure you have stuff here + rm -rf * + ln ../20/README . + + + + diff --git a/docs/sysadmin-guide/sops/arm.rst b/docs/sysadmin-guide/sops/arm.rst new file mode 100644 index 0000000..beb3118 --- /dev/null +++ b/docs/sysadmin-guide/sops/arm.rst @@ -0,0 +1,199 @@ +.. title: Fedora ARM Infrastructure +.. slug: infra-arm +.. date: 2015-03-24 +.. taxonomy: Contributors/Infrastructure + +========================= +Fedora ARM Infrastructure +========================= + +Contact Information +=================== + +Owner + Fedora Infrastructure Team +Contact + #fedora-admin, sysadmin-main, sysadmin-releng +Location + Phoenix +Servers + arm01, arm02, arm03, arm04 +Purpose + Information on working with the arm SOCs + +Description +=========== + +We have 4 arm chassis in phx2, each containing 24 SOCs (System On Chip). + +Each chassis has 2 physical network connections going out from it. +The first one is used for the management interface on each SOC. +The second one is used for eth0 for each SOC. + +Current allocations (2016-03-11): + +arm01 + one retrace instance, the rest primary builders attached to koji.fedoraproject.org +arm02 + primary arch builders attached to koji.fedoraproject.org +arm03 + In cloud network, public qa/packager and copr instances +arm04 + primary arch builders attached to koji.fedoraproject.org + +Hardware Configuration +======================= + +Each SOC: + +* Has eth0 and eth1 (unused) and a management interface. +* has 4 cores +* Has 4GB ram +* Has a 300GB disk + +SOCs are addressed by:: + + arm{Chassisnumber}-builder{number}.arm.fedoraproject.org + +Where Chassisnumber is 01 to 04 +and +number is 00-23 + +PXE installs +============ +Kickstarts for the machines are in the kickstarts repo. + +PXE config is on noc01. (or cloud-noc01.cloud.fedoraproject.org for arm03) + +The kickstart installs the latests Fedora and sets them up with a base package set. + +IPMI tool Management +==================== + +The SOCs are managed via their mgmt interfaces using a custom ipmitool +as well as a custom python script called 'cxmanage'. The ipmitool changes +have been submitted upstream and cxmanage is under review in Fedora. + +The ipmitool is currently installed on noc01 and it has ability to +talk to them on their management interface. noc01 also serves dhcp and +is a pxeboot server for the SOCs. + +However you will need to add it to your path:: + + export PATH=$PATH:/opt/calxeda/bin/ + +Some common commands: + +To set the SOC to boot the next time only with pxe:: + + ipmitool -U admin -P thepassword -H arm03-builder11-mgmt.arm.fedoraproject.org chassis bootdev pxe + +To set the SOC power off:: + + ipmitool -U admin -P thepassword -H arm03-builder11-mgmt.arm.fedoraproject.org power off + +To set the SOC power on:: + + ipmitool -U admin -P thepassword -H arm03-builder11-mgmt.arm.fedoraproject.org power on + +To get a serial over lan console from the SOC:: + + ipmitool -U admin -P thepassword -H arm03-builder11-mgmt.arm.fedoraproject.org -I lanplus sol activate + +DISK mapping +============ + +Each SOC has a disk. 
They are however mapped to the internal 00-23 in a non +direct manner:: + + HDD Bay EnergyCard SOC (Port 1) SOC Num + 0 0 3 03 + 1 0 0 00 + 2 0 1 01 + 3 0 2 02 + 4 1 3 07 + 5 1 0 04 + 6 1 1 05 + 7 1 2 06 + 8 2 3 11 + 9 2 0 08 + 10 2 1 09 + 11 2 2 10 + 12 3 3 15 + 13 3 0 12 + 14 3 1 13 + 15 3 2 14 + 16 4 3 19 + 17 4 0 16 + 18 4 1 17 + 19 4 2 18 + 20 5 3 23 + 21 5 0 20 + 22 5 1 21 + 23 5 2 22 + +Looking at the system from the front, the bay numbering starts from left to +right. + +cxmanage +======== + +The cxmanage tool can be used to update firmware or gather diag info. + +Until cxmanage is packaged, you can use it from a python virtualenv:: + + virtualenv --system-site-packages cxmanage + cd cxmanage + source bin/activate + pip install --extra-index-url=http://sources.calxeda.com/python/packages/ cxmanage + + deactivate + +Some cxmanage commands + +:: + + cxmanage sensor arm03-builder00-mgmt.arm.fedoraproject.org + Getting sensor readings... + 1 successes | 0 errors | 0 nodes left | . + + MP Temp 0 + arm03-builder00-mgmt.arm.fedoraproject.org: 34.00 degrees C + Minimum : 34.00 degrees C + Maximum : 34.00 degrees C + Average : 34.00 degrees C + ... (and about 20 more sensors)... + +:: + + cxmanage info arm03-builder00-mgmt.arm.fedoraproject.org + Getting info... + 1 successes | 0 errors | 0 nodes left | . + + [ Info from arm03-builder00-mgmt.arm.fedoraproject.org ] + Hardware version : EnergyCard X04 + Firmware version : ECX-1000-v2.1.5 + ECME version : v0.10.2 + CDB version : v0.10.2 + Stage2boot version : v1.1.3 + Bootlog version : v0.10.2 + A9boot version : v2012.10.16-3-g66a3bf3 + Uboot version : v2013.01-rc1_cx_2013.01.17 + Ubootenv version : v2013.01-rc1_cx_2013.01.17 + DTB version : v3.7-4114-g34da2e2 + +firmware update:: + + cxmanage --internal-tftp 10.5.126.41:6969 --all-nodes fwupdate package ECX-1000_update-v2.1.5.tar.gz arm03-builder00-mgmt.arm.fedoraproject.org + +(note that this runs against the 00 management interface for the chassis and +updates all the nodes), and that we must run a tftpserver on port 6969 for +firewall handling. + +Links +====== +http://sources.calxeda.com/python/packages/cxmanage/ + +Contacts +========= +help.desk@boston.co.uk is the contact to send repair requests to. diff --git a/docs/sysadmin-guide/sops/askbot.rst b/docs/sysadmin-guide/sops/askbot.rst new file mode 100644 index 0000000..29742a1 --- /dev/null +++ b/docs/sysadmin-guide/sops/askbot.rst @@ -0,0 +1,315 @@ +.. title: Ask Fedora SOP +.. slug: infra-ask-fedora +.. date: 2015-03-28 +.. taxonomy: Contributors/Infrastructure + +============== +Ask Fedora SOP +============== + +To set up http://ask.fedoraproject.org based on Askbot as a question and +answer support forum for the Fedora community. A devel instance could be +seen at http://ask01.dev.fedoraproject.org and the staging instance is at +http://ask.stg.fedoraproject.org/ + +This page describes how to set up and customize it from scratch. + +Contents +======== + +1. Contact Information +2. Creating database +3. Setting up the forum +4. Adding administrators +5. Change settings within the forum +6. Database tweaks +7. Debugging + + +Contact Information +=================== + +Owner + Fedora Infrastructure Team +Contact + #fedora-admin +Persons + mether pjp +Sponsor + nirik +Location + phx2 +Servers + ask01 , ask01.stg, ask01.dev +Purpose + To host Ask Fedora + + +Creating database +================= + +We use the postgresql database backend. 
To add the database to a +postgresql server:: + + # psql -U postgres + postgres# create user askfedora with password 'xxx'; + postgres# create database askfedora; + postgres# ALTER DATABASE askfedora owner to askfedora; + postgres# \q; + +Now setup the db tables if this is a new install:: + + python manage.py syncdb + python manage.py migrate askbot + python manage.py migrate django_authopenid #embedded login application + + +Setting up the forum +==================== + +Askbot is packaged and available in Rawhide, Fedora 16 and EPEL 6. On a +RHEL 6 system, you need to install EPEL 6 repo first.:: + + # yum install askbot + +The /etc/askbot/sites/ask/conf/settings.py file should look something +like:: + + DATABASE_ENGINE = 'postgresql_psycopg2' + DATABASE_NAME = 'testaskbot' + DATABASE_USER = 'askbot' + DATABASE_PASSWORD = 'xxxxx' + DATABASE_HOST = '127.0.0.1' + DATABASE_PORT = '5432' + + # Outgoing mail server settings + # + DEFAULT_FROM_EMAIL = 'askfedora@fedoraproject.org' + EMAIL_SUBJECT_PREFIX = '[Askfedora]' + EMAIL_HOST='127.0.0.1' + EMAIL_PORT='25' + + # This variable points to the Askbot plugin which will be used for user + # authentication. Not enabled yet because we don't need FAS auth but use + # Fedora id as a openid provider. + # + # ASKBOT_CUSTOM_AUTH_MODULE = 'authfas' + + Now Ask Fedora website should be accessible from the browser. + + +Adding administrators +===================== + +As of Askbot version 0.7.21, the first user who logs in automatically +becomes the administrator. In previous versions, you have to do the +following.:: + + # cd /etc/askbot/sites/ask/conf/ + # python manage.py add_admin 1 + Do you really wish to make user (id=1, name=pjp) a site administrator? + yes/no: yes + +Once a user is marked as a administrator, he or she can go into anyone's +profile, go the "moderation" tab in the end and mark them as administrator +or moderator as well as block or suspend a user. + + +Change settings within the forum +================================ + +* Data entry and display: + - Disable "Allow asking questions anonymously" + - Enable "Force lowercase the tags" + - Change "Format of tag list" to "cloud" + - Change "Minimum length of search term for Ajax search" to "3" + - Change "Number of questions to list by default" to "50" + - Change "What should "unanswered question" mean?" to "Question has no + - answers" + +* Email and email alert settings + - Change "Default news notification frequency" to "Instantly" + +* Flatpages - about, privacy policy, etc. + Change "Text of the Q&A forum About page (html format)" to the following:: + + Ask Fedora provides a community edited knowledge base and support forum + for the Fedora community. Make sure you read the FAQ and search for + existing questions before asking yours. If you want to provide feedback, + just a question in this site! Tag your questions "meta" to highlight your + questions to the administrators of Ask Fedora. + +* Login provider settings + - Disable "Activate local login" + +* Q&A forum website parameters and urls + - Change "Site title for the Q&A forum" to "Ask Fedora: Community Knowledge + Base and Support Forum" + - Change "Comma separated list of Q&A site keywords" to "Ask Fedora, forum, + community, support, help" + - Change "Copyright message to show in the footer" to "All content is under + Creative Commons Attribution Share Alike License. 
Ask Fedora is community + maintained and Red Hat or Fedora Project is not responsible for content" + - Change "Site description for the search engines" to "Ask Fedora: Community + Knowledge Base and Support Forum" + - Change "Short name for your Q&A forum" to "Ask Fedora" + - Change "Base URL for your Q&A forum, must start with http or https" to + "http://ask.fedoraproject.org" + +* Sidebar widget settings - main page + - Disable "Show avatar block in sidebar" + - Disable "Show tag selector in sidebar" + +* Skin and User Interface settings + - Upload "Q&A site logo" + - Upload "Site favicon". Must be a ICO format file because that is the only one IE supports as a fav icon. + - Enable "Apply custom style sheet (CSS)" + - Upload the following custom CSS:: + + #ab-main-nav a { + color: #333333; + background-color: #d8dfeb; + border: 1px solid #888888; + border-bottom: none; + padding: 0px 12px 3px 12px; + height: 25px; + line-height: 30px; + margin-right: 10px; + font-size: 18px; + font-weight: 100; + text-decoration: none; + display: block; + float: left; + } + + #ab-main-nav a.on { + height: 24px; + line-height: 28px; + border-bottom: 1px solid #0a57a4; + border-right: 1px solid #0a57a4; + border-top: 1px solid #0a57a4; + border-left: 1px solid #0a57a4; /*background:#A31E39; */ + background: #0a57a4; + color: #FFF; + font-weight: 800; + text-decoration: none + } + + #ab-main-nav a.special { + font-size: 18px; + color: #072b61; + font-weight: bold; + text-decoration: none; + } + + /* tabs stuff */ + .tabsA { float: right; } + .tabsC { float: left; } + + .tabsA a.on, .tabsC a.on, .tabsA a:hover, .tabsC a:hover { + background: #fff; + color: #072b61; + border-top: 1px solid #babdb6; + border-left: 1px solid #babdb6; + border-right: 1px solid #888a85; + border-bottom: 1px solid #888a85; + height: 24px; + line-height: 26px; + margin-top: 3px; + } + + .tabsA a.rev.on, tabsA a.rev.on:hover { + padding: 0px 2px 0px 7px; + } + + .tabsA a, .tabsC a{ + background: #f9f7eb; + border-top: 1px solid #eeeeec; + border-left: 1px solid #eeeeec; + border-right: 1px solid #a9aca5; + border-bottom: 1px solid #888a85; + color: #888a85; + display: block; + float: left; + height: 20px; + line-height: 22px; + margin: 5px 0 0 4px; + padding: 0 7px; + text-decoration: none; + } + + .tabsA .label, .tabsC .label { + float: left; + font-weight: bold; + color: #777; + margin: 8px 0 0 0px; + } + + .tabsB a { + background: #eee; + border: 1px solid #eee; + color: #777; + display: block; + float: left; + height: 22px; + line-height: 28px; + margin: 5px 0px 0 4px; + padding: 0 11px 0 11px; + text-decoration: none; + } + + a { + color: #072b61; + text-decoration: none; + cursor: pointer; + } + + div.side-box + { + width:200px; + padding:10px; + border:3px solid #CCCCCC; + margin:0px; + background: -moz-linear-gradient(top, #DDDDDD, #FFFFFF); + } + +Database tweaks +=============== + +To automatically delete expired sessions, we run a trigger +that makes PostgreSQL delete them upon inserting a new one. + +The code used to create this trigger was:: + + askfedora=# CREATE FUNCTION delete_old_sessions() RETURNS trigger + askfedora-# LANGUAGE plpgsql + askfedora-# AS $$ + askfedora$# BEGIN + askfedora$# DELETE FROM django_session WHERE expire_date>> execfile('shelldb.py') + +At this point you have access to a `db` SQLAlchemy Session instance, a `t` +`transaction` module, and `m` for the `bodhi.models`. + + +:: + # Fetch an update, and tweak it as necessary. 
+    >>> up = m.Update.get(u'FEDORA-2016-4d226a5f7e', db)
+
+    # Commit the transaction
+    >>> t.commit()
+
+
+Here is an example of merging two updates together and deleting the original.
+
+::
+
+    >>> up = m.Update.get(u'FEDORA-2016-4d226a5f7e', db)
+    >>> up.builds
+    [<Build ...>, <Build ...>]
+    >>> b = up.builds[0]
+    >>> up2 = m.Update.get(u'FEDORA-2016-5f63a874ca', db)
+    >>> up2.builds
+    [<Build ...>]
+    >>> up.builds.remove(b)
+    >>> up.builds.append(up2.builds[0])
+    >>> delete_update(up2)
+    >>> t.commit()
+
+
+Troubleshooting and Resolution
+==============================
+
+Atomic OSTree compose failure
+-----------------------------
+
+If the Atomic OSTree compose fails with some sort of `Device or Resource busy` error, then run `mount` to see if there
+are any stray `tmpfs` mounts still active::
+
+    tmpfs on /var/lib/mock/fedora-22-updates-testing-x86_64/root/var/tmp/rpm-ostree.bylgUq type tmpfs (rw,relatime,seclabel,mode=755)
+
+You can then `umount /var/lib/mock/fedora-22-updates-testing-x86_64/root/var/tmp/rpm-ostree.bylgUq` and resume the push again.
+
+
+nfs repodata cache IOError
+--------------------------
+
+Sometimes you may hit an IOError during the updateinfo.xml generation
+process from createrepo_c::
+
+    IOError: Cannot open /mnt/koji/mash/updates/epel7-160228.1356/../epel7.repocache/repodata/repomd.xml: File /mnt/koji/mash/updates/epel7-160228.1356/../epel7.repocache/repodata/repomd.xml doesn't exists or not a regular file
+
+This issue will be resolved with NFSv4, but in the meantime it can be worked
+around by removing the `.repocache` directory and resuming the push::
+
+    rm -fr /mnt/koji/mash/updates/epel7.repocache
diff --git a/docs/sysadmin-guide/sops/bugzilla.rst b/docs/sysadmin-guide/sops/bugzilla.rst
new file mode 100644
index 0000000..40a670a
--- /dev/null
+++ b/docs/sysadmin-guide/sops/bugzilla.rst
@@ -0,0 +1,124 @@
+.. title: Bugzilla Sync SOP
+.. slug: infra-bugzilla-sync
+.. date: 2011-10-03
+.. taxonomy: Contributors/Infrastructure
+
+================================
+Bugzilla Sync Infrastructure SOP
+================================
+
+We do not run bugzilla.redhat.com. If bugzilla itself is down we need to
+get in touch with Red Hat IT or one of the bugzilla hackers (for instance,
+Dave Lawrence (dkl)) in order to fix it.
+
+Infrastructure has some scripts that perform administrative functions on
+bugzilla.redhat.com. These scripts sync information from FAS and the
+Package Database into bugzilla.
+
+Contents
+========
+
+1. Contact Information
+2. Description
+3. Troubleshooting and Resolution
+
+   1. Errors while syncing bugzilla with the PackageDB
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+Contact
+    #fedora-admin
+Persons
+    abadger1999
+Location
+    Phoenix, Denver (Tummy), Red Hat Infrastructure
+Servers
+    (fas1, app5) => Need to migrate these to bapp1, bugzilla.redhat.com
+Purpose
+    Sync Fedora information to bugzilla.redhat.com
+
+Description
+===========
+
+At present there are two scripts that sync information from Fedora into
+bugzilla.
+
+export-bugzilla.py
+------------------
+
+``export-bugzilla.py`` is the first script. It is responsible for syncing
+Fedora Accounts into bugzilla. It adds Fedora packagers and bug triagers
+into a bugzilla group that gives the users extra permissions within
+bugzilla. This script is run off of a cron job on FAS1. 
The source code
+resides in the FAS git repo in ``fas/scripts/export-bugzilla.*``; however, the
+code we run on the servers presently lives in ansible::
+
+    roles/fas_server/files/export-bugzilla
+
+pkgdb-sync-bugzilla
+-------------------
+
+The other script is pkgdb-sync-bugzilla. It is responsible for syncing the
+package owners and cclists to bugzilla from the pkgdb. The script runs off
+a cron job on app5. The source code is in the packagedb bzr repo at
+``packagedb/fedora-packagedb-stable/server-scripts/pkgdb-sync-bugzilla.*``.
+Just like FAS, a separate copy is presently installed from ansible to
+``/usr/local/bin/pkgdb-sync-bugzilla`` but that should change ASAP as the
+present fedora-packagedb package installs ``/usr/bin/pkgdb-sync-bugzilla``.
+
+Troubleshooting and Resolution
+==============================
+
+Errors while syncing bugzilla with the PackageDB
+------------------------------------------------
+
+One frequent problem is that people will sign up to watch a package in the
+packagedb but their email address in FAS isn't a bugzilla email address.
+When this happens the scripts that try to sync the packagedb information
+to bugzilla encounter an error and send an email like this::
+
+    Subject: Errors while syncing bugzilla with the PackageDB
+
+    The following errors were encountered while updating bugzilla with information
+    from the Package Database. Please have the problems taken care of:
+
+    ({'product': u'Fedora', 'component': u'aircrack-ng', 'initialowner': u'baz@zardoz.org',
+    'initialcclist': [u'foo@bar.org', u'baz@zardoz.org']}, 504, 'The name foo@bar.org is not a
+    valid username. \n Either you misspelled it, or the person has not\n registered for a
+    Red Hat Bugzilla account.')
+
+When this happens we attempt to contact the person with the problematic
+mail address and get them to change it. Here's a boilerplate message::
+
+    To: foo@bar.org
+    Subject: Fedora Account System Email vs Bugzilla Email
+
+    Hello,
+
+    You are signed up to receive bug reports against the aircrack-ng package
+    in Fedora. Unfortunately, the email address we have for you in the
+    Fedora Account System is not a valid bugzilla email address. That means
+    that bugzilla won't send you mail and we're getting errors in the script
+    that syncs the cclist into bugzilla.
+
+    There are a few ways to resolve this:
+
+    1) Create a new bugzilla account with the email foo@bar.org as
+    an account at https://bugzilla.redhat.com.
+
+    2) Change an existing account on https://bugzilla.redhat.com to use the
+    foo@bar.org email address.
+
+    3) Change your email address in https://admin.fedoraproject.org/accounts
+    to use an email address that matches with an existing bugzilla email
+    address.
+
+    Please let me know what you want to do!
+
+    Thank you,
+
+If the user does not reply, someone in the cvsadmin group needs to go into
+the pkgdb and remove the user from the cclist for the package.
diff --git a/docs/sysadmin-guide/sops/bugzilla2fedmsg.rst b/docs/sysadmin-guide/sops/bugzilla2fedmsg.rst
new file mode 100644
index 0000000..ad8bebe
--- /dev/null
+++ b/docs/sysadmin-guide/sops/bugzilla2fedmsg.rst
@@ -0,0 +1,74 @@
+.. title: bugzilla2fedmsg SOP
+.. slug: infra-bugzilla2fedmsg
+.. date: 2016-04-07
+.. taxonomy: Contributors/Infrastructure
+
+===================
+bugzilla2fedmsg SOP
+===================
+
+Receive events from bugzilla over the RH "unified messagebus" and rebroadcast
+them over our own fedmsg bus. 
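+
+As a quick orientation, here is a minimal sketch (not part of the official
+tooling) of watching for the rebroadcast messages from the fedmsg side; the
+``.bugzilla.`` topic fragment and the ``bug`` payload key are assumptions::
+
+    import fedmsg
+
+    # tail_messages() yields (name, endpoint, topic, msg) tuples forever.
+    for name, endpoint, topic, msg in fedmsg.tail_messages():
+        # assumed topic form: org.fedoraproject.prod.bugzilla.bug.update
+        if '.bugzilla.' in topic:
+            print(topic, msg.get('msg', {}).get('bug'))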
+
+Contact Information
+-------------------
+
+Owner
+    Messaging SIG, Fedora Infrastructure Team
+Contact
+    #fedora-apps, #fedora-fedmsg, #fedora-admin, #fedora-noc
+Servers
+    bugzilla2fedmsg01
+Purpose
+    Rebroadcast bugzilla events on our bus.
+
+Description
+-----------
+
+bugzilla2fedmsg is a small service running as the 'moksha-hub' process which
+receives events from bugzilla via the RH "unified messagebus" and rebroadcasts
+them to our fedmsg bus.
+
+.. note:: Unlike *all* of our other fedmsg services, this one runs as the
+    'moksha-hub' process and not as the 'fedmsg-hub'.
+
+The bugzilla2fedmsg package provides a plugin to the moksha-hub that
+connects out over the STOMP protocol to a 'fabric' of JBOSS activemq FUSE
+brokers living in the Red Hat DMZ. We authenticate with a cert/key pair that is
+kept in /etc/pki/fedmsg/. Those brokers should push bugzilla events over
+STOMP to our moksha-hub daemon. When a message arrives, we query bugzilla
+about the change to get some 'more interesting' data to stuff in our
+payload, then we sign the message using a fedmsg cert and fire it off to the
+rest of our bus.
+
+This service has no database and no memcached usage. It depends on those STOMP
+brokers and on being able to query bugzilla.rh.com.
+
+Relevant Files
+--------------
+
+All managed by ansible, of course:
+
+    STOMP config: /etc/moksha/production.ini
+    fedmsg config: /etc/fedmsg.d/
+    certs: /etc/pki/fedmsg
+    code: /usr/lib/python2.7/site-packages/bugzilla2fedmsg.py
+
+Useful Commands
+---------------
+
+To look at logs, run::
+
+    $ journalctl -u moksha-hub -f
+
+To restart the service, run::
+
+    $ systemctl restart moksha-hub
+
+Internal Contacts
+-----------------
+
+If we need to contact someone from the RH internal "unified messagebus" team,
+search for "unified messagebus" in mojo. It is operated as a joint project
+between RHIT and PnT Devops. See also the ``#devops-message`` IRC channel,
+internally.
diff --git a/docs/sysadmin-guide/sops/cloud.rst b/docs/sysadmin-guide/sops/cloud.rst
new file mode 100644
index 0000000..c40deb7
--- /dev/null
+++ b/docs/sysadmin-guide/sops/cloud.rst
@@ -0,0 +1,138 @@
+.. title: Fedora OpenStack Cloud
+.. slug: infra-openstack
+.. date: 2015-04-28
+.. taxonomy: Contributors/Infrastructure
+
+================
+Fedora OpenStack
+================
+
+Quick Start
+===========
+
+Controller::
+
+    sudo rbac-playbook hosts/fed-cloud09.cloud.fedoraproject.org.yml
+
+Compute nodes::
+
+    sudo rbac-playbook groups/openstack-compute-nodes.yml
+
+Description
+===========
+
+If you need to install OpenStack, either make sure the machine is clean,
+or use the ``ansible.git/files/fedora-cloud/uninstall.sh`` script to brute-force
+wipe it.
+
+.. note:: by default, the script does not wipe the LVM group with VMs; you have to clean
+    them manually. There is a commented line in that script.
+
+On fed-cloud09, remove the file ``/etc/packstack_sucessfully_finished`` to force packstack
+and a few other commands to run again.
+
+After that wipe, you have to::
+
+    ifdown eth1
+    configure eth1 to become normal Ethernet with ip
+    yum install openstack-neutron-openvswitch
+    /usr/bin/systemctl restart neutron-ovs-cleanup
+    ifup eth1
+
+Additionally, when reprovisioning OpenStack, all volumes on the DellEqualogic are
+preserved and you have to remove them manually (or remove them from the OS before
+it is reprovisioned). 
SSH to the DellEqualogic (credentials are at the bottom of
+``/etc/cinder/cinder.conf``) and run::
+
+    show (to get list of volumes)
+    volume select <volume name> offline
+    volume delete <volume name>
+
+Before installing make sure:
+
+  * the rdo repo is enabled
+  * ``yum install openstack-packstack openstack-packstack-puppet openstack-puppet-modules``
+  * ``vim /usr/lib/python2.7/site-packages/packstack/plugins/dashboard_500.py``
+    and add the missing parentheses::
+
+      host_resources.append((ssl_key, 'ssl_ps_server.key'))
+
+Now you can run the playbook::
+
+    sudo rbac-playbook hosts/fed-cloud09.cloud.fedoraproject.org.yml
+
+If you run it after a wipe (i.e. the db has been reset), you have to:
+
+  * import the ssh keys of users (only possible via the webUI - RHBZ 1128233)
+  * reset user passwords
+
+
+Compute nodes
+=============
+
+A compute node is much easier and is written as a role. Use::
+
+    vars_files:
+     - ... SNIP
+     - /srv/web/infra/ansible/vars/fedora-cloud.yml
+     - "{{ private }}/files/openstack/passwords.yml"
+
+    roles:
+     ... SNIP
+     - cloud_compute
+
+Define a host variable in ``inventory/host_vars/FQDN.yml``::
+
+    compute_private_ip: 172.23.0.10
+
+You should also add the IP to ``vars/fedora-cloud.yml``.
+
+And when adding a new compute node, please update ``files/fedora-cloud/hosts``.
+
+.. important:: When reinstalling, make sure you have removed all members on the Dell Equalogic
+    (credentials are in /etc/cinder/cinder.conf on the compute node), otherwise the
+    space will be blocked!!!
+
+Updates
+=======
+Our openstack cloud should have updates applied and be rebooted when the rest of our servers
+are updated and rebooted. This will cause an outage, so please make sure to schedule it.
+
+1. Stop the copr-backend process on copr-be.cloud.fedoraproject.org
+2. Kill all copr-builder instances.
+3. Kill all transient/scratch instances.
+4. Update all instances we control: copr, persistent, infrastructure, qa, etc.
+5. Shut down all instances
+6. Update and reboot fed-cloud09
+7. Update and reboot all compute nodes
+8. Start up all instances that were shut down in step 5.
+
+TODO: add commands for the above as we learn them.
+
+Troubleshooting
+===============
+
+* could not connect to a VM? - check your security group; the default SG does not
+  allow any connections.
+* packstack ended with an error? it is likely a race condition in puppet - BZ 1135529. Just run it again.
+
+* ERROR : append() takes exactly one argument (2 given)
+  ``vi /usr/lib/python2.7/site-packages/packstack/plugins/dashboard_500.py``
+  and add one more surrounding ()
+
+* Local ip for ovs agent must be set when tunneling is enabled
+  restart fed-cloud09 or:
+  ssh to fed-cloud09; ifdown eth1; ifup eth1; ifup br-ex
+
+* mongodb problem? 
follow + https://ask.openstack.org/en/question/54015/mongodbpp-error-when-installing-rdo-on-centos-7/?answer=54076#post-id-54076 + +* ``WARNING:keystoneclient.httpclient:Failed to retrieve management_url from token``:: + + keystone --os-token $ADMIN_TOKEN --os-endpoint \ + https://fedorainfracloud.org:35357/v2.0/ endpoint-create --region 'RegionOne' \ + --service 91358b81b1aa40d998b3a28d0cfc86e7 --region 'RegionOne' --publicurl \ + 'https://fedorainfracloud.org:5000/v2.0' --adminurl 'http://172.24.0.9:35357/v2.0' \ + --internalurl 'http://172.24.0.9:5000/v2.0' + +Fedora Classroom about our instance +=================================== +http://meetbot.fedoraproject.org/fedora-classroom/2015-05-11/fedora-classroom.2015-05-11-15.02.log.html diff --git a/docs/sysadmin-guide/sops/collectd.rst b/docs/sysadmin-guide/sops/collectd.rst new file mode 100644 index 0000000..2147325 --- /dev/null +++ b/docs/sysadmin-guide/sops/collectd.rst @@ -0,0 +1,72 @@ +.. title: Collectd SOP +.. slug: collectd +.. date: 2016-03-22 +.. taxonomy: Contributors/Infrastructure + +============ +Collectd SOP +============ + +Collectd ( https://collectd.org/ ) is a client/server setup that gathers system information +from clients and allows the server to display that information over various time periods. + +Our server instance runs on log01.phx2.fedoraproject.org and most other servers run +clients that connect to the server and provide it with data. + +======== + +1. Contact Information +2. Collectd info + +Contact Information +=================== + +Owner + Fedora Infrastructure Team +Contact + #fedora-admin +Location + https://admin.fedoraproject.org/collectd/ +Servers + log01 and all/most other servers as clients +Purpose + provide load and system information on servers. + +Configuration +============= + +The collectd roles configure collectd on the various machines: + +collectd/base - This is the base client role for most servers. +collectd/server - This is the server for use on log01. +collectd/other - There's various other subroles for different types of clients. + +Web interface +============= + +The server web interface is available at: + +https://admin.fedoraproject.org/collectd/ + +Restarting +========== + +collectd runs as a normal systemd or sysvinit service, so you can: +systemctl restart collectd or service collectd restart +to restart it. + +Removing old hosts +================== + +Collectd keeps information around until it's deleted, so you may need to +sometime go remove data from a host or hosts thats no longer used. +To do this: + +1. Login to log01 +2. cd /var/lib/collectd/rrd +3. sudo rm -rf oldhostname + +Bug reporting +============= + +Collectd is in Fedora/EPEL and we use their packages, so report bugs to bugzilla.redhat.com. diff --git a/docs/sysadmin-guide/sops/contenthosting.rst b/docs/sysadmin-guide/sops/contenthosting.rst new file mode 100644 index 0000000..76743f5 --- /dev/null +++ b/docs/sysadmin-guide/sops/contenthosting.rst @@ -0,0 +1,142 @@ +.. title: Content Hosting Infrastructure SOP +.. slug: infra-content-hosting +.. date: 2012-07-17 +.. taxonomy: Contributors/Infrastructure + +================================== +Content Hosting Infrastructure SOP +================================== + +Contact Information +=================== + +Owner + Fedora Infrastructure Team + +Contact + #fedora-admin, sysadmin-main, fedora-infrastructure-list + +Location + Phoenix + +Servers + secondary1, netapp[1-3], torrent1 + +Purpose + Policy regarding hosting, removal and pruning of content. 
+ +Scope + download.fedora.redhat.com, alt.fedoraproject.org, + archives.fedoraproject.org, secondary.fedoraproject.org, + torrent.fedoraproject.org + +Description +=========== + +Fedora hosts both Fedora content and some non-Fedora content. Our +resources are finite and as such we have to have some policy around when +to remove old content. This SOP describes the test to remove content. The +spirit of this SOP is to allow more people to host content and give it a +try, prove that it's useful. If it's not popular or useful, it will get +removed. Also out of date or expired content will be removed. + +What hosting options are available +---------------------------------- + +Aside from the hosting at http://fedorahosted.org/ we have a series of +mirrors we're allowing people to use. They are located at: + +* http://archive.fedoraproject.org/pub/archive/ - For archives of historical Fedora releases +* http://secondary.fedoraproject.org/pub/fedora-secondary/ - For secondary architectures +* http://alt.fedoraproject.org/pub/alt/ - For misc content / catchall +* http://torrent.fedoraproject.org/ - For torrent hosting +* http://spins.fedoraproject.org/ - For official Fedora Spins hosting, mirrored somewhat +* http://download.fedoraproject.com/pub/ - For official Fedora Releases, mirrored widely + +Who can host? What can be hosted? +--------------------------------- +Any official Fedora content can hosted and made available for mirroring. +Official content is determined by the Council by virtue of allowing people +to use the Fedora trademark. People representing these teams will be +allowed to host. + +Non Official Hosting +-------------------- + +People wanting to host unofficial bits may request approval for hosting. +Create a ticket at https://fedorahosted.org/fedora-infrastructure/ +explaining what and why Fedora should host it. Such will be reviewed by +the Fedora Infrastructure team. + +Requests for non-official hosting that may conflict with existing Fedora +policies will be escalated to the Council for approval. + +Licensing +--------- +Anything hosted with Fedora must come with a Free software license that is +approved by Fedora. See http://fedoraproject.org/wiki/Licensing for +more. + +Requesting Space +================ + +* Make sure you have a Fedora account - + https://admin.fedoraproject.org/accounts/ +* Ensure you have signed the Fedora Project Contributor Agreement (FPCA) +* Submit a hosting request - + https://fedorahosted.org/fedora-infrastructure/ + + * Include who you are, and any group you are working with (e.g. a SIG) + * Include Space requirements + * Include an estimate of the number of downloads expected (if you can). + * Include the nature of the bits you want to host. + +* Apply for group hosted-content - + https://admin.fedoraproject.org/accounts/group/view/hosted-content + +Using Space +=========== + +A dedicated namespace in the mirror will be assigned to you. It will be +your responsibility to upload content, remove old content, stay within +your quota, etc. If you have any questions or concerns about this please +let us know. Generally you will use rsync. For example:: + + rsync -av --progress ./my.iso secondary01.fedoraproject.org:/srv/pub/alt/mySpace/ + +.. important:: + None of our mirrored content is backed up. Ensure that you keep backups of + your content. + +Content Pruning / Purging / Removal +=================================== + +The following guidelines / tests will be used to determine whether or not +to remove content from the mirror. 
+
+Expired / Old Content
+---------------------
+
+If content meets any of the following criteria it may be removed:
+
+* Content that has reached its end of life (is no longer receiving updates).
+* Pre-release content that has been superseded.
+* EOL releases that have been moved to archives.
+* N-2 or greater releases. If more than 3 versions of a piece of content
+  are on the mirror, the oldest may be removed.
+
+Limited Use Content
+-------------------
+If content meets any of the following criteria it may be removed:
+
+* Content with exceedingly limited seeders or downloaders, with little
+  prospect of increasing those numbers, and which is older than 1 year.
+
+* Content such as videos or audio which are several years old.
+
+Catch All Removal
+------------------
+
+Fedora reserves the right to remove any content for any reason at any
+time. We'll do our best to host things but sometimes we'll need space or
+just need to remove stuff for legal or policy reasons.
diff --git a/docs/sysadmin-guide/sops/copr.rst b/docs/sysadmin-guide/sops/copr.rst
new file mode 100644
index 0000000..dc98340
--- /dev/null
+++ b/docs/sysadmin-guide/sops/copr.rst
@@ -0,0 +1,195 @@
+.. title: Copr
+.. slug: infra-copr
+.. date: 2015-01-13
+.. taxonomy: Contributors/Infrastructure
+====
+Copr
+====
+
+Copr is a build system for 3rd-party packages.
+
+Frontend:
+  - http://copr.fedorainfracloud.org/
+Backend:
+  - http://copr-be.cloud.fedoraproject.org/
+Package signer:
+  - copr-keygen.cloud.fedoraproject.org
+Dist-git
+  - copr-dist-git.fedorainfracloud.org
+
+Devel instances (NO NEED TO CARE ABOUT THEM, JUST THOSE ABOVE):
+  - http://copr-fe-dev.cloud.fedoraproject.org/
+  - http://copr-be-dev.cloud.fedoraproject.org/
+  - copr-keygen-dev.cloud.fedoraproject.org
+  - copr-dist-git-dev.fedorainfracloud.org
+
+Contact Information
+====================
+Owner
+    msuchy (mirek)
+Contact
+    #fedora-admin, #fedora-buildsys
+Location
+    Fedora Cloud
+Purpose
+    Build system
+
+TROUBLESHOOTING
+================
+
+Almost every problem with Copr is due to a problem in OpenStack; in such cases::
+
+    $ ssh root@copr-be.cloud.fedoraproject.org
+    # copr-backend-service stop
+    # source /home/copr/cloud/ec2rc.sh
+    # /home/copr/delete-forgotten-instances.pl
+    # # wait a minute and check
+    # euca-describe-instances
+    # # sometimes you have to run delete-forgotten-instances.pl again as openstack is sometimes stubborn.
+    # copr-backend-service start
+
+If this does not help you, then stop and kill all OpenStack VM builders and::
+
+    $ ssh root@fed-cloud02.cloud.fedoraproject.org
+    # source keystonerc
+    # for i in $(nova-manage floating list | grep 7ed4d | grep None | sort | awk '{ print $2}')
+      do nova-manage floating delete $i
+      nova-manage floating create $i
+      done
+
+or even (USUALLY NOT NEEDED)::
+
+    for i in /etc/init.d/openstack-*; do $i condrestart; done
+
+and then start the copr backend service again::
+
+    # copr-backend-service restart
+
+Sometimes OpenStack cannot handle spawning too many VMs at the same time,
+so it is safer to edit, on copr-be.cloud.fedoraproject.org::
+
+    vi /etc/copr/copr-be.conf
+
+and change::
+
+    group0_max_workers=12
+
+to "6". Start the copr-backend service and some time later increase it to the
+original value. Copr automatically detects the change and increases the
+number of workers. 
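+
+(As a convenience, the edit above can be done non-interactively; this one-liner
+is a sketch and assumes the setting occurs exactly once in the file)::
+
+    sed -i 's/^group0_max_workers=.*/group0_max_workers=6/' /etc/copr/copr-be.conf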
+
+Backend Troubleshooting
+-----------------------
+
+Information about the status of the Copr backend services::
+
+    # copr-backend-service status
+
+Utilization of workers::
+
+    # ps axf
+
+Worker processes change $0 to list which task they are working on and on which builder.
+
+To list which VM builders are tracked by the copr-vmm service::
+
+    # /usr/bin/copr_get_vm_info.py
+
+
+Deploy information
+==================
+
+Using playbooks and rbac::
+
+    $ sudo rbac-playbook groups/copr-backend.yml
+    $ sudo rbac-playbook groups/copr-frontend.yml
+    $ sudo rbac-playbook groups/copr-keygen.yml
+    $ sudo rbac-playbook groups/copr-dist-git.yml
+
+https://git.fedorahosted.org/cgit/copr.git/plain/copr-setup.txt
+
+On the backend the copr-backend service should run (it spawns several processes).
+The backend spawns VMs from the Fedora Cloud. You cannot log in to those machines directly.
+You have to::
+
+    $ ssh root@copr-be.cloud.fedoraproject.org
+    # su - copr
+    $ source /home/copr/cloud/ec2rc.sh
+    $ euca-describe-instances
+    # # instances of type m1.builder are those spawned by the backend; check the 18th column with the internal IP
+    # # log in there if you want
+    $ ssh root@172.16.3.3
+    # or terminate that instance (the ID is in the 2nd column)
+    # euca-terminate-instances i-000003b3
+    # # you can delete all instances in error state or simply forgotten by:
+    # /home/copr/delete-forgotten-instances.pl
+
+Order of start up
+-----------------
+
+When reprovisioning you should start the copr-keygen and copr-dist-git machines
+first (in any order). Then you can start copr-be. Well, you can start it sooner,
+but make sure that the copr-* services are stopped.
+
+The copr-fe machine is completely independent and can be started at any time. If the
+backend is stopped it will just queue jobs.
+
+Logs
+====
+
+For the backend:
+    /var/log/copr/backend.log /var/log/copr/workers/worker-*
+    /var/log/copr/spawner.log /var/log/copr/job_grab.log
+    /var/log/copr/actions.log /var/log/copr/vmm.log
+
+For the frontend:
+    httpd logs: /var/log/httpd/{error,access}_log
+
+For keygen:
+    /var/log/copr-keygen/main.log
+
+For dist-git:
+    /var/log/copr-dist-git/main.log
+
+    httpd logs:
+    /var/log/httpd/{error,access}_log
+
+Services
+========
+
+For the backend use the script
+    copr-backend-service {start|stop|restart}
+    - this handles all copr* services (job grabber, vmm, workers, ...)
+    logstash
+    redis
+    lighttpd
+
+For the frontend:
+    httpd
+    logstash
+    postgresql
+
+For keygen:
+    signd
+
+For dist-git:
+    httpd
+    copr-dist-git
+
+PPC64LE Builders
+================
+
+Builders for PPC64 are located at rh-power2.fit.vutbr.cz; anyone with access to
+the buildsys ssh key can log in as
+    msuchy@rh-power2.fit.vutbr.cz
+
+There are these commands::
+
+    $ ls bin/
+    destroy-all.sh reinit-vm26.sh reinit-vm28.sh virsh-destroy-vm26.sh virsh-destroy-vm28.sh virsh-start-vm26.sh virsh-start-vm28.sh
+    get-one-vm.sh  reinit-vm27.sh reinit-vm29.sh virsh-destroy-vm27.sh virsh-destroy-vm29.sh virsh-start-vm27.sh virsh-start-vm29.sh
+
+    destroy-all.sh        destroys all VMs and reinits them
+    reinit-vmXX.sh        copies the VM image from the template
+    virsh-destroy-vmXX.sh destroys the VM
+    virsh-start-vmXX.sh   starts the VM
+    get-one-vm.sh         starts one VM and returns its IP - this is used in Copr playbooks.
+
+In case of a big queue of PPC64 tasks, simply call bin/destroy-all.sh; it will
+destroy stuck VMs and the copr backend will spawn new ones.
diff --git a/docs/sysadmin-guide/sops/cyclades.rst b/docs/sysadmin-guide/sops/cyclades.rst
new file mode 100644
index 0000000..5c84381
--- /dev/null
+++ b/docs/sysadmin-guide/sops/cyclades.rst
@@ -0,0 +1,33 @@
+.. title: cyclades
+.. 
slug: infra-cyclades +.. date: 2011-12-12 +.. taxonomy: Contributors/Infrastructure +======== +Cyclades +======== + +cyclades notes + +1. login as root - default password is tslinux +2. change password for root and admin to our password from the + phx2-access.txt file in the private repo +3. port forward to the web browser for the cyclades + ``ssh -L 8080:rack47-serial.phx2.fedoraproject.org:80`` +4. connect to localhost:8080 in your web browser +5. login with root and the password you set above +6. click on 'security' +7. click on 'moderate' +8. logout, port forward port 443 as above: + ``ssh -L 8080:rack47-serial.phx2.fedoraproject.org:443`` +9. click on the 'wizard' button at lower left +10. proceed through the wizard + Info needed: + - serial ports are set to 115200 8N1 by default + - do not setup buffering + - give it the ip of our syslog server + +11. click 'apply changes' +12. hope +13. log back in +14. name/setup the port aliases + diff --git a/docs/sysadmin-guide/sops/darkserver.rst b/docs/sysadmin-guide/sops/darkserver.rst new file mode 100644 index 0000000..22ca7cf --- /dev/null +++ b/docs/sysadmin-guide/sops/darkserver.rst @@ -0,0 +1,109 @@ +.. title: Darkserver SOP +.. slug: infra-darkserver +.. date: 2012-03-22 +.. taxonomy: Contributors/Infrastructure + +============== +Darkserver SOP +============== + +To setup a http://darkserver.fedoraproject.org based on Darkserver project +to provide GNU_BUILD_ID information for packages. A devel instance can be +seen at http://darkserver01.dev.fedoraproject.org and staging instance is +http://darkserver01.stg.phx2.fedoraproject.org/. + +This page describes how to set up the server. + +Contents +======== + +1. Contact Information +2. Installing the server +3. Setting up the database +4. SELinux Configuration +5. Koji plugin setup +6. Debugging + + +Contact Information +=================== + +Owner: + Fedora Infrastructure Team +Contact: + #fedora-admin +Persons: + kushal mether +Sponsor: + nirik +Location: + phx2 +Servers: + darkserver01 , darkserver01.stg, darkserver01.dev +Purpose: + To host Darkserver + + +Installing the Server +===================== +:: + + root@localhost# yum install darkserver + + +Setting up the database +======================= +We are using MySQL as database. We will need two users, one for +koji-plugin and one for darkserver.:: + + root@localhost# mysql -u root + mysql> CREATE DATABASE darkserver; + mysql> GRANT INSERT ON darkserver.* TO kojiplugin@'koji-hub-ip' IDENTIFIED BY 'XXX'; + mysql> GRANT SELECT ON darkserver.* TO dark@'darkserver-ip' IDENTIFIED BY 'XXX'; + +Setup this db configuration in the conf file under ``/etc/darkserver/darkserverweb.conf``:: + + [darkserverweb] + host=db host name + user=dark + password=XXX + database=darkserver + +Now setup the db tables if it is a new install. + +(For this you may need to ``'GRANT * ON darkserver.*'`` to the web user, and +then ``'REVOKE * ON darkserver.*'`` after running.) 
+
+::
+
+    root@localhost# python /usr/lib/python2.6/site-packages/darkserverweb/manage.py syncdb
+
+SELinux Configuration
+=====================
+
+Do the following to allow the webserver to connect to the database::
+
+    root@localhost# setsebool -P httpd_can_network_connect_db 1
+
+Setting up the Koji plugin
+==========================
+
+Install the package::
+
+    root@localhost# yum install darkserver-kojiplugin
+
+Then fill in the configuration file under ``/etc/koji-hub/plugins/darkserver.conf``::
+
+    [darkserver]
+    host=db host name
+    user=kojiplugin
+    password=XXX
+    database=darkserver
+    port=3306
+
+Then enable the plugin in the koji hub configuration.
+
+Debugging
+=========
+Set DEBUG to True in the ``/etc/darkserver/settings.py`` file and restart Apache.
+
diff --git a/docs/sysadmin-guide/sops/database.rst b/docs/sysadmin-guide/sops/database.rst
new file mode 100644
index 0000000..c2b2574
--- /dev/null
+++ b/docs/sysadmin-guide/sops/database.rst
@@ -0,0 +1,237 @@
+.. title: Database Infrastructure SOP
+.. slug: infra-database
+.. date: 2016-09-24
+.. taxonomy: Contributors/Infrastructure
+
+===========================
+Database Infrastructure SOP
+===========================
+
+Our database servers provide database storage for many of our apps.
+
+Contents
+
+1. Contact Information
+2. Description
+3. Creating a New Postgresql Database
+4. Troubleshooting and Resolution
+
+   1. Connection issues
+   2. Some useful queries
+
+      1. What queries are running
+      2. Seeing how "dirty" a table is
+      3. XID Wraparound
+
+   3. Restart Procedure
+
+      1. Koji
+      2. Bodhi
+
+5. Note about TurboGears and MySQL
+6. Restoring from backups or specific dbs
+
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+
+Contact
+    #fedora-admin, sysadmin-main, sysadmin-dba group
+
+Location
+    Phoenix
+
+Servers
+    db01, db03, db-fas01, db-datanommer02, db-koji01, db-s390-koji01, db-arm-koji01, db-ppc-koji01, db-qa01, db-qastg01
+
+Purpose
+    Provides database connections for many of our apps.
+
+Description
+===========
+
+db01, db03 and db-fas01 are our primary servers.
+db01 and db-fas01 run PostgreSQL.
+db03 contains mariadb.
+db-koji01, db-s390-koji01, db-arm-koji01, db-ppc-koji01 contain secondary kojis.
+db-qa01 and db-qastg01 contain taskotron and resultsdb.
+db-datanommer02 contains all the messages stored by datanommer in a postgresql database.
+
+
+Creating a New Postgresql Database
+==================================
+
+Creating a new database on our postgresql server isn't hard, but there are
+several steps that should be taken to make the database server as secure
+as possible.
+
+We want to separate the database permissions so that we don't have one
+user/password combination that can do anything it likes to the database on
+every host (the webapp user can usually do a lot of things even without those
+extra permissions but every little bit helps).
+
+Say we have an app called "raffle". We'd have three users:
+
+* raffleadmin: able to make any changes they want to this particular
+  database. It should not be used in day-to-day operation but only for things
+  like updating the database schema when an update occurs.
+  We could very likely disable this account in the db whenever we are not
+  using it.
+* raffleapp: the database user that the web application uses. This will
+  likely need to be able to insert and select from all tables. It will
+  probably need to update most tables as well. There may be some tables
+  that it does *not* need delete on. 
It should almost certainly not
+  need schema-modifying permissions. (With postgres, it likely also
+  needs permission to insert/select on sequences as well.)
+* rafflereadonly: only able to read data from tables, not able to modify
+  anything. Sadly, we aren't using this often but it can be useful for
+  scripts that need to talk directly to the database without modifying it.
+
+::
+
+    db2 $ sudo -u postgres createuser -P -E NEWDBadmin
+    Password:
+    db2 $ sudo -u postgres createuser -P -E NEWDBapp
+    Password:
+    db2 $ sudo -u postgres createuser -P -E NEWDBreadonly
+    Password:
+    db2 $ sudo -u postgres createdb -E utf8 NEWDB -O NEWDBadmin
+    db2 $ sudo -u postgres psql NEWDB
+    NEWDB=# revoke all on database NEWDB from public;
+    NEWDB=# revoke all on schema public from public;
+    NEWDB=# grant all on schema public to NEWDBadmin;
+    NEWDB=# [grant permissions to NEWDBapp as appropriate for your app]
+    NEWDB=# [grant permissions to NEWDBreadonly as appropriate for a user that
+            is only trusted enough to read information]
+    NEWDB=# grant connect on database NEWDB to nagiosuser;
+
+
+If your application needs the NEWDBapp user and password to connect to
+the database, you probably want to add these to ansible as well. Put the
+password in the private repo on batcave01. Then use a template file to
+incorporate it into the config file. See fas.pp for an example.
+
+Troubleshooting and Resolution
+==============================
+
+Connection issues
+-----------------
+
+There are no known outstanding issues with the database itself. Remember
+that every time either database is restarted, services will have to be
+restarted (see below).
+
+Some useful queries
+-------------------
+
+What queries are running
+````````````````````````
+
+This can help you find out what queries are currently running on the
+server::
+
+    select datname, pid, query_start, backend_start, query from
+    pg_stat_activity where state<>'idle' order by query_start;
+
+This can help you find how many connections to the db server are for each
+individual database::
+
+    select datname, count(datname) from pg_stat_activity group by datname
+    order by count desc;
+
+Seeing how "dirty" a table is
+`````````````````````````````
+
+We've added a function from postgres's contrib directory to tell how dirty
+a table is. By dirty we mean: how many tuples are active, how many have
+been marked as having old data (and are therefore "dead"), and how much free
+space is allocated to the table but not used::
+
+    \c fas2
+    \x
+    select * from pgstattuple('visit_identity');
+    table_len          | 425984
+    tuple_count        | 580
+    tuple_len          | 46977
+    tuple_percent      | 11.03
+    dead_tuple_count   | 68
+    dead_tuple_len     | 5508
+    dead_tuple_percent | 1.29
+    free_space         | 352420
+    free_percent       | 82.73
+    \x
+
+Vacuum should clear out dead_tuples. Only a vacuum full, which will lock
+the table and therefore should be avoided, will clear out free space.
+
+XID Wraparound
+``````````````
+Find out how close we are to having to perform a vacuum of a database (as
+opposed to individual tables of the db). We should schedule a vacuum when
+about 50% of the transaction ids have been used (approximately 530,000,000
+xids)::
+
+    select datname, age(datfrozenxid), pow(2, 31) - age(datfrozenxid) as xids_remaining
+    from pg_database order by xids_remaining;
+
+See the PostgreSQL documentation for more information on transaction ID
+wraparound.
+
+Restart Procedure
+=================
+
+If the database server needs to be restarted it should come back on its
+own. 
+
+Restart Procedure
+=================
+
+If the database server needs to be restarted it should come back on its
+own. Otherwise each service on it can be restarted::
+
+   service mysqld restart
+   service postgresql restart
+
+Koji
+----
+
+Any time postgresql is restarted, koji needs to be restarted. Please also
+see the Restarting Koji SOP.
+
+Bodhi
+-----
+
+Any time postgresql is restarted, Bodhi will need to be restarted as well.
+No SOP currently exists for this.
+
+TurboGears and MySQL
+====================
+
+.. note:: about TurboGears and MySQL
+
+    There's a known bug in TurboGears that causes MySQL clients not to
+    automatically reconnect when the connection is lost. Typically a
+    restart of the TurboGears application will correct this issue.
+
+Restoring from backups or specific dbs
+======================================
+
+Our backups store the latest copy in /backups/ on each db server.
+These backups are created automatically by the db-backup script run from cron.
+Look in /usr/local/bin for the backup script.
+
+To restore partially or completely you need to:
+
+1. setup postgres on a system
+
+2. start postgres/run initdb
+
+   - if this new system running postgres has already run ansible then it will
+     have wrong config files in /var/lib/pgsql/data - clear them out before
+     you start postgres so initdb can work.
+
+3. grab the backups you need from /backups - also grab global.sql and
+   edit global.sql to only create/alter the dbs you care about
+
+4. as postgres run: ``psql -U postgres -f global.sql``
+
+5. when this completes you can restore each db with (as postgres user)::
+
+     createdb $dbname
+     pg_restore -d $dbname dbname_backup_file.db
+
+6. restart postgres and check your data.
diff --git a/docs/sysadmin-guide/sops/datanommer.rst b/docs/sysadmin-guide/sops/datanommer.rst
new file mode 100644
index 0000000..ab4c9d0
--- /dev/null
+++ b/docs/sysadmin-guide/sops/datanommer.rst
@@ -0,0 +1,117 @@
+.. title: Datanommer SOP
+.. slug: infra-datanommer
+.. date: 2013-02-08
+.. taxonomy: Contributors/Infrastructure
+
+datanommer SOP
+==============
+
+Consume fedmsg bus activity and stuff it in a postgresql db.
+
+Contact Information
+-------------------
+
+Owner
+  Messaging SIG, Fedora Infrastructure Team
+Contact
+  #fedora-apps, #fedora-fedmsg, #fedora-admin, #fedora-noc
+Servers
+  busgateway01
+Purpose
+  Save fedmsg bus activity
+
+Description
+-----------
+
+datanommer is a set of three modules:
+
+python-datanommer-models
+    Schema definition and API for storing new items
+    and querying existing items.
+
+python-datanommer-consumer
+    A plugin for the fedmsg-hub that actively
+    listens to the bus and stores events.
+
+datanommer-commands
+    A set of CLI tools for querying the DB.
+
+datanommer will one day serve as a backend for future web services like
+datagrepper and dataviewer.
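+
+If the CLI tools below don't cover a use case, the models can also be
+queried directly from python. This is only a rough sketch: the connection
+URI here is made up, and the exact ``grep()`` signature may differ between
+python-datanommer-models versions::
+
+  import datanommer.models as m
+
+  # Point the models at the database (URI is hypothetical).
+  m.init('postgresql://datanommer:secret@db-datanommer02/datanommer')
+
+  # Fetch the most recent messages in the bodhi category.
+  total, pages, messages = m.Message.grep(categories=['bodhi'])
+  for msg in messages:
+      print msg.topic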
+ +Source: https://github.com/fedora-infra/datanommer/ +Plan: https://fedoraproject.org/wiki/User:Ianweller/statistics_plus_plus + +CLI tools +--------- + +Dump the db into a file as json:: + + $ datanommer-dump > datanommer-dump.json + +When was the last bodhi message?:: + + $ # It was 678 seconds ago + $ datanommer-latest --category bodhi --timesince + [678] + +When was the last bodhi message in more readable terms?:: + + $ # It was 12 minutes and 43 seconds ago + $ datanommer-latest --category bodhi --timesince --human + [0:12:43.087949] + +What was that last bodhi message?:: + + $ datanommer-latest --category bodhi + [{"bodhi": { + "topic": "org.fedoraproject.stg.bodhi.update.comment", + "msg": { + "comment": { + "group": null, + "author": "ralph", + "text": "Testing for latest datanommer.", + "karma": 0, + "anonymous": false, + "timestamp": 1360349639.0, + "update_title": "xmonad-0.10-10.fc17" + }, + "agent": "ralph" + }, + }}] + +Show me stats on datanommer messages by topic:: + + $ datanommer-stats --topic + org.fedoraproject.stg.fas.group.member.remove has 10 entries + org.fedoraproject.stg.logger.log has 76 entries + org.fedoraproject.stg.bodhi.update.comment has 5 entries + org.fedoraproject.stg.busmon.colorized-messages has 10 entries + org.fedoraproject.stg.fas.user.update has 10 entries + org.fedoraproject.stg.wiki.article.edit has 106 entries + org.fedoraproject.stg.fas.user.create has 3 entries + org.fedoraproject.stg.bodhitest.testing has 4 entries + org.fedoraproject.stg.fedoratagger.tag.create has 9 entries + org.fedoraproject.stg.fedoratagger.user.rank.update has 5 entries + org.fedoraproject.stg.wiki.upload.complete has 1 entries + org.fedoraproject.stg.fas.group.member.sponsor has 6 entries + org.fedoraproject.stg.fedoratagger.tag.update has 1 entries + org.fedoraproject.stg.fas.group.member.apply has 17 entries + org.fedoraproject.stg.__main__.testing has 1 entries + +Upgrading the DB Schema +----------------------- + +datanommer uses "python-alembic" to manage its schema. When developers want +to add new columns or features, these should/must be tracked in alembic and +shipped with the RPM. + +In order to run upgrades on our stg/prod dbs: + +1) ssh to busgateway01{.stg} +2) ``cd /usr/share/datanommer.models/`` +3) Run:: + + $ alembic upgrade +1 + + Over and over again until the db is fully upgraded. diff --git a/docs/sysadmin-guide/sops/denyhosts.rst b/docs/sysadmin-guide/sops/denyhosts.rst new file mode 100644 index 0000000..8e3e362 --- /dev/null +++ b/docs/sysadmin-guide/sops/denyhosts.rst @@ -0,0 +1,62 @@ +.. title: Denyhosts Infrastructure SOP +.. slug: infra-denyhosts +.. date: 2011-10-03 +.. taxonomy: Contributors/Infrastructure + +============================ +Denyhosts Infrastructure SOP +============================ + +Denyhosts provides a protection against brute force attacks. + +Contents +======== + +1. Contact Information +2. Description +3. Troubleshooting and Resolution + + 1. Connection issues + +Contact Information +==================== + +Owner + Fedora Infrastructure Team + +Contact + #fedora-admin, sysadmin-main group + +Location + Anywhere + +Servers + All + +Purpose + Denyhosts provides a protection against brute force attacks. + +Description +=========== + +All of our servers now implement denyhosts to protect against brute force +attacks. Very few boxes should be in the 'allowed' list. Especially +internally. 
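+
+To check whether a particular host is currently allowed or denied, you can
+inspect the DenyHosts data files directly (paths assume the default
+WORK_DIR)::
+
+  $ grep 10.0.0.1 /etc/hosts.deny
+  $ grep 10.0.0.1 /var/lib/denyhosts/allowed-hosts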
+ +Troubleshooting and Resolution +============================== + +Connection issues +----------------- +The most common issue will be legitimate logins failing. First, try to +figure out why a host ended up on the deny list (tcptraceroute, failed +login attempts, etc are all good candidates). Next do the following +directions. The below example is for a host (10.0.0.1) being banned. Login +to the box from a different host and as root do the following.:: + + cd /var/lib/denyhosts + sed -si '/10.0.0.1/d' * /etc/hosts.deny + /etc/init.d/denyhosts restart + +That should correct the problem. + diff --git a/docs/sysadmin-guide/sops/departing-admin.rst b/docs/sysadmin-guide/sops/departing-admin.rst new file mode 100644 index 0000000..ab61135 --- /dev/null +++ b/docs/sysadmin-guide/sops/departing-admin.rst @@ -0,0 +1,64 @@ +.. title: Departing Admin SOP +.. slug: infra-departing-admin +.. date: 2013-07-15 +.. taxonomy: Contributors/Infrastructure + +=================== +Departing admin SOP +=================== + +From time to time admins depart the project, this SOP checks any access they may no longer need. + +Contact Information +=================== + +Owner + Fedora Infrastructure Team +Contact + #fedora-admin, sysadmin-main +Location + Everywhere +Servers + all + +Description +=========== + +From time to time people with admin access to various parts of the project may +leave the project or no longer wish to contribute. This SOP attempts to list +the process for removing access they no longer need. + +0. First, make sure that this SOP is needed. Verify the person has left the project + and what areas they might wish to still contibute to. + +1. Gather info: fas username, email address, knowledge of passwords. + +2. Check the following areas with the following commands: + + email address in ansible + - Check: ``git grep email@address`` + - Remove: ``git commit`` + + koji admin + - Check: ``koji list-permissions --user=username`` + - Remove: ``koji revoke-permission permissionname username`` + + wiki pages + - Check: look for https://fedoraproject.org/wiki/User:Username + - Remove: delete page, or modify with info they are no longer contributing. + + packages + - Check: Download https://admin.fedoraproject.org/pkgdb/lists/bugzilla?tg_format=plain and grep + - Remove: remove from cc, orphan packages or reassign. + + fas account + - Check: check username in fas + - Remove: set user inactive + + .. note:: If there are scripts or files needed, save homedir of user. + + passwords + - Check: if departing admin knew sensitive passwords. + - Remove: Change passwords. + + .. note:: root pw, management interfaces, etc diff --git a/docs/sysadmin-guide/sops/dns.rst b/docs/sysadmin-guide/sops/dns.rst new file mode 100644 index 0000000..e8fc054 --- /dev/null +++ b/docs/sysadmin-guide/sops/dns.rst @@ -0,0 +1,328 @@ +.. title: DNS Infrastructure SOP +.. slug: infra-dns +.. date: 2015-06-03 +.. taxonomy: Contributors/Infrastructure + +================================ +DNS repository for fedoraproject +================================ + +We've set this up so we can easily (and quickly) edit and deploy dns changes +with a record of who changed what and why. This system also lets us edit out +proxies from rotation for our many and varied websites quickly and with a +minimum of opportunity for error. Finally, it checks to make sure that all +of the zone changes will actually work before they are allowed. 
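+
+If you want to sanity-check a single zone file by hand before committing,
+the stock bind utilities work fine; for example (file path illustrative)::
+
+  $ named-checkzone fedoraproject.org master/fedoraproject.org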
+ +DNS Infrastructure SOP +====================== + +We have 5 DNS servers: + +ns-sb01.fedoraproject.org + hosted at Serverbeach +ns02.fedoraproject.org + hosted at ibiblio (ipv6 enabled) +ns03.phx2.fedoraproject.org + in phx2, internal to phx2. +ns04.fedoraproject.org + in phx2, external. +ns05.fedoraproject.org + hosted at internetx (ipv6 enabled) + +Contents +======== + +1. Contact Information +2. Troubleshooting, Resolution and Maintenance + + 1. DNS update + 2. Adding a new zone + +3. GeoDNS + + 1. Non geodns fedoraproject.org IPs + 2. Adding and removing countries + 3. IP Country Mapping + +4. resolv.conf + + 1. Phoenix + 2. Non-Phoenix + +Contact Information +=================== + +Owner: + Fedora Infrastructure Team +Contact: + #fedora-admin, sysadmin-main, sysadmin-dns +Location: + ServerBeach and ibiblio and internetx and phx2. +Servers: + ns01, ns02, ns03.phx2, ns04, ns05 +Purpose: + Provides DNS to our users + +Troubleshooting, Resolution and Maintenance + +Adding a new Host +================= + +Adding a new host requires to add it to DNS and to ansible, see new-hosts.rst for +the details. + +Editing the domain(s) +===================== + +We have three domains which needs to be able to change on demand for proxy +rotation/removal: + fedoraproject.org. + getfedora.org. + cloud.fedoraproject.org. + +The other domains are edited only when we add/subtract a host or move it to +a new ip. Not much else. + +If you need to edit a domain that is NOT In the above list: + +- change to the 'master' subdir, edit the domain as usual + (remember to update the serial), save it. + +If you need to edit one of the domains in the above list: +(replace fedoraproject.org with the domain from above) + +- if you need to add/change a host in fedoraproject.org that is not '@' or + 'wildcard' then: + + - edit fedoraproject.org.template + - make your changes + - do not edit the serial or anything surrounded by {{ }} unless you + REALLY know what you are doing. + +- if you need to only add/remove a proxy during an outage or due to + networking issue then run: + + - ``./zone-template fedoraproject.org.cfg disable ip [ip] [ip]`` + to disable the ip of the proxy you want removed. + - ``./zone-template fedoraproject.org.cfg enable ip [ip] [ip]`` + reverses the disable + - ``./zone-template fedoraproject.org.cfg reset`` + will reset to all ips enabled. + +- if you want to add an all new proxy as '@' or 'wildcard' for + fedoraproject.org: + + - edit fedoraproject.org.cfg + - add the ip to the correct section of the ipv4 or ipv6 in the config. + - save the file + - check the file for validity by running: ``python fedoraproject.org.cfg`` + looking for errors or tracebacks. + +In all cases then run: + +- ``./do-domains`` + +- if that completes successfully then run:: + + git add . + git commit -a -m 'description of your change here' + git push + +and then run this on all of the nameservers (as root):: + + /usr/local/bin/update-dns + + +To run this via ansible from batcave do:: + + sudo -i ansible ns\* -a "/usr/local/bin/update-dns" + + +this will pull from the git tree, update all of the zones and reload the +name server. + + + +DNS update +========== + +DNS config files are ansible managed on batcave01. + +From batcave01:: + + git clone /git/ansible + cd ansible/roles/dns/files/ + ...make changes needed... + git commit -m "What you did" + git push + +It should update within a half hour. 
You can test the new configs with dig:: + + dig @ns01.fedoraproject.org fedoraproject.org + +Adding a new zone +================= + +First name the zone and generate new set of keys for it. Run this on ns01. +Note it could take SEVERAL minutes to run:: + + /usr/sbin/dnssec-keygen -a RSASHA1 -b 1024 -n ZONE c.fedoraproject.org + /usr/sbin/dnssec-keygen -a RSASHA1 -b 2048 -n ZONE -f KSK c.fedoraproject.org + +Then copy the created .key and .private files to the private git repo (You +need to be sysadmin-main to do this). The directory is ``private/private/dnssec``. + +- add the zone in zones.conf in ``ansible/roles/dns/files/zones.conf`` +- save and commit - but do not push +- Add zone file to the master subdir in this repo +- git add and commit the file +- check the zone by running check-domains +- if you intend to have this be a dnssec signed zone then you must + - create a new key:: + + /usr/sbin/dnssec-keygen -a RSASHA1 -b 1024 -n ZONE $domain.org + /usr/sbin/dnssec-keygen -a RSASHA1 -b 2048 -n ZONE -f KSK $domain.org + + - put the files this generates into /srv/privatekeys/dnssec on batcave01 + - edit the do-domains file in this dir and your domain to the + signed_domains entry at the top + - edit the zone you just created and add the contents of the .key files + to the bottom of the zone + +If this is a subdomain of fedoraproject.org: + +- run dnssec-dsfromkey on each of the .key files generated +- paste that output into the bottom of fedoraproject.org.template +- commit everything to the dns tree +- push your changes +- push your changes to the ansible repo +- test + +If you add a new child zone, such as c.fedoraproject.org or +vpn.fedoraproject.org you will also need to add the contents of +dsset-childzone.fedoraproject.org (for example), to the main +fedoraproject.org zonefile, so that DNSSEC has a valid trust path to that +zone. + +You also must set the NS delegation entries near the top of fedoraproject.org zone file +these are necessary to keep dnssec-signzone from whining with this error msg:: + + dnssec-signzone: fatal: 'xxxxx.example.com': found DS RRset without NS RRset + +Look for the: "vpn IN NS" records at the top of fedoraproject.org and copy them for the new child zone. + + +fedorahosted.org template +========================= +we want to create a separate entry for each fedorahosted project - but we +do not want to have to maintain it later. So we have a simple map that +let's us put the ones which are different in there and know where they +should go. The map's format is:: + + projectname short_hostname-in-fedorahosted where it lives + +examples:: + + someproject git + someproject svn + someproject bzr + someproject hosted-super-crazy + +this will create cnames for each of them. + +running ``./do-domains`` will take care of all that and update the serial +automatically. + + +GeoDNS +====== + +As part of our Content Distribution Network we use geodns for certain +zones. At the moment just ``fedoraproject.org`` and ``*.fedoraproject.org`` zones. +We've got proxy servers all over the US and in Europe. We are +now sending users to proxy servers that are near them. The current list of +available 'zone areas' are: + +* DEFAULT +* EU +* NA + +DEFAULT contains all the zones. So someone who does not seem to be in or +near the EU, or NA would get directed to any random set. (South Africa +for example doesn't get directed to any particular server). + +.. important:: + Don't forget to increase the serial number in the fedoraproject.org zone + file. 
Even if you're making a change to one of the geodns IPs. There is + only one serial number for all setups and that serial number is in the + fedoraproject.org zone. + +.. note:: Non geodns fedoraproject.org IPs + If you're adding as server that is just in one location, and isn't going + to get geodns balanced. Just add that host to the fedoraproject.org zone. + +Adding and removing countries +----------------------------- + +Our setup actually requires us to specify which countries go to which +servers. To do this, simply edit the named.conf file in ansible. Below is +an example of what counts as "NA" (North America).:: + + view "NA" { + match-clients { US; CA; MX; }; + recursion no; + zone "fedoraproject.org" { + type master; + file "master/NA/fedoraproject.org.signed"; + }; + include "etc/zones.conf"; + }; + +IP Country Mapping +------------------ + +The IP -> Location mapping is done via a config file that exists on the +dns servers themselves (it's not ansible controlled). The file, located at +``/var/named/chroot/etc/GeoIP.acl`` is generated by the ``GeoIP.sh`` script +(that script is in ansible). + +.. warning:: + This is known to be a less efficient means of doing geodns than the + patched version from kernel.org. We're using this version at the moment + because it's in Fedora and works. The level of DNS traffic we see is + generally low enough that the inefficiencies aren't that noticed. For + example, average load on the servers before this geodns was .2, now it's + around .4 + +resolv.conf +=========== + +In order to make the network more transparent to the admins, we do a lot of +search based relative names. Below is a list of what a resolv.conf should +look like. + +.. important:: + Any machine that is not on our vpn or has not yet joined the vpn should + _NOT_ have the vpn.fedoraproject.org search until after it has been added + to the vpn (if it ever does) + +Phoenix + :: + + search phx2.fedoraproject.org vpn.fedoraproject.org fedoraproject.org + +Phoenix in the QA network: + :: + + search qa.fedoraproject.org vpn.fedoraproject.org phx2.fedoraproject.org fedoraproject.org + +Non-Phoenix + :: + + search vpn.fedoraproject.org fedoraproject.org + +The idea here is that we can, when need be, setup local domains to contact +instead of having to go over the VPN directly but still have sane configs. +For example if we tell the proxy server to hit "app1" and that box is in +PHX, it will go directly to app1, if its not, it will go over the vpn to +app1. diff --git a/docs/sysadmin-guide/sops/fas-notes.rst b/docs/sysadmin-guide/sops/fas-notes.rst new file mode 100644 index 0000000..a50860c --- /dev/null +++ b/docs/sysadmin-guide/sops/fas-notes.rst @@ -0,0 +1,130 @@ +.. title: Fedora Account System SOP +.. slug: infra-fas +.. date: 2013-04-04 +.. taxonomy: Contributors/Infrastructure + +===================== +Fedora Account System +===================== + +Notes about FAS and how to do things in it: + +- where are certs for fas accounts for koji, etc? + on fas01 /var/lib/fedora-ca - makefile targets allow you to do + things with them. + +look in index.txt for certs. One's marked with an 'R' in the left-most +column are 'REVOKED' + +to revoke a cert:: + + cd /var/lib/fedora-ca + +find the cert number in index.txt - the number is the 3rd column in the +file - you can match it to the user by searching for their username. You +want the highest number cert for their account. 
+ +once you have the number you would run (as root or fas):: + + make revoke cert=newcerts/$that_number.pem + +How to gather information about a user +====================================== + +You'll want to have direct access to query the database for this. The common +way is to have someone in sysadmin-db ssh to the postgres db hosting FAS +(currently db01). Then access it via ident auth on the box:: + + sudo -u postgres psql fas2 + + +There are several tables that will have information about a user. Some of it +is redundant but it's good to check all the sources there shouldn't be +inconsistencies:: + + select * from people where username = 'USERNAME'; + +Of interest here are: + +:id: for later queries +:password_changed: tells when the password was last changed +:last_seen: last login to fas (including through jsonfas from other TG1/2 + apps. Maybe wiki and insight as well. Not fedorahosted trac, shell + login, etc) +:status_change: last time that the user's status was updated via the website. + Usually triggered when the user was marked inactive for a mass password + change and then they reset their password. + +Next table is the log table:: + + select * from log where author_id = ID_FROM_PREV_QUERY or description ~ '.*USERNAME.*'; + +The FAS writes certain events to the log table. This will get those events. +We use both the author_id field (who made the change) and the username in a +description regex search because a few changes are made to users by admins. +Fields of interest are pretty self explanatory here: + +:changetime: when the log was made +:description: description of the event that's being logged + +.. note:: FAS does not log every event that happens to a user. Only + "important" ones. FAS also cannot record direct changes to the database + here (for instance, when we mark accounts inactive administratively via + the db). + +Lastly, there's the groups and person_roles table. When a user joins a group, +the person_roles table is updated to reflect the user's status in the group, +when they applied, and when they were approved:: + + select groups.name, person_roles.* from person_roles, groups where person_id = ID_FROM_INITIAL_QUERY and groups.id = person_roles.group_id; + +This will give you the following fields to pay attention to: + +:name: Name of the group +:role_status: If this is unapproved, it just means the user applied for it. + If it is approved, it means they are actually in the group. +:creation: When the user applied to the group +:approval: When the user was approved to be in the group +:role_type: What role the person has or wants to have in the group +:sponsor_id: If you suspect something is suspicious with one of the roles, you + may want to ask the sponsor if they remember sponsoring this person + +Account Deletion and renaming +============================= + +.. note:: see also accountdeletion.rst + For information on how to disable, rename, and remove accounts. + +Pseudo Users +============ + +.. note:: see also nonhumanaccounts.rst + For information on creating pseudo user accounts for use in pkgdb/bugzilla + +fas staging +=========== + +we have a staging fas db setup on db-fas01.stg.phx2.fedoraproject.org - it accessed +by fas01.stg.phx2.fedoraproject.org + +This system is not autopopulated by production fas - it must be done manually. 
+To do this you must: + +- dump the fas2 db on db-fas01.phx2.fedoraproject.org:: + + sudo -u postgres pg_dump -C fas2 > fas2.dump + scp fas2.dump db-fas01.stg.phx2.fedoraproject.org:/tmp + +- then on fas01.stg.phx2.fedoraproject.org:: + + /etc/init.d/httpd stop + +- then on db02.stg.phx2.fedoraproject.org:: + + echo "drop database fas2\;" | sudo -u postgres psql ; cat fas2.dump | sudo -u postgres psql + +- then on fas01.stg.phx2.fedoraproject.org:: + + /etc/init.d/httpd start + +that should do it. diff --git a/docs/sysadmin-guide/sops/fas-openid.rst b/docs/sysadmin-guide/sops/fas-openid.rst new file mode 100644 index 0000000..419ddd5 --- /dev/null +++ b/docs/sysadmin-guide/sops/fas-openid.rst @@ -0,0 +1,52 @@ +.. title: FAS-OpenID +.. slug: infra-fas-openid +.. date: 2013-12-14 +.. taxonomy: Contributors/Infrastructure + +========== +FAS-OpenID +========== + + +FAS-OpenID is the OpenID server of Fedora infrastructure. + +Live instance is at https://id.fedoraproject.org/ +Staging instance is at https://id.dev.fedoraproject.org/ + +Contact Information +=================== + +Owner + Patrick Uiterwijk (puiterwijk) +Contact + #fedora-admin, #fedora-apps, #fedora-noc +Location + openid0{1,2}.phx2.fedoraproject.org + openid01.stg.fedoraproject.org +Purpose + Authentication & Authorization + +Trusted roots +============== + +FAS-OpenID has a set of "trusted roots", which contains websites which are +always trusted, and thus FAS-OpenID will not show the Approve/Reject form to +the user when they login to any such site. + +As a policy, we will only add websites to this list which Fedora +Infrastructure controls. If anyone ever ask to add a website to this list, +just answer with this default message:: + + We only add websites we (Fedora Infrastructure) maintain to this list. + + This feature was put in because it wouldn't make sense to ask for permission + to send data to the same set of servers that it already came from. + + Also, if we were to add external websites, we would need to judge their + privacy policy etc. + + Also, people might start complaining that we added site X but not their site, + maybe causing us "political" issues later down the road. + + As a result, we do NOT add external websites. + diff --git a/docs/sysadmin-guide/sops/fedmsg-certs.rst b/docs/sysadmin-guide/sops/fedmsg-certs.rst new file mode 100644 index 0000000..fe99a49 --- /dev/null +++ b/docs/sysadmin-guide/sops/fedmsg-certs.rst @@ -0,0 +1,172 @@ +.. title: fedmsg Certificates SOP +.. slug: infra-fedmsg-certs +.. date: 2013-04-08 +.. taxonomy: Contributors/Infrastructure + +=================================================== +fedmsg (Fedora Messaging) Certs, Keys, and CA - SOP +=================================================== + +X509 certs, private RSA keys, Certificate Authority, and Certificate +Revocation List. + +Contact Information +------------------- + +Owner + Messaging SIG, Fedora Infrastructure Team +Contact + #fedora-admin, #fedora-apps, #fedora-noc +Servers + - app0[1-7] + - packages0[1-2] + - fas0[1-3] + - pkgs01 + - busgateway01, + - value0{1,3} + - releng0{1,4} + - relepel03 +Purpose + Certify fedmsg messages come from authentic sources. + +Description +----------- + +fedmsg sends JSON-encoded messages from many services to a zeromq messaging +bus. We're not concerned with encrypting the messages, only with signing them +so an attacker cannot spoof. + +Every instance of each service on each host has its own cert and private key, +signed by the CA. 
By convention, we name the certs ``<service>-<fqdn>.{crt,key}``.
+For instance, bodhi has the following certs:
+
+- bodhi-app01.phx2.fedoraproject.org
+- bodhi-app02.phx2.fedoraproject.org
+- bodhi-app03.phx2.fedoraproject.org
+- bodhi-app01.stg.phx2.fedoraproject.org
+- bodhi-app02.stg.phx2.fedoraproject.org
+- more
+
+Scripts to generate new keys, sign them, and revoke them live in the ansible
+repo in ``ansible/roles/fedmsg/files/cert-tools/``. The keys and certs
+themselves (including ca.crt and the CRL) live in the private repo in
+``private/fedmsg-certs/keys/``.
+
+fedmsg is locally configured to find the key it needs by looking in
+``/etc/fedmsg.d/ssl.py``, which is kept in ansible in
+``ansible/roles/fedmsg/templates/fedmsg.d/ssl.py.erb``.
+
+Each service-host has its own key. This means:
+
+- A key is not shared across multiple instances of a service on
+  different machines. i.e., bodhi on app01 and bodhi on app02 should have
+  different key/cert pairs.
+
+- A key is not shared across multiple services on a host. i.e., mediawiki
+  on app01 and bodhi on app01 should have different key/cert pairs.
+
+The attempt here is to minimize the number of potential attack vectors.
+Each private key should be readable only by the service that needs it.
+bodhi runs under mod_wsgi in apache and should run as its own unique bodhi
+user (not as apache). The permissions for its private key (e.g.
+bodhi-app01.phx2.fedoraproject.org), when deployed by ansible, should be
+read-only for that local bodhi user.
+
+For more information on how fedmsg uses these certs see
+http://fedmsg.readthedocs.org/en/latest/crypto.html
+
+Configuring the Scripts
+-----------------------
+
+Usage of the main scripts is described in more detail below. They are
+located in ``ansible/roles/fedmsg/files/cert-tools``.
+
+Before you use them, you'll need to point them at the right directory to
+modify. By default, this is ``~/private/fedmsg-certs/keys/``. You
+can change that by editing ``ansible/roles/fedmsg/files/cert-tools/vars`` in
+the event that you have the private repo checked out to an alternate location.
+
+There are other configuration values defined in that script. Most will not
+need to be changed.
+
+Wiping and Rebuilding Everything
+--------------------------------
+
+There is a script in ``ansible/roles/fedmsg/files/cert-tools/`` named
+``rebuild-all-fedmsg-certs``. You can run it with no arguments to wipe out
+the old and generate a new CA root certificate, a signing cert and key, and
+all key/cert pairs for all service-hosts.
+
+.. note:: Warning -- Obviously, this will wipe everything. Do you want that?
+
+Adding a new key for a new service-host
+---------------------------------------
+
+First, checkout the ansible private repo, as that's where the keys are going
+to be stored. The scripts will assume this is checked out to ~/private.
+
+In ``ansible/roles/fedmsg/files/cert-tools`` run::
+
+  $ source ./vars
+  $ ./build-and-sign-key <service>-<fqdn>
+
+For instance, if we bring up a new app host, app10.phx2.fedoraproject.org,
+we'll need to generate a new cert/key pair for each fedmsg-enabled service
+that will be running on it, so you'd run::
+
+  $ source ./vars
+  $ ./build-and-sign-key shell-app10.phx2.fedoraproject.org
+  $ ./build-and-sign-key bodhi-app10.phx2.fedoraproject.org
+  $ ./build-and-sign-key mediawiki-app10.phx2.fedoraproject.org
+
+Just creating the keys isn't quite enough; there are four more things you'll
+need to do.
+
+First, the private keys are created in your checkout of the private repo
+under ~/private/private/fedmsg-certs/keys.
There will be four files for each cert
+you created: ``<serial>.pem`` (ex: 5B.pem) and ``<service>-<fqdn>.{crt,csr,key}``.
+git add, commit, and push all of those.
+
+Second, you need to edit
+``ansible/roles/fedmsg/files/cert-tools/rebuild-all-fedmsg-certs``
+and add the arguments of the commands you just ran, so that next time certs
+need to be blown away and recreated, the new service-hosts will be included.
+For the examples above, you would need to add to the list::
+
+   shell-app10.phx2.fedoraproject.org
+   bodhi-app10.phx2.fedoraproject.org
+   mediawiki-app10.phx2.fedoraproject.org
+
+Third, you need to ensure that the keys are distributed to the host with the
+proper permissions. Only the bodhi user should be able to access bodhi's
+private key. This can be accomplished by using ``fedmsg::certificate`` in
+ansible. It should distribute your new keys to the correct hosts and
+correctly permission them.
+
+Lastly, if you haven't already updated the global fedmsg config, you'll need
+to. You need to add your new service-node to ``fedmsg.d/endpoint.py`` and
+to ``fedmsg.d/ssl.py``. Those can be found in
+``ansible/roles/fedmsg/templates/fedmsg.d``. See
+http://fedmsg.readthedocs.org/en/latest/config.html for more information on
+the layout and meaning of those files.
+
+Revoking a key
+--------------
+
+In ``ansible/roles/fedmsg/files/cert-tools`` run::
+
+  $ source ./vars
+  $ ./revoke-full <service>-<fqdn>
+
+This will alter ``private/fedmsg-certs/keys/crl.pem``, which should be
+picked up and served publicly, and then consumed by all fedmsg consumers
+globally.
+
+``crl.pem`` is publicly available at http://fedoraproject.org/fedmsg/crl.pem
+
+.. note:: Even though crl.pem lives in the private repo, we're just keeping
+    it there for convenience. It really *should* be served publicly,
+    so don't panic. :)
+
+.. note:: At the time of this writing, the CRL is not actually used. I need
+    one publicly available first so we can test it out.
diff --git a/docs/sysadmin-guide/sops/fedmsg-gateway.rst b/docs/sysadmin-guide/sops/fedmsg-gateway.rst
new file mode 100644
index 0000000..e66e891
--- /dev/null
+++ b/docs/sysadmin-guide/sops/fedmsg-gateway.rst
@@ -0,0 +1,109 @@
+.. title: fedmsg-gateway SOP
+.. slug: infra-fedmsg-gateway
+.. date: 2012-10-31
+.. taxonomy: Contributors/Infrastructure
+
+==================
+fedmsg-gateway SOP
+==================
+
+Outgoing raw ZeroMQ message stream.
+
+.. note:: see also: fedmsg-websocket
+
+Contact Information
+===================
+
+Owner:
+  Messaging SIG, Fedora Infrastructure Team
+Contact:
+  #fedora-apps, #fedora-admin, #fedora-noc
+Servers:
+  busgateway01, proxy0*
+Purpose:
+  Expose raw ZeroMQ messages outside the FI environment.
+
+Description
+===========
+
+Users outside of Fedora Infrastructure can listen to the production message
+bus by connecting to specific addresses. This is required for local users to
+run their own hubs and message processors ("Consumers"). It is also
+required for user-facing tools like fedmsg-notify to work.
+
+The specific public endpoints are:
+
+production
+  tcp://hub.fedoraproject.org:9940
+staging
+  tcp://stg.fedoraproject.org:9940
+
+fedmsg-gateway, the daemon running on busgateway01, is listening to the FI
+production fedmsg bus and will relay every message that it receives out to a
+special ZMQ pub endpoint bound to port 9940. haproxy mediates connections
+to the fedmsg-gateway daemon.
+
+Connection Flow
+===============
+
+Clients connecting to haproxy on proxy0*:9940 are redirected to
+busgateway0*:9940.
This can be found in the haproxy.cfg entry for +``listen fedmsg-raw-zmq 0.0.0.0:9940``. + +This is different than the apache reverse proxy pass setup we have for the +app0* and packages0* machines. *That* flow looks something like this:: + + Client -> apache(proxy01) -> haproxy(proxy01) -> apache(app01) + +The flow for the raw zmq stream provided by fedmsg-gateway looks something +like this:: + + Client -> haproxy(proxy01) -> fedmsg-gateway(busgateway01) + +haproxy is listening on a public port. + +At the time of this writing, haproxy does not actually load balance zeromq +session requests across multiple busgateway0* machines, but there is nothing +stopping us from adding them. New hosts can be added in ansible and pressed +from busgateway01's template. Add them to the fedmsg-raw-zmq listen in +haproxy's config and it should Just Work. + +Increasing the Maximum Number of Concurrent Connections +======================================================= + +HTTP requests are typically very short (a few seconds at most). This +means that the number of concurrent tcp connections we require for most +of our services is quite low (1024 is overkill). ZeroMQ tcp connections, +on the other hand, are expected to live for quite a long time. +Consequently we needed to scale up the number of possible concurrent tcp +connections. + +All of this is in ansible and should be handled for us automatically if we +bring up new nodes. + +- The pam_limits user limit for the fedmsg user was increased from + 1024 to 160000 on busgateway01. +- The pam_limits user limit for the haproxy user was increased from + 1024 to 160000 on the proxy0* machines. +- The zeromq High Water Mark (HWM) was increased to 160000 on + busgateway01. +- The maximum number of connections allowed was increased in haproxy.cfg. + +Nagios +====== + +New nagios checks were added for this that check to see if the number of +concurrent connections through haproxy is approaching the maximum number +allowed. + +You can check these numbers by hand by inspecting the haproxy web interface: +https://admin.fedoraproject.org/haproxy/proxy1#fedmsg-raw-zmq + +Look at the "Sessions" section. "Cur" is the current number of sessions +versus "Max", the maximum number seen at the same time and "Limit", the +maximum number of concurrent connections allowed. + +RHIT +==== + +We had RHIT open up port 9940 special to proxy01.phx2 for this. diff --git a/docs/sysadmin-guide/sops/fedmsg-introduction.rst b/docs/sysadmin-guide/sops/fedmsg-introduction.rst new file mode 100644 index 0000000..d8bb773 --- /dev/null +++ b/docs/sysadmin-guide/sops/fedmsg-introduction.rst @@ -0,0 +1,63 @@ +.. title: fedmsg Intro SOP +.. slug: infra-fedmsg-intro +.. date: 2012-10-31 +.. taxonomy: Contributors/Infrastructure + +=================================== +fedmsg introduction and basics, SOP +=================================== + +General information about fedmsg + +Contact Information +------------------- + +Owner + Messaging SIG, Fedora Infrastructure Team +Contact + #fedora-apps, #fedora-admin, #fedora-noc +Servers + Almost all of them. +Purpose + Introduce sysadmins to fedmsg tools and config + +Description +----------- + +fedmsg is a system that links together most of our webapps and services into +a message mesh or net (often called a "bus"). It is built on top of the +zeromq messaging library. 
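+
+Since fedmsg is plain zeromq underneath, the public raw stream (see the
+fedmsg-gateway SOP) can be consumed with nothing but pyzmq. This is a
+minimal sketch; it assumes messages arrive as multipart (topic, payload)
+frames::
+
+  import zmq
+
+  ctx = zmq.Context()
+  sock = ctx.socket(zmq.SUB)
+  sock.connect('tcp://hub.fedoraproject.org:9940')
+  sock.setsockopt(zmq.SUBSCRIBE, '')  # subscribe to every topic
+
+  while True:
+      topic, payload = sock.recv_multipart()
+      print topic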
+ +fedmsg has its own developer documentation that is a good place to check if +this or other SOPs don't provide enough information - http://fedmsg.rtfd.org + +Tools +----- + +Generally, fedmsg-tail and fedmsg-logger are the two most commonly used +tools for debugging and testing. To see if bus-connectivity exists between +two machines, log onto each of them and run the following on the first:: + + $ echo testing from $(hostname) | fedmsg-logger + +And run the following on the second:: + + $ fedmsg-tail --really-pretty + +Configuration +------------- + +fedmsg configuration lives in /etc/fedmsg.d/ + +``/etc/fedmsg.d/endpoints.py`` keeps the list of every possible fedmsg endpoint. +It acts as a global index that defines the bus. + +See fedmsg.readthedocs.org/en/latest/config/ for a full glossary of +configuration values. + +Logs +---- + +fedmsg daemons keep their logs in /var/log/fedmsg. fedmsg message hooks in +existing apps (like bodhi) will log any errors to the logs of the app +they've been added to (like /var/log/httpd/error_log). diff --git a/docs/sysadmin-guide/sops/fedmsg-irc.rst b/docs/sysadmin-guide/sops/fedmsg-irc.rst new file mode 100644 index 0000000..e29c1d5 --- /dev/null +++ b/docs/sysadmin-guide/sops/fedmsg-irc.rst @@ -0,0 +1,35 @@ +.. title: fedmsg IRC SOP +.. slug: infra-fedmsg-irc +.. date: 2014-02-13 +.. taxonomy: Contributors/Infrastructure +============== +fedmsg-irc SOP +============== + + Echo fedmsg bus activity to IRC. + +Contact Information +------------------- + +Owner + Messaging SIG, Fedora Infrastructure Team +Contact + #fedora-apps, #fedora-fedmsg, #fedora-admin, #fedora-noc +Servers + value03 +Purpose + Echo fedmsg bus activity to IRC + +Description +----------- + +fedmsg-irc is a daemon running on value03 and value01.stg. It is listening +to the fedmsg bus and echoing that activity to the #fedora-fedmsg channel in +IRC. + +It can be configured to ignore certain messages, join certain rooms, and +take on a different nick by editing the values in ``/etc/fedmsg.d/irc.py`` and +restarting it with ``sudo service fedmsg-irc restart`` + +See http://fedmsg.readthedocs.org/en/latest/config/#term-irc for more +information on configuration. diff --git a/docs/sysadmin-guide/sops/fedmsg-new-message-type.rst b/docs/sysadmin-guide/sops/fedmsg-new-message-type.rst new file mode 100644 index 0000000..bc7511a --- /dev/null +++ b/docs/sysadmin-guide/sops/fedmsg-new-message-type.rst @@ -0,0 +1,78 @@ +.. title: Adding a new fedmsg message type +.. slug: fedmsg-new-message-type +.. date: 2016-05-27 + +================================ +Adding a new fedmsg message type +================================ + + +Instrumenting the program +------------------------- +First, figure out how you're going to publish the message? Is it from a shell +script or from a long running process? + +If its from shell script, you need to just add a `fedmsg-logger` statement to +the script. Remember to set the `--modname` and `--topic` for your new +message's fully-qualified topic. + +If its from a python process, you need to just add a ``fedmsg.publish(..)`` +call. The same concerns about modname and topic apply here. + +If this is a short-lived python process, you'll want to add `active=True` to the +call to ``fedmsg.publish(..)``. This will make the fedmsg lib "actively" reach +out to our fedmsg-relay running on busgateway01. + +If it is a long-running python process (like a WSGI thread), then you don't need +to pass any extra arguments. 
You don't want it to reach out to the fedmsg-relay +if possible. Your process will require that some "endpoints" are created for it +in ``/etc/fedmsg.d/``. More on that below. + +Supporting infrastructure +------------------------- + + +You need to make sure that the machine this is running on has a cert and key +that can be read by the program to sign its message. If you don't have a cert +already, then you need to create it in the private repo. Ask a sysadmin-main +member. + +Then you need to declare those certs in the `fedmsg_certs` data structure stored +typically in our ansible ``group_vars/`` for this service. Declare both the +name of the cert, what group and user it should be owned by, and in the +``can_send:`` section, declare the list of topics that this cert should be +allowed to publish. + +If this is a long-running python process that is *not* passing `active=True` to +the call to `fedmsg.publish(..)`, then you have to also declare endpoints for +it. You do that by specifying the ``fedmsg_wsgi_procs`` and +``fedmsg_wsgi_vars`` in the ``group_vars`` for your service. The iptables rules +and fedmsg endpoints should be automatically created for you on the next +playbook run. + +Supporting code +--------------- + +At this point, you can push the change out to production and be publishing +messages "okay". Everything should be fine. + +However, your message will show up blank in datagrepper, in IRC, and in FMN, and +everywhere else we try to render it. You *must* then follow up and write a new +`Processor` for it in the fedmsg_meta library we maintain: +https://github.com/fedora-infra/fedmsg_meta_fedora_infrastructure + +You also *must* write a test case for it there. The docs listing all topics we +publish at http://fedora-fedmsg.rtfd.org/ is automatically generated from the +test suite. Please don't forget this. + +Lastly, you should cut a release of fedmsg_meta and deploy it using the +`playbooks/manual/upgrade/fedmsg.yml` playbook, which should update all the +relevant hosts. + +Corner cases +------------ + +If the process publishing the new message lives *outside* our main network, you +have to jump through more hoops. Look at abrt, koschei, and copr for examples +of how to configure this (you need a special firewall rule, and they need to be +configured to talk to our "inbound gateway" running on the proxies. diff --git a/docs/sysadmin-guide/sops/fedmsg-relay.rst b/docs/sysadmin-guide/sops/fedmsg-relay.rst new file mode 100644 index 0000000..2903794 --- /dev/null +++ b/docs/sysadmin-guide/sops/fedmsg-relay.rst @@ -0,0 +1,66 @@ +.. title: fedmsg-relay SOP +.. slug: infra-fedmsg-relay +.. date: 2012-10-31 +.. taxonomy: Contributors/Infrastructure + +================ +fedmsg-relay SOP +================ + +Bridge ephemeral scripts into the fedmsg bus. + +Contact Information +------------------- + +Owner + Messaging SIG, Fedora Infrastructure Team +Contact + #fedora-apps, #fedora-admin, #fedora-noc +Servers + app01 +Purpose + Bridge ephemeral bash and python scripts into the fedmsg bus. + +Description +----------- + +fedmsg-relay is running on app01, which is a bad choice. We should look to +move it to a more isolated place in the future. busgateway01 would be a +better choice. + +"Ephemeral" scripts like ``pkgdb2branch.py``, the post-receive git hook on +pkgs01, and anywhere fedmsg-logger is used all depend on fedmsg-relay. +Instead of emitting messages "directly" to the rest of the bus, they use +fedmsg-relay as an intermediary. 
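+
+For example, an ad-hoc shell script can announce itself on the bus through
+the relay with a single command (the modname and topic here are made up)::
+
+  $ echo "long-running job finished" | fedmsg-logger \
+      --modname myjob --topic finished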
+ +Check that fedmsg-relay is running by looking for it in the process list. +You can restart it in the standard way with ``sudo service fedmsg-relay +restart``. Check for its logs in ``/var/log/fedmsg/fedmsg-relay.log`` + +Ephemeral scripts know where the fedmsg-relay is by looking for the +relay_inbound and relay_outbound values in the global fedmsg config. + +But What is it Doing? And Why? +------------------------------- + +The fedmsg bus is designed to be "passive" in its normal operation. A +mod_wsgi process under httpd sets up its fedmsg publisher socket to +passively emit messages on a certain port. When some other service wants +to receive these messages, it is up to that service to know where mod_wsgi +is emitting and to actively connect there. In this way, emitting is passive +and listening is active. + +We get a problem when we have a one-off or "ephemeral" script that is not a +long-running process -- a script like pkgdb2branch which is run when a user +runs it and which ends shortly after. Listeners who want these scripts +messages will find that they are usually not available when they try to +connect. + +To solve this problem, we introduced the "fedmsg-relay" daemon which is a +kind of "passive"-to-"passive" adaptor. It binds to an outbound port on one +end where it will publish messages (like normal) but it also binds to an +another port where it listens passively for inbound messages. Ephemeral +scripts then actively connect to the passive inbound port of the +fedmsg-relay to have their payloads echoed on the bus-proper. + +See http://fedmsg.readthedocs.org/en/latest/topology/ for a diagram. diff --git a/docs/sysadmin-guide/sops/fedmsg-websocket.rst b/docs/sysadmin-guide/sops/fedmsg-websocket.rst new file mode 100644 index 0000000..8dc10cd --- /dev/null +++ b/docs/sysadmin-guide/sops/fedmsg-websocket.rst @@ -0,0 +1,76 @@ +.. title: websocket SOP +.. slug: infra-websocket +.. date: 2012-10-31 +.. taxonomy: Contributors/Infrastructure + +============= +websocket SOP +============= + +websocket communication with Fedora apps. + +see-also: ``fedmsg-gateway.txt`` + +Contact Information +------------------- + +Owner + Messaging SIG, Fedora Infrastructure Team +Contact + #fedora-apps, #fedora-admin, #fedora-noc +Servers + busgateway01, proxy0*, app0* +Purpose + Expose a websocket server for FI apps to use + +Description +----------- + +WebSocket is a protocol (an extension of HTTP/1.1) by which client web +browsers can establish full-duplex socket communications with a server -- +the "real-time web". + +In our case, webapps served from app0* and packages0* will include +javascript code instructing client browsers to establish a second connection +to our WebSocket server. They point browsers to the following addresses: + +production + wss://hub.fedoraproject.org:9939 +staging + wss://stg.fedoraproject.org:9939 + +The websocket server itself is a fedmsg-hub daemon running on busgateway01. +It is configured to enable its websocket server component in the presence of +certain configuration values. + +haproxy mediates connections to the fedmsg-hub websocket server daemon. +An stunnel daemon provides SSL support. + +Connection Flow +--------------- + +The connection flow is much the same as in the fedmsg-gateway.txt SOP, but +is somewhat more complicated. 
+ +"Normal" HTTP requests to our app servers traverse the following chain:: + + Client -> apache(proxy01) -> haproxy(proxy01) -> apache(app01) + +The flow for a websocket requests looks something like this:: + + Client -> stunnel(proxy01) -> haproxy(proxy01) -> fedmsg-hub(busgateway01) + +stunnel is listening on a public port, negotiates the SSL connection, and +redirects the connection to haproxy who in turn hands it off to +the fedmsg-hub websocket server listening on busgateway01. + +At the time of this writing, haproxy does not actually load balance zeromq +session requests across multiple busgateway0* machines, but there is nothing +stopping us from adding them. New hosts can be added in ansible and pressed +from busgateway01's template. Add them to the fedmsg-websockets listen in +haproxy's config and it should Just Work. + +RHIT +---- + +We had RHIT open up port 9939 special to proxy01.phx2 for this. diff --git a/docs/sysadmin-guide/sops/fedocal.rst b/docs/sysadmin-guide/sops/fedocal.rst new file mode 100644 index 0000000..be7403c --- /dev/null +++ b/docs/sysadmin-guide/sops/fedocal.rst @@ -0,0 +1,39 @@ +.. title: Fedocal SOP +.. slug: infra-fedocal +.. date: 2016-01-04 +.. taxonomy: Contributors/Infrastructure + +====================== +Fedocal SOP +====================== + +Fedocal is a web-based group calender application that is made available to the various groups with in the Fedora project. + +Contents +======== + +1. Contact Information +2. Documentation Links + +Contact Information +=================== + +Owner + Fedora Infrastructure Team +Contact + #fedora-admin +Location + https://apps.fedoraproject.org/calendar +Servers + +Purpose + To provide links to the documentation for fedocal, as it exists elsewhere on the internet and it was decided that a link document would be a better use of resources than to rewrite the book. + +Documentation Links +=================== + +For information on the latest and greatest in fedocal please review: http://fedocal.readthedocs.org/en/latest/ + +For documentation on the usage of fedocal please consult: http://fedocal.readthedocs.org/en/latest/usage.html + + diff --git a/docs/sysadmin-guide/sops/fedora-releases.rst b/docs/sysadmin-guide/sops/fedora-releases.rst new file mode 100644 index 0000000..e911133 --- /dev/null +++ b/docs/sysadmin-guide/sops/fedora-releases.rst @@ -0,0 +1,399 @@ +.. title: Fedora Release Infrastructure SOP +.. slug: infra-releng +.. date: 2015-03-10 +.. taxonomy: Contributors/Infrastructure + +================================= +Fedora Release Infrastructure SOP +================================= + +This SOP contains all of the steps required by the Fedora Infrastructure +team in order to get a release out. Much of this work overlaps with the +Release Engineering team (and at present share many of the same members). +Some work may get done by releng, some may get done by Infrastructure, as +long as it gets done, it doesn't matter. + +Contact Information +=================== + +Owner: + Fedora Infrastructure Team, Fedora Release Engineering Team +Contact: + #fedora-admin, #fedora-releng, sysadmin-main, sysadmin-releng +Location: + N/A +Servers: + All +Purpose: + Releasing a new version of Fedora + +Preparations +============ + +Before a release ships, the following items need to be completed. + +1. New website from the websites team (typically hosted at + http://getfedora.org/_/) + +2. Verify mirror space (for all test releases as well) + +3. Verify with rel-eng permissions on content are right on the mirrors. 
Don't leak. + +4. Communication with Red Hat IS (Give at least 2 months notice, then + reminders as the time comes near) (final release only) + +5. Infrastructure change freeze + +6. Modify Template:FedoraVersion to reference new version. (Final release only) + +7. Move old releases to archive (post final release only) + +8. Switch release from development/N to normal releases/N/ tree in mirror + manager (post final release only) + +Change Freeze +============= + +The rules are simple: + +* Hosts with the ansible variable "freezes" "True" are frozen. + +* You may make changes as normal on hosts that are not frozen. + (For example, staging is never frozen) + +* Changes to frozen hosts requires a freeze break request sent to + the fedora infrastructure list, containing a description of the + problem or issue, actions to be taken and (if possible) patches + to ansible that will be applied. These freeze breaks must then get + two approvals from sysadmin-main or sysadmin-releng group members + before being applied. + +* Changes to recover from outages are acceptable to frozen hosts if needed. + +Change freezes will be sent to the fedora-infrastructure-list and begin 2 +weeks before each release and the final release. The freeze will end one +day after the release. Note, if the release slips during a change freeze, +the freeze just extends until the day after a release ships. + +You can get a list of frozen/non-frozen hosts by:: + + git clone https://infrastructure.fedoraproject.org/infra/ansible.git + scripts/freezelist -i inventory/inventory + +Notes about release day +======================= + +Release day is always an interesting and unique event. After the final +sprint from test to the final release a lot of the developers will be +looking forward to a bit of time away, as well as some sleep. Once Release +Engineering has built the final tree, and synced it to the mirrors it is +our job to make sure everything else (except the bit flip) gets done as +painlessly and easily as possible. + +.. note:: All communication is typically done in #fedora-admin. Typically these + channels are laid back and staying on topic isn't strictly enforced. On + release day this is not true. We encourage people to come, stay in the + room and be quiet unless they have a specific task or question releated to + release day. Its nothing personal, but release day can get out of hand + quick. + +During normal load, our websites function as normal. This is especially +true since we've moved the wiki to mod_fcgi. On release day our load +spikes a great deal. During the Fedora 6 launch many services were offline +for hours. Some (like the docs) were off for days. A large part of this +outage was due to the wiki not being able to handle the load, part was a +lack of planning by the Infrastructure team, and part is still a mystery. +(There are questions as to whether or not all of the traffic was legit or +a ddos. + +The Fedora 7 release went much better. Some services were offline for +minutes at a time but very little of it was out longer then that. The wiki +crashed, as it always does. We had made sure to make the fedoraproject.org +landing page static though. This helped a great deal though we did see +load on the proxy boxes as spiky. + +Recent releases have been quite smooth due to a number of changes: we +have a good deal more bandwith on master mirrors, more cpus and memory, +as well as prerelease versions are much easier to come by for those +interested before release day. 
+ +Day Prior to Release Day +======================== + +Step 1 (Torrent) +---------------- +Setup the torrent. All files can be synced with the torrent box +but just not published to the world. Verify with sha1sum. Follow the +instructions on the torrentrelease.txt sop up to and including step 4. + +Step 2 (Website) +---------------- + +Verify the website design / content has been finalized with the websites +team. Update the Fedora version number wiki template if this is a final +release. It will need to be changed in https://fedoraproject.org/wiki/Template:CurrentFedoraVersion + +Additionally, there are redirects in the ansible +playbooks/include/proxies-redirects.yml file for Cloud +Images. These should be pushed as soon as the content is available. +See: https://fedorahosted.org/fedora-infrastructure/ticket/3866 for example + +Step 3 (Mirrors) +---------------- + +Verify enough mirrors are setup and have Fedora ready for release. If for +some reason something is broken it needs to be fixed. Many of the mirrors +are running a check-in script. This lets us know who has Fedora without +having to scan everyone. Hide the Alpha, Beta, and Preview releases from +the publiclist page. + +You can check this by looking at:: + + wget "http://mirrors.fedoraproject.org/mirrorlist?path=pub/fedora/linux/releases/test/20-Alpha&country=global" + + (replace 20 and Alpha with the version and release.) + +Release day +=========== + +Step 1 (Prep and wait) +---------------------- + +Verify the mirrors are ready and that the torrent has valid copies of its +files (use sha1sum) + +Do not move on to step two until the Release Engineering team has given +the ok for the release. It is the releng team's decision as to whether or +not we release and they may pull the plug at any moment. + +Step 2 (Torrent) +---------------- + +Once given the ok to release, the Infrastructure team should publish the +torrent and encourage people to seed. Complete the steps on the +http://infrastructure.fedoraproject.org/infra/docs/torrentrelease.txt +after step 4. + +Step 3 (Bit flip) +----------------- + +The mirrors sit and wait for a single permissions bit to be altered so +that they show up to their services. The bit flip (done by the releng +team) will replicate out to the mirrors. Verify that the mirrors have +received the change by seeing if it is actually available, just use a spot +check. Once that is complete move on. + +Step 4 (Taskotron) (final release only) +--------------------------------------- + +Please file a Taskotron ticket and ask for the new release support to be +added (log in to Phabricator using your FAS_account@fedoraproject.org email +address) +https://phab.qadevel.cloud.fedoraproject.org/maniphest/task/edit/form/default/?title=new%20Fedora%20release&priority=80&tags=libtaskotron + +Step 5 (Website) +---------------- + +Once all of the distribution pieces are verified (mirrors and torrent), +all that is left is to publish the website. At present this is done by +making sure the master branch of fedora-web is pulled by the syncStatic.sh +script in ansible. It will sync in an hour normally but on release day +people don't like to wait that long so do the following on sundries01 + + sudo -u apache /usr/local/bin/lock-wrapper syncStatic 'sh -x /usr/local/bin/syncStatic' + +Once that completes, on batcave01:: + + sudo -i ansible proxy\* "/usr/bin/rsync --delete -a --no-owner --no-group bapp02::getfedora.org/ /srv/web/getfedora.org/" + +Verify http://getfedora.org/ is working. 
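+
+A quick spot-check from the command line (in addition to loading the site
+in a browser) might look like::
+
+  $ curl -sI https://getfedora.org/ | head -n 1
+  HTTP/1.1 200 OK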
Step 6 (Docs)
-------------

Just as with the website, the docs site needs to be published. Follow the
same pattern::

    /root/bin/docs-sync

Step 7 (Monitor)
----------------

Once the website is live, keep an eye on various news sites for the
release announcement. Closely watch the load on all of the boxes: proxy,
application and otherwise. If something is getting overloaded, see
suggestions on this page in the "Juggling Resources" section.

Step 8 (Badges) (final release only)
------------------------------------

We have some badge rules that are dependent on which release of Fedora
we're on. As you have time, please perform the following on your local
box::

    $ git clone ssh://git.fedorahosted.org/git/badges.git
    $ cd badges

Edit ``rules/tester-it-still-works.yml`` and update the release tag to match
the now old but stable release. For instance, if we just released fc21,
then the tag in that badge rule should be fc20.

Edit ``rules/tester-you-can-pry-it-from-my-cold-dead-hands.yml`` and update
the release tag to match the release that is about to reach EOL. For
instance, if we just released fc21, then the tag in that badge rule
should be fc19. Commit the changes::

    $ git commit -a -m 'Updated tester badge rule for f21 release.'
    $ git push origin master

Then, on batcave, perform the following::

    $ sudo -i ansible-playbook $(pwd)/playbooks/manual/push-badges.yml

Step 9 (Done)
-------------

Just chill, keep an eye on everything and make changes as needed. If you
can't keep a service up, try to redirect randomly to some of the mirrors.

Priorities
==========

Priorities during release day (in order):

1. Website
   Anything related to a user landing at fedoraproject.org, and
   clicking through to a mirror or torrent to download something, must be
   kept up. This is distribution, and without it we can potentially lose
   many users.

2. Linked addresses
   We do not have direct control over what Digg,
   Slashdot or anyone else links to. If they link to something on the
   wiki and it is going down, or link to any other site we control, a
   rewrite should be put in place to direct them to
   http://fedoraproject.org/get-fedora.

3. Torrent
   The torrent server has never had problems during a release.
   Make sure it is up.

4. Release Notes
   Typically grouped with the docs site, the release
   notes are often linked to (this is fine, no need to redirect), but keep
   an eye on the logs and ensure that where we've said the release notes
   are, they can actually be found. In previous releases we sometimes
   had to make them available in more than one spot.

5. docs.fedoraproject.org
   People will want to see what's new in Fedora
   and get further documentation about it. Much of this is in the release
   notes.

6. wiki
   Because it is so resource-heavy, and because it is so
   developer-oriented, we have no choice but to give the wiki a lower
   priority.

7. Everything else.

Juggling Resources
==================

In our environment we're running different things on many different
servers. Using Xen we can easily give machines more or less RAM and
processors. We can take down builders and bring up application servers.
The trick is to be smart and make sure you understand what is causing the
problem. These are some tips to keep in mind (the first one is sketched
right after this list):

* IPTables-based bandwidth and connection limiting (successful in the
  past)

* Altering the weight on the proxy balancers

* Creating static pages out of otherwise dynamic content

* Redirecting pages to a mirror

* Adding a server / removing un-needed servers
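As an illustration of the first tip, a hedged sketch of connection limiting
with the iptables connlimit match; the port and threshold here are made up,
so tune them to the actual situation::

    # Reject sources holding more than 30 concurrent connections to port 80
    iptables -A INPUT -p tcp --syn --dport 80 \
             -m connlimit --connlimit-above 30 -j REJECT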
CHECKLISTS:
===========

Alpha:
------

* Announce infrastructure freeze 2 weeks before Alpha
* Change /topic in #fedora-admin
* mail the infrastructure list a reminder.
* File all tickets
* new website, check mirror permissions, mirrormanager, check
  mirror sizes, release day ticket.

After release is a "go":

* Make sure torrents are set up and ready to go.
* fedora-web needs a branch for fN-alpha. In it:

  * Alpha used on get-prerelease
  * get-prerelease doesn't direct to release
  * verify is updated with Alpha info
  * releases.txt gets a branched entry for preupgrade
  * bfo gets updated to have an Alpha entry.

After release:

* Update /topic in #fedora-admin
* post to infrastructure list that the freeze is over.

Beta:
-----

* Announce infrastructure freeze 2 weeks before Beta
* Change /topic in #fedora-admin
* mail the infrastructure list a reminder.
* File all tickets
* new website
* check mirror permissions, mirrormanager, check
  mirror sizes, release day ticket.

After release is a "go":

* Make sure torrents are set up and ready to go.
* fedora-web needs a branch for fN-beta. In it:

  * Beta used on get-prerelease
  * get-prerelease doesn't direct to release
  * verify is updated with Beta info
  * releases.txt gets a branched entry for preupgrade
  * bfo gets updated to have a Beta entry.

After release:

* Update /topic in #fedora-admin
* post to infrastructure list that the freeze is over.

Final:
------

* Announce infrastructure freeze 2 weeks before Final
* Change /topic in #fedora-admin
* mail the infrastructure list a reminder.
* File all tickets
* new website, check mirror permissions, mirrormanager, check
  mirror sizes, release day ticket.

After release is a "go":

* Make sure torrents are set up and ready to go.
* fedora-web needs a branch for fN. In it:

  * get-prerelease does direct to release
  * verify is updated with Final info
  * bfo gets updated to have a Final entry.
  * update wiki version numbers and names.

After release:

* Update /topic in #fedora-admin
* post to infrastructure list that the freeze is over.
* Move MirrorManager repository tags from the development/$version/
  Directory objects to the releases/$version/ Directory objects. This is
  done using the ``move-devel-to-release --version=$version`` command on bapp02.
  This is usually done a week or two after release.

diff --git a/docs/sysadmin-guide/sops/fedorahosted-fedmsg.rst b/docs/sysadmin-guide/sops/fedorahosted-fedmsg.rst
new file mode 100644
index 0000000..fd08621
--- /dev/null
+++ b/docs/sysadmin-guide/sops/fedorahosted-fedmsg.rst
@@ -0,0 +1,106 @@

.. title: Fedmsg Fedorahosted SOP
.. slug: infra-fedorahosted-fedmsg
.. date: 2013-08-21
.. taxonomy: Contributors/Infrastructure

======================================
Fedorahosted FedMsg Infrastructure SOP
======================================

Publish fedmsg messages from Fedora Hosted trac instances.
Contact Information
===================

Owner
    Fedora Infrastructure Team
Contact
    #fedora-apps, #fedora-admin, sysadmin-hosted
Location
    Serverbeach
Servers
    hosted03, hosted04
Purpose
    Broadcast trac activity for select projects (opt-in)

Description
===========

fedmsg activity is usually an all-or-nothing proposition. We emit messages
for all koji jobs and all bodhi updates, or none.

fedmsg activity for Fedora Hosted is another story. We provide the option
for project owners to opt in to fedmsg and have their activity broadcast,
but it is off by default.

This document describes how to:

1. Enable the fedmsg plugin for a Fedora Hosted project.
2. Set up the fedmsg plugin on a new node.

Enable the fedmsg plugin for a Fedora Hosted project
====================================================

Enable the trac plugin
----------------------

The trac-fedmsg-plugin package should be installed, but disabled.

Edit ``/srv/web/trac/projects/$PROJECT/conf/trac.ini``. Under the
``[components]`` section add::

    trac_fedmsg_plugin.* = enabled

And restart apache with ``sudo apachectl graceful``.

Enable the git hook
-------------------

There is an ansible playbook that does this. There is no
need to do it by hand anymore. Run::

    $ sudo -i ansible-playbook \
        /srv/web/infra/ansible/playbooks/fedorahosted_fedmsg_git.yml \
        --extra-vars '{"repos":["yanex.git"]}'


Enabling by hand
````````````````

*If* you were to do it by hand, without the playbook, you could follow
the instructions below. Make a backup of the old post-receive hook. It
should be empty when you encounter it, but just to be safe::

    $ mv /srv/git/$PROJECT.git/hooks/post-receive \
        /srv/git/$PROJECT.git/hooks/post-receive.orig

Then, symlink in the new post-receive hook with::

    $ ln -s /usr/local/share/git/hooks/post-receive-fedorahosted-fedmsg \
        /srv/git/$PROJECT.git/hooks/post-receive

That hook is managed by ansible -- if you want to modify it, you can do
so there.

.. note:: If there was an old post-receive hook in place, you should
   check to see if it did something important. The 'fedora-web' git
   repo (which was converted early on) had such a hook. See
   /srv/git/fedora-web.git/hooks for an example of how to handle
   multiple git hooks. Something like
   /usr/share/git-core/post-receive-chained can be used to chain the
   hook across multiple scripts.


How to set up the fedmsg plugin on a new fedorahosted node
==========================================================

1) Create certs for the new node as per the fedmsg-certs doc.

2) Declare those certs in ``/etc/fedmsg.d/ssl.py`` globally.

3) Declare endpoints for the new node in ``/etc/fedmsg.d/endpoints.py``.

4) Use our configuration management tool to distribute that new global
   fedmsg config to the new node and all other nodes.

5) Install the trac-fedmsg-plugin package on the new node and follow the
   steps above.

diff --git a/docs/sysadmin-guide/sops/fedorahosted-project-cleanup.rst b/docs/sysadmin-guide/sops/fedorahosted-project-cleanup.rst
new file mode 100644
index 0000000..31f2380
--- /dev/null
+++ b/docs/sysadmin-guide/sops/fedorahosted-project-cleanup.rst
@@ -0,0 +1,82 @@

.. title: Fedorahosted Cleanup SOP
.. slug: infra-fedorahosted-cleanup
.. date: 2011-10-10
.. taxonomy: Contributors/Infrastructure

======================================
FH-Projects-Cleanup Infrastructure SOP
======================================

Contents

1. Introduction
2. Our first move
3. Removing Project's git repo
4. Removing Trac's project
5. Removing Project's ML
6. FAS Group Removal

Introduction
============

This page will help any sysadmin have a Fedora Hosted project
completely removed, either because the owner requested its removal
or because of any other issue that requires us to remove a project.
This page covers git, Trac, mailing list and FAS group clean-up.

Our first move
==============

If you are going to remove a Fedora Hosted project, please remember to
create a folder in /srv/tmp that follows this naming syntax::

    cd /srv/tmp && mkdir $project-hold-until-xx-xx-xx

where xx-xx-xx should be substituted with the date everything should be
purged from there (14 days after the delete request).

Removing Project's git repo
===========================

Having a git repository removed can be achieved with the following steps::

    ssh uid@fedorahosted.org
    cd /git
    mv $project.git/ /srv/tmp/$project-hold-until-xx-xx-xx/

We're done with git!

Removing Trac's project
=======================

Steps are::

    ssh uid@fedorahosted.org
    cd /srv/web/trac/projects
    mv $project/ /srv/tmp/$project-hold-until-xx-xx-xx/

and...that's all!

Removing Project's ML
=====================

We have two options here:

Delete a list, but keep the archives::

    sudo /usr/lib/mailman/bin/rmlist $listname

Delete a list and its archives::

    sudo /usr/lib/mailman/bin/rmlist -a $listname

If you are going to completely remove the mailing list and its archives,
please make sure the list is empty and there are no subscribers in it.

FAS Group Removal
=================

Not every Fedora sysadmin can have this done. See the Account Deletion
SOP for information. You may want to remove the group or simply disable it.

diff --git a/docs/sysadmin-guide/sops/fedorahosted-repo-setup.rst b/docs/sysadmin-guide/sops/fedorahosted-repo-setup.rst
new file mode 100644
index 0000000..768f092
--- /dev/null
+++ b/docs/sysadmin-guide/sops/fedorahosted-repo-setup.rst
@@ -0,0 +1,300 @@

.. title: Fedorahosted Repository Setup SOP
.. slug: infra-fedorahosted-repo-setup
.. date: 2014-09-24
.. taxonomy: Contributors/Infrastructure

=======================
Hosted repository setup
=======================

Fedora provides SCM repositories for open source projects.

Contents

1. Mercurial Repository

   1. Repo Setup
   2. Commit Mail

2. Git Repository

   1. Repo Setup
   2. Commit Mail

3. Bazaar Repository
4. SVN Repository

   1. Repo Setup
   2. Commit Mail

Mercurial Repository
====================

You'll need to know three things in order to start the mercurial
repository.

PROJECTNAME
    what the project wants to be called.

OLDURL
    how to access the project's current source code in their
    mercurial repository.

PROJECTGROUP
    the group set up in the account system for read/write
    access to the repository.

Repo Setup
----------

The Mercurial repository lives on the hosted server. Access it by logging
into hosted1, then follow these steps:

1. Fetch the latest content from the FAS database::

    $ fasClient -i -f

2. Create the repo::

    $ cd /hg
    $ sudo hg clone -U $OLDURL $PROJECTNAME (or sudo mkdir $PROJECTNAME; cd $PROJECTNAME; sudo hg init)
    $ sudo find $PROJECTNAME -type d -exec chmod g+s \{\} \;
    $ sudo chmod -R g+w $PROJECTNAME
    $ sudo chown -R root:$PROJECTGROUP $PROJECTNAME

This should set up all the files needed for the repository.
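Before handing the repo over, it's worth a quick smoke test that the group
permissions actually took; a minimal sketch, with an arbitrary scratch path::

    # Directories should be setgid, group-writable, owned by root:$PROJECTGROUP
    ls -ld /hg/$PROJECTNAME
    # A throwaway local clone proves the repository is readable and intact
    hg clone /hg/$PROJECTNAME /tmp/$PROJECTNAME-smoketest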
Commit Mail
-----------

The Mercurial Notify extension can be used to send out email when
commits are pushed to a Mercurial repository. To enable notifications,
create the file ``/hg/$PROJECTNAME/.hg/hgrc``::

    [extensions]
    hgext.notify =

    [hooks]
    changegroup.notify = python:hgext.notify.hook

    [email]
    from = admin@fedoraproject.org

    [smtp]
    host = localhost

    [web]
    baseurl = http://hg.fedorahosted.org/hg

    [notify]
    sources = serve push pull bundle
    test = False
    config = /hg/$PROJECTNAME/.hg/subscriptions
    maxdiff = -1

And the file ``/hg/$PROJECTNAME/.hg/subscriptions``::

    [usersubs]

    user@host = *

    [reposubs]

Git Repository
==============

You'll need to know several things in order to start the git repository.

PROJECTNAME
    what the project wants to be called.

OLDURL
    how to access the project's current source code in their git repository.

PROJECTGROUP
    the group set up in the account system for write access to the repository.

COMMITLIST
    comma-separated list of email addresses for commits (optional)

DESCRIPTION
    description of the project (optional)

PROJECTOWNER
    the FAS username of the project owner

Repo Setup
----------

The git repository lives on the hosted server. Access it by logging into
hosted1, then follow these steps.

Fetch the latest content from the FAS database::

    $ sudo fasClient -i -f

    $ cd /git

Clone an existing repository::

    $ sudo git clone --bare $OLDURL $PROJECTNAME.git
    $ cd $PROJECTNAME.git
    $ sudo git config core.sharedRepository true
    $ #
    $ ## or
    $ #
    $ # Create a new repository:
    $ sudo mkdir $PROJECTNAME.git
    $ cd $PROJECTNAME.git
    $ sudo git init --bare --shared=true

Give the repository a nice description for gitweb::

    $ echo $DESCRIPTION | sudo tee description > /dev/null

Set up and run the post-update hook.

.. note::
   We symlink this because /git is on a filesystem with noexec set.

::

    $ sudo ln -svf /usr/share/git-core/templates/hooks/post-update.sample ./hooks/post-update
    $ sudo git update-server-info

Ensure ownership and modes are correct::

    $ sudo find -type d -exec chmod g+s \{\} \;
    $ sudo find -perm /u+w -a ! -perm /g+w -exec chmod g+w \{\} \;
    $ sudo chown -R $PROJECTOWNER:$PROJECTGROUP .

This should set up all the files needed for the repository. The repository
owner can push changes into the repo by running::

    $ git push ssh://git.fedorahosted.org/git/$PROJECTNAME.git/ master

from within their local git repository.

Commit Mail
-----------

If they want commit mail, then there are a couple of additional steps::

    $ cd /git/$PROJECTNAME.git
    $ sudo git config hooks.mailinglist $COMMITLIST
    $ sudo git config hooks.maildomain fedoraproject.org
    $ sudo git config hooks.emailprefix "[$PROJECTNAME]"
    $ sudo git config hooks.repouri "http://git.fedorahosted.org/cgit/$PROJECTNAME.git"
    $ sudo ln -svf /usr/share/git-core/post-receive-chained ./hooks/post-receive
    $ sudo mkdir ./hooks/post-receive-chained.d
    $ sudo ln -svf /usr/local/bin/git-notifier ./hooks/post-receive-chained.d/post-receive-email
    $ sudo ln -svf /usr/local/share/git/hooks/post-receive-fedorahosted-fedmsg ./hooks/post-receive-chained.d/post-receive-fedmsg

Bazaar Repository
=================

You'll need to know three things in order to start a bazaar repository.

PROJECTNAME
    what the project wants to be called.

OLDBRANCHURL
    how to access the project's current source code in
    their previous bazaar repository. Note that a project may have
    multiple branches that they want to import. Each branch will have a
    separate URL. (The project can import the new branches after the
    repository is created if they want.)

PROJECTGROUP
    the group set up in the account system for read/write
    access to the repository.

Repo Setup
----------

The bzr repository lives on the hosted server. Access it by logging into
hosted1, then follow these steps.

The first stage is to create the Bazaar repository.

Fetch the latest content from the FAS database::

    $ fasClient -i -f

    $ cd /srv/bzr/
    $ # This creates a Bazaar repository which has shared storage between branches
    $ sudo bzr init-repo $PROJECTNAME --no-trees
    $ cd $PROJECTNAME
    $ sudo bzr branch $OLDURL
    $ sudo bzr branch $OLDURL2
    $ # [...]
    $ sudo bzr branch $OLDURLN
    $ cd ..
    $ sudo find $PROJECTNAME -type d -exec chmod g+s \{\} \;
    $ sudo chmod -R g+w $PROJECTNAME
    $ sudo chown -R root:$PROJECTGROUP $PROJECTNAME

This should be all that is needed. To check out, run::

    bzr init-repo $MYLOCALPROJECTREPO
    cd $MYLOCALPROJECTREPO
    bzr branch bzr+ssh://bzr.fedorahosted.org/bzr/$PROJECTNAME/$BRANCHNAME
    bzr branch bzr://bzr.fedorahosted.org/bzr/$PROJECTNAME/$BRANCHNAME/

.. note::
   If the end user checks out a branch without creating their own
   repository, they will need to create a local working tree by doing the
   following::

       cd $BRANCHNAME
       bzr checkout --lightweight

SVN Repository
==============

You'll need to know two things in order to start an svn repository, plus
an optional third.

PROJECTNAME
    what the project wants to be called.

PROJECTGROUP
    the Fedora account system group with read/write access.

COMMITLIST
    comma-separated list of email addresses for commits (optional)

Repo Setup
----------

SVN lives on the hosted server. Access it by logging into hosted1, then
run the following steps.

Fetch the latest content from the FAS database::

    $ fasClient -i -f

Create the repo::

    $ cd /svn/
    $ sudo svnadmin create $PROJECTNAME
    $ cd $PROJECTNAME
    $ sudo chgrp -R $PROJECTGROUP .
    $ sudo chmod -R g+w .
    $ sudo find -type d -exec chmod g+s \{\} \;

This should be all that is needed. To check out, run::

    svn co svn+ssh://svn.fedorahosted.org/svn/$PROJECTNAME

Commit Mail
-----------

If they want commit mail, then there are a couple of additional steps::

    $ echo $COMMITLIST | sudo tee ./commit-list > /dev/null
    $ sudo ln -sv /usr/bin/fedora-svn-commit-mail-hook ./hooks/post-commit

diff --git a/docs/sysadmin-guide/sops/fedorahosted.rst b/docs/sysadmin-guide/sops/fedorahosted.rst
new file mode 100644
index 0000000..6c1a3b7
--- /dev/null
+++ b/docs/sysadmin-guide/sops/fedorahosted.rst
@@ -0,0 +1,114 @@

.. title: Fedorahosted Infrastructure SOP
.. slug: infra-fedorahosted
.. date: 2014-09-22
.. taxonomy: Contributors/Infrastructure

===============================
Fedorahosted Infrastructure SOP
===============================

Provide a hosting place for open source projects.

.. important::
   This page is for administrators only. People wishing to request a hosted
   project should use the ticketing system; see the
   new project request template.
   (Requires a Fedora account.)

Contact Information
===================

Owner
    Fedora Infrastructure Team
Contact
    #fedora-admin, sysadmin-hosted
Location
    Serverbeach
Servers
    hosted03, hosted04
Purpose
    Provide a hosting place for open source projects

Description
===========

fedorahosted.org can be used to host open source projects. It provides the
following facilities:

1. An SCM for maintaining the code. The currently supported SCMs are
   Mercurial, Git, Bazaar, and SVN. There is no cvs.
2. A trac instance, which provides a mini-wiki for hosting information
   and also provides a ticketing system.
3. A mailing list

How to set up a new hosted project
==================================

1. Create a source group in the Fedora Account System, named after the SCM
   and project, e.g. ``gitepel``, ``svnkernel``, etc.

2. Create the source repo

3. Log into hosted03

4. Create the new project space::

       sudo /usr/local/bin/hosted-setup.sh

   * must use the same case as the scm repo
   * You're likely to end up with::

         'Command failed: columns username, action are not unique'

     This can be safely ignored, as it only tries to tell you
     that you are giving admin access to a person already
     having admin access.

5. If a mailing list is desired, follow the directions in the mailman SOP.

How to import data from a cvs repo into a git repo
==================================================

Often users request their git repos to be imported from an existing cvs
repo. This is a two-step process, as follows::

    git-cvsimport -v -d :pserver:anonymous@cvs.fedoraproject.org/cvs/docs -C $MODULE $MODULE

    sudo git clone --bare --no-hardlinks $MODULE/ /git/$MODULE.git/

Example::

    git-cvsimport -v -d :pserver:anonymous@cvs.fedoraproject.org/cvs/docs -C translation-quick-start-guide translation-quick-start-guide
    sudo git clone --bare --no-hardlinks translation-quick-start-guide/ /git/translation-quick-start-guide.git/

.. note::

   Note that our git repos disallow non-fast-forward pushes by default. This
   default makes the most sense, but sometimes users understand the impact
   of doing so but still wish to make such a push.

   To enable this temporarily, edit the config file inside of the git repo,
   and make sure that receive.denyNonFastforwards is set to false. Make sure
   to re-enable this once the user has finished their push.

How to allow a project to redirect parts of their release tree
===============================================================

A project may want to host parts of their release tree elsewhere (for
instance, moving docs from hosting inside of the fedorahosted release tree
to an external service). To do that, modify::

    configs/web/fedorahosted.org/release.conf

adding a new Directory section like this::

    # Allow python-fedora project to redirect documentation/release tree elsewhere
    <Directory /srv/web/releases/p/y/python-fedora>
        AllowOverride FileInfo
    </Directory>

Then tell the project that they can create a .htaccess file with the
Redirect. (Note that the release tree can be reached by two URLs, so you
need to redirect both of them)::

    Redirect permanent /releases/p/y/python-fedora/doc http://pythonhosted.org/python-fedora
    Redirect permanent /released/python-fedora/doc/ http://pythonhosted.org/python-fedora
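Once the project's .htaccess is in place, a quick external check confirms
both URL forms actually answer with the redirect; a sanity sketch reusing
the python-fedora example above, assuming the release tree is served from
fedorahosted.org::

    # Both forms should reply with a permanent redirect to pythonhosted.org
    curl -sI https://fedorahosted.org/releases/p/y/python-fedora/doc | head -n 3
    curl -sI https://fedorahosted.org/released/python-fedora/doc/ | head -n 3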
.. slug: infra-fedorahosted-rename
.. date: 2011-10-03
.. taxonomy: Contributors/Infrastructure

===============================
FedoraHosted Project Rename SOP
===============================

This describes the steps necessary to rename a project in Fedora Hosted.

Contents
========

1. Rename the Trac instance
2. Rename the git / svn / hg / ... directory
3. Rename any old releases directories
4. Rename the group in FAS

Rename the Trac instance
========================

::

    cd /srv/web/trac/projects
    mv oldname newname
    cd newname/conf
    sed -i -e 's/oldname/newname/' trac.ini
    cd ..
    sudo -u apache trac-admin . resync

Rename the git / svn / hg / ... directory
=========================================

::

    cd /git
    mv oldname.git newname.git

Rename any old releases directories
===================================

::

    cd /srv/web/releases/o/l/oldname

The newname releases directory gets created automatically; if there were old
releases, move them to the new location.

Rename the group in FAS
=======================

.. note::
   Don't blindly rename.
   fedorahosted groups are usually safe to rename. If the old group could be
   present in other apps/configs, though (like provenpackagers, perl-sig,
   etc.), do not rename them. The other apps would need to have the group name
   updated there as well to make this safe.

::

    ssh db2
    sudo -u postgres psql fas2

::

    BEGIN;
    select * from groups where name = '$OLDNAME';
    update groups set name = '$NEWNAME' where name = '$OLDNAME';

* Check that only one row was modified::

    select * from groups where name in ('$OLDNAME', '$NEWNAME');

* Check that there's only one row and the name == $NEWNAME

* If everything is correct, finish with::

    COMMIT;

  If incorrect, run ``ROLLBACK;`` instead.

.. warning:: Don't delete groups.
   If, for some reason, you end up with a group in FAS that was a typo but it
   doesn't conflict with anything else, don't delete it without talking to
   other admins on fedora-infrastructure-list. The numeric group ids could be
   present on a filesystem somewhere, and removing the group could eventually
   lead to the id being allocated to some other group, which would give
   unintended people access to the files. As a group we can figure out what
   hosts and files need to be checked for this issue if a delete is needed.

diff --git a/docs/sysadmin-guide/sops/fedorapackages.rst b/docs/sysadmin-guide/sops/fedorapackages.rst
new file mode 100644
index 0000000..d3e0d59
--- /dev/null
+++ b/docs/sysadmin-guide/sops/fedorapackages.rst
@@ -0,0 +1,83 @@

.. title: Fedora Packages SOP
.. slug: infra-fedora-packages
.. date: 2012-02-23
.. taxonomy: Contributors/Infrastructure

===================
Fedora Packages SOP
===================

This SOP is for the Fedora Packages web application:
https://community.dev.fedoraproject.org/packages

Contents
========

1. Contact Information
2. Building a new release
3. Deploying to the development server
4. Hotfixing
5. Checking for AGPL violations

Contact Information
===================

Owner
    Luke Macken

Contact
    lmacken@redhat.com

Location
    PHX2

Servers
    community01.dev

Purpose
    Web interface for package information

Building a new release
======================

There is a helper script that lives in the fedoracommunity git repository
that automatically handles spinning up a new release, building it in mock,
and scp'ing it to batcave.
First, edit the version/release in the specfile and setup.py, then run::

    ./release

Deploying to the development server
===================================

There is a script in the fedoracommunity git repository called
'fcomm-dev-update' that you must first copy to the ansible server. You then
run it with the same arguments as the release script. This tool will sign the
RPMs, copy them into the infrastructure testing repo, update the repodata,
and then run a bunch of func commands to update the package on the dev server.

::

    ./fcomm-dev-update

Hotfixing
=========

If you wish to make a hotfix to the Fedora Packages application, simply
make your change in your local git repository, and then perform the building
and deployment steps above. This will still work even if you do not wish to
commit and push your change back upstream.

In order to ensure AGPL compliance, we DO NOT do ansible-based hotfixing for
Fedora Packages.

Checking for AGPL violations
============================

To remain AGPL compliant, we must ensure that all modifications to the code
are made available in the SRPM that we link to in the footer of the
application. You can easily query our app servers to determine if any
AGPL-violating code modifications have been made to the package::

    func-command --host="*app*" --host="community*" "rpm -V fedoracommunity"

You can safely ignore any changes to non-code files in the output. If any
violations are found, the Infrastructure Team should be notified immediately.

diff --git a/docs/sysadmin-guide/sops/fedorapastebin.rst b/docs/sysadmin-guide/sops/fedorapastebin.rst
new file mode 100644
index 0000000..7cc25f9
--- /dev/null
+++ b/docs/sysadmin-guide/sops/fedorapastebin.rst
@@ -0,0 +1,89 @@

.. title: Fedora Pastebin SOP
.. slug: infra-fpaste
.. date: 2013-04-15
.. taxonomy: Contributors/Infrastructure

===================
Fedora Pastebin SOP
===================

Contents
========

1. Contact Information
2. Introduction
3. Installation
4. Dashboard
5. Add a word to censored list


1. Contact Information
----------------------

Owner
    Fedora Infrastructure Team
Contact
    #fedora-admin
Persons
    athmane herlo
Sponsor
    nirik
Location
    phx2
Servers
    paste01.stg, paste01.dev
Purpose
    To host Fedora Pastebin


2. Introduction
---------------

Fedora pastebin is powered by sticky-notes, which is included in EPEL.

Fedora theming (skin) is included in the ansible role.


3. Installation
---------------

Sticky-notes needs a MySQL db and a user with 'select, update, delete,
insert' privileges.

It's recommended to dump and import the db from a working installation
to save time (skipping the installation and tweaking).

By default the installation is locked, i.e. you can't relaunch it.

However, you can unlock the installation by commenting out the line containing
``$gsod->trigger`` in ``/etc/sticky-notes/install.php`` and then pointing the
web browser to '/install'.

The configuration file containing general settings and DB credentials
is located at ``/etc/sticky-notes/config.php``.
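If you do have to create the database from scratch instead of importing a
dump, the minimal grant looks something like the following; the database
name, user and password here are hypothetical::

    # Create the db and a user with just the privileges sticky-notes needs
    mysql -u root -p -e "CREATE DATABASE stickynotes; \
        GRANT SELECT, INSERT, UPDATE, DELETE ON stickynotes.* \
        TO 'stickynotes'@'localhost' IDENTIFIED BY 'CHANGEME';"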
4. Dashboard
------------

Sticky-notes has a dashboard (URL: /admin/) that can be used to:

- Manage pastes:

  - deleting pastes
  - getting information about the paste author (IP, date/time, etc.)

- Manage users (aka admins) who can log into the dashboard
- Manage IP bans (add / delete banned IPs).
- Authentication (not needed)
- Site configuration:

  - General configuration (included in config.php).
  - Project Honey Pot configuration (not a FOSS service)
  - Word censor configuration: a list of words to be censored in pastes.

5. Add a word to censored list
------------------------------

If a word is in the censored list, any paste containing that word will be
rejected. To add one, edit the variable ``$sg_censor`` in the sticky-notes
configuration file::

    $sg_censor = "WORD1
    WORD2
    ...
    ...
    WORDn";

diff --git a/docs/sysadmin-guide/sops/fedorawebsites.rst b/docs/sysadmin-guide/sops/fedorawebsites.rst
new file mode 100644
index 0000000..a850361
--- /dev/null
+++ b/docs/sysadmin-guide/sops/fedorawebsites.rst
@@ -0,0 +1,314 @@

.. title: Websites Release SOP
.. slug: infra-websites
.. date: 2015-08-27
.. taxonomy: Contributors/Infrastructure

====================
Websites Release SOP
====================

* 1. Preparing the website for a release

  * 1.1 Obsolete GPG key of the EOL Fedora release
  * 1.2 Update GPG key

    * 1.2.1 Steps

* 2. Update website

  * 2.1 For Alpha
  * 2.2 For Beta
  * 2.3 For GA

* 3. Fire in the hole

* 4. Tips

  * 4.1 Merging branches


1. Preparing the website for a new release cycle

1.1 Obsolete GPG key

One month after a Fedora release, the release numbered 'FXX-2' will be EOL
(End of Life); i.e. one month after the F21 release, F19 will be EOL.
At this point we should drop the GPG key from the list in verify/ and move
the keys to the obsolete keys page in keys/obsolete.html.

1.2 Update GPG key

After another couple of weeks, as the next release approaches, watch
the fedora-release package for a new key to be added. Use the update-gpg-keys
script in the fedora-web git repository to add it to static/. Manually add it
to /keys and /verify on all websites where we use these keys:

* arm.fpo
* getfedora.org
* labs.fpo
* spins.fpo

1.2.1 Steps

a) Get a copy of the new key(s) from the fedora-release repo; you will
   find FXX-primary and FXX-secondary keys. Save them in ./tools to make the
   update easier.

   https://pagure.io/fedora-repos

b) Start by editing ./tools/update-gpg-keys and adding the key-ids of
   any obsolete keys to the obsolete_keys list.

c) Then run that script to add the new key(s) to the fedora.gpg block:

       fedora-web git:(master) cd tools/
       tools git:(master) ./update-gpg-keys RPM-GPG-KEY-fedora-23-primary
       tools git:(master) ./update-gpg-keys RPM-GPG-KEY-fedora-23-secondary

   This will add the key(s) to the keyblock in static/fedora.gpg and
   create a text file for the key in static/$KEYID.txt as well. Verify
   that these files have been created properly and contain all the keys
   that they should.

   * Handy checks: gpg static/fedora.gpg or gpg static/$KEYID.txt
   * Adding the "--with-fingerprint" option will add the fingerprint to
     the output

   The output of fedora.gpg should contain only the actual keys, not the
   obsolete keys.
   The single text files should contain the correct information for the
   uploaded key.

d) Next, add the new key(s) to the list in data/verify.html and move the new
   key information into the keys page in data/content/keys/index.html. A
   script to aid in generating the HTML code for new keys is in
   ./tools/make-gpg-key-html.
   It will print HTML to stdout for each RPM-GPG-KEY-* file given as an
   argument. This is suitable for copy/paste (or directly importing, if
   your editor supports this).
   Check the copied HTML code and select whether the key info is for a
   primary or secondary key (the output says 'Primary or Secondary').
       tools git:(master) ./make-gpg-key-html RPM-GPG-KEY-fedora-23-primary

   Build the website with 'make en test' and carefully verify that the
   data is correct. Please double check all keys at
   http://localhost:5000/en/keys and http://localhost:5000/en/verify.

   NOTE: the tool will give you an outdated output; adapt it to the new
   websites and bootstrap layout!


2. Update website

2.1 For Alpha

a) Create the fXX-alpha branch from master:

       fedora-web git:(master) git push origin master:refs/heads/f22-alpha

   and check out the new branch:

       fedora-web git:(master) git checkout -t -b f22-alpha origin/f22-alpha

b) Update the global variables.
   Change curr_state to Alpha for all arches.

c) Add the Alpha banner.
   Upload the FXX-Alpha banner to static/images/banners/f22alpha.png,
   which should appear in every ${PRODUCT}/download/index.html page.
   Make sure the banner is shown in all sidebars, also in labs, spins,
   and arm.

d) Check all download links and paths in ${PRODUCT}/prerelease/index.html.
   You can find all paths on bapp01 (sudo su - mirrormanager first) or
   you can look at the download page http://dl.fedoraproject.org/pub/alt/stage

e) Add CHECKSUM files to static/checksums and verify that the paths are
   correct. The files should be on sundries01 and you can query them with:

       $ find /pub/fedora/linux/releases/test/17-Alpha/ -type f -name \
         *CHECKSUM* -exec cp '{}' . \;

   Remember to add the right checksums to the right websites (same path).

f) Add EC2 AMI IDs for Alpha. All IDs are now in the globalvar.py file.
   We get all data from there, even the redirect path to track the AMI IDs.
   We now also have a script which is useful to get all the AMI IDs uploaded
   with fedimg. Execute it to get the latest uploads, but don't run the
   script too early, as new builds are added constantly.

       fedora-web git:(fXX-alpha) python ~/fedora-web/tools/get_ami.py

g) Add CHECKSUM files also to http://spins.fedoraproject.org in
   static/checksums. Verify the paths are correct in data/content/verify.html
   (see point e) for how to query them on sundries01). Same for labs.fpo and
   arm.fpo.

h) Verify all paths and links on http://spins.fpo, labs.fpo and arm.fpo.

i) Update Alpha image sizes and pre_cloud_composedate in
   ./build.d/globalvar.py. Verify they are right in the Cloud images and the
   Docker image.

j) Update the new POT files and push them to Zanata (ask a maintainer to do
   so) every time you change text strings.

k) Add this build to stg.fedoraproject.org (ansible syncStatic.sh.stg) to
   test the pages online.

l) Release Date:

   * Merge the fXX-alpha branch to master and correct conflicts manually
     (the merge-and-tag steps are sketched right after this section)
   * Remove the redirect of prerelease pages in ansible; edit
     ansible/playbooks/include/proxies-redirects.yml and
     ask a sysadmin-main to run the playbook
   * When ready, and about 90 minutes before Release Time, push to master
   * Tag the commit as a new release and push it too:

         $ git tag -a FXX-Alpha -m 'Releasing Fedora XX Alpha'
         $ git push --tags

   * If needed, follow "Fire in the hole" below.
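The merge-and-tag steps referenced above, spelled out for the f22-alpha
example used in this section; resolve any conflicts before pushing::

    $ git checkout master
    $ git merge f22-alpha          # fix conflicts manually, then commit
    $ git push origin master
    $ git tag -a F22-Alpha -m 'Releasing Fedora 22 Alpha'
    $ git push --tags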
2.2 For Beta

a) Create the fXX-beta branch from master:

       fedora-web git:(master) git push origin master:refs/heads/f22-beta

   and check out the new branch:

       fedora-web git:(master) git checkout -t -b f22-beta origin/f22-beta

b) Update the global variables.
   Change curr_state to Beta for all arches.

c) Add the Beta banner.
   Upload the FXX-Beta banner to static/images/banners/f22beta.png,
   which should appear in every ${PRODUCT}/download/index.html page.
   Make sure the banner is shown in all sidebars, also in labs, spins,
   and arm.

d) Check all download links and paths in ${PRODUCT}/prerelease/index.html.
   You can find all paths on bapp01 (sudo su - mirrormanager first) or
   you can look at the download page http://dl.fedoraproject.org/pub/alt/stage

e) Add CHECKSUM files to static/checksums and verify that the paths are
   correct. The files should be on sundries01 and you can query them with:

       $ find /pub/fedora/linux/releases/test/17-Beta/ -type f -name \
         *CHECKSUM* -exec cp '{}' . \;

   Remember to add the right checksums to the right websites (same path).

f) Add EC2 AMI IDs for Beta. All IDs are now in the globalvar.py file.
   We get all data from there, even the redirect path to track the AMI IDs.
   We now also have a script which is useful to get all the AMI IDs uploaded
   with fedimg. Execute it to get the latest uploads, but don't run the
   script too early, as new builds are added constantly.

       fedora-web git:(fXX-beta) python ~/fedora-web/tools/get_ami.py

g) Add CHECKSUM files also to http://spins.fedoraproject.org in
   static/checksums. Verify the paths are correct in data/content/verify.html
   (see point e) for how to query them on sundries01). Same for labs.fpo and
   arm.fpo.

h) Remove static/checksums/Fedora-XX-Alpha* on all websites.

i) Verify all paths and links on http://spins.fpo, labs.fpo and arm.fpo.

j) Update Beta image sizes and pre_cloud_composedate in
   ./build.d/globalvar.py. Verify they are right in the Cloud images and the
   Docker image.

k) Update the new POT files and push them to Zanata (ask a maintainer to do
   so) every time you change text strings.

l) Add this build to stg.fedoraproject.org (ansible syncStatic.sh.stg) to
   test the pages online.

m) Release Date:

   * Merge the fXX-beta branch to master and correct conflicts manually
   * When ready, and about 90 minutes before Release Time, push to master
   * Tag the commit as a new release and push it too:

         $ git tag -a FXX-Beta -m 'Releasing Fedora XX Beta'
         $ git push --tags

   * If needed, follow "Fire in the hole" below.


2.3 For GA

a) Create the fXX branch from master:

       fedora-web git:(master) git push origin master:refs/heads/f22

   and check out the new branch:

       fedora-web git:(master) git checkout -t -b f22 origin/f22

b) Update the global variables.
   Change curr_state for all arches.

c) Check all download links and paths in ${PRODUCT}/download/index.html.
   You can find all paths on bapp01 (sudo su - mirrormanager first) or
   you can look at the download page http://dl.fedoraproject.org/pub/alt/stage

d) Add CHECKSUM files to static/checksums and verify that the paths are
   correct. The files should be on sundries01 and you can query them with:

       $ find /pub/fedora/linux/releases/17/ -type f -name \
         *CHECKSUM* -exec cp '{}' . \;

   Remember to add the right checksums to the right websites (same path).

e) At some point, freeze translations. Add an empty PO_FREEZE file to every
   website's directory you want to freeze.
f) Add EC2 AMI IDs for GA. All IDs are now in the globalvar.py file.
   We get all data from there, even the redirect path to track the AMI IDs.
   We now also have a script which is useful to get all the AMI IDs uploaded
   with fedimg. Execute it to get the latest uploads, but don't run the
   script too early, as new builds are added constantly.

       fedora-web git:(fXX) python ~/fedora-web/tools/get_ami.py

g) Add CHECKSUM files also to http://spins.fedoraproject.org in
   static/checksums. Verify the paths are correct in data/content/verify.html
   (see point e) for how to query them on sundries01). Same for labs.fpo and
   arm.fpo.

h) Remove static/checksums/Fedora-XX-Beta* on all websites.

i) Verify all paths and links on http://spins.fpo, labs.fpo and arm.fpo.

j) Update GA image sizes and cloud_composedate in ./build.d/globalvar.py.
   Verify they are right in the Cloud images and the Docker image.

k) Update static/js/checksum.js and check that the paths and checksums
   still match.

l) Update the new POT files and push them to Zanata (ask a maintainer to do
   so) every time you change text strings.

m) Add this build to stg.fedoraproject.org (ansible syncStatic.sh.stg) to
   test the pages online.

n) Release Date:

   * Merge the fXX branch to master and correct conflicts manually
   * Add the redirect of prerelease pages in ansible; edit
     ansible/playbooks/include/proxies-redirects.yml and
     ask a sysadmin-main to run the playbook
   * Unfreeze translations by deleting the PO_FREEZE files
   * When ready, and about 90 minutes before Release Time, push to master
   * Update the short links for the Cloud images for 'Fedora XX', 'Fedora
     XX-1' and 'Latest'
   * Tag the commit as a new release and push it too:

         $ git tag -a FXX -m 'Releasing Fedora XX'
         $ git push --tags

   * If needed, follow "Fire in the hole" below.


3. Fire in the hole

We now use ansible for everything and normally use a regular build to make
the websites live. If something is not happening as expected, you should get
in contact with a sysadmin-main to run the ansible playbook again.

All our stuff, such as the SyncStatic.sh and SyncTranslation.sh scripts, is
now also in ansible!

Staging server app02 and production server bapp01 do not exist anymore; our
staging websites are now on sundries01.stg and production on sundries01.
Change your scripts accordingly; as sysadmin-web you should have access to
those servers as before.


4. Tips

4.1 Merging branches

Suggested by Ricky.
This can be useful if you're *sure* all new changes on the devel branch
should go into the master branch. Conflicts will be resolved by accepting
only the changes on the devel branch.
If you're not 100% sure, do a normal merge and fix conflicts manually!

    $ git merge f22-beta
    $ git checkout --theirs -- [list of conflicting po files]
    $ git commit

diff --git a/docs/sysadmin-guide/sops/fmn.rst b/docs/sysadmin-guide/sops/fmn.rst
new file mode 100644
index 0000000..5466a7d
--- /dev/null
+++ b/docs/sysadmin-guide/sops/fmn.rst
@@ -0,0 +1,81 @@

.. title: fedmsg Notifications SOP
.. slug: infra-fmn
.. date: 2015-03-24
.. taxonomy: Contributors/Infrastructure

==============================
fmn (fedmsg notifications) SOP
==============================

Route individualized notifications to Fedora contributors over email and IRC.
Contact Information
-------------------

Owner
    Messaging SIG, Fedora Infrastructure Team
Contact
    #fedora-apps, #fedora-fedmsg, #fedora-admin, #fedora-noc
Servers
    notifs-backend01, notifs-web0{1,2}
Purpose
    Route notifications to users

Description
-----------

fmn is a pair of systems intended to route fedmsg notifications to Fedora
contributors and users.

There is a web interface running on notifs-web01 and notifs-web02 that
allows users to log in and configure their preferences to select this or
that type of message.

There is a backend running on notifs-backend01 where most of the work is
done.

The backend process is a 'fedmsg-hub' daemon, controlled by systemd.

Disable an account (on notifs-backend01)::

    $ sudo -u fedmsg /usr/local/bin/fmn-disable-account USERNAME

Restart::

    $ sudo systemctl restart fedmsg-hub

Watch logs::

    $ sudo journalctl -u fedmsg-hub -f

Configuration::

    $ ls /etc/fedmsg.d/
    $ sudo fedmsg-config | less

Monitor performance::

    http://threebean.org/fedmsg-health-day.html#FMN

Upgrade (from batcave)::

    $ sudo -i ansible-playbook /srv/web/infra/ansible/playbooks/manual/upgrade/fmn.yml

Mailing Lists
-------------

We use FMN as a way to forward certain kinds of messages to mailing lists so
people can read them the good old-fashioned way that they like to. To
accomplish this, we create 'bot' FAS accounts with their own FMN profiles and
we set their email addresses to the lists in question.

If you need to change the way some set of messages is forwarded, you can do
it from the FMN web interface (if you are an FMN admin as defined in the
config file in roles/notifs/frontend/). You can navigate to
https://apps.fedoraproject.org/notifications/USERNAME.id.fedoraproject.org to
do this.

If the account already exists as a FAS user (for instance, the ``virtmaint``
user) but does not yet exist in FMN, you can add it to the FMN database by
logging in to notifs-backend01 and running ``fmn-create-user --email
DESTINATION@EMAIL.COM --create-defaults FAS_USERNAME``.

diff --git a/docs/sysadmin-guide/sops/freemedia.rst b/docs/sysadmin-guide/sops/freemedia.rst
new file mode 100644
index 0000000..cb51f7c
--- /dev/null
+++ b/docs/sysadmin-guide/sops/freemedia.rst
@@ -0,0 +1,194 @@

.. title: FreeMedia Infrastructure SOP
.. slug: infra-freemedia
.. date: 2014-12-18
.. taxonomy: Contributors/Infrastructure

============================
FreeMedia Infrastructure SOP
============================

This page defines the SOP for the Fedora FreeMedia program. It covers the
infrastructural as well as the procedural side.

Contents
========

1. Location of Resources
2. Location on Ansible
3. Opening of the form
4. Closing of the Form
5. Tentative timeline
6. How to

   1. Open
   2. Close

7. Handling of tickets

   1. Login
   2. Rejecting Invalid Tickets
   3. Accepting Valid Tickets

8. Handling of non fulfilled requests
9. How to handle membership applications

Location of Resources
=====================

* The web form is at
  https://fedoraproject.org/freemedia/FreeMedia-form.html
* The Trac instance is at https://fedorahosted.org/freemedia/report

Location on ansible
===================

$PWD = ``roles/freemedia/files``

Freemedia form
    FreeMedia-form.html
Backup form
    FreeMedia-form.html.orig
Closed form
    FreeMedia-close.html
Backend processing script
    process.php
Error Document
    FreeMedia-error.html

Opening of the form
===================

The form will be opened on the first day of each month.
Closing of the Form
===================

Tentative timeline
------------------

The form will be closed after a couple of days. This may vary according to
capacity.

How to
======

* The form is available at
  ``roles/freemedia/files/FreeMedia-form.html`` and
  ``roles/freemedia/files/FreeMedia-form.html.orig``

* The closed form is at
  ``roles/freemedia/files/FreeMedia-close.html``

Open
----

* Go to roles/freemedia/tasks
* Open ``main.yml``
* Go to line 32.
* To open, change the line to read::

      src="FreeMedia-form.html"

* After opening the form, go to trac and grant the "Ticket Create and
  Ticket View" privilege to "Anonymous".

Close
-----

* Go to roles/freemedia/tasks
* Open main.yml
* Go to line 32.
* To close, change the line to read::

      src="FreeMedia-close.html"

* After closing the form, go to trac and remove the "Ticket Create and
  Ticket View" privilege from "Anonymous".

.. note::
   * Have to check about monthly cron.
   * Have to write about changing init.pp for closing and opening

Handling of tickets
===================

Login
-----

* Contributors are requested to visit
  https://fedorahosted.org/freemedia/report
* Please log in with your FAS account.

Rejecting Invalid Tickets
-------------------------

* If a ticket is invalid, don't accept the request. Go to "resolve as:",
  select "invalid" and then press "Submit Changes".

* A ticket is invalid if:

  * No valid email-id is provided.
  * The region does not match the country.
  * No proper address is given.

* If a ticket is a duplicate, accept one copy and close the others as
  duplicates: go to "resolve as:", select "duplicate" and then press
  "Submit Changes".

Accepting Valid Tickets
-----------------------

* If you wish to fulfill a request, check it against the section above to
  ensure it is not liable to be discarded.

* Now "Accept" the ticket from the "Action" field at the bottom, and
  press the "Submit Changes" button.

* These accepted tickets will be available at
  https://fedorahosted.org/freemedia/report under both "My Tickets"
  and "Accepted Tickets for XX" (XX = your region, e.g. APAC).

* When you ship the request, please go to the ticket again, go to
  "resolve as:" in the "Action" field, select "Fixed" and then
  press "Submit Changes".

* If an accepted ticket is not finalised by the end of the month, it
  should be closed with "shipping status unknown" in a comment.

Handling of non fulfilled requests
----------------------------------

We shall close all the pending requests by the end of the month.

* Please check your region.

How to handle membership applications
-------------------------------------

Steps to become a member of the Free-media group:

1. Create an account in the Fedora Account System (FAS).
2. Create a user page on the Fedora wiki with contact data, like
   User:<username>. There are templates.
3. Apply to the Free-Media group in FAS.
4. Apply for the Free-Media mailing list subscription.

Rules for deciding over membership applications
```````````````````````````````````````````````

======= ================ ========== =============== ==========================
Case    Applied to       User Page  Applied to      Action
        Free-Media Group Created    Free-Media List
======= ================ ========== =============== ==========================
1       Yes              Yes        Yes             Approve group and mailing
                                                    list applications
------- ---------------- ---------- --------------- --------------------------
2       Yes              Yes        No              Put on hold; write to the
                                                    applicant to subscribe to
                                                    the list; within a week
------- ---------------- ---------- --------------- --------------------------
3       Yes              No         whatever        Put on hold; write to the
                                                    applicant to make a user
                                                    page; within a week
------- ---------------- ---------- --------------- --------------------------
4       No               No         Yes             Reject
======= ================ ========== =============== ==========================

.. note::
   1. As you need to have a FAS account for steps 2 and 3, this is not
      included in the decision rules above.
   2. The time to be on hold is one week. If no action is taken after one
      week, the application has to be rejected.
   3. When writing to ask the applicant to fulfil the remaining steps, CC
      the other Free-media sponsors to let them know the application has
      been reviewed.

diff --git a/docs/sysadmin-guide/sops/freenode-irc-channel.rst b/docs/sysadmin-guide/sops/freenode-irc-channel.rst
new file mode 100644
index 0000000..539a3b9
--- /dev/null
+++ b/docs/sysadmin-guide/sops/freenode-irc-channel.rst
@@ -0,0 +1,88 @@

.. title: Freenode IRC SOP
.. slug: infra-freenode
.. date: 2013-11-08
.. taxonomy: Contributors/Infrastructure

=======================================
Freenode IRC Channel Infrastructure SOP
=======================================

Fedora uses the freenode IRC network for its IRC communications. If you
want to make a new Fedora-related IRC channel, please follow these
guidelines.

Contents
========

1. Contact Information
2. Is a new channel needed?
3. Adding new channel
4. Recovering/fixing an existing channel

Contact Information
===================

Owner:
    Fedora Infrastructure Team
Contact:
    #fedora-admin
Location:
    freenode
Servers:
    none
Purpose:
    Provides a channel for Fedora contributors to use.

Is a new channel needed?
========================

First you should see if one of the existing Fedora channels will meet your
needs. Adding a new channel can give you a less noisy place to focus on
something, but at the cost of fewer people being involved. If your
topic/area is development-related, perhaps the main #fedora-devel channel
will meet your needs?

Adding new channel
==================

* Make sure the channel is in the #fedora-* namespace. This allows the
  Fedora Group Coordinator to make changes to it if needed.

* Found the channel. You do this by /join #channelname, then /msg
  chanserv register #channelname

* Set up GUARD mode. This allows ChanServ to be in the channel for easier
  management: ``/msg chanserv set #channel GUARD on``

* Add some other operators/managers to the access list. This will allow
  them to manage the channel if you are asleep or absent::

      /msg chanserv access #channel add NICK +ARfiorstv

  You can see what the various flags mean at
  http://toxin.jottit.com/freenode_chanserv_commands#cs03

  You may want to consider adding some or all of the folks in #fedora-ops
  who manage other channels to help you with yours.
  You can see this list with ``/msg chanserv access #fedora-ops list``.

* Set default modes:
  ``/msg chanserv set mlock #channel +Ccnt``
  (The t for topic lock is optional, if your channel would like
  to have people change the topic often.)

* If your channel is of general interest, add it to the main communicate
  page of IRC channels, and possibly announce it to your target
  audience.

* You may want to request that zodbot join your channel if you need its
  functions. You can request that in #fedora-admin.

Recovering/fixing an existing channel
=====================================

If there is an existing channel in the #fedora-* namespace that has a
missing founder/operator, please contact the Fedora Group Coordinator,
User:Spot, and request that it be reassigned. Once done, follow the above
procedure on the channel so it is set up and has enough
operators/managers to not need reassigning again.

diff --git a/docs/sysadmin-guide/sops/gather-easyfix.rst b/docs/sysadmin-guide/sops/gather-easyfix.rst
new file mode 100644
index 0000000..66b252f
--- /dev/null
+++ b/docs/sysadmin-guide/sops/gather-easyfix.rst
@@ -0,0 +1,49 @@

.. title: gather-easyfix SOP
.. slug: infra-gather-easyfix
.. date: 2016-03-14
.. taxonomy: Contributors/Infrastructure

=========================
Fedora gather easyfix SOP
=========================

Fedora-gather-easyfix, as the name says, gathers tickets marked as easyfix
from multiple sources (currently pagure, github and fedorahosted), providing
a single place for newcomers to find small tasks to work on.


Contents
========

1. Contact Information
2. Documentation Links

Contact Information
===================

Owner
    Fedora Infrastructure Team
Contact
    #fedora-admin
Location
    http://fedoraproject.org/easyfix/
Servers
    sundries01, sundries02, sundries01.stg
Purpose
    Gather easyfix tickets from multiple sources.


Upstream sources are hosted on github at:
https://github.com/fedora-infra/fedora-gather-easyfix/

The files are then mirrored to our ansible repo, under the ``easyfix/gather``
role.

The project is a simple script, ``gather_easyfix.py``, gathering information
from the projects listed on the `Fedora wiki
<https://fedoraproject.org/wiki/Easyfix>`_ and outputting a single html file.
This html file is then improved via the css and javascript files present in
the sources.

The generated html file, together with the css and js files, is then synced
to the proxies for public consumption :)
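If you ever need to regenerate the page by hand (for example, to test a
change to the wiki list), the script can be run standalone. A hypothetical
invocation, since the ansible role normally drives this::

    git clone https://github.com/fedora-infra/fedora-gather-easyfix.git
    cd fedora-gather-easyfix
    python gather_easyfix.py   # writes the single html file described above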
+
+
+Contact Information
+====================
+
+Owner
+    Fedora Infrastructure Team
+Contact
+    #fedora-apps, #fedora-admin, #fedora-noc
+Location
+    https://geoip.fedoraproject.org
+Servers
+    sundries*, sundries*-stg
+Purpose
+    A simple web service that returns geoip information as a JSON-formatted
+    dictionary in utf-8. In particular, it's used by anaconda[1] to get the
+    most probable territory code, based on the public IP of the caller.
+
+Basic Function
+==============
+
+- Users go to https://geoip.fedoraproject.org/city
+
+- The website is exposed via ``/etc/httpd/conf.d/geoip-city-wsgi-proxy.conf``.
+
+- Returns a string with geoip information as a JSON-formatted dict in utf-8.
+
+- It also currently accepts one override: ?ip=xxx.xxx.xxx.xxx, e.g.
+  https://geoip.fedoraproject.org/city?ip=18.0.0.1 which then uses the passed
+  IP address instead of the determined IP address of the client.
+
+
+Ansible Roles
+==============
+The geoip-city-wsgi role
+https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/roles/geoip-city-wsgi
+is present in the sundries playbook
+https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/playbooks/groups/sundries.yml
+
+The proxy tasks are present in
+https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/playbooks/include/proxies-reverseproxy.yml
+
+Apps depending on geoip-city-wsgi
+=================================
+unknown
+
+Documentation Links
+===================
+
+app: https://geoip.fedoraproject.org
+source: https://github.com/fedora-infra/geoip-city-wsgi
+bugs: https://github.com/fedora-infra/geoip-city-wsgi/issues
+Role: https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/roles/geoip-city-wsgi
+[1] https://fedoraproject.org/wiki/Anaconda
+
diff --git a/docs/sysadmin-guide/sops/github.rst b/docs/sysadmin-guide/sops/github.rst
new file mode 100644
index 0000000..3a4f182
--- /dev/null
+++ b/docs/sysadmin-guide/sops/github.rst
@@ -0,0 +1,77 @@
+.. title: Fedora Infrastructure Github SOP
+.. slug: infra-githup
+.. date: 2014-09-26
+.. taxonomy: Contributors/Infrastructure
+
+===============================
+Using github for Infra Projects
+===============================
+
+We're presently using github to host git repositories and issue tracking for
+some infrastructure projects. Anything we need to know should be recorded
+here.
+
+---------------------
+Setting up a new repo
+---------------------
+
+Create projects inside of the fedora-infra group:
+
+https://github.com/fedora-infra
+
+That will allow us to more easily track what projects we have.
+
+[TODO] How do we create a new project and import it?
+
+- After creating a new repo, click on the Settings tab to set up some fancy
+  things.
+
+  If using git-flow for your project:
+
+  - Set the default branch from 'master' to 'develop'. Having the default
+    branch be develop is nice: new contributors will automatically start
+    committing there if they're not paying attention to what branch they're
+    on. You almost never want to commit directly to the master branch.
+
+    If a develop branch does not exist, you should create one by
+    branching off of master::
+
+      $ git clone GIT_URL
+      $ git checkout -b develop
+      $ git push --all
+
+  - Set up an IRC hook for notifications. From the "settings" tab click on
+    "Webhooks & Services." Under the "Add Service" dropdown, find "IRC" and
+    click it. You might need to enter your password.
+ In the form, you probably want the following values: + + - Server, irc.freenode.net + - Port, 6697 + - Room, #fedora-apps + - Nick, + - Branch Regexes, + - Password, + - Ssl, + - Message Without Join, + - No Colors, + - Long Url, + - Notice, + - Active, + + +Add an EasyFix label +==================== + +The EasyFix label is used to mark bugs that are potentially fixable by new +contributors getting used to our source code or relatively new to python +programming. GitHub doesn't provide this label automatically so we have to +add it. You can add the label from the issues page of the repository or use +this curl command to add it:: + + curl -k -u '$GITHUB_USERNAME:$GITHUB_PASSWORD' https://api.github.com/repos/fedora-infra/python-fedora/labels -H "Content-Type: application/json" -d '{"name":"EasyFix","color":"3b6eb4"}' + +Please try to use the same color for consistency between Fedora Infrastructure +Projects. You can then add the github repo to the list that +easyfix.fedoraproject.org scans for easyfix tickets here: + +https://fedoraproject.org/wiki/Easyfix diff --git a/docs/sysadmin-guide/sops/github2fedmsg.rst b/docs/sysadmin-guide/sops/github2fedmsg.rst new file mode 100644 index 0000000..16c6854 --- /dev/null +++ b/docs/sysadmin-guide/sops/github2fedmsg.rst @@ -0,0 +1,62 @@ +.. title: github2fedmsg SOP +.. slug: infra-github2fedmsg +.. date: 2016-04-08 +.. taxonomy: Contributors/Infrastructure + +================= +github2fedmsg SOP +================= + +Bridge github events onto our fedmsg bus. + +App: https://apps.fedoraproject.org/github2fedmsg/ +Source: https://github.com/fedora-infra/github2fedmsg/ + +Contact Information +------------------- + +Owner + Fedora Infrastructure Team +Contact + #fedora-apps, #fedora-admin, #fedora-noc +Servers + github2fedmsg01 +Purpose + Bridge github events onto our fedmsg bus. + +Description +----------- + +github2fedmsg is a small Python Pyramid app that bridges github events onto our +fedmsg bus by way of github's "webhooks" feature. It is what allows us to have +IRC notifications of github activity via fedmsg. It has two phases of +operation: + +- Infrequently, a user will log in to github2fedmsg via Fedora OpenID. They + then push a button to also log in to github.com. They are then logged in to + github2fedmsg with *both* their FAS account and their github account. + + They are then presented with a list of their github repositories. They can + toggle each one: "on" or "off". When they turn a repo on, our webapp makes a + request to github.com to install a "webhook" for that repo with a callback URL + to our app. + +- When events happen to that repo on github.com, github looks up our callback + URL and makes an http POST request to us, informing us of the event. Our + github2fedmsg app receives that, validates it, and then republishes the + content to our fedmsg bus. + +What could go wrong? +-------------------- + +- Restarting the app or rebooting the host shouldn't cause a problem. It should + come right back up. + +- Our database could die. We have a db with a list of all the repos we have + turned on and off. We would want to restore that from backup. + +- If github gets compromised, they might have to revoke all of their application + credentials. In that case, our app would fail to work. There are *lots* of + private secrets set in our private repo that allow our app to talk to + github.com. There are inline comments there with instructions about how to + generate new keys and secrets. 
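+
+If you have just restarted the app or rebooted the host and want a quick
+sanity check, an HTTP probe of the front page is usually enough. A minimal
+sketch; the expectation of a plain 200 response is an assumption, not part
+of any formal health API::
+
+    # should print 200 when the app is up and serving requests
+    curl -s -o /dev/null -w '%{http_code}\n' https://apps.fedoraproject.org/github2fedmsg/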
diff --git a/docs/sysadmin-guide/sops/gitweb.rst b/docs/sysadmin-guide/sops/gitweb.rst
new file mode 100644
index 0000000..9df9172
--- /dev/null
+++ b/docs/sysadmin-guide/sops/gitweb.rst
@@ -0,0 +1,38 @@
+.. title: Gitweb Infrastructure SOP
+.. slug: infra-gitweb
+.. date: 2011-08-23
+.. taxonomy: Contributors/Infrastructure
+
+=========================
+Gitweb Infrastructure SOP
+=========================
+
+Gitweb-caching is the web interface we use to expose git to the web at
+http://git.fedorahosted.org/git/
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+Contact
+    #fedora-admin, sysadmin-hosted
+Location
+    Serverbeach
+Servers
+    hosted[1-2]
+Purpose
+    HTTP access to git sources.
+
+Basic Function
+==============
+
+- Users go to http://git.fedorahosted.org/git/
+
+- Pages are generated from cache stored in ``/var/cache/gitweb-caching/``.
+
+- The website is exposed via ``/etc/httpd/conf.d/git.fedoraproject.org.conf``.
+
+- Main config file is ``/var/www/gitweb-caching/gitweb_config.pl``.
+  This pulls git repos from /git/.
+
diff --git a/docs/sysadmin-guide/sops/guestdisk.rst b/docs/sysadmin-guide/sops/guestdisk.rst
new file mode 100644
index 0000000..17dd70e
--- /dev/null
+++ b/docs/sysadmin-guide/sops/guestdisk.rst
@@ -0,0 +1,116 @@
+.. title: Guest Disk Resize SOP
+.. slug: infra-guest-disk-resize
+.. date: 2012-06-13
+.. taxonomy: Contributors/Infrastructure
+
+=====================
+Guest Disk Resize SOP
+=====================
+
+Resize disks in our kvm guests
+
+Contents
+========
+
+1. Contact Information
+2. How to do it
+
+   1. KVM/libvirt Guests
+
+Contact Information
+===================
+
+Owner:
+    Fedora Infrastructure Team
+Contact:
+    #fedora-admin, sysadmin-main
+Location:
+    PHX, Tummy, ibiblio, Telia, OSUOSL
+Servers:
+    All xen servers, kvm/libvirt servers.
+Purpose:
+    Resize guest disks
+
+How to do it
+============
+
+KVM/libvirt Guests
+------------------
+
+1. SSH to the kvm server and resize the guest's logical volume. If you
+   want to be extra careful, make a snapshot of the LV first::
+
+       lvcreate -n [guest name]-snap -L 10G -s /dev/VolGroup00/[guest name]
+
+   Optional, but always good to be careful.
+
+2. Shutdown the guest::
+
+       sudo virsh shutdown [guest name]
+
+3. Disable the guest's lv::
+
+       lvchange -an /dev/VolGroup00/[guest name]
+
+4. Resize the lv::
+
+       lvresize -L [NEW TOTAL SIZE]G /dev/VolGroup00/[guest name]
+
+   or, to add X GB to the disk::
+
+       lvresize -L +XG /dev/VolGroup00/[guest name]
+
+5. Enable the lv::
+
+       lvchange -ay /dev/VolGroup00/[guest name]
+
+6. Bring the guest back up::
+
+       sudo virsh start [guest name]
+
+7. Log in to the guest::
+
+       sudo virsh console [guest name]
+
+   You may wish to boot single user mode to avoid services coming up and
+   going down again.
+
+8. On the guest, run::
+
+       fdisk /dev/vda
+
+9. Delete the LVM partition on the guest you want to add space to and
+   recreate it with the maximum size. Make sure to set its type to
+   Linux LVM (8e).
+
+10. Run partprobe::
+
+        partprobe
+
+11. Check the size of the partition::
+
+        fdisk -l /dev/vdaN
+
+    If this still reflects the old size, then reboot the guest and verify
+    that its size changed correctly when it comes up again.
+
+12. Login to the guest again, and run::
+
+        pvresize /dev/vdaN
+
+13. A vgs should now show the new size. Use lvresize to resize the root lv::
+
+        lvresize -L [new root partition size]G /dev/GuestVolGroup00/root
+
+    (pvs will tell you how much space is available)
+
+14. Finally, resize the root partition::
+
+        resize2fs /dev/GuestVolGroup00/root
+        (If the root fs is ext4)
+
+    or
+
+        xfs_growfs /dev/GuestVolGroup00/root
+        (if the root fs is xfs)
+
+    Verify that everything worked out, and delete the snapshot you made
+    if you made one.
diff --git a/docs/sysadmin-guide/sops/guestedit.rst b/docs/sysadmin-guide/sops/guestedit.rst
new file mode 100644
index 0000000..bcca35f
--- /dev/null
+++ b/docs/sysadmin-guide/sops/guestedit.rst
@@ -0,0 +1,72 @@
+.. title: Guest Editing SOP
+.. slug: infra-guest-editing
+.. date: 2012-04-23
+.. taxonomy: Contributors/Infrastructure
+
+=================
+Guest Editing SOP
+=================
+
+Various virsh commands
+
+Contents
+========
+
+1. Contact Information
+2. How to do it
+
+   1. add/remove cpus
+   2. resize memory
+
+Contact Information
+===================
+
+Owner:
+    Fedora Infrastructure Team
+Contact:
+    #fedora-admin, sysadmin-main
+Location:
+    PHX, Tummy, ibiblio, Telia, OSUOSL
+Servers:
+    All xen servers, kvm/libvirt servers.
+Purpose:
+    Adjust guest cpu and memory allocations
+
+How to do it
+=============
+
+Add cpu
+-------
+
+1. SSH to the virthost server
+
+2. Calculate the number of CPUs the system needs
+
+3. ``sudo virsh setvcpus --config`` - ie::
+
+       sudo virsh setvcpus bapp01 16 --config
+
+4. Shutdown the virtual system
+
+5. Start the virtual system
+
+6. Login and check that cpu count matches
+
+
+Resize memory
+-------------
+
+1. SSH to the virthost server
+
+2. Calculate the amount of memory the system needs in kb
+
+3. ``sudo virsh setmem --config`` - ie::
+
+       sudo virsh setmem bapp01 16777216 --config
+
+4. Shutdown the virtual system
+
+5. Start the virtual system
+
+6. Login and check that memory matches
+
diff --git a/docs/sysadmin-guide/sops/guestmigrate.rst b/docs/sysadmin-guide/sops/guestmigrate.rst
new file mode 100644
index 0000000..4df35ad
--- /dev/null
+++ b/docs/sysadmin-guide/sops/guestmigrate.rst
@@ -0,0 +1,90 @@
+.. title: Guest Migration SOP
+.. slug: infra-guest-migration
+.. date: 2011-10-07
+.. taxonomy: Contributors/Infrastructure
+
+==============================
+Guest migration between hosts.
+==============================
+
+Move guests from one host to another.
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+
+Contact
+    #fedora-admin, sysadmin-main
+
+Location
+    PHX, Tummy, ibiblio, Telia, OSUOSL
+
+Servers
+    All xen servers, kvm/libvirt servers.
+
+Purpose
+    Migrate guests
+
+How to do it
+============
+
+1. Schedule outage time if any. This will need to be long enough to copy
+   the data from one host to another, so will depend on guest disk
+   size.
+
+2. Turn off monitoring in nagios.
+
+3. On the new host, create disk space for the server::
+
+       lvcreate -n app03 -L 32G vg_guests00
+
+4. Prepare the old guest for migration:
+
+   a) if the system is xen, install a regular kernel
+   b) look for entries for xenblk and hvc0 in /etc files
+
+5. Shutdown the guest.
+
+6. Dump the guest xml::
+
+       virsh dumpxml guestname > guest.xml
+
+7. Copy guest.xml to the new machine. You will need to make various
+   edits depending on if the system was originally xen or such. I
+   normally need to compare an existing xml on the target system and the
+   one we dumped out to make up the differences.
+
+8. Define the guest on the new machine: 'virsh define guest.xml'.
+   Depending on the changes in the xml this may not work and you will
+   need to make many manual changes plus copy the guest.xml to
+   ``/etc/libvirt/qemu`` and do a ``/sbin/service libvirtd restart``
+
+9. Insert iptables rule for the nc transfer::
+
+       iptables -I INPUT 14 -s <source host IP> -m tcp -p tcp --dport 11111 -j ACCEPT
+
+10. On the destination host:
+
+    - RHEL-5::
+
+        nc -l -p 11111 | dd of=/dev/mapper/<guest-partition>
+
+    - RHEL-6::
+
+        nc -l 11111 | dd of=/dev/mapper/<guest-partition>
+
+11. On the source host::
+
+        dd if=/dev/mapper/<guest-partition> | nc desthost 11111
+
+    Wait for the copy to finish. You can track how far it has gone by
+    finding the dd pid and then sending a 'kill -USR1' to it.
+
+12. Start the guest on the new host::
+
+        virsh start guest
+
+13. On the source host, rename the storage and undefine the guest so it's
+    not started again.
+
diff --git a/docs/sysadmin-guide/sops/haproxy.rst b/docs/sysadmin-guide/sops/haproxy.rst
new file mode 100644
index 0000000..4ed26c1
--- /dev/null
+++ b/docs/sysadmin-guide/sops/haproxy.rst
@@ -0,0 +1,156 @@
+.. title: haproxy Infrastructure SOP
+.. slug: infra-haproxy
+.. date: 2011-10-03
+.. taxonomy: Contributors/Infrastructure
+
+==========================
+Haproxy Infrastructure SOP
+==========================
+
+haproxy is an application that does load balancing at the tcp layer or at
+the http layer. It can do generic tcp balancing but it specializes in
+http balancing. Our proxy servers are still running apache, and that is
+what our users connect to. But instead of using mod_proxy_balancer and
+ProxyPass balancer://, we do a ProxyPass to http://localhost:10001/ or
+http://localhost:10002/. haproxy must be told to listen to an
+individual port for each farm. All haproxy farms are listed in
+/etc/haproxy/haproxy.cfg.
+
+Contents
+--------
+
+1. Contact Information
+2. How it works
+3. Configuration example
+4. Stats
+5. Advanced Usage
+
+Contact Information
+-------------------
+
+Owner:
+    Fedora Infrastructure Team
+Contact:
+    #fedora-admin, sysadmin-main, sysadmin-web group
+Location:
+    Phoenix, Tummy, Telia
+Servers:
+    proxy1, proxy2, proxy3, proxy4, proxy5
+Purpose:
+    Provides load balancing from the proxy layer to our application
+    layer.
+
+How it works
+------------
+
+haproxy is a load balancer. If you're familiar, this section won't be that
+interesting. haproxy in its normal usage acts just like a web server: it
+listens on a port for requests. Unlike most webservers though, it then
+sends that request to one of our back end application servers and sends
+the response back. This is referred to as reverse proxying. We typically
+configure haproxy to send a check to a specific url and look for the
+response code. If no check url is configured, it just does basic checks
+against /. In most of our configurations we're using round robin
+balancing. IE, request 1 goes to app1, request 2 goes to app2, request 3
+goes to app3, request 4 goes to app1, and the whole process repeats.
+
+.. warning::
+    These checks do add load to the app servers, as well as additional
+    connections. Be smart about which url you're checking as it gets checked
+    often. Also be sure to verify the application servers can handle your new
+    settings, and monitor them closely for the hour or two after you make
+    changes.
+
+Configuration example
+---------------------
+
+The below example is how our fedoraproject wiki could be configured. Each
+application should have its own farm.
+Even though it may have an identical configuration to another farm, this
+allows easy addition and subtraction of specific nodes when we need
+them::
+
+    listen  fpo-wiki 0.0.0.0:10001
+        balance roundrobin
+        server  app1 app1.fedora.phx.redhat.com:80 check inter 2s rise 2 fall 5
+        server  app2 app2.fedora.phx.redhat.com:80 check inter 2s rise 2 fall 5
+        server  app4 app4.fedora.phx.redhat.com:80 backup check inter 2s rise 2 fall 5
+        option  httpchk GET /wiki/Infrastructure
+
+* The first line "listen ...." says to create a farm called 'fpo-wiki',
+  listening on all IPs on port 10001. fpo-wiki can be arbitrary but make it
+  something obvious. Aside from that the important bit is :10001. Always
+  make sure that when creating a new farm, it is listening on a unique port.
+  In Fedora's case we're starting at 10001, and moving up by one. Just check
+  the config file for the lowest open port above 10001.
+
+* The next line "balance roundrobin" says to use round robin balancing.
+
+* The server lines each add a new node to the balancer farm. In this case
+  the wiki is being served from app1, app2 and app4. If the wiki is
+  available at http://app1.fedora.phx.redhat.com/wiki/ then this config
+  would be used in conjunction with "RewriteRule ^/wiki/(.*)
+  http://localhost:10001/wiki/$1 [P,L]".
+
+* 'server' means we're adding a new node to the farm
+
+* 'app1' is the worker name; it is analogous to fpo-wiki but should
+  match the short hostname of the node to make it easy to follow.
+
+* 'app1.fedora.phx.redhat.com:80' is the hostname and port to be
+  contacted.
+
+* 'check' means to check via the bottom line "option httpchk GET
+  /wiki/Infrastructure", which will use /wiki/Infrastructure to verify
+  the wiki is working. If that URL fails, that entire node will be taken
+  out of the farm mix.
+
+* 'inter 2s' means to check every 2 seconds. 2s is the same as 2000 in
+  this case.
+
+* 'rise 2' means to not put this node back in the mix until it has had
+  two successful connections in a row. haproxy will continue to check
+  every 2 seconds whether a node is up or down.
+
+* 'fall 5' means to take a node out of the farm after 5 failures.
+
+* 'backup' You'll notice that app4 has a 'backup' option. We don't
+  actually use this for the wiki but do for other farms. It basically
+  means to continue checking and treat this node like any other node but
+  don't send it any production traffic unless the other two nodes are
+  down.
+
+All of these options can be tweaked so keep that in mind when changing or
+building a new farm. There are other configuration options in this file
+that are global. Please see the haproxy documentation for more info::
+
+    /usr/share/doc/haproxy-1.3.14.6/haproxy-en.txt
+
+Stats
+-----
+
+In order to view the stats for a farm please see the stats page. Each
+proxy server has its own stats page since each one is running its own
+haproxy server. To view the stats point your browser to
+https://admin.fedoraproject.org/haproxy/shorthostname/ so proxy1 is at
+https://admin.fedoraproject.org/haproxy/proxy1/ The trailing / is
+important.
+
+* https://admin.fedoraproject.org/haproxy/proxy1/
+* https://admin.fedoraproject.org/haproxy/proxy2/
+* https://admin.fedoraproject.org/haproxy/proxy3/
+* https://admin.fedoraproject.org/haproxy/proxy4/
+* https://admin.fedoraproject.org/haproxy/proxy5/
+
+Advanced Usage
+--------------
+
+haproxy has some more advanced usage that we've not needed to worry about
+yet but is worth mentioning.
+For example, one could send users to just one app server based on session
+id. If user A happened to hit app1 first and user B happened to hit app4
+first, all subsequent requests for user A would go to app1 and user B
+would go to app4. This is handy for applications that cannot normally be
+balanced because of shared storage needs or other locking issues. This
+won't solve all problems though and can have negative effects; for
+example, when app1 goes down user A would either lose their session, or
+be unable to work until app1 comes back up. Please do some thorough
+testing before looking into this option.
+
diff --git a/docs/sysadmin-guide/sops/hosted_git_to_svn.rst b/docs/sysadmin-guide/sops/hosted_git_to_svn.rst
new file mode 100644
index 0000000..d890a8d
--- /dev/null
+++ b/docs/sysadmin-guide/sops/hosted_git_to_svn.rst
@@ -0,0 +1,174 @@
+.. title: Fedorahosted Repository Migration SOP
+.. slug: infra-fedorahosted-migration
+.. date: 2011-12-14
+.. taxonomy: Contributors/Infrastructure
+
+=======================
+Fedorahosted migrations
+=======================
+
+Migrating hosted repositories to that of another type.
+
+Contents
+========
+1. Contact Information
+2. Description
+3. SVN to GIT migration
+
+   1. Questions left to be answered with this SOP
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+
+Contact
+    #fedora-admin, sysadmin-hosted
+
+Location
+    Serverbeach
+
+Servers
+    hosted1, hosted2
+
+Purpose
+    Migrate hosted SCM repositories to that of another SCM.
+
+Description
+===========
+
+fedorahosted.org can be used to host open source projects. Occasionally
+those projects want to change the SCM they utilize. This document provides
+documentation for doing so.
+
+A hosted project consists of:
+
+1. An scm for maintaining the code. The currently supported scm's include
+   Mercurial, Git, Bazaar, or SVN. Note: There is no cvs
+2. A trac instance, which provides a mini-wiki for hosting information
+   and also provides a ticketing system.
+3. A mailing list
+
+.. important::
+    This page is for administrators only. People wishing to request a hosted
+    project should use the Ticketing System; see the
+    new project request template. (Requires Fedora Account)
+
+SVN to GIT migration
+====================
+
+FAS User Prep
+--------------
+
+Currently you must manually generate $PROJECTNAME-users.txt by grabbing a
+list of people in the FAS group - and recording them in the following
+format::
+
+    $fasusername = FirstName LastName <$emailaddress>
+
+This is error prone, and will stop the git-svn fetch below if an author
+appears that doesn't exist in the list of users::
+
+    svn log --quiet | awk '/^r/ {print $3}' | sort -u
+
+The above will generate a list of users in the svn repo.
+
+If all users are FAS users you can use the following script to create a
+users file (written by tmz (Todd Zullinger))::
+
+    #!/bin/bash
+
+    if [ -z "$1" ]; then
+        echo "usage: $0 <repo>" >&2
+        exit 1
+    fi
+
+    svnurl=file:///svn/$1
+
+    if ! svn info $svnurl &>/dev/null; then
+        echo "$1 is not a valid svn repo." >&2
+        exit 1
+    fi
+
+    svn log -q $svnurl | awk '/^r[0-9]+/ {print $3}' | sort -u | while read user; do
+        name=$( (getent passwd $user 2>/dev/null | awk -F: '{print $5}') || '' )
+        [ -z "$name" ] && name=$user
+        email="$user@fedoraproject.org"
+        echo "$user=$name <$email>"
+    done
+
+Doing the conversion
+---------------------
+
+1. Log into hosted1
+2. Make a temporary directory to convert the repos in::
+
+       $ sudo mkdir /tmp/tmp-$PROJECTNAME.git
+
+       $ cd /tmp/tmp-$PROJECTNAME.git
+
+3. Create a git repo ready to receive migrated SVN data::
+
+       $ sudo git-svn init http://svn.fedorahosted.org/svn/$PROJECTNAME --no-metadata
+
+4. Tell git to fetch and convert the repository::
+
+       $ git svn fetch
+
+   .. note::
+       This creation of a temporary repository is necessary because SVN leaves a
+       number of items floating around that git can ignore, and we want those
+       essentially ignored.
+
+5. From here, you'll want to follow Creating a new git repo as if
+   cloning an existing git repository to Fedorahosted.
+
+6. After that process is done - kindly remove the temporary repo that was created::
+
+       $ sudo rm -rf /tmp/tmp-$PROJECTNAME.git
+
+Doing the conversion (alternate)
+--------------------------------
+
+Alternately, here's another way to do this (tmz):
+
+Setup a working dir::
+
+    [tmz@hosted1 tmp (master)]$ mkdir im-chooser-conversion && cd im-chooser-conversion
+
+Create an authors file mapping svn usernames to the ``Name <email>`` form
+git uses::
+
+    [tmz@hosted1 im-chooser-conversion (master)]$ ~tmz/svn-to-git-authors im-chooser > authors
+
+Convert svn to git::
+
+    [tmz@hosted1 im-chooser-conversion (master)]$ git svn clone -s -A authors --no-metadata file:///svn/im-chooser
+
+Move svn branches and tags into proper locations for the new git repo.
+(git-svn leaves them as 'remote' branches/tags.)::
+
+    [tmz@hosted1 im-chooser-conversion (master)]$ cd im-chooser
+    [tmz@hosted1 im-chooser (master)]$ mv .git/refs/remotes/tags/* .git/refs/tags/ && rmdir .git/refs/remotes/tags
+    [tmz@hosted1 im-chooser (master)]$ mv .git/refs/remotes/* .git/refs/heads/
+
+Now 'git branch' and 'git tag' should display the branches/tags.
+
+Create a bare repo from the converted git repo.
+Using ``file://$(pwd)`` here ensures that git copies all objects to the new bare repo::
+
+    [tmz@hosted1 im-chooser-conversion (master)]$ git clone --bare --shared file://$(pwd)/im-chooser im-chooser.git
+
+Follow the steps in https://fedoraproject.org/wiki/Hosted_repository_setup to
+finish setting proper modes and permissions for the repo. Don't forget to
+update the description file.
+
+.. note::
+    This still leaves moving the converted bare repo (im-chooser.git) to /git
+    and fixing up the user/group.
+
+Questions left to be answered with this SOP
+============================================
+
+* Obviously we need to have the requestor review the migration and confirm
+  it's ok.
+* Do we then delete the old SCM contents?
+* Do we need to change the FAS-group type to grant them access to
+  pull/push from it?
diff --git a/docs/sysadmin-guide/sops/hotfix.rst b/docs/sysadmin-guide/sops/hotfix.rst
new file mode 100644
index 0000000..4a78987
--- /dev/null
+++ b/docs/sysadmin-guide/sops/hotfix.rst
@@ -0,0 +1,58 @@
+.. title: Hotfixes SOP
+.. slug: infra-hotfix
+.. date: 2015-02-24
+.. taxonomy: Contributors/Infrastructure
+
+============
+HOTFIXES SOP
+============
+
+From time to time we have to quickly patch a problem or issue
+in applications in our infrastructure. This process allows
+us to do that, track what changed, and be ready to remove
+it when the issue is fixed upstream.
+
+
+Ansible based items:
+====================
+For ansible, hotfixes should be placed after the task that installs
+the package to be changed or modified, either in roles or tasks.
+
+Hotfix tasks should be called "HOTFIX description".
+They should also link in comments to any upstream bug or ticket.
+They should also have tags of 'hotfix'.
+
+The process is:
+
+- Create a diff of any files changed in the fix.
+- Check in the _original_ files and change to role/task +- Check in now your diffs of those same files. +- ansible will replace the files on the affected machines + completely with the fixed versions. +- If you need to back it out, you can revert the diff step, + wait and then remove the first checkin + +Example:: + + + + # + # install hash randomization hotfix + # See bug https://bugzilla.redhat.com/show_bug.cgi?id=812398 + # + - name: hotfix - copy over new httpd init script + copy: src="{{ files }}/hotfix/httpd/httpd.init" dest=/etc/init.d/httpd + owner=root group=root mode=0755 + notify: + - restart apache + tags: + - config + - hotfix + - apache + +Upstream changes +================ + +Also, if at all possible a bug should be filed with the upstream +application to get the fix in the next version. Hotfixes are something +we should strive to only carry a short time. diff --git a/docs/sysadmin-guide/sops/hotness.rst b/docs/sysadmin-guide/sops/hotness.rst new file mode 100644 index 0000000..2992cee --- /dev/null +++ b/docs/sysadmin-guide/sops/hotness.rst @@ -0,0 +1,67 @@ +.. title: The New Hotness SOP +.. slug: hotness-sop +.. date: 2017-01-31 +.. taxonomy: Contributors/Infrastructure + +.. _hotness-sop: + +The New Hotness +=============== +`the-new-hotness `_ is a +`fedmsg consumer `_ +that subscribes to `release-monitoring.org `_ fedmsg +notifications to determine when a package in Fedora should be updated. For more details +on the-new-hotness, consult the `project documentation `_. + + +Contact Information +------------------- +Owner + Fedora Infrastructure Team +Contact + #fedora-admin +Location + phx2.fedoraproject.org +Servers + hotness01.phx2.fedoraproject.org + hotness01.stg.phx2.fedoraproject.org +Purpose + File issues when upstream projects release new versions of a package + + +Deploying a New Version +----------------------- +As of January 31, 2017, the-new-hotness is not packaged for Fedora or EPEL. When upstream +tags a new version in Git and you are building a new version (from the specfile in the upstream +repository), you will need to build it into the :ref:`infra-repo`. + +1. Build the SRPM with ``koji build epel7-infra the-new-hotness--.src.rpm``. If + you do not have permission to perform this build (it fails with permission denied), ask for help + in #fedora-admin. + +2. Consult the upstream changelog. If necessary, adjust the Ansible configuration for + the-new-hotness. + +3. Update the host. At the moment this is done with shell access to the host and running:: + + $ sudo -i yum clean all + $ sudo -i yum update the-new-hotness + +4. Ensure the configuration is up-to-date by running this on batcave01:: + + $ sudo rbac-playbook -l staging groups/hotness.yml # remove the "-l staging" to update prod + +All done! + + +Monitoring Activity +------------------- +It can be nice to check up on the-new-hotness to make sure its behaving correctly. +You can see all the Bugzilla activity using the +`user activity query `_ (staging uses +`partner-bugzilla.redhat.com `_) +and querying for the ``upstream-release-monitoring@fedoraproject.org`` user. + +You can also view all the Koji tasks dispatched by the-new-hotness. For example, you can see the +`failed tasks `_ +it has created. diff --git a/docs/sysadmin-guide/sops/ibm-drive-replacement.rst b/docs/sysadmin-guide/sops/ibm-drive-replacement.rst new file mode 100644 index 0000000..afafe0c --- /dev/null +++ b/docs/sysadmin-guide/sops/ibm-drive-replacement.rst @@ -0,0 +1,342 @@ +.. title: Drive Replacement SOP +.. 
slug: infra-drive-replacement
+.. date: 2012-07-13
+.. taxonomy: Contributors/Infrastructure
+
+====================================
+Drive Replacement Infrastructure SOP
+====================================
+
+At present this SOP only works for the X series IBM servers.
+
+We have multiple machines with lots of different drives in them. For the
+most part now though, we are trying to standardise on IBM X series
+servers. At present I've not figured out how to disable the onboard raid;
+as a result of this, many of our servers have two raid 0 arrays, and we
+then do software raid on top of them.
+
+The system xen11 is currently an HP ProLiant DL180 G5 with its own
+interesting RAID system (using the Compaq Smart Array cciss driver). Like
+the IBM X series, each drive is considered a single RAID-0 instance which
+is then accessed through a logical drive.
+
+Contents
+========
+
+1. Contact Information
+2. Verify the drive is dead
+
+   1. Re-adding a drive (poor man's fix)
+
+3. Actually replacing the drive (IBM)
+
+   1. Collecting Data
+   2. Call IBM
+   3. Get the package, give access to the tech
+   4. Prepwork before the tech arrives
+   5. Tech on site
+   6. Rebuild the array
+
+4. Actually Replacing the Drive (HP)
+
+   1. Collecting data
+   2. Call HP
+   3. Get the package, give access to the tech
+   4. Prepwork before the tech arrives
+   5. Tech on site
+   6. Rebuild the array
+
+5. Installing RaidMan (IBM Only)
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+Contact
+    #fedora-admin, sysadmin-main
+Location
+    All
+Servers
+    All
+Purpose
+    Steps for drive replacement.
+
+Verify the drive is dead
+========================
+
+::
+
+    $ cat /proc/mdstat
+    Personalities : [raid1]
+    md0 : active raid1 sdb1[1] sda1[0]
+          513984 blocks [2/2] [UU]
+
+    md1 : active raid1 sdb2[2](F) sda2[0]
+          487717248 blocks [2/1] [U_]
+
+This indicates that md1 is in a degraded state and that /dev/sdb2 is the
+failed drive. Notice that /dev/sdb1 (same physical drive as /dev/sdb2) is
+not failed. /dev/md0 (not yet degraded) is showing a good state. This is
+because /dev/md0 is /boot. If you run::
+
+    touch /boot/t
+    sync
+    rm /boot/t
+
+That should make /dev/md0 notice that its drive is also failed. If it does
+not fail, it's possible the drive is fine and that some blip happened that
+caused it to get flagged as dead. It is also worthwhile to log in to
+xenX-mgmt to determine if the RSAII adapter has noticed the drive is dead.
+
+If you think the drive just had a blip and is fine, see "Re-adding" below.
+
+Re-adding a drive (poor man's fix)
+-----------------------------------
+
+Basically what we're doing here is making sure the drive is, in fact, dead.
+Obviously you don't want to do this more than once on a drive; if it
+continues to fail, replace it.
+
+::
+
+    # cat /proc/mdstat
+    Personalities : [raid1]
+    md0 : active raid1 sdb1[1] sda1[0]
+          513984 blocks [2/2] [UU]
+
+    md1 : active raid1 sdb2[2](F) sda2[0]
+          487717248 blocks [2/1] [U_]
+    # mdadm /dev/md1 --remove /dev/sdb2
+    # mdadm /dev/md1 --add /dev/sdb2
+    # cat /proc/mdstat
+    md0 : active raid1 sdb1[1] sda1[0]
+          513984 blocks [2/1] [U_]
+            resync=DELAYED
+
+    md1 : active raid1 sdb2[2] sda2[0]
+          487717248 blocks [2/1] [U_]
+          [=>...................]  recovery =  9.2% (45229120/487717248) finish=145.2min speed=50771K/sec
+
+So we removed the bad drive, added it again, and you can now see the
+recovery status. Watch it carefully. If it fails again, time for a drive
+replacement.
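+
+While the re-added drive is recovering, it can be handy to keep an eye on
+the rebuild without re-running the command by hand. A small sketch,
+assuming ``watch`` is available on the host::
+
+    # refresh the mdraid status every 5 seconds; watch the recovery
+    # percentage climb and make sure the drive does not fail again
+    watch -n 5 cat /proc/mdstat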
+ +Actually replacing the drive (IBM) +================================== + +Actually replacing the drive is a bit of a todo. If the box is in a RH +owned location, we'll have to file a ticket and get someone access to the +colo. If it is at another location, we may be able to just ship the drive +there and have someone do it on site. Please follow the below steps for +drive replacement. + +Collecting Data +---------------- + +There's a not insignificant amount of data you'll need to place the call. +Please have the following information handy: + +1) The hosts machine type (this is not model number).:: + + # lshal | grep system.product + system.product = 'IBM System x3550 -[7978AC1]-' (string) + + In the above case, the machine type is encoded into [7978AC1]. And is just + the first 4 numbers. So this machine type is 7978. M/T (machine type) is + always 4 digits for IBM boxes. + +2) Machine's serial number:: + + # lshal | grep system.hardware.serial + system.hardware.serial = 'FAAKKEE' (string) + + The above's serial number is 'FAAKKEE' + +3) Drive Stats + + There are two ways to get the drive stats. You can get some of this + information via hal, but for the full complete information you need to + either have someone physically go look at the drive (some of which is in + inventory) or use RaidMan. See "Installing RaidMan" below for more + information on how to install RaidMan. + + Specifically you need: + + - Drive Size (in G) + - Drive Type (SAS or SATA?) + - Drive Model + - Drive Vendor + + To get this information run:: + + # cd /usr/RaidMan/ + # ./arcconf GETCONFIG 1 + +4) The phone number and address of the building where the drive is + currently located. This will go to the RH cage. + + This information is located in the contacts.txt of private git repo on + batcave01 (only available to sysadmin-main people) + + Call IBM + + Call 1-800-426-7378 and follow the directions they give you. You'll need + to use the M/T above to get to the correct rep. They will ask you for the + information above (you wrote it down, right?) + + When they agree to replace the drive, make sure to tell them you need the + shipping number of the drive as well as the name of the tech who will do + the drive replacement. Sometimes the tech will just bring the drive. If + not though, you need to open a ticket with the colo to let them know a + drive is coming. + + Get the package, give access to the tech + + As SOON as you get this information, open a ticket with RH. at + is-ops-tickets redhat.com. Request a ticket ID from RH. If the tech has + any issues getting into the colo, you can give the AT&T ticket request to + the tech to get them in. + + NOTE: this can often take hours. We have 4 hour on site response time from + IBM. This time goes very quickly, sometimes you may need to page out + someone in IS to ensure it gets created quickly. To get this pager + information see contacts.txt in batcave01's private repo (if batcave01 is down + for some reason see the dr copy on backup2.fedoraproject.org:/srv/ + + Prepwork before the tech arrives + + Really the big thing here is to remove the broken drive from the array. In + our earlier example we found /dev/sdb failed. We'll want to remove it from + both arrays: + + # mdadm /dev/md0 --remove /dev/sdb1 + # mdadm /dev/md1 --remove /dev/sdb2 + + Next get the current state of the drives and save it somewhere. See + "Installing RaidMan" for more information if RaidMan is not installed. 
+
+   # cd /usr/RaidMan
+   # ./arcconf GETCONFIG 1 > /tmp/raid1.txt
+
+   Copy /tmp/raid1.txt off to some other device and save it until the tech is
+   on site. It should contain information about the failed drive.
+
+   Tech on site
+
+   When the tech is on site you may have to give him the rack location. All
+   of our Mesa servers are in one location, "the same room that the desk is
+   in". You may have to give him the serial number of the server, or possibly
+   make it blink. It's either the first rack on the left labeled: "01 2 55"
+   or "01 2 58".
+
+   Once he's replaced the drive, he'll have you verify. Use the RaidMan tools
+   to do the following:
+
+   # cd /usr/RaidMan
+   # ./arcconf RESCAN 1
+   # ./arcconf GETCONFIG 1 > /tmp/raid2.txt
+   # # arcconf CREATE LOGICALDRIVE [Options]
+   # ./arcconf create 1 LOGICALDRIVE 476790 Simple_volume 0 1
+
+   First we're going to re-scan the array for the new drive. Then we'll
+   re-get the configs. Compare /tmp/raid2.txt to /tmp/raid1.txt and verify
+   the bad drive is fixed and that it has a different serial number. Also
+   make sure it's the correct size. Thank the tech and send him on his way.
+   The last line there creates a new logical drive from the physical drive.
+   "Simple_volume" tells it to create a raid0 array of one drive. The size
+   was pulled out of our initial /tmp/raid1.txt (should match the other
+   drive). The last two numbers are the Channel and ID of the new drive.
+
+   Rebuild the array
+
+   Now that the disk has been replaced we need to put a partition table on
+   the new drive and add it to the array:
+
+   * /dev/sdGOOD is the *GOOD* drive
+   * /dev/sdBAD is the *BAD* drive
+
+   # dd if=/dev/sdGOOD of=/tmp/sda-mbr.bin bs=512 count=1
+   # dd if=/tmp/sda-mbr.bin of=/dev/sdBAD
+   # partprobe
+
+   Next re-add the drives to the array:
+
+   * /dev/sdBAD1 and /dev/sdBAD2 are the partitions on the new drive which
+     is no longer bad.
+
+   # mdadm /dev/md0 --add /dev/sdBAD1
+   # mdadm /dev/md1 --add /dev/sdBAD2
+   # cat /proc/mdstat
+
+   This starts rebuilding the arrays; the last line checks the status.
+
+Actually Replacing the Drive (HP)
+=================================
+
+   Replacing the drive on the HPs is similar to the IBMs. First you will
+   need to contact HP, then you will need to open a ticket with Red Hat's
+   Helpdesk to get into the PHX2 facility. Then you will need to coordinate
+   with the technician on the colocation's rules for entry and who to
+   call/talk with.
+
+   Collecting data
+
+   Call HP
+
+   Get the package, give access to the tech
+
+   Prepwork before the tech arrives
+
+   Tech on site
+
+   Rebuild the array
+
+   Now that the disk has been replaced we need to put a partition table on
+   the new drive and add it to the array:
+
+   * /dev/cciss/c0dGOOD is the *GOOD* drive. The HP utilities will have a
+     code like 1I:1:1
+   * /dev/cciss/c0dBAD is the *BAD* drive. The HP utilities will have a
+     code like 2I:1:1
+
+   First we need to create the logical drive on the system.
+
+   # hpacucli controller serialnumber=P61630H9SVU4JF create type=ld sectors=63 drives=2I:1:1 raid=0
+
+   # dd if=/dev/cciss/c0dGOOD of=/tmp/sda-mbr.bin bs=512 count=1
+   # dd if=/tmp/sda-mbr.bin of=/dev/cciss/c0dBAD
+   # partprobe
+
+   Next re-add the drives to the array:
+
+   * /dev/sdBAD1 and /dev/sdBAD2 are the partitions on the new drive which
+     is no longer bad.
+
+   # mdadm /dev/md0 --add /dev/sdBAD1
+   # mdadm /dev/md1 --add /dev/sdBAD2
+   # cat /proc/mdstat
+
+   This starts rebuilding the arrays; the last line checks the status.
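+
+Whichever vendor you are working with, once the rebuild finishes it is
+worth confirming that both mirrors are healthy before closing out the
+ticket. A minimal check, assuming ``mdadm`` is installed on the host::
+
+   # each array should report 'State : clean' and both
+   # members should show as 'active sync'
+   mdadm --detail /dev/md0
+   mdadm --detail /dev/md1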
+ +Installing RaidMan (IBM Only) + + Unfortunately there is no feasible alternative to managing IBM Raid Arrays + without causing downtime. You can get and do this via the pre-POST + interface. This requires downtime, and if the first drive is the failed + drive, may result in a non-booting system. So for now RaidMan it is until + we can figure out how to get rid of the raid controllers in these boxes + completely. + + yum -y install compat-libstdc++-33.i686 + rpm -ihv https://infrastructure.fedoraproject.org/rhel/RaidMan/RaidMan-9.00.i386.rpm + + To verify installation has completed successfully: + + # cd /usr/RaidMan/ + # ./arcconf GETCONFIG 1 + + This should print the current configuration of the raid controller and its + logical drives. + diff --git a/docs/sysadmin-guide/sops/ibm_rsa_ii.rst b/docs/sysadmin-guide/sops/ibm_rsa_ii.rst new file mode 100644 index 0000000..f037d98 --- /dev/null +++ b/docs/sysadmin-guide/sops/ibm_rsa_ii.rst @@ -0,0 +1,61 @@ +.. title: IBM RSA II Remote Management SOP +.. slug: infra-ibm-rsa-ii +.. date: 2011-08-23 +.. taxonomy: Contributors/Infrastructure + +============================= +IBM RSA II Infrastructure SOP +============================= + +Many of our physical machines use RSA II cards for remote management. + +Contact Information +=================== + +Owner + Fedora Infrastructure Team +Contact + #fedora-admin, sysadmin-main +Location + PHX, ibiblio +Servers + All physical IBM machines +Purpose + Provide remote management for our physical IBM machines + +Restarting the RSA II card +========================== + +Normally, the RSA II can be restarted from the web/ssh interface. If you +are locked out of any outside access to the RSA II, follow these +instructions on the physical machine. + +If the machine can be rebooted without issue, cut off all power to the +machine, wait a few seconds, and restart everything. + +Otherwise, to restart the card without rebooting the machine: + +1. Download and install the IBM Remote Supervisor Adapter II Daemon + + 1. ``yum install usbutils libusb-devel`` # (needed by the RSA II daemon) + + 2. Download the correct tarball from + http://www-947.ibm.com/systems/support/supportsite.wss/docdisplay?lndocid=MIGR-5071676&brandind=5000008 + (TODO: check if this can be packaged in Fedora) + + 3. Extract the tarball and run ``sudo ./install.sh --update`` + +2. Download and extract the IBM Advanced Settings Utility + http://www-947.ibm.com/systems/support/supportsite.wss/docdisplay?lndocid=TOOL-ASU&brandind=5000016 + + .. warning:: this tarball dumps files in the current working directory + +3. Issue a ``sudo ./asu64 rebootrsa`` to reboot the RSA II. + +4. Clean up: ``yum remove ibmusbasm64`` + +Other Resources +=============== + +http://www.redbooks.ibm.com/abstracts/sg246495.html may be a useful +resource to refer to when working with this. diff --git a/docs/sysadmin-guide/sops/index.rst b/docs/sysadmin-guide/sops/index.rst new file mode 100644 index 0000000..89fd5b6 --- /dev/null +++ b/docs/sysadmin-guide/sops/index.rst @@ -0,0 +1,142 @@ +.. Fedora Infrastructure Best Practices documentation master file, created by + sphinx-quickstart on Wed Jan 25 17:17:34 2017. + You can adapt this file completely to your liking, but it should at least + contain the root `toctree` directive. + +.. _sops: + +Standard Operating Procedures +============================= + +Below is a table of contents containing all the standard operating procedures +for Fedora Infrastructure applications. 
For information on how to write a new +standard operating procedure, consult the guide on :ref:`develop-sops`. + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + + 2-factor + accountdeletion + anitya + ansible + apps-fp-o + archive-old-fedora + arm + askbot + badges + basset + bastion-hosts-info + bladecenter + blockerbugs + bodhi + bugzilla2fedmsg + bugzilla + cloud + collectd + contenthosting + copr + cyclades + darkserver + database + datanommer + denyhosts + departing-admin + dns + fas-notes + fas-openid + fedmsg-certs + fedmsg-gateway + fedmsg-introduction + fedmsg-irc + fedmsg-new-message-type + fedmsg-relay + fedmsg-websocket + fedocal + fedorahosted-fedmsg + fedorahosted-project-cleanup + fedorahostedrename + fedorahosted-repo-setup + fedorahosted + fedorapackages + fedorapastebin + fedora-releases + fedorawebsites + fmn + freemedia + freenode-irc-channel + gather-easyfix + github2fedmsg + github + gitweb + guestdisk + guestedit + guestmigrate + haproxy + hosted_git_to_svn + hotfix + ibm-drive-replacement + ibm_rsa_ii + infra-git-repo + infra-hostrename + infra-raidmismatch + infra-repo + infra-retiremachine + infra-yubikey + ipsilon + iscsi + jenkins-fedmsg + kerneltest-harness + kickstarts + koji-builder-setup + koji + koschei + layered-image-buildsys + linktracking + loopabull + mailman + making-ssl-certificates + massupgrade + mastermirror + memcached + mirrorhiding + mirrormanager + mirrormanager-S3-EC2-netblocks + mote + nagios + netapp + new-hosts + nonhumanaccounts + nuancier + openvpn + orientation + outage + packagedatabase + pdc + pesign-upgrade + planetsubgroup + privatefedorahosted + publictest-dev-stg-production + rdiff-backup + requestforresources + resultsdb + reviewboard + scmadmin + selinux + sigul-upgrade + sshaccess + sshknownhosts + staging-infra + staging + stagingservers + status-fedora + syslog + taskotron + torrentrelease + unbound + virt-image + virtio + virt-notes + voting + wiki + zodbot diff --git a/docs/sysadmin-guide/sops/infra-git-repo.rst b/docs/sysadmin-guide/sops/infra-git-repo.rst new file mode 100644 index 0000000..d4ce5d5 --- /dev/null +++ b/docs/sysadmin-guide/sops/infra-git-repo.rst @@ -0,0 +1,62 @@ +.. title: Fedora Infrastructure Git Repo SOP +.. slug: infra-git +.. date: 2013-06-17 +.. taxonomy: Contributors/Infrastructure + +======================== +Infrastructure Git Repos +======================== + +Setting up an infrastructure git repo - and the push mechanisms for the +magicks + +We have a number of git repos (in /git on batcave) that manage files +for ansible, our docs, our common host info database and our kickstarts +This is a doc on how to setup a new one of these, if it is needed. 
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+Contact
+    #fedora-admin, sysadmin-main
+Location
+    Phoenix
+Servers
+    batcave01.phx2.fedoraproject.org,
+    batcave-comm01.qa.fedoraproject.org
+
+
+Steps
+======
+Create the bare repo::
+
+    mkdir $git_dir
+    setfacl -m d:g:$yourgroup:rwx -m d:g:$othergroup:rwx \
+       -m g:$yourgroup:rwx -m g:$othergroup:rwx $git_dir
+
+    cd $git_dir
+    git init --bare
+
+
+Edit up the config - add these lines to the bottom::
+
+    [hooks]
+    # (normally sysadmin-members@fedoraproject.org)
+    mailinglist = emailaddress@yourdomain.org
+    emailprefix =
+    maildomain = fedoraproject.org
+    reposource = /path/to/this/dir
+    repodest = /path/to/where/you/want/the/files/dumped
+
+
+Edit up the description - make it something useful. Then set up the hooks::
+
+    cd hooks
+    rm -f *.sample
+    # copy the hooks from /git/infra-docs/hooks/ on batcave01 to this path
+
+Finally, modify sudoers so that users in the groups that can commit to
+this repo can run /usr/local/bin/syncgittree.sh without entering a password.
diff --git a/docs/sysadmin-guide/sops/infra-hostrename.rst b/docs/sysadmin-guide/sops/infra-hostrename.rst
new file mode 100644
index 0000000..d8b59ca
--- /dev/null
+++ b/docs/sysadmin-guide/sops/infra-hostrename.rst
@@ -0,0 +1,110 @@
+.. title: Infrastructure Host Rename SOP
+.. slug: infra-host-rename
+.. date: 2011-10-03
+.. taxonomy: Contributors/Infrastructure
+
+==============================
+Infrastructure Host Rename SOP
+==============================
+
+This page is intended to guide you through the process of renaming a
+virtual node.
+
+Contents
+========
+
+1. Introduction
+2. Finding out where the host is
+3. Preparation
+4. Renaming the Logical Volume
+5. Doing the actual rename
+6. Telling ansible about the new host
+7. VPN Stuff
+
+Introduction
+============
+
+Throughout this SOP, we will refer to the old hostname as $oldhostname and
+the new hostname as $newhostname. We will refer to the Dom0 host that the
+vm resides on as $vmhost.
+
+If this process is being followed so that a temporary-named host can
+replace a production host, please be sure to follow the Infrastructure
+retire machine SOP to properly decommission the old host before
+continuing.
+
+Finding out where the host is
+=============================
+
+In order to rename the host, you must have access to the Dom0 (host) on
+which the virtual server resides. To find out which host that is, log in
+to batcave01, and run::
+
+    grep $oldhostname /var/log/virthost-lists.out
+
+The first column of the output will be the Dom0 of the virtual node.
+
+Preparation
+===========
+
+SSH to $oldhostname. If the new name is replacing a production box, change
+the IP Address that it binds to, in ``/etc/sysconfig/network-scripts/ifcfg-eth0``.
+
+Also change the hostname in ``/etc/sysconfig/network``.
+
+At this point, you can ``sudo poweroff`` $oldhostname.
+
+Open an ssh session to $vmhost, and make sure that the node is listed as
+``shut off``. If it is not, you can force it off with::
+
+    virsh destroy $oldhostname
+
+Renaming the Logical Volume
+============================
+Find out the name of the logical volume (on $vmhost)::
+
+    virsh dumpxml $oldhostname | grep 'source dev'
+
+This will give you a line that looks like
+``<source dev='/dev/VolGroup00/$oldhostname'/>``, which tells you that
+``/dev/VolGroup00/$oldhostname`` is the path to the logical volume.
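+
+As a concrete illustration, for a hypothetical guest named noc01 the grep
+output would look something like this (the volume group name will vary by
+host)::
+
+    $ virsh dumpxml noc01 | grep 'source dev'
+        <source dev='/dev/VolGroup00/noc01'/>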
+
+Run ``/usr/sbin/lvrename`` with the path that you found above as the
+source, and the same path with $newhostname at the end instead of
+$oldhostname as the destination.
+
+For example::
+
+    /usr/sbin/lvrename /dev/VolGroup00/noc03-tmp /dev/VolGroup00/noc01
+
+Doing the actual rename
+=======================
+Now that the logical volume has been renamed, we can rename the host in
+libvirt.
+
+Dump the configuration of $oldhostname into an xml file, by running::
+
+    virsh dumpxml $oldhostname > $newhostname.xml
+
+Open up $newhostname.xml, and change all instances of $oldhostname to
+$newhostname.
+
+Save the file and run::
+
+    virsh define $newhostname.xml
+
+If there are no errors above, you can undefine $oldhostname::
+
+    virsh undefine $oldhostname
+
+Power on $newhostname, with::
+
+    virsh start $newhostname
+
+And remember to set it to autostart::
+
+    virsh autostart $newhostname
+
+
+VPN Stuff
+=========
+
+TODO
diff --git a/docs/sysadmin-guide/sops/infra-raidmismatch.rst b/docs/sysadmin-guide/sops/infra-raidmismatch.rst
new file mode 100644
index 0000000..d5cf5a9
--- /dev/null
+++ b/docs/sysadmin-guide/sops/infra-raidmismatch.rst
@@ -0,0 +1,75 @@
+.. title: Infrastructure Raid Mismatch Count SOP
+.. slug: infra-raid-mismatch
+.. date: 2011-10-03
+.. taxonomy: Contributors/Infrastructure
+
+======================================
+Infrastructure/SOP/Raid Mismatch Count
+======================================
+
+What to do when a raid device has a mismatch count
+
+Contents
+========
+1. Contact Information
+2. Description
+3. Correction
+
+   1. Step 1
+   2. Step 2
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+
+Contact
+    #fedora-admin, sysadmin-main
+
+Location
+    All
+
+Servers
+    Physical hosts
+
+Purpose
+    Explains what to do when a raid device reports a mismatch count.
+
+Description
+===========
+In some situations a raid device may indicate there is a count mismatch as
+listed in::
+
+    /sys/block/mdX/md/mismatch_cnt
+
+Anything other than 0 is considered not good, though if the number is low
+it's probably nothing to worry about. To correct this situation try the
+directions below.
+
+Correction
+==========
+
+More than anything these steps are to A) verify there is no problem and B)
+make the error go away. If step 1 and step 2 don't correct the problems,
+PROCEED WITH CAUTION. The steps below, however, should be relatively safe.
+
+
+Issue a repair (replace mdX with the questionable raid device)::
+
+    echo repair > /sys/block/mdX/md/sync_action
+
+Depending on the size of the array and disk speed this can take a while.
+Watch the progress with::
+
+    cat /proc/mdstat
+
+Issue a check. It's this check that will reset the mismatch count if there
+are no problems. Again replace mdX with your actual raid device::
+
+    echo check > /sys/block/mdX/md/sync_action
+
+Just as before, you can watch the progress with::
+
+    cat /proc/mdstat
+
diff --git a/docs/sysadmin-guide/sops/infra-repo.rst b/docs/sysadmin-guide/sops/infra-repo.rst
new file mode 100644
index 0000000..535c995
--- /dev/null
+++ b/docs/sysadmin-guide/sops/infra-repo.rst
@@ -0,0 +1,109 @@
+.. title: Infrastructure RPM Repository SOP
+.. slug: infra-repo
+.. date: 2016-10-12
+.. taxonomy: Contributors/Infrastructure
+
+===========================
+Infrastructure Yum Repo SOP
+===========================
+
+In some cases RPMs in Fedora need to be rebuilt for the Infrastructure
+team to suit our needs. This repo is provided to the public (except for
+the RHEL RPMs).
+Rebuilds go into this repo, which is stored on the netapp
+and shared via the proxy servers after being built on koji.
+
+For basic instructions, read the standard documentation on the Fedora wiki:
+
+- https://fedoraproject.org/wiki/Using_the_Koji_build_system
+
+This document will only outline the differences between the "normal" repos
+and the infra repos.
+
+
+Contents
+========
+
+1. Contact Information
+2. Building an RPM
+3. Tagging an existing build
+4. Koji package list
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+Contact
+    #fedora-admin
+Location
+    PHX, http://infrastructure.fedoraproject.org/
+Servers
+    koji
+    batcave01 / Proxy Servers
+Purpose
+    Provides the infrastructure repo for custom Fedora Infrastructure rebuilds
+
+Building an RPM
+===============
+
+Building an RPM for Infrastructure is significantly easier than building
+an RPM for Fedora. Basically get your SRPM ready, then submit it to koji
+for building to the $repo-infra target (e.g. epel7-infra).
+
+Example::
+
+    rpmbuild --define "dist .el7" -bs test.spec
+    koji build epel7-infra test-1.0-1.el7.src.rpm
+
+.. note::
+    Remember to build it for every dist / arch you need to deploy it on.
+
+After it has been built, you will see it's tagged as $repo-infra-candidate;
+this means that it is a candidate for being signed. The automatic signing
+system will pick it up and sign the package for you without any further
+intervention. You can track when this is done by checking the build info:
+when it is moved from $repo-infra-candidate to $repo-infra, it has been
+signed. You can check this on the web interface (look under "Tags"), or via::
+
+    koji buildinfo test-1.0-1.el7
+
+For importing it into the live repositories, you can just wait a few minutes.
+There's a cronjob that runs every :00, :15, :30 and :45 that refreshes the
+infrastructure repository with all packages that have been tagged.
+After this time, you can yum clean all and then install the packages via yum
+install or yum update.
+
+Admins can also manually trigger that script via::
+
+    /mnt/fedora/app/fi-repo/infra/update.sh
+
+
+Tagging existing builds
+=======================
+
+If you already have a real build and want to use it in the infrastructure
+before it has landed in stable, you can tag it into the respective
+infra-candidate tag. For example, if you have an epel7 build of
+test2-1.0-1.el7, run::
+
+    koji tag epel7-infra-candidate test2-1.0-1.el7
+
+And then the same autosigning and cronjob from the previous section applies.
+
+
+Koji package list
+=================
+
+If you try to build a package into the infra tags, and koji says something
+like::
+
+    BuildError: package test not in list for tag epel7-infra-candidate
+
+That means that the package has not been added to the list for building in
+that particular tag. Either add the package to the respective Fedora/EPEL
+branches (this is the preferred method, since we should always aim to get
+everything packaged for Fedora/EPEL), or ask a koji admin to add the package
+to the listing for the respective tag.
+
+To list koji admins::
+
+    koji list-history --permission=admin --active | grep grant
+
+For koji admins, they can run::
+
+    koji add-pkg $tag $package --owner=$user
diff --git a/docs/sysadmin-guide/sops/infra-retiremachine.rst b/docs/sysadmin-guide/sops/infra-retiremachine.rst
new file mode 100644
index 0000000..b5dbbb8
--- /dev/null
+++ b/docs/sysadmin-guide/sops/infra-retiremachine.rst
@@ -0,0 +1,54 @@
+.. title: Infrastructure Machine Retirement SOP
diff --git a/docs/sysadmin-guide/sops/infra-retiremachine.rst b/docs/sysadmin-guide/sops/infra-retiremachine.rst
new file mode 100644
index 0000000..b5dbbb8
--- /dev/null
+++ b/docs/sysadmin-guide/sops/infra-retiremachine.rst
@@ -0,0 +1,54 @@
+.. title: Infrastructure Machine Retirement SOP
+.. slug: infra-machine-retirement
+.. date: 2011-08-23
+.. taxonomy: Contributors/Infrastructure
+
+=================================
+Infrastructure retire machine SOP
+=================================
+
+Owner:
+    Fedora Infrastructure Team
+Contact:
+    #fedora-admin
+Location:
+    anywhere
+Servers:
+    any
+Purpose:
+    Makes sure decommissioning machines is done correctly
+
+Introduction
+============
+
+When a machine (be it a virtual instance or real physical hardware) is
+decommissioned, a set of steps must be followed to ensure that the machine
+is properly removed from the set of machines we manage and doesn't cause
+problems down the road.
+
+Retire process
+==============
+
+1. Ensure that the machine is no longer used for anything. Use git-grep,
+   stop services, etc.
+
+2. Remove the machine from ansible. Make sure you not only remove the main
+   machine name, but also any aliases it might have (or move them to an
+   active server if they are active services). Make sure to search for the
+   IP address(es) of the machine as well. Ensure DNS is updated to remove
+   the machine.
+
+3. Remove the machine from any labels in hardware devices like consoles or
+   the like.
+
+4. Revoke the ansible cert for the machine.
+
+5. Move the machine xml definition to ensure it does NOT start on boot. You
+   can move it to 'name-retired-YYYY-MM-DD'.
+
+6. Ensure any backend storage the machine was using is freed or renamed to
+   name-retired-YYYY-MM-DD.
+
+TODO
+======
+fill in commands
diff --git a/docs/sysadmin-guide/sops/infra-yubikey.rst b/docs/sysadmin-guide/sops/infra-yubikey.rst
new file mode 100644
index 0000000..a11c994
--- /dev/null
+++ b/docs/sysadmin-guide/sops/infra-yubikey.rst
@@ -0,0 +1,147 @@
+.. title: Infrastructure Yubikey SOP
+.. slug: infra-yubikey
+.. date: 2011-10-03
+.. taxonomy: Contributors/Infrastructure
+
+==========================
+Infrastructure/SOP/Yubikey
+==========================
+
+This document describes how yubikey authentication works.
+
+Contents
+========
+
+1. Contact Information
+2. User Information
+3. Host Admins
+
+   1. pam_yubico
+
+4. Server Admins
+
+   1. Basic architecture
+   2. ykval
+   3. ykksm
+   4. Physical Yubikey info
+
+5. fas integration
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+
+Contact
+    #fedora-admin, sysadmin-main
+
+Location
+    Phoenix
+
+Servers
+    fas*, db02
+
+Purpose
+    Provides yubikey authentication in Fedora
+
+Config Files
+============
+* ``/etc/httpd/conf.d/yk-ksm.conf``
+* ``/etc/httpd/conf.d/yk-val.conf``
+* ``/etc/ykval/ykval-config.php``
+* ``/etc/ykksm/ykksm-config.php``
+* ``/etc/fas.cfg``
+
+User Information
+================
+
+See the Infrastructure/Yubikey page on the Fedora wiki.
+
+Host Admins
+===========
+
+pam_yubico
+----------
+
+Generated from fas, the /etc/yubikeyid works like an authorized_keys file
+and maps valid keys to users. It is downloaded from FAS:
+
+https://admin.fedoraproject.org/accounts/yubikey/dump
+
+Server Admins
+=============
+Basic architecture
+------------------
+Yubikey authentication takes place in 3 basic phases:
+
+1. User presses yubikey, which generates a one time password
+2. The one time password makes its way to the yk-val application, which
+   verifies it is not a replay
+3. yk-val passes that otp on to the yk-ksm application, which verifies the
+   key itself is a valid key
+
+If all of those steps succeed, the ykval application sends back an OK and
+authentication is considered successful. The two applications are defined
+below; if either of them is unavailable, yubikey authentication will fail.
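+
+For a quick manual check of phase 2 from a fas host, something like the
+following can be used (illustrative only -- the exact query parameters
+accepted by this modified ykval deployment are an assumption based on the
+upstream ykval verify protocol)::
+
+    # hand an OTP to yk-val, which in turn consults yk-ksm;
+    # the otp value here is the sample key from later in this document
+    curl 'http://localhost/yk-val/verify?id=1&otp=ccccfcdaivjrvdhvzfljbbievftnvncljhibkulrftt'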
+
+ykval
+``````
+
+Database: db02:ykval
+
+The database contains 3 tables:
+
+clients
+    valid clients. These are not users; these are systems able to
+    authenticate against ykval. In our case Fedora is the only client,
+    so there's just one entry here.
+queue
+    used for distributed setups (we don't do this).
+yubikeys
+    maps which yubikey belongs to which user.
+
+ykval is installed on fas* and is located at:
+http://localhost/yk-val/verify
+
+Purpose: map keys to users and protect against replay attacks.
+
+ykksm
+``````
+Database: db02:ykksm
+
+The database contains one table, yubikeys, which maps who created keys,
+what key was created, when, the public name and serial number, whether it
+is active, etc.
+
+ykksm is installed on fas* at http://localhost/yk-ksm
+
+Purpose: verify whether a key is a valid known key or not. Nothing contacts
+this service directly except for ykval. This should be considered the
+“high security” portion of the system, as access to this table would allow
+users to make their own yubikeys.
+
+Physical Yubikey info
+``````````````````````
+
+The actual yubikey contains information to generate a one time password.
+The important bits to know are that the beginning of the otp contains the
+identifier of the key (used similarly to how ssh uses authorized_keys) and
+that the rest of it contains lots of bits of information, including an
+incrementing serial counter.
+
+Sample key: ``ccccfcdaivjrvdhvzfljbbievftnvncljhibkulrftt``
+
+Breaking this up, the first 12 characters are the identifier. This can be
+considered 'public'.
+
+ccccfcdaivjr vdhvzfljbbievftnvncljhibkulrftt
+
+The second half is the otp part.
+
+fas integration
+===============
+Fas integration has two main parts. First is key generation, the next is
+activation. The fas-plugin-yubikey contains the bits for both, plus
+verification. Users call on this page to generate the key info:
+
+https://admin.fedoraproject.org/accounts/yubikey/genkey
+
+The fas password field automatically detects whether someone is using an
+otp or a regular password. It then sends otp requests to yk-val for
+verification.
+
diff --git a/docs/sysadmin-guide/sops/ipsilon.rst b/docs/sysadmin-guide/sops/ipsilon.rst
new file mode 100644
index 0000000..a1c4d70
--- /dev/null
+++ b/docs/sysadmin-guide/sops/ipsilon.rst
@@ -0,0 +1,80 @@
+.. title: Ipsilon Infrastructure SOP
+.. slug: infra-ipsilon
+.. date: 2016-03-21
+.. taxonomy: Contributors/Infrastructure
+
+==========================
+Ipsilon Infrastructure SOP
+==========================
+
+
+
+Contents
+========
+
+1. Contact Information
+2. Description
+3. Known Issues
+4. Restarting
+5. Configuration
+6. Common actions
+
+   6.1. Registering OpenID Connect Scopes
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+Contact
+    #fedora-admin
+Location
+    Phoenix
+Servers
+    ipsilon01.phx2.fedoraproject.org, ipsilon02.phx2.fedoraproject.org,
+    ipsilon01.stg.phx2.fedoraproject.org
+
+Purpose
+    Ipsilon is our central authentication service that is used to
+    authenticate users against FAS. It is separate from FAS.
+
+Description
+===========
+
+Ipsilon is our central authentication agent that is used to authenticate
+users against FAS. It is separate from FAS. The only service that is not
+using this currently is the wiki. It is a web service that is presented via
+httpd and is load balanced by our standard haproxy setup.
+
+Known issues
+==============
+
+No known issues at this time. There is currently no logout option for
+ipsilon, but this is not considered an issue.
If group memberships are
+updated in ipsilon, the user will need to wait a few minutes for them to
+replicate to all the systems.
+
+Restarting
+===============
+
+To restart the application, simply ssh to the servers for the problematic
+region and issue a ``service httpd restart``. This should rarely be
+required.
+
+Configuration
+================
+
+Configuration is handled by the ipsilon.yaml playbook in Ansible. This can
+also be used to reconfigure the application, if that becomes necessary.
+
+Common actions
+==============
+This section describes some common configuration actions.
+
+OpenID Connect Scope Registration
+---------------------------------
+As documented on https://fedoraproject.org/wiki/Infrastructure/Authentication,
+application developers can request their own scopes. When a request for this
+comes in, look in ansible/roles/ipsilon/files/oidc_scopes/ and copy an
+example module. Copy this to a new file, so we have a file per scope set.
+Fill in the information:
+
+- name is an Ipsilon-internal name. This should not include any spaces.
+- display_name is the name of this category of scopes as displayed to the
+  user.
+- scopes is a dictionary with the full scope identifier (with namespace) as
+  keys. The values are dicts with the following keys:
+
+  display_name
+      The complete display name for this scope. This is what the user is
+      shown to accept/reject.
+  claims
+      A list of additional "claims" (pieces of user information) an
+      application will get when the user consents to this scope. For most
+      scopes, this will be the empty list.
+
+In ansible/roles/ipsilon/tasks/main.yml, add the name of the new file
+(without .py) to the with_items of "Copy OpenID Connect scope registrations".
+To enable, open ansible/roles/ipsilon/templates/configuration.conf, and look
+for the lines starting with "openidc enabled extensions". Add the name of
+the plugin (in the "name" field of the file) to the environment this scope
+set has been requested for. Then run the ansible ipsilon.yml playbook.
diff --git a/docs/sysadmin-guide/sops/iscsi.rst b/docs/sysadmin-guide/sops/iscsi.rst
new file mode 100644
index 0000000..5b9abf6
--- /dev/null
+++ b/docs/sysadmin-guide/sops/iscsi.rst
@@ -0,0 +1,132 @@
+.. title: Infrastructure iSCSI SOP
+.. slug: infra-iscsi
+.. date: 2011-08-23
+.. taxonomy: Contributors/Infrastructure
+
+=====
+iSCSI
+=====
+
+iscsi allows one to share and mount block devices using the scsi protocol
+over a network. Fedora currently connects to a netapp that has an iscsi
+export.
+
+Contents
+========
+
+1. Contact Information
+2. Typical uses
+3. iscsi basics
+
+   1. Terms
+   2. iscsi's basic login / logout procedure is
+
+4. Logging in
+5. Logging out
+6. Important note about creating new logical volumes
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+Contact
+    #fedora-admin, sysadmin-main
+Location
+    Phoenix
+Servers
+    xen[1-15]
+Purpose
+    Provides iscsi connectivity to our netapp.
+
+Typical uses
+============
+
+The best uses for Fedora are for servers that are not part of a farm or
+live replicated. For example, we wouldn't put app1 on the iscsi share
+because we don't gain anything from it. Shutting down app1 to move it
+isn't an issue because app1 is part of our application server farm.
+
+noc1, however, is not replicated. It's a stand alone box that, at best,
+would have a non-live failover.
By placing this host on an iscsi share, we +can make it more highly available as it allows us to move that box around +our virtualization infrastructure without rebooting it or even taking it +down. + +iscsi basics +============ + +Terms +------- + +* initiator means client +* target means server +* swab means mop +* deck means floor + +iscsi's basic login / logout procedure is +------------------------------------------- +1. Notify your client that a new target is available (similar to editing + /etc/fstab for a new nfs mount) +2. Login to the iscsi target (similar to running "mount /my/nfs" +3. Logout from the iscsi target (similar to running "umount /my/nfs" +4. Delete the target from the client (similar to removing the nfs mount + from /etc/fstab) + +Logging in +``````````` +Most mounts are covered by ansible so this should be automatic. In the +event that something goes wrong though, the best way to fix this is: + +- Notify the client of the target:: + + iscsiadm --mode node --targetname iqn.1992-08.com.netapp:sn.118047036 --portal 10.5.88.21:3260 -o new + +- Log in to the new target:: + + iscsiadm --mode node --targetname iqn.1992-08.com.netapp:sn.118047036 --portal 10.5.88.21:3260 --login + +- Scan and activate lvm:: + + pvscan + vgscan + vgchange -ay xenGuests + +Once this is done, one should be able to run "lvs" to see the logical +volumes + +Logging out +``````````` +Logging out isn't normally needed, for example rebooting a machine +automatically logs the initiator out. Should a problem arise though here +are the steps: + +- Disable the logical volume:: + + vgchange -an xenGuests + +- log out:: + + iscsiadm --mode node --targetname iqn.1992-08.com.netapp:sn.118047036 --portal 10.5.88.21:3260 --logout + +.. note:: ``Cannot deactivate volume group`` + + If the vgchange command fails with an error about not being able to + deactivate the volume group, this means that one of the logical volumes is + still in use. By running "lvs" you can get a list of volume groups. Look + in the Attr column. There are 6 attrs listed. The 5th column usually has a + '-' or an 'a'. 'a' means its active, - means it is not. To the right of + that (the last column) you will see an '-' or an 'o'. If you see an 'o' + that means that logical volume is still mounted and in use. + +.. important:: Note about creating new logical volumes + + At present we do not have logical volume locking on the xen servers. This + is dangerous and being worked on. Basically when you create a new volume + on a host, you need to run:: + + pvscan + vgscan + lvscan + + on the other virtualization servers. diff --git a/docs/sysadmin-guide/sops/jenkins-fedmsg.rst b/docs/sysadmin-guide/sops/jenkins-fedmsg.rst new file mode 100644 index 0000000..f45324b --- /dev/null +++ b/docs/sysadmin-guide/sops/jenkins-fedmsg.rst @@ -0,0 +1,49 @@ +.. title: Jenkins Fedmsg SOP +.. slug: infra-jenkins-fedmsg +.. date: 2016-05-11 +.. taxonomy: Contributors/Infrastructure + +================== +Jenkins Fedmsg SOP +================== + +Send information about Jenkins builds to fedmsg. + +Contact Information +------------------- + +Owner + Ricky Elrod, Fedora Infrastructure Team +Contact + #fedora-apps + +Reinstalling when it disappears +------------------------------- + +For an as-of-yet unknown reason, the plugin sometimes seems to disappear, +though it still shows as "installed" on Jenkins. + +To re-install it, grab `fedmsg.hpi` from `/srv/web/infra/bigfiles/jenkins`. +Go to the Jenkins web interface and log in. 
Click `Manage Jenkins` -> +`Manage Plugins` -> `Advanced`. Upload the plugin and on the page that comes +up, check the box to have Jenkins restart when running jobs are finished. + +Configuration Values +-------------------- + +These are written here in case the Jenkins configuration ever gets lost. +This is how to configure the jenkins-fedmsg-emit plugin. + +Assume the plugin is already installed. + +Go to "Configure Jenkins" -> "System Configuration" + +Towards the bottom, look for "Fedmsg Emitter" + +Values: + +Signing: Checked +Fedmsg Endpoint: tcp://209.132.181.16:9941 +Environment Shortname: prod +Certificate File: /etc/pki/fedmsg/jenkins-jenkins.fedorainfracloud.org.crt +Keystore File: /etc/pki/fedmsg/jenkins-jenkins.fedorainfracloud.org.key diff --git a/docs/sysadmin-guide/sops/kerneltest-harness.rst b/docs/sysadmin-guide/sops/kerneltest-harness.rst new file mode 100644 index 0000000..49d9959 --- /dev/null +++ b/docs/sysadmin-guide/sops/kerneltest-harness.rst @@ -0,0 +1,79 @@ +.. title: Kerneltest-harness SOP +.. slug: infra-kerneltest-harness +.. date: 2016-03-14 +.. taxonomy: Contributors/Infrastructure + +====================== +Kerneltest-harness SOP +====================== + +The kerneltest-harness is the web application used to gather and present +statistics about kernel test results. + +Contents +======== + +1. Contact Information +2. Documentation Links + +Contact Information +=================== + +Owner + Fedora Infrastructure Team +Contact + #fedora-admin +Location + https://apps.fedoraproject.org/kerneltest/ +Servers + kerneltest01, kerneltest01.stg +Purpose + Provide a system to gather and present kernel tests results + + +Add a new Fedora release +======================== + +* Login + +* On the front page, in the menu on the left side, if there is a `Fedora + Rawhide` release, click on `(edit)`. + +* Bump the `Release number` on `Fedora Rawhide` to avoid conflicts with the new + release you're creating + +* Back on the index page, click on `New release` + +* Complete the form: + + Release number + This would be the integer version of the Fedora release, for example 24 for + Fedora 24. + + Support + The current status of the Fedora release + - Rawhide for Fedora Rawhide + - Test for branched release + - Release for released Fedora + - Retired for retired release of Fedora + + +Upload new test results +======================= + +The kernel tests are available on the `kernel-test +`_ git repository. + +Once ran with `runtests.sh`, you can upload the resulting file either using +`fedora_submit.py` or the UI. + +If you choose the UI the steps are simply: + +* Login + +* Click on `Upload` in the main menu on the top + +* Select the result file generated by running the tests + +* Submit + diff --git a/docs/sysadmin-guide/sops/kickstarts.rst b/docs/sysadmin-guide/sops/kickstarts.rst new file mode 100644 index 0000000..e06b6d9 --- /dev/null +++ b/docs/sysadmin-guide/sops/kickstarts.rst @@ -0,0 +1,169 @@ +.. title: Infrastructure Kickstart SOP +.. slug: infra-kickstart +.. date: 2016-02-08 +.. taxonomy: Contributors/Infrastructure + +============================ +Kickstart Infrastructure SOP +============================ + +Kickstart scripts provide our install infrastructure. We have a +plethora of different kickstarts to best match the system you are trying +to install. + +Contact Information +=================== + +Owner + Fedora Infrastructure Team +Contact + #fedora-admin, sysadmin-main +Location + Everywhere we have machines. 
+Servers + batcave01 (stores kickstarts and install media) +Purpose + Provides our install infrastructure + +Introduction +============ + +Our kickstart infrastructure lives on batcave01. All +install media and kickstart scripts are located on batcave01. Because the +RHEL binaries are not public we have these bits blocked. You can add +needed IPs to (from batcave01):: + + ansible/roles/batcave/files/allows + +Physical Machine (kvm virthost) +====================================== + +.. note:: PXE Booting + + If PXE booting just follow the prompt after doing the pxe boot (most hosts + will pxeboot via console hitting f12). + +Prep +---- + +This only works on an already booted box, many boxes at our colocations +may have to be rebuilt by the people in those locations first. Also make +sure the IP you are about to boot to install from is allowed to our IP +restricted infrastructure.fedoraproject.org as noted above (in +Introduction). + +Download the vmlinuz and initrd images. + +for a rhel6 install:: + + wget https://infrastructure.fedoraproject.org/repo/rhel/RHEL6-x86_64/images/pxeboot/vmlinuz \ + -O /boot/vmlinuz-install + wget https://infrastructure.fedoraproject.org/repo/rhel/RHEL6-x86_64/images/pxeboot/initrd.img \ + -O /boot/initrd-install.img + + grubby --add-kernel=/boot/vmlinuz-install \ + --args="ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-6-nohd \ + repo=https://infrastructure.fedoraproject.org/repo/rhel/RHEL6-x86_64/ \ + ksdevice=link ip=$IP gateway=$GATEWAY netmask=$NETMASK dns=$DNS" \ + --title="install el6" --initrd=/boot/initrd-install.img + +for a rhel7 install:: + + wget https://infrastructure.fedoraproject.org/repo/rhel/RHEL7-x86_64/images/pxeboot/vmlinuz -O /boot/vmlinuz-install + wget https://infrastructure.fedoraproject.org/repo/rhel/RHEL7-x86_64/images/pxeboot/initrd.img -O /boot/initrd-install.img + +For phx2 hosts:: + + grubby --add-kernel=/boot/vmlinuz-install \ + --args="ks=http://10.5.126.23/repo/rhel/ks/hardware-rhel-7-nohd \ + repo=http://10.5.126.23/repo/rhel/RHEL7-x86_64/ \ + net.ifnames=0 biosdevname=0 bridge=br0:eth0 ksdevice=br0 \ + ip={{ br0_ip }}::{{ gw }}:{{ nm }}:{{ hostname }}:br0:none" \ + --title="install el7" --initrd=/boot/initrd-install.img + +(You will need to setup the br1 device if any after install) + +For non phx2 hosts:: + + grubby --add-kernel=/boot/vmlinuz-install \ + --args="ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-ext \ + repo=https://infrastructure.fedoraproject.org/repo/rhel/RHEL7-x86_64/ \ + net.ifnames=0 biosdevname=0 bridge=br0:eth0 ksdevice=br0 \ + ip={{ br0_ip }}::{{ gw }}:{{ nm }}:{{ hostname }}:br0:none" \ + --title="install el7" --initrd=/boot/initrd-install.img + +Fill in the br0 ip, gateway, etc + +The default here is to use the hardware-rhel-7-nohd config which requires +you to connect via VNC to the box and configure its drives. If this is a +new machine or you are fine with blowing everything away, you can instead +use https://infrastructure.fedoraproject.org/rhel/ks/hardware-rhel-6-minimal +as your kickstart + +If you know the number of hard drives the system has there are other +kickstarts which can be used. 
+
+2 disk system::
+
+    ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-02disk
+
+or external::
+
+    ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-02disk-ext
+
+4 disk system::
+
+    ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-04disk
+
+or external::
+
+    ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-04disk-ext
+
+6 disk system::
+
+    ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-06disk
+
+or external::
+
+    ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-06disk-ext
+
+8 disk system::
+
+    ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-08disk
+
+or external::
+
+    ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-08disk-ext
+
+10 disk system::
+
+    ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-10disk
+
+or external::
+
+    ks=https://infrastructure.fedoraproject.org/repo/rhel/ks/hardware-rhel-7-10disk-ext
+
+
+Double and triple check your configuration settings (on RHEL-6 ``cat
+/boot/grub/menu.lst`` and on RHEL-7 ``cat /boot/grub2/grub.cfg``),
+especially your IP information. In places like ServerBeach not all hosts
+have the same netmask or gateway. Once everything looks correct, you are
+ready to run the commands to set it up for the next boot.
+
+RHEL-6::
+
+    echo "savedefault --default=0 --once" | grub --batch
+    shutdown -r now
+
+RHEL-7::
+
+    grub2-reboot 0
+    shutdown -r now
+
+Installation
+------------
+
+Once the box logs you out, start pinging the IP address. It will disappear
+and come back. Once you can ping it again, try to open up a VNC session.
+It can take a couple of minutes after the box is back up for it to
+actually allow vnc sessions. The VNC password is in the kickstart script
+on batcave01::
+
+    grep vnc /mnt/fedora/app/fi-repo/rhel/ks/hardware-rhel-7-nohd
+
+    vncviewer $IP:1
+
+If using the standard kickstart script, one can watch as the install
+completes itself; there should be no need to do anything. If using the
+hardware-rhel-6-nohd script, one will need to configure the drives. The
+password is in the kickstart file in the kickstart repo.
+
+Post Install
+------------
+Run ansible on the box asap to set root passwords and other security features.
+Don't leave a newly installed box sitting around.
diff --git a/docs/sysadmin-guide/sops/koji-builder-setup.rst b/docs/sysadmin-guide/sops/koji-builder-setup.rst
new file mode 100644
index 0000000..841955e
--- /dev/null
+++ b/docs/sysadmin-guide/sops/koji-builder-setup.rst
@@ -0,0 +1,138 @@
+.. title: Infrastructure Koji Builder SOP
+.. slug: infra-koji-builder
+.. date: 2012-11-29
+.. taxonomy: Contributors/Infrastructure
+
+======================
+Setup Koji Builder SOP
+======================
+
+Contents
+========
+
+- Setting up a new koji builder
+- Resetting/installing an old koji builder
+
+Builder Setup
+==============
+Setting up a new koji builder involves a goodly number of steps:
+
+Network Overview
+----------------
+
+1. First get an instance spun up following the kickstart sop.
+
+2. Define a hostname for it on the .125 network and a $hostname-nfs name
+   for it on the .127 network.
+
+3. Make sure the instance has 2 network connections:
+
+   - eth0 should be on the .125 network
+   - eth1 should be on the .127 network
+
+   For a VM, eth0 should be on br0 and eth1 on br1 on the vmhost (a quick
+   check for this is sketched below).
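+
+Before continuing, it can help to sanity-check the wiring from the vmhost
+(a minimal sketch using standard bridge-utils and libvirt commands; nothing
+here is specific to our setup)::
+
+    # confirm br0 and br1 exist and see which interfaces are attached
+    brctl show
+
+    # once the guest is defined, confirm its two NICs landed on the right bridges
+    virsh domiflist $builder_fqdn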
+ +Setup Overview +-------------- + +- install the system as normal:: + + virt-install -n $builder_fqdn -r $memsize \ + -f $path_to_lvm --vcpus=$numprocs \ + -l http://10.5.126.23/repo/rhel/RHEL6-x86_64/ \ + -x "ksdevice=eth0 ks=http://10.5.126.23/repo/rhel/ks/kvm-rhel-6 \ + ip=$ip netmask=$netmask gateway=$gw dns=$dns \ + console=tty0 console=ttyS0" \ + --network=bridge=br0 --network=bridge=br1 \ + --vnc --noautoconsole + +- run python ``/root/tmp/setup-nfs-network.py`` + this should print out the -nfs hostname that you made above + +- change root pw + +- disable selinux on the machine in /etc/sysconfig/selinux + +- reboot + +- setup ssl cert into private/builders - use fqdn of host as DN + + - login to fas01 as root + - ``cd /var/lib/fedora-ca`` + - ``./kojicerthelper.py normal --outdir=/tmp/ \ + --name=$fqdn_of_the_new_builder --cadir=. --caname=Fedora`` + + - info for the cert should be like this:: + + Country Name (2 letter code) [US]: + State or Province Name (full name) [North Carolina]: + Locality Name (eg, city) [Raleigh]: + Organization Name (eg, company) [Fedora Project]: + Organizational Unit Name (eg, section) []:Fedora Builders + Common Name (eg, your name or your servers hostname) []:$fqdn_of_new_builder + Email Address []:buildsys@fedoraproject.org + + - scp the file in ``/tmp/${fqdn}_key_and_cert.pem`` over to batcave01 + + - put file in the private repo under ``private/builders/${fqdn}.pem`` + + - ``git add`` + ``git commit`` + + - ``git push`` + + +- run ``./sync-hosts`` in infra-hosts repo; ``git commit; git push`` + +- as a koji admin run:: + + koji add-host $fqdnr i386 x86_64 + + (note: those are yum basearchs on the end - season to taste) + + +Resetting/installing an old koji builder +---------------------------------------- + +- disable the builder in koji (ask a koji admin) +- halt the old system (halt -p) +- undefine the vm instance on the buildvmhost:: + + virsh undefine $builder_fqdn + +- reinstall it - from the buildvmhost run:: + + virt-install -n $builder_fqdn -r $memsize \ + -f $path_to_lvm --vcpus=$numprocs \ + -l http://10.5.126.23/repo/rhel/RHEL6-x86_64/ \ + -x "ksdevice=eth0 ks=http://10.5.126.23/repo/rhel/ks/kvm-rhel-6 \ + ip=$ip netmask=$netmask gateway=$gw dns=$dns \ + console=tty0 console=ttyS0" \ + --network=bridge=br0 --network=bridge=br1 \ + --vnc --noautoconsole + +- watch install via vnc:: + + vncviewer -via bastion.fedoraproject.org $builder_fqdn:1 + +- when the install finishes: + + - start the instance on the buildvmhost:: + + virsh start $builder_fqdn + + - set it to autostart on the buildvmhost:: + + virsh autostart $builder_fqdn + +- when the guest comes up + + - login via ssh using the temp root password + - python /root/tmp/setup-nfs-network.py + - change root password + - disable selinux in /etc/sysconfig/selinux + - reboot + - ask a koji admin to re-enable the host + + + + diff --git a/docs/sysadmin-guide/sops/koji.rst b/docs/sysadmin-guide/sops/koji.rst new file mode 100644 index 0000000..7de7eea --- /dev/null +++ b/docs/sysadmin-guide/sops/koji.rst @@ -0,0 +1,212 @@ +.. title: Koji Infrastructure SOP +.. slug: infra-koji +.. date: 2011-10-03 +.. taxonomy: Contributors/Infrastructure + +======================= +Koji Infrastructure SOP +======================= + +.. note:: + We are transitioning from two buildsystems, koji for Fedora and plague for + EPEL, to just using koji. This page documents both. + +Koji and plague are our buildsystems. They share some of the same machines +to do their work. + +Contents +======== + +1. 
Contact Information
+2. Description
+3. Add packages into Buildroot
+4. Troubleshooting and Resolution
+
+   1. Restarting Koji
+   2. kojid won't start or some builders won't connect
+   3. OOM (Out of Memory) Issues
+
+      1. Increase Memory
+      2. Decrease weight
+
+   4. Disk Space Issues
+
+5. Should there be mention of being sure filesystems in chroots are
+unmounted before you delete the chroots?
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+
+Contact
+    #fedora-admin, sysadmin-build group
+
+Persons
+    mbonnet, dgilmore, f13, notting, mmcgrath, SmootherFrOgZ
+
+Location
+    Phoenix
+
+Servers
+    - koji.fedoraproject.org
+    - buildsys.fedoraproject.org
+    - xenbuilder[1-4]
+    - hammer1, ppc[1-4]
+
+Purpose
+    Build packages for Fedora.
+
+Description
+===========
+
+Users submit builds to koji.fedoraproject.org or
+buildsys.fedoraproject.org. From there they get passed on to the builders.
+
+.. important::
+   At present plague and koji are unaware of each other. A result of this may
+   be an overloaded builder. An easy fix for this is not clear at this time.
+
+Add packages into Buildroot
+===========================
+
+Some contributors may need to build packages against freshly built
+packages which are not in the buildroot yet. Koji has override tags, which
+the build tag inherits, to include such packages in the buildroot. Packages
+can be tagged in with::
+
+    koji tag-pkg dist-$release-override
+
+Troubleshooting and Resolution
+==============================
+
+Restarting Koji
+---------------
+
+If for some reason koji needs to be restarted, make sure to restart the
+koji master first, then the builders. If the koji master has been down for
+a short enough time, the builders do not need to be restarted::
+
+    service httpd restart
+    service kojira restart
+    service kojid restart
+
+.. important::
+   If postgres becomes interrupted in some way, koji will need to be
+   restarted. As long as the koji master daemon gets restarted the builders
+   should reconnect automatically. If the db server has been restarted and
+   the builders don't seem to be building, restart their daemons as well.
+
+kojid won't start or some builders won't connect
+------------------------------------------------
+
+In the event that some items are able to connect to koji while some are
+not, please make sure that the database is not filled up on connections.
+This is common if koji crashes and the db connections aren't properly
+cleared. Upon restart many of the connections are full so koji cannot
+reconnect. Clearing old connections is easy: estimate how long the new
+koji has been up, pick a number of minutes larger than that, and kill
+those queries. From db3 as postgres run::
+
+    echo "select procpid from pg_stat_activity where usename='koji' and now() - query_start \
+    >= '00:40:00' order by query_start;" | psql koji | grep "^ " | xargs kill
+
+OOM (Out of Memory) Issues
+--------------------------
+
+Out of memory issues occur from time to time on the build machines. There
+are a couple of options for correction. The first fix is to just restart
+the machine and hope it was a one time thing. If the problem continues,
+please choose from one of the following options.
+
+Increase Memory
+```````````````
+
+The xen machines can have memory increased on their corresponding xen
+hosts.
At present this is the table: + ++----------+-------------+ +| xen3 | xenbuilder1 | ++----------+-------------+ +| xen4 | xenbuilder2 | ++----------+-------------+ +| disabled | xenbuilder3 | ++----------+-------------+ +| xen8 | xenbuilder4 | ++----------+-------------+ + +Edit ``/etc/xen/xenbuilder[1-4]`` and add more memory. + +Decrease weight +``````````````` + +Each builder has a weight as to how much work can be given to it. +Presently the only way to alter weight is actually changing the database +on db3:: + + $ sudo su - postgres + -bash-2.05b$ psql koji + koji=# select * from host limit 1; + id | user_id | name | arches | task_load | capacity | ready | enabled + ---+---------+------------------------+-----------+-----------+----------+-------+--------- + 6 | 130 | ppc3.fedora.redhat.com | ppc ppc64 | 1.5 | 4 | t | t + (1 row) + koji=# update host set capacity=2 where name='ppc3.fedora.redhat.com'; + +Simply update capacity to a lower number. + +Disk Space Issues +------------------ + +The builders use a lot of temporary storage. Failed builds also get left +on the builders, most should get cleaned but plague does not. The easiest +thing to do is remove some older cache dirs. + +Step one is to turn off both koji and plague:: + + /etc/init.d/plague-builder stop + /etc/init.d/kojid stop + +Next check to see what file system is full:: + + df -h + +.. important:: + If any one of the following directories is full, send an outage + notification as outlined in: [62]Infrastructure/OutageTemplate to the + fedora-infrastructure-list and fedora-devel-list, then contact Mike + McGrath + + - /mnt/koji + - /mnt/ntap-fedora1/scratch + - /pub/epel + - /pub/fedora + +Typically just / will be full. The next thing to do is determine if we +have any extremely large builds left on the builder. Typical locations +include /var/lib/mock and /mnt/build (/mnt/build actually is on the local +filesystem):: + + du -sh /var/lib/mock/* /mnt/build/* + +``/var/lib/mock/dist-f8-build-10443-1503`` + classic koji build +``/var/lib/mock/fedora-6-ppc-core-57cd31505683ef1afa533197e91608c5a2c52864`` + classic plague build + +If nothing jumps out immediately, just start deleting files older than one +week. Once enough space has been freed start koji and plague back up:: + + /etc/init.d/plague-builder start + /etc/init.d/kojid start + +Unmounting +---------- + +.. warning:: + Should there be mention of being sure filesystems in chroots + are unmounted before you delete the chroots? + + Res ipsa loquitur. + diff --git a/docs/sysadmin-guide/sops/koschei.rst b/docs/sysadmin-guide/sops/koschei.rst new file mode 100644 index 0000000..b7b5418 --- /dev/null +++ b/docs/sysadmin-guide/sops/koschei.rst @@ -0,0 +1,219 @@ +.. title: Koschei SOP +.. slug: infra-koschei +.. date: 2016-09-29 +.. taxonomy: Contributors/Infrastructure + +=========== +Koschei SOP +=========== + +Koschei is a continuous integration system for RPM packages. +Koschei runs package scratch builds after dependency change or +after time elapse and reports package buildability status to +interested parties. + +Production instance: https://apps.fedoraproject.org/koschei +Staging instance: https://apps.stg.fedoraproject.org/koschei + +Contents +-------- +1. Contact information +2. Deployment +3. Description +4. Configuration +5. Disk usage +6. Database +7. Managing koschei services +8. Suspespending koschei operation +9. Limiting Koji usage +10. Fedmsg notifications +11. Setting admin announcement +12. Adding package groups +13. 
Set package static priority + +Contact Information +------------------- +Owner + mizdebsk, msimacek +Contact + #fedora-admin +Location + Fedora Cloud +Purpose + continuous integration system + + +Deployment +---------- + sudo rbac-playbook groups/koschei-backend.yml + sudo rbac-playbook groups/koschei-web.yml + +Description +----------- +Koschei is deployed on two separate machines - koschei-backend and koschei-web + +Frontend (koschei-web) is a Flask WSGi application running with httpd. +It displays information to users and allows editing package groups and +changing priorities. + +Backend (koschei-backend) consists of multiple services: + +- koschei-watcher - listens to fedmsg events for complete builds and + changes build states in the database. Additionally listens to + repo-done events which are enqueued to be processed by + koschei-resolver + +- koschei-resolver - resolves package dependencies in given repo using + hawkey and compares them with previous iteration to get a dependency + diff. There are two types of resolutions: + + build resolution + resolves complete build in the repo in which it + was done on Koji. Produces the dependency differences visible in the + frontend. + new repo resolution + resolves all packages in newest repo available + in Koji. The output is a base for scheduling new builds. + +- koschei-scheduler - schedules new builds based on multiple criteria: + + dependency priority + dependency changes since last build valued by + their distance in the dependency graph. + manual and static priorities + set manually in the frontend. Manual + priority is reset after each build, static priority persists + time priority + time since last build (logarithmical formula) + +- koschei-polling - polls the same types of events as koschei-watcher + without reliance on fedmsg + + +Configuration +------------- +Koschei configuration is in ``/etc/koschei/config-backend.cfg`` and +``/etc/koschei/config-frontend.cfg``, and is merged with the default +configuration in ``/usr/share/koschei/config.cfg`` (the ones in etc +overrides the defaults in usr). Note the merge is recursive. The +configuration contains all configurable items for all Koschei services +and the frontend. The alterations to configuration that aren't +temporary should be done through ansible playbook. Configuration +changes have no effect on already running services -- they need to be +restarted, which happens automatically when using the playbook. + + +Disk usage +---------- +Koschei doesn't keep on disk anything that couldn't be recreated +easily - all important data is stored in PostgreSQL database, +configuration is managed by Ansible, code installed by RPM and so on. + +To speed up operation and reduce load on external servers, Koschei +caches some data obtained from services it integrates with. Most +notably, YUM repositories downloaded from Koji are kept in +``/var/cache/koschei/repodata``. Each repository takes about 100 MB +of disk space. Maximal number of repositories kept at time is +controlled by ``cache_l2_capacity`` parameter in +``config-backend.cfg`` (``config-backend.cfg.j2`` in Ansible). If +repodata cache starts to consume too much disk space, that value can +be decreased - after restart, koschei-resolver will remove least +recently used cache entries to respect configured cache capacity. + + +Database +-------- +Koschei needs to connect to a PostgreSQL database, other database +systems are not supported. 
Database connection is specified in the
+configuration under the "database_config" key that can contain the
+following keys: username, password, host, port, database.
+
+After an update of koschei, the database needs to be migrated to the new
+schema. This is handled using alembic::
+
+    alembic -c /usr/share/koschei/alembic.ini upgrade head
+
+The backend services need to be stopped during the migration.
+
+
+Managing koschei services
+-------------------------
+Koschei services are systemd units managed through systemctl. They can
+be started and stopped independently in any order. The frontend is run
+using httpd.
+
+
+Suspending koschei operation
+----------------------------
+To stop builds from being scheduled, stopping the koschei-scheduler
+service is enough. For planned Koji outages, it's recommended to stop
+koschei-scheduler. This is not strictly necessary, as koschei can recover
+from Koji errors and network errors automatically, but when Koji
+builders are stopped, it may cause unexpected build failures that would
+be reported to users. Other services can be left running, as they
+automatically restart themselves on Koji and network errors.
+
+
+Limiting Koji usage
+-------------------
+Koschei is by default limited to 30 concurrently running builds. This
+limit can be changed in the configuration under the
+"koji_config"/"max_builds" key. There's also Koji load monitoring, which
+prevents builds from being scheduled when Koji load is higher than a
+certain threshold. That should prevent scheduling builds during mass
+rebuilds, so it's not necessary to stop scheduling during those.
+
+
+Fedmsg notifications
+--------------------
+Koschei optionally supports sending fedmsg notifications for package
+state changes. The fedmsg dispatch can be turned on and off in the
+configuration (key "fedmsg-publisher"/"enabled"). Koschei doesn't supply
+configuration for fedmsg; it lets the library load its own (in
+/etc/fedmsg.d/).
+
+
+Setting admin announcement
+--------------------------
+Koschei can display an announcement in the web UI. This is mostly useful
+to inform users about outages or other problems.
+
+To set an announcement, run as koschei user::
+
+    koschei-admin set-notice "Koschei operation is currently suspended due to scheduled Koji outage"
+
+or::
+
+    koschei-admin set-notice "Submitting scratch builds by Koschei is currently disabled due to Fedora 23 mass rebuild"
+
+To clear the announcement, run as koschei user::
+
+    koschei-admin clear-notice
+
+
+Adding package groups
+---------------------
+Packages can be added to one or more groups. Currently, only Koschei
+admins can add new groups.
+
+To add a new group named "mynewgroup", run as koschei user::
+
+    koschei-admin add-group mynewgroup
+
+To add a new group named "mynewgroup" and populate it with some
+packages, run as koschei user::
+
+    koschei-admin add-group mynewgroup pkg1 pkg2 pkg3
+
+
+Set package static priority
+---------------------------
+Some packages are more or less important and can have higher or lower
+priority. Any user can change manual priority, which is reset after the
+package is rebuilt. Admins can additionally set static priority, which
+is not affected by package rebuilds.
+ +To set static priority of package "foo" to value "100", run as +koschei user:: + + koschei-admin set-priority --static foo 100 diff --git a/docs/sysadmin-guide/sops/layered-image-buildsys.rst b/docs/sysadmin-guide/sops/layered-image-buildsys.rst new file mode 100644 index 0000000..650cf3d --- /dev/null +++ b/docs/sysadmin-guide/sops/layered-image-buildsys.rst @@ -0,0 +1,283 @@ +.. title: Layered Image Build System +.. slug: layered-image-buildsys +.. date: 2016-12-15 +.. taxonomy: Contributors/Infrastructure + +========================== +Layered Image Build System +========================== + +The `Fedora Layered Image Build System`_, often referred to as `OSBS`_ +(OpenShift Build Service) as that is the upstream project that this is based on, +is used to build Layered Container Images in the Fedora Infrastructure via Koji. + + +Contents +======== + +1. Contact Information +2. Overview +3. Setup +4. Outage + + +Contact Information +=================== + +Owner + Adam Miller (maxamillion) + +Contact + #fedora-admin, #fedora-releng, #fedora-noc, sysadmin-main, sysadmin-releng + +Location + osbs-control01, osbs-master01, osbs-node01, osbs-node02 + registry.fedoraproject.org, candidate-registry.fedoraproject.org + + osbs-control01.stg, osbs-master01.stg, osbs-node01.stg, osbs-node02.stg + registry.stg.fedoraproject.org, candidate-registry.stg.fedoraproject.org + + x86_64 koji buildvms + +Purpose + Layered Container Image Builds + + +Overview +======== + +The build system is setup such that Fedora Layered Image maintainers will submit +a build to Koji via the ``fedpkg container-build`` command a ``docker`` +namespace within `DistGit`_. This will trigger the build to be scheduled in +`OpenShift`_ via `osbs-client`_ tooling, this will create a custom +`OpenShift Build`_ which will use the pre-made buildroot `Docker`_ image that we +have created. The `Atomic Reactor`_ (``atomic-reactor``) utility will run within +the buildroot and prep the build container where the actual build action will +execute, it will also maintain uploading the `Content Generator`_ metadata back +to `Koji`_ and upload the built image to the candidate docker registry. This +will run on a host with iptables rules restricting access to the docker bridge, +this is how we will further limit the access of the buildroot to the outside +world verifying that all sources of information come from Fedora. + +Completed layered image builds are hosted in a candidate docker registry which +is then used to pull the image and perform tests with `Taskotron`_. The +taskotron tests are triggered by a `fedmsg`_ message that is emitted from +`Koji`_ once the build is complete. Once the test is complete, taskotron will +send fedmsg which is then caught by the `RelEng Automation`_ Engine that will +run the Automatic Release tasks in order to push the layered image into a stable +docker registry in the production space for end users to consume. + +For more information, please consult the `RelEng Architecture Document`_. 
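+
+As a rough, maintainer's-eye illustration of the flow described above (the
+package name here is hypothetical, and ``fedpkg container-build`` will pick
+a default build target if none is given)::
+
+    # check out the container's dist-git repo (docker namespace) and submit a build
+    fedpkg clone docker/mycontainer
+    cd mycontainer
+    fedpkg container-build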
+ + +Setup +===== + +The Layered Image Build System setup is currently as follows (more detailed view +available in the `RelEng Architecture Document`_): + +:: + + === Layered Image Build System Overview === + + +--------------+ +-----------+ + | | | | + | koji hub +----+ | batcave | + | | | | | + +--------------+ | +----+------+ + | | + V | + +----------------+ V + | | +----------------+ + | koji builder | | +-----------+ + | | | osbs-control01 +--------+ | + +-+--------------+ | +-----+ | | + | +----------------+ | | | + | | | | + | | | | + | | | | + V | | | + +----------------+ | | | + | | | | | + | osbs-master01 +------------------------------+ [ansible] + | +-------+ | | | | + +----------------+ | | | | | + ^ | | | | | + | | | | | | + | V V | | | + | +-----------------+ +----------------+ | | | + | | | | | | | | + | | osbs-node01 | | osbs-node02 | | | | + | | | | | | | | + | +-----------------+ +----------------+ | | | + | ^ ^ | | | + | | | | | | + | | +-----------+ | | + | | | | + | +------------------------------------------+ | + | | + +-------------------------------------------------------------+ + + +Deployment +---------- +The osbs-control01 host is where the `ansible-ansbile-openshift-ansible`_ role +is called from the `osbs-cluster.yml`_ playbook in order to configure the +OpenShift Cluster where OSBS is deployed on top of. + + +Operation +--------- +Koji Hub will schedule the containerBuild on a koji builder via the +koji-containerbuild-hub plugin, the builder will then submit the build in +OpenShift via the koji-containerbuild-builder plugin which uses the osbs-client +python API that wraps the OpenShift API along with a custom OpenShift Build JSON +payload. + +The Build is then scheduled in OpenShift and it's logs are captured by the koji +plugins. Inside the buildroot, atomic-reactor will upload the built container +image as well as provide the metadata to koji's content generator. + + +Outage +====== + +If Koji is down, then builds can't be scheduled but repairing Koji is outside +the scope of this document. + +If either the candidate-registry.fedoraproject.org or registry.fedoraproject.org +Container Registries are unavailable, but repairing those is also outside the +scope of this document. + +OSBS Failures +------------- + +OpenShift Build System itself can have various types of failures that are known +about and the recovery procedures are listed below. + +Ran out of disk space +~~~~~~~~~~~~~~~~~~~~~ + +Docker uses a lot of disk space, and while the osbs-nodes have been alloted what +is considered to be ample disk space for builds (since they are automatically +cleaned up periodically) it is possible this will run out. + +To resolve this, run the following commands: + +:: + + # These command will clean up old/dead docker containers from old OpenShift + # Pods + + $ for i in $(sudo docker ps -a | awk '/Exited/ { print $1 }'); do sudo docker rm $i; done + + $ for i in $(sudo docker images -q -f 'dangling=true'); do sudo docker rmi $i; done + + + # This command should only be run on osbs-master01 (it won't work on the + # nodes) + # + # This command will clean up old builds and related artifacts in OpenShift + # that are older than 30 days (We can get more aggressive about this if + # necessary, the main reason these still exist is in the event we need to + # debug something. All build info we care about is stored in Koji.) + + $ oadm prune builds --orphans --keep-younger-than=720h0m0s --confirm + +A node is broken, how to remove it from the cluster? 
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If a node is having an issue, the following command will effectively remove it
+from the cluster temporarily.
+
+In this example, we are removing osbs-node01:
+
+::
+
+    $ oadm manage-node osbs-node01.phx2.fedoraproject.org --schedulable=false
+
+
+Container Builds are unable to access resources on the network
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Sometimes the Container Builds will fail and the logs will show that the
+buildroot is unable to access networked resources (docker registry, dnf repos,
+etc).
+
+This is because of a bug in OpenShift v1.3.1 (current upstream release at the
+time of this writing) where an OpenVSwitch flow is left behind when a Pod is
+destroyed instead of the flow being deleted along with the Pod.
+
+Confirming the issue is unfortunately multi-step, since it's not
+a cluster-wide issue but one isolated to the node experiencing the problem.
+
+First, in the koji createContainer task there is a log file called
+openshift-incremental.log, and in there you will find a key:value in some JSON
+output similar to the following:
+
+::
+
+    'openshift_build_selflink': u'/oapi/v1/namespaces/default/builds/cockpit-f24-6'
+
+
+The last field of the value, in this example ``cockpit-f24-6``, is the OpenShift
+build identifier. We need to ssh into ``osbs-master01`` and get information
+about which node that ran on.
+
+::
+
+    # On osbs-master01
+    # Note: the output won't be pretty, but it gives you the info you need
+
+    $ sudo oc get build cockpit-f24-6 -o yaml | grep osbs-node
+
+
+Once you know what machine you need, ssh into it and run the following:
+
+::
+
+    $ sudo docker run --rm -ti buildroot /bin/bash
+
+    # now attempt to run a curl command
+
+    $ curl https://google.com
+    # This should get refused, but if this node is experiencing the networking
+    # issue then this command will hang and eventually time out
+
+How to fix:
+
+Reboot the affected node that's experiencing the issue. When the node comes
+back up, OpenShift will rebuild the flow tables on OpenVSwitch and things will
+be back to normal.
+
+::
+
+    systemctl reboot
+
+
+
+
+
+.. CITATIONS/LINKS
+.. _fedmsg: http://www.fedmsg.com/en/latest/
+.. _Koji: https://fedoraproject.org/wiki/Koji
+.. _Docker: https://github.com/docker/docker/
+.. _OpenShift: https://www.openshift.org/
+.. _Taskotron: https://taskotron.fedoraproject.org/
+.. _docker-registry: https://docs.docker.com/registry/
+.. _RelEng Automation: https://pagure.io/releng-automation
+.. _osbs-client: https://github.com/projectatomic/osbs-client
+.. _docker-distribution: https://github.com/docker/distribution/
+.. _Atomic Reactor: https://github.com/projectatomic/atomic-reactor
+.. _DistGit:
+    https://fedoraproject.org/wiki/Infrastructure/VersionControl/dist-git
+.. _OpenShift Build:
+    https://docs.openshift.org/latest/dev_guide/builds.html
+.. _Content Generator:
+    https://fedoraproject.org/wiki/Koji/ContentGenerators
+.. _RelEng Architecture Document:
+    https://docs.pagure.org/releng/layered_image_build_service.html
+.. _osbs-cluster.yml:
+    https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/playbooks/groups/osbs-cluster.yml
+.. _ansible-ansible-openshift-ansible:
+    https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/roles/ansible-ansible-openshift-ansible
diff --git a/docs/sysadmin-guide/sops/linktracking.rst b/docs/sysadmin-guide/sops/linktracking.rst
new file mode 100644
index 0000000..67470aa
--- /dev/null
+++ b/docs/sysadmin-guide/sops/linktracking.rst
@@ -0,0 +1,79 @@
+.. title: Link Tracking SOP
+.. slug: infra-link-tracking
+.. date: 2011-10-03
+.. taxonomy: Contributors/Infrastructure
+
+=============
+Link tracking
+=============
+
+Using link tracking is an easy way for us to find out how people are
+getting to our download page. People might click over to our download page
+from any of a number of areas, and knowing the relative usage of those
+links can help us understand which materials we're producing are more
+effective than others.
+
+Adding links
+============
+
+Each link should be constructed by adding ? to the URL, followed by a
+short code that includes:
+
+* an indicator for the link source (such as the wiki release notes)
+* an indicator for the specific Fedora release (such as F15 for the
+  final, or F15a for the Alpha test release)
+
+So a link to get.fp.o from the one-page release notes would become
+http://get.fedoraproject.org/?opF15.
+
+FAQ
+===
+I want to copy a link to my status update for social networking, or my blog.
+    If you're posting a status update to identi.ca, for example, use
+    the link tracking code for status updates. Don't copy a link
+    straight from an announcement that includes link tracking from the
+    announcement. You can copy the link itself but remember to change
+    the portion after the ? to instead use the st code for status
+    updates and blogs, followed by the Fedora release version (such as
+    F16a, F16b, or F16), like this::
+
+        http://fedoraproject.org/get-prerelease?stF16a
+
+I want to point people to the announcement from my blog. Should I use the announcement link tracking code?
+    The actual URL link itself is the announcement URL. Add the link
+    tracking code for blogs, which would start with ?st and end with
+    the Fedora release version, like this::
+
+        http://fedoraproject.org/wiki/F16_release_announcement?stF16a
+
+The codes
+=========
+
+.. note::
+    Additions to this table are welcome.
+=============================================== ========== +Link source Code +=============================================== ========== +Email announcements an +----------------------------------------------- ---------- +Wiki announcements wkan +----------------------------------------------- ---------- +Front page fp +----------------------------------------------- ---------- +Front page of wiki wkfp +----------------------------------------------- ---------- +The press release Red Hat makes rhpr +----------------------------------------------- ---------- +http://redhat.com/fedora rhf +----------------------------------------------- ---------- +Test phase release notes on wkrn +----------------------------------------------- ---------- +Official release notes rn +----------------------------------------------- ---------- +Official installation guide ig +----------------------------------------------- ---------- +One-page release notes op +----------------------------------------------- ---------- +Status links (blogs, social media) st +=============================================== ========== + diff --git a/docs/sysadmin-guide/sops/loopabull.rst b/docs/sysadmin-guide/sops/loopabull.rst new file mode 100644 index 0000000..4f8a12a --- /dev/null +++ b/docs/sysadmin-guide/sops/loopabull.rst @@ -0,0 +1,122 @@ +.. title: Loopabull +.. slug: loopabull +.. date: 2017-01-17 +.. taxonomy: Contributors/Infrastructure + + +.. ########################################################################## +.. NOTE: This document is currently under construction. The service described + herein is not yet in production. +.. ########################################################################## + + +========= +Loopabull +========= + +`Loopabull`_ is an event-driven `Ansible`_-based automation engine. This is used +for various tasks, originally slated for `Release Engineering Automation`_. + +Contents +======== + +1. Contact Information +2. Overview +3. Setup +4. Outage + + +Contact Information +=================== + +Owner + Adam Miller (maxamillion) + +Contact + #fedora-admin, #fedora-releng, #fedora-noc, sysadmin-main, sysadmin-releng + +Location + + TBD + +Purpose + Event Driven Automation of tasks within the Fedora Infrastructure and Fedora + Release Engineering + + +Overview +======== + +The `loopabull`_ system is setup such that an event will take place within the +infrastructure and a `fedmsg`_ is sent, then loopabull will consume that +message, trigger an `Ansible`_ `playbook`_ that shares a name with the fedmsg +topic, and provide the payload of the fedmsg to the playbook as `extra +variables`_. + + +Setup +===== + +The setup is relatively simple, the Overview above describes it and a more +detailed version can be found in the `releng docs`. + +:: + + +-----------------+ +-------------------------------+ + | | | | + | fedmsg +------------>| Looper | + | | | (fedmsg handler plugin) | + | | | | + +-----------------+ +-------------------------------+ + | + | + +-------------------+ | + | | | + | | | + | Loopabull +<-------------+ + | (Event Loop) | + | | + +---------+---------+ + | + | + | + | + V + +----------+-----------+ + | | + | ansible-playbook | + | | + +----------------------+ + +Deployment +---------- + +TBD + + +Outage +====== + +In the event that loopabull isn't responding or isn't running playbooks as it +should be, the following scenarios should be approached. 
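+
+Before digging into a specific scenario, a quick status check is usually
+worthwhile (a sketch using plain systemd tooling; the unit name matches the
+one used in the Network Interruption section below)::
+
+    # is the service running, and what has it logged recently?
+    systemctl status loopabull.service
+    journalctl -u loopabull.service -n 50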
+
+Network Interruption
+--------------------
+
+Sometimes if the network is interrupted, the loopabull service will hang because
+the fedmsg listener will hold a dead socket open. The service simply needs to be
+restarted at that point.
+
+::
+
+    systemctl restart loopabull.service
+
+.. CITATIONS/LINKS
+.. _Ansible: https://www.ansible.com/
+.. _fedmsg: http://www.fedmsg.com/en/latest/
+.. _loopabull: https://github.com/maxamillion/loopabull
+.. _playbook: http://docs.ansible.com/ansible/playbooks.html
+.. _Release Engineering Automation: https://pagure.io/releng-automation
+.. _releng docs: https://docs.pagure.org/releng/automation_engine.html
+.. _extra variables:
+    https://github.com/ansible/ansible/blob/devel/docs/man/man1/ansible-playbook.1.asciidoc.in
diff --git a/docs/sysadmin-guide/sops/mailman.rst b/docs/sysadmin-guide/sops/mailman.rst
new file mode 100644
index 0000000..b171357
--- /dev/null
+++ b/docs/sysadmin-guide/sops/mailman.rst
@@ -0,0 +1,119 @@
+.. title: Mailman Infrastructure SOP
+.. slug: infra-mailmain
+.. date: 2016-10-07
+.. taxonomy: Contributors/Infrastructure
+
+==========================
+Mailman Infrastructure SOP
+==========================
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+
+Contact
+    #fedora-admin, sysadmin-main, sysadmin-tools, sysadmin-hosted
+
+Location
+    phx2
+
+Servers
+    mailman01, mailman02, mailman01.stg
+
+Purpose
+    Provides mailing list services.
+
+Description
+===========
+
+Mailing list services for Fedora projects are located on the
+mailman01.phx2.fedoraproject.org server.
+
+Common Tasks
+============
+
+Creating a new mailing list
+---------------------------
+
+* Log into mailman01
+* ``sudo -u mailman mailman3 create <listname>@lists.fedora(project|hosted).org --owner <owner>@fedoraproject.org --notify``
+
+  .. note::
+     Note that list names should make sense, and not contain the words 'fedora'
+     or 'list' - the fact that it has to do with Fedora and that it's a list
+     are both obvious from the domain of the email address.
+
+  .. important::
+     Please make sure to add a valid description to the newly
+     created list (to avoid [no description available] on the listinfo index).
+
+Removing content from archives
+==============================
+
+We don't.
+
+It's not easy to remove content from the archives and it's generally
+useless as well because the archives are often mirrored by third parties
+as well as being in the INBOXs of all of the people on the mailing list at
+that time. Here's an example message to send to someone who requests
+removal of archived content::
+
+    Greetings,
+
+    We're sorry to say that we don't remove content from the mailing list archives.
+    Doing so is a non-trivial amount of work and usually doesn't achieve anything
+    because the content has already been disseminated to a wide audience that we do
+    not control. The emails have gone out to all of the subscribers of the mailing
+    list at that time and also (for a great many of our lists) been copied by third
+    parties (for instance: http://markmail.org and http://gmane.org).
+
+    Sorry we cannot help further,
+
+    Mailing lists and their owners
+
+Checking Ownership
+==================
+
+Do you need to check who owns a certain mailing list without having
+to search around on lists' front pages?
+
+If yes, mailman has a nice tool that will get us the answer in a
+few seconds:
+
+Get a full list of all the mailing lists hosted on the server (either
+fedoraproject.org or fedorahosted.org)::
+
+    sudo /usr/lib/mailman/bin/list_admins -a
+
+See which lists are owned by example@example.com::
+
+    sudo /usr/lib/mailman/bin/list_admins -a | grep example@example.com
+
+Troubleshooting and Resolution
+==============================
+
+List Administration
+-------------------
+
+Specific users are marked as 'site admins' in the database.
+
+Please file an issue if you feel you need to have this access.
+
+Restart Procedure
+-----------------
+
+If the server needs to be restarted, mailman should come back on its own.
+Otherwise each service on it can be restarted::
+
+    sudo service mailman3 restart
+    sudo service postfix restart
+
+How to delete a mailing list
+============================
+
+Delete a list, but keep the archives::
+
+    sudo -u mailman mailman3 remove <listname>
+
diff --git a/docs/sysadmin-guide/sops/making-ssl-certificates.rst b/docs/sysadmin-guide/sops/making-ssl-certificates.rst
new file mode 100644
index 0000000..6b39f52
--- /dev/null
+++ b/docs/sysadmin-guide/sops/making-ssl-certificates.rst
@@ -0,0 +1,57 @@
+.. title: Infrastructure SSL Certificate Creation SOP
+.. slug: infra-ssl-create
+.. date: 2012-07-17
+.. taxonomy: Contributors/Infrastructure
+
+============================
+SSL Certificate Creation SOP
+============================
+
+Every now and then you will need to create an SSL certificate for a
+Fedora Service.
+
+Creating a CSR for a new server
+===============================
+
+Know your hostname, e.g. ``lists.fedoraproject.org``::
+
+    export ssl_name=<hostname>
+
+Create the cert. 8192 bit keys do not work with various boxes, so we use 4096
+currently::
+
+    openssl genrsa -out ${ssl_name}.pem 4096
+    openssl req -new -key ${ssl_name}.pem -out ${ssl_name}.csr
+
+    Country Name (2 letter code) [XX]:US
+    State or Province Name (full name) []:NM
+    Locality Name (eg, city) [Default City]:Raleigh
+    Organization Name (eg, company) [Default Company Ltd]:Red Hat
+    Organizational Unit Name (eg, section) []:Fedora Project
+    Common Name (eg, your name or your server's hostname)
+    []:lists.fedorahosted.org
+    Email Address []:admin@fedoraproject.org
+
+    Please enter the following 'extra' attributes
+    to be sent with your certificate request
+    A challenge password []:
+    An optional company name []:
+
+Send the CSR to the signing authority and wait for a cert.
+Place all three files into the private directory so that you can make certs
+in the future.
+
+Creating a temporary self-signed certificate
+============================================
+
+Repeat the steps above but add in the following::
+
+    openssl x509 -req -days 30 -in ${ssl_name}.csr -signkey ${ssl_name}.pem -out ${ssl_name}.cert
+    Signature ok
+    subject=/C=US/ST=NM/L=Raleigh/O=Red Hat/OU=Fedora
+    Project/CN=lists.fedorahosted.org/emailAddress=admin@fedoraproject.org
+
+Getting Private key
+
+We only want a self-signed certificate to be good for a short time, so 30
+days sounds good.
diff --git a/docs/sysadmin-guide/sops/massupgrade.rst b/docs/sysadmin-guide/sops/massupgrade.rst
new file mode 100644
index 0000000..40f491a
--- /dev/null
+++ b/docs/sysadmin-guide/sops/massupgrade.rst
@@ -0,0 +1,438 @@
+.. title: Mass Upgrade Infrastructure SOP
+.. slug: infra-mass-upgrade
+.. date: 2013-07-29
+.. taxonomy: Contributors/Infrastructure
+
+===============================
+Mass Upgrade Infrastructure SOP
+===============================
+
+Every once in a while, we need to apply mass upgrades to our servers for
+various security and other upgrades.
+
+Contents
+--------
+
+1. Contact Information
+2. Preparation
+3. Staging
+4. Special Considerations
+
+   * Disable builders
+   * Post reboot action
+   * Schedule autoqa01 reboot
+   * Bastion01 and Bastion02 and openvpn server
+   * Special yum directives
+
+5. Update Leader
+6. Group A reboots
+7. Group B reboots
+8. Group C reboots
+9. Doing the upgrade
+10. Doing the reboot
+11. Aftermath
+
+Contact Information
+-------------------
+
+Owner:
+    Fedora Infrastructure Team
+Contact:
+    #fedora-admin, sysadmin-main,
+    infrastructure@lists.fedoraproject.org, #fedora-noc
+Location:
+    All over the world.
+Servers:
+    all
+Purpose:
+    Apply kernel/other upgrades to all of our servers
+
+Preparation
+===========
+
+1. Determine which host group you are going to be doing updates/reboots
+   on.
+
+   Group "A"
+       servers that end users will see or note being down
+       and anything that depends on them.
+   Group "B"
+       servers that contributors will see or note being
+       down and anything that depends on them.
+   Group "C"
+       servers that infrastructure will notice are down,
+       or are redundant enough to reboot some with others taking the
+       load.
+
+2. Appoint an 'Update Leader' for the updates.
+3. Follow the Outage Infrastructure SOP and send advance notification
+   to the appropriate lists. Try to schedule the update at a time when
+   many admins are around to help/watch for problems and when impact for
+   the group affected is less. Do NOT do multiple groups on the same day
+   if possible.
+4. Plan an order for rebooting the machines considering two factors:
+
+   * Location of systems on the kvm or xen hosts. [You will normally
+     reboot all systems on a host together]
+   * Impact of systems going down on other services, operations and
+     users. Thus, since the database servers and nfs servers are the
+     backbone of many other systems, they and systems that are on the
+     same xen boxes would be rebooted before other boxes.
+
+5. To aid in organizing a mass upgrade/reboot with many people helping,
+   it may help to create a checklist of machines in a gobby document.
+6. Schedule downtime in nagios.
+7. Make doubly sure that various app owners are aware of the reboots.
+
+Staging
+=======
+
+Any updates that can be tested in staging or a pre-production environment
+should be tested there first, including new kernels, updates to core
+database applications/libraries, web applications, libraries, etc.
+
+Special Considerations
+======================
+
+While this may not be a complete list, here are some special things that
+must be taken into account before rebooting certain systems:
+
+Disable builders
+----------------
+
+Before the following machines are rebooted, all koji builders should be
+disabled and all running jobs allowed to complete:
+
+  * db04
+  * nfs01
+  * kojipkgs02
+
+Builders can be removed from koji, updated and re-added. Use::
+
+    koji disable-host NAME
+
+and::
+
+    koji enable-host NAME
+
+.. note:: you must be a koji admin
+
+Additionally, rel-eng and builder boxes may need a special version of rpm.
+Make sure to check with rel-eng on any rpm upgrades for them.
+
+Post reboot action
+------------------
+
+The following machines require post-boot actions (mostly entering
+passphrases). Make sure admins that have the passphrases are on hand for
+the reboot:
+
+  * backup-2 (LUKS passphrase on boot)
+  * sign-vault01 (NSS passphrase for sigul service)
+  * sign-bridge01 (NSS passphrase for sigul bridge service)
+  * serverbeach* (requires fixing firewall rules):
+
+Each serverbeach host needs 3 or 4 iptables rules added anytime it's
+rebooted or libvirt is upgraded::
+
+    iptables -I FORWARD -o virbr0 -j ACCEPT
+    iptables -I FORWARD -i virbr0 -j ACCEPT
+    iptables -t nat -I POSTROUTING -s 192.168.122.3/32 -j SNAT --to-source 66.135.62.187
+
+.. note:: The source is the internal guest IP; the to-source is the external IP
+   that maps to that guest IP. If there are multiple guests, each one needs
+   the above SNAT rule inserted.
+
+Schedule autoqa01 reboot
+------------------------
+
+There is currently an autoqa01.c host on cnode01. Check with QA folks
+before rebooting this guest/host.
+
+Bastion01 and Bastion02 and openvpn server
+------------------------------------------
+
+We need one of the bastion machines to be up to provide openvpn for all
+machines. Before rebooting bastion02, modify the
+``manifests/nodes/bastion0*.phx2.fedoraproject.org.pp`` files to start the
+openvpn server on bastion01, wait for all clients to re-connect, reboot
+bastion02 and then revert back to it as the openvpn hub.
+
+Special yum directives
+----------------------
+
+Sometimes we will wish to exclude or otherwise modify the yum.conf on a
+machine. For this purpose, all machines have an include, making them read
+http://infrastructure.fedoraproject.org/infra/hosts/FQHN/yum.conf.include
+from the infrastructure repo. If you need to make such changes, add them
+to the infrastructure repo before doing updates.
+
+Update Leader
+=============
+
+Each update should have a Leader appointed. This person will be in charge
+of doing any read-write operations, and delegating to others to do tasks.
+If you aren't specifically asked by the Leader to reboot or change
+something, please don't. The Leader will assign out machine groups to
+reboot, or ask specific people to look at machines that didn't come back
+up from reboot or aren't working right after reboot. It's important to
+avoid multiple people operating on a single machine in a read-write manner
+and interfering with changes.
+
+Group A reboots
+===============
+
+Group A machines are end user critical ones. Outages here should be
+planned at least a week in advance and announced to the announce list.
+
+List of machines currently in A group (note: this is going to be
+automated)
+
+These hosts are grouped based on the virt host they reside on:
+
+* torrent02.fedoraproject.org
+* ibiblio02.fedoraproject.org
+
+* people03.fedoraproject.org
+* ibiblio03.fedoraproject.org
+
+* collab01.fedoraproject.org
+* serverbeach09.fedoraproject.org
+
+* db05.phx2.fedoraproject.org
+* virthost03.phx2.fedoraproject.org
+
+* db01.phx2.fedoraproject.org
+* virthost04.phx2.fedoraproject.org
+
+* db-fas01.phx2.fedoraproject.org
+* proxy01.phx2.fedoraproject.org
+* virthost05.phx2.fedoraproject.org
+
+* ask01.phx2.fedoraproject.org
+* virthost06.phx2.fedoraproject.org
+
+These are the rest:
+
+* bapp02.phx2.fedoraproject.org
+* bastion02.phx2.fedoraproject.org
+* app05.fedoraproject.org
+* backup02.fedoraproject.org
+* bastion01.phx2.fedoraproject.org
+* fas01.phx2.fedoraproject.org
+* fas02.phx2.fedoraproject.org
+* log02.phx2.fedoraproject.org
+* memcached03.phx2.fedoraproject.org
+* noc01.phx2.fedoraproject.org
+* ns02.fedoraproject.org
+* ns04.phx2.fedoraproject.org
+* proxy04.fedoraproject.org
+* smtp-mm03.fedoraproject.org
+* batcave02.phx2.fedoraproject.org
+* mm3test.fedoraproject.org
+* packages02.phx2.fedoraproject.org
+
+Group B reboots
+===============
+
+This group contains machines that contributors use. Announcements of
+outages here should be made at least a week in advance and sent to the
+devel-announce list.
+
+These hosts are grouped based on the virt host they reside on:
+
+* db04.phx2.fedoraproject.org
+* bvirthost01.phx2.fedoraproject.org
+
+* nfs01.phx2.fedoraproject.org
+* bvirthost02.phx2.fedoraproject.org
+
+* pkgs01.phx2.fedoraproject.org
+* bvirthost03.phx2.fedoraproject.org
+
+* kojipkgs02.phx2.fedoraproject.org
+* bvirthost04.phx2.fedoraproject.org
+
+These are the rest:
+
+* koji04.phx2.fedoraproject.org
+* releng03.phx2.fedoraproject.org
+* releng04.phx2.fedoraproject.org
+
+Group C reboots
+===============
+
+Group C contains machines that infrastructure uses, or that can be rebooted
+in such a way as to continue to provide services to others via multiple
+machines. Outages here should be announced on the infrastructure list.
+
+Group C hosts that have proxy servers on them:
+
+* proxy02.fedoraproject.org
+* ns05.fedoraproject.org
+* hosted-lists01.fedoraproject.org
+* internetx01.fedoraproject.org
+
+* app01.dev.fedoraproject.org
+* darkserver01.dev.fedoraproject.org
+* fakefas01.fedoraproject.org
+* proxy06.fedoraproject.org
+* osuosl01.fedoraproject.org
+
+* proxy07.fedoraproject.org
+* bodhost01.fedoraproject.org
+
+* proxy03.fedoraproject.org
+* smtp-mm02.fedoraproject.org
+* tummy01.fedoraproject.org
+
+* app06.fedoraproject.org
+* noc02.fedoraproject.org
+* proxy05.fedoraproject.org
+* smtp-mm01.fedoraproject.org
+* telia01.fedoraproject.org
+
+* app08.fedoraproject.org
+* proxy08.fedoraproject.org
+* coloamer01.fedoraproject.org
+
+Other Group C hosts:
+
+* ask01.stg.phx2.fedoraproject.org
+* app02.stg.phx2.fedoraproject.org
+* proxy01.stg.phx2.fedoraproject.org
+* releng01.stg.phx2.fedoraproject.org
+* value01.stg.phx2.fedoraproject.org
+* virthost13.phx2.fedoraproject.org
+
+* db-fas01.stg.phx2.fedoraproject.org
+* pkgs01.stg.phx2.fedoraproject.org
+* packages01.stg.phx2.fedoraproject.org
+* virthost11.phx2.fedoraproject.org
+
+* app01.stg.phx2.fedoraproject.org
+* koji01.stg.phx2.fedoraproject.org
+* db02.stg.phx2.fedoraproject.org
+* fas01.stg.phx2.fedoraproject.org
+* virthost10.phx2.fedoraproject.org
+
+
+* autoqa01.qa.fedoraproject.org
+* autoqa-stg01.qa.fedoraproject.org
+* bastion-comm01.qa.fedoraproject.org
+* batcave-comm01.qa.fedoraproject.org
+* virthost-comm01.qa.fedoraproject.org
+
+* compose-x86-01.phx2.fedoraproject.org
+
+* compose-x86-02.phx2.fedoraproject.org
+
+* download01.phx2.fedoraproject.org
+* download02.phx2.fedoraproject.org
+* download03.phx2.fedoraproject.org
+* download04.phx2.fedoraproject.org
+* download05.phx2.fedoraproject.org
+
+* download-rdu01.vpn.fedoraproject.org
+* download-rdu02.vpn.fedoraproject.org
+* download-rdu03.vpn.fedoraproject.org
+
+* fas03.phx2.fedoraproject.org
+* secondary01.phx2.fedoraproject.org
+* memcached04.phx2.fedoraproject.org
+* virthost01.phx2.fedoraproject.org
+
+* app02.phx2.fedoraproject.org
+* value03.phx2.fedoraproject.org
+* virthost07.phx2.fedoraproject.org
+
+* app03.phx2.fedoraproject.org
+* value04.phx2.fedoraproject.org
+* ns03.phx2.fedoraproject.org
+* darkserver01.phx2.fedoraproject.org
+* virthost08.phx2.fedoraproject.org
+
+* app04.phx2.fedoraproject.org
+* packages02.phx2.fedoraproject.org
+* virthost09.phx2.fedoraproject.org
+
+* hosted03.fedoraproject.org
+* serverbeach06.fedoraproject.org
+
+* hosted04.fedoraproject.org
+* serverbeach07.fedoraproject.org
+
+* collab02.fedoraproject.org
+* serverbeach08.fedoraproject.org
+
+* dhcp01.phx2.fedoraproject.org
+* relepel01.phx2.fedoraproject.org
+* sign-bridge02.phx2.fedoraproject.org
+* koji03.phx2.fedoraproject.org
+* bvirthost05.phx2.fedoraproject.org
+
+* (disable each builder in turn, update and reenable)
+* ppc11.phx2.fedoraproject.org
+* ppc12.phx2.fedoraproject.org
+
+* backup03
+
+Doing the upgrade
+=================
+
+If possible, system upgrades should be done in advance of the reboot (with
+relevant testing of new packages on staging). To do the upgrades, make
+sure that the Infrastructure RHEL repo is updated as necessary to pull in
+the new packages (see the Infrastructure Yum Repo SOP).
+
+On batcave01, as root run::
+
+    func-yum [--host=hostname] update
+
+.. note:: --host can be specified multiple times and takes wildcards.
+
+Ping people as necessary if you are unsure about any packages.
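+
+For example, a run limited to a single group of hosts might look like this
+(the host pattern is illustrative; as noted above, --host takes wildcards
+and can be repeated)::
+
+    func-yum --host=app\* update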
+
+Additionally, you can see which machines still need to be rebooted with::
+
+    sudo func-command --timeout=10 --oneline /usr/local/bin/needs-reboot.py | grep yes
+
+You can also see which machines would need a reboot if updates were all
+applied with::
+
+    sudo func-command --timeout=10 --oneline /usr/local/bin/needs-reboot.py after-updates | grep yes
+
+Doing the reboot
+================
+
+In the order determined above, reboots will usually be grouped by the
+virtualization hosts that the servers are on. You can see the guests per
+virt host on batcave01 in /var/log/virthost-lists.out
+
+To reboot sets of boxes based on which virthost they are on, we've written a
+special script which facilitates it::
+
+    func-vhost-reboot virthost-fqdn
+
+For example::
+
+    sudo func-vhost-reboot virthost13.phx2.fedoraproject.org
+
+Aftermath
+=========
+
+1. Make sure that everything's running fine.
+2. Re-enable nagios notifications as needed.
+3. Make sure to perform any manual post-boot setup (such as entering
+   passphrases for encrypted volumes).
+4. Close the outage ticket.
+
+
+Non virthost reboots
+--------------------
+
+If you need to reboot specific hosts and make sure they recover, consider using::
+
+    sudo func-host-reboot hostname hostname1 hostname2 ...
+
+If you want to reboot the hosts one at a time, waiting for each to come back
+before rebooting the next, pass -o to func-host-reboot.
+
+
diff --git a/docs/sysadmin-guide/sops/mastermirror.rst b/docs/sysadmin-guide/sops/mastermirror.rst
new file mode 100644
index 0000000..fced0fc
--- /dev/null
+++ b/docs/sysadmin-guide/sops/mastermirror.rst
@@ -0,0 +1,81 @@
+.. title: Master Mirror Infrastructure SOP
+.. slug: infra-master-mirror
+.. date: 2011-12-22
+.. taxonomy: Contributors/Infrastructure
+
+================================
+Master Mirror Infrastructure SOP
+================================
+
+Contents
+========
+
+1. Contact Information
+2. PHX Master Mirror Setup
+3. RDU I2 Master Mirror Setup
+4. Raising Issues
+
+
+Contact Information
+===================
+
+Owner:
+    Red Hat IS
+Contact:
+    #fedora-admin, Red Hat ticket
+Location:
+    PHX
+Servers:
+    server[1-5].download.phx.redhat.com
+Purpose:
+    Provides the master mirrors for Fedora distribution
+
+
+PHX Master Mirror Setup
+=======================
+
+The master mirrors are accessible as::
+
+    download1.fedora.redhat.com -> CNAME to download3.fedora.redhat.com
+    download2.fedora.redhat.com -> currently no DNS entry
+    download3.fedora.redhat.com -> 209.132.176.20
+    download4.fedora.redhat.com -> 209.132.176.220
+    download5.fedora.redhat.com -> 209.132.176.221
+
+from the outside. download.fedora.redhat.com is a round robin to the above
+IPs.
+
+The external IPs correspond to internal load balancer IPs that balance
+between server[1-5]::
+
+    209.132.176.20  -> 10.9.24.20
+    209.132.176.220 -> 10.9.24.220
+    209.132.176.221 -> 10.9.24.221
+
+The load balancers then balance between the below Fedora IPs on the rsync
+servers::
+
+    10.8.24.21 (fedora1.download.phx.redhat.com) - server1.download.phx.redhat.com
+    10.8.24.22 (fedora2.download.phx.redhat.com) - server2.download.phx.redhat.com
+    10.8.24.23 (fedora3.download.phx.redhat.com) - server3.download.phx.redhat.com
+    10.8.24.24 (fedora4.download.phx.redhat.com) - server4.download.phx.redhat.com
+    10.8.24.25 (fedora5.download.phx.redhat.com) - server5.download.phx.redhat.com
+
+
+RDU I2 Master Mirror Setup
+==========================
+
+.. note:: This section is awaiting confirmation from RH - information here may
+   not be 100% accurate yet.
+
+download-i2.fedora.redhat.com (rhm-i2.redhat.com) is a round robin
+between::
+
+    204.85.14.3 - 10.11.45.3
+    204.85.14.5 - 10.11.45.5
+
+
+Raising Issues
+==============
+
+Issues with any of this setup should be raised in a helpdesk ticket.
diff --git a/docs/sysadmin-guide/sops/memcached.rst b/docs/sysadmin-guide/sops/memcached.rst
new file mode 100644
index 0000000..91693a0
--- /dev/null
+++ b/docs/sysadmin-guide/sops/memcached.rst
@@ -0,0 +1,79 @@
+.. title: Memcached Infrastructure SOP
+.. slug: infra-memcached
+.. date: 2013-06-29
+.. taxonomy: Contributors/Infrastructure
+
+============================
+Memcached Infrastructure SOP
+============================
+
+Our memcached setup is currently only used for wiki sessions. With
+mediawiki, sessions stored in files over NFS or in the DB are very slow.
+Memcached is a non-blocking solution for our session storage.
+
+Contents
+========
+
+1. Contact Information
+2. Checking Status
+3. Flushing Memcached
+4. Restarting Memcached
+5. Configuring Memcached
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+
+Contact
+    #fedora-admin, sysadmin-main, sysadmin-web groups
+
+Location
+    PHX
+
+Servers
+    memcached03, memcached04
+
+Purpose
+    Provide caching for Fedora web applications.
+
+Checking Status
+===============
+
+Our memcached instances are currently firewalled to only allow access from
+the wiki application servers. To check the status of an instance, use::
+
+    echo stats | nc memcached0{3,4} 11211
+
+from an allowed host.
+
+
+Flushing Memcached
+==================
+
+Sometimes wrong contents get cached, and the cache should be flushed.
+To do this, use::
+
+    echo flush_all | nc memcached0{3,4} 11211
+
+from an allowed host.
+
+
+Restarting Memcached
+====================
+
+Note that restarting a memcached instance will drop all sessions stored
+on that instance. As mediawiki uses hashing to distribute sessions across
+multiple instances, restarting one out of two instances will result in
+about half of the total sessions being dropped.
+
+To restart memcached::
+
+    sudo /etc/init.d/memcached restart
+
+Configuring Memcached
+=====================
+
+Memcached is currently set up as a role in the ansible git repo. The main
+two tunables are MAXCONN (the maximum number of concurrent
+connections) and CACHESIZE (the amount of memory to use for storage). These
+variables can be set through $memcached_maxconn and $memcached_cachesize
+in ansible. Additionally, other options (as described in the memcached
+manpage) can be set via $memcached_options.
diff --git a/docs/sysadmin-guide/sops/mirrorhiding.rst b/docs/sysadmin-guide/sops/mirrorhiding.rst
new file mode 100644
index 0000000..9158591
--- /dev/null
+++ b/docs/sysadmin-guide/sops/mirrorhiding.rst
@@ -0,0 +1,45 @@
+.. title: Mirror Hiding Infrastructure SOP
+.. slug: infra-mirror-hiding
+.. date: 2011-08-23
+.. taxonomy: Contributors/Infrastructure
+
+================================
+Mirror hiding Infrastructure SOP
+================================
+
+At times, such as release day, there may be a conflict between Red Hat
+trying to release content for RHEL, and Fedora trying to release Fedora.
+One way to limit the pain to Red Hat on release day is to hide
+download.fedora.redhat.com from the publiclist and mirrorlist redirector,
+which will keep most people from downloading the content from Red Hat
+directly.
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+Contact
+    #fedora-admin, sysadmin-main, sysadmin-web group
+Location
+    Phoenix
+Servers
+    app3, app4
+Purpose
+    Hide Public Mirrors from the publiclist / mirrorlist redirector
+
+Description
+===========
+
+To hide a public mirror so it doesn't appear on the publiclist or the
+mirrorlist, simply go into the MirrorManager administrative web user
+interface at https://admin.fedoraproject.org/mirrormanager. Fedora
+sysadmins can see all Sites and Hosts. For each Site and Host, there is a
+checkbox marked "private", which if set, will hide that Site (and all its
+Hosts), or just that single Host, such that it won't appear on the public
+lists.
+
+To make a private-marked mirror public, simply clear the "private"
+checkbox again.
+
+This change takes effect at the top of each hour.
+
diff --git a/docs/sysadmin-guide/sops/mirrormanager-S3-EC2-netblocks.rst b/docs/sysadmin-guide/sops/mirrormanager-S3-EC2-netblocks.rst
new file mode 100644
index 0000000..c3e2eef
--- /dev/null
+++ b/docs/sysadmin-guide/sops/mirrormanager-S3-EC2-netblocks.rst
@@ -0,0 +1,28 @@
+.. title: Infrastructure AWS Mirroring SOP
+.. slug: infra-aws-mirror
+.. date: 2014-12-05
+.. taxonomy: Contributors/Infrastructure
+
+===========
+AWS Mirrors
+===========
+
+Fedora Infrastructure mirrors EPEL content (/pub/epel) into Amazon
+Simple Storage Service (S3) in multiple regions, to make it fast for
+EC2 CentOS/RHEL users to get EPEL content from an effectively local
+mirror.
+
+For this to work, we have private mirror entries in MirrorManager, one
+for each region, which include the EC2 netblocks for that region.
+
+Amazon updates their list of network blocks roughly monthly, as they
+consume additional address space. Therefore, we need to make the
+corresponding changes to MirrorManager's entries.
+
+Amazon publishes their list of network blocks on their forum site,
+with the subject "Announcement: Amazon EC2 Public IP Ranges". As of
+November 2014, this was
+https://forums.aws.amazon.com/ann.jspa?annID=1701
+
+As of November 19, 2014, Amazon publishes it as a JSON file we can download:
+http://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html
diff --git a/docs/sysadmin-guide/sops/mirrormanager.rst b/docs/sysadmin-guide/sops/mirrormanager.rst
new file mode 100644
index 0000000..8872236
--- /dev/null
+++ b/docs/sysadmin-guide/sops/mirrormanager.rst
@@ -0,0 +1,104 @@
+.. title: MirrorManager Infrastructure SOP
+.. slug: infra-mirrormanager
+.. date: 2012-04-24
+.. taxonomy: Contributors/Infrastructure
+
+================================
+MirrorManager Infrastructure SOP
+================================
+
+MirrorManager manages mirrors for the Fedora distribution.
+
+Contents
+========
+
+1. Contact Information
+2. Description
+
+   1. Release Preparation
+
+3. Troubleshooting and Resolution
+
+   1. Regenerating the Publiclist
+   2. Hung admin.fedoraproject.org/mirrormanager
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+
+Contact
+    #fedora-admin, sysadmin-main, sysadmin-web
+
+Location
+    Phoenix
+
+Servers
+    app01, app02, app03, app04, app05, app06, bapp02
+
+Purpose
+    Manage mirrors for Fedora distribution
+
+Description
+===========
+
+MirrorManager handles our mirroring system. It keeps track of lists of
+valid mirrors and hands out metalink URLs to end users to download
+packages from.
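+
+For instance, a client-side metalink request looks roughly like this (a
+sketch; the repo and arch values are illustrative)::
+
+    curl 'https://mirrors.fedoraproject.org/metalink?repo=fedora-25&arch=x86_64'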
+
+On the backend app server (bapp01 or bapp02), mirrormanager runs crawlers to
+check mirror contents, a job to update the public lists, and other
+housekeeping jobs. This data is then synced to the app* servers to serve to
+end users.
+
+Release Preparation
+-------------------
+
+MirrorManager should automatically detect the new release version and
+will create a new Version() object in the database. This is visible on the
+Version page in the web UI, and on mirrors.fp.o.
+
+If the versioning scheme changes, it's possible this will fail. If so,
+contact the Mirror Wrangler.
+
+Troubleshooting and Resolution
+==============================
+
+Regenerating the Publiclist
+---------------------------
+
+On bapp02::
+
+    sudo -u mirrormanager -i
+
+then::
+
+    /usr/share/mirrormanager/server/update-mirrorlist-server > /tmp/mirrormanager-mirrorlist.log 2>&1 && \
+    /usr/share/mirrormanager/mm_sync_out
+
+To make this take effect immediately, you may need to remove the cache on
+the proxies::
+
+    # As root on proxy0[1-7]
+    rm -rf /srv/cache/mod_cache/*
+
+Hung admin.fedoraproject.org/mirrormanager
+------------------------------------------
+
+This generally happens when an app server loses its connection to db2.
+
+1. On bapp02 and app[1-6], su up, and restart apache.
+
+2. On bapp02, if crawlers and update-master-directory-list are likewise
+   hung, kill them too. You may need to delete stale
+   ``/var/lock/mirrormanager/*`` lockfiles as well.
+
+Restarting mirrorlist_server
+----------------------------
+
+mirrorlist_server on the app* machines is managed via supervisord. If you want
+to restart it, use::
+
+    supervisorctl restart
+
+
diff --git a/docs/sysadmin-guide/sops/mote.rst b/docs/sysadmin-guide/sops/mote.rst
new file mode 100644
index 0000000..7914c5b
--- /dev/null
+++ b/docs/sysadmin-guide/sops/mote.rst
@@ -0,0 +1,107 @@
+.. title: mote SOP
+.. slug: infra-mote
+.. date: 2015-06-13
+.. taxonomy: Contributors/Infrastructure
+
+========
+mote SOP
+========
+
+mote is a MeetBot log wrangler, providing
+a user-friendly interface for viewing logs produced
+by Fedora's IRC meetings.
+
+Production instance: http://meetbot.fedoraproject.org/
+Staging instance: http://meetbot.stg.fedoraproject.org
+
+Contents
+--------
+
+1. Contact information
+2. Deployment
+3. Description
+4. Configuration
+5. Database
+6. Managing mote
+7. Suspending mote operation
+8. Changing mote's name and category definitions
+
+Contact Information
+-------------------
+
+Owner
+    cydrobolt
+Contact
+    #fedora-admin
+Location
+    Fedora Infrastructure
+Purpose
+    IRC meeting coordination
+
+
+Deployment
+----------
+
+If you have access to rbac-playbook::
+
+    sudo rbac-playbook groups/value.yml
+
+Forcing Reload
+--------------
+
+There is a playbook that can force mote to update its cache
+in case it gets stuck somehow::
+
+    sudo rbac-playbook manual/rebuild/mote.yml
+
+Doing Upgrades
+--------------
+
+Put a new copy of the mote rpm in the infra repo and run::
+
+    sudo rbac-playbook manual/upgrade/mote.yml
+
+Description
+-----------
+
+mote is a Python webapp running on Flask with mod_wsgi.
+It can be used to view past logs, browse meeting minutes, or
+glean other information relevant to Fedora's IRC meetings.
+It employs a JSON file store cache, in addition to a
+memcached store which is currently not in use in
+Fedora infrastructure.
+
+
+Configuration
+-------------
+
+mote configuration is located in ``/etc/mote/config.py``. The
+configuration contains all configurable items for all mote services.
+Alterations to the configuration that aren't temporary should be made through
+ansible playbooks. Configuration changes have no effect on running services --
+the services need to be restarted, which can be done using the playbook.
+
+
+Database
+--------
+
+mote does not currently utilise any databases, although it uses a
+file store in Fedora Infrastructure and has an optional memcached store
+which is currently unused.
+
+Managing mote
+-------------
+
+mote is run using mod_wsgi under httpd; hence, you must
+manage the ``httpd`` service to change mote's status.
+
+Suspending mote operation
+-------------------------
+
+mote can be stopped by stopping the ``httpd`` service::
+
+    service httpd stop
+
+Changing mote's name and category definitions
+---------------------------------------------
+
+mote uses a set of JSON name and category definitions to provide
+friendly names, aliases, and listings on its interface.
+These definitions are located in mote's GitHub repository,
+and need to be pulled into ansible in order to be deployed.
+
+These files are ``name_mappings.json`` and ``category_mappings.json``.
+To deploy an update to these definitions, place the updated name and
+category mapping files in ``ansible/roles/mote/templates``. Run
+the playbook in order to deploy your changes.
diff --git a/docs/sysadmin-guide/sops/nagios.rst b/docs/sysadmin-guide/sops/nagios.rst
new file mode 100644
index 0000000..7afebb0
--- /dev/null
+++ b/docs/sysadmin-guide/sops/nagios.rst
@@ -0,0 +1,94 @@
+.. title: Infrastructure Nagios SOP
+.. slug: infra-nagios
+.. date: 2012-07-09
+.. taxonomy: Contributors/Infrastructure
+
+============================
+Fedora Infrastructure Nagios
+============================
+
+Contact Information
+===================
+
+Owner
+    sysadmin-main, sysadmin-noc
+Contact
+    #fedora-admin, #fedora-noc
+Location
+    Anywhere
+Servers
+    noc01, noc02, noc01.stg, batcave01
+Purpose
+    This SOP describes our nagios configurations
+
+Configuration
+=============
+
+Fedora Project runs two nagios instances: nagios (noc01) at
+https://admin.fedoraproject.org/nagios and nagios-external (noc02) at
+http://admin.fedoraproject.org/nagios-external. You must be in
+the 'sysadmin' group to access them.
+
+Apart from the two production instances, we are currently running a staging
+instance for testing purposes, available through SSH at noc01.stg.
+
+nagios (noc01)
+    The nagios configuration on noc01 should only monitor general host
+    statistics: ansible status, uptime, apache status (up/down), SSH, etc.
+
+    The configurations are found in the nagios ansible role: ansible/roles/nagios
+
+nagios-external (noc02)
+    The nagios instance on noc02 is located outside of our main datacenter
+    and should monitor our user websites/applications (fedoraproject.org, FAS,
+    PackageDB, Bodhi/Updates).
+
+    The configurations are found in the nagios ansible role: roles/nagios
+
+
+.. note::
+    Production and staging instances through SSH:
+    Please make sure you are in the 'sysadmin' and 'sysadmin-noc' FAS groups
+    before trying to access these hosts.
+
+    See the SSH Access SOP.
+
+NRPE
+----
+
+We are currently using NRPE to execute remote Nagios plugins on any host of
+our network.
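+
+As a quick sanity check, a plugin can be invoked through NRPE by hand from a
+nagios host (a sketch; the target host is a placeholder and the command name
+must exist in the target's nrpe.cfg)::
+
+    /usr/lib64/nagios/plugins/check_nrpe -H <host> -c check_disk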
+
+A good guide to NRPE and its usage, with some nice diagrams of its
+structure, can be found at:
+http://nagios.sourceforge.net/docs/nrpe/NRPE.pdf
+
+Understanding the Messages
+==========================
+
+General
+-------
+
+Nagios notifications are generally easy to read, and follow this consistent
+format::
+
+    ** PROBLEM/ACKNOWLEDGEMENT/RECOVERY alert - hostname/Check is WARNING/CRITICAL/OK **
+    ** HOST DOWN/UP alert - hostname **
+
+Reading the message will provide extra information on what is wrong.
+
+Disk Space Warning/Critical
+---------------------------
+
+Disk space warnings normally include the following information::
+
+    DISK WARNING/CRITICAL/OK - free space: mountpoint freespace(MB) (freespace(%) inode=freeinodes(%)):
+
+A message stating "(1% inode=99%)" means that disk space is critical, not
+inode usage, and is a sign that more disk space is required.
+
+Further Reading
+---------------
+
+* Ansible SOP
+* Outages SOP
diff --git a/docs/sysadmin-guide/sops/netapp.rst b/docs/sysadmin-guide/sops/netapp.rst
new file mode 100644
index 0000000..e83c7ef
--- /dev/null
+++ b/docs/sysadmin-guide/sops/netapp.rst
@@ -0,0 +1,143 @@
+.. title: Infrastructure Netapp SOP
+.. slug: infra-netapp
+.. date: 2011-10-03
+.. taxonomy: Contributors/Infrastructure
+
+=========================
+Netapp Infrastructure SOP
+=========================
+
+Provides primary mirrors and additional storage in PHX2
+
+Contents
+========
+
+1. Contact Information
+2. Description
+3. Public Mirrors
+
+   1. Snapshots
+
+4. PHX NFS Storage
+
+   1. Access
+   2. Snapshots
+
+5. iscsi
+
+   1. Updating LVM
+   2. Mounting ISCSI
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+
+Contact
+    #fedora-admin, sysadmin-main, releng
+
+Location
+    Phoenix, Tampa Bay, Raleigh
+
+Servers
+    batcave01, virt servers, application servers, builders, releng boxes
+
+Purpose
+    Provides primary mirrors and additional storage in PHX2
+
+Description
+===========
+
+At present we have three netapps in our infrastructure: one in TPA, one in
+RDU and one in PHX. For purposes of visualization it's easiest to think of
+us as having four netapps: 1 TPA, 1 RDU and 1 PHX for the public mirrors,
+and an additional 1 in PHX used for storage not related to the public
+mirrors.
+
+Public Mirrors
+==============
+
+The netapps are our primary public mirrors. The canonical location for the
+mirrors is currently in PHX. From there it gets synced to RDU and TPA.
+
+Snapshots
+---------
+
+Snapshots on the PHX netapp are taken hourly. Unfortunately, the way it is
+set up, only Red Hat employees can access this mirror (this is scheduled to
+change when PHX becomes the canonical location, but that will take time to
+set up and deploy). The snapshots are available, for example, on wallace in::
+
+    /var/ftp/download.fedora.redhat.com/.snapshot/hourly.0
+
+PHX NFS Storage
+===============
+
+There is a great deal of storage in PHX over NFS from the netapp there.
+This storage includes the public mirror. The majority of this storage is
+koji; however, there are a few GB worth of storage that go to wiki
+attachments and other storage needs we have in PHX.
+
+You can access all of the NFS shares at::
+
+    batcave01:/mnt/fedora
+
+or::
+
+    ntap-fedora-a.storage.phx2.redhat.com:/vol/fedora/
+
+Access
+------
+
+The netapp is provided by RHIS and as a result they also control access.
+Access is controlled mostly by IP, and some machines have root squashed.
+Worst-case scenario: if batcave01 is not accessible, just bring another box
+up under its IP address and use that for an emergency.
+
+Snapshots
+---------
+
+There are hourly and nightly snapshots on the netapp. They are available in::
+
+    batcave01:/mnt/fedora/.snapshot
+
+iscsi
+=====
+
+We have iscsi deployed in a number of locations in our infrastructure for
+xen machines. To get a list of which xen machines are deployed with iscsi,
+just run lvs::
+
+    lvs /dev/xenGuests
+
+Live migration is possible though not fully supported at this time. Please
+shut a xen machine down and bring it up on another host. Memory is the
+main issue here.
+
+Updating LVM
+------------
+
+iscsi is mounted all over the place, and if one xen machine creates a
+logical volume, the other xen machines will have to pick up those changes.
+To do this run::
+
+    pvscan
+    vgscan
+    lvscan
+    vgchange -a y
+
+Mounting ISCSI
+--------------
+
+On reboots sometimes the iscsi share is not remounted. This should be
+automated in the future, but for now run::
+
+    iscsiadm -m discovery -tst -p ntap-fedora-b.storage.phx2.redhat.com:3260
+    sleep 1
+    iscsiadm -m node -T iqn.1992-08.com.netapp:sn.118047036 -p 10.5.88.21:3260 -l
+    sleep 1
+    pvscan
+    vgscan
+    lvscan
+    vgchange -a y
+
diff --git a/docs/sysadmin-guide/sops/new-hosts.rst b/docs/sysadmin-guide/sops/new-hosts.rst
new file mode 100644
index 0000000..66ca64b
--- /dev/null
+++ b/docs/sysadmin-guide/sops/new-hosts.rst
@@ -0,0 +1,296 @@
+.. title: Infrastructure DNS Host Addition SOP
+.. slug: infra-dns-add
+.. date: 2014-05-22
+.. taxonomy: Contributors/Infrastructure
+
+=====================
+DNS Host Addition SOP
+=====================
+
+You should be able to follow these steps in order to create a new set of
+hosts in infrastructure.
+
+Walkthrough
+===========
+
+Get a DNS repo checkout on batcave01
+------------------------------------
+
+::
+
+    git clone /git/dns
+    cd dns
+
+An example always helps, so you can use git grep for something that has
+been recently added to the data center/network that you want::
+
+    git grep badges-web01
+    built/126.5.10.in-addr.arpa:69 IN PTR badges-web01.stg.phx2.fedoraproject.org.
+    [...lots of other stuff in built/ -- ignore these as they'll be generated later...]
+    master/126.5.10.in-addr.arpa:69 IN PTR badges-web01.stg.phx2.fedoraproject.org.
+    master/126.5.10.in-addr.arpa:101 IN PTR badges-web01.phx2.fedoraproject.org.
+    master/126.5.10.in-addr.arpa:102 IN PTR badges-web02.phx2.fedoraproject.org.
+    master/168.192.in-addr.arpa:109.1 IN PTR badges-web01.vpn.fedoraproject.org
+    master/168.192.in-addr.arpa:110.1 IN PTR badges-web02.vpn.fedoraproject.org
+    master/phx2.fedoraproject.org:badges-web01.stg IN A 10.5.126.69
+    master/phx2.fedoraproject.org:badges-web01 IN A 10.5.126.101
+    master/phx2.fedoraproject.org:badges-web02 IN A 10.5.126.102
+    master/vpn.fedoraproject.org:badges-web01 IN A 192.168.1.109
+    master/vpn.fedoraproject.org:badges-web02 IN A 192.168.1.110
+
+So those are the files we need to edit. In the above example, two of
+those files are for the host on the PHX network. The other two are for
+the host to be able to talk over the VPN. Although the VPN is not
+always needed, the common case is that the host will need it. (If any
+clients *need to connect to it via the proxy servers* or it is not
+hosted in PHX2, it will need a VPN connection.) A common exception here
+is the staging environment: since we only have one proxy server in
+staging and it is in PHX2, a VPN connection is not typically needed for
+staging hosts.
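+
+Before picking addresses in the next step, it can be worth double-checking
+that a candidate IP really is unused (a sketch; the address and the name
+server queried are illustrative)::
+
+    dig +short -x 10.5.126.105 @ns02.fedoraproject.org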
+
+Edit the zone file for the reverse lookup first (the \*in-addr.arpa file)
+and find IPs to use. The IPs will be listed with a domain name of
+"unused." If you're configuring a web application server, you probably
+want two hosts for stg and at least two for production. Two in
+production means that we don't need downtime for reboots and updates.
+Two in stg means that we'll be less likely to encounter problems related
+to having multiple web application servers when we take a change tested
+in stg into production::
+
+    -105 IN PTR unused.
+    -106 IN PTR unused.
+    -107 IN PTR unused.
+    -108 IN PTR unused.
+    +105 IN PTR elections01.stg.phx2.fedoraproject.org.
+    +106 IN PTR elections02.stg.phx2.fedoraproject.org.
+    +107 IN PTR elections01.phx2.fedoraproject.org.
+    +108 IN PTR elections02.phx2.fedoraproject.org.
+
+Edit the forward domain (phx2.fedoraproject.org in our example) next::
+
+    elections01.stg IN A 10.5.126.105
+    elections02.stg IN A 10.5.126.106
+    elections01 IN A 10.5.126.107
+    elections02 IN A 10.5.126.108
+
+Repeat these two steps if you need to make them available on the VPN.
+Note: if your stg hosts are in PHX2, you don't need to configure VPN for
+them, as all our stg proxy servers are in PHX2.
+
+Also remember to update the Serial at the top of all zone files.
+
+Once the files are edited, you need to run a script to build the zones.
+But first, commit the changes you just made to the "source"::
+
+    git add .
+    git commit -a -m 'Added staging and production elections hosts.'
+
+Once that is committed, you need to run a script to build the zones and
+then push them to the dns servers::
+
+    ./do-domains  # This builds the files
+    git add .
+    git commit -a -m 'done build'
+    git push
+
+    $ sudo -i ansible ns\* -a '/usr/local/bin/update-dns'  # This tells the dns servers to load the new files
+
+Make certs
+==========
+
+WARNING: If you already had a clone of private, make VERY sure to do a
+git pull first! It's quite likely somebody else added a new host without
+you noticing it, and you cannot merge the keys repos manually. (Seriously,
+don't: the index and serial files just wouldn't match up with the certificate,
+and you would revoke the wrong certificate upon revocation.)
+
+
+
+When doing 2 factor auth for sudo, the hosts that we connect from need
+to have valid SSL certs. These are currently stored in the private repo::
+
+    git clone /git/ansible-private && chmod 0700 ansible-private
+    cd ansible-private/files/2fa-certs
+    . ./vars
+    ./build-and-sign-key $FQDN  # ex: elections01.stg.phx2.fedoraproject.org
+
+The $FQDN should be the phx2 domain name if it's in phx2, vpn if not in
+phx2, and if it has no vpn and is not in phx2 we should add it to the
+vpn::
+
+    git add .
+    git commit -a
+    git push
+
+
+NOTE: Make sure to re-run vars from the vpn repo. If you forget to do that,
+you will just (try to) generate a second pair of 2fa certs, since the
+./vars script creates an environment variable pointing to the root key
+directory, which is different.
+
+Servers that are on the VPN also need certs for that. These are also stored
+in the private repo::
+
+    cd ansible-private/files/vpn/openvpn
+    . ./vars
+    ./build-and-sign-key $FQDN  # ex: elections01.phx2.fedoraproject.org
+    ./build-and-sign-key $FQDN  # ex: elections02.phx2.fedoraproject.org
+
+The $FQDN should be the phx2 domain name if it's in phx2, and just
+fedoraproject.org if it's not in PHX2 (note that there is never .vpn
+in the FQDN in the openvpn keys). Now commit and push::
+
+    git add .
+    git commit -a
+    git push
+
+
+ansible
+=======
+
+::
+
+    git clone /git/ansible
+    cd ansible
+
+To see an example::
+
+    git grep badges-web01
+    find . -name badges-web01\*
+    find . -name 'badges-web*'
+
+inventory
+---------
+
+The ansible inventory file lists all the hosts that ansible knows about
+and also allows you to create sets of hosts that you can refer to via a
+group name. For a typical web application server set of hosts we'd
+create things like this::
+
+    [elections]
+    elections01.phx2.fedoraproject.org
+    elections02.phx2.fedoraproject.org
+
+    [elections-stg]
+    elections01.stg.phx2.fedoraproject.org
+    elections02.stg.phx2.fedoraproject.org
+
+    [... find the staging group and add there: ...]
+
+    [staging]
+    db-fas01.stg.phx2.fedoraproject.org
+    elections01.stg.phx2.fedoraproject.org
+    elections02.stg.phx2.fedoraproject.org
+
+The hosts should use their fully qualified domain names here. The rules
+are slightly different than for 2fa certs. If the host is in PHX2, use
+the .phx2.fedoraproject.org domain name. If they aren't in PHX2, then
+they usually just have .fedoraproject.org as their domain name. (If in
+doubt about a not-in-PHX2 host, just ask.)
+
+
+VPN config
+----------
+
+If the machine is on the VPN, create a file in ansible at
+roles/openvpn/server/files/ccd/$FQDN with contents like::
+
+    ifconfig-push 192.168.1.X 192.168.0.X
+
+Where X is the last octet of the DNS IP address assigned to the host,
+so for example for elections01.phx2.fedoraproject.org that would be::
+
+    ifconfig-push 192.168.1.44 192.168.0.44
+
+
+Work in progress
+================
+
+From here to the end of the file is still being worked on.
+
+host_vars and group_vars
+------------------------
+
+ansible consults files in inventory/group_vars and inventory/host_vars to set
+parameters that can be used in templates and playbooks. You may need to edit
+these.
+
+It's usually easy to copy the host_vars and group_vars from an existing host
+that's similar to the one you are working on and then modify a few names to
+make it work. For instance, for a web application server::
+
+    cd ~/ansible/inventory/group_vars
+    cp badges-web elections
+
+Change the following::
+
+    - fas_client_groups: sysadmin-noc,sysadmin-badges
+    + fas_client_groups: sysadmin-noc,sysadmin-web
+
+(You can change disk size, mem_size, number of cpus, and ports too if you
+need them.)
+
+Some things will definitely need to be defined differently for each host in a
+group -- notably, ip_address. You should use the ip_address you claimed in
+the dns repo::
+
+    cd ~/ansible/inventory/host_vars
+    cp badges-web01.stg.phx2.fedoraproject.org elections01.stg.phx2.fedoraproject.org
+
+
+The host will need a vmhost declaration. There is a script in
+``ansible/scripts/vhost-info`` that will report how much free memory and how many
+free cpus each vmhost has. You can use that to inform your decision.
+By convention, staging hosts go on virthost12.
+
+Each vmhost has a different volume group. To figure out what volume group that
+is, execute the following command on the virthost::
+
+    vgdisplay
+
+You may want to run "lsblk" to check that the volume group you expect is the
+one actually used for virtual guests.
+
+
+.. note::
+    | 19:16:01 3. add ./inventory/host_vars/FQDN host_vars for the new host.
+    | 19:16:56 that will have in it ip addresses, dns resolv.conf, ks url/repo, volume group to make the host lv in, etc etc.
+    | 19:17:10 4. add any needed vars to inventory/group_vars/ for the group
+    | 19:17:33 this has memory size, lvm size, cpus, etc
+    | 19:17:45 5. add tasks/virt_instance_create.yml task to top of group/host playbook
+    | 19:18:10 6. run the playbook and it will go to the virthost you set, create the lv, guest, install it, wait for it to come up, then continue configuring it.
+
+mailman.yml
+    copy it from another file.
+
+::
+
+    ./ans-vhost-freemem --hosts=virthost\*
+
+
+group vars
+
+- vmhost (of the host that will host the VM)
+- kickstart info (url of the kickstart itself and the repo)
+- datacenter (although most likely won't change)
+
+The host playbook is rather basic:
+
+- Change the name
+- Most things won't change much
+
+::
+
+    ansible-playbook /srv/web/infra/ansible/playbooks/groups/mailman.yml
+
+Adding a new proxy or webserver
+===============================
+
+When adding a new web server, other files must be edited by hand
+currently until templates replace them. These files cover getting httpd
+logs from the server onto log01 so that log analysis can be done::
+
+    roles/base/files/syncHttpLogs.sh
+    roles/epylog/files/merged/modules.d/rsyncd.conf
+    roles/hosts/files/staging-hosts
+    roles/mediawiki123/templates/LocalSettings.php.fp.j2
+
+There are also nagios files which will need to be edited, but that should
+be done following the nagios document.
+
+References
+==========
+
+* The "making a new instance" section of: http://meetbot.fedoraproject.org/meetbot/fedora-meeting-1/2013-07-17/infrastructure-ansible-meetup.2013-07-17-19.00.html
diff --git a/docs/sysadmin-guide/sops/nonhumanaccounts.rst b/docs/sysadmin-guide/sops/nonhumanaccounts.rst
new file mode 100644
index 0000000..338ead3
--- /dev/null
+++ b/docs/sysadmin-guide/sops/nonhumanaccounts.rst
@@ -0,0 +1,156 @@
+.. title: Non-human Accounts Infrastructure SOP
+.. slug: infra-nonhuman-accounts
+.. date: 2015-03-23
+.. taxonomy: Contributors/Infrastructure
+
+=====================================
+Non-human Accounts Infrastructure SOP
+=====================================
+
+We have many non-human accounts for various services, used by our web
+applications and certain automated scripts.
+
+Contents
+========
+
+1. Contact Information
+2. FAS Accounts
+3. Bugzilla Accounts
+4. PackageDB Owners
+5. Koji Accounts
+
+Contact Information
+===================
+
+Owner:
+    Fedora Infrastructure Team
+Contact:
+    #fedora-admin
+Persons:
+    sysadmin-main
+Purpose:
+    Provide Non-human accounts to our various services
+
+FAS Accounts
+============
+
+A FAS account should be created when a script or application needs...
+
+* to query FAS information
+* filesystem privileges associated with a group in FAS
+* bugzilla privileges associated with the "fedorabugs" group.
+
+Be sure to check if Infrastructure already has a general-purpose account
+that can be used before creating a new one.
+
+Creating a FAS account
+----------------------
+
+1. Go through the normal user creation process at
+   https://admin.fedoraproject.org/accounts/
+
+   1. Set the name to: (naming convention here)
+   2. Set the email to the contact email for the account (this may need
+      to be done manually if the contact email is an @fedoraproject.org
+      address)
+
+2. Have a FAS admin set the account status to "bot" and set its UID below
+   10000. Make sure to check that this does not break any group
+   references or file ownerships first.
+
+   * On db-fas01, using ``$ sudo -u postgres psql fas2``
+
+     - Set it to a bot account so it's not inactivated::
+
+         => UPDATE people SET status='bot' WHERE username='username';
+
+     - Delete references to the current uid::
+
+         => delete from visit_identity where user_id in (select id from
+            people where username = 'username');
+
+     - Find the last used id in the range we use for bots::
+
+         => select id, username from people where id < 10000 order by id;
+
+     - Set the account to use the new id. This should be one more than
+       the largest id returned by the previous query::
+
+         => UPDATE people SET id=NEWID WHERE username='username';
+
+3. Get the account into any necessary groups for permissions that it may
+   need. Common ones include:
+
+   * Wiki editing: cla_done
+   * Access to SSH keys for third party users: thirdparty
+   * Access to SSH keys and password hashes for _internal_ fasClient
+     runs: fas-systems
+
+4. Document this account at:
+   https://fedoraproject.org/wiki/PackageDB_admin_requests#Pseudo-users_and_Groups_for_SIGs
+
+
+Alternative
+-----------
+
+This can also be achieved using SQL statements directly:
+
+   - Find the last used id in the range we use for bots::
+
+       => select id, username from people where id < 10000 order by id;
+
+   - Insert the new user::
+
+       => insert into people (id,username,human_name,password,email,status)
+          values (id, 'name','small description', 'something',
+          'contact email', 'bot');
+
+   - Find your own user id::
+
+       => select id, username from people where username='your username';
+
+   - Find the id of the most used groups::
+
+       => select id, name from groups where name
+          in ('cla_done', 'packager', 'fedorabugs');
+
+   - Add the groups required::
+
+       => insert into person_roles(person_id, group_id, role_type, sponsor_id)
+          values (new_user_id, group_id, 'user', your_own_user_id);
+
+The final step remains the same though: document this account at
+https://fedoraproject.org/wiki/PackageDB_admin_requests#Pseudo-users_and_Groups_for_SIGs
+
+
+Bugzilla Accounts
+=================
+
+A Bugzilla account should be created when a script or application needs...
+
+* to query or file Fedora bugs automatically
+
+Please make sure to coordinate with the QA and Bug Triaging teams if the
+script or application involves making mass changes to bugs.
+
+If a bugzilla account needs "fedorabugs" permissions, follow the above
+steps for a FAS Account first, then follow these instructions with the
+email address you entered above. If the bugzilla account will not need
+"fedorabugs" permissions but will still require an @fedoraproject.org
+email, create an alias for that account first.
+
+1. Create a bugzilla account as normal at
+   https://bugzilla.redhat.com/, using the proper contact email for the
+   account.
+2. Document this account at (insert location here)
+
+PackageDB Owners
+================
+
+Tie together FAS account and Bugzilla account info here
+
+Koji Accounts
+=============
+
+TODO
+
diff --git a/docs/sysadmin-guide/sops/nuancier.rst b/docs/sysadmin-guide/sops/nuancier.rst
new file mode 100644
index 0000000..7173eba
--- /dev/null
+++ b/docs/sysadmin-guide/sops/nuancier.rst
@@ -0,0 +1,159 @@
+.. title: Nuancier SOP
+.. slug: infra-nuancier
+.. date: 2016-03-11
+.. taxonomy: Contributors/Infrastructure
+
+============
+Nuancier SOP
+============
+
+Nuancier is the web application used by the design team and the community to
+submit and vote on the supplemental wallpapers provided with each version of
+Fedora.
+
+Contents
+========
+
+1. Contact Information
+2. Documentation Links
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+Contact
+    #fedora-admin
+Location
+    https://apps.fedoraproject.org/nuancier
+Servers
+    nuancier01, nuancier02, nuancier01.stg, nuancier02.stg
+Purpose
+    Provide a system to submit and vote on supplemental wallpapers
+
+
+Create a new election
+=====================
+
+* Login
+
+* Go to the `Admin` panel via the menu at the top
+
+* Click on `Create a new election`.
+
+* Complete the form:
+
+  Election name
+    A short name used in all the pages; most often, since we have one election
+    per release, it has been of the form `Fedora XX`.
+
+  Name of the folder containing the pictures
+    This just links the election with the folder where the images will be
+    uploaded on disk. Keep it simple and safe; something like `fXX` will do.
+
+  Year
+    The year when the election will be happening. This just gives a quick
+    sorting option.
+
+  Submission start date (in UTC)
+    The date from which people will be able to submit wallpapers for the
+    election. The submission starts on that exact day at midnight UTC.
+
+  Start date (in UTC)
+    The date when the election starts (and thus the submissions end). There is
+    no buffer between when the submissions end and when the votes start, which
+    means admins have to keep up with the submissions as they are made.
+
+  End date (in UTC)
+    The date when the election ends. There is no embargo on the results; they
+    are available right after the election ends.
+
+  URL to claim a badge for voting
+    The URL at which someone can claim a badge. This URL is displayed on the
+    voting page as well as once people have voted. This means that having the
+    badge does not ensure people voted; at most it ensures people visited
+    nuancier during a voting phase.
+
+  Number of votes a user can make
+    The number of wallpapers a user can choose/vote on. This was made
+    configurable as there was a debate in the design team about whether having
+    everyone vote on all 16 wallpapers was a good idea or not.
+
+  Number of candidates a user can upload
+    Restricts the number of wallpapers a user can submit for an election, to
+    prevent people from uploading tens of wallpapers in one election.
+
+Review an election
+==================
+
+Admins must do this regularly during a submission phase to keep candidates from
+piling up.
+
+* Login
+
+* Go to the `Admin` panel via the menu at the top
+
+* Find the election of interest in the list and click on `Review`
+
+If the images are not showing, you can generate the thumbnails using the button
+`(Re-)generate cache`.
+
+On the review page, you will be able to filter the candidates by `Approved`,
+`Pending`, `Rejected` or see them `All` (default).
+
+You can then check the images one by one, select their checkboxes and then
+either `Approve` or `Deny` all the ones you selected.
+
+.. note:: Rejections must be motivated in the `Reason for rejection / Comments`
+   input field. This motivation is then sent by email to the user
+   explaining why a wallpaper they submitted was not accepted into the
+   election.
+
+
+Vote on an election
+===================
+
+Once an election is open, a link announcing it will be available from the front
+page, and on the page listing the elections (`Elections` tab in the menu) a green
+check-mark will appear in the `Votes` column while a red forbidden sign will
+appear in the `Submissions` column.
+
+You can then click on the election name, which will take you to the voting page.
+
+There, enlarge the images by clicking on them and make your choice by clicking
+on the bottom right corner of an image.
+
+In the column on the right, the total number of votes available will appear.
+If you need to remove a wallpaper from your selection, simply click on it
+in the right column.
+
+As long as you have not picked the maximum number of candidates allowed, you can
+cast your vote multiple times (but not on the same candidates of course).
+
+
+View all the candidates of an election
+======================================
+
+All the candidates of an election are only accessible once the election is over.
+If you wish to see all the images uploaded, simply go to the `Elections` tab and
+click on the election name.
+
+
+View the results of an election
+===============================
+
+The results of an election are accessible immediately after the end of it.
+To see them, simply click the `Results` tab in the menu.
+
+There you can click on the name of an election to see the wallpapers ordered by
+their number of votes, or on `stats` to view some statistics about the election
+(such as the number of participants, the number of voters, the number of votes,
+or the evolution of the votes over time).
+
+
+Miscellaneous
+=============
+
+Nuancier uses a gluster volume shared between the two hosts (in prod and in stg)
+where the images are stored, making sure they are available to both frontends.
+This may make things a little trickier sometimes; be aware of it.
diff --git a/docs/sysadmin-guide/sops/openvpn.rst b/docs/sysadmin-guide/sops/openvpn.rst
new file mode 100644
index 0000000..a16d523
--- /dev/null
+++ b/docs/sysadmin-guide/sops/openvpn.rst
@@ -0,0 +1,137 @@
+.. title: OpenVPN SOP
+.. slug: infra-openvpn
+.. date: 2011-12-16
+.. taxonomy: Contributors/Infrastructure
+
+===========
+OpenVPN SOP
+===========
+
+OpenVPN is our server->server VPN solution. It is deployed in a routeless
+manner and uses ansible-managed keys for authentication. All hosts should
+be given static IPs and a hostname.vpn.fedoraproject.org DNS address.
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+
+Contact
+    #fedora-admin, sysadmin-main
+
+Location
+    Phoenix
+
+Servers
+    bastion (vpn.fedoraproject.org)
+
+Purpose
+    Provides the vpn solution for our infrastructure.
+
+Add a new host
+==============
+
+Create/sign the keys
+--------------------
+
+From batcave01 check out the private repo::
+
+    # This is to ensure that the clone is not world-readable at any point.
+    RESTORE_UMASK=$(umask -p)
+    umask 0077
+    git clone /git/private
+    $RESTORE_UMASK
+    cd private/vpn/openvpn
+
+Next prepare your environment and run the build-key script. This example
+is for host "proxy4.fedora.phx.redhat.com"::
+
+    . ./vars
+    ./build-key $FQDN  # ./revoke-full $FQDN to revoke keys that are no longer used.
+    git add .
+    git commit -a
+    git push
+
+Create Static IP
+----------------
+
+Giving static IPs out in openvpn is mostly painless. Take a look at other
+examples, but each host gets a file and 2 IPs::
+
+    git clone /git/ansible
+    vi ansible/roles/openvpn/server/files/ccd/$FQDN
+
+The file format should look like this::
+
+    ifconfig-push 192.168.1.314 192.168.0.314
+
+Basically the first IP is the IP that is contactable over the vpn and
+should always take the format "192.168.1.x", and the PtP IP is the same IP
+on a different network: "192.168.0.x"
+
+Commit and install::
+
+    git add .
+    git commit -m "What have you done?"
+    git push
+
+And then push that out to bastion::
+
+    sudo -i ansible-playbook $(pwd)/playbooks/groups/bastion.yml -t openvpn
+
+Create DNS entry
+----------------
+
+After you have your static IP ready, just add the entry to DNS::
+
+    git clone /git/dns && cd dns
+    vi master/168.192.in-addr.arpa
+    # pick out an ip that's unused
+    vi master/vpn.fedoraproject.org
+    git commit -m "What have you done?"
+    ./do-domains
+    git commit -m "done build."
+    git push
+
+And push that out to the name servers with::
+
+    sudo -i ansible ns\* -a "/usr/local/bin/update-dns"
+
+Update resolv.conf on the client
+--------------------------------
+To make sure traffic actually goes over the VPN, make sure the search line
+in /etc/resolv.conf looks like::
+
+    search vpn.fedoraproject.org fedoraproject.org
+
+for external hosts, and::
+
+    search phx2.fedoraproject.org vpn.fedoraproject.org fedoraproject.org
+
+for PHX2 hosts.
+
+Remove a host
+=============
+::
+
+    # This is to ensure that the clone is not world-readable at any point.
+    RESTORE_UMASK=$(umask -p)
+    umask 0077
+    git clone /git/private
+    $RESTORE_UMASK
+    cd private/vpn/openvpn
+
+Next prepare your environment and run the revoke-full script. This example
+is for host "proxy4.fedora.phx.redhat.com"::
+
+    . ./vars
+    ./revoke-full $FQDN
+    git add .
+    git commit -a
+    git push
+
+
+TODO
+====
+Deploy an additional VPN server outside of PHX. OpenVPN does support
+failover automatically, so if configured properly, when the primary VPN
+server goes down all hosts should connect to the next host in the list.
diff --git a/docs/sysadmin-guide/sops/orientation.rst b/docs/sysadmin-guide/sops/orientation.rst
new file mode 100644
index 0000000..09081ec
--- /dev/null
+++ b/docs/sysadmin-guide/sops/orientation.rst
@@ -0,0 +1,170 @@
+.. title: Infrastructure Orientation SOP
+.. slug: infra-orientation
+.. date: 2016-10-20
+.. taxonomy: Contributors/Infrastructure
+
+==============================
+Orientation Infrastructure SOP
+==============================
+
+Basic orientation and introduction to the sysadmin group. Welcome aboard!
+
+Contents
+========
+
+1. Contact Information
+2. Description
+3. Welcome to the team
+
+   1. Time commitment
+   2. Prove Yourself
+
+4. Doing Work
+
+   1. Ansible
+
+5. Our Setup
+6. Our Rules
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+Contact
+    #fedora-admin, sysadmin-main
+Purpose
+    Provide basic orientation and introduction to the sysadmin group
+
+Description
+===========
+
+Fedora's Infrastructure team is charged with keeping all the lights on,
+improving pain points, expanding services, designing new services and
+partnering with other teams to help with their needs. The team is highly
+dynamic and primarily based in the US. This is only significant in that
+most of us work during the day in US time. We do have team members all
+over the globe, though, and generally have decent coverage. If you happen
+to be one of those who is not in a traditional US time zone, you are
+encouraged to be around, especially in #fedora-admin, during those times
+when we have less coverage. Even if it is just to say "I can't help with
+that, but $ADMIN will be able to and should be here in about 3 hours".
+
+The team itself is generally friendly and honest. Don't be afraid to
+disagree with someone, even if you're new and they're an old timer. Just
+make sure you ask yourself what is important to you, and make sure to
+provide data; we like that. We generally communicate on irc.freenode.net
+in #fedora-admin.
+We have our weekly meetings on IRC, and it's the quickest
+way to get in touch with everyone. Secondary to that we use the mailing
+list. After that it's our ticketing system and talk.fedoraproject.org.
+
+*Welcome to the team!*
+
+Time commitment
+===============
+
+Oftentimes this is the biggest reason for turnover in our group. Some
+groups, like sysadmin-web and certainly sysadmin-main, require a huge time
+commitment. Don't be surprised if you see people working between 10-30
+hours a week on various tasks, and that's the volunteers. Your time
+commitment is something personal to each individual, and it's something
+you should give some serious thought. In general it's almost
+impossible to be a regular part of the team without at least 5-10 hours a
+week dedicated to the Infrastructure team.
+
+Also note, if you are going to be away, let us know. As a volunteer we
+can't possibly ask you to always be around all the time. Even if you're in
+the middle of a project and have to stop, let us know. Nothing is worse
+than thinking someone is working on something or will be around and
+they're just not. Really, we all understand: got a test coming up? Busier
+at work than normal? Going on a vacation? It doesn't matter, just let us
+know when you're going to be gone and what you're working on so it doesn't
+get forgotten.
+
+Additionally, don't forget that it's worth discussing with your employer
+whether they will give you time during work. They may be all for it.
+
+Prove Yourself
+==============
+
+This is one of the most difficult aspects of getting involved with our
+team. We can't just give access to everyone who asks for it, and often
+actually doing work without access is difficult. Some of the best things
+you can do are:
+
+* Keep bugging people for work. It shows you're committed.
+* Go through bugs, look at stale bugs and close bugs that have been fixed
+* Try to duplicate bugs on your workstation and fix them there
+
+Above all, stick with it. Part of proving yourself is also to show the time
+commitment it actually does take.
+
+Doing Work
+==========
+Once you've been sponsored for a team, it's generally your job to find what
+work needs to be done in the ticketing system. Be proactive about this.
+The tickets can be found at:
+
+https://pagure.io/fedora-infrastructure/issues
+
+When you find a ticket that interests you, contact your sponsor or the
+ticket owner and offer help. While you're getting used to the way things
+work, don't be put off by someone saying no or that you can't work on that.
+It happens; sometimes it's a security thing, sometimes it's an "I'm halfway
+through it and I'm not happy with where it is" thing. Just move on to the
+next ticket and go from there.
+
+Also don't be surprised if some of the work involved includes testing on
+your own workstation. Just set up a virtual environment and get to work!
+There's a lot of work that can be done to prove yourself that involves no
+access at all. Doing this kind of work is a surefire way to get into
+more groups and get more involved. Don't be afraid to take on tasks you
+don't already know how to do. But don't take on something you know you
+won't be able to do. Ask for help when you need it, and keep in contact
+with your sponsor.
+
+Ansible
+=======
+
+The things we do get done in Ansible. It is important that you not make
+changes directly on servers. There are many reasons for this, but the rule
+is simple: always make changes in Ansible. If you want to get more familiar
+with Ansible, set it up yourself and give it a try. The docs are available
+at https://docs.ansible.com/
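+
+As a very rough sketch, a typical change follows the same pattern used by the
+other SOPs in this guide (the role and playbook names below are placeholders,
+not real ones; check the ansible repo for the service you are changing)::
+
+    # on batcave01
+    git clone /git/ansible && cd ansible
+    vi roles/<some-role>/templates/some-config.conf   # <some-role> is a placeholder
+    git commit -a -m "Describe the change"
+    git push
+    # then run the playbook for the affected group
+    sudo -i ansible-playbook $(pwd)/playbooks/groups/<some-group>.yml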
+
+Our Setup
+=========
+
+Most of our work is done via bastion.fedoraproject.org. That host has
+access to our other hosts, many of which are all over the globe. We have a
+vpn solution set up so that knowing where the servers physically are is
+only important when troubleshooting things. When you first get granted
+access to one of the sysadmin-* groups, the first place you should turn is
+bastion.fedoraproject.org, and from there ssh to batcave01.
+
+We also have an architecture repo available in our git repo. To get a copy
+of this repo just::
+
+    dnf install git
+    git clone https://pagure.io/fedora-infrastructure.git
+
+This will allow you to look through (and help fix) some of our scripts, as
+well as have access to our architectural documentation. Become familiar
+with those docs if you're curious. There's always room for better
+documentation, so if you're interested just ping your sponsor and ask about
+it.
+
+Our Rules
+=========
+The Fedora Infrastructure Team does have some rules. First is the security
+policy. Please ensure you are compliant with:
+
+https://infrastructure.fedoraproject.org/csi/security-policy/
+
+before logging in to any of our servers. Many of those items rely on the
+honor system.
+
+Additionally, note that any software we deploy must be available in
+Fedora. There are some rare exceptions to this (particularly as it relates
+to applications specific to Fedora), but each exception is taken on a
+case-by-case basis.
diff --git a/docs/sysadmin-guide/sops/outage.rst b/docs/sysadmin-guide/sops/outage.rst
new file mode 100644
index 0000000..bdccf2c
--- /dev/null
+++ b/docs/sysadmin-guide/sops/outage.rst
@@ -0,0 +1,282 @@
+.. title: Outage Infrastructure SOP
+.. slug: infra-outage
+.. date: 2015-04-23
+.. taxonomy: Contributors/Infrastructure
+
+=========================
+Outage Infrastructure SOP
+=========================
+
+What to do when there's an outage or when you're planning to take an
+outage.
+
+Contents
+========
+
+1. Contact Information
+2. Users (No Access)
+
+   1. Planned Outage
+
+      1. Contacts
+
+   2. Unplanned Outage
+
+      1. Check first
+      2. Reporting or participating in an outage
+
+3. Infrastructure Members (Admin Access)
+
+   1. Planned Outage
+
+      1. Planning
+      2. Preparations
+      3. Outage
+      4. Post outage cleanup
+
+   2. Unplanned Outage
+
+      1. Determine Severity
+      2. First Steps
+      3. Fix it
+      4. Escalate
+      5. The Resolution
+      6. The Aftermath
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+Contact
+    #fedora-admin, sysadmin-main group
+Location
+    Anywhere
+Servers
+    Any
+Purpose
+    This SOP is generic for any outage
+Emergency
+    https://admin.fedoraproject.org/pager
+
+
+Users (No Access)
+=================
+
+.. note::
+   Don't have shell access? Doesn't matter. Stop by and stay in #fedora-admin;
+   if you have any expertise in what is going on, please assist. Random users
+   have helped the team out countless times. Any time the team doesn't have
+   to go to the docs to look up an answer is time they can spend fixing
+   what's busted.
+
+Planned Outage
+--------------
+
+If a planned outage comes at a terrible time, just let someone know. The
+Infrastructure Team does its best to keep outages out of the way, but if
+there's a mass rebuild going on that we don't know about and we schedule a
+koji outage, let someone know.
+
+Contacts
+````````
+
+Pretty much all coordination occurs in #fedora-admin on irc.freenode.net.
+Stop by there to learn more about what's going on. Just stay on topic.
+
+Unplanned Outage
+----------------
+
+Check first
+```````````
+
+Think something is busted? Please check with others to see if they are
+also having issues. This could even include checking on another computer.
+When reporting an outage, remember that the admins will typically drop
+everything they are doing to check what the problem is. They won't be
+happy to find out your cert has expired or you're using the wrong
+username. Additionally, check the status dashboard
+(http://status.fedoraproject.org) to verify that there is no previously
+reported outage that may be causing or related to your issue.
+
+Reporting or participating in an outage
+```````````````````````````````````````
+
+If you think you've found an outage, get as much information as you can
+about it at a glance. Copy any errors you get to http://pastebin.ca/.
+Use the following guidelines:
+
+Don't be general.
+    * BAD: "The wiki is acting slow"
+    * Good: "Whenever I try to save https://fedoraproject.org/wiki/Infrastructure,
+      I get a proxy error after 60 seconds"
+
+Don't report an outage that's already been reported.
+    * BAD: "/join #fedora-admin; Is the build system broken?"
+    * Good: "/join #fedora-admin; wait a minute or two; I noticed I
+      can't submit builds, here's the error I get:"
+
+Don't suggest drastic or needless changes during an outage (send it to the list)
+    * "Why don't you just use lighttpd?"
+    * "You could try limiting MaxRequestsPerChild in Apache"
+
+Don't get off topic or be too chatty
+    * "Transformers was awesome, but yeah, I think you guys know what to do next"
+
+Do research the technologies we're using and answer questions that may come up.
+    * BAD: "Can't you just fix it?"
+    * Good: "Hey guys, I think this is what you're looking for:
+      http://httpd.apache.org/docs/2.2/mod/mod_mime.html#addencoding"
+
+If no one can be contacted after 10 minutes or so, please see the section
+below called Determine Severity to determine whether or not someone
+should get paged.
+
+
+Infrastructure Members (Admin Access)
+=====================================
+
+The Infrastructure Members section is specifically written for members
+with access to the servers. This could be admin access to a box or even a
+specific web application; basically, anyone with access to fix the problem.
+
+Planned Outage
+--------------
+
+Any outage that is intentionally caused by a team member is a planned
+outage, even if it has to happen in the next 5 minutes.
+
+Planning
+````````
+
+All major planned outages should occur with at least one week's notice. This
+is not always possible; use best judgment. Please use our standard outage
+template at: https://fedoraproject.org/wiki/Infrastructure/OutageTemplate.
+Make sure to have another person review your template/announcement to
+check times and services affected. Make sure to send the announcement to
+the lists that are affected by the outage: announce, devel-announce, etc.
+
+Always create a ticket in the ticketing system:
+https://fedoraproject.org/wiki/Infrastructure/Tickets
+Send an email to the fedora-infrastructure-list with more details if
+warranted.
+
+Remember to follow an existing SOP as much as possible. If anything is
+missing from the SOP, please add it.
+
+Preparations
+````````````
+
+Remember to schedule an outage in nagios at
+https://admin.fedoraproject.org/nagios/. This is important not just so
+notifications don't get sent, but also for trending and reporting.
+
+Outage
+``````
+
+Prior to beginning an outage to any monitored service on
+http://status.fedoraproject.org, please push an update to reflect the outage
+(see the status-fedora SOP).
+
+Report all information in #fedora-admin. Coordination is extremely
+important; it's rare for our group to meet in person, and IRC is our only
+real-time communication medium. If a web site is out, please put up some
+sort of outage page in its place.
+
+Post outage cleanup
+```````````````````
+
+Once the outage is over, ensure that all services are up and running.
+Ensure all nagios services are back to green. Notify everyone in
+#fedora-admin to scan our services for issues. Once all services are
+cleared, update the status.fp.o dashboard. If the outage included a
+new feature or major change for a group, please notify that group that the
+change is ready. Make sure to close the ticket for the outage when it's
+over.
+
+Once the services are restored, an update to the status dashboard should be
+pushed to show the services are restored.
+
+.. important::
+   Additionally, update any SOPs that may have changed in the course of the
+   outage.
+
+Unplanned Outage
+----------------
+Unplanned outages happen; stay cool. As a team member, never be afraid to
+do something because you think you'll get in trouble over it. Be smart,
+don't be reckless, and never say "I shouldn't do this". If an unorthodox
+method or drastic change will fix the problem, do it, document it, and let
+the team know. Messes can always be cleaned up after the outage.
+
+Determine Severity
+``````````````````
+
+Some outages require immediate fixing, some don't. A page should never go
+out because someone can't sign the CLA. Most of our admins are in US time;
+use your best judgment. If it's bad enough to warrant an emergency page,
+page one of the admins at: https://admin.fedoraproject.org/pager
+
+Use the following as loose guidelines, and just use your best judgment.
+
+* BAD: "I can't see the Recent Changes on the wiki."
+* Good: "The entire wiki is not viewable"
+
+* BAD: I cannot sign the CLA
+* Good: I can't change my password in the account system,
+  I have admin access and my laptop was just stolen
+
+* BAD: I can't access awstats for fedoraproject.org
+* Good: The mirrors list is down.
+
+* BAD: I think someone misspelled some words on the webpage
+* Good: The web page has been hacked and I think someone
+  notified slashdot.
+
+First Steps
+```````````
+
+After an outage has been verified, acknowledge the outage in nagios:
+https://admin.fedoraproject.org/nagios/, update the related system on the
+status dashboard (see the status-fedora SOP) and verify the changes at
+http://status.fedoraproject.org, then head in to #fedora-admin
+to figure out who is around and coordinate the next course of action.
+Consult any relevant SOPs for corrective actions.
+
+Fix it
+``````
+Fix it, Fix it, Fix it! Do whatever needs to be done to fix the problem,
+just don't be stupid about it.
+
+Escalate
+````````
+Can't fix it? Don't wait, escalate! All of the team members have expertise
+with some areas of our environment and weaknesses in other areas. Never be
+afraid to tap another team member. Sometimes it's required, sometimes it's
+not. The last layer of defense is to page someone. At present our team is
+small enough that a full escalation path wouldn't do much good. Consult
+the contact information on each SOP for more information.
+
+The Resolution
+``````````````
+Once the services are restored, an update to the status dashboard should be
+pushed to show the services are restored.
+
+The Aftermath
+`````````````
+With any outage there will be questions. Please try as hard as possible to
+answer the following questions and send them to the
+fedora-infrastructure-list.
+
+1. What happened?
+2. What was affected?
+3. How long was the outage?
+4. What was the root cause?
+
+.. important::
+   Number 4 is especially important. If a kernel build keeps failing because
+   of issues with koji caused by a database failure caused by a full
+   filesystem on db1, don't say koji died because of a db failure. Any time a
+   root cause is discovered and not being monitored by nagios, add it if
+   possible. Most failures can be prevented or mitigated with proper
+   monitoring.
+
diff --git a/docs/sysadmin-guide/sops/packagedatabase.rst b/docs/sysadmin-guide/sops/packagedatabase.rst
new file mode 100644
index 0000000..907f78a
--- /dev/null
+++ b/docs/sysadmin-guide/sops/packagedatabase.rst
@@ -0,0 +1,323 @@
+.. title: Package Database Infrastructure SOP
+.. slug: infra-packagedb
+.. date: 2013-04-30
+.. taxonomy: Contributors/Infrastructure
+
+===================================
+Package Database Infrastructure SOP
+===================================
+
+
+The PackageDB is used by Fedora developers to manage package ownership and
+ACLs. It controls who is allowed to commit to a package and who gets
+notification of changes to packages.
+
+PackageDB project Trac: https://fedorahosted.org/packagedb/
+
+Contents
+========
+
+1. Contact Information
+2. Troubleshooting and Resolution
+3. Common Actions
+
+   1. Adding a new Pseudo User as a package owner
+   2. Renaming a package
+   3. Removing a package
+   4. Add a new release
+   5. Update App DB for a release going final
+   6. Orphaning all the packages for a user
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+
+Contact
+    #fedora-admin
+
+Persons
+    abadger1999
+
+Location
+    Phoenix
+
+Servers
+    admin.fedoraproject.org. Click on one of the current haproxy
+    servers to see the physical servers.
+
+Purpose
+    Manage package ownership
+
+Troubleshooting and Resolution
+==============================
+
+Common Actions
+==============
+
+Adding a new Pseudo User as a package owner
+-------------------------------------------
+
+Sometimes you want to have a mailing list own a package so that bugzilla
+email is assigned to the mailing list. Doing this requires adding a new
+pseudo user to the account system and assigning that user as the package
+maintainer.
+
+.. warning:: Pseudo users often have a dash in their name. We create email
+   aliases via ansible that have dashes in their name in order to not
+   collide with fas usernames (users cannot create usernames with dashes
+   via the webui). Make sure that any pseudo-users you create do not
+   clash with existing email aliases.
+
+In the following examples, replace ("xen", "kernel-xen-2.6") with the
+packages you are assigning to the new user, and 9902 with the userid you
+select in step 2.
+
+* Log into fas-db01.
+
+* Log into the db as a user that can make changes::
+
+      $ psql -U postgres fas2
+      fas2>
+
+  * Find the current pseudo-users::
+
+        fas2> select id, username from people where id < 10000 order by id;
+         id   | username
+        ------+------------------
+         9900 | orphan
+         9901 | anaconda-maint
+
+  * Create a new account with the next available id after 9900::
+
+        fas2> insert into people (id, username, human_name, password, email)
+              values (9902, 'xen-maint', 'Xen Maintainers', '*', 'xen-maint@redhat.com');
+
+* Connect to the pkgdb as a user that can make changes::
+
+      $ psql -U pkgdbadmin -h db01 pkgdb
+      pkgdb>
+
+* Add the current package owner as a comaintainer of the package. If
+  this user is not currently in the acls for the package, you can use the
+  following database queries::
+
+      insert into personpackagelisting (username, packagelistingid)
+        select pl.owner, pl.id from packagelisting as pl, package as p
+        where p.id = pl.packageid and p.name in ('xen', 'kernel-xen-2.6');
+      insert into personpackagelistingacl (personpackagelistingid, acl, statuscode)
+        select ppl.id, 'build', 3 from personpackagelisting as ppl, packagelisting as pl, package as p
+        where p.id = pl.packageid and pl.id = ppl.packagelistingid and pl.owner = ppl.username
+          and p.name in ('xen', 'kernel-xen-2.6');
+      insert into personpackagelistingacl (personpackagelistingid, acl, statuscode)
+        select ppl.id, 'commit', 3 from personpackagelisting as ppl, packagelisting as pl, package as p
+        where p.id = pl.packageid and pl.id = ppl.packagelistingid
+          and pl.owner = ppl.username
+          and p.name in ('xen', 'kernel-xen-2.6');
+      insert into personpackagelistingacl (personpackagelistingid, acl, statuscode)
+        select ppl.id, 'approveacls', 3 from personpackagelisting as ppl, packagelisting as pl, package as p
+        where p.id = pl.packageid and pl.id = ppl.packagelistingid
+          and pl.owner = ppl.username
+          and p.name in ('xen', 'kernel-xen-2.6');
+
+  If the owner is already in the acls, you will need to figure out which
+  acls the package already has and add only the ones that are missing.
+
+* Reassign the pseudo-user to be the new owner::
+
+      update packagelisting set owner = 'xen-maint' from package as p
+        where packagelisting.packageid = p.id and p.name in ('xen', 'kernel-xen-2.6');
+
+Renaming a package
+------------------
+
+On db2::
+
+    sudo -u postgres psql pkgdb
+    select * from package where name = 'OLDNAME';
+    [Make sure only the package you want is selected]
+    update package set name = 'NEWNAME' where name = 'OLDNAME';
+
+On cvs-int::
+
+    CVSROOT=/cvs/pkgs cvs co CVSROOT
+    sed -i 's/OLDNAME/NEWNAME/g' CVSROOT/modules
+    cvs commit -m 'Rename OLDNAME => NEWNAME'
+    cd /cvs/pkgs/rpms
+    mv OLDNAME NEWNAME
+    cd NEWNAME
+    find . -name 'Makefile,v' -exec sed -i 's/NAME := OLDNAME/NAME := NEWNAME/' \{\} \;
+    cd ../../devel
+    rm OLDNAME
+    ln -s ../rpms/NEWNAME/devel .
+
+If the package has existed long enough to have been added to koji, run
+something like the following to "retire" the old name in koji::
+
+    koji block-pkg dist-f12 OLDNAME
+
+Removing a package
+==================
+
+.. warning::
+   Do not remove a package if it has been built for a fedora release or if
+   you are not also willing to remove the cvs directory.
+
+When a package has been added due to a typo, it can be removed in one of
+two ways: marking it as a mistake with the "removed" status, or deleting it
+from the db entirely. Marking it as removed is easier and is explained
+below.
+
+On db2::
+
+    sudo -u postgres psql pkgdb
+    pkgdb=# select id, name, summary, statuscode from package where name = 'b';
+      id  | name |                     summary                      | statuscode
+    ------+------+--------------------------------------------------+-----------
+     6618 | b    | A simple database interface to MS-SQL for Python |          3
+    (1 row)
+
+- Make sure there is only one package returned and it is the correct one.
+- Statuscode 3 is "approved" and it's what we're changing from.
+- You'll also need the id for later::
+
+    pkgdb=# BEGIN;
+    pkgdb=# update package set statuscode = 17 where name = 'b';
+    UPDATE 1
+
+- Make sure only a single package was changed::
+
+    pkgdb=# COMMIT;
+
+    pkgdb=# select id, packageid, collectionid, owner, statuscode from packagelisting where packageid = 6618;
+      id   | packageid | collectionid | owner  | statuscode
+    -------+-----------+--------------+--------+-----------
+     42552 |      6618 |           19 | 101437 |          3
+     38845 |      6618 |           15 | 101437 |          3
+     38846 |      6618 |           14 | 101437 |          3
+     38844 |      6618 |            8 | 101437 |          3
+    (4 rows)
+
+- Make sure the output here looks correct (packageid is all the same, etc).
+- You'll also need the ids for later::
+
+    pkgdb=# BEGIN;
+    pkgdb=# update packagelisting set statuscode = 17 where packageid = 6618;
+    UPDATE 4
+    -- Make sure the same number of rows were committed as you saw before.
+    pkgdb=# COMMIT;
+
+    pkgdb=# select * from personpackagelisting where packagelistingid in (38844, 38846, 38845, 42552);
+     id | userid | packagelistingid
+    ----+--------+------------------
+    (0 rows)
+
+- In this case there are no comaintainers, so we don't have to do any more.
+  If there were, we'd have to treat them like the groups handled next::
+
+    pkgdb=# select * from grouppackagelisting where packagelistingid in (38844, 38846, 38845, 42552);
+      id   | groupid | packagelistingid
+    -------+---------+------------------
+     39229 |  100300 |            38844
+     39230 |  107427 |            38844
+     39231 |  100300 |            38845
+     39232 |  107427 |            38845
+     39233 |  100300 |            38846
+     39234 |  107427 |            38846
+     84481 |  107427 |            42552
+     84482 |  100300 |            42552
+    (8 rows)
+
+    pkgdb=# select * from grouppackagelistingacl where grouppackagelistingid in (39229, 39230, 39231, 39232, 39233, 39234, 84481, 84482);
+
+- The results of this are usually pretty long, so I've omitted everything but
+  the final row count: (24 rows)
+- For groups it's typically 3 acls (one for each of commit, build, and
+  checkout) times the number of grouppackagelistings.
+  In this case, that's 24, so this matches our expectations::
+
+    pkgdb=# BEGIN;
+    pkgdb=# update grouppackagelistingacl set statuscode = 13 where grouppackagelistingid in (39229, 39230, 39231, 39232, 39233, 39234, 84481, 84482);
+
+- Make sure only the number of rows you saw before were updated::
+
+    pkgdb=# COMMIT;
+
+If the package has existed long enough to have been added to koji, run
+something like the following to "retire" it in koji::
+
+    koji block-pkg dist-f12 PKGNAME
+
+Add a new release
+=================
+
+To add a new Fedora Release, ssh to db02 and do this::
+
+    sudo -u postgres psql pkgdb
+
+- This adds the release for Package ACLs::
+
+    insert into collection (name, version, statuscode, owner, koji_name) values('Fedora', '13', 1, 'jkeating', 'dist-f13');
+    insert into branch select id, 'f13', '.fc13', Null, 'f13' from collection where name = 'Fedora' and version = '13';
+
+- If this is for mass branching, we probably need to advance the branch
+  information for devel as well::
+
+    update branch set disttag = '.fc14' where collectionid = 8;
+
+- This adds the new release's repos for the App DB::
+
+    insert into repos (shortname, name, url, mirror, active, collectionid) select 'F-13-i386', 'Fedora 13 - i386', 'development/13/i386/os', 'http://download.fedoraproject.org/pub/fedora/linux/', true, c.id from collection as c where c.name = 'Fedora' and c.version = '13';

+    insert into repos (shortname, name, url, mirror, active, collectionid) select 'F-13-i386-d', 'Fedora 13 - i386 - Debug', 'development/13/i386/debug', 'http://download.fedoraproject.org/pub/fedora/linux/', true, c.id from collection as c where c.name = 'Fedora' and c.version = '13';
+
+    insert into repos (shortname, name, url, mirror, active, collectionid) select 'F-13-i386-tu', 'Fedora 13 - i386 - Test Updates', 'updates/testing/13/i386/', 'http://download.fedoraproject.org/pub/fedora/linux/', true, c.id from collection as c where c.name = 'Fedora' and c.version = '13';
+
+    insert into repos (shortname, name, url, mirror, active, collectionid) select 'F-13-i386-tud', 'Fedora 13 - i386 - Test Updates Debug', 'updates/testing/13/i386/debug/', 'http://download.fedoraproject.org/pub/fedora/linux/', true, c.id from collection as c where c.name = 'Fedora' and c.version = '13';
+
+    insert into repos (shortname, name, url, mirror, active, collectionid) select 'F-13-x86_64', 'Fedora 13 - x86_64', 'development/13/x86_64/os', 'http://download.fedoraproject.org/pub/fedora/linux/', true, c.id from collection as c where c.name = 'Fedora' and c.version = '13';
+
+    insert into repos (shortname, name, url, mirror, active, collectionid) select 'F-13-x86_64-d', 'Fedora 13 - x86_64 - Debug', 'development/13/x86_64/debug', 'http://download.fedoraproject.org/pub/fedora/linux/', true, c.id from collection as c where c.name = 'Fedora' and c.version = '13';
+
+    insert into repos (shortname, name, url, mirror, active, collectionid) select 'F-13-x86_64-tu', 'Fedora 13 - x86_64 - Test Updates', 'updates/testing/13/x86_64/', 'http://download.fedoraproject.org/pub/fedora/linux/', true, c.id from collection as c where c.name = 'Fedora' and c.version = '13';
+
+    insert into repos (shortname, name, url, mirror, active, collectionid) select 'F-13-x86_64-tud', 'Fedora 13 - x86_64 - Test Updates Debug', 'updates/testing/13/x86_64/debug/', 'http://download.fedoraproject.org/pub/fedora/linux/', true, c.id from collection as c where c.name = 'Fedora' and c.version = '13';
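+
+- As a quick sanity check before moving on, you can look at what you just
+  added (a sketch; the column names here are taken from the insert and update
+  statements above)::
+
+    select id, name, version, statuscode, owner, koji_name from collection where name = 'Fedora' and version = '13';
+    select * from branch where collectionid = (select id from collection where name = 'Fedora' and version = '13');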
+
+Update App DB for a release going final
+=======================================
+
+When a Fedora release goes final, the repositories for it change where
+they live. The repo definitions allow the App browser to sync information
+from the yum repositories. The PackageDB needs to be updated for the new
+areas::
+
+    BEGIN;
+    insert into repos (shortname, name, url, mirror, active, collectionid) select 'F-14-i386-u', 'Fedora 14 - i386 - Updates', 'updates/14/i386/', 'http://download.fedoraproject.org/pub/fedora/linux/', true, c.id from collection as c where c.name = 'Fedora' and c.version = '14';
+    insert into repos (shortname, name, url, mirror, active, collectionid) select 'F-14-i386-ud', 'Fedora 14 - i386 - Updates Debug', 'updates/14/i386/debug/', 'http://download.fedoraproject.org/pub/fedora/linux/', true, c.id from collection as c where c.name = 'Fedora' and c.version = '14';
+    update repos set url='releases/14/Everything/i386/os/' where shortname = 'F-14-i386';
+    update repos set url='releases/14/Everything/i386/debug/' where shortname = 'F-14-i386-d';
+
+    insert into repos (shortname, name, url, mirror, active, collectionid) select 'F-14-x86_64-u', 'Fedora 14 - x86_64 - Updates', 'updates/14/x86_64/', 'http://download.fedoraproject.org/pub/fedora/linux/', true, c.id from collection as c where c.name = 'Fedora' and c.version = '14';
+    insert into repos (shortname, name, url, mirror, active, collectionid) select 'F-14-x86_64-ud', 'Fedora 14 - x86_64 - Updates Debug', 'updates/14/x86_64/debug/', 'http://download.fedoraproject.org/pub/fedora/linux/', true, c.id from collection as c where c.name = 'Fedora' and c.version = '14';
+    update repos set url='releases/14/Everything/x86_64/os/' where shortname = 'F-14-x86_64';
+    update repos set url='releases/14/Everything/x86_64/debug/' where shortname = 'F-14-x86_64-d';
+    COMMIT;
+
+Orphaning all the packages for a user
+=====================================
+
+This can be done in the database if you don't want to send email::
+
+    $ ssh db02
+    $ sudo -u postgres psql pkgdb
+    pkgdb> select * from packagelisting where owner = 'xulchris';
+    pkgdb> -- Check that the list doesn't look suspicious.... There should be a record for every fedora release * package
+    pkgdb> BEGIN;
+    pkgdb> update packagelisting set owner = 'orphan', statuscode = 14 where owner = 'xulchris';
+    pkgdb> -- If the right number of rows were changed
+    pkgdb> COMMIT;
+
+.. note::
+   Note that if you do it via pkgdb-client or the python-fedora API instead,
+   you'll want to only orphan the packages on non-EOL branches that exist, to
+   cut down on the amount of email that's sent. That entails figuring out
+   what branches you need to do this on.
diff --git a/docs/sysadmin-guide/sops/pdc.rst b/docs/sysadmin-guide/sops/pdc.rst
new file mode 100644
index 0000000..0de23c9
--- /dev/null
+++ b/docs/sysadmin-guide/sops/pdc.rst
@@ -0,0 +1,133 @@
+.. title: PDC SOP
+.. slug: infra-pdc
+.. date: 2016-04-07
+.. taxonomy: Contributors/Infrastructure
+
+=======
+PDC SOP
+=======
+
+Store metadata about composes we produce and "component groups".
+
+App: https://pdc.fedoraproject.org/
+Source for frontend: https://github.com/product-definition-center/product-definition-center
+Source for backend: https://github.com/fedora-infra/pdc-updater
+
+Contact Information
+-------------------
+
+Owner
+    Release Engineering, Fedora Infrastructure Team
+Contact
+    #fedora-apps, #fedora-releng, #fedora-admin, #fedora-noc
+Servers
+    pdc-web0{1,2}, pdc-backend01
+Purpose
+    Store metadata about composes and "component groups"
+
+Description
+-----------
+
+The Product Definition Center (PDC) is a webapp and API designed for storing
+and querying product metadata. We automatically populate our instance with
+data from our existing releng tools/processes. It doesn't do much on its own,
+but the goal is to enable us to develop more sane tooling down the road for
+future releases.
+
+The webapp is a django app running on pdc-web0{1,2}. Unlike most of our other
+apps, it does not use OpenID for authentication; it instead uses SAML2. It
+uses `mod_auth_mellon` to achieve this (in cooperation with ipsilon). The
+webapp allows new data to be POST'd to it by admin users.
+
+The backend is a `fedmsg-hub` process running on pdc-backend01. It listens for
+new composes over fedmsg and then POSTs data about those composes to PDC. It
+also listens for changes to the fedora atomic host git repo in pagure and
+updates "component groups" in PDC to reflect what rpm components constitute
+fedora atomic host.
+
+
+For the long-winded history and explanation, see the original Change document:
+https://fedoraproject.org/wiki/Changes/ProductDefinitionCenter
+
+Upgrading the Software
+----------------------
+
+There is an upgrade playbook in ``playbooks/manual/upgrade/pdc.yml`` which will
+upgrade both the frontend and the backend if new packages are available.
+Database schema upgrades should be handled automatically by a run of that
+playbook.
+
+Logs
+----
+
+Logs for the frontend are in `/var/log/httpd/error_log` on pdc-web0{1,2}.
+
+Logs for the backend can be accessed with `journalctl -u fedmsg-hub -f` on
+pdc-backend01.
+
+Restarting Services
+-------------------
+
+The frontend runs under apache, so either `apachectl graceful` or `systemctl
+restart httpd` should do it.
+
+The backend runs as a fedmsg-hub, so `systemctl restart fedmsg-hub` should
+restart it.
+
+Scripts
+-------
+
+The pdc-updater package (installed on pdc-backend01) provides three scripts:
+
+- pdc-updater-audit
+- pdc-updater-retry
+- pdc-updater-initialize
+
+A possible failure scenario is that we will lose a fedmsg message and the
+backend will not update the frontend with info about that compose. To detect
+this, we provide the `pdc-updater-audit` command (which gets run once daily by
+cron, with emails sent to the releng-cron list). It compares all of the
+entries in PDC with all of the entries in kojipkgs and then raises an alert if
+there is a discrepancy.
+
+Another possible failure scenario is that the fedmsg message is published and
+received correctly, but there is some processing error while handling it. The
+event occurred, but the import to the PDC db failed. The `pdc-updater-audit`
+script should detect this discrepancy, and then an admin will need to manually
+repair the problem and retry the event with the `pdc-updater-retry` command.
+
+If doomsday occurs and the whole thing is totally hosed, you can delete the db
+and re-ingest all information available from releng with the
+``pdc-updater-initialize`` tool. (Creating the initial schema needs to happen
+on pdc-web01 with the standard django settings.py commands.)
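+
+For example, to kick off an audit by hand rather than waiting for the daily
+cron run, something like the following should work (a sketch; invoking the
+script with no arguments is an assumption on my part, so check its help
+output first)::
+
+    # from batcave01 or wherever you ssh from
+    ssh pdc-backend01
+    # run as root so the pdc credentials are readable
+    sudo pdc-updater-audit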
+
+Manually Updating Information
+-----------------------------
+
+In general, you shouldn't have to do these things. pdc-updater will
+automatically create new releases and update information, but if you ever need
+to manipulate PDC data, you can do it with the pdc-client tool. A copy is
+installed on pdc-backend01 and there are some credentials there you'll need,
+so ssh there first.
+
+Make sure that you are root so that you can read `/etc/pdc.d/fedora.json`.
+
+Try listing all of the releases::
+
+    $ pdc -s fedora release list
+
+Deactivating an EOL release::
+
+    $ pdc -s fedora release update fedora-21-updates --deactivate
+
+.. note:: There are lots more attributes you can manipulate on a release (you
+   can change the type, rename them, etc.). See `pdc --help` and `pdc release
+   --help` for more information.
+
+Listing all composes::
+
+    $ pdc -s fedora compose list
+
+We're not sure yet how to flag a compose as the Gold compose, but when we do,
+the answer should appear here:
+https://github.com/product-definition-center/product-definition-center/issues/428
diff --git a/docs/sysadmin-guide/sops/pesign-upgrade.rst b/docs/sysadmin-guide/sops/pesign-upgrade.rst
new file mode 100644
index 0000000..575682a
--- /dev/null
+++ b/docs/sysadmin-guide/sops/pesign-upgrade.rst
@@ -0,0 +1,66 @@
+.. title: Pesign Upgrades and Reboots
+.. slug: infra-pesign-maintenance
+.. date: 2013-05-29
+.. taxonomy: Contributors/Infrastructure
+
+=======================
+Pesign upgrades/reboots
+=======================
+
+Fedora currently has two special builders. These builders are used to
+build a small set of packages that need to be signed for secure boot.
+These packages include: grub2, shim, kernel, pesign-test-app.
+
+When rebooting or upgrading pesign on these machines, you have to
+follow a special process to unlock the signing keys.
+
+Contact Information
+===================
+
+Owner
+    Fedora Release Engineering, Kernel/grub2/shim/pesign maintainers
+Contact
+    #fedora-admin, #fedora-kernel
+Servers
+    bkernel01, bkernel02
+Purpose
+    Upgrade or restart signing on kernel/grub2/shim builders
+
+Procedure
+=========
+
+0. Coordinate with pesign maintainers or pesign-test-app committers, as well
+   as releng folks that have the pin to unlock the signing key.
+
+1. Remove the builder from koji::
+
+       koji disable-host bkernel01.phx2.fedoraproject.org
+
+2. Make sure all builds have completed.
+
+3. Stop existing processes::
+
+       service pcscd stop
+       service pesign stop
+
+4. Perform updates or reboots.
+
+5. Restart services (if you didn't reboot)::
+
+       service pcscd start
+       service pesign start
+
+6. Unlock the signing key::
+
+       pesign-client -t "OpenSC Card (Fedora Signer)" -u
+       (enter pin when prompted)
+
+7. Make sure no builds are in progress, then re-add the builder to koji and
+   remove the other builder::
+
+       koji enable-host bkernel01.phx2.fedoraproject.org
+       koji disable-host bkernel02.phx2.fedoraproject.org
+
+8. Have a committer submit a build of pesign-test-app and make sure it's
+   signed correctly.
+
+9. If so, repeat the process with the second builder.
+
diff --git a/docs/sysadmin-guide/sops/planetsubgroup.rst b/docs/sysadmin-guide/sops/planetsubgroup.rst
new file mode 100644
index 0000000..fcec6c7
--- /dev/null
+++ b/docs/sysadmin-guide/sops/planetsubgroup.rst
@@ -0,0 +1,68 @@
+.. title: Fedora Planet Subgroup SOP
+.. slug: infra-planet-subgroup
+.. date: 2011-10-03
+.. taxonomy: Contributors/Infrastructure
+
+==================================
+Planet Subgroup Infrastructure SOP
+==================================
+
+Fedora's planet infrastructure produces planet configs out of users'
+``~/.planet`` files in their homedirs on fedorapeople.org. You can also create
+subgroups of users into other planets. This document explains how to set up
+new subgroups.
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+Contact
+    #fedora-admin
+Servers
+    batcave01 / planet.fedoraproject.org
+Purpose
+    Provide easy setup of new planet groups on
+    planet.fedoraproject.org
+
+The Setup
+=========
+
+1. On batcave01::
+
+       cp -a configs/system/planet/grouptmpl configs/system/planet/newgroupname
+
+2. cd to the new directory
+
+3. Run::
+
+       perl -pi -e "s/%%groupname/newgroupname/g" fpbuilder.conf base_config planet-group.cron templates/*
+
+   replacing newgroupname with the groupname you want
+
+4. git add the whole dir
+
+5. Edit ``manifests/services/planet.pp``
+
+6. Copy and paste everything from beginning to end of the design team group,
+   to use as a template.
+
+7. Modify what you copied, replacing design with the new group name
+
+8. Save it
+
+9. Check everything in
+
+10. Run ansible on planet and check if it works
+
+Use
+===
+
+Tell the requester to then copy their current .planet file to
+.planet.newgroupname. For example, with the design team::
+
+    cp ~/.planet ~/.planet.design
+
+This will then show up on the new feed at
+http://planet.fedoraproject.org/design/
+
diff --git a/docs/sysadmin-guide/sops/privatefedorahosted.rst b/docs/sysadmin-guide/sops/privatefedorahosted.rst
new file mode 100644
index 0000000..24ea3aa
--- /dev/null
+++ b/docs/sysadmin-guide/sops/privatefedorahosted.rst
@@ -0,0 +1,69 @@
+.. title: Fedorahosted Private Tickets SOP
+.. slug: infra-fedorahosted-private-tickets
+.. date: 2011-10-03
+.. taxonomy: Contributors/Infrastructure
+
+===============================================
+Private fedorahosted tickets Infrastructure SOP
+===============================================
+
+Provides for users only viewing tickets they are involved with.
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+
+Contact
+    #fedora-admin, sysadmin-hosted
+
+Location
+
+
+Servers
+    hosted1
+
+Purpose
+    Provides for users only viewing tickets they are involved with.
+
+Description
+===========
+
+Fedora Hosted Projects have the option of setting ticket permissions so
+that only users involved with tickets can see them. This plugin requires
+someone in sysadmin-hosted to set it up, and requires justification to
+use. The only current implementation is a request tracking system at
+https://fedorahosted.org/famnarequests for tracking requests for North
+American ambassadors, since mailing addresses, etc. will be put in there.
+
+Implementation
+==============
+
+On hosted1::
+
+    sudo -u apache vim /srv/web/trac/projects/<project_name>/conf/trac.ini
+
+Add the following to the appropriate sections of ``trac.ini``::
+
+    [privatetickets]
+    group_blacklist = anonymous, authenticated
+
+    [components]
+    privatetickets.* = enabled
+
+    [trac]
+    permission_policies = PrivateTicketsPolicy, DefaultPermissionPolicy, LegacyAttachmentPolicy
+
+.. note:: For projects not currently using plugins, you'll have to add the
+   [components] section, and you'll need to add the permission_policies to
+   the [trac] section.
+
+Next, someone with TRAC_ADMIN needs to grant TICKET_VIEW_SELF (a new
+permission) to authenticated. This permission allows users to view tickets
+where they are either owner, CC, or reporter. There are other options,
+more fully described at the upstream site.
+
+Make sure that TICKET_VIEW is removed from anonymous, or else this plugin
+will have no effect.
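+
+Granting and revoking those permissions can also be done from the command
+line with trac-admin (a sketch, reusing the same <project_name> placeholder
+as the trac.ini example above)::
+
+    sudo -u apache trac-admin /srv/web/trac/projects/<project_name> permission add authenticated TICKET_VIEW_SELF
+    sudo -u apache trac-admin /srv/web/trac/projects/<project_name> permission remove anonymous TICKET_VIEW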
diff --git a/docs/sysadmin-guide/sops/publictest-dev-stg-production.rst b/docs/sysadmin-guide/sops/publictest-dev-stg-production.rst
new file mode 100644
index 0000000..5bda0f5
--- /dev/null
+++ b/docs/sysadmin-guide/sops/publictest-dev-stg-production.rst
@@ -0,0 +1,88 @@
+.. title: Infrastructure Machine Classifications
+.. slug: infra-machine-classes
+.. date: 2011-10-30
+.. taxonomy: Contributors/Infrastructure
+
+=====================================
+Fedora Infrastructure Machine Classes
+=====================================
+
+Contact Information
+===================
+
+Owner
+    sysadmin-main, application developers
+Contact
+    sysadmin-main
+Location
+    Everywhere we have machines.
+Servers
+    publictest, dev, staging, production
+Purpose
+    Explain our use of various types of machines.
+
+Introduction
+============
+
+This document explains what the various types of machines are used for in
+the life cycle of providing an application or resource.
+
+Public Test machines
+====================
+
+publictest instances are used for early investigation into a resource or
+application. At this stage the application might not be packaged yet, and we
+want to see if it's worth packaging and starting it on the path to being
+available in production. These machines are accessible to anyone in the
+sysadmin-test group, and coordination of the use of instances is done on an
+ad-hoc basis. These machines are cleanly re-installed every cycle, so all
+work must be saved before this occurs.
+
+Authentication must not be against the production fas server. We have
+fakefas.fedoraproject.org set up for these systems instead.
+
+.. note:: We're planning on merging publictest into the development servers.
+   Environment-wise they'll be mostly the same (one service per machine, a
+   group to manage them, no proxy interaction, etc.). Service by service we'll
+   assign timeframes to the machines before being rebuilt, decommissioned if
+   no progress, etc.
+
+Development
+===========
+
+These instances are for applications that are packaged and being investigated
+for deployment. Typically packages and config files are modified locally to
+get the application or resource working. No caching or proxies are used.
+Access is restricted to a specific sysadmin group for that application or
+resource. These instances can be re-installed on request to 'start over'
+getting the configuration ready.
+
+Some services hosted on dev systems are for testing new programs. These will
+usually be associated with an RFR and have a limited lifetime before the new
+service has to prove itself worthy of continued testing, be moved on to
+stg, or have the machine decommissioned. Other services are for developing
+existing services. They are handy if the setup of the service is tricky or
+lengthy and the person in charge wants to maintain the .dev server so that
+newer contributors don't have to perform that setup in order to work on the
+service.
+
+Authentication must not be against the production fas server. We have
+fakefas.fedoraproject.org set up for these systems instead.
+
+.. note:: fakefas will be renamed fas01.dev at some point in the future
+
+Staging
+=======
+
+These instances are used to integrate the application or resource into ansible
+as well as the proxy and caching setups.
+These instances should use ansible to deploy
+all parts of the application or resource possible. Access to these instances
+is granted only to a sysadmin group for that application, who may or may not
+have sudo access. Permissions on stg mirror permissions on production (for
+instance, sysadmin-web would have access to the app servers in stg the same
+as in production).
+
+Production
+==========
+
+These instances are used to serve the ready-for-deployment application to the
+public. All changes are done via ansible and access is restricted. Changes
+should be made here only after testing in staging.
diff --git a/docs/sysadmin-guide/sops/rdiff-backup.rst b/docs/sysadmin-guide/sops/rdiff-backup.rst
new file mode 100644
index 0000000..1b92026
--- /dev/null
+++ b/docs/sysadmin-guide/sops/rdiff-backup.rst
@@ -0,0 +1,108 @@
+.. title: rdiff-backup Infrastructure SOP
+.. slug: infra-rdiff
+.. date: 2013-11-01
+.. taxonomy: Contributors/Infrastructure
+
+================
+rdiff-backup SOP
+================
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+Contact
+    #fedora-admin
+Location
+    Phoenix
+Servers
+    backup03 and others
+Purpose
+    backups of critical data
+
+Description
+===========
+
+We are now running an rdiff-backup of all our critical data on a daily basis.
+This allows us to keep incremental changes over time as well as have a recent
+copy in case of disaster recovery.
+
+The backups are run from backup03 every day at 22:10 UTC as root.
+All config is in ansible.
+
+The cron job checks out the ansible repo from git, then runs ansible-playbook
+with the rdiff-backup playbook. This playbook looks at variables to decide
+which machines and partitions to back up.
+
+- First, machines in the backup_clients group in inventory are operated on.
+  If a host is not in that group, it is not backed up via rdiff-backup.
+
+- Next, any machines in the backup_clients group will have their /etc and
+  /home directories backed up by the server running rdiff-backup and using
+  the rdiff-backup ssh key to access the client.
+
+- Next, if any of the hosts in backup_clients have a variable set for
+  host_backup_targets, those directories will also be backed up in the same
+  manner as above with the rdiff-backup ssh key.
+
+For each backup, an email will be sent to sysadmin-backup-members with a
+summary.
+
+Backups are stored on a netapp volume, so in addition to the incrementals
+that rdiff-backup provides, there are netapp snapshots. This netapp volume is
+mounted on /fedora_backups and is running dedup on the netapp side.
+
+Rebooting backup03
+==================
+
+When backup03 is rebooted, you must restart the ssh-agent and reload the
+rdiff-backup ssh key into that agent so backups can take place.
+
+::
+
+    sudo -i
+    ssh-agent -s > sshagent
+    source sshagent
+    ssh-add .ssh/rdiff-backup-key
+
+Adding a new host to backups
+============================
+
+1. Add the host to the backup_clients inventory group in ansible.
+
+2. If you wish to back up more than /etc and /home, add a variable to
+   ``inventory/host_vars/fqdn`` like::
+
+       host_backup_targets: ['/srv']
+
+3. On the client to be backed up, install rdiff-backup.
+4. On the client to be backed up, install the rdiff-backup ssh public key in
+   ``/root/.ssh/authorized_keys``.
+   It should be restricted with::
+
+       from="10.5.126.161,192.168.1.64"
+
+   and the command can be restricted to::
+
+       command="rdiff-backup --server --restrict-update-only"
+
+Restoring from backups
+======================
+rdiff-backup keeps a copy of the most recent version of files on disk, so if
+you wish to restore the last backup copy, simply rsync from backup03. If you
+want an older incremental, see the rdiff-backup man page for how to specify
+the exact time.
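+
+For example, to restore a host's /etc as it looked three days ago (a rough
+sketch; the per-host layout under /fedora_backups is an assumption here, so
+look at the directory on backup03 first)::
+
+    # on backup03; "somehost" is a placeholder for the backed-up host
+    rdiff-backup -r 3D /fedora_backups/somehost/etc /tmp/somehost-etc-restore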
+
+Retention
+=========
+
+Backups are currently kept forever, but likely down the road we will look at
+pruning them some to match available space.
+
+Public_key:
+===========
+
+::
+
+    ssh-dss AAAAB3NzaC1kc3MAAACBAJr3xqn/hHIXeth+NuXPu9P91FG9jozF3Q1JaGmg6szo770rrmhiSsxso/Ibm2mObqQLCyfm/qSOQRynv6tL3tQVHA6EEx0PNacnBcOV7UowR5kd4AYv82K1vQhof3YTxOMmNIOrdy6deDqIf4sLz1TDHvEDwjrxtFf8ugyZWNbTAAAAFQCS5puRZF4gpNbaWxe6gLzm3rBeewAAAIBcEd6pRatE2Qc/dW0YwwudTEaOCUnHmtYs2PHKbOPds0+Woe1aWH38NiE+CmklcUpyRsGEf3O0l5vm3VrVlnfuHpgt/a/pbzxm0U6DGm2AebtqEmaCX3CIuYzKhG5wmXqJ/z+Hc5MDj2mn2TchHqsk1O8VZM+1Ml6zX3Hl4vvBsQAAAIALDt5NFv6GLuid8eik/nn8NORd9FJPDBJxgVqHNIm08RMC6aI++fqwkBhVPFKBra5utrMKQmnKs/sOWycLYTqqcSMPdWSkdWYjBCSJ/QNpyN4laCmPWLgb3I+2zORgR0EjeV2e/46geS0MWLmeEsFwztpSj4Tv4e18L8Dsp2uB2Q== root@backup03-rdiff-backup
diff --git a/docs/sysadmin-guide/sops/requestforresources.rst b/docs/sysadmin-guide/sops/requestforresources.rst
new file mode 100644
index 0000000..e1e4852
--- /dev/null
+++ b/docs/sysadmin-guide/sops/requestforresources.rst
@@ -0,0 +1,158 @@
+.. title: Infrastructure Request for Resources SOP
+.. slug: infra-rfr
+.. date: 2015-04-23
+.. taxonomy: Contributors/Infrastructure
+
+=========================
+Request for resources SOP
+=========================
+
+Contents
+========
+
+1. Contact Information
+2. Introduction
+3. Pre sponsorship
+4. Planning
+5. Development Instance
+6. Staging Instance
+7. Production deployment
+8. Maintenance
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+Contact
+    #fedora-admin
+Location
+    fedoraproject.org/wiki
+Servers
+    dev, stg, production
+Purpose
+    Explains the technical part of Request for Resources
+
+Introduction
+============
+
+Once an RFR has a sponsor and has been generally agreed to move forward,
+this SOP describes the technical parts of moving an RFR through the
+various steps it needs from idea to implementation. Note that for high-level
+and non-technical requirements, please see the main RFR page.
+
+An RFR will go through (at least) the following steps, but note that it can
+be dropped, removed or reverted at any time in the process, and that MUST
+items MUST be provided before the next step is possible.
+
+Pre sponsorship
+===============
+
+Until an RFR has a sysadmin-main person who is sponsoring and helping with
+the request, no further technical action should take place with this SOP.
+Please see the main RFR SOP to acquire a sponsor and do the steps needed
+before implementation starts. If your resource requires packages to be
+complete, please finish your packaging work before moving forward with the
+RFR (accepted/approved packages in Fedora/EPEL). If your RFR only has a
+single person working on it, please gather at least one other person before
+moving forward. Single points of failure are to be avoided.
+
+Requirements for continuing:
+----------------------------
+
+* MUST have an RFR ticket.
+
+* MUST have the ticket assigned and accepted by someone in the
+  infrastructure sysadmin-main group.
+
+Planning
+========
+
+Once a sponsor is acquired and all needed packages have been packaged and
+are available in EPEL, we move on to the planning phase. In this phase,
+discussion about the application/resource should take place on the
+infrastructure list and IRC. Questions about how the resource could be
+deployed should be considered:
+
+* Should the resource be load balanced?
+
+* Does the resource need caching?
+
+* Can the resource live on its own instance to separate it from more
+  critical services?
+
+* Who all is involved in maintaining and deploying the instance?
+
+Requirements for continuing:
+----------------------------
+
+* MUST discuss/note the app on the infrastructure mailing list and
+  answer feedback there.
+
+* MUST determine who is involved in deploying and maintaining the
+  resource.
+
+Development Instance
+====================
+
+In this phase a development instance is set up for the resource. This
+instance is a single virtual host running the needed OS. The RFR sponsor
+will create this instance and also create a group 'sysadmin-resource' for
+the resource, adding all responsible parties to the group. It's then up to
+sysadmin-resource members to set up the resource and test it. Questions
+asked in the planning phase should be investigated once the instance is
+up. Load testing and other testing should be performed. Issues like
+expiring old data, log files, acceptable content, packaging issues,
+configuration, general bugs, security profile, and others should be
+investigated. At the end of this step an email should be sent to the
+infrastructure list explaining the testing done and inviting comment.
+
+Requirements for continuing:
+----------------------------
+
+* MUST have the RFR sponsor sign off that the resource is ready to move to
+  the next step.
+
+* MUST have answered any outstanding questions on the infrastructure
+  list about the resource. Decisions about caching, load balancing and
+  how the resource would be best deployed should be determined.
+
+* MUST add any needed SOPs for the service. Should there be an Update
+  SOP? A troubleshooting SOP? Any other tasks that might need to be done
+  to the instance when those who know it well are not available?
+
+Staging Instance
+================
+
+The next step is to create a staging instance for the resource. In this
+step the resource is fully added to Ansible/configuration management. The
+resource is added to caching/load balancing/databases and tested in this
+new environment. Once the initial deployment is done and tested, another
+email is sent to the infrastructure list to note that the resource is
+available in staging.
+
+Requirements for continuing:
+----------------------------
+
+* MUST have sign-off from the RFR sponsor that the resource is fully
+  configured in Ansible and ready to be deployed.
+
+* MUST have a deployment schedule for going to production. This will
+  need to account for things like freezes and availability of
+  infrastructure folks.
+
+Production deployment
+=====================
+
+Finally, the staging changes are merged over to production and the resource
+is deployed.
+
+Monitoring of the resource is added and confirmed to be effective.
+
+Maintenance
+===========
+
+The resource will then follow the normal rules for production: honoring
+freezes, updating for issues or security bugs, adjusting for capacity,
+etc.
+ diff --git a/docs/sysadmin-guide/sops/resultsdb.rst b/docs/sysadmin-guide/sops/resultsdb.rst new file mode 100644 index 0000000..b4d917d --- /dev/null +++ b/docs/sysadmin-guide/sops/resultsdb.rst @@ -0,0 +1,56 @@ +.. title: Infrastucture resultsdb SOP +.. slug: infra-resultsdb +.. date: 2014-09-24 +.. taxonomy: Contributors/Infrastructure + +============= +resultsdb SOP +============= + +store results from taskotron tasks + +Contact Information +=================== + +Owner + Fedora QA Devel, Fedora Infrastructure Team +Contact + #fedora-qa, #fedora-admin, #fedora-noc +Location + PHX2 +Servers + resultsdb-dev01.qa, resultsdb-stg01.qa, resultsdb01.qa +Purpose + store results from taskotron tasks + +Architecture +============ + +ResultsDB as a system is made up of two parts - a results storage API and a +simple html based frontend for humans to view the results accessible through +that API (resultsdb and resultsdb_frontend). + +Deployment +========== + +The only part of resultsdb deployment that isn't currently in the ansible +playbooks is database initialization (disabled due to bug). + +Once the resultsdb app has been installed, initialize the database, run: +resultsdb init_db + +Updating +======== + +Database schema changes are not currently supported with resultsdb and the app +can be updated like any other web application: + +- update app +- restart httpd + +Backup +====== + +All important information in ResultsDB is stored in its database - backing up +that database is sufficient for backup and restoring that database from a +snapshot is sufficient for restoring. diff --git a/docs/sysadmin-guide/sops/reviewboard.rst b/docs/sysadmin-guide/sops/reviewboard.rst new file mode 100644 index 0000000..a257f71 --- /dev/null +++ b/docs/sysadmin-guide/sops/reviewboard.rst @@ -0,0 +1,161 @@ +.. title: ReviewBoard Infrastucture SOP +.. slug: infra-reviewboard +.. date: 2011-10-03 +.. taxonomy: Contributors/Infrastructure + +============================== +ReviewBoard Infrastructure SOP +============================== + +Review Board is a powerful web-based code review tool that offers +developers an easy way to handle code reviews. It scales well from small +projects to large companies and offers a variety of tools to take much of +the stress and time out of the code review process. + +Contents +-------- + +1. Contact Information +2. File Locations +3. Troubleshooting and Resolution + + * Restarting + +4. Create a new repository in ReviewBoard + + * Creating a new git repository + * Creating a new bzr repository + * Create a default reviewer for a repository + +Contact Information +------------------- + +Owner: + Fedora Infrastructure Team + +Contact: + #fedora-admin, sysadmin-main, sysadmin-hosted + +Location: + ServerBeach + +Servers: + hosted[1-2] + +Purpose: + Provide our fedorahosted users a way to review code. + +File Locations +============== +Main Config File: + hosted[1-2]:/srv/reviewboard/conf/settings_local.py + +ReviewBoard: + hosted[1-2]:/etc/httpd/conf.d/fedorahosted.org/reviewboard.conf + +Upstream: + https://fedorahosted.org/reviewboard/ + +Troubleshooting and Resolution +============================== + +Restarting +---------- + +After an update, to restart reviewboard just restart apache. Doing a +service httpd stop and then a service httpd start should do it. + +Create a new repository in ReviewBoard +====================================== + +Creating a new git repository +----------------------------- + +1. Enter the admin interface. 
If you have admin privilege, a link will be
   visible in the upper-right corner of the dashboard.
2. In the admin dashboard click "Add" next to "Repositories"
3. For the name, enter the Fedora Hosted project short name. (e.g. if the
   project is https://fedorahosted.org/sssd, then the repository name
   should be sssd)
4. "Show this repository" must be checked.
5. Hosting service is "Custom"
6. Repository type is Git
7. Path should be /srv/git/project_short_name.git
   (e.g. /srv/git/sssd.git)
8. Mirror path should be
   git://git.fedorahosted.org/git/project_short_name.git

.. note:: Mirror path is used by client tools such as post-review to
   determine to which repository a submission belongs. A quick sanity
   check for these paths is sketched at the end of this SOP.

9. Raw file URL mask should be left blank
10. Username and Password should both be left blank
11. The bug tracker URL may vary from project to project, but if they are
    using the Fedora Hosted Trac bugtracker, it should be

    * Type: Trac
    * Bug Tracker URL: https://fedorahosted.org/project_short_name
      (e.g. https://fedorahosted.org/sssd)

12. Do not set a Bug Tracker URL

Creating a new bzr repository
-----------------------------
1. Go to the admin dashboard to add a new repository.
2. For the name, enter the Fedora Hosted project short name. (e.g. if the
   project is https://fedorahosted.org/kitchen, then the repository
   name should be kitchen)
3. "Show this repository" must be checked.
4. Hosting service is "Custom"
5. Repository type is Bazaar
6. Path should be /srv/bzr/project_short_name/branch_name
   (e.g. /srv/bzr/kitchen/devel) -- reviewboard doesn't understand how to work
   with repository conventions; it just works on branches.
7. Mirror path should be
   bzr://bzr.fedorahosted.org/bzr/project_short_name/branch_name

.. note:: Mirror path is used by client tools such as post-review to
   determine to which repository a submission belongs

8. Username and Password should both be left blank
9. The bug tracker URL may vary from project to project, but if they are
   using the Fedora Hosted Trac bugtracker, it should be

   * Type: Trac
   * Bug Tracker URL: https://fedorahosted.org/project_short_name
     (e.g. https://fedorahosted.org/kitchen)

10. Do not set a Bug Tracker URL

Create a default reviewer for a repository
------------------------------------------

Reviews should be sent to the project development mailing list unless
otherwise requested.

1. Enter the admin interface. If you have admin privilege, a link will be
   visible in the upper-right corner of the dashboard.
2. In the admin dashboard click "Add" next to "Review Groups"
3. Enter the following values:

   * Name: The project short name
   * Display Name: project_short_name Review Group
   * Mailing List: Development discussion list for the project

4. Do not select any users
5. Return to the main admin dashboard and click on "Add" next to "Default
   Reviewers"
6. Enter the following values:

   * Name: Something unique and sensible
   * File Regular Expression: enter '.*' (without the quotes)

.. note:: This means that by default, the mailing list will receive
   email for reviews of all files in the repository

7. Under "Default groups", select the group you created above and click
   the arrow pointing right.
8. Do not select any default people
9. Under "Repositories", select the repository added above and click the
   arrow pointing right.
10. Save your changes.
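Before saving a new repository entry, it can help to sanity-check the
configured paths from one of the hosted servers. This is only an
illustrative sketch, reusing the sssd example above::

    # the bare repository should exist at the configured Path
    ls -d /srv/git/sssd.git

    # the Mirror path should answer to anonymous git clients
    git ls-remote git://git.fedorahosted.org/git/sssd.git HEAD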
diff --git a/docs/sysadmin-guide/sops/scmadmin.rst b/docs/sysadmin-guide/sops/scmadmin.rst
new file mode 100644
index 0000000..1ce632d
--- /dev/null
+++ b/docs/sysadmin-guide/sops/scmadmin.rst
@@ -0,0 +1,294 @@
.. title: Infrastructure SCM Admin SOP
.. slug: infra-scm-admin
.. date: 2015-01-01
.. taxonomy: Contributors/Infrastructure

=============
SCM Admin SOP
=============

.. warning:: Most information here (probably section 1.4 and later) has not
   been updated for pkgdb2 and is therefore no longer correct.

Contents
========

1. Creating New Packages

   1. Obtaining process-git-requests
   2. Prerequisites
   3. Running the script
   4. Steps for manual processing

      1. Using pkgdb-client
      2. Using pkgdb2branch
      3. Update Koji

   5. Helper Scripts

      1. mkbranchwrapper
      2. setup_package

   6. Pseudo Users for SIGs

2. Deprecate Packages
3. Undeprecate Packages
4. Performing mass comaintainer requests

Creating New Packages
=====================

Package creation is mostly automatic and most details are handled by a script.

Obtaining process-git-requests
------------------------------

The script is not currently packaged; it lives in the rel-eng
git repository. You can check it out with::

    git clone https://git.fedorahosted.org/git/releng

and keep it up to date by running::

    git pull

somewhere in the checked-out tree occasionally, before
processing new requests.

The script lives in ``scripts/process-git-requests``.

Prerequisites
-------------

You must have the python-bugzilla and python-fedora packages installed.

Before running process-git-requests, you should run::

    bugzilla login

The "Username" you will be prompted for is the email address attached to
your bugzilla account. This will obtain a cookie so that the script can
update bugzilla tickets. The cookie is good for quite some time (at least
a month); if you wish to remove it, delete the ``~/.bugzillacookies`` file.

It is also advantageous to have your Fedora ssh key loaded so that you can
ssh into pkgs.fedoraproject.org without being prompted for a password.

It perhaps goes without saying that you will need unfirewalled and
unproxied access to ports 22, 80 and 443 on various Fedora machines.

Running the script
------------------

Simply execute the process-git-requests script and follow the prompts. It
can provide the text of all comments in the bugzilla ticket for inspection
and will perform various useful checks on the ticket and the included SCM
request. If there are warnings present, you will need to accept them
before being allowed to process the request.

Note that the script only looks at the final request in a ticket; this
permits users to tack on a new request at any time and re-raise the
fedora-cvs flag. Packagers do not always understand this, though, so it is
necessary to read through the ticket contents to make sure that the
request matches reality.

After a request has been accepted, the script will create the package in
pkgdb (which may require your password) and attempt to log into the SCM
server to create the repository. If this does not succeed, the package
name is saved, and when you finish processing, a command line will be output
with instructions on creating the repositories manually. If you hit Ctrl-C
or the script otherwise aborts, you may miss this information. If so, see
below for information on running pkgdb2branch.py on the SCM server; you
will need to run it for each package you created.
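Putting the prerequisites and the checkout together, a typical session
might look like the following sketch (assuming a yum-based workstation and
the releng clone described above)::

    sudo yum install python-bugzilla python-fedora   # prerequisite packages
    bugzilla login                                   # cache a bugzilla cookie
    ssh-add                                          # load your Fedora ssh key
    cd releng && git pull                            # refresh the rel-eng checkout
    ./scripts/process-git-requests                   # process pending requests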
+ +Steps for manual processing +--------------------------- + +It is still useful to document the process of handling these requests +manually in the case that process-git-requests has issues. + +1. Check Bugzilla Ticket to make sure it looks ok +2. Add the package information to the packagedb with pkgdb-client +3. Use pkgdb2branch to create the branches on the cvs server + + .. warning:: Do not run multiple instances of pkgdb2branch in parallel! + This will cause them to fail due to mismatching 'modules' files. It's not + a good idea to run addpackage, mkbranchwrapper, or setup_package by + themselves as it could lead to packages that don't match their packagedb + entry. + +4. Update koji. + +Using pkgdb-client +`````````````````` + +Use pkgdb-client to update the pkgdb with new information. For instance, +to add a new package::: + + pkgdb-client edit -u toshio -o terjeros \ + -d 'Python module to extract EXIF information' \ + -b F-10 -b F-11 -b devel python-exif + +To update that package later and add someone to the initialcclist do:: + + pkgdb-client edit -u toshio -c kevin python-exif + +To add a new branch for a package:: + + pkgdb-client edit -u toshio -b F-10 -b EL-5 python-exif + +To allow provenpackager to edit a branch:: + + pkgdb-client edit -u toshio -b devel -a provenpackager python-exif + +To remove provenpackager commit rights on a branch:: + + pkgdb-client edit -u toshio -b EL-5 -b EL-4 -r provenpackager python-exif + +More options can be found by running ``pkgdb-client --help`` + +You must be in the cvsadmin group to use pkgdb-client. It can be run on a +non-Fedora Infrastructure box if you set the PACKAGEDBURL environment +variable to the public URL:: + + export PACKAGEDBURL=https://admin.fedoraproject.org/pkgdb + +.. note:: + You may be asked to CC fedora-perl-devel-list on a perl package. This can + be done with the username "perl-sig". This is presently a user, not a + group so it cannot be used as an owner or comaintainer, only for CC. + +Using pkgdb2branch +------------------ + +Use pkgdb2branch.py to create branches for a package. pkgdb2branch.py +takes a list of package names on the command line and creates the branches +that are specified in the packagedb. The script lives in /usr/local/bin on +the SCM server (pkgs.fedoraproject.org) and must be run there. + +For instance, ``pkgdb2branch.py python-exif qa-assistant`` will create branches +specified in the packagedb for python-exif and qa-assistant. + +pkgdb2branch can only be run from pkgs.fedoraproject.org. + +Update Koji +----------- + +Optionally you can synchronize pkgdb and koji by hand: it is done +automatically hourly by a cronjob. There is a script for this in the +admin/ directory of the CVSROOT module. + +Since dist-f13 and later inherit from dist-f12, and currently dist-f12 is +the basis of our stack, it's easiest to just call:: + + ./owner-sync-pkgdb dist-f12 + +Just run ``./owners-sync-pkgdb`` for usage output. + +This script requires that you have a properly configured koji client +installed. + +owner-sync-pkgdb requires the koji client libraries which are not +available on the cvs server. So you need to run this from one of your +machines. + +Helper Scripts +============== + +These scripts are invoked by the scripts above, doing some of the heavy +lifting. They should not ordinarily be called on their own. + +mkbranchwrapper +--------------- + +``/usr/local/bin/mkbranchwrapper`` is a shell script which takes a list of +packages and branches. 
For instance:: + + mkbranchwrapper foo bar EL-5 F-11 + +will create modules foo and bar for devel if they don't exist and branch +them for the other 4 branches passed to the script. If the devel branch +exists then it just branches. If there is no branches passed the module is +created in devel only. + +``mkbranchwrapper`` has to be run from cvs-int. + +.. important:: mkbranchwrapper is not used by any current programs. Use pkgdb2branch instead. + +setup_package +------------- + +``setup_package`` creates a new blank module in devel only. It can be run from +any host. To create a new package run:: + + setup_package foo + +setup_package needs to be called once for each package. it could be +wrapped in a shell script similar to:: + + #!/bin/bash + + PACKAGES="" + + for arg in $@; do + PACKAGES="$PACKAGES $arg" + done + + echo "packages=$PACKAGES" + + for package in $PACKAGES; do + ~/bin/setup_package $package + done + + +then call the script with all branches after it. + +.. note:: setup_package is currently called from pkgdb2branch. + +Pseudo Users for SIGs +--------------------- + +See [62]Package_SCM_admin_requests#Pseudo-users_for_SIGs for the current list. + +Deprecate Packages +------------------ + +Any packager can deprecate a package. click on the deprecate package +button for the package in the webui. There's currently no ``pkgdb-client`` +command to deprecate a package. + +Undeprecate Packages +-------------------- + +Any cvsadmin can undeprecate a package. Simply use pkgdb-client to assign +an owner and the package will be undeprecated:: + + pkgdb-client -o toshio -b devel qa-assistant + +As a cvsadmin you can also log into the pkgdb webui and click on the +unretire package button. Once clicked, the package will be orphaned rather +than deprecated. + +Performing mass comaintainer requests +------------------------------------- + +* Confirm that the requestor has 'approveacls' on all packages they wish + to operate on. If they do not, they MUST request the change via FESCo. + +* Mail maintainers/co-maintainers affected by the change to inform them + of who requested the change and why. + +* Download a copy of this script: + http://git.fedorahosted.org/git/?p=fedora-infrastructure.git;a=blob;f=scripts/pkgdb_bulk_comaint/comaint.py;hb=HEAD + +* Edit the script to have the proper package owners and package name + pattern. + +* Edit the script to have the proper new comaintainers. + +* Ask someone in ``sysadmin-web`` to disable email sending on bapp01 for the + pkgdb (following the instructions in comments in the script) + +* Copy the script to an infrastructure host (like cvs01) that can + contact bapp01 and run it. + diff --git a/docs/sysadmin-guide/sops/selinux.rst b/docs/sysadmin-guide/sops/selinux.rst new file mode 100644 index 0000000..e9362c3 --- /dev/null +++ b/docs/sysadmin-guide/sops/selinux.rst @@ -0,0 +1,124 @@ +.. title: SELinux Infrastructure SOP +.. slug: infra-selinux +.. date: 2012-03-19 +.. taxonomy: Contributors/Infrastructure + +========================== +SELinux Infrastructure SOP +========================== + +SELinux is a fundamental part of our Operating System but still has a +large learning curve and remains quite intimidating to both developers and +system administrators. Fedora's Infrastructure has been growing at an +unfathomable rate, and is full of custom software that needs to be locked +down. The goal of this SOP is to make it simple to track down and fix +SELinux policy related issues within Fedora's Infrastructure. 
Fully deploying SELinux is still an ongoing task, and can be tracked in
fedora-infrastructure ticket #230.

Contents
========

1. Contact Information
2. Step One: Realizing you have a problem
3. Step Two: Tracking down the violation
4. Step Three: Fixing the violation

   1. Allowing ports
   2. Toggling an SELinux boolean
   3. Setting custom context
   4. Deploying custom policy modules

Contact Information
===================

Owner
    Fedora Infrastructure Team
Contact
    #fedora-admin, sysadmin-main, sysadmin-web groups
Purpose
    To ensure that we are able to fully wield the power of SELinux
    within our infrastructure.

Step One: Realizing you have a problem
======================================

If you are trying to find a specific problem on a host, go look in the
per-host audit.log on our central log server. See the syslog SOP for
more information.

Step Two: Tracking down the violation
=====================================

Generate SELinux policy allow rules from logs of denied operations. This
is useful for getting a quick overview of what has been getting denied on
the local machine::

    audit2allow -la

You can obtain more detailed audit messages by using ausearch to get the
most recent violations::

    ausearch -m avc -ts recent

Again, see the syslog SOP for more information here.

Step Three: Fixing the violation
================================

Below are examples of using our current ansible configuration to make
SELinux deployment changes. These constructs are currently home-brewed,
and do not exist in upstream Ansible. For these functions to work, you must
ensure that the host or servergroup is configured with 'include selinux',
which will enable SELinux in permissive mode. Once a host is properly
configured, this can be changed to 'include selinux-enforcing' to enable
SELinux Enforcing mode.

.. note::
   Most services have $service_selinux manpages that are automatically
   generated from policy.

Toggling an SELinux boolean
---------------------------

SELinux booleans, which can be viewed by running ``semanage boolean -l``,
can easily be configured using the following syntax within your ansible
configuration::

    seboolean: name=httpd_can_network_connect_db state=yes persistent=yes

Setting custom context
----------------------

Our infrastructure contains many custom applications, which may utilize
non-standard file locations. These issues can lead to trouble with
SELinux, but they can easily be resolved by setting custom file context::

    "file: path=/var/tmp/l10n-data recurse=yes setype=httpd_sys_content_t"


Fixing odd errors from the logs
-------------------------------
If you see messages like this in the log reports::

    restorecon: /etc/selinux/targeted/contexts/files/file_contexts: Multiple same specifications for /home/fedora.
    matchpathcon: /etc/selinux/targeted/contexts/files/file_contexts: Multiple same specifications for /home/fedora.

then it is likely you have an overlapping file context in your local selinux
context configuration - in this case, likely one added by ansible accidentally.

To find it, run this::

    semanage fcontext -l | grep /path/being/complained/about

Sometimes it is just an ordering problem and reversing them solves it;
other times it is just an overlap, period.

Look at the context and delete the one you do not want, or reorder them.
+ +To delete run:: + + semanage fcontext -d '/entry/you/wish/to/delete' + +This just removes that filecontext - no need to worry about files being deleted. + +Then rerun the triggering command and see if the problem is solved. diff --git a/docs/sysadmin-guide/sops/sigul-upgrade.rst b/docs/sysadmin-guide/sops/sigul-upgrade.rst new file mode 100644 index 0000000..14b8897 --- /dev/null +++ b/docs/sysadmin-guide/sops/sigul-upgrade.rst @@ -0,0 +1,73 @@ +.. title: Sigul Servers Maintenance SOP +.. slug: infra-sigul-mainenance +.. date: 2015-02-04 +.. taxonomy: Contributors/Infrastructure +============================== +Sigul servers upgrades/reboots +============================== + +Fedora currently has 1 sign-bridge and 2 sign-vault machines for primary, there +is a similar setup for secondary architectures. When upgrading or rebooting +these machines, some special steps must be taken to ensure everything is +working as expected. + +Contact Information +------------------- + +Owner + Fedora Release Engineering +Contact + #fedora-admin, #fedora-noc +Servers + sign-vault03, sign-vault04, sign-bridge02, secondary-bridge01.qa +Purpose + Upgrade or restart sign servers + +Description +----------- +0. Coordinate with releng on timing. Make sure no signing is happening, and +none is planned for a bit. + +Sign-bridge02, secondary-bridge01.qa: + + 1. Apply updates or changes + + 2. Reboot virtual instance + + 3. Once it comes back, start the sigul_bridge service and enter empty password. + +Sign-vault03/04: + + 1. Determine which server is currently primary. It's the one that has the + floating ip address for sign-vault02 on it. + + 2. Login to the non primary server via serial or management console. + (There is no ssh access to these servers) + + 3. Take a lvm snapshot:: + + lvcreate --size 5G --snapshot --name YYYMMDD /dev/mapper/vg_signvault04-lv_root + + Replace YYMMDD with todays year, month, day and the vg with the correct name + Then apply updates. + + 4. Confirm the server comes back up ok, login to serial console or management + console and start the sigul_server process. Enter password when prompted. + + 5. On the primary server, down the floating ip address:: + + ip addr del 10.5.125.75 dev eth0 + + 6. On the secondary server, up the floating ip address:: + + ip addr add 10.5.125.75 dev eth0 + + 7. Have rel-eng folks sign some packages to confirm all is working. + + 8. Update/reboot the old primary server and confirm it comes back up ok. + +.. note:: Changes to database + + When making any changes to the database (new keys, etc), it's important to + sync the data from the primary to the secondary server. This process is + currently manual. diff --git a/docs/sysadmin-guide/sops/sshaccess.rst b/docs/sysadmin-guide/sops/sshaccess.rst new file mode 100644 index 0000000..053f2f6 --- /dev/null +++ b/docs/sysadmin-guide/sops/sshaccess.rst @@ -0,0 +1,148 @@ +.. title: SSH Access SOP +.. slug: infra-ssh-access +.. date: 2012-09-24 +.. taxonomy: Contributors/Infrastructure + +============================= +SSH Access Infrastructure SOP +============================= + +Contents +======== + +1. Contact Information +2. Introduction +3. SSH configuration +4. SSH Agent forwarding +5. Troubleshooting + +Contact Information +=================== + +Owner + sysadmin-main +Contact + #fedora-admin or admin@fedoraproject.org +Location + PHX2 +Servers + All PHX2 and VPN Fedora machines +Purpose + Access via ssh to Fedora project machines. 
Introduction
============

This page contains some useful instructions about how to safely log in to
Fedora PHX2 machines using public key authentication. As of 2011-05-27, all
machines require an SSH key for access; password authentication no longer
works. Note that this SOP has nothing to do with actually gaining access to
specific machines. For that you MUST be in the correct group for shell
access to that machine. This SOP simply describes the process once you do
have valid and appropriate shell access to a machine.

SSH configuration
=================
First of all (on your local machine)::

    vi ~/.ssh/config

.. note::
   This file, and any keys, need to be chmod 600, or you will get a "Bad
   owner or permissions" error. The .ssh directory must be mode 700.

Then add the following::

    Host bastion.fedoraproject.org
        User FAS_USERNAME
        ProxyCommand none
        ForwardAgent no
    Host *.phx2.fedoraproject.org *.qa.fedoraproject.org 10.5.125.* 10.5.126.* 10.5.127.* *.vpn.fedoraproject.org
        User FAS_USERNAME
        ProxyCommand ssh -W %h:%p bastion.fedoraproject.org

One slight annoyance with this method is that you must include the
.phx2.fedoraproject.org part when you SSH to Fedora machines in order for
the connection to be tunneled through bastion.

To avoid this, you can add aliases for each of the Fedora machines you log
in to by modifying the second Host line::

    Host *.phx2.fedoraproject.org 10.5.125.* 10.5.126.* 10.5.127.* *.vpn.fedoraproject.org batcave01 noc01 # list all hosts here

How does ProxyCommand work?

A connection is established to the bastion host::

    +-------+             +--------------+
    |  you  | ---ssh--->  | bastion host |
    +-------+             +--------------+

The bastion host establishes a connection to the target server::

    +--------------+            +--------+
    | bastion host |  ------->  | server |
    +--------------+            +--------+

Your client then connects through the bastion and reaches the target server::

    +-----+      +--------------+      +--------+
    | you |      | bastion host |      | server |
    |     | ===ssh=over=bastion=====>  |        |
    +-----+      +--------------+      +--------+

PuTTY SSH configuration
=======================

You can configure PuTTY the same way by doing this:

1. In the Session section, type batcave01.phx2.fedoraproject.org, port 22
2. In Connection:Data, enter your FAS_USERNAME
3. In Connection:Proxy, add the proxy settings

   * ProxyHostname is bastion.fedoraproject.org
   * Port 22
   * Username FAS_USERNAME
   * Proxy Command plink %user@%proxyhost %host:%port

4. In Connection:SSH:Auth, remember to select the same key file for
   authentication that you have set in your FAS profile

SSH Agent forwarding
====================

You should normally have::

    ForwardAgent no

for Fedora hosts (this is the default in OpenSSH). You can override this
on a per-session basis by using '-A' with ssh. SSH agents could be misused
if you connect to a compromised host with forwarding on (the attacker can
use your agent to authenticate them to anything you have access to as long
as you are logged in). Additionally, if you do need SSH agent forwarding
(say, for copying files between machines), you should remember to log out
as soon as you are done, so as not to leave your agent exposed.

Troubleshooting
===============

* 'channel 0: open failed: administratively prohibited: open failed'

  If you receive this message for a machine proxied through bastion, then
  bastion was unable to connect to the host.
This most likely means you
  tried to SSH to a nonexistent machine. You can debug this by trying to
  connect to that machine from bastion.

* If your local username is different from the one registered in FAS,
  please remember to set up a User variable (as above) where you
  specify your FAS username. If that's missing, SSH will try to log in
  using your local username, and it will fail.

* ssh -vv is very handy for debugging which sections are matching and
  which are not.

* If you get access denied several times in a row, please consult with
  #fedora-admin. If you try too many times with an invalid config, your
  IP could be added to denyhosts.

* If you are running an OpenSSH version less than 5.4, then the -W
  option is not available. In that case, use the following ProxyCommand
  line instead::

      ProxyCommand ssh -q bastion.fedoraproject.org exec nc %h %p

diff --git a/docs/sysadmin-guide/sops/sshknownhosts.rst b/docs/sysadmin-guide/sops/sshknownhosts.rst
new file mode 100644
index 0000000..380cd90
--- /dev/null
+++ b/docs/sysadmin-guide/sops/sshknownhosts.rst
@@ -0,0 +1,34 @@
.. title: SSH Known Hosts Infrastructure SOP
.. slug: infra-ssh-known-hosts
.. date: 2015-04-23
.. taxonomy: Contributors/Infrastructure

==================================
SSH known hosts Infrastructure SOP
==================================

Provides a known hosts file that is globally deployed and publicly available
at https://admin.fedoraproject.org/ssh_known_hosts

Contact Information
===================
Owner:
    Fedora Infrastructure Team
Contact:
    #fedora-admin, sysadmin group
Location:
    all
Servers:
    all
Purpose:
    Provides a known hosts file that is globally deployed.


Adding a host alias to the ssh_known_hosts
==========================================

If you need to add a host alias to a host in ssh_known_hosts, simply
go to the dir for the host in infra-hosts and add a file named host_aliases
to the git repo in that dir. Put one alias per line and save.

The next time fetch-ssh-keys runs, it will add those aliases to the known
hosts file.

diff --git a/docs/sysadmin-guide/sops/staging-infra.rst b/docs/sysadmin-guide/sops/staging-infra.rst
new file mode 100644
index 0000000..b813815
--- /dev/null
+++ b/docs/sysadmin-guide/sops/staging-infra.rst
@@ -0,0 +1,142 @@
.. title: Infrastructure Staging SOP
.. slug: infra-staging
.. date: 2012-04-18
.. taxonomy: Contributors/Infrastructure

===========
Staging SOP
===========

Owner
    Fedora Infrastructure Team
Contact
    #fedora-admin, sysadmin-main
Location
    Mostly in PHX2
Servers
    *stg*
Purpose
    Staging environment to test changes to apps and create initial Ansible configs.

Introduction
============

Fedora uses a set of staging servers for several purposes:

* When applications are initially being deployed, the staging versions of
  those applications are set up on a staging server that is used to create
  the initial Ansible configuration for the application/service.

* Established applications/services use staging for testing. This testing includes:

  - Bugfix updates
  - Configuration changes managed by Ansible
  - Upstream updates to dependent packages (httpd changes for example)

Goals
=====

The staging servers should be self-contained and have all the needed
databases and such to function. At no time should staging resources talk to
production instances. We use firewall rules on our production servers to
make sure no access is made from staging.
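To make that concrete, such a rule might look something like the following
iptables entry on a production host; the staging subnet shown is a
hypothetical example, not our actual network layout::

    # reject anything arriving from the (illustrative) staging network
    -A INPUT -s 10.5.128.0/24 -j REJECT --reject-with icmp-host-prohibited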
Staging instances often use dumps of production databases and data, and
thus access to resources in staging should be controlled as it is in
production.

DNS and naming
==============

All staging servers should be in the ``stg.phx2.fedoraproject.org`` domain.
``/etc/hosts`` files are used on stg servers to override dns in cases where
staging resources should talk to the staging version of a service instead of
the production one. In some cases, one staging server may be aliased to
several services or applications that are on different machines in
production.

Syncing databases
=================

Syncing FAS
-----------
Sometimes you want to resync the staging fas server with what's on
production. To do that, dump what's in the production db and then import
it into the staging db. Note that resyncing the information will remove
any information that has been added to the staging fas servers. So
it's good to mention that you're doing this on the infra list or to people
who you know are working on the staging fas servers, so they can either
save their changes or ask you to hold off for a while.

On db01 (note the redirect: ``xz -c`` writes the compressed dump to stdout)::

    $ ssh db01
    $ sudo -u postgres pg_dump -C fas2 | xz -c > fas2.dump.xz
    $ scp fas2.dump.xz db02.stg:

On fas01.stg (postgres won't drop the database if something is accessing it;
at the moment, fas in staging is not load balanced, so we only have to do
this on one server)::

    $ sudo /etc/init.d/httpd stop

On db02.stg::

    $ echo 'drop database fas2' | sudo -u postgres psql
    $ xzcat fas2.dump.xz | sudo -u postgres psql

On fas01.stg::

    $ sudo /etc/init.d/httpd start

Other databases behave similarly.

External access
===============

There is http/https access from the internet to staging instances to allow
testing. Simply replace the production resource domain with
stg.fedoraproject.org and it should go to the staging version (if any) of
that resource.

Ansible and Staging
===================

All staging machine configuration is now in the same branch
as master/production.

There is a 'staging' environment - the Ansible variable "env" is equal to
"staging" in playbooks for staging things. This variable can be used
to differentiate between production and staging systems.

Workflow for staging changes
============================

1. If you don't need to make any Ansible-related config changes, don't
   do anything (i.e., a new version of an app that uses the same config
   files, etc.). Just update on the host and test.

2. If you need to make Ansible changes, either in the playbook of the
   application or outside of your module:

   - Make use of files ending with .staging (see resolv.conf in global for
     an example). So, if there are persistent changes in staging from
     production, like a different config file, use this (a playbook sketch
     is included at the end of this SOP).

   - Conditionalize on environment::

       - name: your task
         ...
         when: env == "staging"

       - name: production-only task
         ...
         when: env != "staging"

   - These changes can stay if they are helpful for further testing down
     the road. Ideally, the normal case is that staging and production are
     configured in the same host group from the same Ansible playbook.

Time limits on staging changes
==============================

There is no hard limit on time spent in staging, but where possible we should
limit the time in staging so we are not carrying changes from production for a
long time and possibly affecting other staging work.
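To illustrate the .staging convention mentioned above, a task could pick
the staging copy of a file based on the env variable. This is only a
sketch using an inline Jinja2 conditional, not a copy of our actual
playbooks::

    - name: install resolv.conf, preferring the .staging copy in staging
      copy: src=resolv.conf{{ '.staging' if env == 'staging' else '' }} dest=/etc/resolv.conf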
diff --git a/docs/sysadmin-guide/sops/staging.rst b/docs/sysadmin-guide/sops/staging.rst new file mode 100644 index 0000000..3002414 --- /dev/null +++ b/docs/sysadmin-guide/sops/staging.rst @@ -0,0 +1,146 @@ +.. title: Infrastructure Ansible Staging SOP +.. slug: infra-staging-ansible +.. date: 2015-04-23 +.. taxonomy: Contributors/Infrastructure + +=================== +Ansible Staging SOP +=================== + +Owner + Fedora Infrastructure Team = + +Contact + #fedora-admin, sysadmin-main = + +Location + Mostly in PHX2 = + +Servers + *stg* = + +Purpose + Staging environment to test changes to apps and create initial Ansible configs. = + +Introduction +======================= + +Fedora uses a set of staging servers for several purposes: + +* When applications are initially being deployed, the staging version of + those applications are setup with a staging server that is used to create the + initial Ansible configuration for the application/service. + +* Established applications/services use staging for testing. This testing includes: + + - Bugfix updates + - Configuration changes managed by Ansible + - Upstream updates to dependent packages (httpd changes for example) + +Goals +===== + +The staging servers should be self contained and have all the needed databases and such +to function. At no time should staging resources talk to production instances. We use firewall +rules on our production servers to make sure no access is made from staging. + +Staging instances do often use dumps of production databases and data, and +thus access to resources in staging should be controlled as it is in +production. + +DNS and naming +============== + +All staging servers should be in the ``stg.phx2.fedoraproject.org`` domain. +/etc/hosts files are used on stg servers to override dns in cases where staging resources +should talk to the staging version of a service instead of the production one. +In some cases, one staging server may be aliased to several services or applications that +are on different machines in production. + +Syncing databases +================= + +Syncing FAS +----------- +Sometimes you want to resync the staging fas server with what's on +production. To do that, dump what's in the production db and then import +it into the staging db. Note that resyncing the information will remove +any of the information that has been added to the staging fas servers. So +it's good to mention that you're doing this on the infra list or to people +who you know are working on the staging fas servers so they can either +save their changes or ask you to hold off for a while. + +On db01:: + + $ ssh db01 + $ sudo -u postgres pg_dump -C fas2 |xz -c fas2.dump.xz + $ scp fas2.dump.xz db02.stg: + +On fas01.stg (postgres won't drop the database if something is accessing it) +(ATM, fas in staging is not load balanced so we only have to do this on one server):: + + $ sudo /etc/init.d/httpd stop + +On db02.stg:: + + $ echo 'drop database fas2' |sudo -u postgres psql + $ xzcat fas2.dump.xz | sudo -u postgres psql + +On fas01.stg:: + + $ sudo /etc/init.d/httpd start + +Other databases behave similarly. + +External access +=============== + +There is http/https access from the internet to staging instances to allow testing. +Simply replace the production resource domain with stg.fedoraproject.org and +it should go to the staging version (if any) of that resource. + +Ansible and Staging +=================== + +All staging machine configurations is now in the same branch +as master/production. 
+ +There is a 'staging' environment - Ansible variable "env" is equal to +"staging" in playbooks for staging things. This variable can be used +to differentiate between producion and staging systems. + +Workflow for staging changes +============================ + +1. If you don't need to make any Ansible related config changes, don't + do anything. (ie, a new version of an app that uses the same config + files, etc). Just update on the host and test. + +2. If you need to make Ansible changes, either in the playbook of the + application or outside of your module: + + - Make use of files ending with .staging (see resolv.conf in global for + an example). So, if there's persistent changes in staging from + production like a different config file, use this. + + - Conditionalize on environment: + + - name: your task:: + ... + when: env == "staging" + + - name: production-only task:: + + ... + when: env != "staging" + + - These changes can stay if they are helpful for further testing down + the road. Ideally normal case is that staging and production are + configure in the same host group from the same Ansible playbook. + +Time limits on staging changes +=============================== + +There is no hard limit on time spent in staging, but where possible we should +limit the time in staging so we are not carrying changes from production for a +long time and possible affecting other staging work. diff --git a/docs/sysadmin-guide/sops/stagingservers.rst b/docs/sysadmin-guide/sops/stagingservers.rst new file mode 100644 index 0000000..d206e07 --- /dev/null +++ b/docs/sysadmin-guide/sops/stagingservers.rst @@ -0,0 +1,144 @@ +.. title: Infrastucture Staging Server SOP +.. slug: infra-staging-sop +.. date: 2012-04-18 +.. taxonomy: Contributors/Infrastructure + +==================================== +Fedora Infrastructure Staging Hosts +==================================== + +Owner + Fedora Infrastructure Team + +Contact + #fedora-admin, sysadmin-main + +Location + Mostly in PHX2 + +Servers + *stg* + +Purpose + Staging environment to test changes to apps and create initial Ansible configs. + +Introduction +============ +Fedora uses a set of staging servers for several purposes: + +* When applications are initially being deployed, the staging version of + those applications are setup with a staging server that is used to create the + initial Ansible configuration for the application/service. + +* Established applications/services use staging for testing. This testing includes: + + - Bugfix updates + - Configuration changes managed by Ansible + - Upstream updates to dependent packages (httpd changes for example) + +Goals +===== + +The staging servers should be self contained and have all the needed databases and such +to function. At no time should staging resources talk to production instances. We use firewall +rules on our production servers to make sure no access is made from staging. + +Staging instances do often use dumps of production databases and data, and +thus access to resources in staging should be controlled as it is in +production. + +DNS and naming +================== + +All staging servers should be in the 'stg.phx2.fedoraproject.org' domain. +/etc/hosts files are used on stg servers to override dns in cases where staging resources +should talk to the staging version of a service instead of the production one. +In some cases, one staging server may be aliased to several services or applications that +are on different machines in production. 
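A quick way to see which name resolution a staging host is actually using
is to compare ``getent`` (which honours ``/etc/hosts``) with ``dig`` (which
queries DNS directly); the hostname here is just an example::

    $ getent hosts admin.fedoraproject.org   # uses the /etc/hosts override, if any
    $ dig +short admin.fedoraproject.org     # bypasses /etc/hosts, shows real DNS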
+ +Syncing databases +================== + +Syncing FAS +------------ +Sometimes you want to resync the staging fas server with what's on +production. To do that, dump what's in the production db and then import +it into the staging db. Note that resyncing the information will remove +any of the information that has been added to the staging fas servers. So +it's good to mention that you're doing this on the infra list or to people +who you know are working on the staging fas servers so they can either +save their changes or ask you to hold off for a while. + +On db01:: + + $ ssh db01 + $ sudo -u postgres pg_dump -C fas2 |xz -c fas2.dump.xz + $ scp fas2.dump.xz db02.stg: + +On fas01.stg (postgres won't drop the database if something is accessing it) +(ATM, fas in staging is not load balanced so we only have to do this on one server):: + + $ sudo /etc/init.d/httpd stop + +On db02.stg:: + + $ echo 'drop database fas2' |sudo -u postgres psql + $ xzcat fas2.dump.xz | sudo -u postgres psql + +On fas01.stg:: + + $ sudo /etc/init.d/httpd start + +Other databases behave similarly. + +External access +================== + +There is http/https access from the internet to staging instances to allow testing. +Simply replace the production resource domain with stg.fedoraproject.org and +it should go to the staging version (if any) of that resource. + +Ansible and Staging +==================== + +All staging machine configurations is now in the same branch +as master/production. + +There is a 'staging' environment - Ansible variable "env" is equal to +"staging" in playbooks for staging things. This variable can be used +to differentiate between producion and staging systems. + +Workflow for staging changes +============================ + +1. If you don't need to make any Ansible related config changes, don't + do anything. (ie, a new version of an app that uses the same config + files, etc). Just update on the host and test. + +2. If you need to make Ansible changes, either in the playbook of the + application or outside of your module: + + - Make use of files ending with .staging (see resolv.conf in global for + an example). So, if there's persistent changes in staging from + production like a different config file, use this. + + - Conditionalize on environment:: + + - name: your task + ... + when: env == "staging" + + - name: production-only task + ... + when: env != "staging" + + - These changes can stay if they are helpful for further testing down + the road. Ideally normal case is that staging and production are + configure in the same host group from the same Ansible playbook. + +Time limits on staging changes +=============================== + +There is no hard limit on time spent in staging, but where possible we should +limit the time in staging so we are not carrying changes from production for a +long time and possible affecting other staging work. diff --git a/docs/sysadmin-guide/sops/status-fedora.rst b/docs/sysadmin-guide/sops/status-fedora.rst new file mode 100644 index 0000000..4d91874 --- /dev/null +++ b/docs/sysadmin-guide/sops/status-fedora.rst @@ -0,0 +1,87 @@ +.. title: Fedora Status Service SOP +.. slug: infra-fedora-status +.. date: 2015-04-23 +.. taxonomy: Contributors/Infrastructure + +=========================== +Fedora Status Service - SOP +=========================== + +Fedora-Status is the software that generates the page at +http://status.fedoraproject.org/. This page should be kept +up to date with the current status of the services ran by +Fedora Infrastructure. 
This page is hosted on an OpenShift instance.
The upstream repository is fedora-status on FedoraHosted.org.

Contact Information
-------------------

Owner:
    Fedora Infrastructure Team
Contact:
    #fedora-admin, #fedora-noc
Servers:
    An OpenShift instance
Purpose:
    Give status information to users about the current
    status of our public services.
Upstream:
    http://git.fedorahosted.org/git/fedora-status.git

How it works
------------
To keep this website as stable as possible, the page is
generated at upload time by OpenShift.

As soon as you push to the OpenShift repo, a build hook
will create the HTML page.

Only members of sysadmin-noc and sysadmin-main can update
the status website.

Updating the page
-----------------
1. Check out the repo at::

    ssh://bab5ba6eb9b94f2083fdeefc5e87309b@status-fedora2.rhcloud.com/~/git/status.git/

2. cd status
3. Run ./manage.py

manage.py takes 3 or more arguments::

    [status] "[short summary message]" [service] ([service] .....)

[service] values can be found on http://status.fedoraproject.org/ on the
RIGHT SIDE of the header of each box. Examples are: 'wiki', 'pkgdb', and
'fedmsg'. You can use "-" (dash) to imply "All services". Any number of
additional services is accepted.

[status] should be one of::

    'major'     - A major service outage.
    'minor'     - A minor service outage (e.g. limited/geographical outage)
    'scheduled' - The current downtime to this service is related to a scheduled outage
    'good'      - Everything is fine and the service is functioning 100%.

[short summary message] is what appears in the body of the box and should
tell users what is happening/why the service is down. You can use "-"
(dash) to imply "Everything seems to be working." as the message.

Examples::

    ./manage.py major "We're performing maintenance on the wiki database" wiki
    ./manage.py minor "Some IRC channels are having issues doing XYZ." zodbot
    ./manage.py good - -            # Set all services to good/default.
    ./manage.py good - wiki         # Set wiki status to 'good' with the default message.
    ./manage.py good - wiki zodbot  # Set both wiki and zodbot to good with default message.

You can use the --general-info flag to set a "global" message, which appears
under the main status bar at the top of the page. Use this for big events
that affect all services, or to announce things like upcoming outages.

diff --git a/docs/sysadmin-guide/sops/syslog.rst b/docs/sysadmin-guide/sops/syslog.rst
new file mode 100644
index 0000000..16e716b
--- /dev/null
+++ b/docs/sysadmin-guide/sops/syslog.rst
@@ -0,0 +1,163 @@
.. title: Log Infrastructure SOP
.. slug: infra-syslog
.. date: 2014-09-01
.. taxonomy: Contributors/Infrastructure

======================
Log Infrastructure SOP
======================

Logs are centrally relayed to our loghost and managed from there by
rsyslog to create several log outputs.

Epylog provides twice-daily log reports of activities on our systems.
It runs on our central loghost and generates reports on all systems
logging centrally.

Contact Information
===================

Owner:
    Fedora Infrastructure Team
Contact:
    #fedora-admin, sysadmin-main
Location:
    Phoenix
Servers:
    log01.phx2.fedoraproject.org
Purpose:
    Provides our central logs and reporting


Essential data/locations:
=========================

* Logs compiled using rsyslog on log01 into a single set of logs for all
  systems::

      /var/log/merged/

  These logs are rotated every day and kept for only 2 days.
This set of logs + is only used for immediate analysis and more trivial 'tailing' of + the log file to watch for events. + +* Logs for each system separately in ``/var/log/hosts`` + + These logs are maintained forever, practically, or for as long as we + possibly can. They are broken out into a ``$hostname/$YEAR/$MON/$DAY`` directory + structure so we can locate a specific day's log immediately. + +* Log reports generated by epylog: + Log reports generated by epylog are outputted to /srv/web/epylog/merged + + The reports are accessible via a web browser from https://admin.fedoraproject.org/epylog/merged/ + + This path requires a username and a password to access. To add your username + and password you must first join the sysadmin-logs group then login to + ``log01.phx2.fedoraproject.org`` and run this command:: + + htpasswd -m /srv/web/epylog/.htpasswd $your_username + + when prompted for a password please input a password which is NOT YOUR + FEDORA ACCOUNT SYSTEM PASSWORD. + +.. important:: + + Let's say that again to be sure you got it: + + DO _NOT_ HAVE THIS BE THE SAME AS YOUR FAS PASSWORD + +Configs: +======== + +Epylog configs are controlled by ansible - please see the ansible epylog +module for more details. Specifically the files in ``roles/epylog/files/merged/`` + + +Generating a one-off epylog report: +----------------------------------- +If you wish to generate a specific log report you will need to run the +following command on log01:: + + sudo /usr/sbin/epylog -c /etc/epylog/merged/epylog.conf --last 5h + +You can replace '5h' with other time measurements to control the amount of +time you want to view from the merged logs. This will mail a report +notification to all the people in the sysadmin-logs group. + + +Audit logs, centrally: +---------------------- +We've taken the audit logs and enabled our rsyslogd on the hosts to relay +the audit log contents to our central log server. + +Here's how we did that: + +1. modify the selinux policy so that rsyslogd can read the file(s) in + ``/var/log/audit/audit.log`` + + BEGIN Selinux policy module:: + + module audit_via_syslog 1.0; + + require { + type syslogd_t; + type auditd_log_t; + class dir { search }; + class file { getattr read open }; + + } + + #============= syslogd_t ============== + allow syslogd_t auditd_log_t:dir search; + allow syslogd_t auditd_log_t:file { getattr read open }; + + END selinux policy module + +2. add config to rsyslog on the clients to repeatedly send all changes + to their audit.log file to the central syslog server as local6:: + + # monitor auditd log and send out over local6 to central loghost + $ModLoad imfile.so + + # auditd audit.log + $InputFileName /var/log/audit/audit.log + $InputFileTag tag_audit_log: + $InputFileStateFile audit_log + $InputFileSeverity info + $InputFileFacility local6 + $InputRunFileMonitor + + then modify your emitter to the syslog server to send local6.* there + +3. on the syslog server - setup log destinations for: + + - merged audit logs of all hosts + explicitly drop any non-AVC audit message here) + magic exclude line is:: + + :msg, !contains, "type=AVC" ~ + + + that line must be directly above the log entry you want to filter + and it has a cascade effect on everything below it unless you + disable the filter + + - per-host audit logs - this is everything from audit.log + +4. 
On the syslog server, we can run audit2allow/audit2why on the audit logs
   sent there by doing this::

       grep 'hostname' /var/log/merged/audit.log | sed 's/^.*tag_audit_log: //' | audit2allow

   (the sed removes the log prefix garbage that syslog adds when
   transferring the message)


Future:
=======

- additional log reports for errors from http processes or servers
- SEC, the Simple Event Correlator, to report immediately on events from a
  log stream - available in fedora/epel.
- New report modules within epylog

diff --git a/docs/sysadmin-guide/sops/taskotron.rst b/docs/sysadmin-guide/sops/taskotron.rst
new file mode 100644
index 0000000..f5f8724
--- /dev/null
+++ b/docs/sysadmin-guide/sops/taskotron.rst
@@ -0,0 +1,142 @@
.. title: Taskotron SOP
.. slug: infra-taskotron
.. date: 2014-12-16
.. taxonomy: Contributors/Infrastructure

=============
taskotron SOP
=============

run automated tasks to check items in Fedora

Contact Information
===================

Owner
    Fedora QA Devel, Fedora Infrastructure Team

Contact
    #fedora-qa, #fedora-admin, #fedora-noc

Location
    PHX2

Servers
    - taskotron-dev01.qa
    - taskotron-stg01.qa
    - taskotron01.qa
    - taskotron-client*.qa

Purpose
    run automated tasks on fedora components and report results
    of those task runs

Architecture
============

Taskotron is a system for running automated tasks in fedora based on
incoming signals (fedmsgs in our case).

The system is made up of several components:

- trigger

- task execution system
  (a master/slave system, currently using buildbot)

- results storage (covered in the resultsdb SOP)

- mirrored task git repos

Deploying the Taskotron Master
==============================

The Taskotron master node is responsible for:

1) listening for fedmsgs and scheduling tasks as appropriate

2) coordinating execution of tasks on client nodes

Before doing the initial deployment, an ssh keypair is needed for the
taskotron clients to check tasks out from the git mirror. Generate a
password-less keypair (needed for deploying clients) and put the contents
of the public key in the 'buildslave_ssh_pubkey' variable.

When running the ``taskotron-[dev,stg,prod]`` group playbook for the first
time, it will fail partway through during buildmaster configuration because
buildmaster initialization is not part of the playbook (it fails if re-run
and no upgrade is needed). Once you hit that failure, run the following
command on the taskotron master node as the buildmaster user::

    buildbot upgrade-master /home/buildmaster/master

After running the ``upgrade-master`` command, continue the playbook and it
should run to completion.


Deploying the Taskotron Clients
===============================

Before deploying the taskotron clients, get the host key of the taskotron
master and populate ``buildmaster_pubkey`` in the client's group_vars file.
This will make sure that git checkouts from the master node work without
human intervention.

Deploying the Taskotron clients is a matter of running the proper group
playbook once the variable files are properly filled in. No additional
configuration is needed.

Updating
========

This part of the SOP can also be used to idle taskotron - just skip the
update and reboot steps, but turn off fedmsg-hub and shut down the
buildslave services.
The buildslave and fedmsg-hub processes will need to be restarted to +un-idle the system but buildbot will restart anything that was running once the +buildslaves come back up. + +.. note:: it would be wise to update resultsdb while the taskotron system is not + processing jobs - that is covered in a separate SOP. + +There are multiple parts to updating Taskotron: clients, master and git mirrors. + +1. on a non-affected machine, run taskotron-trigger such that it records the +jobs that have been triggered +2. stop fedmsg-hub on the taskotron master so that no new jobs come in +3. wait for buildbot to become idle +4. run ``systemctl stop buildslave`` on all affected clients +5. run the ``update_grokmirror_repos.yml`` playbook on the system to update +6. update and reboot the master node +7. update and reboot the client nodes +8. start the buildslave process on all client nodes (they aren't set to start at boot) + +Once all affected machines are back up, verify that all services have come +back up cleanly and start the process of running any jobs which may have been +missed during the downtime: + +1. there will be a /var/log/taskotron-trigger/jobs.csv file containing jobs +which need to be run on the non-affected machine running taskotron-trigger +mentioned above. Copy the relevant contents of that file to the taskotron +master node as "newjobs.csv" (filename isn't important) +2. on the master node, run 'jobrunner newjobs.csv' + +If the jobs are submitted without error, the update process is done. + + +Backup +====== + +There are two major things which need to be backed up for Taskotron: job data +and the buildmaster database. + +The buildmaster database is a normal postgres dump from the database server. +The job data is stored on the taskotron master node in +/home/buildmaster/master/ directory. The files in 'master/' are not important +but all subdirectories outside of 'templates/' and 'public_html/' are. + +Restore from Backup +=================== + +To restore from backup, load the database dump and restore backed up files to +the provisioned master before starting the buildmaster service. + diff --git a/docs/sysadmin-guide/sops/torrentrelease.rst b/docs/sysadmin-guide/sops/torrentrelease.rst new file mode 100644 index 0000000..4e071a3 --- /dev/null +++ b/docs/sysadmin-guide/sops/torrentrelease.rst @@ -0,0 +1,70 @@ +.. title: Torrent Releases Infrastructure SOP +.. slug: infra-torrent-releases +.. date: 2011-10-03 +.. taxonomy: Contributors/Infrastructure + +=================================== +Torrent Releases Infrastructure SOP +=================================== + + http://torrent.fedoraproject.org/ is our master torrent server for + Fedora distribution. It runs out of ServerBeach. + +Contact Information +=================== + +Owner + Fedora Infrastructure Team +Contact + #fedora-admin, sysadmin-torrent group +Location + ibiblio +Servers + torrent.fedoraproject.org +Purpose + Provides the torrent master server for Fedora distribution + +Torrent Release +=============== + +When you want to add a new torrent to the tracker at +[46]http://torrent.fedoraproject.org you need to take the following steps +to have it listed correctly: + +1. login to torrent.fedoraproject.org. If you are unable to do so please + contact the fedora infrastructure group about access. This procedure + requires membership in the torrentadmin group. + +2. upload the files you want to add to the torrent to + torrent.fedoraproject.org:/srv/torrent/new/$yourOrg/ + +3. 
diff --git a/docs/sysadmin-guide/sops/torrentrelease.rst b/docs/sysadmin-guide/sops/torrentrelease.rst
new file mode 100644
index 0000000..4e071a3
--- /dev/null
+++ b/docs/sysadmin-guide/sops/torrentrelease.rst
@@ -0,0 +1,70 @@
+.. title: Torrent Releases Infrastructure SOP
+.. slug: infra-torrent-releases
+.. date: 2011-10-03
+.. taxonomy: Contributors/Infrastructure
+
+===================================
+Torrent Releases Infrastructure SOP
+===================================
+
+http://torrent.fedoraproject.org/ is our master torrent server for Fedora
+distribution. It runs out of ServerBeach.
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+Contact
+    #fedora-admin, sysadmin-torrent group
+Location
+    ibiblio
+Servers
+    torrent.fedoraproject.org
+Purpose
+    Provides the torrent master server for Fedora distribution
+
+Torrent Release
+===============
+
+When you want to add a new torrent to the tracker at
+http://torrent.fedoraproject.org you need to take the following steps to
+have it listed correctly:
+
+1. log in to torrent.fedoraproject.org. If you are unable to do so, please
+   contact the Fedora Infrastructure group about access. This procedure
+   requires membership in the torrentadmin group.
+
+2. upload the files you want to add to the torrent to
+   torrent.fedoraproject.org:/srv/torrent/new/$yourOrg/
+
+3. use sha1sum and verify the file you have uploaded matches the source
+
+4. organize the files into subdirs (or not) as you would like
+
+5. run ``/srv/torrent/new/maketorrent <file-or-dir-to-torrent>
+   [<file-or-dir-to-torrent> ...]`` to generate a .torrent file or files
+
+6. copy the .torrent file(s) to /srv/torrent/www/torrents/$yourOrg/
+
+7. cd to /srv/torrent/torrent-generator/ or /srv/torrent/spins-generator/
+   (depending on whether it is an official release or a spins release)
+
+8. add a .ini file in this directory for the content you'll be torrenting.
+   If you're not doing a normal Fedora release, the filename in the
+   brackets should be ``[$yourOrg/File-Of-1.1.torrent]``. The format of
+   each section should be as follows::
+
+      [Zod-livecd-1-i386.torrent]
+      description=Fedora Core 6 Zod LiveCD 1 iso image for i386.
+      size=683M
+      releasedate=2006-12-22
+      group=Fedora Core 6 Zod LiveCD 1
+
+9. move all files from /srv/torrent/new/$yourOrg into
+   /srv/torrent/btholding/ - this includes the files you uploaded as well
+   as the .torrent files you've created.
+
+Your files will be linked on the website and available on the tracker
+after this.
+
diff --git a/docs/sysadmin-guide/sops/unbound.rst b/docs/sysadmin-guide/sops/unbound.rst
new file mode 100644
index 0000000..430f95a
--- /dev/null
+++ b/docs/sysadmin-guide/sops/unbound.rst
@@ -0,0 +1,19 @@
+.. title: Infrastructure Unbound SOP
+.. slug: infra-unbound
+.. date: 2013-11-22
+.. taxonomy: Contributors/Infrastructure
+
+==========================
+Fedora Infra Unbound Notes
+==========================
+
+Sometimes, especially after updates or reboots, you will see alerts like
+this::
+
+    18:46:55 < zodbot> PROBLEM - unbound-tummy01.fedoraproject.org/Unbound 443/tcp is WARNING: DNS WARNING - 0.037 seconds response time (dig returned an error status) (noc01)
+    18:51:06 < zodbot> PROBLEM - unbound-tummy01.fedoraproject.org/Unbound 80/tcp is WARNING: DNS WARNING - 0.035 seconds response time (dig returned an error status) (noc01)
+
+To correct this, restart unbound on the relevant node (in the example case
+above, unbound-tummy01) by running the restart_unbound Ansible playbook
+from batcave01::
+
+    sudo -i ansible-playbook /srv/web/infra/ansible/playbooks/restart_unbound.yml --extra-vars="target=unbound-tummy01.fedoraproject.org"
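+The playbook restarts unbound on the target node, after which the
+monitoring check should recover on its own. To confirm by hand, a plain
+``dig`` against the node (hostname from the example above) should return
+an answer again::
+
+    dig @unbound-tummy01.fedoraproject.org fedoraproject.org +short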
diff --git a/docs/sysadmin-guide/sops/virt-image.rst b/docs/sysadmin-guide/sops/virt-image.rst
new file mode 100644
index 0000000..e2f7cd8
--- /dev/null
+++ b/docs/sysadmin-guide/sops/virt-image.rst
@@ -0,0 +1,72 @@
+.. title: Infrastructure Kpartx Notes
+.. slug: no-idea
+.. date: 2015-07-09
+.. taxonomy: Contributors/Infrastructure
+
+==================================
+Fedora Infrastructure Kpartx Notes
+==================================
+
+How to mount virtual partitions
+===============================
+
+There can be multiple reasons you need to work with the contents of a
+virtual machine without that machine running:
+
+1. You have decommissioned the system and found you need to get something
+   that was not backed up.
+
+2. The system is for some reason unbootable and you need to change some
+   file to make it work.
+
+3. Forensics work of some sort.
+
+In the case of 1 and 2 the following commands and tools are invaluable. In
+the case of 3, you should work with the Fedora Security Team and follow
+their instructions completely.
+
+Steps to Work With Virtual System
+=================================
+
+1. Find out what physical server the virtual machine image is on.
+
+   A. Log into batcave01.phx2.fedoraproject.org
+
+   B. search for the hostname in the file /var/log/virthost-lists.out::
+
+         $ grep proxy01.phx2.fedoraproject.org /var/log/virthost-lists.out
+         virthost05.phx2.fedoraproject.org:proxy01.phx2.fedoraproject.org:running:1
+
+   C. If the image does not show up in the list then most likely it is an
+      image which has been decommissioned. You will need to search the
+      virtual hosts more directly::
+
+         # for i in `awk -F: '{print $1}' /var/log/virthost-lists.out | sort -u`; do
+               ansible $i -m shell -a 'lvs | grep proxy01.phx2'
+           done
+
+2. Log into the virthost and make sure the image is shut down. Even in
+   cases where the system is not working correctly, it may still have a
+   running qemu process on the physical server. It is best to confirm that
+   the box is dead::
+
+      # virsh destroy <guest>
+
+3. We will be using the kpartx command to make the guest image ready for
+   mounting (a worked example follows these steps)::
+
+      # lvs | grep <guest>
+      # kpartx -l /dev/mapper/<volumegroup>-<guest>
+      # kpartx -a /dev/mapper/<volumegroup>-<guest>
+      # vgscan
+      # vgchange -ay <volume group from the guest image>
+      # mount /dev/mapper/<logical volume from the guest image> /mnt
+
+4. Edit the files as needed.
+
+5. Tear down the tree::
+
+      # umount /mnt
+      # vgchange -an <volume group from the guest image>
+      # vgscan
+      # kpartx -d /dev/mapper/<volumegroup>-<guest>
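+A worked example with hypothetical names - a guest ``proxy01`` whose disk
+is the logical volume ``proxy01`` in the virthost volume group
+``vg_guests``, containing a guest-internal volume group ``GuestVG``::
+
+    # lvs | grep proxy01
+    # kpartx -a /dev/mapper/vg_guests-proxy01
+    # vgscan
+    # vgchange -ay GuestVG
+    # mount /dev/mapper/GuestVG-root /mnt
+    (edit files under /mnt as needed)
+    # umount /mnt
+    # vgchange -an GuestVG
+    # kpartx -d /dev/mapper/vg_guests-proxy01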
diff --git a/docs/sysadmin-guide/sops/virt-notes.rst b/docs/sysadmin-guide/sops/virt-notes.rst
new file mode 100644
index 0000000..d060c40
--- /dev/null
+++ b/docs/sysadmin-guide/sops/virt-notes.rst
@@ -0,0 +1,52 @@
+.. title: Infrastructure libvirt tools SOP
+.. slug: infra-libvirt
+.. date: 2012-04-30
+.. taxonomy: Contributors/Infrastructure
+
+===================================
+Fedora Infrastructure Libvirt Notes
+===================================
+
+Notes/FAQ on using libvirt/virsh/virt-manager in our environment.
+
+How do I migrate a guest from one virthost to another?
+======================================================
+
+Multiple steps:
+
+1. set up a password-less root SSH key to allow communication between the
+   two virthosts as root. This is only temporary, so, while scary, it is
+   not a big deal. Right now, this also means setting
+   ``PermitRootLogin without-password`` in ``/etc/ssh/sshd_config``.
+
+2. set up storage on the destination end to match the source storage. If
+   the path to the storage is not the same on both systems (i.e., not the
+   same ``/dev/Guests00/myguest`` path on both), take a copy of the guest
+   XML file from ``/etc/libvirt/qemu`` and modify it so it has the right
+   path. If you need to do this, add ``--xml thisfile.xml`` to the
+   arguments below AFTER the word 'migrate'.
+
+3. as root on the source location::
+
+      virsh -c qemu:///system migrate --p2p --tunnelled \
+          --copy-storage-all myguest \
+          qemu+ssh://root@destinationvirthost/system
+
+   This starts the migration process, but it prints absolutely nothing on
+   the CLI to tell you so. On the destination system, look in
+   /var/log/libvirt/qemu/myguest.log (``tail -f`` will show you the
+   progress as a percentage completed).
+
+   .. note::
+      --p2p and --tunnelled are so it goes direct from one host to the
+      other but uses ssh.
+
+4. Once the migration is complete you will probably need to run this on
+   the new virthost::
+
+      virsh dumpxml myguest > /etc/libvirt/qemu/myguest.xml
+      virsh destroy myguest
+      virsh define /etc/libvirt/qemu/myguest.xml
+      virsh autostart myguest
+      virsh start myguest
+
diff --git a/docs/sysadmin-guide/sops/virtio.rst b/docs/sysadmin-guide/sops/virtio.rst
new file mode 100644
index 0000000..e7ad09a
--- /dev/null
+++ b/docs/sysadmin-guide/sops/virtio.rst
@@ -0,0 +1,24 @@
+.. title: Infrastructure virtio SOP
+.. slug: infra-virtio
+.. date: 2014-05-01
+.. taxonomy: Contributors/Infrastructure
+
+============
+virtio notes
+============
+
+We have found that virtio is faster and more stable than emulating other
+network cards on our VMs.
+
+To switch a VM to virtio:
+
+- Remove it from DNS if it's a proxy
+- Log into the VM and shut it down
+- Log into the virthost that the VM is on, and run ``sudo virsh edit <guest>``
+- Add this line to the appropriate bridge interface(s) (a sketch of the
+  full stanza follows this list)::
+
+      <model type='virtio'/>
+
+- Save/quit the editor
+- Run ``sudo virsh start <guest>``
+- Re-add it to DNS if it's a proxy
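+For context, the interface stanza ends up looking roughly like this after
+the edit - the bridge name and MAC address here are illustrative, and only
+the ``<model>`` line is the one you add::
+
+    <interface type='bridge'>
+      <mac address='52:54:00:12:34:56'/>
+      <source bridge='br0'/>
+      <model type='virtio'/>
+    </interface>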
diff --git a/docs/sysadmin-guide/sops/voting.rst b/docs/sysadmin-guide/sops/voting.rst
new file mode 100644
index 0000000..42e6cb2
--- /dev/null
+++ b/docs/sysadmin-guide/sops/voting.rst
@@ -0,0 +1,214 @@
+.. title: Voting and Elections Infrastructure SOP
+.. slug: infra-voting
+.. date: 2014-07-10
+.. taxonomy: Contributors/Infrastructure
+
+=========================
+Voting Infrastructure SOP
+=========================
+
+The live voting instance can be found at
+https://admin.fedoraproject.org/voting and the staging instance at
+https://admin.stg.fedoraproject.org/voting/
+
+The code base can be found at
+http://git.fedorahosted.org/git/?p=elections.git
+
+Contents
+========
+
+1. Contact Information
+2. Creating a new election
+
+   1. Creating the election
+   2. Adding Candidates
+   3. Who can vote
+
+3. Modifying an Election
+
+   1. Changing the details of an Election
+   2. Removing a candidate
+   3. Releasing the results of an embargoed election
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+Contact
+    #fedora-admin, elections
+Location
+    PHX
+Servers
+    elections0{1,2}, elections01.stg, db02
+Purpose
+    Provides a system for voting on Fedora matters
+
+Creating a new election
+=======================
+
+Creating the election
+---------------------
+
+* Log in.
+
+* Go to "Admin" in the menu at the top, select "Create new election" and
+  fill in the form.
+
+* The "usefas" option results in candidate names being looked up as FAS
+  usernames and displayed as their real names.
+
+* An alias should be added when creating a new election, as this is used
+  in the link on the page of listed elections on the frontpage.
+
+* Complete the election form:
+
+  Alias
+      A short name for the election. It is the name that will be used in
+      the templates.
+
+      ``Example: FESCo2014``
+
+  Summary
+      A simple name that will be used in the URLs and in the links in the
+      application.
+
+      ``Example: FESCo elections 2014``
+
+  Description
+      A short description about the election that will be displayed above
+      the choices on the voting page.
+
+  Type
+      Allows setting the type of election (more on that below).
+
+  Maximum Range/Votes
+      Allows setting options for some election types (more on that below).
+
+  URL
+      A URL pointing to more information about the election.
+
+      ``Example: the wiki page presenting the election``
+
+  Start Date
+      The start of the election (UTC).
+
+  End Date
+      The close of the election (UTC).
+
+  Number Elected
+      The number of seats that will be selected among the candidates after
+      the election.
+
+  Candidates are FAS users?
+      Checkbox enabling integration with FAS, so that candidate names are
+      retrieved from their FAS accounts.
+
+  Embargo results
+      If this is set, it will require manual intervention to release the
+      results of the election.
+
+  Legal voters groups
+      Used to restrict the votes to one or more FAS groups.
+
+  Admin groups
+      Gives admin rights on that election to one or more FAS groups.
+
+
+Adding Candidates
+-----------------
+
+The list of all the elections can be found at /voting/admin/
+
+Click on the election of interest and select "Add a candidate".
+
+Each candidate is added with a name and a URL. The name can be their FAS
+username (interesting if the "Candidates are FAS users?" checkbox has been
+checked when creating the election) or something else.
+
+The URL can be a reference to the wiki page where they nominated
+themselves.
+
+This will add extra candidates to the available list.
+
+Who can vote
+------------
+
+If no 'Legal voters groups' have been defined when creating the election,
+the election will be open to anyone who has signed the CLA and is in at
+least one other group (commonly referred to as CLA+1).
+
+Modifying an Election
+=====================
+
+Changing the details of an Election
+-----------------------------------
+
+.. note::
+   this page can also be used to verify details of an election before it
+   opens for voting.
+
+The list of all the elections can be found at ``/voting/admin/``
+
+After finding the right election, click on it to get the overview and
+select "Edit election" under the description.
+
+Edit a candidate
+----------------
+
+On the election overview page found via ``/voting/admin/`` (and clicking
+on the election of interest), next to each candidate is an `[edit]` button
+allowing the admins to edit the information relative to the candidate.
+
+Removing a candidate
+--------------------
+
+On the election overview page found via ``/voting/admin/`` (and clicking
+on the election of interest), next to each candidate is an `[x]` button
+allowing the admins to remove the candidate from the election.
+
+Releasing the results of an embargoed election
+----------------------------------------------
+
+Visit the elections admin interface and edit the election to uncheck the
+'Embargo results?' checkbox.
+
+Results
+=======
+
+Admins have early access to the results of the elections (regardless of
+the embargo status).
+
+The list of the closed elections can be found at /voting/archives.
+
+Find the election of interest there and click on the "Results" link in the
+last column of the table. This will show you the Results page, including
+who was elected based on the number of seats entered when creating the
+election.
+
+You may use this information to send out the results email.
+
+Legacy
+======
+
+.. note::
+   The information below should now be included in the Results page (see
+   above) but is kept here just in case.
+
+Other things you might need to query
+------------------------------------
+
+The current election software doesn't retrieve all of the information that
+we like to include in our results emails, so we have to query the database
+for the extra information. You can use something like this to retrieve the
+total number of voters for the election::
+
+    SELECT e.id, e.shortdesc, COUNT(DISTINCT v.voter)
+    FROM elections AS e
+    LEFT JOIN votes AS v ON e.id = v.election_id
+    WHERE e.shortdesc IN ('FAmSCo - February 2014')
+    GROUP BY e.id, e.shortdesc;
+
+You may also want to include the vote tally per candidate for convenience
+when the FPL emails the election results::
+
+    SELECT e.id, e.shortdesc, c.name, c.novotes
+    FROM elections AS e
+    LEFT JOIN fvotecount AS c ON e.id = c.election_id
+    WHERE e.shortdesc IN ('FAmSCo - February 2014', 'FESCo - February 2014');
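+These queries are run against the elections database. A minimal sketch of
+getting a prompt - db02 is from the Servers list above, while the full
+hostname and the database name are assumptions, so list the databases
+first if unsure::
+
+    $ ssh db02.phx2.fedoraproject.org
+    $ sudo -u postgres psql -l          # find the elections database name
+    $ sudo -u postgres psql elections   # then connect to it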
diff --git a/docs/sysadmin-guide/sops/wiki.rst b/docs/sysadmin-guide/sops/wiki.rst
new file mode 100644
index 0000000..bd32d39
--- /dev/null
+++ b/docs/sysadmin-guide/sops/wiki.rst
@@ -0,0 +1,41 @@
+.. title: Wiki Infrastructure SOP
+.. slug: infra-wiki
+.. date: 2012-09-13
+.. taxonomy: Contributors/Infrastructure
+
+=======================
+Wiki Infrastructure SOP
+=======================
+
+Managing our wiki.
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team / Fedora Website Team
+Contact
+    #fedora-admin or #fedora-websites on irc.freenode.net
+Location
+    http://fedoraproject.org/wiki/
+Servers
+    proxy[1-3] app[1-2,4]
+Purpose
+    Provides our production wiki
+
+Description
+===========
+
+Our wiki currently runs MediaWiki.
+
+.. important::
+   Whenever you change anything on the wiki (bugfix, configuration,
+   plugins, ...), please update the page at
+   https://fedoraproject.org/wiki/WikiChanges .
+
+Dealing with Spammers
+=====================
+
+If you find a spammer is editing pages in the wiki, do the following:
+
+1. admin disable their account in FAS, adding 'wiki spammer' as the comment
+2. block their account in the wiki from editing any additional pages
+3. go to the list of pages they've edited and roll back their changes one
+   by one. If there are many, get someone to help you.
+
diff --git a/docs/sysadmin-guide/sops/zodbot.rst b/docs/sysadmin-guide/sops/zodbot.rst
new file mode 100644
index 0000000..97d2fd8
--- /dev/null
+++ b/docs/sysadmin-guide/sops/zodbot.rst
@@ -0,0 +1,108 @@
+.. title: Zodbot Infrastructure SOP
+.. slug: infra-zodbot
+.. date: 2014-12-18
+.. taxonomy: Contributors/Infrastructure
+
+=========================
+Zodbot Infrastructure SOP
+=========================
+
+zodbot is a supybot-based IRC bot that we use in our #fedora channels.
+
+Contents
+========
+
+1. Contact Information
+2. Description
+3. shutdown
+4. startup
+5. Processing interrupted meeting logs
+6. Becoming an admin
+
+Contact Information
+===================
+
+Owner
+    Fedora Infrastructure Team
+Contact
+    #fedora-admin
+Location
+    Phoenix
+Servers
+    value01
+Purpose
+    Provides our IRC bot
+
+Description
+===========
+
+zodbot is a supybot-based IRC bot that we use in our #fedora channels. It
+runs on value01 as the daemon user. We do not config-manage zodbot.conf
+because supybot makes changes to it on its own; therefore it gets backed
+up and is treated as data.
+
+shutdown
+    ``killall supybot``
+
+startup
+    ::
+
+        cd /srv/web/meetbot
+        # zodbot currently needs to be started in the meetbot directory.
+        # This requirement will go away in a later meetbot release.
+        sudo -u daemon supybot -d /var/lib/zodbot/conf/zodbot.conf
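+Not part of the original procedure, but a quick sanity check that the bot
+is running (or confirmation that it is down before a restart)::
+
+    pgrep -af supybot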
+Startup issues
+==============
+
+If the bot won't connect, with an error like::
+
+    "Nick/channel is temporarily unavailable"
+
+found in ``/var/lib/zodbot/logs/messages.log``, hop on Freenode (with your
+own IRC client) and do the following::
+
+    /msg nickserv release zodbot [the password]
+
+The password can be found on the bot's host in
+``/var/lib/zodbot/conf/zodbot.conf``.
+
+This should allow the bot to connect again.
+
+Processing interrupted meeting logs
+===================================
+
+zodbot forgets about meetings if they are in progress when the bot goes
+down; therefore, the meetings never get processed. Users may file a ticket
+in our Trac instance to have meeting logs processed.
+
+Trac tickets for meeting log processing should consist of a URL where
+zodbot had saved the log so far and an uploaded file containing the rest
+of the log. The logs are stored in /srv/web/meetbot. Append the remainder
+of the log uploaded to Trac (don't worry too much about formatting;
+meeting.py works well with irssi- and XChat-like logs), then run::
+
+    sudo python /usr/lib/python2.7/site-packages/supybot/plugins/MeetBot/meeting.py replay /path/to/fixed.log.txt
+
+Close the Trac ticket, letting the user know that the logs are processed
+in the same directory as the URL they gave you.
+
+Becoming an admin
+=================
+
+Register with zodbot on IRC::
+
+    /msg zodbot misc help register
+
+You have to identify to the bot to do any admin-type commands (an example
+is at the end of this page), and you need to have done so before anyone
+can give you privs.
+
+After doing this, ask in #fedora-admin on IRC and someone will grant you
+privs if you need them. You'll likely be added to the admin group, which
+has the following capabilities (the snippet below is from an IRC log,
+illustrating how to get the list of capabilities)::
+
+    21:57 < nirik> .list admin
+    21:57 < zodbot> nirik: capability add, capability remove, channels, ignore add,
+                    ignore list, ignore remove, join, nick, and part
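+For reference, identifying normally uses supybot's standard ``user
+identify`` command; double-check the exact syntax with
+``/msg zodbot misc help identify``::
+
+    /msg zodbot user identify <yourusername> <yourpassword>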