#230 SELinux Deployment
Closed: Fixed. Opened 13 years ago by lmacken.

Ideally, we should be running SELinux on all of our infrastructure. This has not been the case due to issues with some of our tools. These issues were "resolved" by disabling SELinux.

This ticket will track the progress of our deployment.

Mike, what do you think about the attached patches?

12:22 mmcgrath> lmacken: seems reasonable to me.

I've committed these patches and ran make install.

I've found a series of patches that allow puppet to manage SELinux a bit more directly: http://spook.wpi.edu/

I'm going to see about getting these included in the Fedora/EPEL puppet package as well as getting it pushed upstream.

I've emailed dlutter about getting these patches into Fedora/EPEL in the short-term.

I've also spoken with the guys in #puppet, and filed a ticket in their Trac, to get these patches upstreamed. They're receptive to it, and will be looking at them very soon.

I haven't looked too hard at the puppet selinux patches, but ideally in our environment, what we would want is something like this:
* Ability to specify custom context for custom paths. For example, with the current Transifex deployment:
semanage fcontext -a -t httpd_sys_content_t '/var/tmp/l10n-data(/.*)?'
restorecon -Rv /var/tmp/l10n-data
* Ability to set booleans, for example:
setsebool -P use_nfs_home_dirs=1
* Ability to insert custom SELinux policy modules, e.g.:
semodule -i sobby.pp
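For illustration, the three capabilities above can be captured as a desired-state description and expanded into the same commands we currently run by hand. This is a minimal Python sketch; the DesiredState class and the commands_for helper are hypothetical and are not part of the spook.wpi.edu patches or of puppet itself.

```python
"""Sketch: expand a desired SELinux state into the commands used in this ticket.

Assumption: DesiredState and commands_for are hypothetical helpers that only
mirror the three capabilities listed above; they are not a puppet API.
"""
import shlex
from dataclasses import dataclass, field


@dataclass
class DesiredState:
    # file-context regex -> SELinux type
    fcontexts: dict[str, str] = field(default_factory=dict)
    # boolean name -> desired value
    booleans: dict[str, bool] = field(default_factory=dict)
    # compiled policy packages to install
    modules: list[str] = field(default_factory=list)


def commands_for(state: DesiredState) -> list[list[str]]:
    """Return argv lists in the order an admin would run them by hand."""
    cmds: list[list[str]] = []
    for regex, setype in state.fcontexts.items():
        cmds.append(["semanage", "fcontext", "-a", "-t", setype, regex])
        # restorecon needs a real path, so drop the '(/.*)?' suffix
        cmds.append(["restorecon", "-Rv", regex.removesuffix("(/.*)?")])
    for name, value in state.booleans.items():
        # -P makes the change persist across reboots
        cmds.append(["setsebool", "-P", f"{name}={int(value)}"])
    for package in state.modules:
        cmds.append(["semodule", "-i", package])
    return cmds


if __name__ == "__main__":
    example = DesiredState(
        fcontexts={"/var/tmp/l10n-data(/.*)?": "httpd_sys_content_t"},
        booleans={"use_nfs_home_dirs": True},
        modules=["sobby.pp"],
    )
    for argv in commands_for(example):
        print(shlex.join(argv))
```

Returning argv lists rather than shell strings keeps the quoting of regexes like (/.*)? out of the way until something actually needs to print or execute them.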

Looking at http://spook.wpi.edu, those patches should suit our needs just fine for now...

== Stuff that needs to get done ==
* Update selinux-policy* on all of our machines to the latest U2 packages ASAP. Since U2 came out this week, hopefully all of our machines will get this automatically.
* Get SELinux puppet changes upstream. Brett Lentz has been doing a great job writing up unit tests for these patches, and working with the puppet upstream on this.
* Churn through the violations for all of our machines, fixing apps, selinux policy, and tuning various contexts/booleans/modules to our environment (documenting these changes along the way). Dan Walsh and I are in the process of doing this.

So, I sat down with Dan earlier today and we took a look at the SELinux situation on a few of our machines: app1, proxy1, and bastion.

== bastion ==
* selinux-policy was upgraded to 2.4.6-137.el5, which went out in the RHEL5 U2 update, released this week. This contains thousands of policy fixes, and I believe it fixed most if not all of the violations that we saw on this box.

== app servers ==
Ok, so we have a ton of custom apps in a bunch of non-standard locations.
* We use /srv/web for a lot of things. If we were to change this to /srv/www, a lot of problems would go away. If not, we could set a custom context on that directory. In the meantime, the following contexts were set on app1:
semanage fcontext -a -t httpd_sys_content_t '/srv/web(/.*)?'
semanage fcontext -a -t httpd_sys_content_t '/var/tmp/l10n-data(/.*)?'
semanage fcontext -a -t httpd_var_run_t '/var/run/mod_fcgid(/.*)?'
* For machines using NFS, we'll have to do something like (set on app1):
setsebool -P use_nfs_home_dirs=1 allow_mount_anyfile=1 allow_mounton_anydir=1
* We'll also need to set a custom context for /var/db/group.db (and other files), which are created by the fasClient.
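The (/.*)? suffix on these specs is what lets one rule cover both the directory itself and everything beneath it. The Python sketch below approximates that with re.fullmatch, assuming, as the SELinux file-context matcher does, that a spec must match the entire path; spec_matches is a hypothetical helper and the sample paths are made up.

```python
"""Sketch: what a '(/.*)?' file-context spec covers.

Assumption: re.fullmatch approximates the SELinux matcher, which requires a
spec to match the entire path; the example paths are made up.
"""
import re


def spec_matches(spec: str, path: str) -> bool:
    # fullmatch anchors both ends, so a match on a mere prefix does not count
    return re.fullmatch(spec, path) is not None


if __name__ == "__main__":
    spec = "/srv/web(/.*)?"
    for path in ("/srv/web", "/srv/web/bodhi/index.html", "/srv/webapps", "/srv/www"):
        print(f"{path:28} {spec_matches(spec, path)}")
```

/srv/webapps not matching is the point: the slash inside the optional group keeps the rule from leaking onto sibling directories that merely share a name prefix.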

== Proxy Servers ==
The following changes were made on proxy1 and fixed various SELinux violations:
setsebool -P httpd_can_network_connect_db=1 httpd_can_network_relay=1

Allow proxies to talk to our TG apps:

semanage port -a -t http_port_t -p tcp 8081-8089
* SNMP seems to be running the puppet init script on the proxies. If this is intended, we'll have to create a custom policy module for this. Dan created one on proxy1 and installed it.

== Policy issues found that are getting fixed upstream ==
* allow httpd_rotatelogs_t self:capability dac_override;
* mod_fcgi awareness

== Goals ==
So, the long-term goal is to be able to handle all of these cases within our puppet manifests. The short term goal is to get a few machines working in permissive mode flawlessly, fixing our apps and upstream selinux policy in the process. By documenting all of the changes along the way, we should be able to easily drop them into our puppet configuration when the time comes.
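If the documented commands are kept in a consistent shape, dropping them into puppet can be close to mechanical. A rough Python sketch, assuming only the three command shapes used in this ticket and targeting the custom selinux_bool, semanage_fcontext and semanage_port declarations that this ticket uses later; to_puppet is a hypothetical helper, not an existing tool.

```python
"""Sketch: translate documented SELinux commands into puppet declarations.

Assumption: to_puppet is a hypothetical helper. It understands only the
setsebool, 'semanage fcontext' and 'semanage port' shapes used in this
ticket and emits the custom selinux_bool / semanage_fcontext /
semanage_port declarations; anything else raises ValueError.
"""
import shlex


def to_puppet(command: str) -> list[str]:
    argv = shlex.split(command)  # also strips the quotes around regexes
    if argv[:2] == ["setsebool", "-P"]:
        lines = []
        for assignment in argv[2:]:
            if "=" not in assignment:
                raise ValueError(f"expected name=value, got: {assignment}")
            name, value = assignment.split("=", 1)
            state = "on" if value.lower() in ("1", "on", "true") else "off"
            lines.append(f"selinux_bool {{ '{name}': bool => '{state}' }}")
        return lines
    if argv[:4] == ["semanage", "fcontext", "-a", "-t"] and len(argv) == 6:
        setype, regex = argv[4], argv[5]
        return [f"semanage_fcontext {{ '{regex}': type => '{setype}' }}"]
    if argv[:4] == ["semanage", "port", "-a", "-t"] and len(argv) == 8 and argv[5] == "-p":
        setype, proto, ports = argv[4], argv[6], argv[7]
        return [f"semanage_port {{ '{ports}': type => '{setype}', proto => '{proto}' }}"]
    raise ValueError(f"no puppet mapping for: {command}")


if __name__ == "__main__":
    for cmd in (
        "setsebool -P httpd_can_network_connect_db=1 httpd_can_network_relay=1",
        "semanage fcontext -a -t httpd_sys_content_t '/var/tmp/l10n-data(/.*)?'",
        "semanage port -a -t http_port_t -p tcp 8081-8089",
    ):
        print("\n".join(to_puppet(cmd)))
```

Rejecting unknown shapes, rather than guessing, means anything unusual in the notes gets translated by a human.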

I'll be keeping a close eye on these machines, and will be touching base with Dan Walsh again next week.

The current patch does not utilize semanage, but I've been talking with the patch's developer about adding that functionality after the current patch is accepted.

Currently, the patch adds properties to the standard File object for managing a file/directory's context. This means that any contexts set by puppet will not be maintained if the filesystem needs to be relabeled. Puppet would need to reset these contexts after a relabel.
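A toy model makes the relabel problem concrete: a relabel recomputes every label from the file-context rules, so a label that was only stamped onto a file is lost, while one backed by a semanage fcontext rule comes back. This is a simplified Python sketch, not libselinux; relabel is a hypothetical helper, the first matching rule wins here (the real matcher chooses among matches by its own ordering), and default_t stands in for whatever the policy would otherwise assign.

```python
"""Sketch: a toy relabel, showing which labels survive it.

Assumption: this is a simplified model, not libselinux. relabel is a
hypothetical helper; the first matching rule wins here, and 'default_t'
stands in for whatever label the policy would otherwise assign.
"""
import re


def relabel(paths, rules: list[tuple[str, str]], default: str) -> dict[str, str]:
    """Recompute every label purely from the file-context rules."""
    labels = {}
    for path in paths:
        labels[path] = default
        for regex, setype in rules:
            if re.fullmatch(regex, path):
                labels[path] = setype
                break
    return labels


if __name__ == "__main__":
    # both files were labeled by hand, but only l10n-data has a semanage rule
    before = {
        "/var/tmp/l10n-data/po/fr.po": "httpd_sys_content_t",
        "/srv/web/index.html": "httpd_sys_content_t",
    }
    rules = [("/var/tmp/l10n-data(/.*)?", "httpd_sys_content_t")]
    after = relabel(before, rules, default="default_t")
    for path, label in after.items():
        print(f"{path:30} {before[path]} -> {label}")
```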

I went ahead and updated our selinux puppet manifest to use a custom restorecond.conf until puppet can handle this stuff on its own. This will help resolve a ton of AVCs that Dan and I came across today.

So we made a lot of progress today. We added some awesomely hackish stuff to our puppet filetype configurations, but the end result lets us do stuff like this:

selinux_bool { 'httpd_can_network_connect_db': bool => 'on' }
selinux_bool { 'httpd_can_network_relay': bool => 'on' }
selinux_bool { 'use_nfs_home_dirs': bool => 'on' }
selinux_bool { 'allow_mount_anyfile': bool => 'on' }
selinux_bool { 'allow_mounton_anydir': bool => 'on' }

semanage_fcontext { '/var/tmp/l10n-data(/.*)?': type => 'httpd_sys_content_t' }
semanage_fcontext { '/var/log/bodhi(/.*)?': type => 'httpd_log_t' }
semanage_port { '8081-8089': type => 'http_port_t', proto => 'tcp' }
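One property declarations like selinux_bool need is idempotency: on each puppet run, only booleans that differ from the target should be flipped, since setsebool -P rewrites policy and is slow. A minimal Python sketch of that check; parse_getsebool and pending_changes are hypothetical helpers, our manifests may do this differently, and the input is assumed to be text in the "name --> on" format printed by getsebool -a.

```python
"""Sketch: only flip booleans that are not already in the desired state.

Assumption: parse_getsebool and pending_changes are hypothetical helpers
showing how a define like selinux_bool could stay idempotent; our manifests
may do it differently. Input is text in the 'name --> on' format printed by
getsebool -a.
"""


def parse_getsebool(output: str) -> dict[str, bool]:
    current = {}
    for line in output.splitlines():
        name, sep, value = line.partition("-->")
        if sep:  # skip blank or unrelated lines
            current[name.strip()] = value.strip() == "on"
    return current


def pending_changes(current: dict[str, bool], desired: dict[str, bool]) -> list[str]:
    """Return name=value assignments for booleans that differ from the target."""
    changes = []
    for name, want in sorted(desired.items()):
        if name not in current:
            raise KeyError(f"boolean not known to this policy: {name}")
        if current[name] != want:
            changes.append(f"{name}={int(want)}")
    return changes


if __name__ == "__main__":
    sample = (
        "httpd_can_network_connect_db --> off\n"
        "httpd_can_network_relay --> on\n"
        "use_nfs_home_dirs --> off\n"
    )
    todo = pending_changes(
        parse_getsebool(sample),
        {"httpd_can_network_connect_db": True, "httpd_can_network_relay": True, "use_nfs_home_dirs": True},
    )
    if todo:  # nothing to run when the host already matches
        print("setsebool -P " + " ".join(todo))
```

Raising on an unknown boolean surfaces typos and policy-version mismatches instead of silently skipping them.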


app1-3 should be all set. On app4, mirrormanager seems to be putting a socket in /tmp with an improper context. We can resolve this either by having mirrormanager set the proper context on the socket (httpd_var_run_t) or by creating the socket in /var/run/httpd.

proxy1-3 should be all set.

people1 should be good, except for an snmp port issue -- which is going to get fixed in upstream policy.

This won't be done for F10; moving to F11.

Just to bring this ticket up to speed...

We're currently in the 'collect avcs, fix issues, push out new policy, repeat' phase of the game. I send Dan Walsh all of our AVCs on a weekly basis, and we sit down frequently and go through them, fixing as many issues as we can along the way.

Our puppet tweaks seem to be working fine, for now, although the latest version of puppet contains various SELinux-related enhancements that we will need to assess once we upgrade.

bastion, people1, and planet1 are all in Enforcing mode.

There are plenty of other machines that haven't had a single AVC denial in months, so I will be going through them shortly flipping them to Enforcing mode.
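The selection rule here can be written down explicitly: a host is a candidate if it is still permissive and its most recent denial is older than some quiet period. A minimal Python sketch; HostReport, enforcing_candidates and the 90-day default are hypothetical, chosen only to stand in for "months" above.

```python
"""Sketch: pick permissive hosts that have been quiet long enough to enforce.

Assumption: HostReport, enforcing_candidates and the 90-day default are
hypothetical; the ticket only says "haven't had a single AVC denial in months".
"""
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class HostReport:
    name: str
    mode: str                    # 'Enforcing', 'Permissive' or 'Disabled'
    last_denial: Optional[date]  # None if no denial was ever logged


def enforcing_candidates(reports: list[HostReport], today: date, quiet_days: int = 90) -> list[str]:
    cutoff = today - timedelta(days=quiet_days)
    picks = []
    for r in reports:
        if r.mode != "Permissive":
            continue  # Enforcing hosts are done; Disabled hosts need a relabel first
        if r.last_denial is None or r.last_denial < cutoff:
            picks.append(r.name)
    return sorted(picks)
```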

The following server groups are now fully enforcing:

  • gateway
  • people
  • planet
  • fas
  • collab
  • releng
  • db
  • torrent
  • dns

I will be keeping a close eye on these machines, and I encourage anyone
that is interested to do the same. I threw together a little tool that
I've been using to monitor & manage SELinux on our machines. It uses
func, and allows you to do the following:

Get the SELinux status:

selinux-overlord.py --status
Display all enforced denials:
selinux-overlord.py --enforced-denials
Dump all raw AVCs to disk. Each minion will have its own file:
selinux-overlord.py --dump-avcs
Upgrade the SELinux policy RPMs:
selinux-overlord.py --upgrade-policy
It defaults to querying all minions, but you can specify groups of them
if you wish:
selinux-overlord.py --status app db
This script should ideally be its own func module, but in the meantime
I added it to the fedora-infrastructure git repository:


= SELinux Status =


{'Disabled': ['cnode01.fedoraproject.org',
'Enforcing': ['backup02.fedoraproject.org',
'Permissive': ['app01.dev.fedoraproject.org',
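A status report in this shape can be produced by grouping each host's getenforce answer (Enforcing, Permissive or Disabled). A minimal Python sketch; fetching the answers over func is left out, summarize and counts are hypothetical helpers, and this is not necessarily how selinux-overlord.py builds its report.

```python
"""Sketch: group per-host getenforce answers into a status summary.

Assumption: collecting the answers is out of scope; the input is a
hypothetical host -> mode mapping, and the helpers are illustrative only.
"""
from collections import defaultdict


def summarize(modes: dict[str, str]) -> dict[str, list[str]]:
    groups: dict[str, list[str]] = defaultdict(list)
    for host, mode in modes.items():
        groups[mode.strip()].append(host)  # strip trailing newlines from command output
    # sorted keys and hosts give stable output that is easy to diff between runs
    return {mode: sorted(hosts) for mode, hosts in sorted(groups.items())}


def counts(summary: dict[str, list[str]]) -> str:
    order = ("Disabled", "Enforcing", "Permissive")
    return " / ".join(str(len(summary.get(mode, []))) for mode in order)
```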

There is a problem somewhere along the path from the SELinux modules in puppet to the servers. For instance, we see this same AVC on almost all servers (tl;dr: nagios is trying to read /proc/mdstat):

type=AVC msg=audit(1349727068.094:16004): avc:  denied  { read } for  pid=8199 comm="python" name="mdstat" dev=proc ino=4026531977 scontext=system_u:system_r:nrpe_t:s0 tcontext=system_u:object_r:proc_mdstat_t:s0 tclass=file
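When triaging records like this one, the useful fields are the denied permissions, the source and target types, and the object class; that is the information audit2allow turns into an allow rule. A minimal Python sketch with a hypothetical parse_avc helper; it assumes the simple key=value layout of the record above and is not a full audit-log parser.

```python
"""Sketch: pull the triage-relevant fields out of a raw AVC record.

Assumption: parse_avc is a hypothetical helper that handles only the simple
key=value layout shown above; it is not a general audit-log parser.
"""
import re

PERMS = re.compile(r"denied\s+\{\s*([^}]*)\}")
FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_avc(line: str) -> dict:
    perms = PERMS.search(line)
    if perms is None:
        raise ValueError("not an AVC denial")
    fields = {key: value.strip('"') for key, value in FIELD.findall(line)}
    return {
        "perms": perms.group(1).split(),
        "comm": fields.get("comm"),
        "name": fields.get("name"),
        # contexts look like user:role:type:level, so the type is the third part
        "source_type": fields["scontext"].split(":")[2],
        "target_type": fields["tcontext"].split(":")[2],
        "tclass": fields["tclass"],
    }
```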

[After two months of desperately poking around...] I think the problem is that not all nodes in puppet include nrpe. (nrpe itself is included in global.pp, which is not included by many hosts, such as lockbox01 and value01.) So the question is: which hosts should not include global.pp? Let me know and I'll create the patch.

I think if we solve this one, the next common AVC would be nfs trying to read /srv on hosted02.

Ah, digging around in puppet, it looks like some nodes include global via a service group...

so, value01.stg doesn't include global directly, but it does have:

include valueadd

which in turn ( manifests/services/valueadd.pp ) has:

include global

So, global is getting included. ;(
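The include chain above can be checked mechanically by walking the include graph until the target class turns up. A minimal Python sketch; includes_transitively is a hypothetical helper, and the graph is hand-written from the manifests named above (a real check would have to build it from the manifests themselves).

```python
"""Sketch: does a node pull in a class indirectly through include?

Assumption: includes_transitively is hypothetical and GRAPH is a hand-written
excerpt; a real check would need to build the graph from the manifests.
"""


def includes_transitively(graph: dict[str, list[str]], start: str, target: str) -> bool:
    seen = set()
    stack = [start]
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:
            continue  # already expanded; also protects against include cycles
        seen.add(current)
        stack.extend(graph.get(current, []))
    return False


GRAPH = {
    "node value01.stg": ["valueadd"],  # the node only names the service group...
    "valueadd": ["global"],            # ...and manifests/services/valueadd.pp includes global
    "global": ["nrpe"],                # global.pp is where nrpe comes in
}
```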

I think we may need to bring in some big selinux guns here and ask Dan Walsh to look...

adds open, getattr, and ioctl perms to nrpe_t over proc_mdstat_t
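As a local stopgap while such a fix makes its way upstream, the same permissions can be expressed as a small policy module. A minimal Python sketch that renders the .te source; the module name nrpe_mdstat and the render_te helper are hypothetical, and this is not necessarily how the upstream change was written.

```python
"""Sketch: render a small local policy module for nrpe_t over proc_mdstat_t.

Assumption: render_te and the module name are hypothetical; this only shows
the shape of a .te file granting the permissions described above.
"""


def render_te(module: str, version: str, source: str, target: str,
              tclass: str, perms: list[str]) -> str:
    perm_set = " ".join(perms)
    return "\n".join([
        f"module {module} {version};",
        "",
        # every type and class the rule mentions must be declared as required
        "require {",
        f"\ttype {source};",
        f"\ttype {target};",
        f"\tclass {tclass} {{ {perm_set} }};",
        "}",
        "",
        f"allow {source} {target}:{tclass} {{ {perm_set} }};",
        "",
    ])


if __name__ == "__main__":
    print(render_te("nrpe_mdstat", "1.0", "nrpe_t", "proc_mdstat_t", "file",
                    ["read", "open", "getattr", "ioctl"]))
```

The rendered file can then be built and loaded with checkmodule -M -m -o nrpe_mdstat.mod nrpe_mdstat.te, semodule_package -o nrpe_mdstat.pp -m nrpe_mdstat.mod and semodule -i nrpe_mdstat.pp.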

Current status:

disabled: 2
enforcing: 207
permissive: 166

115 of those permissive hosts are builders. We need to test whether we can move them to enforcing or whether that causes problems.

Do we need to leave this ticket open anymore?

I'm going to close this as we are deploying all the new instances in ansible with selinux=enforcing, and have moved builders to permissive fine.
