#8167 Adding topic authorization to our RabbitMQ instances
Closed: Fixed 9 months ago by abompard. Opened 4 years ago by abompard.

Community applications will want to publish messages on the bus (like election in its future CommunityShift home). Currently, any read-write account can publish to any topic, which can be a security issue.

Starting with RabbitMQ 3.7.0, topic authorization is possible, but the version we are running is 3.6.0 since that's what's in EPEL7. If we want to have topic authorizations, we need to upgrade RabbitMQ. Making an infra-specific package seems like a bad idea because of the maintenance it involves. The other way would be to upgrade the servers to RHEL8.


Turns out, rhel8 doesn't seem to have the rabbitmq server in it, only the client libraries. :(

Will have to ponder on a solution here...

Metadata Update from @kevin:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: rabbitmq

4 years ago

Metadata Update from @cverna:
- Issue tagged with: backlog

4 years ago

A Fedora host?

@abompard do you have any experience/clues on how to upgrade a rabbitmq cluster
from an OS version to another?

I don't... yet! But @jcline may know, and it's very probably in the docs. Others have had this need before us.

EPEL actually comes with 3.3 or something, we are getting it from the OpenStack channel as far as I know. From what I understand 3.7 will come with the next OpenStack release, no idea on the timeline.

Upgrade docs are https://www.rabbitmq.com/upgrade.html

OS15 comes with 3.7.x... for rhel8.

So, we need fasClient working on rhel8 and enough epel8 stuff for us to run things on rhel8 and then we can use the newer one from os15.

I believe fasClient is now working on RHEL8.

So we need to figure out the dependency list we want in EPEL8.

So, we updated staging with newer version and rhel8.

I guess the next step is to adjust the rabbitmq config and get that all working, then move to prod?

Metadata Update from @cverna:
- Issue untagged with: backlog
- Issue tagged with: high-gain, low-trouble

4 years ago

This is now live in prod as well, because prod in the iad2 datacenter runs the updated version.

Metadata Update from @smooge:
- Issue tagged with: dev

3 years ago

hello, what's the status of this? Is this planned work?

I would love to see it fixed so that localization events can become part of our event system (see: https://pagure.io/fedora-infrastructure/issue/8291).

For reference, the topic authorization doc is here: https://www.rabbitmq.com/access-control.html#topic-authorisation

Reading this doc I can see:

Topic authorisation targets protocols like STOMP and MQTT, which are structured around topics and use topic exchanges under the hood.
[..]
The concept of topic authorisation only really makes sense for the topic-oriented protocols such as MQTT and STOMP. In AMQP 0-9-1, for example, consumers consume from queues and thus the standard resource permissions apply.

The doc also points to https://github.com/rabbitmq/rabbitmq-auth-backend-amqp whose last release was 2 years ago.

This doesn't look so good :(

Metadata Update from @pingou:
- Issue priority set to: Next Meeting (was: Waiting on Assignee)

3 years ago

I've made some tests and it seems to work fine :-) I just merged a change in fedora-messaging to handle the "Forbidden" responses that the server might send when publishing on a forbidden topic. So before we can use this I'll need to cut a new release, rebuild the corresponding packages, and rebase the VMs of all applications that send messages so that they pick up the new version. Ideally, those applications would also handle the new exception to log and drop the message.
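To illustrate the "log and drop" handling mentioned above, here is a minimal sketch. It does not use fedora-messaging itself: the `PublishForbidden` class, `publish_or_drop` and `fake_publish` are stand-ins invented for this example, and the exception an application actually needs to catch should be taken from the fedora-messaging release notes.

```python
import logging

log = logging.getLogger("publisher")


class PublishForbidden(Exception):
    """Stand-in for the library's 'forbidden' error (hypothetical here)."""


def publish_or_drop(publish, message):
    """Try to publish; log and drop the message if the broker forbids the topic.

    Returns True when the message was sent, False when it was dropped.
    """
    try:
        publish(message)
    except PublishForbidden as exc:
        log.warning("Dropping message on forbidden topic: %s", exc)
        return False
    return True


def fake_publish(message):
    """Hypothetical publisher that refuses anything outside the 'fas' prefix."""
    if not message["topic"].startswith("org.fedoraproject.prod.fas."):
        raise PublishForbidden(message["topic"])


sent = publish_or_drop(fake_publish, {"topic": "org.fedoraproject.prod.fas.user.create"})
dropped = publish_or_drop(fake_publish, {"topic": "org.fedoraproject.prod.bodhi.update"})
```

The point of the wrapper is that a forbidden topic becomes a logged, non-fatal event instead of an unhandled exception in the application.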

How do we want to set this up in Ansible? I see the following possibilities:

  • Allow all internal applications to send messages to any topic and only restrict topics for external applications (those that don't run in fedora-infra, maybe cico, koschei, etc.). This is the simplest to set up but not entirely secure (compromised internal apps may send messages to any topic).

  • Build a whitelist of allowed topics for every producing application and set them in ansible, as we did with fedmsg. This is harder and more error-prone since we may miss topics. Also, topic permissions are regexp-based, which can also be a source of bugs for apps that send messages on very different topics (but for most a simple prefix should suffice). It's also the most secure.
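As a small illustration of how regexp-based permissions can go wrong, the sketch below compares a loosely written pattern with a stricter one. The patterns and topics are made up for the example, and it assumes the broker applies the pattern as an unanchored search (modelled here with `re.search`).

```python
import re

# Hypothetical patterns for an app that should only publish under "fas".
LOOSE = r"org.fedoraproject.prod.fas"            # dots unescaped, not anchored
STRICT = r"^org\.fedoraproject\.prod\.fas\..*"   # dots escaped, anchored at start


def allowed(pattern, topic):
    """Return True if the topic matches the pattern anywhere (unanchored search)."""
    return re.search(pattern, topic) is not None


topics = [
    "org.fedoraproject.prod.fas.user.create",   # the intended topic
    "org.fedoraproject.prod.fasjson.update",    # a different app sharing the prefix
    "evil.org-fedoraproject-prod-fas",          # only matches because "." means "any char"
]
results = {topic: (allowed(LOOSE, topic), allowed(STRICT, topic)) for topic in topics}
```

With these inputs the loose pattern accepts all three topics, while the strict one accepts only the first. That is the kind of bug a simple escaped and anchored prefix avoids.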

What do you think?

Do we have any app that sends messages on very different topics?
Normally our standard is org.<origin>.<env>.<app>.<action....>
I can't think of an app sending different app topics.
So, I think we should be able to take the second, more secure, approach :)
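Given the naming standard above, a per-application prefix pattern could be derived mechanically. This is only a sketch: the `topic_pattern` helper and its default origin are assumptions made for the example, not something that exists in our Ansible repo.

```python
import re


def topic_pattern(app, env, origin="fedoraproject"):
    """Build an anchored prefix regexp for org.<origin>.<env>.<app>.<action...>."""
    prefix = ".".join(["org", origin, env, app])
    # re.escape turns the literal dots into "\." so they no longer match any character.
    return "^" + re.escape(prefix) + r"\..+"


pattern = topic_pattern("noggin", "prod")
ok = re.search(pattern, "org.fedoraproject.prod.noggin.user.create") is not None
wrong_env = re.search(pattern, "org.fedoraproject.stg.noggin.user.create") is not None
wrong_app = re.search(pattern, "org.fedoraproject.prod.nogginx.user.create") is not None
```

Requiring a literal dot after the app name stops an app called `nogginx` from slipping through a permission meant for `noggin`.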

Metadata Update from @ryanlerch:
- Issue marked as blocking: #8291

2 years ago

@abompard you were going to do this? What's the status?

Updating here with status:

Upstream rabbitmq has taken the patch from @abompard that was needed and merged it. They have not yet done a release.

https://github.com/ansible-collections/community.rabbitmq/pull/73

However, ansible 2.9 will likely never get this change (it eols at the end of the year). We will likely switch batcave01 to ansible-core/collections later this year, after that we can get this fix and move this forward again.

So, now we just need someone to package up ansible-collections-community-rabbitmq so we can use it. ;)

Any takers?

I think @petebuffon was willing to take it.

If not, I can be second line volunteer :)

I'll take a crack at it. @lenkaseg if you wanna work on it too then let's do it.

I'm still learning up on ansible-collections. Could this be as easy as ansible-galaxy collection install community.rabbitmq, seen here?

If not then it would be helpful to get a link to some other collections Fedora Infra has packaged to get started.

I'm trying to find out whether there is somewhere to look up how to package a collection in Fedora infra...

Could this be as easy as ansible-galaxy collection install community.rabbitmq, seen here?

I have never done it before, but this seems more like what would be done after packaging, from batcave (?). But as for how to package it... I don't know where to look to see where the other packaged collections are.

So I found this ansible-collection-community-mysql repo and there is this spec file:
https://src.fedoraproject.org/rpms/ansible-collection-community-mysql/blob/f34/f/ansible-collection-community-mysql.spec

Probably for the rabbitmq we should do the same thing. Create a package with a similar spec file, using the same ansible macros.

(I would probably have found it yesterday, but at 11PM I mistook pagure.io for src.fpo, since it has the same design, and wondered why I couldn't find any collections there :) )

@lenkaseg exactly what I was going to suggest.

After we have a spec/package we could just directly use it ourselves, and/or we could take it through the process to be reviewed and officially added to Fedora and then EPEL. :)

Probably we want to use it directly once we have it, but also move forward in the background to make it an official package.

Hi @petebuffon!
Any progress on the collection packaging? Can I be of some help?

@lenkaseg No progress yet. I've been sick the past few days, but I'm ready to work on the collection packaging. Want to chat about it over on IRC?

@lenkaseg check out the new repo I made, ansible-collection-community-rabbitmq. Want to give it a look and let me know what you think?

Cool!
In the spec file where there is the Source url:
https://pagure.io/ansible-collection-community-rabbitmq/blob/master/f/ansible-collection-community-rabbitmq.spec#_12
I see that in the mysql collection the address is in the same format:
https://github.com/ansible-collections/community.rabbitmq/archive/%{version}/%{name}-%{version}.tar.gz,
but when I check in the rabbitmq github repo, the tar.gz file seems to be here:
https://github.com/ansible-collections/community.rabbitmq/archive/refs/tags/1.1.0.tar.gz
I wonder why.

I just followed the same path as the mysql collection. Both are valid and go to the same tar file.
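To make the two URL forms concrete, here is a rough sketch that expands the spec-style macros by hand. The `expand` helper is a simplification written for this example (real RPM macro expansion does much more), and the package name and version are the ones mentioned in this thread.

```python
def expand(template, macros):
    """Expand RPM-style %{name} macros in a template (simplified sketch)."""
    for key, value in macros.items():
        template = template.replace("%{" + key + "}", value)
    return template


BASE = "https://github.com/ansible-collections/community.rabbitmq/archive"
macros = {"name": "ansible-collection-community-rabbitmq", "version": "1.1.0"}

# Form used in the spec file: tag name as a directory, chosen file name at the end.
spec_url = expand(BASE + "/%{version}/%{name}-%{version}.tar.gz", macros)

# Form shown on the GitHub tags page.
tag_url = BASE + "/refs/tags/" + macros["version"] + ".tar.gz"
```

As far as I understand, the trailing file name in the first form only controls what the downloaded tarball is called, which is why spec files tend to prefer it.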

The spec/package are ready. What is the next step for building on batcave01? And then I'll move forward with submitting to Fedora and Epel as well.

I'm not sure we want to update this during freeze, so I would say go ahead and submit it for review so we can get it added to Fedora/EPEL. If that's not done by the time freeze is over, we can build it in the infra tags; if it is, we can just use the EPEL one. ;) Please feel free to cc me on the bug and I can review...

Hey folks! Can we get this going again?

Back from winter break. In order to submit for review for Fedora/EPEL, do I need to join the Fedora Package Maintainers? I'm looking over the docs here, https://docs.fedoraproject.org/en-US/package-maintainers/Package_Review_Process/.

You don't have to be a packager to submit a review, but if you aren't there's a step to note that you need a sponsor as part of the package review. :)

The package status is now stable for rawhide and testing for f34-f36. I closed the bugzilla ticket.

Is the next step then to do the same thing for EPEL?

If you could do epel8-next that would be great.

epel8-next is building against centos stream 8, and it has ansible-core in it (and ansible-packaging ) so it should just be requesting the branch, testing that it builds ok...

Alright the package is now in epel8-next and epel9 testing.

I made a comment to the upstream package about testing on Fedora and CentOS, I haven't heard back yet.

ok. So, we need to either ask for a freeze break, or wait until after freeze to install on batcave01.

I'm fine either way. If it's beneficial to do a freeze break to include this in batcave01 then I say let's do that.

So, some more complications here. :)

the epel8-next version needs ansible-core (also in epel8-next only). So, we can't upgrade batcave01 until we move to ansible-core. :(

ansible-core in epel8-next also is using python38 instead of the default python3. We may need to adjust the package to build with python38 there too.
I think that's likely to just be:
Add %global __python3 /bin/python3.8
and change any BuildRequires from python3-devel to python38-devel and so on.

This should be unblocked now I am pretty sure.

batcave01 should have 1.1.0 of the collection.

Hmm, I don't see the package installed or available for batcave01. Isn't it named ansible-collection-community-rabbitmq? Where is it at the moment?

When I have some time I can test the Python version (python3 vs python38).

The package name is "ansible-collection-community-rabbitmq".

According to Bodhi the package has been pushed to stable for "Fedora EPEL 8 Next" about a month ago. Not exactly sure why it isn't available for install.

So, it turns out we didn't actually need the package, but we can of course still use it and that's a nicer way forward.

Right now, ansible has a few collections it installs with ansible-galaxy:

See roles/ansible-server/files/requirements.yml

After freeze we can switch to the rpm.

I've started working on this and I have something that I think is usable at the moment:

  • users of the rabbit/user and rabbit/queue roles can now add a sent_topics variable that is a list of allowed topic regexps.
  • no regexp means all topics are allowed. To prevent an account from sending anything, add the ^$ regexp (see datanommer's playbook)
  • it only applies the permissions on staging at the moment

Here's an example of what it looks like for noggin, in playbooks/openshift-apps/noggin.yml:

- role: rabbit/user
  username: "noggin{{ env_suffix }}"
  sent_topics:
    - org\\.fedoraproject\\.{{ env_short }}\\.fas\\..*
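The semantics described above (no regexp means everything is allowed, `^$` blocks everything) can be sketched as follows. This is an illustration and not the role's implementation: `may_publish` is a hypothetical helper, the patterns are written as Python raw strings, and it assumes the broker applies each regexp as an unanchored search and that a match on any one of them is enough.

```python
import re


def may_publish(topic, sent_topics=None):
    """Return True if `topic` may be published under the described semantics.

    - no regexps (None or an empty list): every topic is allowed
    - otherwise: the topic must match at least one regexp
    """
    if not sent_topics:
        return True
    return any(re.search(pattern, topic) for pattern in sent_topics)


fas_only = [r"org\.fedoraproject\.stg\.fas\..*"]

unrestricted = may_publish("org.fedoraproject.stg.anything.at.all")
fas_ok = may_publish("org.fedoraproject.stg.fas.user.create", fas_only)
bodhi_blocked = may_publish("org.fedoraproject.stg.bodhi.update", fas_only)
read_only = may_publish("org.fedoraproject.stg.fas.user.create", ["^$"])
```

`^$` only matches the empty string, so an account that carries it cannot publish to any real topic.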

I have added the variable to a few apps that are simple and that I know well, but app owners can start adding their topic permissions right now and test in staging.

Metadata Update from @abompard:
- Issue assigned to abompard

2 years ago

[backlog_refinement]
What's the status here? Do we know what apps are left to finish this?

The localisation team hopes to have this:
https://pagure.io/fedora-infrastructure/issue/8291 so this ticket probably isn't done yet.
No idea what I can do to help here. Please let me know if I can do something.

[backlog_refinement]
@abompard What is the status about this one?

I haven't moved forward more with this ticket. What's left to do is:
- have all apps declare their sent_topics in ansible. Bug the maintainers about it
- see what breaks in staging, fix it
- change the meaning of no sent_topics from "all topics allowed" to "no topics allowed"
- see what breaks this time and fix it
- alert the app maintainers that we're going to do this in prod
- do it in prod.
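The first and third steps of this plan could be supported by a small audit, sketched below. The account names and the `accounts` mapping are invented for the example; in practice the data would have to be collected from the playbooks.

```python
def undeclared_accounts(accounts):
    """Return the sorted names of accounts that declare no sent_topics."""
    return sorted(name for name, topics in accounts.items() if not topics)


def effective_policy(topics, default_allow):
    """Describe what an account may send before/after the default is flipped."""
    if topics:
        return "restricted"
    return "all topics" if default_allow else "no topics"


# Hypothetical inventory: account name -> declared sent_topics.
accounts = {
    "noggin": [r"^org\.fedoraproject\.stg\.fas\..*"],
    "datanommer": ["^$"],
    "some-app": [],
    "another-app": None,
}

to_chase = undeclared_accounts(accounts)
before_flip = effective_policy(accounts["some-app"], default_allow=True)
after_flip = effective_policy(accounts["some-app"], default_allow=False)
```

Accounts listed in `to_chase` are the ones whose maintainers need a reminder, and they are also the ones whose behaviour changes once the default is flipped.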

Does this sound good?

Sounds good to me. :)

[backlog]
@abompard is working on this and currently needs to commit the rabbitmq changes. This will be done on staging first, and we'll see how it goes.

This has now been enabled in production. As stated on the mailing-list, the following users are not protected by ACLs (which means they can send to any topics):
- notifs-web and notifs-backend, because we'll remove the old FMN soonish
- alt-src: I couldn't contact the owner (Siteshwar?). Related to CentOS Stream. I tried to contact Brian Stinston but got no answer.
- coreos: Same, couldn't contact the owner of this account.
- fedora-build-checks: Same story, I contacted Tim Flink who redirected me to msrb, but got no response.

All the other accounts are only allowed to send to the topics they have defined in Ansible.

Metadata Update from @abompard:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

9 months ago
