#2849 enable systemd preset for waagent.service (WALinuxAgent)
Closed: Accepted a year ago by churchyard. Opened 2 years ago by cjp256.

Seeking FESCo approval for the agent used by Azure to be enabled by default.

Additionally, WALinuxAgent may affect other services, requiring a FESCo approval:

Must not alter other services
Installation of the package providing the unit auto-started by this preset MUST NOT change the behavior of any other service running (or potentially running) on the system.

@cjp256 Can you expand a bit more on what you'd like to see changed and the reasoning behind it?


For those who are unfamiliar with WALinuxAgent, it's a daemon that runs inside an instance on Azure's cloud that can take many actions on a running instance.

Normally, any configuration (adding users, laying down ssh keys, setting host names) is done on the first boot by cloud-init. The WALinuxAgent allows Azure to issue additional instructions to instances to update their configuration after the first boot.

On the plus side, this can help people who want to make configuration changes on the fly long after the instance was provisioned. This helps a lot with re-configurations needed to use other products in Azure. It saves time and avoids human error.

On the negative side, you are running a daemon that takes instructions from elsewhere and makes changes inside your instance. This may be highly undesirable for some users. Some call this a backdoor. Also, there have been vulnerabilities in the past that allowed remote code execution on instances. Microsoft provided guidance around these issues.

Thanks @mhayden for the overview :D

In creating a kickstart config for an Azure image (https://pagure.io/fedora-kickstarts/pull-request/904) @ngompa asked to use presets for enabling waagent.service in accordance with Fedora policy.

WALinuxAgent is installed and enabled in all endorsed images on Azure as it supports significant functionality on Azure.

It should be comparable to the existing approved preset for Google's supporting agent (approved @ https://pagure.io/fesco/issue/2578):
https://src.fedoraproject.org/fork/cjp256/rpms/fedora-release/blob/rawhide/f/90-default.preset#_351
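
For readers unfamiliar with presets, here is a minimal sketch of how a preset file decides whether a unit gets enabled at install time. It assumes the documented systemd.preset behaviour (the first matching `enable`/`disable` line wins, and an unmatched unit defaults to enabled). The excerpt in `EXAMPLE_PRESET` and the function name `preset_action` are hypothetical and are not copied from the Fedora preset file; instance lists for template units are ignored.

```python
from fnmatch import fnmatchcase

# Hypothetical excerpt, for illustration only (not the real Fedora file).
EXAMPLE_PRESET = """\
# Azure guest agent
enable waagent.service
disable *
"""


def preset_action(unit: str, preset_text: str) -> str:
    """Return "enable" or "disable" for a unit, first matching line wins."""
    for raw in preset_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments ("#" or ";").
        if not line or line.startswith(("#", ";")):
            continue
        parts = line.split()
        if len(parts) < 2 or parts[0] not in ("enable", "disable"):
            continue
        # The second field is a shell-style glob matched against the unit name.
        if fnmatchcase(unit, parts[1]):
            return parts[0]
    # Units that no line matches are enabled by default.
    return "enable"


print(preset_action("waagent.service", EXAMPLE_PRESET))
print(preset_action("example.service", EXAMPLE_PRESET))
```

The trailing `disable *` is why an explicit `enable` line is needed for any service that should start by default.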

I think that this is a positive change for the images to be placed in the Azure Marketplace. I want to offer up my support for the modification.

I think there is good guidance around the usage, but as the agent might offer up access in ways that are unintended in other virtualization environments, it might be a good idea to add a motd.d file identifying that this image is purpose-built for the Azure compute environment and is not built to be used in a general virtual environment.

Does the systemd service set constraints to run only on HyperV/Azure hypervisors? I believe some of the other ones do something similar.

Does the systemd service set constraints to run only on HyperV/Azure hypervisors? I believe some of the other ones do something similar.

I do not see any conditionals based on hypervisor for Azure's or Google's agents:
https://github.com/Azure/WALinuxAgent/blob/develop/init/waagent.service
https://github.com/GoogleCloudPlatform/guest-agent/blob/main/google-guest-agent.service

One could certainly be added (e.g. ConditionVirtualization=microsoft), but it wouldn't make sense to install WALinuxAgent on any image outside of Azure (or google guest agent outside of GCP, etc.)?

FWIW I installed google-guest-agent in my Azure image and rebooted and it started up (and had an error as one would expect).

I think that this is a positive change for the images to be placed in the Azure Marketplace. I want to offer up my support for the modification.

I think there is good guidance around the usage, but as the agent might offer up access in ways that are unintended in other virtualization environments, it might be a good idea to add a motd.d file identifying that this image is purpose-built for the Azure compute environment and is not built to be used in a general virtual environment.

Great idea!

Based on what I see in this thread, I could get on board with the following:

  • WALinuxAgent is being actively maintained in Fedora with a close watch on CVEs
  • Some type of notification for users in the instance that lets them know the daemon is present and running
  • Add it to the next release of Fedora (perhaps F38 or as soon as F37)?

I'd like to see this go through the usual change process if possible. Sure, it's a change that is limited to a subset of a single edition of Fedora, but it's probably worth making sure we have good plans for it.

One could certainly be added (e.g. ConditionVirtualization=microsoft), but it wouldn't make sense to install WALinuxAgent on any image outside of Azure (or google guest agent outside of GCP, etc.)?

Yes, such conditionalization is useful. The expectation that packages will be installed only on machines where they are needed is not met in practice. In particular, packages may be pulled in through dependencies, and people may install packages for testing or to read the docs, or by mistake, or somebody may be building a generic image compatible with multiple environments. That's why we have the requirement in the policy [1]:

The service MUST NOT, under normal operating conditions, exit with an error causing systemd to mark the unit as failed.

--

On the negative side, you are running a daemon that takes instructions from elsewhere and makes changes inside your instance. This may be highly undesirable for some users. Some call this a backdoor. Also, there have been vulnerabilities in the past that allowed remote code execution on instances. Microsoft provided guidance around these issues.

Whoah, that's not nice. I don't know if I'm just not seeing something, but the linked Microsoft pages seem very uninformative. (The 3rd party pages explain the issue. But the Microsoft "blog" entry about the vulnerabilities seems completely void of details.) What is worse, the "guidance" is rather strange: checking local logs for evidence of root compromise, seriously? Also, the 3rd party articles say [e.g. 2] that the fix commit was publicly available for a month before the update was sent out. And that the security architecture of the system is inane.

[1] https://docs.fedoraproject.org/en-US/packaging-guidelines/DefaultServices/#_must_not_fail_under_normal_operating_conditions:
[2] https://www.wiz.io/blog/omigod-critical-vulnerabilities-in-omi-azure/

After reading this, I'm not too eager to enable this by default on Fedora, at least not without confirmation from people using Azure that this is really worth the risks.
To add to the requirements listed by @mhayden, I'd add:

  • the service does not listen on any external ports

Does the systemd service set constraints to run only on HyperV/Azure hypervisors? I believe some of the other ones do something similar.

I do not see any conditionals based on hypervisor for Azure's or Google's agents:
https://github.com/Azure/WALinuxAgent/blob/develop/init/waagent.service
https://github.com/GoogleCloudPlatform/guest-agent/blob/main/google-guest-agent.service

One could certainly be added (e.g. ConditionVirtualization=microsoft), but it wouldn't make sense to install WALinuxAgent on any image outside of Azure (or google guest agent outside of GCP, etc.)?

FWIW I installed google-guest-agent in my Azure image and rebooted and it started up (and had an error as one would expect).

Hmm, I guess only the VirtualBox one has it?

One could certainly be added (e.g. ConditionVirtualization=microsoft), but it wouldn't make sense to install WALinuxAgent on any image outside of Azure (or google guest agent outside of GCP, etc.)?

Yes, such conditionalization is useful. The expectation that packages will be installed only on machines where they are needed is not met in practice. In particular, packages may be pulled in through dependencies, and people may install packages for testing or to read the docs, or by mistake, or somebody may be building a generic image compatible with multiple environments. That's why we have the requirement in the policy [1]:

The service MUST NOT, under normal operating conditions, exit with an error causing systemd to mark the unit as failed.

Yeah, I think it makes sense to have the ConditionVirtualization=|microsoft flag set.

On the negative side, you are running a daemon that takes instructions from elsewhere and makes changes inside your instance. This may be highly undesirable for some users. Some call this a backdoor. Also, there have been vulnerabilities in the past that allowed remote code execution on instances. Microsoft provided guidance around these issues.

Whoah, that's not nice. I don't know if I'm just not seeing something, but the linked Microsoft pages seem very uninformative. (The 3rd party pages explain the issue. But the Microsoft "blog" entry about the vulnerabilities seems completely void of details.) What is worse, the "guidance" is rather strange: checking local logs for evidence of root compromise, seriously? Also, the 3rd party articles say [e.g. 2] that the fix commit was publicly available for a month before the update was sent out. And that the security architecture of the system is inane.

[1] https://docs.fedoraproject.org/en-US/packaging-guidelines/DefaultServices/#_must_not_fail_under_normal_operating_conditions:
[2] https://www.wiz.io/blog/omigod-critical-vulnerabilities-in-omi-azure/

After reading this, I'm not too eager to enable this by default on Fedora, at least not without confirmation from people using Azure that this is really worth the risks.
To add to the requirements listed by @mhayden, I'd add:

  • the service does not listen on any external ports

I don't think we even have OMI packaged in Fedora right now.

To add to the requirements listed by @mhayden, I'd add:

  • the service does not listen on any external ports

AFAIK walinuxagent does not.

Based on what I see in this thread, I could get on board with the following:

  • WALinuxAgent is being actively maintained in Fedora with a close watch on CVEs

Absolutely. Part of the goal here is to ensure all Azure-relevant services are being actively maintained which is most easily done when we have a working Azure image :D

  • Some type of notification for users in the instance that lets them know the daemon is present and running

Is there precedent for this or is this a unique requirement singling out walinuxagent? Is there an established pattern to follow?

  • Add it to the next release of Fedora (perhaps F38 or as soon as F37)?

I'd like to see this go through the usual change process if possible. Sure, it's a change that is limited to a subset of a single edition of Fedora, but it's probably worth making sure we have good plans for it.

Can you clarify the intent here? The only ask in this thread is to enable the service when installed, which already has precedent and, afaict, is the established, documented process.

We have never asked for presets to go through Change process before. I am not convinced we should for this.

We have never asked for presets to go through Change process before. I am not convinced we should for this.

Thanks for letting me know.

We have never asked for presets to go through Change process before. I am not convinced we should for this.

Agreed, although I do think it's reasonable to ask for a devel thread before submitting such a request. I also think we should require release notes issues for these (separate ticket incoming).

@ngompa Did we ever get the devel thread going on this?

@bcotton do we have a ticket yet for the release notes?

AFAIK walinuxagent does not.

Can we get a definite confirmation?

We have never asked for presets to go through Change process before. I am not convinced we should for this.

Agreed, let's not make this too complicated.

@cjp256 We discussed this issue in today's meeting and we need two things done:

  1. Please start a thread to the devel mailing list to gather feedback on the proposed change from the wider community.
  2. Let us know if the service would need to listen on TCP/UDP ports when it runs.
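
One way to check the second point on a running instance is to look at the kernel's socket table. The sketch below parses text in the `/proc/net/tcp` format and reports IPv4 TCP sockets in the LISTEN state that are not bound to loopback. It is an illustration under several assumptions: the function names and the `SAMPLE` data are made up, the address decoding assumes a little-endian machine, and it covers neither IPv6 (`/proc/net/tcp6`) nor UDP. It also lists sockets system-wide; attributing one to waagent would need the socket inode matched against `/proc/<pid>/fd`, which is not shown.

```python
import ipaddress

TCP_LISTEN = "0A"  # socket state code for LISTEN in /proc/net/tcp

# Made-up sample in the /proc/net/tcp layout, for illustration only.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:0277 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 20001 1
   1: 00000000:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 20002 1
   2: 0A00000A:0016 0A00000B:C350 01 00000000:00000000 00:00000000 00000000     0        0 20003 1
"""


def listening_ipv4(proc_net_tcp: str):
    """Return (address, port) pairs for sockets in the LISTEN state."""
    found = []
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4 or fields[3] != TCP_LISTEN:
            continue
        addr_hex, port_hex = fields[1].split(":")
        # The address is stored in host byte order (little-endian assumed).
        addr = ipaddress.IPv4Address(bytes.fromhex(addr_hex)[::-1])
        found.append((str(addr), int(port_hex, 16)))
    return found


def external_listeners(proc_net_tcp: str):
    """Drop listeners bound only to loopback addresses."""
    return [
        (addr, port)
        for addr, port in listening_ipv4(proc_net_tcp)
        if not ipaddress.IPv4Address(addr).is_loopback
    ]


print(external_listeners(SAMPLE))
```

On a real system the input would come from reading `/proc/net/tcp` instead of `SAMPLE`.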

Metadata Update from @mhayden:
- Issue tagged with: stalled

2 years ago

On behalf of @narietta (having issues signing in), lead on WALinuxAgent, we can confirm that the service does not listen for any external connections.

Sorry to jump into this thread so late, but I'm pretty inclined to say that this service would violate the "Must not alter other services" rule. I'm -1 to enabling this by default. Documenting how to enable it intentionally would be better (a couple lines in cloud-init should be enough, I'd think).

Then we shouldn't support any cloud agents, nor ignition, nor cloud-init. That rule makes no sense given how often it's violated anyway.

Then we shouldn't support any cloud agents, nor ignition, nor cloud-init. That rule makes no sense given how often it's violated anyway.

FTR, see https://pagure.io/fesco/issue/2578 the existing Google cloud agent has a FESCo exception to this rule, specifically.

I'm leery of waiving that rule for this one given the security history, but I'm open to being convinced.

Then we shouldn't support any cloud agents, nor ignition, nor cloud-init. That rule makes no sense given how often it's violated anyway.

FTR, see https://pagure.io/fesco/issue/2578 the existing Google cloud agent has a FESCo exception to this rule, specifically.

I'm leery of waiving that rule for this one given the security history, but I'm open to being convinced.

Similarly, the VirtualBox extensions were enabled as part of an approved Change: https://bugzilla.redhat.com/show_bug.cgi?id=1534595

As far as I can tell, cloud-init isn't actually enabled as a service in Fedora by default. Ignition is enabled by default only on Fedora IoT at the request of that WG.

FTR, see https://pagure.io/fesco/issue/2578 the existing Google cloud agent has a FESCo exception to this rule, specifically.

I updated the top-post to include this detail, thank you for pointing it out.

As far as I can tell, cloud-init isn't actually enabled as a service in Fedora by default. Ignition is enabled by default only on Fedora IoT at the request of that WG.

It is for cloud variants:
https://pagure.io/fedora-kickstarts/blob/main/f/fedora-cloud-base.ks#_37

As far as I can tell, cloud-init isn't actually enabled as a service in Fedora by default. Ignition is enabled by default only on Fedora IoT at the request of that WG.

It is for cloud variants:
https://pagure.io/fedora-kickstarts/blob/main/f/fedora-cloud-base.ks#_37

OK, that was approved by the Cloud WG, so it satisfies "Services that should be enabled or disabled by default only on one or more of the Fedora Editions must be approved by those Editions' Working Groups."

Thanks for locating that; that's the one place I forgot to look yesterday.

I guess I'd feel much more comfortable including this if it was conditionalized (somehow) on only starting if it's actually running in the Azure environment. There are two ways this can work:

  1. The actual agent does the detection and exits with a success (0) code if it determines that the hypervisor is not Azure.
  2. We modify the systemd service to use ConditionVirtualization=microsoft so that it only loads when running under Hyper-V.

Since I'm gathering from the thread above that 1) is not currently the case, the easiest solution would be 2). So if you do that, you'll have my +1 for enabling the service by default (when the package is installed).
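
To make option 1 concrete, here is a minimal sketch of the "detect and exit with success" behaviour. It is not how WALinuxAgent actually works; the function names are hypothetical. It assumes `systemd-detect-virt` is available and reports `microsoft` on Hyper-V, the same value that `ConditionVirtualization=microsoft` tests for in option 2. Like option 2, it cannot tell Azure apart from other Hyper-V deployments.

```python
import subprocess

# Identifier that systemd-detect-virt prints on Hyper-V (which Azure uses).
HYPERV = "microsoft"


def detect_virt() -> str:
    """Return the VM technology reported by systemd-detect-virt, or "none"."""
    try:
        result = subprocess.run(
            ["systemd-detect-virt", "--vm"],
            capture_output=True,
            text=True,
            check=False,  # the tool exits non-zero when no VM is detected
        )
    except FileNotFoundError:
        return "none"
    return result.stdout.strip() or "none"


def should_run(virt: str) -> bool:
    """Only proceed when the hypervisor looks like Hyper-V."""
    return virt == HYPERV


def main(virt=None) -> int:
    virt = detect_virt() if virt is None else virt
    if not should_run(virt):
        # Exit code 0 keeps systemd from marking the unit as failed.
        print(f"not on Hyper-V (detected {virt!r}); nothing to do")
        return 0
    print("Hyper-V detected; the agent would start its work here")
    return 0
```

An entry point would call `sys.exit(main())`. The point of returning 0 on the wrong hypervisor is the policy requirement quoted earlier in the thread: the unit must not end up in the failed state under normal operating conditions.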

I would also prefer to have the service include that condition. Since it's only useful on Hyper-V deployments (Windows Server and Azure), it makes sense to do it anyway.

I opened up a PR on the upstream WALinuxAgent project to see what they think of applying this condition universally:
https://github.com/Azure/WALinuxAgent/pull/2661

With the PR applied, I think this is good for approving as a preset. +1

Same here. I'd be a +1 with the above linked PR applied as a patch in the package and later replaced by the update in the source itself.

The patch has been applied to WALinuxAgent in Fedora now, so can we get this voted for so we can ship the preset?

+1 from me. Thanks for keeping this going, @cjp256

Metadata Update from @ngompa:
- Issue untagged with: stalled

a year ago

This appears to be forgotten but technically APPROVED (+3,0,-0).

With https://github.com/Azure/WALinuxAgent/pull/2661 applied, the preset can be enabled.

Metadata Update from @churchyard:
- Issue tagged with: pending announcement

a year ago

Metadata Update from @churchyard:
- Issue close_status updated to: Accepted
- Issue status updated to: Closed (was: Open)

a year ago
