#139 Support vendoring
Closed: Fixed 7 months ago by decathorpe. Opened 2 years ago by walters.

Hi, filing this after I saw an IRC discussion. Because RHEL decided not to follow the Fedora model of "exploded crate" dependencies and to vendor instead, some of us (at least in the CoreOS group, and others) were pushed to avoid doing it both ways, and so to vendor in Fedora too.

It'd be good if this project added support for that model, and we can also help build best practices around that.


That said, I personally think there's a much better approach, which is something like this: https://hackmd.io/8_EewOxeSqGuNYhPFx1rVg

Where instead of mapping crates into RPMs, we support a model that's more like a license-checked filtered subset of crates.io, and the buildsystem supports cargo build offline from that filtered subset. We could build from something more like a Dockerfile, and generate RPMs (or not); i.e., this path makes things much better for those who want to build things in Fedora that aren't RPMs.

So you're proposing to run an alternative crate registry in koji, like there (previously?) was alternative Maven repository support in koji?

I'm going to put aside the whole comment there, and point out that @aplanas has been working on a PR to add support for vendoring stuff in #105. However, this will not change the policy for Fedora to recommend, support, or promote the usage of vendoring.

> So you're proposing to run an alternative crate registry in koji, like there (previously?) was alternative Maven repository support in koji?

Yes. But a big difference since then is that we now have containers and OSBS, so the role of Koji in those builds is already much reduced; for example, in OSBS we accept a Dockerfile etc., which has nothing to do with RPM repos.

Note: https://docs.fedoraproject.org/en-US/packaging-guidelines/Rust/#_bundled_dependencies

Basically, bundling crate dependencies in Rust packages is already forbidden for Fedora packages, unless it's impossible to build the package otherwise. Firefox and Thunderbird are two such cases. And there's one other package where the maintainer ignored me and went for bundled dependencies anyway.

All of this discussion applies almost equally well to Go incidentally; GOPROXY can be used in much the same way as a crate mirror.

As far as I can tell, the old implementation added in PR#105 was never used (or never worked?), which is why that was removed in rust-packaging v24 (and nobody complained).

BUT: There is now actual support for building against vendored dependencies in rust-packaging v25.

The %cargo_prep macro can set up the build to use vendored dependencies (i.e. the contents of an unpacked vendor tarball) instead of the system registry populated by RPM packages, via the -v path/to/vendor argument.

Additionally, there is a %cargo_vendor_manifest macro that writes the list of crates included in a vendor tarball to a file. Using a mechanism similar to the RPM generator that adds bundled() Provides for Go projects, the virtual Provides for all bundled Rust crates are created automatically when the generated cargo-vendor.txt file is added to a package and marked with %license.
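As a rough illustration of what such a generator does, the Python sketch below turns a manifest into one virtual Provides per crate. It assumes that each line of cargo-vendor.txt has the form `name vX.Y.Z` (as printed by `cargo tree`) and that the generated Provides look like `bundled(crate(name)) = X.Y.Z`; both formats are assumptions here, and the real generator in rust-packaging may differ. The function name `bundled_provides` is hypothetical.

```python
def bundled_provides(manifest_text: str) -> list[str]:
    """Map cargo-vendor.txt lines to bundled() virtual Provides (sketch).

    Assumes each non-empty line looks like "name vX.Y.Z"; extra tokens are ignored.
    """
    provides = set()
    for line in manifest_text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            # Skip blank or malformed lines instead of failing.
            continue
        name, version = parts[0], parts[1].removeprefix("v")
        provides.add(f"bundled(crate({name})) = {version}")
    # Sorted, de-duplicated output keeps the generated metadata stable between builds.
    return sorted(provides)


manifest = "anyhow v1.0.75\nlibc v0.2.150\n"
for provide in bundled_provides(manifest):
    print(provide)
```

Pre-release crate versions (e.g. `1.0.0-beta.1`) would need translating into RPM version syntax, since RPM versions cannot contain a hyphen; this sketch does not attempt that.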

I submitted a PR for rust-bootupd to use this new support: https://src.fedoraproject.org/rpms/rust-bootupd/pull-request/2

With these changes, the package is compliant with all Fedora Packaging Guidelines (correctly specifying vendored dependencies, correctly specifying license tag including licenses of statically linked Rust dependencies, etc.) - well, except for the "SHOULD build against system libraries".

I think the support code added in rust-packaging v25 should be enough for use cases in Fedora / ELN, so I'm going to close this issue as "fixed". If any features are still missing from the new macros, please file a new ticket.

Metadata Update from @decathorpe:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)


Thanks for working on this. While we don't always agree on everything, I do want to say explicitly that I appreciate your contributions to FOSS!

One thing I wanted to bring up: I would actually like us to try to better support a "middle ground" where some critical crates are not vendored. There are a few good targets for this, e.g. openssl, but perhaps tokio too. I'd be a lot happier with this. I think it would not be too hard, but it would require some debate about how it works: whether we, for example, have tooling that does surgery on upstream vendor tarballs or try to re-synthesize them.

That said, I still want to debate this point:

> the virtual Provides for all bundled Rust crates are automatically created.

A huge problem in Fedora is the size of the repository metadata. It forces many people who just want to, e.g., download a kernel security update to fetch all of that data again. (Yes, there's delta metadata, but we can't rely on a 100% hit rate for it.)

It's a really obscure use case too; anyone who wants to scan for bundling is much better off looking at the actual source code anyway! If we have to meet somewhere, perhaps just one virtual Provides: bundled(rust) or so.

> A huge problem in Fedora is the size of the repository metadata. It forces many people who just want to, e.g., download a kernel security update to fetch all of that data again. (Yes, there's delta metadata, but we can't rely on a 100% hit rate for it.)
>
> It's a really obscure use case too; anyone who wants to scan for bundling is much better off looking at the actual source code anyway! If we have to meet somewhere, perhaps just one virtual Provides: bundled(rust) or so.

This is not optional, sadly. It's a MUST requirement for all bundled dependencies, and Rust is not special here. It's also the only good mechanism we have for determining which packages need to be rebuilt for security updates etc.

As for only bundling certain crates, that should be doable with some additional machinery ... though the mechanism for doing so would need to be different than a registry replacement (maybe "patching" the registry and overriding crate dependencies with path dependencies could work, I've seen packages do this manually in one or two cases).
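One possible shape of that machinery, sketched in Python under several assumptions: the crates to unbundle are chosen by name, each has exactly one version in the vendor directory, packaged crate sources live at `/usr/share/cargo/registry/<name>-<version>` (assumed path), and Cargo's `[patch.crates-io]` table with `path` entries is used for the override. The helpers `split_vendored` and `patch_section` are hypothetical, and whether such a patch interacts cleanly with vendored-source replacement would need testing.

```python
SYSTEM_REGISTRY = "/usr/share/cargo/registry"  # assumed install location of packaged crate sources

Crate = tuple[str, str]  # (name, version)


def split_vendored(vendored: list[Crate], unbundle: set[str]) -> tuple[list[Crate], list[Crate]]:
    """Partition vendored crates into those kept bundled and those taken from the system."""
    kept = [c for c in vendored if c[0] not in unbundle]
    patched = [c for c in vendored if c[0] in unbundle]
    return kept, patched


def patch_section(patched: list[Crate]) -> str:
    """Render a [patch.crates-io] table pointing unbundled crates at system paths."""
    lines = ["[patch.crates-io]"]
    for name, version in sorted(patched):
        lines.append(f'{name} = {{ path = "{SYSTEM_REGISTRY}/{name}-{version}" }}')
    return "\n".join(lines)


# Hypothetical vendor contents; versions are made up for illustration.
vendored = [("anyhow", "1.0.75"), ("openssl", "0.10.57"), ("tokio", "1.32.0")]
kept, patched = split_vendored(vendored, {"openssl", "tokio"})
print(patch_section(patched))
```

Only the crates left in `kept` would then still need bundled() Provides, which is where this approach would meet the requirement discussed above.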

Fun fact: I checked, and in fact, the dependencies for bootupd are all already packaged for Fedora, except for one (widestring), which is now pending review because I also need it for something else. So in a few days you could even turn off building bootupd against vendored dependencies 😅

> A huge problem in Fedora is the size of the repository metadata. It forces many people who just want to, e.g., download a kernel security update to fetch all of that data again. (Yes, there's delta metadata, but we can't rely on a 100% hit rate for it.)

...

> This is not optional, sadly. It's a MUST requirement for all bundled dependencies, and Rust is not special here. It's also the only good mechanism we have for determining which packages need to be rebuilt for security updates etc.

To be clear, policies are not carved in stone. That said, there are good reasons why we do this in general. As @decathorpe notes, this is at present the only mechanism we have for identifying which packages need a rebuild if a critical security vulnerability turns up.

Now, we can certainly discuss other ways to store and access that information, but this has worked pretty well for many years now (particularly since accessing the data is a trivial dnf repoquery call and requires no additional effort).
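To make the lookup concrete: against real repodata this is roughly a `dnf repoquery --whatprovides` query on the crate's bundled() Provides. The Python sketch below models the same idea offline, over a hypothetical in-memory mapping from package name to its Provides, and assumes the `bundled(crate(name)) = version` format and purely numeric dotted versions. `packages_to_rebuild` is a hypothetical helper, not part of dnf, and the package data is made up.

```python
import re

# Assumed Provides format: "bundled(crate(<name>)) = <version>"
PROVIDE_RE = re.compile(r"^bundled\(crate\(([^)]+)\)\) = (\S+)$")


def version_key(version: str) -> tuple[int, ...]:
    """Compare dotted numeric versions; pre-releases are out of scope for this sketch."""
    return tuple(int(part) for part in version.split("."))


def packages_to_rebuild(repo: dict[str, list[str]], crate: str, fixed_in: str) -> list[str]:
    """Return packages that bundle `crate` at a version older than `fixed_in`."""
    hits = []
    for package, provides in sorted(repo.items()):
        for provide in provides:
            match = PROVIDE_RE.match(provide)
            if (match and match.group(1) == crate
                    and version_key(match.group(2)) < version_key(fixed_in)):
                hits.append(package)
                break  # one vulnerable bundled copy is enough
    return hits


repo = {
    "pkg-a": ["bundled(crate(openssl)) = 0.10.55", "bundled(crate(anyhow)) = 1.0.75"],
    "pkg-b": ["bundled(crate(openssl)) = 0.10.60"],
    "pkg-c": ["bundled(crate(libc)) = 0.2.150"],
}
print(packages_to_rebuild(repo, "openssl", "0.10.56"))  # ['pkg-a']
```

This also shows why a single bundled(rust) Provides would not be enough for this purpose: without the per-crate name and version, the query could only return every package that bundles any Rust code.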

Then again, the repodata-based approach largely predates the major swing that the industry has taken towards vendoring (historically, we tend to have a roughly decade-long cycle between "vendoring" and "sharing everything"), and you're correct: our repodata has exploded as a result.

On the other hand, high-speed internet connections have gone from being a luxury to a necessity in most developed nations, which mitigates the issue to some extent.

> Now, we can certainly discuss other ways to store and access that information,

Filed https://pagure.io/packaging-committee/issue/1309

cargo-auditable would be very useful here (https://github.com/rust-secure-code/cargo-auditable) if the purpose is to detect which executables actually contain any code from a CVE'd package.

Compiling with this tool enabled, which is easy to do, embeds in the executable the information about which dependencies have actually been compiled into it. This information can then be extracted with rust-audit-info: https://crates.io/crates/rust-audit-info.
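A minimal Python sketch of consuming that data, assuming rust-audit-info prints JSON with a top-level `packages` list whose entries carry `name` and `version` fields; the exact schema is an assumption and should be checked against the cargo-auditable documentation. The helper names are hypothetical and the sample data is made up.

```python
import json


def embedded_crates(audit_json: str) -> set[tuple[str, str]]:
    """Collect (name, version) pairs recorded in the binary's embedded audit data."""
    data = json.loads(audit_json)
    return {(pkg["name"], pkg["version"]) for pkg in data.get("packages", [])}


def contains_crate(audit_json: str, name: str, version: str) -> bool:
    """True if the given crate version was compiled into the executable."""
    return (name, version) in embedded_crates(audit_json)


# Example input shaped like the assumed schema.
sample = json.dumps({"packages": [
    {"name": "demo", "version": "0.1.0"},
    {"name": "libc", "version": "0.2.147"},
]})
print(contains_crate(sample, "libc", "0.2.147"))  # True
```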

cargo-vendor, out of the box, significantly overestimates the set of packages actually depended on. For instance, stratisd's vendor directory has 180 separate dependencies, but on my machine the actual dependency count is 120. I've seen other packages where the ratio of actual to vendored dependencies is a whole lot smaller than stratisd's 2/3.

If you filter the vendor tarball, you drop dependencies that your binary RPM would otherwise claim to bundle but does not actually contain, and by building the executable without them you have proved that they don't go into it.

There's a partial ordering here:

num_packages(cargo-vendor) >= num_packages(filtered cargo-vendor result) >= num_packages(cargo-auditable)

It is possible that cargo-auditable might, due to a bug, omit a dependency actually included. filtered cargo-vendor never can, because if it did, the executable would not have compiled.
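The ordering above follows from a chain of subset relations, which can be checked mechanically. The Python sketch below assumes the three crate sets are already available as (name, version) pairs: everything in the vendor directory, the crates the build actually needs (obtained somehow, e.g. from `cargo tree` for the target; an assumption), and what cargo-auditable recorded. The function names and data are hypothetical.

```python
Crate = tuple[str, str]  # (name, version)


def filter_vendor(vendored: set[Crate], used: set[Crate]) -> set[Crate]:
    """Keep only vendored crates that the build actually uses."""
    return vendored & used


def ordering_holds(vendored: set[Crate], filtered: set[Crate], audited: set[Crate]) -> bool:
    """Check audited <= filtered <= vendored (as sets), which implies the count ordering."""
    return audited <= filtered <= vendored


vendored = {("a", "1.0.0"), ("b", "2.0.0"), ("c", "0.3.0")}
used = {("a", "1.0.0"), ("b", "2.0.0")}
filtered = filter_vendor(vendored, used)
print(sorted(filtered))  # [('a', '1.0.0'), ('b', '2.0.0')]
print(ordering_holds(vendored, filtered, {("a", "1.0.0")}))  # True
```

With stratisd's numbers from above, the filtered set would hold about 120 of the 180 vendored crates.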

The requirement, for legal reasons of some kind, to include every package in the vendor directory in the bundled() Provides is in direct conflict with the other use of bundled() Provides: security tracking and checking CVEs.

The RFE this ticket was originally opened for has been implemented. Can further discussion of the general topic please be moved somewhere else, maybe the devel or legal mailing list, where more people actually see it?

