#3175 Packaging Weights/Pre-trained models for PyTorch
Closed: Invalid a year ago by zbyszek. Opened a year ago by kaitlynabdo.

Hello, the AI/ML SIG is working on packaging PyTorch for Fedora. We are currently crippling packages by not distributing weights and pre-trained models because of the current lawsuits surrounding AI models and copyright. We need some guidance on what we should or should not be packaging in light of the current situation.


Please post the question on the Fedora legal mailing list (legal@lists.fedoraproject.org). FESCo can help handle any technical questions, but not stuff related to copyright.

I'll close this for now.

Metadata Update from @zbyszek:
- Issue close_status updated to: Invalid
- Issue status updated to: Closed (was: Open)

a year ago

For some more context:

We (myself, kaitlyn and some other RH folk) did talk to RH legal about this and their response was that it isn't a legal issue - upstreams are already trusted to license their own content and AI/ML model weights are no different; if there are no license issues with the weights, there are no perceived legal issues. Their claim was that this is a policy issue and therefore a question for FESCo, not legal.

That being said, that conversation was not in public and we don't have a recording of it so the above statement is my memory of what we were told. We will post to Fedora legal so that there is a more public response. Assuming the public response is the same as what I remember being told, we will bring it back to FESCo at that time.

Fedora Legal has to validate licenses. For example, they disallowed the Meta LLaMA license: https://gitlab.com/fedora/legal/fedora-license-data/-/issues/399

Every license for AI/ML models needs to be evaluated by them.

Sure, we're not trying to avoid evaluation of new licenses. LLaMA isn't the only model with license issues and IMHO, that's out of scope for this exact conversation.

We're asking about the concept of including AI/ML model weights as a whole regardless of the license. Are AI/ML model weights considered to be regular, non-code content or are they something special? Is it acceptable to include weights that are licensed in a way that would normally be OK to include in Fedora (e.g., Mistral's open model weights are Apache-2.0) or are all model weights considered not package-able due to concerns about where the input data used to train models may have come from? Is it acceptable to package code which will automatically download pre-trained weights when the model is first used locally?

Apologies for starting this part of the conversation before having a public account of the conversation we previously had inside RH. We will restart the conversation once the legal bits have been discussed in public.

We're asking about the concept of including AI/ML model weights as a whole regardless of the license. Are AI/ML model weights considered to be regular, non-code content or are they something special?

I think they fit broadly as content, and the existing guidelines there are sufficient.

Is it acceptable to include weights that are licensed in a way that would normally be OK to include in Fedora (e.g., Mistral's open model weights are Apache-2.0) or are all model weights considered not package-able due to concerns about where the input data used to train models may have come from?

No, it's fine to package. From the Fedora perspective, if the data complies with our licensing policy, it's sufficient to ship, provided that they are actually useful and usable with Fedora software.

Is it acceptable to package code which will automatically download pre-trained weights when the model is first used locally?

That one might be iffy. Maybe?

Note some of the questions raised by this issue are not entirely Fedora-legal in nature and I think I gave a confusing answer to @tflink accordingly, particularly on the subissue of "Is it acceptable to package code which will automatically download pre-trained weights when the model is first used locally?". See in particular my most recent comment:
https://lists.fedoraproject.org/archives/list/legal@lists.fedoraproject.org/message/DSNUWK2BK6LHQF73WEDED7WSMJ7AY7L7/

I've responded there, but to also post here...

We have game engines with data file downloaders for demo content, we have web browsers that auto-download things on launch, and so on.

If you're really worried about it, tweak PyTorch to require configuration, or have it prompt the first time a download is triggered, or something.

We did this with gdb and debuginfod, and that's probably the closest pattern to go with for this.

But this is not a legal question per se, this is a functionality and philosophy question.
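For illustration only, the "require configuration or prompt on first use" idea could look roughly like the sketch below. This is not PyTorch's API and not how gdb implements its debuginfod prompt; the environment variable `EXAMPLE_ALLOW_WEIGHT_DOWNLOAD` and the functions `download_allowed` and `load_weights` are hypothetical names made up for this example. The assumed policy is: explicit configuration wins, otherwise ask the user once, and default to "no" when nobody can be asked.

```python
import os
from typing import Callable, Mapping, Optional

# Hypothetical setting name; not a real PyTorch or Fedora variable.
CONSENT_ENV = "EXAMPLE_ALLOW_WEIGHT_DOWNLOAD"


def download_allowed(
    env: Optional[Mapping[str, str]] = None,
    ask: Optional[Callable[[str], str]] = None,
) -> bool:
    """Decide whether a first-use weight download may proceed.

    Precedence: explicit configuration, then an interactive prompt,
    then a default of "no".
    """
    env = os.environ if env is None else env
    setting = env.get(CONSENT_ENV, "").strip().lower()
    if setting in ("1", "yes", "true"):
        return True
    if setting in ("0", "no", "false"):
        return False
    if ask is None:
        # Unconfigured and non-interactive: do not download.
        return False
    answer = ask("Download pre-trained weights from upstream? [y/N] ")
    return answer.strip().lower() in ("y", "yes")


def load_weights(fetch: Callable[[], bytes], **kwargs) -> bytes:
    """Call fetch() only if the user has opted in."""
    if not download_allowed(**kwargs):
        raise PermissionError(
            f"weight download not enabled; set {CONSENT_ENV}=1 to allow it"
        )
    return fetch()
```

An interactive caller would pass `ask=input`; a packaged default could leave `ask` unset so that nothing is fetched until the user sets the variable.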
