#971 Allow SPDX license expression syntax in the License field.
Opened 2 years ago by dcantrell. Modified 2 months ago
dcantrell/packaging-committee master  into  master

@@ -50,6 +50,10 @@ 

  

  The `+License:+` field must be filled with the appropriate license Short License identifier(s) from the "Good License" tables on the {fedora-licensing} page. If your license does not appear in the tables, it needs to be sent to legal@lists.fedoraproject.org (note that this list is moderated, only members may directly post). If the license is approved, it will be added to the appropriate table.

  

+ As an alternative to the Fedora Short License identifier(s), your package's `+License:+` field may contain a valid https://spdx.org[Software Package Data Exchange] (SPDX) expression using SPDX short names. The expression specification is available https://spdx.org/specifications[here]. The list of licenses is https://spdx.org/license-list[here]. If your package uses licenses not represented in the SPDX list, you should stick with the Fedora Short License identifier(s).

+ 

+ NOTE: Do not mix Fedora Short License identifier(s) and expression syntax with SPDX syntax.

+ 

  === "Distributable"

  

  In the past, Fedora (and Red Hat Linux) packages have used "Distributable" in the `+License:+` field. In virtually all of these cases, this was not correct. Fedora no longer permits packages to use "Distributable" as a valid License. If your package contains content which is freely redistributable without restrictions, but does not contain any license other than explicit permission from the content owner/creator, then that package can use "Freely redistributable without restriction" as its `+License:+` identifier.
@@ -62,6 +66,8 @@ 

  

  Some licenses include the version as part of the Short License Identifier. This is only done when multiple versions of the license differ in significant ways (e.g. one revision is GPLv2 incompatible, while a later version is not). Be careful to ensure that you use the correct Short License Identifier, as shown in the tables on the {fedora-licensing} page.

  

+ If using SPDX license expression syntax, be sure to use the correct identifier corresponding to the version(s) of the license that apply to your package.

+ 

  === "or later version" licenses

  

  Some licenses state that either the current version of the license or later versions may be used. It is important to note when a license states this. When a license has an "or later version" clause, we note that by appending a + to the Short License Identifier.
@@ -82,9 +88,15 @@ 

  License: MPLv1.1 or GPLv2+

  ....

  

+ Example with SPDX syntax:

+ 

+ ....

+ License: MPL-1.1 OR GPL-2.0-or-later

+ ....

+ 

  === Multiple Licensing Scenarios

  

- If your package contains files which are under multiple, distinct, and independent licenses, then the spec must reflect this by using "and" as a separator. Fedora maintainers are highly encouraged to avoid this scenario whenever reasonably possible, by dividing files into subpackages (subpackages can each have their own `+License:+` field).

+ If your package contains files which are under multiple, distinct, and independent licenses, then the spec must reflect this by using "and" as a separator (SPDX syntax uses "AND" as a separator). Fedora maintainers are highly encouraged to avoid this scenario whenever reasonably possible, by dividing files into subpackages (subpackages can each have their own `+License:+` field).

  

  Example:

  Package bar-utils contains some files under the Python License, some other files under the GNU Lesser General Public License v2 or later, and one file under the BSD License (no advertising). The package spec must have:
@@ -129,12 +141,18 @@ 

  If you are unlucky enough that your package possesses items under multiple, distinct, and independent licenses... AND some of those items are dual licensed, you must note the dual licensed items by wrapping them in parentheses (). Otherwise, the guidelines for Dual and Multiple Licensing apply.

  

  Example:

- Package baz-utils contains some files under the Python License, some other files under the GNU Lesser General Public License v2 or later, one file under the BSD License, no advertising, and one file which is dual licensed as Mozilla Public License v1.1 and GNU General Public License v2 or later. The package spec must have:

+ Package baz-utils contains some files under the Python License, some other files under the GNU Lesser General Public License v2 or later, one file under the BSD 3-clause License, no advertising, and one file which is dual licensed as Mozilla Public License v1.1 and GNU General Public License v2 or later. The package spec must have:

  

  ....

  License: Python and LGPLv2+ and BSD and (MPLv1.1 or GPLv2+)

  ....

  

+ Example with SPDX syntax:

+ 

+ ....

+ License: Python-2.0 AND LGPL-2.0-or-later AND BSD-3-Clause AND (MPL-1.1 OR GPL-2.0-or-later)

+ ....

+ 

  Since this is a multiple licensing scenario, the package must contain a comment explaining the multiple licensing breakdown. The actual implementation of this is left to the maintainer.

  

  === Mixed Source Licensing Scenario
@@ -147,6 +165,12 @@ 

  License: Python and (BSD with advertising and QPL)

  ....

  

+ Example with SPDX syntax:

+ 

+ ....

+ License: Python-2.0 AND (BSD-4-Clause AND QPL-1.0)

+ ....

+ 

  == Public Domain

  

  Works which are clearly marked as being in the Public Domain, and for which no evidence is known to contradict this statement, are treated in Fedora as being in the Public Domain, on the grounds that the intentions of the original creator are reflected by such a use, even if due to regional issues, it may not have been possible for the original creator to fully abandon all of their copyrights on the work and place it fully into the Public Domain. If you believe that a work in Fedora which is marked as being in the Public Domain is actually available under a copyright license, please inform us of this fact with details, and we will immediately investigate the claim.

The Software Package Data Exchange (https://spdx.org/) is a Linux
Foundation project with overlap with the Fedora license expression
syntax. Many projects are beginning to add SPDX-Identifier notation
to source files and use SPDX short names. Fedora spec files should
allow this in the License field as well.

Some limitations:

  • Fedora packagers need to use either the Fedora syntax or the SPDX
    syntax; they cannot mix the two.

  • SPDX is more expressive with regard to GPL, LGPL, and BSD variants
    which may require packagers to look at the code in detail again.
    For license short names where we have combined all into one short
    name and SPDX breaks it out, we should cross reference the License
    tag expression with the source if a packager changes to SPDX syntax.

  • SPDX offers a list of license exceptions you can put in the
    expression using the WITH keyword[1]. Fedora syntax tends to put
    that as freeform in the License expression. Where there is an SPDX
    identifier, that should be used.

  • If a current Fedora license expression cannot be translated to SPDX,
    it should remain using the Fedora syntax.

  • "Redistributable" is not a valid SPDX identifier.

  • "Public Domain" is not a valid SPDX identifier[2].

It would be nice to offer Fedora packagers the option of using SPDX
syntax or our existing syntax. The SPDX expressions are easier to
validate programmatically too.

[1] https://spdx.org/licenses/exceptions-index.html
[2] https://wiki.spdx.org/view/Legal_Team/Decisions/Dealing_with_Public_Domain_within_SPDX_Files

Is this approved by legal?

I have been talking to Richard Fontana about this recently and he at least seemed open to the idea. I don't think legal can rely on the License tag expressions in spec files anyway, so I don't know that they have a strong opinion on the subject or not. This policy is more about keeping ourselves honest.

I can ask for his input on this PR.

Please don't ask a specific person, but rather on the legal list. This has been discussed there back and forth. See for example https://lists.fedoraproject.org/archives/list/legal@lists.fedoraproject.org/thread/YT7A6MROI3CZNNEKO6RKLD3GG7NNL2LU/

I'm reluctant to permit mixing of the two systems. The Fedora syntax is notably different from SPDX in a number of key ways. The biggest challenge for Fedora is that SPDX treats all variants of common licenses like BSD and MIT as unique licenses, where in Fedora, we just refer to those as MIT or BSD. SPDX does not currently have most of the variants in their listing.

The correct path forward is not to allow mixing, but to:

A) Determine what Fedora is going to do about the MIT/BSD problem. We either:
1) permit Fedora packages to use MIT/BSD to refer to any and all MIT/BSD variants (defying SPDX methodology)
2) identify every MIT/BSD variant and get SPDX to give each of them a unique identifier, and enforce their correct usage.
B) Determine how we will address Firmware licensing. I don't know if SPDX has firmware licenses in their database (or wants them). (This is what we gently refer to as "Redistributable").
C) Determine how we will address Public Domain works. Fedora permits PD works, but as you note, SPDX does not have a license entry for them.
D) Synchronize our naming schema with SPDX's naming schema (we're not too far off, but because we use GPLv2+ and they use GPL-2.0-or-later, this is going to impact a VERY large number of packages). Depending on how we address the MIT/BSD dilemma, this could be almost every package in Fedora.
E) Determine a generous window of time (after Fedora 32, but not impacting the RHEL cycle) during which maintainers will be strongly encouraged to audit and update the License tags. (It would be really helpful if we had a way of knowing when this happened, but I can't think of a way off hand). Packages which are not fixed in that time should start the non-responsive maintainer process.

This is a lot of work. I haven't been inclined to do it because, to be honest, the payoff is really low. That said, I'm certainly open to discussing if/how we transition. Probably needs to be on FESCo, not FPC.

Please don't ask a specific person, but rather on the legal list. This has been discussed there back and forth. See for example https://lists.fedoraproject.org/archives/list/legal@lists.fedoraproject.org/thread/YT7A6MROI3CZNNEKO6RKLD3GG7NNL2LU/

I've joined the legal mailing list. I didn't know there had been a discussion going on there for this.

I'm reluctant to permit mixing of the two systems. The Fedora syntax is notably different from SPDX in a number of key ways. The biggest challenge for Fedora is that SPDX treats all variants of common licenses like BSD and MIT as unique licenses, where in Fedora, we just refer to those as MIT or BSD. SPDX does not currently have most of the variants in their listing.

I find the SPDX breakdown more accurate, especially with regard to the BSD variations.

The correct path forward is not to allow mixing, but to:
A) Determine what Fedora is going to do about the MIT/BSD problem. We either:
1) permit Fedora packages to use MIT/BSD to refer to any and all MIT/BSD variants (defying SPDX methodology)
2) identify every MIT/BSD variant and get SPDX to give each of them a unique identifier, and enforce their correct usage.

I must be missing something here. What BSD variants are missing from the current SPDX list? Putting copyright information aside and only looking at the licensing terms, SPDX captures every one I have encountered across about 1200 packages I've looked at (basically a minimal desktop install). I'm not sure what other variants are missing here.

B) Determine how we will address Firmware licensing. I don't know if SPDX has firmware licenses in their database (or wants them). (This is what we gently refer to as "Redistributable").

I talked with rfontana about this specifically. I feel that "Redistributable, no modification allowed" is not really useful or even accurate. It doesn't explain anything clearly, and what it does say we can't do, you could argue we've done. Compare the Microsoft Core Fonts, for example, where the MS EULA allows redistribution of those fonts only in their original form. So you can run cabextract on the exec files and pull out the ttf files and repackage them. Can we do that for firmware? Is that modification? If not, why not?

I would like to see something specifically written to address firmware redistribution. I think working with SPDX would be the appropriate thing here.

C) Determine how we will address Public Domain works. Fedora permits PD works, but as you note, SPDX does not have a license entry for them.

I talked with rfontana about this one too. My thought here is that using "Public Domain" in the License tag in a spec file is wrong. We aren't disclaiming all rights to that SRPM. If anything, we are taking things available to us in the public domain and redistributing them under an open license. I feel we should be doing that to at least clarify the terms on our stuff around public domain source (e.g., spec files, build scripts, patches, etc.) and solve the problem of public domain concept variances across jurisdictions.

There is the Unlicense (https://unlicense.org/) which kind of works to address that. Might be a starting point.

D) Synchronize our naming schema with SPDX's naming schema (we're not too far off, but because we use GPLv2+ and they use GPL-2.0-or-later, this is going to impact a VERY large number of packages). Depending on how we address the MIT/BSD dilemma, this could be almost every package in Fedora.
E) Determine a generous window of time (after Fedora 32, but not impacting the RHEL cycle) during which maintainers will be strongly encouraged to audit and update the License tags. (It would be really helpful if we had a way of knowing when this happened, but I can't think of a way off hand). Packages which are not fixed in that time should start the non-responsive maintainer process.
This is a lot of work. I haven't been inclined to do it because, to be honest, the payoff is really low. That said, I'm certainly open to discussing if/how we transition. Probably needs to be on FESCo, not FPC.

Heh...not FPC, but FESCo. If I bring up something with FESCo it's supposed to go elsewhere. Too much policy and procedure at times.

My motivation for permitting this change is that the Fedora license string expressions are not easily parseable and prone to failure. I validate license expressions in rpmdiff and rpminspect and want that to be more reliable. The SPDX expressions are parseable.

That said, I have managed to work around the Fedora license strings and can validate them. I would like SPDX expressions to be an option for package maintainers but not have a hard cut over. It could be gradual over time. It doesn't need to happen all at once.
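To make the parseability point concrete, here is a minimal sketch of a checker for the SPDX form. It is not the rpmdiff or rpminspect implementation. It assumes a simplified grammar (identifiers, `AND`, `OR`, and parentheses, with `WITH` exceptions left out), and `KNOWN_IDS` is a hypothetical, illustrative subset rather than the full SPDX License List.

```python
import re

# Illustrative subset only; a real checker would load the full SPDX License List.
KNOWN_IDS = {
    "MPL-1.1", "GPL-2.0-or-later", "LGPL-2.0-or-later",
    "BSD-3-Clause", "BSD-4-Clause", "Python-2.0", "QPL-1.0", "MIT",
}

# A token is a parenthesis or a run of identifier characters.
TOKEN_RE = re.compile(r"\s*(\(|\)|[A-Za-z0-9.+-]+)")


def tokenize(expr):
    """Split an expression into parentheses and words; reject anything else."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN_RE.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_valid(expr):
    """Return True if expr is well formed and uses only identifiers in KNOWN_IDS."""
    try:
        tokens = tokenize(expr)
    except ValueError:
        return False
    pos = 0

    def term():
        # term := "(" expression ")" | identifier
        nonlocal pos
        if pos >= len(tokens):
            return False
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            if not expression():
                return False
            if pos < len(tokens) and tokens[pos] == ")":
                pos += 1
                return True
            return False
        return tok in KNOWN_IDS

    def expression():
        # expression := term (("AND" | "OR") term)*
        nonlocal pos
        if not term():
            return False
        while pos < len(tokens) and tokens[pos] in ("AND", "OR"):
            pos += 1
            if not term():
                return False
        return True

    return expression() and pos == len(tokens)
```

Because every SPDX identifier is a single whitespace-free token, the tokenizer never has to guess where one identifier ends and the next begins. A Fedora string such as `MPLv1.1 or GPLv2+` is simply rejected by this sketch, since neither its identifiers nor its lowercase operators are in the accepted set.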

FWIW, I'm supportive of dcantrell's idea, primarily because there is growing benefit for Red Hat in having a convenient way for downstream license metadata to be SPDX conformant expressions. The benefit admittedly has nothing to do with Fedora whatsoever. I hadn't thought of mixing the two systems before, and I can understand spot's objection to that (could that be addressed partially by having an SPDX prefix or something like that for those cases where there's an attempt to use SPDX syntax?). I also find the mixed solution attractive because it seems to me now that such a gradual solution is the only way we would ever see Fedora adopt the use of SPDX expressions instead of the Fedora-native strings.

AIUI, SPDX is poised to adopt a proposal for namespaces which could be a way to address the problem of there being licenses reflected in Fedora package license tags that have no corresponding official SPDX identifier. It could also perhaps provide a shorter-term solution to things like the MIT/BSD problem (imagine, e.g. "LicenseRef-Fedora-MIT" defined to mean approximately the same thing as present day Fedora MIT from an SPDX universe sort of lens -- though maybe at that point one has to question the benefit of switching to SPDX expressions at all, unless it's just the better-defined syntax and easier parseability, which I wouldn't discount).

Did this ever go anywhere? I'm going to go ahead and close this PR, but if there is FESCo buy-in and some plan for mass-switching packages over to a new scheme then please feel free to re-open or submit a new one.

Pull-Request has been closed by tibbs 3 months ago

Funny timing, as I meant to post here last week!

In any case, my name is Jilayne and I joined Red Hat legal earlier this year and have also been a co-lead of the SPDX legal team for about 10 years. I've spoken to Richard Fontana and Spot about this over the years and was happy to see David's proposal, which I'd like to revive with some ideas below.

First, some pertinent background: Around 2013-14, SPDX-legal undertook reviewing and adding many licenses on the Fedora Good list to the SPDX License List to enable adoption of SPDX identifiers. As a result, many of the Fedora Good licenses (including a fair number of the Fedora MIT and BSD category licenses) can be represented by SPDX license identifiers or expressions. SPDX-legal recently updated a comparison document with a current version of the Fedora good list and looked at the new licenses added since the 2013-14 work. (see https://docs.google.com/spreadsheets/d/1fi5SVzyCAL0UDravvkS6Us4lFwRiQy-l3qTUEkY92U0/edit#gid=243613621)

To build on the comments above, here are some ideas as to process:
1) Start using SPDX identifiers for all new packages. SPDX can represent ~80% of the current Fedora good list, so this should be relatively easy. An easy way to check a license to see if it's on the SPDX License List is to use this SPDX license-diff browser plugin (works with Chrome and Firefox, also at: https://github.com/spdx/spdx-license-diff)

2) If package maintainers come across a new package with a new license that is not on the Fedora Good list, then the same determination as to it being free/open would be made and if it's allowable by Fedora, then submit the license to be added to the SPDX License List (full description of process can be found here

3) Look at licenses on the Fedora Good list that are known to not be in the SPDX License List in the comparison document: check to see if that license still exists in Fedora (some are quite old). If so, then submit to SPDX to add.

4) Pick away at updating the license tag for existing packages: start with low-hanging fruit, i.e., the licenses for which there is a 1:1 match. Maybe there is some way to automate this?

5) For the various "category" ids that Fedora uses (e.g., exceptions, MIT, BSD, Public Domain, Copyright only), it's essentially the same as above: look at the actual text of the license or exception and see if there is a match on the SPDX License List. If so, then use it. If not, then submit to SPDX to add.
As far as worrying that the latter option may result in a flood of new license submissions to SPDX - that may happen, we don't really know until we get there. We could consider pooling some of these to see how many there are, submit en masse, and then SPDX can determine how best to deal with it in the case there are a lot.

Most importantly, I think that cross-collaboration between the two communities will be key. There may be ways to automate things, and we will want to think through coordination for new license submissions from Fedora so that coverage continues as new packages are added. I'm confident that with the sharp minds of each community looking at whatever challenges arise, we can figure out the most efficient way to collaborate.

Looking forward to hearing your thoughts!

Jilayne

I'm interested in helping move this forward. @tibbs what do you think would be the best approach?

Pull-Request has been reopened by tibbs 3 months ago

So, I've been hacking away at porting over spec-cleaner to our packaging policies off and on for a few years now. Among other things, it converts license identifiers to SPDX too, so we could leverage that to speed through a lot of the trivial stuff.

@ngompa - a tool that converts the Fedora identifiers to SPDX for the easy-to-convert licenses would be great. I poked around the repo a bit, but not being super technically savvy, I might need a bit more info as to how it works and how the data is being pulled.

We have to be a bit careful to only replace the identifiers for which there is an exact match, that is, where the Fedora identifier represents exactly one license, and ignore the rest. Put another way, where Fedora uses an identifier for a category of licenses (e.g., MIT, BSD, Public Domain, Copyright Only, GPL with exceptions), those do not lend themselves to a find-and-swap and will need a bit more investigation.

If it would be helpful, I could add a column to the compare spreadsheet and mark which ids are a one-to-one match and thus eligible for a simple find-and-swap. Is that something that would be helpful here?
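A find-and-swap restricted to exact matches could look roughly like the sketch below. The `ONE_TO_ONE` table is a hypothetical stand-in for the proposed spreadsheet column; its three entries are taken from the examples in this PR's diff and are not an authoritative mapping. Category identifiers such as MIT and BSD are deliberately absent from the table, so any expression containing them is returned for manual review instead of being rewritten.

```python
import re

# Hypothetical one-to-one table (entries borrowed from the examples in this PR).
ONE_TO_ONE = {
    "GPLv2+": "GPL-2.0-or-later",
    "LGPLv2+": "LGPL-2.0-or-later",
    "MPLv1.1": "MPL-1.1",
}

# Split on parentheses and on the lowercase Fedora operators, keeping the separators.
SPLIT_RE = re.compile(r"(\(|\)|\s+and\s+|\s+or\s+)")


def convert(expr):
    """Return (spdx_expression, []) on success or (None, ids_needing_review)."""
    out, review = [], []
    for part in SPLIT_RE.split(expr):
        word = part.strip()
        if word == "":
            continue
        if word in ("and", "or"):
            out.append(f" {word.upper()} ")
        elif word in ("(", ")"):
            out.append(word)
        elif word in ONE_TO_ONE:
            out.append(ONE_TO_ONE[word])
        else:
            # Category or unknown identifier: do not guess, flag it.
            review.append(word)
    if review:
        return None, review
    return "".join(out), []
```

For example, `convert("MPLv1.1 or GPLv2+")` yields the SPDX form used in the diff, while `convert("Python and LGPLv2+ and BSD and (MPLv1.1 or GPLv2+)")` is refused and reports `Python` and `BSD` as needing a human decision. Refusing the whole expression, rather than converting part of it, also avoids producing a tag that mixes the two syntaxes.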

So from the perspective of packaging guidelines and this committee, what's important is that we clearly tell packagers what they need to do. Honestly the actual packaging guidelines don't need to change all that much, as evidenced by the small size of this PR, though there are open issues such as whether we would continue to mention the "Fedora" license identifiers at all. Personally I think if we're going to change then we should just bite the bullet and just say "MUST use SPDX" but maybe I'm missing a good reason to keep the "old" identifiers as an option. I do think the Fedora syntax is superior because the capitalized "AND" and "OR" just make it harder to pick out the actual license IDs and make the whole thing look more like a bunch of yelling. But I'm sure that ship has long since sailed.

From the larger perspective, there's certainly more involved. There needs to be a good resource like the current license identifier list. (Maybe there is already; I don't know.) The work of getting basically all existing Fedora licenses into SPDX needs to happen before actually asking maintainers to convert, and I don't think it's at all fair to ask the individual maintainers to do that themselves. Automated conversion needs to be prepared and simply mass-applied where possible so in the common case maintainers don't need to do anything at all. Existing packager tools need to be adapted to handle this stuff where they don't already.

And sure, that's a lot of work. Certainly some of it needs to go through the change process. I think most of it is doable. The FPC change is probably somewhere in the middle of the process.

I'd like to chip in REUSE as a best practice that incorporates SPDX license identifiers, but in a frame that is very oriented towards developers. It provides a tutorial, FAQs, tools and much more.

For instance, KDE adopted it and also made their frameworks REUSE compliant, and the Linux kernel itself is ~70% REUSE compliant already.

Some questions raised here are also covered in REUSE already, e.g. licenses not in SPDX.

There are some tools already to help with mass-conversion, e.g. licensedigger, and also a plugin for ScanCode that was used earlier to convert notices in the Linux Kernel to SPDX.

So from the perspective of packaging guidelines and this committee, what's important is that we clearly tell packagers what they need to do. Honestly the actual packaging guidelines don't need to change all that much, as evidenced by the small size of this PR, though there are open issues such as whether we would continue to mention the "Fedora" license identifiers at all.

I think this PR would need some further updates. I'm not sure if it's easier to simply make a new PR or build off of this one?

Personally I think if we're going to change then we should just bite the bullet and just say "MUST use SPDX" but maybe I'm missing a good reason to keep the "old" identifiers as an option.

Agree. I think there will be a natural transition period during which Fedora or SPDX identifiers would be valid, as it's not possible to update all existing identifiers instantly.

I do think the Fedora syntax is superior because the capitalized "AND" and "OR" just make it harder to pick out the actual license IDs and make the whole thing look more like a bunch of yelling. But I'm sure that ship has long since sailed.

I think that ship has sailed, but I need to do some checking on capitalization validity on the SPDX side.

From the larger perspective, there's certainly more involved. There needs to be a good resource like the current license identifier list. (Maybe there is already; I don't know.)

What are you referring to specifically re: the current license identifier list?

I'm also aware that the rpminspect data will need to be completed as to SPDX identifiers. Haven't started on that, as I'm wondering if there is a way to leverage the compare spreadsheet I have to automate that a bit?

The work of getting basically all existing Fedora licenses into SPDX needs to happen before actually asking maintainers to convert, and I don't think it's at all fair to ask the individual maintainers to do that themselves.

I'm assuming you mean the licenses we have already identified as on the Fedora good list, but not on SPDX in the spreadsheet - there is a bit of chicken-egg on that one, in my view. Some of those licenses on the Fedora list were never added to SPDX back in 2013-14 because they were really old or we couldn't find the actual license text, so some research is needed first. That being said, I'd guess that those licenses don't represent a very big stumbling block to adoption of SPDX ids, if that makes sense.

Automated conversion needs to be prepared and simply mass-applied where possible so in the common case maintainers don't need to do anything at all. Existing packager tools need to be adapted to handle this stuff where they don't already.

And sure, that's a lot of work. Certainly some of it needs to go through the change process. I think most of it is doable. The FPC change is probably somewhere in the middle of the process.

What do you see as next steps in terms of approval by Fedora to get this started?

@ngompa - a tool that converts the Fedora identifiers to SPDX for the easy-to-convert licenses would be great. I poked around the repo a bit, but not being super technically savvy, I might need a bit more info as to how it works and how the data is being pulled.

We have to be a bit careful to only replace the identifiers for which there is an exact match, that is, where the Fedora identifier represents exactly one license, and ignore the rest. Put another way, where Fedora uses an identifier for a category of licenses (e.g., MIT, BSD, Public Domain, Copyright Only, GPL with exceptions), those do not lend themselves to a find-and-swap and will need a bit more investigation.

If it would be helpful, I could add a column to the compare spreadsheet and mark which ids are a one-to-one match and thus eligible for a simple find-and-swap. Is that something that would be helpful here?

For sure, yes!

I do think the Fedora syntax is superior because the capitalized "AND" and "OR" just make it harder to pick out the actual license IDs and make the whole thing look more like a bunch of yelling. But I'm sure that ship has long since sailed.

I think that ship has sailed, but I need to do some checking on capitalization validity on the SPDX side.

My understanding is that this is actually not mandatory. Earlier iterations of SPDX identifier boolean logic used lowercase instead of uppercase, and I believe lowercase is still permitted. Frankly, I would not want to force uppercase for logical operators (and, or, with).

I think this PR would need some further updates. I'm not sure if it's easier to simply make a new PR or build off of this one?

I think if the work isn't being done by the person who originally opened the PR then perhaps it's better to have a new one. But whatever works.

Agree. I think there will be a natural transition period during which Fedora or SPDX identifiers would be valid, as it's not possible to update all existing identifiers instantly.

Note that packaging guidelines changes aren't naturally retroactive; we don't ask every maintainer to change packages just because we changed the guidelines. In effect, they only really apply to new packages, and if someone wants to put in the work to identify packages to change and then actually change them, they are welcome to do so. There is existing distro policy around making mass changes like this.

What are you referring to specifically re: the current license identifier list?

The list that Fedora Legal maintains, which I think you refer to as the "Fedora good list".

I'm assuming you mean the licenses we have already identified as on the Fedora good list, but not on SPDX in the spreadsheet - there is a bit of chicken-egg on that one, in my view.

Well, that and doing things like making sure all of the existing things Fedora calls "MIT" actually exist.

In any case, this ticket shows that the chicken exists already. If Fedora is going to convert, then every license that's used for any package in Fedora needs to be in there. It's simply not going to work to tell packagers they both need to fix the packages manually and do bureaucracy to get SPDX identifiers allocated.

If, on the other hand, the reality is that some of the licenses Fedora-legal has collected aren't actually used, then I guess it would be wasted effort to allocate identifiers for them.

It's pretty much trivial to grep the current Fedora specfile corpus and see all of the license identifiers in use. It's more work to find everything that uses the more-granular Fedora IDs like "MIT" and track down just which variant they use.
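As a rough illustration of that survey, the sketch below counts the identifiers that appear in `License:` tags. It assumes a local directory of spec files (the `spec_dir` argument is hypothetical) and the current Fedora syntax with lowercase `and`/`or`; the function names are made up for this example, and it does not expand macros or handle conditionals.

```python
import re
from collections import Counter
from pathlib import Path

# One License: tag per line; subpackages may carry their own tag.
LICENSE_RE = re.compile(r"^License:[ \t]*(.+?)[ \t]*$", re.MULTILINE)
# Fedora expressions separate identifiers with parentheses and lowercase operators.
SPLIT_RE = re.compile(r"\(|\)|\s+and\s+|\s+or\s+")


def identifiers_in(spec_text):
    """Return every license identifier found in the License: tags of one spec file."""
    ids = []
    for value in LICENSE_RE.findall(spec_text):
        ids.extend(p.strip() for p in SPLIT_RE.split(value) if p.strip())
    return ids


def count_identifiers(spec_dir):
    """Count identifier usage across all *.spec files under spec_dir."""
    counts = Counter()
    for path in Path(spec_dir).rglob("*.spec"):
        counts.update(identifiers_in(path.read_text(errors="replace")))
    return counts
```

The resulting counts show which identifiers are in use and how often, which is the easy half. Working out which concrete license text sits behind each use of a category identifier such as "MIT" still requires looking at the sources.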

What do you see as next steps in terms of approval by Fedora to get this started?

  • Identify exactly what end result is desired. (For example, will we forever live with a mix of Fedora and SPDX tags or will everything be converted?)
  • Decide who is going to do this work.
  • Start with FESCo and the change process. A packaging guidelines change will be part of that process.

Once that's done then the real work starts. And sure, there will be an uncomfortable period where the tools haven't caught up and things complain and where some packages don't get converted correctly but the idea is to have a plan in place and the people available to get it done. What I'm trying to emphasize is that it just doesn't work to throw all of it on the package maintainers (or even all that much of it).

And for the record I'll note that I personally am pretty much ambivalent towards SPDX, but I'll do what I can to help if FESCo, during the feature process, decides that it's worth the effort.

I'll also note that one thing I don't want to repeat from SUSE when they adopted SPDX is that I don't want us to adopt SPDX identifiers directly. That is, I don't want us saying "go look at SPDX for the identifiers". That made things a mess when SPDX changed the identifiers and created all kinds of confusion (cf. the GPL-3.0+ to GPL-3.0-or-later situation). So like the Linux kernel project, instead of us pointing to SPDX, I want us to just create a new short identifier list that happens to be SPDX identifiers. That event, along with a few other things, soured me personally on SPDX, and I still don't particularly care for it. But I realized a while ago that I'm alone in this, and I'll try to help make this as painless as possible.

Note that we already have multiple tools in Fedora that do the opposite translation (i.e. SPDX → Fedora License identifiers), for example, in the rust2rpm.licensing python module. The data backing up the translation function uses a CSV table that maps SPDX ←→ Fedora identifiers, and writing a function that does the inverse translation should be trivial.
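A minimal sketch of such an inverse translation follows. The CSV layout (columns named `spdx` and `fedora`) and the sample rows are assumptions made for illustration; the actual table used by rust2rpm may be organized differently. The inversion is only trivial for Fedora identifiers that map back to exactly one SPDX identifier, so the sketch reports the others separately.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; the column names and rows are illustrative.
SAMPLE_CSV = """spdx,fedora
GPL-2.0-or-later,GPLv2+
MPL-1.1,MPLv1.1
BSD-2-Clause,BSD
BSD-3-Clause,BSD
"""


def invert(csv_text):
    """Invert an SPDX -> Fedora table.

    Returns (unique, ambiguous): unique maps a Fedora identifier to its single
    SPDX identifier; ambiguous maps a Fedora identifier to all SPDX candidates.
    """
    candidates = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        candidates[row["fedora"]].append(row["spdx"])
    unique = {fed: ids[0] for fed, ids in candidates.items() if len(ids) == 1}
    ambiguous = {fed: sorted(ids) for fed, ids in candidates.items() if len(ids) > 1}
    return unique, ambiguous
```

With the sample rows, `GPLv2+` and `MPLv1.1` invert cleanly, while `BSD` ends up in the ambiguous set because two SPDX identifiers collapse onto it.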

I'll also note that one thing I don't want to repeat from SUSE when they adopted SPDX is that I don't want us to adopt SPDX identifiers directly. That is, I don't want us saying "go look at SPDX for the identifiers". That made things a mess when SPDX changed the identifiers and created all kinds of confusion (cf. the GPL-3.0+ to GPL-3.0-or-later situation). So like the Linux kernel project, instead of us pointing to SPDX, I want us to just create a new short identifier list that happens to be SPDX identifiers. That event, along with a few other things, soured me personally on SPDX, and I still don't particularly care for it. But I realized a while ago that I'm alone in this, and I'll try to help make this as painless as possible.

Naturally, I don't agree and think that creating another list of identifiers (essentially) is a bad idea - even if they happen to be the same. SPDX has always made a commitment to keep the short identifiers stable and only make a change in extenuating situations.

Re: the GPL-3.0+ to GPL-3.0-or-later change specifically, I can assure you that was a very difficult and lengthy decision. The Linux kernel has adopted SPDX and does not have its own short identifiers. If you haven't seen them, these articles may provide some insight:
https://spdx.dev/license-list-3-0-released/
https://www.fsf.org/blogs/rms/rms-article-for-claritys-sake-please-dont-say-licensed-under-gnu-gpl-2
https://www.gnu.org/licenses/identify-licenses-clearly.html

The kernel still uses the old GPL identifiers (e.g., GPL-3.0 and GPL-3.0+) because at the point at which that process was getting started, the identifiers hadn't changed yet. As a result, you will see both GPL-3.0+ and GPL-3.0-or-later in the kernel and both are valid.

Note that we already have multiple tools in Fedora that do the opposite translation (i.e. SPDX → Fedora License identifiers), for example, in the rust2rpm.licensing python module. The data backing up the translation function uses a CSV table that maps SPDX ←→ Fedora identifiers, and writing a function that does the inverse translation should be trivial.

Can you point me to that data / CSV table? Do you know who did the work or the process for determining the match/compare?
Thanks!

The python module that does the translation is here:
https://pagure.io/fedora-rust/rust2rpm/blob/master/f/rust2rpm/licensing.py

The project is packaged as "python3-rust2rpm" and the module can be imported with "import rust2rpm.licensing" in python.

The CSV data is in the same repository, here:
https://pagure.io/fedora-rust/rust2rpm/blob/master/f/rust2rpm/spdx_to_fedora.csv

Looks like @zbyszek was the one who made this contribution:
https://pagure.io/fedora-rust/rust2rpm/c/73998d6adc11ce013bd3cc432e52cb308319170d?branch=master

Yes, the table in rust2rpm is based on the spreadsheet that @jlovejoy mentioned. Since the initial version, it has occasionally been updated with new licenses accepted by fedora-legal, on request, when they were needed for some new rust package. (It's a manual process because the Fedora list does not provide any easy way to follow additions.) The commits to the license list include justifications, so it should be easy to trace what was changed and why.

I'd support switching to the SPDX identifiers in Fedora. The main reason is that SPDX is fairly widely used (The kernel, various other projects, rust cargo files, python setup.py files, etc.), and if we use the exact same list in Fedora, packagers don't have to do any conversion or even think too much about the license. It's also easier to quickly check if the project is compatible with Fedora.

But the switch cannot require a "flag day" where we switch to the new syntax. We would need the two syntaxes to coexist for a few years. It is also very important that it remains unambiguous which syntax is used ("traditional Fedora" or SPDX). Thus, if we allow SPDX in the License field, I think we should prefix it like License: SPDX: <spdx-expression-here>. Then we can slowly convert packages to the new syntax.

In the cases where the mapping is 1:1, we could convert packages in a mass package update, e.g. change License: GPLv2+ → License: SPDX: GPL-2.0-or-later, License: GPLv3+ → License: SPDX: GPL-3.0-or-later, etc.

For the cases where the mapping is n:1 the conversion would need to happen manually, i.e. the maintainer would review the actual license and pick the appropriate SPDX tag. If there is no "flag day", this conversion would happen at maintainer leisure.

For the cases where there is no SPDX tag, we could keep the Fedora tags for a while. If we want to convert to new syntax, we could either accept the local syntax LicenseRef-Fedora-MIT, and define that as referring to whatever "License: MIT" means now, or ask for new licenses to be added to the SPDX list.

At some point far in the future, if the new syntax is established and used exclusively, we could drop the "SPDX:" prefix.
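
To make the proposed flow concrete, here is a minimal sketch of how a tool might sort License values into the cases above. The mapping table, the status names, and the `convert_tag` helper are hypothetical; only the `SPDX:` prefix comes from the proposal, and the sketch handles a single identifier rather than a full license expression.

```python
# Hypothetical mapping: Fedora identifier -> candidate SPDX identifiers.
FEDORA_TO_SPDX = {
    "GPLv2+": ["GPL-2.0-or-later"],
    "GPLv3+": ["GPL-3.0-or-later"],
    "BSD": ["BSD-2-Clause", "BSD-3-Clause"],
}

PREFIX = "SPDX: "


def convert_tag(value):
    """Return (new_value, status) for a single-identifier License value."""
    if value.startswith(PREFIX):
        # The prefix makes the syntax unambiguous, so nothing to do.
        return value, "already-spdx"
    candidates = FEDORA_TO_SPDX.get(value, [])
    if len(candidates) == 1:
        # 1:1 mapping: safe to convert in a mass update.
        return PREFIX + candidates[0], "converted"
    if candidates:
        # n:1 mapping: the maintainer has to check the actual license text.
        return value, "needs-manual-review"
    # No SPDX tag known: keep the Fedora tag for now.
    return value, "no-spdx-equivalent"
```

With a marker like this, a checker never has to guess which list a value belongs to: anything without the prefix is still a traditional Fedora tag.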

But the switch cannot require a "flag day" where we switch to the new syntax. We would need the two syntaxes to coexist for a few years. It is also very important that it remains unambiguous which syntax is used ("traditional Fedora" or SPDX). Thus, if we allow SPDX in the License field, I think we should prefix it like License: SPDX: <spdx-expression-here>. Then we can slowly convert packages to the new syntax.

This is overkill. For the unambiguous identifiers, we can just update the ones on the license list to use SPDX-based ones. We've already done that in the past when we updated the CDDL identifiers.

I don't think it's overkill. In particular, it addresses the point raised by spot above:

It would be really helpful if we had a way of knowing when this [the conversion] happened, but I can't think of a way off hand.

Without a clear annotation of which syntax is used, people and automatic checkers will be confused.

But why does this matter if we're just updating Fedora identifiers to match SPDX ones? We can't adopt SPDX directly anyway due to the problems around BSD and MIT variants, so we could just start by changing the unambiguous ones and doing some mass cleanups in one go.

I'll also note that one thing I don't want to repeat from SUSE when they adopted SPDX is that I don't want us to adopt SPDX identifiers directly. That is, I don't want us saying "go look at SPDX for the identifiers". That made things a mess when SPDX changed the identifiers and created all kinds of confusion (cf. the GPL-3.0+ to GPL-3.0-or-later situation). So like the Linux kernel project, instead of us pointing to SPDX, I want us to just create a new short identifier list that happens to be SPDX identifiers. That event, along with a few other things, soured me personally on SPDX, and I still don't particularly care for it. But I realized a while ago that I'm alone in this, and I'll try to help make this as painless as possible.

Naturally, I don't agree and think that creating another list of identifiers (essentially) is a bad idea - even if they happen to be the same. SPDX has always made a commitment to keep the short identifiers stable and only make a change in extenuating situations.

Re: the GPL-3.0+ to GPL-3.0-or-later change specifically, I can assure you that was a very difficult and lengthy decision. The Linux kernel has adopted SPDX and does not have its own short identifiers. If you haven't seen them, these articles may provide some insight:
https://spdx.dev/license-list-3-0-released/
https://www.fsf.org/blogs/rms/rms-article-for-claritys-sake-please-dont-say-licensed-under-gnu-gpl-2
https://www.gnu.org/licenses/identify-licenses-clearly.html

The kernel still uses the old GPL identifiers (e.g., GPL-3.0 and GPL-3.0+) because at the point at which that process was getting started, the identifiers hadn't changed yet. As a result, you will see both GPL-3.0+ and GPL-3.0-or-later in the kernel and both are valid.

SPDX has made more major changes in the past: changing the boolean logic format, changing the way BSD licenses are described, changing the way exceptions work, etc.

Who is to say there won't be more changes in the future? For example, one I expect to change some day is the Apache-2.0 tag, because it's ambiguous and confusing because of the common parlance of saying Apache <version> for the web server. That's the reason Fedora uses ASL 2.0, for example. When that happens, it'll be an easy change, but an annoying one.

Frankly, it's not sane for us to just say "go to SPDX" because all our tools are guaranteed to break again and again when SPDX makes changes. That was by far the worst aspect of openSUSE adopting SPDX. Even Debian didn't directly adopt SPDX, and instead normalized DEP-5 tags with SPDX identifiers. If part of the goal here is to make it easier to parse licensing information, then we cannot just adopt SPDX by saying "go look at SPDX", because that's not a stable reference.

Having our own list gives us some flexibility here: we can still do our own license review and submit licenses for SPDX recognition asynchronously, and we don't have to block packagers when a new license shows up that Red Hat/Fedora Legal deems okay to ship software. We already know the scheme in which SPDX typically names things, so we can make identifiers that SPDX will likely use, and submit them for inclusion at the same time.

I think the discussion is combining policy change with the mechanics of how we mass change a bunch of License fields in packages. Let's discuss the policy change first and worry about changing every single package later.

Well, I lied, let me discuss my motivation for proposing this. SPDX has existed for a while now and you see it in source files and developers usually are familiar with it. Those who have never seen an SPDX identifier can easily search online and find it. At this point I see SPDX expressions as sufficient for Fedora packaging purposes (where they work, see below before you reply). If the majority of packages start using these license expressions, then Fedora can get out of the business of maintaining the good licenses list.

Note: this doesn't mean Fedora is out of the business of deciding what licenses are acceptable for Fedora or not. It just gets us out of the business of coming up with short identifiers and expression syntax. I view this as a win.

But we already have the Fedora list and our identifiers, so rather than make a flag day change that would mostly annoy a lot of people for no good reason, let's instead adjust our packaging policy to allow SPDX expressions when they work. Say all new packages must use SPDX expressions whenever possible. Exceptions are packages that have an acceptable Fedora license for which there is no SPDX identifier. Call all of the Fedora identifiers deprecated at that point.

The next steps would be what @jlovejoy mentioned. For packages that can change over to SPDX expressions, maintainers can do that. For packages that lack an identifier, let's see if we can get an SPDX one. Etc.

OK, but let's say that we have half of packages using SPDX, and half using Fedora tags. Looking at a given license tag, how does one interpret it? Try it as SPDX, and if the tag is unknown look up the Fedora list? What if SPDX adds the tag later? In particular, consider "MIT": it is a valid SPDX identifier, and it is also on a Fedora list, as an identifier to a bunch of different licenses. Same story for "BSD".

But even ignoring the ambiguous cases, I think it'll be confusing for casual users: they will look at some license string, see that it is an SPDX tag, and then look at another case which is not valid SPDX, and be confused.
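
The lookup problem can be sketched in a few lines. The two sets below are tiny hypothetical excerpts, not the real lists, and `interpret` is an illustrative helper: it only shows what a tool can conclude from the tag alone when nothing marks which syntax was intended.

```python
# Hypothetical excerpts of the two identifier lists.
SPDX_IDS = {"MIT", "GPL-2.0-or-later", "BSD-3-Clause"}
FEDORA_IDS = {"MIT", "BSD", "GPLv2+"}


def interpret(tag):
    """Classify a bare tag by list membership alone."""
    in_spdx = tag in SPDX_IDS
    in_fedora = tag in FEDORA_IDS
    if in_spdx and in_fedora:
        # Present in both lists: membership cannot say which was meant.
        return "ambiguous"
    if in_spdx:
        return "spdx"
    if in_fedora:
        return "fedora"
    return "unknown"
```

Here `interpret("MIT")` returns `"ambiguous"`, and a tag that is `"fedora"` today could change to `"ambiguous"` if the same string were later added to the SPDX list.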

We should not even talk about it being SPDX in the first place. Just simply update the identifiers on our list and note the legacy identifier and start updating things. Talking about SPDX vs non-SPDX identifiers is irrelevant and pointless.

Packagers care about two things:

Whether it's SPDX, Fedora, or the Goof Troop who defines them is irrelevant.

OK, but let's say that we have half of packages using SPDX, and half using Fedora tags. Looking at a given license tag, how does one interpret it? Try it as SPDX, and if the tag is unknown look up the Fedora list? What if SPDX adds the tag later? In particular, consider "MIT": it is a valid SPDX identifier, and it is also on a Fedora list, as an identifier to a bunch of different licenses. Same story for "BSD".

But even ignoring the ambiguous cases, I think it'll be confusing for casual users: they will look at some license string, see that it is an SPDX tag, and then look at another case which is not valid SPDX, and be confused.

Ah, ok. So this was a concern of mine. I would say at this point in time, SPDX has more general recognition in the open source world than the Fedora license list. That's not to say that the Fedora short names are not recognized, just that a random sampling of people will likely reveal that SPDX has more recognition these days than other license identifiers. I base this merely on my own experience.

Regarding the question, how is the Fedora short name any less confusing than an SPDX abbreviation to someone new to Fedora or even working in open source? Both identifiers require cross referencing a list. I say "someone new" because for all of us in this thread, we have been working on Fedora for a long time so the Fedora names make total sense to us. The SPDX ones may even look weird or unusual, but I think that's because we've been looking at the Fedora ones for so long and our brains are calling that the baseline to compare against. In the case of SPDX, we gain the benefit of having an external project that is aimed at standardizing license short names and identifiers.

We already know that SPDX cannot represent every Fedora license, but that's not really the point of this PR. The long term goal could be viewed as "hey, it would be nice if SPDX could represent all of our approved licenses" but we aren't there yet. All this PR should be about is amending the policy to permit and encourage the use of SPDX license expressions in the License tag. If that is not possible for the package, the policy should note that it's ok to use the existing Fedora identifier when necessary.

I believe it was mentioned above but SPDX does include a spec which gives us a way to create new identifiers that fit with the SPDX spec. We could apply that to the names that SPDX currently does not represent, but I think that's a separate discussion entirely. For now, if a package maintainer wants to say "License: GPL-2.0-or-later" in the spec file, we should be ok with that.

We should not even talk about it being SPDX in the first place. Just simply update the identifiers on our list and note the legacy identifier and start updating things. Talking about SPDX vs non-SPDX identifiers is irrelevant and pointless.

Packagers care about two things:

Whether it's SPDX, Fedora, or the Goof Troop who defines them is irrelevant.

The PR is about using SPDX identifiers in place of the Fedora identifier list.

I know, I'm saying that your approach is wrong for adopting SPDX identifiers. Instead of confusing people by two different ones, we should just add the relevant ones to our lists and start making our tools suggest those identifiers in place of the older ones.

Isn't this statement irrelevant for the purposes of a policy change? You're discussing how to implement the policy.

Hi all, great to see so many comments here and sorry for being a bit behind in responding. I'll try to address the various things that have come up all in one post:

re: @ngompa 's tool that converts Fedora identifiers to SPDX for the easy-to-convert, one-to-one identifiers: I have added a column here and indicated those with a "Y" ("N/A" means no work is needed b/c the identifiers are already the same; "category" means Fedora uses an identifier to represent more than one license text and thus the license text of that actual package will need to be inspected) https://docs.google.com/spreadsheets/d/1fi5SVzyCAL0UDravvkS6Us4lFwRiQy-l3qTUEkY92U0/edit#gid=494935126
I have a bit more work to do for some that I need to do a bit of research on, so stay tuned. It might make sense to do the research on the 36 Fedora licenses that are not on SPDX first, and then finish this column

re: @zbyszek's description of the process - that is spot on. For the syntax tag, that sounds sensible. When converting, another option would be to use "SPDX-License-Identifier:" as that is the usual syntax used in source files. But in any case, I think the key thing your idea helps with is some kind of way to track what has been updated and what hasn't. The only problem with this approach is that there are ~98 licenses for which the ids are already the same and need no updating. How would that fit into your idea?

For the cases where the mapping is n:1 - indeed, this will require some manual work, most likely including finding the actual license text. This is where I'd expect collaboration with the SPDX community will be key to help with identifying/matching such license texts to SPDX (or shepherding new license requests)

re: @tibbs's question as to identifying the end result and the question of policy v. process that @dcantrell raised: I think the policy change would simply be: start using SPDX identifiers going forward.
In terms of process: that will be an easy switch as the coverage today is ~80% of the Fedora good list (thanks to a lot of work already done by the SPDX-legal community :) For the 36 Fedora licenses that are known to not be on SPDX, I intend to get started and elicit some help on this research as soon as it seems clear the direction this proposal is going.

Speaking of which - as I stated early on, for this to work best, cross-collaboration b/w the communities is key. I'm just not sure how to best tie things together, but I'm sure that having one person be a go-between is not it. :)

The only problem with this approach is that there are ~98 licenses for which the ids are already the same and need no updating. How would that fit into your idea?

If we decide to go forward with this, I expect that we'll do an automated mass-update [1] of spec files to automatically convert all packages where the mapping is unambiguous. This would also include the places where the tag is identical. So the change would be either like License: GPLv2+ → License: SPDX-License-Identifier: GPL-2.0-or-later (1:1 mapping) or License: Xorg → License: SPDX-License-Identifier: Xorg (for identical tags). The packages that require manual work would be left unchanged, but easy to identify.

[1] https://docs.fedoraproject.org/en-US/fesco/Mass_package_changes/
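
A mass update along these lines could look roughly like the sketch below. It is only an illustration: the mapping table and the `convert_spec` helper are hypothetical, it only handles a License field holding a single identifier, and a real script would need to deal with license expressions, macros, and subpackages.

```python
import re

# Hypothetical table of unambiguous mappings (1:1 or identical tags).
FEDORA_TO_SPDX = {
    "GPLv2+": "GPL-2.0-or-later",
    "Xorg": "Xorg",
}

MARKER = "SPDX-License-Identifier: "

# Capture the field name with its padding, then the value without trailing blanks.
LICENSE_RE = re.compile(r"^(License:[ \t]*)(.*?)[ \t]*$", re.MULTILINE)


def convert_spec(text):
    """Rewrite License lines whose value has an unambiguous SPDX mapping."""

    def repl(match):
        field, value = match.group(1), match.group(2)
        spdx = FEDORA_TO_SPDX.get(value)
        if spdx is None:
            # Already marked, ambiguous, or unknown: leave the line untouched.
            return match.group(0)
        return f"{field}{MARKER}{spdx}"

    return LICENSE_RE.sub(repl, text)
```

Because converted values start with the marker and are therefore no longer keys of the table, running the function a second time leaves the spec file unchanged, and every line without the marker is by construction one that still needs manual work.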

If we want to mark something as converted, we don't need to do that in the License field. We can put a magic comment in the spec file or commit a text file that indicates it. I don't want to see SPDX-License-Identifier or any other weird goop in the License field.

I think this misses part of the need. Once we begin to overlap SPDX and Fedora license names in the same field (think of these as two distinct sets of names which sometimes overlap), how do programmatic tools tell whether the license expression is intended as Fedora legacy names or SPDX names?

SPDX and Fedora have no names that overlap with different meanings. Crucially, there is no technical issue with mixing the two identifier formats except with BSD vs BSD-* variants from SPDX. But even then, there is no BSD identifier in SPDX.

This is being made more complex than it actually is. It is quite likely we will initially need to have a mixture anyway, as all the directly matching identifiers (all but BSD) will trivially remap. Afterward, we'll have to do a second-pass and re-audit all BSD/MIT licensed components to remap their licenses by hand.

@zbyszek is ascribing confusion that I think won't actually be there. @spot and I will have to update RPMLint to accept both names anyway, and updating the license list to have a copy of the SPDX identifiers we're going to use from now on would be a requirement to allow packagers to easily switch over to the "newer" format.

We will also likely need to retain our identifiers for Public Domain, custom redistributable, and similar. A so-called "pure-SPDX" will never happen because SPDX explicitly does not offer anything for this. That's the case in SUSE, too.

@jlovejoy As a note, the GPL+ mapping to SPDX is probably not an accurate way to describe that license. When GPL is used without a version declaration, it's more accurate to say GPL-any-version than GPL-1.0-or-later. Same goes for the AGPL/LGPL counterparts.
