#971 Allow SPDX license expression syntax in the License field.
Closed 2 years ago by dcantrell. Opened 4 years ago by dcantrell.
dcantrell/packaging-committee master  into  master

@@ -50,6 +50,10 @@ 

  

  The `+License:+` field must be filled with the appropriate license Short License identifier(s) from the "Good License" tables on the {fedora-licensing} page. If your license does not appear in the tables, it needs to be sent to legal@lists.fedoraproject.org (note that this list is moderated, only members may directly post). If the license is approved, it will be added to the appropriate table.

  

+ As an alternative to the Fedora Short License identifier(s), your package's `+License:+` field may contain a valid https://spdx.org[Software Package Data Exchange] (SPDX) expression using SPDX short names. The expression specification is available https://spdx.org/specifications[here]. The list of licenses is https://spdx.org/license-list[here]. If your package uses licenses not represented in the SPDX list, you should stick with the Fedora Short License identifier(s).

+ 

+ NOTE: Do not mix Fedora Short License identifier(s) and expression syntax with SPDX syntax.

+ 

  === "Distributable"

  

  In the past, Fedora (and Red Hat Linux) packages have used "Distributable" in the `+License:+` field. In virtually all of these cases, this was not correct. Fedora no longer permits packages to use "Distributable" as a valid License. If your package contains content which is freely redistributable without restrictions, but does not contain any license other than explicit permission from the content owner/creator, then that package can use "Freely redistributable without restriction" as its `+License:+` identifier.
@@ -62,6 +66,8 @@ 

  

  Some licenses include the version as part of the Short License Identifier. This is only done when multiple versions of the license differ in significant ways (e.g. one revision is GPLv2 incompatible, while a later version is not). Be careful to ensure that you use the correct Short License Identifier, as shown in the tables on the {fedora-licensing} page.

  

+ If using SPDX license expression syntax, be sure to use the correct identifier corresponding to the version(s) of the license that apply to your package.

+ 

  === "or later version" licenses

  

  Some licenses state that either the current version of the license or later versions may be used. It is important to note when a license states this. When a license has an "or later version" clause, we note that by appending a + to the Short License Identifier.
@@ -82,9 +88,15 @@ 

  License: MPLv1.1 or GPLv2+

  ....

  

+ Example with SPDX syntax:

+ 

+ ....

+ License: MPL-1.1 OR GPL-2.0-or-later

+ ....

+ 

  === Multiple Licensing Scenarios

  

- If your package contains files which are under multiple, distinct, and independent licenses, then the spec must reflect this by using "and" as a separator. Fedora maintainers are highly encouraged to avoid this scenario whenever reasonably possible, by dividing files into subpackages (subpackages can each have their own `+License:+` field).

+ If your package contains files which are under multiple, distinct, and independent licenses, then the spec must reflect this by using "and" as a separator (SPDX syntax uses "AND" as a separator). Fedora maintainers are highly encouraged to avoid this scenario whenever reasonably possible, by dividing files into subpackages (subpackages can each have their own `+License:+` field).

  

  Example:

  Package bar-utils contains some files under the Python License, some other files under the GNU Lesser General Public License v2 or later, and one file under the BSD License (no advertising). The package spec must have:
@@ -129,12 +141,18 @@ 

  If you are unlucky enough that your package possesses items under multiple, distinct, and independent licenses... AND some of those items are dual licensed, you must note the dual licensed items by wrapping them in parentheses (). Otherwise, the guidelines for Dual and Multiple Licensing apply.

  

  Example:

- Package baz-utils contains some files under the Python License, some other files under the GNU Lesser General Public License v2 or later, one file under the BSD License, no advertising, and one file which is dual licensed as Mozilla Public License v1.1 and GNU General Public License v2 or later. The package spec must have:

+ Package baz-utils contains some files under the Python License, some other files under the GNU Lesser General Public License v2 or later, one file under the BSD 3-clause License, no advertising, and one file which is dual licensed as Mozilla Public License v1.1 and GNU General Public License v2 or later. The package spec must have:

  

  ....

  License: Python and LGPLv2+ and BSD and (MPLv1.1 or GPLv2+)

  ....

  

+ Example with SPDX syntax:

+ 

+ ....

+ License: Python-2.0 AND LGPL-2.0-or-later AND BSD-3-Clause AND (MPL-1.1 OR GPL-2.0-or-later)

+ ....

+ 

  Since this is a multiple licensing scenario, the package must contain a comment explaining the multiple licensing breakdown. The actual implementation of this is left to the maintainer.

  

  === Mixed Source Licensing Scenario
@@ -147,6 +165,12 @@ 

  License: Python and (BSD with advertising and QPL)

  ....

  

+ Example with SPDX syntax:

+ 

+ ....

+ License: Python-2.0 AND (BSD-4-Clause AND QPL-1.0)

+ ....

+ 

  == Public Domain

  

  Works which are clearly marked as being in the Public Domain, and for which no evidence is known to contradict this statement, are treated in Fedora as being in the Public Domain, on the grounds that the intentions of the original creator are reflected by such a use, even if due to regional issues, it may not have been possible for the original creator to fully abandon all of their copyrights on the work and place it fully into the Public Domain. If you believe that a work in Fedora which is marked as being in the Public Domain is actually available under a copyright license, please inform us of this fact with details, and we will immediately investigate the claim.

The Software Package Data Exchange (https://spdx.org/) is a Linux
Foundation project with overlap with the Fedora license expression
syntax. Many projects are beginning to add SPDX-Identifier notation
to source files and use SPDX short names. Fedora spec files should
allow this in the License field as well.

Some limitations:

  • Fedora packagers need to use either the Fedora syntax or the SPDX
    syntax; they cannot mix the two.

  • SPDX is more expressive with regard to GPL, LGPL, and BSD variants,
    which may require packagers to look at the code in detail again.
    For license short names where we have combined all variants into one
    short name and SPDX breaks them out, we should cross-reference the
    License tag expression with the source if a packager changes to SPDX
    syntax.

  • SPDX offers a list of license exceptions you can put in the
    expression using the WITH keyword[1]. Fedora syntax tends to put
    that as freeform in the License expression. Where there is an SPDX
    identifier, that should be used.

  • If a current Fedora license expression cannot be translated to SPDX,
    it should remain using the Fedora syntax.

  • "Redistributable" is not a valid SPDX identifier.

  • "Public Domain" is not a valid SPDX identifier[2].

It would be nice to offer Fedora packagers the option of using SPDX
syntax or our existing syntax. The SPDX expressions are easier to
validate programmatically too.

[1] https://spdx.org/licenses/exceptions-index.html
[2] https://wiki.spdx.org/view/Legal_Team/Decisions/Dealing_with_Public_Domain_within_SPDX_Files
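
To illustrate the point about programmatic validation, here is a minimal sketch of a recursive-descent checker for SPDX-style expressions. It is not taken from rpminspect or any SPDX tool. The identifier sets are tiny hypothetical subsets (a real checker would load the full SPDX license and exception lists), operators are assumed to be upper case, and the `+` suffix and `LicenseRef-` forms are not handled.

```python
import re

# Illustrative subsets only; assumed for this sketch, not the full SPDX lists.
KNOWN_LICENSES = {"MPL-1.1", "GPL-2.0-or-later", "LGPL-2.0-or-later",
                  "BSD-3-Clause", "BSD-4-Clause", "Python-2.0", "QPL-1.0"}
KNOWN_EXCEPTIONS = {"Classpath-exception-2.0"}

TOKEN = re.compile(r"\s*(\(|\)|[A-Za-z0-9.+-]+)")


def tokenize(expr):
    """Split an expression into parentheses and identifier/operator words."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Checks the grammar with precedence WITH > AND > OR."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise ValueError("unexpected end of expression")
        self.pos += 1
        return token

    def parse_or(self):
        self.parse_and()
        while self.peek() == "OR":
            self.take()
            self.parse_and()

    def parse_and(self):
        self.parse_with()
        while self.peek() == "AND":
            self.take()
            self.parse_with()

    def parse_with(self):
        token = self.take()
        if token == "(":
            self.parse_or()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return
        if token not in KNOWN_LICENSES:
            raise ValueError(f"unknown license identifier: {token}")
        if self.peek() == "WITH":
            self.take()
            exception = self.take()
            if exception not in KNOWN_EXCEPTIONS:
                raise ValueError(f"unknown exception: {exception}")


def is_valid_spdx(expr):
    """Return True if expr parses completely against the sets above."""
    try:
        parser = Parser(tokenize(expr))
        parser.parse_or()
        if parser.peek() is not None:
            raise ValueError("trailing tokens")
        return True
    except ValueError:
        return False
```

With these assumptions, the SPDX examples from the diff are accepted, while a Fedora-syntax string such as `MPLv1.1 or GPLv2+` is rejected because its identifiers are not in the SPDX set.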

Is this approved by legal?

I have been talking to Richard Fontana about this recently and he at least seemed open to the idea. I don't think legal can rely on the License tag expressions in spec files anyway, so I don't know that they have a strong opinion on the subject or not. This policy is more about keeping ourselves honest.

I can ask for his input on this PR.

Please don't ask a specific person, but rather on the legal list. This has been discussed there back and forth. See for example https://lists.fedoraproject.org/archives/list/legal@lists.fedoraproject.org/thread/YT7A6MROI3CZNNEKO6RKLD3GG7NNL2LU/

I'm reluctant to permit mixing of the two systems. The Fedora syntax is notably different from SPDX in a number of key ways. The biggest challenge for Fedora is that SPDX treats all variants of common licenses like BSD and MIT as unique licenses, where in Fedora, we just refer to those as MIT or BSD. SPDX does not currently have most of the variants in their listing.

The correct path forward is not to allow mixing, but to:

A) Determine what Fedora is going to do about the MIT/BSD problem.
We either:
1) permit Fedora packages to use MIT/BSD to refer to any and all MIT/BSD variants (defying
SPDX methodology)
2) identify every MIT/BSD variant and get SPDX to give each of them a unique identifier, and
enforce their correct usage.
B) Determine how we will address Firmware licensing. I don't know if SPDX has firmware licenses in their database (or wants them). (This is what we gently refer to as "Redistributable").
C) Determine how we will address Public Domain works. Fedora permits PD works, but as you note, SPDX does not have a license entry for them.
D) Synchronize our naming schema with SPDX's naming schema. We're not too far off, but because we use GPLv2+ and they use GPL-2.0-or-later, this is going to impact a VERY large number of packages. Depending on how we address the MIT/BSD dilemma, this could be almost every package in Fedora.
E) Determine a generous window of time (after Fedora 32, but not impacting the RHEL cycle) during which maintainers will be strongly encouraged to audit and update the License tags. (It would be really helpful if we had a way of knowing when this happened, but I can't think of a way off hand). Packages which are not fixed in that time should start the non-responsive maintainer process.

This is a lot of work. I haven't been inclined to do it because, to be honest, the payoff is really low. That said, I'm certainly open to discussing if/how we transition. Probably needs to be on FESCo, not FPC.

Please don't ask a specific person, but rather on the legal list. This has been discussed there back and forth. See for example https://lists.fedoraproject.org/archives/list/legal@lists.fedoraproject.org/thread/YT7A6MROI3CZNNEKO6RKLD3GG7NNL2LU/

I've joined the legal mailing list. I didn't know there had been a discussion going on there for this.

I'm reluctant to permit mixing of the two systems. The Fedora syntax is notably different from SPDX in a number of key ways. The biggest challenge for Fedora is that SPDX treats all variants of common licenses like BSD and MIT as unique licenses, where in Fedora, we just refer to those as MIT or BSD. SPDX does not currently have most of the variants in their listing.

I find the SPDX breakdown more accurate, especially with regard to the BSD variations.

The correct path forward is not to allow mixing, but to:
A) Determine what Fedora is going to do about the MIT/BSD problem.
We either:
1) permit Fedora packages to use MIT/BSD to refer to any and all MIT/BSD variants (defying
SPDX methodology)
2) identify every MIT/BSD variant and get SPDX to give each of them a unique identifier, and
enforce their correct usage.

I must be missing something here. What BSD variants are missing from the current SPDX list? Putting copyright information aside and only looking at the licensing terms, SPDX captures every one I have encountered across about 1200 packages I've looked at (basically a minimal desktop install). I'm not sure what other variants are missing here.

B) Determine how we will address Firmware licensing. I don't know if SPDX has firmware licenses in their database (or wants them). (This is what we gently refer to as "Redistributable").

I talked with rfontana about this specifically. I feel that "Redistributable, no modification allowed" is not really useful or even accurate. It doesn't explain anything clearly and what it does say we can't do, you could argue we've done. Compare the Microsoft Core Fonts, for example, where the MS EULA allows redistribution of those fonts only in their original form. So you can run cabextract on the exec files and pull out the ttf files and repackage them. Can we do that for firmware? Is that modification? If not, why not?

I would like to see something specifically written to address firmware redistribution. I think working with SPDX would be the appropriate thing here.

C) Determine how we will address Public Domain works. Fedora permits PD works, but as you note, SPDX does not have a license entry for them.

I talked with rfontana about this one too. My thought here is that using "Public Domain" in the License tag in a spec file is wrong. We aren't disclaiming all rights to that SRPM. If anything we are taking things available to us in the public domain and redistributing them under an open license. I feel we should be doing that to at least clarify the terms on our stuff around public domain source (e.g., spec files, build scripts, patches, etc.) and solve the problem of public domain concept variances across jurisdictions.

There is the Unlicense (https://unlicense.org/) which kind of works to address that. Might be a starting point.

D) Synchronize our naming schema with SPDX's naming schema. We're not too far off, but because we use GPLv2+ and they use GPL-2.0-or-later, this is going to impact a VERY large number of packages. Depending on how we address the MIT/BSD dilemma, this could be almost every package in Fedora.
E) Determine a generous window of time (after Fedora 32, but not impacting the RHEL cycle) during which maintainers will be strongly encouraged to audit and update the License tags. (It would be really helpful if we had a way of knowing when this happened, but I can't think of a way off hand). Packages which are not fixed in that time should start the non-responsive maintainer process.
This is a lot of work. I haven't been inclined to do it because, to be honest, the payoff is really low. That said, I'm certainly open to discussing if/how we transition. Probably needs to be on FESCo, not FPC.

Heh...not FPC, but FESCo. If I bring up something with FESCo it's supposed to go elsewhere. Too much policy and procedure at times.

My motivation for permitting this change is that the Fedora license string expressions are not easily parseable and prone to failure. I validate license expressions in rpmdiff and rpminspect and want that to be more reliable. The SPDX expressions are parseable.

That said, I have managed to work around the Fedora license strings and can validate them. I would like SPDX expressions to be an option for package maintainers but not have a hard cut over. It could be gradual over time. It doesn't need to happen all at once.
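
One plausible source of the parsing difficulty, sketched below, is that the lowercase words `and`, `or`, and `with` can also appear inside a Fedora short name, as in the `BSD with advertising` example from the guidelines. This is an illustration only and is not how rpmdiff or rpminspect work. `MULTI_WORD_NAMES` is a hypothetical hand-maintained list.

```python
import re


def naive_split(license_field):
    """Split on the words and/or/with, ignoring parentheses (deliberately naive)."""
    flat = license_field.replace("(", " ").replace(")", " ").strip()
    parts = re.split(r"\s+(?:and|or|with)\s+", flat)
    return [part.strip() for part in parts]


# Hypothetical list of short names that contain operator-like words.
MULTI_WORD_NAMES = ["BSD with advertising"]


def split_with_known_names(license_field):
    """Protect known multi-word names with placeholders before splitting."""
    protected = license_field
    for index, name in enumerate(MULTI_WORD_NAMES):
        protected = protected.replace(name, f"@@{index}@@")
    parts = naive_split(protected)
    for index, name in enumerate(MULTI_WORD_NAMES):
        parts = [part.replace(f"@@{index}@@", name) for part in parts]
    return parts


field = "Python and (BSD with advertising and QPL)"
print(naive_split(field))             # ['Python', 'BSD', 'advertising', 'QPL']
print(split_with_known_names(field))  # ['Python', 'BSD with advertising', 'QPL']
```

The naive version breaks a single license name into two bogus identifiers. The workaround needs a name list that has to be kept in sync by hand, which SPDX avoids because its identifiers contain no spaces.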

FWIW, I'm supportive of dcantrell's idea, primarily because there is growing benefit for Red Hat in having a convenient way for downstream license metadata to be SPDX conformant expressions. The benefit admittedly has nothing to do with Fedora whatsoever. I hadn't thought of mixing the two systems before, and I can understand spot's objection to that (could that be addressed partially by having an SPDX prefix or something like that for those cases where there's an attempt to use SPDX syntax?). I also find the mixed solution attractive because it seems to me now that such a gradual solution is the only way we would ever see Fedora adopt the use of SPDX expressions instead of the Fedora-native strings.

AIUI, SPDX is poised to adopt a proposal for namespaces which could be a way to address the problem of there being licenses reflected in Fedora package license tags that have no corresponding official SPDX identifier. It could also perhaps provide a shorter-term solution to things like the MIT/BSD problem (imagine, e.g. "LicenseRef-Fedora-MIT" defined to mean approximately the same thing as present day Fedora MIT from an SPDX universe sort of lens -- though maybe at that point one has to question the benefit of switching to SPDX expressions at all, unless it's just the better-defined syntax and easier parseability, which I wouldn't discount).
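
As a rough sketch of what the "LicenseRef-Fedora-MIT" idea could look like in practice, the hypothetical helper below builds such a reference from a Fedora short name. The `LicenseRef-Fedora-` prefix comes from the example above. The function name and the sanitizing rule are assumptions, chosen because SPDX reference names are limited to letters, digits, "." and "-".

```python
import re


def fedora_license_ref(fedora_id):
    """Build a LicenseRef-style name from a Fedora short name (hypothetical helper)."""
    # Replace every run of characters outside letters, digits and "." with "-".
    slug = re.sub(r"[^A-Za-z0-9.]+", "-", fedora_id).strip("-")
    if not slug:
        raise ValueError("identifier has no usable characters")
    return f"LicenseRef-Fedora-{slug}"


print(fedora_license_ref("MIT"))            # LicenseRef-Fedora-MIT
print(fedora_license_ref("Public Domain"))  # LicenseRef-Fedora-Public-Domain
```

Note that this naive sanitizing drops characters such as "+", so two different Fedora names could collide. A real scheme would need an explicit, reviewed table instead.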

Did this ever go anywhere? I'm going to go ahead and close this PR but if there is FESCo buy-in and some plan for mass-switching packages over to a new scheme then please feel free to re-open or submit a new one.

Pull-Request has been closed by tibbs

2 years ago

Funny timing, as I meant to post here last week!

In any case, my name is Jilayne and I joined Red Hat legal earlier this year and have also been a co-lead of the SPDX legal team for about 10 years. I've spoken to Richard Fontana and Spot about this over the years and was happy to see David's proposal, which I'd like to revive with some ideas below.

First, some pertinent background: Around 2013-14, SPDX-legal undertook reviewing and adding many licenses on the Fedora Good list to the SPDX License List to enable adoption of SPDX identifiers. As a result, many of the Fedora Good licenses (including a fair number of the Fedora MIT and BSD category licenses) can be represented by SPDX license identifiers or expressions. SPDX-legal recently updated a comparison document with a current version of the Fedora good list and looked at the new licenses added since the 2013-14 work. (see https://docs.google.com/spreadsheets/d/1fi5SVzyCAL0UDravvkS6Us4lFwRiQy-l3qTUEkY92U0/edit#gid=243613621)

To build on the comments above, here are some ideas as to process:
1) Start using SPDX identifiers for all new packages. SPDX can represent ~80% of the current Fedora good list, so this should be relatively easy. An easy way to check a license to see if it's on the SPDX License List is to use this SPDX license-diff browser plugin (works with Chrome and Firefox, also at: https://github.com/spdx/spdx-license-diff).

2) If package maintainers come across a new package with a new license that is not on the Fedora Good list, then the same determination as to it being free/open would be made and if it's allowable by Fedora, then submit the license to be added to the SPDX License List (full description of process can be found here).

3) Look at licenses on the Fedora Good list that are known to not be in the SPDX License List in the comparison document: check to see if that license still exists in Fedora (some are quite old). If so, then submit to SPDX to add.

4) Pick away at updating the license tag for existing packages: start with low-hanging fruit, i.e., the licenses for which there is a 1:1 match. Maybe there is some way to automate this?

5) For the various "category" ids that Fedora uses (e.g., exceptions, MIT, BSD, Public Domain, Copyright only) - it's essentially the same as above: look at the actual text of the license or exception and see if there is a match on the SPDX License List: if so, then use it. If not, then submit to SPDX to add.
As far as worrying that the latter option may result in a flood of new license submissions to SPDX - that may happen, we don't really know until we get there. We could consider pooling some of these to see how many there are, submit en masse, and then SPDX can determine how best to deal with it in the case there are a lot.

Most importantly, I think that cross-collaboration between the two communities will be key. There may be ways to automate things, and we will want to think through coordination for new license submissions from Fedora so that coverage continues as new packages are added. I'm confident that with the sharp minds of each community looking at whatever challenges arise, we can figure out the most efficient way to collaborate.

Looking forward to hearing your thoughts!

Jilayne

I'm interested in helping move this forward. @tibbs what do you think would be the best approach?

Pull-Request has been reopened by tibbs

2 years ago

So, I've been hacking away at porting over spec-cleaner to our packaging policies off and on for a few years now. Among other things, it converts license identifiers to SPDX too, so we could leverage that to speed through a lot of the trivial stuff.

@ngompa - a tool that converts the Fedora identifiers to SPDX for the easy-to-convert licenses would be great. I poked around the repo a bit, but not being super technically savvy, I might need a bit more info as to how it works and how the data is being pulled.

We have to be a bit careful to only replace the identifiers for which there is an exact match - that is, where the Fedora identifier represents exactly one license - and ignore the rest. Put another way, where Fedora uses an identifier for a category of licenses (i.e., MIT, BSD, Public Domain, Copyright Only, GPL with exceptions) - those do not lend themselves to a find-and-swap and will need a bit more investigation.

If it would be helpful, I could add a column to the compare spreadsheet and mark which ids are a one-to-one match and thus eligible for a simple find-and-swap. Is that something that would be helpful here?
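
The find-and-swap idea could be sketched roughly as follows. The table holds only pairs that appear in this PR's own examples and is assumed to be one-to-one for illustration. It is not the comparison spreadsheet or spec-cleaner's data. Anything not in the table, including category identifiers such as MIT or BSD, is reported for manual review and the field is left unconverted.

```python
import re

# Pairs taken from the examples in this PR's diff; assumed one-to-one here.
ONE_TO_ONE = {
    "MPLv1.1": "MPL-1.1",
    "GPLv2+": "GPL-2.0-or-later",
    "LGPLv2+": "LGPL-2.0-or-later",
    "QPL": "QPL-1.0",
}

# Keep the separators in the split result so the structure can be rebuilt.
SPLIT = re.compile(r"(\s+and\s+|\s+or\s+|\(|\))")


def convert(license_field):
    """Return (spdx_expression, []) or (None, identifiers_needing_review)."""
    output, review = [], []
    for piece in SPLIT.split(license_field):
        word = piece.strip()
        if word in ("and", "or"):
            output.append(f" {word.upper()} ")
        elif word in ("(", ")", ""):
            output.append(word)
        elif word in ONE_TO_ONE:
            output.append(ONE_TO_ONE[word])
        else:
            # Not a known exact match: leave it for a human to investigate.
            review.append(word)
    if review:
        return None, review
    return "".join(output), review


print(convert("MPLv1.1 or GPLv2+"))
# ('MPL-1.1 OR GPL-2.0-or-later', [])
print(convert("Python and LGPLv2+ and BSD and (MPLv1.1 or GPLv2+)"))
# (None, ['Python', 'BSD'])
```

Refusing to emit a partially converted string also avoids mixing Fedora and SPDX syntax in one License field.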

So from the perspective of packaging guidelines and this committee, what's important is that we clearly tell packagers what they need to do. Honestly the actual packaging guidelines don't need to change all that much, as evidenced by the small size of this PR, though there are open issues such as whether we would continue to mention the "Fedora" license identifiers at all. Personally I think if we're going to change then we should just bite the bullet and just say "MUST use SPDX" but maybe I'm missing a good reason to keep the "old" identifiers as an option. I do think the Fedora syntax is superior because the capitalized "AND" and "OR" just make it harder to pick out the actual license IDs and make the whole thing look more like a bunch of yelling. But I'm sure that ship has long since sailed.

From the larger perspective, there's certainly more involved. There needs to be a good resource like the current license identifier list. (Maybe there is already; I don't know.) The work of getting basically all existing Fedora licenses into SPDX needs to happen before actually asking maintainers to convert, and I don't think it's at all fair to ask the individual maintainers to do that themselves. Automated conversion needs to be prepared and simply mass-applied where possible so in the common case maintainers don't need to do anything at all. Existing packager tools need to be adapted to handle this stuff where they don't already.

And sure, that's a lot of work. Certainly some of it needs to go through the change process. I think most of it is doable. The FPC change is probably somewhere in the middle of the process.

I'd like to chip in REUSE as a best practice that incorporates SPDX license identifiers, but in a frame that is very oriented towards developers. It provides a tutorial, FAQs, tools and much more.

For instance, KDE adopted it and also made their frameworks REUSE compliant, and the Linux kernel itself is ~70% REUSE compliant already.

Some questions raised here are also covered in REUSE already, e.g. licenses not in SPDX.

There are some tools already to help with mass-conversion, e.g. licensedigger, and also a plugin for ScanCode that was used earlier to convert notices in the Linux Kernel to SPDX.

So from the perspective of packaging guidelines and this committee, what's important is that we clearly tell packagers what they need to do. Honestly the actual packaging guidelines don't need to change all that much, as evidenced by the small size of this PR, though there are open issues such as whether we would continue to mention the "Fedora" license identifiers at all.

I think this PR would need some further updates. I'm not sure if it's easier to simply make a new PR or build off of this one?

Personally I think if we're going to change then we should just bite the bullet and just say "MUST use SPDX" but maybe I'm missing a good reason to keep the "old" identifiers as an option.

Agree. I think there will be a natural transition period during which Fedora or SPDX identifiers would be valid, as it's not possible to update all existing identifiers instantly.

I do think the Fedora syntax is superior because the capitalized "AND" and "OR" just make it harder to pick out the actual license IDs and make the whole thing look more like a bunch of yelling. But I'm sure that ship has long since sailed.

I think that ship has sailed, but I need to do some checking on capitalization validity on the SPDX side.

From the larger perspective, there's certainly more involved. There needs to be a good resource like the current license identifier list. (Maybe there is already; I don't know.)

What are you referring to specifically re: the current license identifier list?

I'm also aware that the rpminspect data will need to be completed as to SPDX identifiers. Haven't started on that, as I'm wondering if there is a way to leverage the compare spreadsheet I have to automate that a bit?

The work of getting basically all existing Fedora licenses into SPDX needs to happen before actually asking maintainers to convert, and I don't think it's at all fair to ask the individual maintainers to do that themselves.

I'm assuming you mean the licenses we have already identified as on the Fedora good list, but not on SPDX in the spreadsheet - there is a bit of chicken-egg on that one, in my view. Some of those licenses on the Fedora list were never added to SPDX back in 2013-14 because they were really old or we couldn't find the actual license text, so some research is needed first. That being said, I'd guess that those licenses don't represent a very big stumbling block to adoption of SPDX ids, if that makes sense.

Automated conversion needs to be prepared and simply mass-applied where possible so in the common case maintainers don't need to do anything at all. Existing packager tools need to be adapted to handle this stuff where they don't already.

And sure, that's a lot of work. Certainly some of it needs to go through the change process. I think most of it is doable. The FPC change is probably somewhere in the middle of the process.

What do you see as next steps in terms of approval by Fedora to get this started?

@ngompa - a tool that converts the Fedora identifiers to SPDX for the easy-to-convert licenses would be great. I poked around the repo a bit, but not being super technically savvy, I might need a bit more info as to how it works and how the data is being pulled.

We have to be a bit careful to only replace the identifiers for which there is an exact match - that is, where the Fedora identifier represents exactly one license - and ignore the rest. Put another way, where Fedora uses an identifier for a category of licenses (i.e., MIT, BSD, Public Domain, Copyright Only, GPL with exceptions) - those do not lend themselves to a find-and-swap and will need a bit more investigation.

If it would be helpful, I could add a column to the compare spreadsheet and mark which ids are a one-to-one match and thus eligible for a simple find-and-swap. Is that something that would be helpful here?

For sure, yes!

I do think the Fedora syntax is superior because the capitalized "AND" and "OR" just make it harder to pick out the actual license IDs and make the whole thing look more like a bunch of yelling. But I'm sure that ship has long since sailed.

I think that ship has sailed, but I need to do some checking on capitalization validity on the SPDX side.

My understanding is that this is actually not mandatory. Earlier iterations of SPDX identifier boolean logic used lowercase instead of uppercase, and I believe lowercase is still permitted. Frankly, I would not want to force uppercase for logical operators (and, or, with).

I think this PR would need some further updates. I'm not sure if it's easier to simply make a new PR or build off of this one?

I think if the work isn't being done by the person who originally opened the PR then perhaps it's better to have a new one. But whatever works.

Agree. I think there will be a natural transition period during which Fedora or SPDX identifiers would be valid, as it's not possible to update all existing identifiers instantly.

Note that packaging guidelines changes aren't naturally retroactive; we don't ask every maintainer to change packages just because we changed the guidelines. In effect, they only really apply to new packages, and if someone wants to put in the work to identify packages to change and then actually change them, they are welcome to do so. There is existing distro policy around making mass changes like this.

What are you referring to specifically re: the current license identifier list?

The list that Fedora Legal maintains, which I think you refer to as the "Fedora good list".

I'm assuming you mean the licenses we have already identified as on the Fedora good list, but not on SPDX in the spreadsheet - there is a bit of chicken-egg on that one, in my view.

Well, that and doing things like making sure all of the existing things Fedora calls "MIT" actually exist.

In any case, this ticket shows that the chicken exists already. If Fedora is going to convert, then every license that's used for any package in Fedora needs to be in there. It's simply not going to work to tell packagers they both need to fix the packages manually and do bureaucracy to get SPDX identifiers allocated.

If, on the other hand, the reality is that some of the licenses Fedora-legal has collected aren't actually used, then I guess it would be wasted effort to allocate identifiers for them.

It's pretty much trivial to grep the current Fedora specfile corpus and see all of the license identifiers in use. It's more work to find everything that uses the coarser-grained Fedora IDs like "MIT" and track down just which variant they use.
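
A minimal sketch of that kind of scan is shown below, assuming a local directory tree of `.spec` files. The function names are hypothetical. It only reads literal `License:` lines, so it does not expand RPM macros and it counts subpackage License fields as separate entries.

```python
import re
from collections import Counter
from pathlib import Path

LICENSE_LINE = re.compile(r"^License:[ \t]*(.+?)[ \t]*$", re.MULTILINE)
# Split on and/or in either case, and on parentheses. "with" is left alone
# because it can be part of a Fedora short name.
SEPARATORS = re.compile(r"\s+(?:and|or|AND|OR)\s+|[()]")


def count_identifiers(spec_texts):
    """Count each identifier found in the License fields of the given texts."""
    counts = Counter()
    for text in spec_texts:
        for field in LICENSE_LINE.findall(text):
            for identifier in SEPARATORS.split(field):
                identifier = identifier.strip()
                if identifier:
                    counts[identifier] += 1
    return counts


def scan_tree(root):
    """Count identifiers across every *.spec file below root."""
    texts = (path.read_text(encoding="utf-8", errors="replace")
             for path in Path(root).rglob("*.spec"))
    return count_identifiers(texts)
```

Calling `scan_tree("path/to/specs").most_common()` would list identifiers by frequency. As noted above, this shows which coarse identifiers such as "MIT" are in use but not which variant each package actually carries.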

What do you see as next steps in terms of approval by Fedora to get this started?

  • Identify exactly what end result is desired. (For example, will we forever live with a mix of Fedora and SPDX tags or will everything be converted?)
  • Decide who is going to do this work.
  • Start with FESCo and the change process. A packaging guidelines change will be part of that process.

Once that's done then the real work starts. And sure, there will be an uncomfortable period where the tools haven't caught up and things complain and where some packages don't get converted correctly but the idea is to have a plan in place and the people available to get it done. What I'm trying to emphasize is that it just doesn't work to throw all of it on the package maintainers (or even all that much of it).

And for the record I'll note that I personally am pretty much ambivalent towards SPDX, but I'll do what I can to help if FESCo, during the feature process, decides that it's worth the effort.

I'll also note that one thing I don't want to repeat from SUSE when they adopted SPDX is that I don't want us to adopt SPDX identifiers directly. That is, I don't want us saying "go look at SPDX for the identifiers". That made things a mess when SPDX changed the identifiers and created all kinds of confusion. (Cf. the GPL-3.0+ to GPL-3.0-or-later situation.) So like the Linux kernel project, instead of us pointing to SPDX, I want us to just create a new short identifier list that happens to be SPDX identifiers. That event, along with a few other things, soured me personally on SPDX, and I still don't particularly care for it. But I realized a while ago that I'm alone in this, and I'll try to help make this as painless as possible.

Note that we already have multiple tools in Fedora that do the opposite translation (i.e. SPDX → Fedora License identifiers), for example, in the rust2rpm.licensing python module. The data backing up the translation function uses a CSV table that maps SPDX ←→ Fedora identifiers, and writing a function that does the inverse translation should be trivial.
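
A rough sketch of such an inverse lookup follows. The column names and sample rows are hypothetical and are not the actual rust2rpm data. The sample also shows where inversion stops being trivial: when several SPDX identifiers map to one Fedora identifier, the reverse direction is ambiguous.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; column names are assumptions for this sketch.
SAMPLE_CSV = """spdx,fedora
MPL-1.1,MPLv1.1
GPL-2.0-or-later,GPLv2+
BSD-3-Clause,BSD
BSD-2-Clause,BSD
"""


def load_spdx_to_fedora(handle):
    """Read the forward mapping (SPDX -> Fedora) from a CSV file object."""
    return {row["spdx"]: row["fedora"] for row in csv.DictReader(handle)}


def invert(spdx_to_fedora):
    """Build Fedora -> list of SPDX identifiers, keeping every candidate."""
    inverse = defaultdict(list)
    for spdx_id, fedora_id in spdx_to_fedora.items():
        inverse[fedora_id].append(spdx_id)
    return dict(inverse)


def fedora_to_spdx(fedora_id, inverse):
    """Return the SPDX identifier only when the match is unambiguous."""
    candidates = inverse.get(fedora_id, [])
    return candidates[0] if len(candidates) == 1 else None


forward = load_spdx_to_fedora(io.StringIO(SAMPLE_CSV))
inverse = invert(forward)
print(fedora_to_spdx("GPLv2+", inverse))  # GPL-2.0-or-later
print(fedora_to_spdx("BSD", inverse))     # None (two candidates)
```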

I'll also note that one thing I don't want to repeat from SUSE when they adopted SPDX is that I don't want us to adopt SPDX identifiers directly. That is, I don't want us saying "go look at SPDX for the identifiers". That made things a mess when SPDX changed the identifiers and created all kinds of confusion (cf. the GPL-3.0+ to GPL-3.0-or-later situation). So like the Linux kernel project, instead of us pointing to SPDX, I want us to just create a new short identifier list that happens to be SPDX identifiers. That event, along with a few other things, soured me personally on SPDX, and I still don't particularly care for it. But I realized a while ago that I'm alone in this, and I'll try to help make this as painless as possible.

Naturally, I don't agree and think that creating another list of identifiers (essentially) is a bad idea - even if they happen to be the same. SPDX has always made a commitment to keep the short identifiers stable and only make a change in extenuating situations.

Re: the GPL-3.0+ to GPL-3.0-or-later change specifically, I can assure you that was a very difficult and lengthy decision. The Linux kernel has adopted SPDX and does not have their own short identifiers. If you hadn't seen, these articles may provide some insight:
https://spdx.dev/license-list-3-0-released/
https://www.fsf.org/blogs/rms/rms-article-for-claritys-sake-please-dont-say-licensed-under-gnu-gpl-2
https://www.gnu.org/licenses/identify-licenses-clearly.html

The kernel still uses the old GPL identifiers (e.g., GPL-3.0 and GPL-3.0+) because at the point at which that process was getting started, the identifiers hadn't changed yet. As a result, you will see both GPL-3.0+ and GPL-3.0-or-later in the kernel and both are valid.

Note that we already have multiple tools in Fedora that do the opposite translation (i.e. SPDX → Fedora License identifiers), for example, in the rust2rpm.licensing python module. The data backing up the translation function uses a CSV table that maps SPDX ←→ Fedora identifiers, and writing a function that does the inverse translation should be trivial.

Can you point me to that data / CSV table? Do you know who did the work or the process for determining the match/compare?
Thanks!

The python module that does the translation is here:
https://pagure.io/fedora-rust/rust2rpm/blob/master/f/rust2rpm/licensing.py

The project is packaged as "python3-rust2rpm" and the module can be imported with "import rust2rpm.licensing" in python.

The CSV data is in the same repository, here:
https://pagure.io/fedora-rust/rust2rpm/blob/master/f/rust2rpm/spdx_to_fedora.csv

Looks like @zbyszek was the one who made this contribution:
https://pagure.io/fedora-rust/rust2rpm/c/73998d6adc11ce013bd3cc432e52cb308319170d?branch=master

Yes, the table in rust2rpm is based on the spreadsheet that @jlovejoy mentioned. Since the initial version, it has occasionally been updated with new licenses accepted by fedora-legal, whenever one was needed for a new Rust package. (It's a manual process because the Fedora list does not provide any easy way to follow additions.) The commits to the license list include justifications, so it should be easy to trace what was changed and why.
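As a rough illustration of the "inverse translation" discussed above, here is a minimal sketch that inverts an SPDX → Fedora table. The column names (`spdx`, `fedora`) and the inline sample rows are assumptions made for this example; the real `spdx_to_fedora.csv` in rust2rpm may be laid out differently, and the helper names are hypothetical. The sketch also shows why the inverse is only trivial for 1:1 rows: a Fedora tag such as BSD has several SPDX candidates.

```python
import csv
import io

# Hypothetical excerpt for illustration only; the real rust2rpm CSV may use
# different column names and contains many more rows.
SAMPLE_CSV = """spdx,fedora
GPL-2.0-or-later,GPLv2+
GPL-3.0-or-later,GPLv3+
Apache-2.0,ASL 2.0
BSD-2-Clause,BSD
BSD-3-Clause,BSD
"""


def load_fedora_to_spdx(text):
    """Invert an SPDX -> Fedora table, keeping every SPDX candidate per Fedora tag."""
    inverse = {}
    for row in csv.DictReader(io.StringIO(text)):
        inverse.setdefault(row["fedora"], []).append(row["spdx"])
    return inverse


def fedora_to_spdx(tag, inverse):
    """Return the SPDX identifier only when the mapping is unambiguous, else None."""
    candidates = inverse.get(tag, [])
    return candidates[0] if len(candidates) == 1 else None


INVERSE = load_fedora_to_spdx(SAMPLE_CSV)
```

With this sample data, `fedora_to_spdx("GPLv2+", INVERSE)` yields an SPDX identifier, while `fedora_to_spdx("BSD", INVERSE)` returns `None` because the tag maps to more than one SPDX license and needs manual review.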

I'd support switching to the SPDX identifiers in Fedora. The main reason is that SPDX is fairly widely used (The kernel, various other projects, rust cargo files, python setup.py files, etc.), and if we use the exact same list in Fedora, packagers don't have to do any conversion or even think too much about the license. It's also easier to quickly check if the project is compatible with Fedora.

But the switch cannot require a "flag day" where we switch to the new syntax. We would need the two syntaxes to coexist for a few years. It is also very important that it remains unambiguous which syntax is used ("traditional Fedora" or SPDX). Thus, if we allow SPDX in the License field, I think we should prefix it like License: SPDX: <spdx-expression-here>. Then we can slowly convert packages to the new syntax.

In the cases where the mapping is 1:1, we could convert packages in a mass package update, e.g. change License: GPLv2+ → License: SPDX: GPL-2.0-or-later, License: GPLv3+ → License: SPDX: GPL-3.0-or-later, etc.

For the cases where the mapping is n:1 the conversion would need to happen manually, i.e. the maintainer would review the actual license and pick the appropriate SPDX tag. If there is no "flag day", this conversion would happen at maintainer leisure.

For the cases where there is no SPDX tag, we could keep the Fedora tags for a while. If we want to convert to new syntax, we could either accept the local syntax LicenseRef-Fedora-MIT, and define that as referring to whatever "License: MIT" means now, or ask for new licenses to be added to the SPDX list.

At some point far in the future, if the new syntax is established and used exclusively, we could drop the "SPDX:" prefix.
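To make the proposal concrete, here is a minimal sketch of what the automated part of such a conversion might look like. It is only an illustration: the `SPDX:` prefix is the syntax proposed in this comment (not an adopted convention), the `ONE_TO_ONE` table contains just the two mappings mentioned above, and `convert_spec` is a hypothetical helper, not an existing tool.

```python
import re

# Only the 1:1 mappings named in the discussion; a real table would be larger.
ONE_TO_ONE = {
    "GPLv2+": "GPL-2.0-or-later",
    "GPLv3+": "GPL-3.0-or-later",
}

# Match a whole "License:" line without consuming the trailing newline.
LICENSE_RE = re.compile(r"^(License:[ \t]*)(.+?)[ \t]*$", re.MULTILINE)


def convert_spec(spec_text):
    """Rewrite License lines whose whole value is a known 1:1 tag; leave the rest."""

    def repl(match):
        spdx = ONE_TO_ONE.get(match.group(2))
        if spdx is None:
            # Category tags, compound expressions, and already converted
            # lines are left untouched for manual review.
            return match.group(0)
        return f"{match.group(1)}SPDX: {spdx}"

    return LICENSE_RE.sub(repl, spec_text)
```

Because anything not in the table is left alone, running the sketch twice gives the same result as running it once, which matters if the conversion is spread over a long period.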

But the switch cannot require a "flag day" where we switch to the new syntax. We would need the two syntaxes to coexist for a few years. It is also very important that it remains unambiguous which syntax is used ("traditional Fedora" or SPDX). Thus, if we allow SPDX in the License field, I think we should prefix it like License: SPDX: <spdx-expression-here>. Then we can slowly convert packages to the new syntax.

This is overkill. For the unambiguous identifiers, we can just update the ones on the license list to use SPDX-based ones. We've already done that in the past when we updated the CDDL identifiers.

I don't think it's overkill. In particular, it addresses the point raised by spot above:

It would be really helpful if we had a way of knowing when this [the conversion] happened, but I can't think of a way off hand.

Without a clear annotation of which syntax is used, people and automatic checkers will be confused.

But why does this matter if we're just updating Fedora identifiers to match SPDX ones? We can't adopt SPDX directly anyway due to the problems around BSD and MIT variants, so we could just start by changing the unambiguous ones and doing some mass cleanups in one go.

I'll also note that one thing I don't want to repeat from SUSE when they adopted SPDX is that I don't want us to adopt SPDX identifiers directly. That is, I don't want us saying "go look at SPDX for the identifiers". That made things a mess when SPDX changed the identifiers and created all kinds of confusion (cf. the GPL-3.0+ to GPL-3.0-or-later situation). So like the Linux kernel project, instead of us pointing to SPDX, I want us to just create a new short identifier list that happens to be SPDX identifiers. That event, along with a few other things, soured me personally on SPDX, and I still don't particularly care for it. But I realized a while ago that I'm alone in this, and I'll try to help make this as painless as possible.

Naturally, I don't agree and think that creating another list of identifiers (essentially) is a bad idea - even if they happen to be the same. SPDX has always made a commitment to keep the short identifiers stable and only make a change in extenuating situations.

Re: the GPL-3.0+ to GPL-3.0-or-later change specifically, I can assure you that was a very difficult and lengthy decision. The Linux kernel has adopted SPDX and does not have their own short identifiers. If you hadn't seen, these articles may provide some insight:
https://spdx.dev/license-list-3-0-released/
https://www.fsf.org/blogs/rms/rms-article-for-claritys-sake-please-dont-say-licensed-under-gnu-gpl-2
https://www.gnu.org/licenses/identify-licenses-clearly.html

The kernel still uses the old GPL identifiers (e.g., GPL-3.0 and GPL-3.0+) because at the point at which that process was getting started, the identifiers hadn't changed yet. As a result, you will see both GPL-3.0+ and GPL-3.0-or-later in the kernel and both are valid.

SPDX has made more major changes in the past: changing the boolean logic format, changing the way BSD licenses are described, changing the way exceptions work, etc.

Who is to say there won't be more changes in the future? For example, one I expect to change some day is the Apache-2.0 tag, because it's ambiguous and confusing because of the common parlance of saying Apache <version> for the web server. That's the reason Fedora uses ASL 2.0, for example. When that happens, it'll be an easy change, but an annoying one.

Frankly, it's not sane for us to just say "go to SPDX" because all our tools are guaranteed to break again and again when SPDX makes changes. That was by far the worst aspect of openSUSE adopting SPDX. Even Debian didn't directly adopt SPDX, and instead normalized DEP-5 tags with SPDX identifiers. If part of the goal here is to make it easier to parse licensing information, then we cannot just adopt SPDX by saying "go look at SPDX", because that's not a stable reference.

Having our own list gives us some flexibility here: we can still do our own license review and submit licenses for SPDX recognition asynchronously, and we don't have to block packagers when a new license shows up that Red Hat/Fedora Legal deems okay to ship software. We already know the scheme in which SPDX typically names things, so we can make identifiers that SPDX will likely use, and submit them for inclusion at the same time.

I think the discussion is combining policy change with the mechanics of how we mass change a bunch of License fields in packages. Let's discuss the policy change first and worry about changing every single package later.

Well, I lied, let me discuss my motivation for proposing this. SPDX has existed for a while now and you see it in source files and developers usually are familiar with it. Those who have never seen an SPDX identifier can easily search online and find it. At this point I see SPDX expressions as sufficient for Fedora packaging purposes (where they work, see below before you reply). If the majority of packages start using these license expressions, then Fedora can get out of the business of maintaining the good licenses list.

Note: this doesn't mean Fedora is out of the business of deciding what licenses are acceptable for Fedora or not. It just gets us out of the business of coming up with short identifiers and expression syntax. I view this as a win.

But we already have the Fedora list and our identifiers, so rather than make a flag day change that would mostly annoy a lot of people for no good reason, let's instead adjust our packaging policy to allow SPDX expressions when they work. Say all new packages must use SPDX expressions whenever possible. Exceptions are packages that have an acceptable Fedora license for which there is no SPDX identifier. Call all of the Fedora identifiers deprecated at that point.

The next steps would be what @jlovejoy mentioned. For packages that can change over to SPDX expressions, maintainers can do that. For packages that lack an identifier, let's see if we can get an SPDX one. Etc.

OK, but let's say that we have half of the packages using SPDX, and half using Fedora tags. Looking at a given license tag, how does one interpret it? Try it as SPDX, and if the tag is unknown look up the Fedora list? What if SPDX adds the tag later? In particular, consider "MIT": it is a valid SPDX identifier, and it is also on the Fedora list, as an identifier for a bunch of different licenses. Same story for "BSD".

But even ignoring the ambiguous cases, I think it'll be confusing for casual users: they will look at some license string, see that it is a spdx tag, and then look at another case which is not valid spdx, and be confused.
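The lookup problem described here can be sketched in a few lines. The two sets below are tiny, hand-picked excerpts used purely for illustration (they are assumptions, not the real lists), and `classify` is a hypothetical helper. The point is that a tag found in both lists cannot be attributed to either syntax from the string alone.

```python
# Illustrative excerpts only; the real lists are much longer.
FEDORA_TAGS = {"MIT", "BSD", "GPLv2+", "ASL 2.0"}
SPDX_IDS = {"MIT", "BSD-3-Clause", "GPL-2.0-or-later", "Apache-2.0"}


def classify(tag):
    """Report which list(s) a single license tag appears in."""
    in_fedora = tag in FEDORA_TAGS
    in_spdx = tag in SPDX_IDS
    if in_fedora and in_spdx:
        # Present in both lists: the string alone does not say which
        # syntax the packager meant.
        return "both"
    if in_spdx:
        return "spdx"
    if in_fedora:
        return "fedora"
    return "unknown"
```

Note that the result for a tag such as `GPLv2+` could change from "fedora" to "both" if SPDX ever added an identifier with the same spelling, which is the "what if SPDX adds the tag later" concern.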

We should not even talk about it being SPDX in the first place. Just simply update the identifiers on our list and note the legacy identifier and start updating things. Talking about SPDX vs non-SPDX identifiers is irrelevant and pointless.

Packagers care about two things:

Whether it's SPDX, Fedora, or the Goof Troop who defines them is irrelevant.

OK, but let's say that we have half of the packages using SPDX, and half using Fedora tags. Looking at a given license tag, how does one interpret it? Try it as SPDX, and if the tag is unknown look up the Fedora list? What if SPDX adds the tag later? In particular, consider "MIT": it is a valid SPDX identifier, and it is also on the Fedora list, as an identifier for a bunch of different licenses. Same story for "BSD".

But even ignoring the ambiguous cases, I think it'll be confusing for casual users: they will look at some license string, see that it is a spdx tag, and then look at another case which is not valid spdx, and be confused.

Ah, ok. So this was a concern of mine. I would say at this point in time, SPDX has more general recognition in the open source world than the Fedora license list. That's not to say that the Fedora short names are not recognized, just that a random sampling of people will likely reveal that SPDX has more recognition these days than other license identifiers. I base this merely on my own experience.

Regarding the question, how is the Fedora short name any less confusing than an SPDX abbreviation to someone new to Fedora or even working in open source? Both identifiers require cross referencing a list. I say "someone new" because for all of us in this thread, we have been working on Fedora for a long time so the Fedora names make total sense to us. The SPDX ones may even look weird or unusual, but I think that's because we've been looking at the Fedora ones for so long and our brains are calling that the baseline to compare against. In the case of SPDX, we gain the benefit of having an external project that is aimed at standardizing license short names and identifiers.

We already know that SPDX cannot represent every Fedora license, but that's not really the point of this PR. The long term goal could be viewed as "hey, it would be nice if SPDX could represent all of our approved licenses" but we aren't there yet. All this PR should be about is amending the policy to permit and encourage the use of SPDX license expressions in the License tag. If that is not possible for the package, the policy should note that it's ok to use the existing Fedora identifier when necessary.

I believe it was mentioned above but SPDX does include a spec which gives us a way to create new identifiers that fit with the SPDX spec. We could apply that to the names that SPDX currently does not represent, but I think that's a separate discussion entirely. For now, if a package maintainer wants to say "License: GPL-2.0-or-later" in the spec file, we should be ok with that.

We should not even talk about it being SPDX in the first place. Just simply update the identifiers on our list and note the legacy identifier and start updating things. Talking about SPDX vs non-SPDX identifiers is irrelevant and pointless.

Packagers care about two things:

Whether it's SPDX, Fedora, or the Goof Troop who defines them is irrelevant.

The PR is about using SPDX identifiers in place of the Fedora identifier list.

I know, I'm saying that your approach is wrong for adopting SPDX identifiers. Instead of confusing people by two different ones, we should just add the relevant ones to our lists and start making our tools suggest those identifiers in place of the older ones.

Isn't this statement irrelevant for the purposes of a policy change? You're discussing how to implement the policy.

Hi all, great to see so many comments here and sorry for being a bit behind in responding. I'll try to address the various things that have come up all in one post:

re: @ngompa 's tool that converts Fedora identifiers to SPDX for the easy-to-convert, one-to-one identifiers: I have added a column here and indicated those with a "Y" ("N/A" means no work is needed b/c the identifiers are already the same; "category" means Fedora uses an identifier to represent more than one license text and thus the license text of that actual package will need to be inspected) https://docs.google.com/spreadsheets/d/1fi5SVzyCAL0UDravvkS6Us4lFwRiQy-l3qTUEkY92U0/edit#gid=494935126
I have a bit more work to do for some that I need to do a bit of research on, so stay tuned. It might make sense to do the research on the 36 Fedora licenses that are not on SPDX first, and then finish this column

re: @zbyszek's description of the process - that is spot on. For the syntax tag, that sounds sensible. When converting, another option would be to use "SPDX-License-Identifier:" as that is the usual syntax used in source files. But in any case, I think the key thing your idea helps with is some kind of way to track what has been updated and what hasn't. The only problem with this approach is that there are ~98 licenses for which the ids are already the same and need no updating. How would that fit into your idea?

For the cases where the mapping is n:1 - indeed, this will require some manual work, most likely including finding the actual license text. This is where I'd expect collaboration with the SPDX community will be key to help with identifying/matching such license texts to SPDX (or shepherding new license requests)

re: @tibbs question as to identifying the end result and the question of policy v. process that @dcantrell raised: I think the policy change would simply be: start using SPDX identifiers going forward
in terms of process: that will be an easy switch as the coverage today is ~80% of the Fedora good list (thanks to a lot of work already done by the SPDX-legal community :) For the 36 Fedora licenses that are known to not be on SPDX, I intend to get started and elicit some help on this research as soon as it seems clear the direction this proposal is going.

Speaking of which - as I stated early on, for this to work best, cross-collaboration b/w the communities is key. I'm just not sure how to best tie things together, but I'm sure that having one person be a go-between is not it. :)

The only problem with this approach is that there are ~98 licenses for which the ids are already the same and need no updating. How would that fit into your idea?

If we decide to go forward with this, I expect that we'll do an automated mass update [1] of spec files to convert all packages where the mapping is unambiguous. This would also include the places where the tag is identical. So the change would be either like License: GPLv2+ → License: SPDX-License-Identifier: GPL-2.0-or-later (1:1 mapping) or License: Xorg → License: SPDX-License-Identifier: Xorg (for identical tags). The packages that require manual work would be left unchanged, but easy to identify.

[1] https://docs.fedoraproject.org/en-US/fesco/Mass_package_changes/

If we want to mark something as converted, we don't need to do that in the License field. We can put a magic comment in the spec file or commit a text file that indicates it. I don't want to see SPDX-License-Identifier or any other weird goop in the License field.

I think this misses part of the need. Once we begin to overlap SPDX and Fedora license names in the same field (think of these as two distinct sets of names which sometimes overlap), how do programmatic tools tell whether the license expression is intended as Fedora legacy names or SPDX names?

SPDX and Fedora have no names that overlap with different meanings. Crucially, there is no technical issue with mixing the two identifier formats except with BSD vs BSD-* variants from SPDX. But even then, there is no BSD identifier in SPDX.

This is being made more complex than it actually is. It is quite likely we will initially need to have a mixture anyway, as all the directly matching identifiers (all but BSD) will trivially remap. Afterward, we'll have to do a second-pass and re-audit all BSD/MIT licensed components to remap their licenses by hand.

@zbyszek is ascribing confusion that I think won't actually be there. @spot and I will have to update RPMLint to accept both names anyway, and updating the license list to have a copy of the SPDX identifiers we're going to use from now on would be a requirement to allow packagers to easily switch over to the "newer" format.

We will also likely need to retain our identifiers for Public Domain, custom redistributable, and similar. A so-called "pure-SPDX" will never happen because SPDX explicitly does not offer anything for this. That's also the case in SUSE, too.

@jlovejoy As a note, the GPL+ mapping to SPDX is probably not an accurate way to describe that license. When GPL is used without a version declaration, it's more accurate to say GPL-any-version than GPL-1.0-or-later. Same goes for the AGPL/LGPL counterparts.

SPDX and Fedora have no names that overlap with different meanings

That is a good point.

If we want to mark something as converted, we don't need to do that in the License field. We can put a magic comment in the spec file or commit a text file that indicates it.

OK, I've come around to accept this. There should be no confusion and indeed I was overthinking this. I guess we don't even need to add comments to the spec files: if we do a mass update of the spec files, we can keep a list of packages that have been converted, and then there'll be less noise in the spec files.

We will also likely need to retain our identifiers for Public Domain, custom redistributable, and similar. A so-called "pure-SPDX" will never happen because SPDX explicitly does not offer anything for this. That's also the case in SUSE, too.

I was looking at the license list, and we have 4k binary packages that use 'Public Domain' (usually in combination with other licenses). Asking SPDX to add all those variants and inspecting those packages manually to disambiguate the exact version is just infeasible.

But since 'Public Domain' is usually used in a more complex expression, we will want to convert the rest to the SPDX tags and syntax. So we need some SPDX-compatible format that will allow us to express that. @jlovejoy what would you recommend? This includes examples like pl: (GPLv2+ with exceptions or Artistic 2.0) and (GPL+ or Artistic) and (BSD or GPL+) and TCL and UCD and MIT and BSD and Public Domain, and mutt: GPLv2+ and Public Domain. Can we express the public domain part in spdx expression as LicenseRef-Fedora-Public-Domain?
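For illustration, the kind of conversion asked about here could look like the sketch below. It assumes a `LicenseRef-Fedora-…` naming scheme, which is only the idea floated in this thread and not an agreed format; the `KNOWN` table is a two-entry example, and both helper functions are hypothetical. Tags without a known SPDX equivalent fall back to a `LicenseRef-` form, and the lowercase Fedora operators become the uppercase SPDX ones.

```python
import re

# Example mappings only; a real table would come from the agreed conversion list.
KNOWN = {
    "GPLv2+": "GPL-2.0-or-later",
    "Artistic 2.0": "Artistic-2.0",
}

# Split on parentheses and on the lowercase Fedora operators, keeping them.
TOKEN_RE = re.compile(r"(\(|\)|\band\b|\bor\b)")


def to_license_ref(tag):
    """Build a LicenseRef- name (assumed scheme) using only letters, digits, '.' and '-'."""
    safe = re.sub(r"[^A-Za-z0-9.-]+", "-", tag).strip("-")
    return f"LicenseRef-Fedora-{safe}"


def convert_expression(expr):
    """Convert a Fedora-style expression, falling back to LicenseRef- for unknown tags."""
    out = []
    for token in TOKEN_RE.split(expr):
        token = token.strip()
        if not token:
            continue
        if token in ("(", ")"):
            out.append(token)
        elif token in ("and", "or"):
            out.append(token.upper())
        else:
            out.append(KNOWN.get(token) or to_license_ref(token))
    return " ".join(out).replace("( ", "(").replace(" )", ")")
```

Applied to the mutt example, `convert_expression("GPLv2+ and Public Domain")` produces an expression whose public-domain part is `LicenseRef-Fedora-Public-Domain`. Category tags such as BSD or MIT would still need a decision on how to map them; this sketch simply turns anything unknown into a `LicenseRef-` name.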

So from the perspective of packaging guidelines and this committee, what's important is that we clearly tell packagers what they need to do.

Yeah. I think that (if we do this) we should say that SPDX tags and syntax should be used, and only allow exceptions for the cases like "Public Domain" which we can't express in SPDX. I think the packaging docs should enumerate those specific cases and how to deal with them.

For the 36 Fedora licenses that are known to not be on SPDX, I intend to get started and elicit some help on this research as soon as it seems clear the direction this proposal is going.

We should figure this out before doing the conversion.
I looked into "ZPLv1.0" and it is not used in F35.
Similarly for "Webmin" — I think if we had Webmin in Fedora, it was ages ago.
So maybe let's cross out the ones that aren't used first, and then chip away at the others.

I looked into "ZPLv1.0" and it is not used in F35.

We don't have anything using the Zope Public License v1.0 anymore? Do none of the remaining Zope components use that license?

Similarly for "Webmin" — I think if we had Webmin in Fedora, it was ages ago.

Webmin itself is BSD-3-Clause, so I don't know what that's for?

Do none of the remaining Zope components use that license?

According to repoquery, they don't.

I think a big part of this effort will be dealing with licenses that are specified incorrectly.
I now filed https://bugzilla.redhat.com/show_bug.cgi?id=2025849, https://bugzilla.redhat.com/show_bug.cgi?id=2025850, https://bugzilla.redhat.com/show_bug.cgi?id=2025853, https://bugzilla.redhat.com/show_bug.cgi?id=2025854, https://bugzilla.redhat.com/show_bug.cgi?id=2025856. But cleaning this up will be good even if we don't switch the syntax.

I did some more investigation, and things are much worse than expected.

SPDX and Fedora have no names that overlap with different meanings

Sadly, this is not true. There are just a few license tags that are used with different meanings in spdx and fedora, but unfortunately they are also the popular ones, so effectively this is very much untrue.

  1. In Fedora, BSD is a "category", and corresponds to BSD-3-clause, BSD-2-clause, BSD-3-clause-FreeBSD in SPDX. Fortunately there is no BSD tag in SPDX.
  2. In Fedora, MIT is a "category", and corresponds to MIT, mpich2, MIT-feh, MIT-CMU, HPND, etc. The label MIT exists in both grammars and is ambiguous.
  3. In Fedora, LGPLv2 is effectively a category that maps to either LGPL-2.0 or LGPL-2.1 in SPDX, and similarly for the variant with + and the variants with "with exceptions".

License tags by frequency in F35 (binary package License fields):

     18745 MIT              <----- category, ambiguous
     10113 ASL 2.0
      9584 GPLv2+           [EDIT: not category, 1:1 mapping]
      8874 BSD              <----- category
      7633 LGPLv2+          <----- category
      5348 GPLv3+
      5227 GPLv2
      4343 LPPL
      4002 Public Domain    <----- category
      3842 GPL+
      3283 CC-BY-SA         <----- category
      3089 Artistic
      2850 CC-BY            <----- category
      2834 Artistic 2.0
      2609 UCD
      2563 Utopia
      1161 LGPLv2           <----- category
      1105 GPLv3
      1005 LPPL 1.3

I'm very discouraged by those results. "Category" tags that can't be converted to SPDX automatically account for more than 45k of these uses, and the most popular license tag is ambiguous between the two systems. I have some ideas, but none of them are pretty. One option would be to replace BSD by LicenseRef-category-BSD, and MIT by LicenseRef-category-MIT, and then ask maintainers to switch to a non-category SPDX tag later on. I'm all ears if anyone has better ideas.
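The "more than 45k" figure can be checked directly from the frequency list above. The snippet below just re-adds the counts that are marked as categories; the numbers are copied from the list, and the variable names are only for this example.

```python
# Counts copied from the F35 frequency list above.
COUNTS = {
    "MIT": 18745, "ASL 2.0": 10113, "GPLv2+": 9584, "BSD": 8874,
    "LGPLv2+": 7633, "GPLv3+": 5348, "GPLv2": 5227, "LPPL": 4343,
    "Public Domain": 4002, "GPL+": 3842, "CC-BY-SA": 3283,
    "Artistic": 3089, "CC-BY": 2850, "Artistic 2.0": 2834, "UCD": 2609,
    "Utopia": 2563, "LGPLv2": 1161, "GPLv3": 1105, "LPPL 1.3": 1005,
}

# Tags marked "category" in the list (GPLv2+ was corrected to a 1:1 mapping).
CATEGORY_TAGS = {
    "MIT", "BSD", "LGPLv2+", "Public Domain", "CC-BY-SA", "CC-BY", "LGPLv2",
}

category_total = sum(n for tag, n in COUNTS.items() if tag in CATEGORY_TAGS)
listed_total = sum(COUNTS.values())

print(f"category tags: {category_total} of {listed_total} listed uses")
```

Only the tags shown in the list are counted, so the share applies to these most frequent tags rather than to every License field in the distribution.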

You're making the assumption that we're going to do pure-SPDX in one go. We're not.

We can start by simply switching the identifiers that aren't ambiguous and mass replacing them. Then go back and deal with the category licenses, then go back and deal with the exceptions.

And note, I will absolutely never recommend we just tell people to go to spdx.org and look at the identifiers there. We will maintain our own list, we will just happen to use the same identifiers from spdx.org. Our "good licenses" and "bad licenses" list remains useful and valid, we will just take identifiers from SPDX and put them there rather than making up our own. We already started doing that with newer licenses and when we converted one category license recently (CDDL was split into CDDL-1.0 and CDDL-1.1 already). When we have to make up identifiers, we'll just send them to SPDX too. I recently asked for FDK-AAC to be added and that got done a couple weeks ago. The identifier is the same because I asked for it.

That way, we avoid the dumb LicenseRef- prefixes because those things only matter if we aimed for some mythical world where we rely on SPDX to vet our licenses. SPDX explicitly does not do that, and only creates an index for licenses in the first place. So it's inappropriate to rely on SPDX for the value judgement of whether a license is acceptable for packages in Fedora.

What you wrote is hardly a reply to what I wrote.

You're making the assumption that we're going to do pure-SPDX in one go.

I'm very much not doing that. In fact, the whole discussion about ambiguity between the two sets is because of the time where both would be used.

And note, I will absolutely never recommend we just tell people to go to spdx.org and look at the identifiers there. We will maintain our own list, ...

You wrote that before. I have no issue with that.

That way, we avoid the dumb LicenseRef- prefixes because those things only matter if we aimed for some mythical world where we rely on SPDX to vet our licenses.

I never said anything about dropping our list of good licenses. AFAIK, nobody has proposed that either.

--

OK, let me try to make my point in a somewhat different way. I assumed that we would be able to convert a significant chunk of our license tags to the SPDX tags automatically. Based on the statistics I put above, we will be able to do that for only a minority of packages. So the process will require maintainer input, i.e. it will drag out probably over years.

What is worse, is that if we start using the SPDX tags in a subset of packages, because of the ambiguity of shared tags, it will not be clear if a given package was already converted to SPDX or not. We have 18k tags that could be Fedora or could be SPDX, and you can't tell just from the tag, so we will have to annotate that somehow.

You listed GPLv2+ as a category, which it's not. That's GPL-2.0-or-later in SPDX.

As for LGPLv2+ category, we can scan the metadata for identifying v2.0 vs v2.1 by the checksum of the license file.

The troublesome ones are the CC licenses, I don't know if those packages actually include a copy of their license in there. If they do, then yes, we can identify them by license file checksum and fix it.

You listed GPLv2+ as a category, which it's not. That's GPL-2.0-or-later in SPDX.

Thanks. I edited my comment above to change this.

As for LGPLv2+ category, we can scan the metadata for identifying v2.0 vs v2.1 by the checksum of the license file.

The troublesome ones are the CC licenses, I don't know if those packages actually include a copy of their license in there. If they do, then yes, we can identify them by license file checksum and fix it.

Yeah, in principle we can do that. Most likely "by checksum" wouldn't work, because there's just too many variants with trivial differences. We would want to do a fuzzy comparison like the spdx license comparison plugin does.
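To illustrate the difference between the two approaches, here is a minimal sketch that tries an exact checksum match first and falls back to a fuzzy text comparison. It uses Python's standard `hashlib` and `difflib` rather than the SPDX comparison tooling mentioned above, and the reference texts, the 0.95 threshold, and the function names are all assumptions made for this example.

```python
import difflib
import hashlib


def checksum(text):
    """SHA-256 of the text exactly as given (any whitespace change alters it)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences disappear."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized texts."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def identify(candidate, references, threshold=0.95):
    """Return (name, how) where how is 'exact', 'fuzzy', or 'unknown'."""
    digest = checksum(candidate)
    for name, ref in references.items():
        if checksum(ref) == digest:
            return name, "exact"
    best_name, best_score = None, 0.0
    for name, ref in references.items():
        score = similarity(candidate, ref)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, "fuzzy"
    return None, "unknown"
```

A file that differs from a reference only in line wrapping or letter case would miss the checksum match but still be found by the fuzzy pass. Texts that differ by very little, such as two versions of the same license, can score above any threshold, so a fuzzy hit still deserves a human look.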

Another option would work for packages that have SPDX in upstream, for example Rust packages. We currently generate the Fedora license from that, so we could just look up the upstream expression and use it.

Yeah, in principle we can do that. Most likely "by checksum" wouldn't work, because there's just too many variants with trivial differences. We would want to do a fuzzy comparison like the spdx license comparison plugin does.

Yeah, but we can at least get a good chunk of them with exact matches and go back and do fuzzy ones afterward.
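
The two-pass idea (exact matches first, fuzzy ones afterward) could look roughly like the sketch below. It rests on several assumptions: `identify_license`, `normalize` and the `threshold` value are hypothetical, the checksum is taken over whitespace-normalized text rather than the raw file, and `difflib` from the Python standard library is only a stand-in for the much more careful matching that the SPDX license comparison tooling performs.

```python
import difflib
import hashlib


def normalize(text: str) -> str:
    """Collapse whitespace and case so trivial formatting differences vanish."""
    return " ".join(text.lower().split())


def identify_license(text, references, threshold=0.95):
    """Return (license_id, method), or (None, "unmatched").

    `references` maps a license identifier to its reference text.
    """
    wanted = normalize(text)
    digest = hashlib.sha256(wanted.encode("utf-8")).hexdigest()

    # Pass 1: exact match on the checksum of the normalized text.
    for license_id, ref in references.items():
        ref_digest = hashlib.sha256(normalize(ref).encode("utf-8")).hexdigest()
        if ref_digest == digest:
            return license_id, "exact"

    # Pass 2: fuzzy match; keep the best candidate at or above the threshold.
    best_id, best_ratio = None, 0.0
    for license_id, ref in references.items():
        ratio = difflib.SequenceMatcher(None, wanted, normalize(ref)).ratio()
        if ratio > best_ratio:
            best_id, best_ratio = license_id, ratio
    if best_ratio >= threshold:
        return best_id, "fuzzy"
    return None, "unmatched"
```

Anything that comes back as `"unmatched"` would still need a maintainer to look at it.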

Another option would work for packages that have SPDX in upstream, for example Rust packages. We currently generate the Fedora license from that, so we could just look up the upstream expression and use it.

Yeah, we could have a --no-license-conversion flag and run rust2rpm on all the crates to regenerate the license tags. Most of them have the upstream expression as a comment, but I'd rather not rely on that.

We will also likely need to retain our identifiers for Public Domain, custom redistributable, and similar. A so-called "pure-SPDX" will never happen because SPDX explicitly does not offer anything for this. That's also the case in SUSE, too.

I was looking at the license list, and we have 4k binary packages that use 'Public Domain' (usually in combination with other licenses). Asking SPDX to add all those variants and inspecting those packages manually to disambiguate the exact version is just infeasible.

But since 'Public Domain' is usually used in a more complex expression, we will want to convert the rest to the SPDX tags and syntax. So we need some SPDX-compatible format that will allow us to express that. @jlovejoy what would you recommend? This includes examples like pl: (GPLv2+ with exceptions or Artistic 2.0) and (GPL+ or Artistic) and (BSD or GPL+) and TCL and UCD and MIT and BSD and Public Domain, and mutt: GPLv2+ and Public Domain. Can we express the public domain part in an SPDX expression as LicenseRef-Fedora-Public-Domain?

I think something like this should be permissible. SPDX provides an expression spec and then defines a bunch of well known licenses and modifiers. Where SPDX does not define something, the spec allows a project to add additional things and still remain within the "SPDX parseable" specification.
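
As a rough illustration of what such a conversion might look like, here is a minimal Python sketch. The mapping table is a small sample, `to_spdx` is a hypothetical helper, and `LicenseRef-Fedora-Public-Domain` is only the name floated in the question above, not an agreed identifier. It assumes Fedora's lowercase `and`/`or` map directly to SPDX's uppercase `AND`/`OR` and does not handle `with exceptions`.

```python
import re

# Sample mapping only; "LicenseRef-Fedora-Public-Domain" is the name floated
# in the question above, not an agreed identifier.
FEDORA_TO_SPDX = {
    "GPLv2+": "GPL-2.0-or-later",
    "ASL 2.0": "Apache-2.0",
    "Public Domain": "LicenseRef-Fedora-Public-Domain",
}


def to_spdx(expression: str) -> str:
    """Translate a Fedora License expression into SPDX syntax."""
    # Split on operators and parentheses, keeping them as tokens. Fedora
    # tags may contain spaces, so we cannot simply split on whitespace.
    tokens = re.split(r"(\s+and\s+|\s+or\s+|[()])", expression)
    out = []
    for token in tokens:
        stripped = token.strip()
        if not stripped:
            continue
        if stripped in ("and", "or"):
            out.append(stripped.upper())
        elif stripped in ("(", ")"):
            out.append(stripped)
        else:
            # Raises KeyError for a tag that has no mapping yet.
            out.append(FEDORA_TO_SPDX[stripped])
    text = " ".join(out)
    return text.replace("( ", "(").replace(" )", ")")
```

For the mutt example, `to_spdx("GPLv2+ and Public Domain")` gives `GPL-2.0-or-later AND LicenseRef-Fedora-Public-Domain`.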

So from the perspective of packaging guidelines and this committee, what's important is that we clearly tell packagers what they need to do.

Yeah. I think that (if we do this) we should say that SPDX tags and syntax should be used, and only allow exceptions for the cases like "Public Domain" which we can't express in SPDX. I think the packaging docs should enumerate those specific cases and how to deal with them.

There are others, such as the firmware licenses and things like that which do not have an existing SPDX entry. But the SPDX spec provides a mechanism to express these things.

For the 36 Fedora licenses that are known to not be on SPDX, I intend to get started and elicit some help on this research as soon as it seems clear the direction this proposal is going.

We should figure this out before doing the conversion.
I looked into "ZPLv1.0" and it is not used in F35.
Similarly for "Webmin" — I think if we had Webmin in Fedora, it was ages ago.
So maybe let's cross out the ones that aren't used first, and then chip away at the others.

There is A LOT of ongoing work in this space--auditing our existing Good License list and figuring what we use and such. @jlovejoy can speak more to this.

Do none of the remaining Zope components use that license?

According to repoquery, they don't.

I think a big part of this effort will be dealing with licenses that are specified incorrectly.

Yes, auditing package licenses is going to be a large project. Not impossible, just very involved.

It's worth noting as well that we have no way right now to guarantee that the License specified in a spec file matches the license in the source. We leave that entirely up to package maintainers and if they do not catch a license update/change/modification and adjust the spec file accordingly, we will never know about it unless someone goes looking.

Getting us to a point where we can build more automated tools to audit the licensing information is a good thing.

I now filed https://bugzilla.redhat.com/show_bug.cgi?id=2025849, https://bugzilla.redhat.com/show_bug.cgi?id=2025850, https://bugzilla.redhat.com/show_bug.cgi?id=2025853, https://bugzilla.redhat.com/show_bug.cgi?id=2025854, https://bugzilla.redhat.com/show_bug.cgi?id=2025856. But cleaning this up will be good even if we don't switch the syntax.

I did some more investigation, and things are much worse than expected.

SPDX and Fedora have no names that overlap with different meanings

Sadly, this is not true. There are just a few license tags that are used with different meanings in spdx and fedora, but unfortunately they are also the popular ones, so effectively this is very much untrue.

  1. In Fedora, BSD is a "category", and corresponds to BSD-3-Clause, BSD-2-Clause, BSD-2-Clause-FreeBSD in SPDX. Fortunately there is no BSD tag in SPDX.
  2. In Fedora, MIT is a "category", and corresponds to MIT, mpich2, MIT-feh, MIT-CMU, HPND, etc. The label MIT exists in both grammars and is ambiguous.
  3. In Fedora, LGPLv2 is effectively a category that maps to either LGPL-2.0 or LGPL-2.1 in SPDX, and similarly for the variant with + and the variants with exceptions.

License tags by frequency in F35 (binary package License fields):
  18745 MIT <----- category, ambiguous
  10113 ASL 2.0
   9584 GPLv2+ [EDIT: not category, 1:1 mapping]
   8874 BSD <----- category
   7633 LGPLv2+ <----- category
   5348 GPLv3+
   5227 GPLv2
   4343 LPPL
   4002 Public Domain <----- category
   3842 GPL+
   3283 CC-BY-SA <----- category
   3089 Artistic
   2850 CC-BY <----- category
   2834 Artistic 2.0
   2609 UCD
   2563 Utopia
   1161 LGPLv2 <----- category
   1105 GPLv3
   1005 LPPL 1.3

I'm very discouraged by those results. "Category" tags that can't be converted to SPDX account for more than 45k of those License fields, and the most popular license tag is ambiguous between the two systems. I have some ideas, but none of them are pretty. One option would be to replace BSD with LicenseRef-category-BSD and MIT with LicenseRef-category-MIT, and then ask maintainers to switch to a non-category SPDX tag later on. I'm all ears if anyone has better ideas.
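
For reference, the "more than 45k" figure can be reproduced from the frequency list above. The snippet below is just a worked sum in Python; the rows are copied from that list, with GPLv2+ treated as a non-category tag per the EDIT note.

```python
# (count, tag, is_category) rows copied from the F35 frequency list above.
TAG_COUNTS = [
    (18745, "MIT", True),
    (10113, "ASL 2.0", False),
    (9584, "GPLv2+", False),  # EDIT in the list: 1:1 mapping, not a category
    (8874, "BSD", True),
    (7633, "LGPLv2+", True),
    (5348, "GPLv3+", False),
    (5227, "GPLv2", False),
    (4343, "LPPL", False),
    (4002, "Public Domain", True),
    (3842, "GPL+", False),
    (3283, "CC-BY-SA", True),
    (3089, "Artistic", False),
    (2850, "CC-BY", True),
    (2834, "Artistic 2.0", False),
    (2609, "UCD", False),
    (2563, "Utopia", False),
    (1161, "LGPLv2", True),
    (1105, "GPLv3", False),
    (1005, "LPPL 1.3", False),
]

# Sum only the rows flagged as "category" tags.
category_total = sum(count for count, _tag, is_cat in TAG_COUNTS if is_cat)
print(category_total)  # 46548
```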

@jlovejoy has been working on this for months and can speak more to it in detail. But what I wanted to say is that your findings here are one reason I would like to see Fedora use SPDX expressions. BSD and MIT are simply too broad to be meaningful.

You're making the assumption that we're going to do pure-SPDX in one go. We're not.

That is important. We can't do this as a flag day operation. Nor should we.

We can start by simply switching the identifiers that aren't ambiguous and mass replacing them. Then go back and deal with the category licenses, then go back and deal with the exceptions.

And note, I will absolutely never recommend we just tell people to go to spdx.org and look at the identifiers there. We will maintain our own list, we will just happen to use the same identifiers from spdx.org. Our "good licenses" and "bad licenses" list remains useful and valid, we will just take identifiers from SPDX and put them there rather than making up our own. We already started doing that with newer licenses and when we converted one category license recently (CDDL was split into CDDL-1.0 and CDDL-1.1 already). When we have to make up identifiers, we'll just send them to SPDX too. I recently asked for FDK-AAC to be added and that got done a couple weeks ago. The identifier is the same because I asked for it.

It bears reiterating that SPDX provides a specification for projects to uniformly represent licensing information. By itself it is incomplete for most projects, Fedora included. So yes, we will have to maintain our own database. But I want Fedora to build off the SPDX specification and abbreviations because those are well established and broadly understood in the developer community.

Fedora should have a license-data project under fedora-legal that maintains this information and this project should serve as the source that publishes to the Packaging Guide and other formats that are used by tools in the distribution. Oh look, one was started:

https://pagure.io/fedora-legal/license-data

That way, we avoid the dumb LicenseRef- prefixes because those things only matter if we aimed for some mythical world where we rely on SPDX to vet our licenses. SPDX explicitly does not do that, and only creates an index for licenses in the first place. So it's inappropriate to rely on SPDX for the value judgement of whether a license is acceptable for packages in Fedora.

I think you're conflating two things here. SPDX is entirely capable of cataloging and short-naming a license. But yes, the actual approval part for what Fedora deems appropriate for inclusion is all on our side. I don't think SPDX has ever implied they do the latter.

FWIW, the LicenseRef- notation is nice from a programming standpoint.

I too find it aesthetically icky, and would have preferred an internet-protocol-header-like convention with just X-, but... I promised @jlovejoy that I would not bikeshed on it. It's what they picked, and it's that color now. Let's take any further debate on that particular detail to Fedora Social Hour. :)

okay, I'm not going to try and reply to each message separately with the fancy quote methodology as there are too many - but I'll try to respond to the key points:

  • using SPDX identifiers starting today (by way of example) for all NEW Fedora packages would be relatively easy.
  • yes, the hard part is updating the existing packages and that could certainly take a long time. You go with the low-hanging fruit - @ngompa has the right approach here: "We can start by simply switching the identifiers that aren't ambiguous and mass replacing them. Then go back and deal with the category licenses, then go back and deal with the exceptions." And when it comes to the category licenses, I'd suggest not trying to do that manually, but considering a license scanning tool to help get that job done.
  • thanks @zbyszek for looking into ZPL-1.0 and Webmin licenses - I'll mark those as don't-need-to-add-to-SPDX :)
  • reiterating some of @dcantrell's comments - SPDX does not, and never would, replace Fedora's guidelines on what is a Fedora "good" or "bad" license - that is and will continue to be a Fedora determination
  • As for the greater goal here - something that I hear over and over in the open source legal community is the need for better licensing info upstream. This effort helps contribute to that goal. Having a "category" to indicate many licenses may seem practical in some ways, but it's not expected nor desired by downstream recipients (whether you agree with that expectation or desire by the downstream recipients isn't the point). Being more precise by leveraging an existing standard will only provide more value downstream (even if you don't personally see the value ;)
  • thanks @mattdm - it's never efficient to get dragged into debate over a ship that has long since sailed!

I have just (painfully) made a new PR related to updating the licensing package guidelines to enable use of SPDX identifiers here: https://pagure.io/packaging-committee/pull-request/1142

Closing this PR since it's replaced by 1142.

Pull-Request has been closed by dcantrell

2 years ago