#486 Discuss and decide on a Fedora Council policy on the use of AI-based tooling for contributions
Opened 3 months ago by t0xic0der. Modified 11 days ago

Why?

In the Outreachy 2024 Orientation Call that took place on Friday, 08 Mar 2024, several AI assistant tools were used to transcribe what was spoken, for the benefit of those who participated as well as those who did not (like me). Our applicants suggested tools from Otter.ai, Read.ai and some more, and while they were used during the call, it was not until after the meeting that we realized we need to be a lot more thorough in choosing which AI-based tooling to use for contributions.

AI is cool and all, but I would be wary about the quality of contributions made to the community with the help of AI-based tooling, as well as the safety of the contributors using it. After all, we would hate to have our voices reused somewhere else without our express consent just because we overlooked some legalese written in as small a font size as possible. Also, many of these tools make it mandatory for users to log in, and in doing so users unknowingly consent to loads of marketing mail that they could totally do without.

What?

As a member of the Fedora Council, I ask that we discuss and decide on a policy for the use of AI-based tooling for making contributions. This could cover, but is not restricted to, the nature of the AI-based tooling used, the ways one could maintain contribution quality, the things contributors need to be wary of while making a choice, and the implications of copyright and trademark laws when such a tool is employed.


Some guidance at the project level would be helpful. We would likely need a legal consultation for this, but before we go for that, I suggest writing out our values/beliefs about what we feel makes sense as a project policy on this topic.

Some examples.

Read.ai looks fancy, given how accurately it was able to provide a recap of the Outreachy 2024 Orientation Call, divided into chapters/topics, action items and key questions.

Screenshot_20240313_085600.png

But then I see that they have made it mandatory for folks to sign up to access the logs.

Screenshot_20240313_085710.png

And I slowly sense my excitement turning into wariness.

Otter.ai looks nice as well.

Screenshot_20240313_085903.png

Let us try to see if they let me in without agreeing with their shady terms.

Screenshot_20240313_090027.png

Ahh... OK. Never mind.

Addendum

The email address on the last screenshot was automatically filled in when I clicked on the link that I received at my work email address; it is not something that I entered myself. I personally refrained from using any of these tools and simply elected to rewatch the recording. This is my personal stand on this situation, and while I frown upon these practices, I do understand that AI-based tooling like this could be incredibly helpful for accessibility. Until we have a policy governing how such tooling can be used, the use of AI-based tooling for contributions is mighty thin ice that folks will continue to walk on until one fine day it decides to cave in.

Metadata Update from @amoloney:
- Issue tagged with: Next Meeting, policies

2 months ago

This remains my proposal from the Fedora Council meeting today:

  1. We do not want to write a policy about generative AI contributions.
  2. We do not want to create a policy about generative AI contributions in upstream projects.
  3. We want to write a statement about generative AI use in contributions in Fedora. The statement describes why the maintainer model in Fedora is important and why we care about getting the Freedom Foundation right when it comes to AI.

I'm still hesitant on this. I understand your position, but I also feel we as Fedora are leaders in the broader open source community. There may be upstream projects out there waiting for a downstream user of their project to put forth a policy prohibiting code contributions from generative AI. For smaller projects, it makes it easier for them to say "well, my project is used in Fedora and Fedora prohibits generative AI code contributions so therefore I do not allow them in my project."

This could help keep attention on the fact that we still have no resolution on how open source licensing compliance works (or, "creator rights" if you want to refer to everything) with generative AI.

I like point 3, but I do think Fedora needs something stronger specifically when it comes to code contributions.

Putting aside creator rights, I am also concerned about quality. From what I have seen of generative AI systems, the code quality is just terrible. We work really hard in Fedora to not have terrible code. I don't want to add bad code at an exponential rate.

I don't think it's possible for you to make a statement (point 3) without creating a policy to back it (point 1). I'm not sure whether we should be saying things about what upstreams do per se (point 2), though.

We need a policy about Fedora contributions though.

I agree that point 3 implies point 1.

A statement about upstreams is difficult. For one, Fedora is already upstream for so many things in the open source space. But we also very clearly make the distinction between upstream and Fedora as downstream even when the people involved are the same.

Regarding contributions and specifically code contributions, I think a Fedora policy is necessary. We even direct many contributions back to upstream projects so a Fedora policy could have an indirect effect on the upstream projects.

Whatever policy we have, I think Fedora needs to identify the concerns that exist right now around generative AI. I've mentioned my concerns around creator rights and licensing compliance as well as overall code quality. Until these are resolved in a way we can trust, I think Fedora needs to both recognize these as real problems and note the areas where generative AI contributions are not permitted. Fedora itself does not want to accidentally violate the terms of an open source license, for instance. Right now we can't say with certainty that generative AI guarantees that will never happen. All we know is that generative AI systems have violated, and continue to violate, open source licensing terms with generated code.

On the other hand, generative AI systems can be useful. For example, say I write a couple new public functions for a library and use ChatGPT to generate the initial draft of the API documentation from that code. Then I go and fix up some phrasing, add some clarifications, and otherwise make it sound less robotic. I have used it on code I wrote and I used it to help me write documentation. This is an example of where I'm torn on generative AI. Examples like this feel super useful while using generative AI to patch broken code feels very scary and problematic.

Metadata Update from @jflory7:
- Issue assigned to t0xic0der
- Issue priority set to: None (was: 1)
- Issue tagged with: In Progress

24 days ago

Metadata Update from @jflory7:
- Assignee reset

24 days ago

Here's a non-generative example of AI that may serve as a good example for future discussions:

I personally have used http://voiceinput.futo.org/, largely when composing long messages on my phone or writing essays for class. It's simply speech to text, so I'm still doing all the creativity. The models are all on-device and nothing is sent via the internet. The code behind it is available too under a semi-open license (the organization supporting it is still working to figure out their stance on going fully open source with their licensing).

I personally love this tool because it saves so much time when typing long text on my phone and, even though I still have to put in time editing for typos, it still takes less time for me to write the same message in most cases.

I think (per the last council meeting) it may be nice to create some kind of category system for these AI examples to make discussing them easier. Categorizing their use may also be helpful, since it seems like common concerns about the use of AI include:

  1. Is the company behind it trying to be extractive/use the data in their own interest at the expense of their users/the community?
  2. How human-supervised is the use of AI? i.e. how likely is it that the inevitable AI-generated mistake makes it past a person because it's too hard to review?

While Otter.ai and Read.ai may fail the first of these tests, and AI code generation may create hard-to-spot bugs that fail the second, I think there's still some room for some AI (like FUTO voice or similar accessibility tools) to be used.

This isn't by any means a complete system for classifying all uses of AI in Fedora, but I imagine this could be expanded into a flowchart or something to provide a repeatable, transparent way to "sort" the different uses of AI into buckets (sort of how there's a list of "tests" for fair use of copyrighted material but still some gray area to accommodate things changing in the future).
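
To make the "sort into buckets" idea a bit more concrete, here is a rough Python sketch of the two tests above as a tiny decision flow. Everything in it is an assumption made for illustration: the `AITool` fields, the `Bucket` names, the order of the checks and the example ratings are hypothetical and not anything the Council has agreed on. A real version would need more tests and room for the gray areas.

```python
from dataclasses import dataclass
from enum import Enum


class Bucket(Enum):
    # Hypothetical bucket names, chosen only for this sketch.
    LIKELY_OK = "likely ok"
    NEEDS_DISCUSSION = "needs discussion"
    AVOID = "avoid"


@dataclass
class AITool:
    name: str
    # Test 1: does the vendor use user data in its own interest
    # at the expense of users/the community?
    extractive_vendor: bool
    # Test 2: can a person realistically catch the tool's mistakes
    # when reviewing its output?
    easy_to_review: bool


def sort_tool(tool: AITool) -> Bucket:
    """Apply the two tests in order and return a bucket."""
    if tool.extractive_vendor:
        return Bucket.AVOID
    if not tool.easy_to_review:
        return Bucket.NEEDS_DISCUSSION
    return Bucket.LIKELY_OK


if __name__ == "__main__":
    # Example ratings are guesses based on this thread, not verified facts.
    examples = [
        AITool("on-device speech to text", extractive_vendor=False, easy_to_review=True),
        AITool("hosted meeting transcriber", extractive_vendor=True, easy_to_review=True),
        AITool("AI code generation", extractive_vendor=False, easy_to_review=False),
    ]
    for tool in examples:
        print(f"{tool.name}: {sort_tool(tool).value}")
```

Checking the vendor question first is just one possible ordering. The point is only that the same questions get asked in the same order every time, so the result is repeatable and easy to explain.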

Metadata Update from @amoloney:
- Issue untagged with: Next Meeting

11 days ago
