#2117 limits on name values
Closed: Fixed 2 years ago by tkopecek. Opened 4 years ago by mikem.

We have a number of name fields in the db. Some have length limits in the schema, others are simply TEXT fields with no length limit.

Beyond length, we don't currently enforce any particular rules about these names. For example, it is currently possible (though very unwise) to create a tag with a newline in the name.

Furthermore, with unicode support, it becomes possible to use a number of questionable characters in name fields. While using an accented character in a username may be valid, using a poop emoji seems much less so.

I'm opening this issue for discussion. How wide a variety of names should Koji support here? Are there fixed rules that we can safely hard code, or does it need to be all configurable? Could we limit name fields to simply ascii? Can we have a common set of rules for all name fields, or are some name fields "special"?


Metadata Update from @mikem:
- Custom field Size adjusted to None

4 years ago

Metadata Update from @tkopecek:
- Issue set to the milestone: 1.22

4 years ago

I think there was some RH Bugzilla problem a while ago when someone changed their user name to have an emoji, and the problem was that XML-RPC could not handle that. I searched my email for details but I must've deleted it.

Anyway, this is probably something to test all the way through the stack to verify that XML-RPC can handle every character type that we choose to "support".
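
A minimal sketch of what such a check could look like, assuming the standard-library Python 3 `xmlrpc.client` marshaller is a fair stand-in for what the hub and clients do (the real transport may differ, so this is a starting point rather than a verdict). The helper name `survives_xmlrpc` is made up for this example.

```python
import xmlrpc.client
from xml.parsers.expat import ExpatError


def survives_xmlrpc(value):
    """Return True if value comes back unchanged after an XML-RPC marshal/unmarshal."""
    try:
        # Marshal the value as the single parameter of a (hypothetical) "echo" call
        payload = xmlrpc.client.dumps((value,), methodname="echo")
        params, _method = xmlrpc.client.loads(payload)
    except (ExpatError, UnicodeError, ValueError):
        # Marshalling or parsing failed outright
        return False
    return params[0] == value


if __name__ == "__main__":
    # Sample names: plain ascii, accented, emoji, and an embedded control character
    for candidate in ["f33-build", "caf\u00e9", "tag\U0001F4A9", "bad\x08name"]:
        print(repr(candidate), survives_xmlrpc(candidate))
```

Running the same loop over whatever character classes we decide to "support" would at least tell us where this layer of the stack gives up.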

TEXT is very big for something like a package name. Maybe we could drop that down to VARCHAR(255) or something.

Probably some java package is going to exceed 255, but certainly there is some practical limit we could set. We probably want to stop a broken build process from having a 1MB package name ;)

@mikem for Java: that would probably mean that some file such as package_name.jar must be created. In that case we still hit the limit of 255 (or shorter for unicode), as most filesystems stop there (https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits)
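
To illustrate the "shorter for unicode" part: assuming the limit is counted in bytes of the UTF-8 encoded filename, the number of characters that fit shrinks as soon as non-ascii characters appear. This is only a sketch; `MAX_FILENAME_BYTES`, `fits_in_filename`, and the `.jar` suffix are illustrative names, not anything Koji defines.

```python
MAX_FILENAME_BYTES = 255  # assumed per-filename limit; check the actual filesystem


def fits_in_filename(name, suffix=".jar", encoding="utf-8"):
    """Return True if name + suffix fits within the byte limit once encoded."""
    return len((name + suffix).encode(encoding)) <= MAX_FILENAME_BYTES


if __name__ == "__main__":
    ascii_name = "a" * 200
    accented_name = "\u00e9" * 200  # same character count, two bytes each in UTF-8
    print(len(ascii_name), fits_in_filename(ascii_name))
    print(len(accented_name), fits_in_filename(accented_name))
```

Both names are 200 characters long, but only the ascii one stays under the byte limit.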

re: length. I was thinking perhaps a limit that is quite high, but still manageable. E.g. 2048. That gives plenty of room for systematically verbose names, but keeps someone from accidentally cramming, say, raw iso contents into a name field :smirk:

However, I intended this issue to be primarily about the character set limits.

At the extreme end we could say: ascii characters only in names.

Also, I feel like there are different categories here.

With our own name fields (tags, targets, permissions, volumes, btypes, ext repos) I feel we could be more strict.

With names that come from build content (e.g. package names, rpm names, archive names, also comps data) we need to be more flexible. We don't want to put too many artificial limits on what can be built.

I don't think we have to sort this out for 1.22

Metadata Update from @tkopecek:
- Issue set to the milestone: 1.23 (was: 1.22)

3 years ago

So, does it make sense to close this with the conclusion that our internal fields are ascii, while external input is utf-8?

@tkopecek and I talked about this issue a bit today and I will attempt to summarize here.

  • This discussion has been open for six months and we haven't gotten a great deal of input.

  • Some kind of size limit seems appropriate. After all, a 1GiB tag name is absurd.

  • The full span of unicode seems too much, and for that matter we might not want to allow all
    ascii either (control codes, whitespace)

  • We don't know what the existing set of characters in use across Koji installations is

  • The rules probably need to be different for "internal" fields like tag names, versus "external"
    fields like name, version, and release of builds

I think in the end we were mostly settled on making this configurable. That would probably be:

  1. A configurable max length for the affected fields
  2. A configurable regex that the affected fields must match
  3. Perhaps other config flags to control application of rules that are difficult to code in a regex (e.g. ascii only)
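
As a rough sketch of how those three knobs could fit together (not a proposal for the actual hub code): the class `NameRules`, the exception `InvalidName`, and every default value below are hypothetical and only there to make the idea concrete. Separate rule sets for "internal" and "external" fields are likewise an assumption drawn from the summary above.

```python
import re
from dataclasses import dataclass


class InvalidName(ValueError):
    """Raised when a name breaks the configured rules."""


@dataclass(frozen=True)
class NameRules:
    max_length: int = 256                  # 1. configurable max length
    pattern: str = r"[A-Za-z0-9/_.+-]+"    # 2. configurable regex
    ascii_only: bool = True                # 3. flag for a rule that is awkward in a regex

    def check(self, name):
        """Raise InvalidName if name violates any rule; return None otherwise."""
        if not isinstance(name, str):
            raise InvalidName("name must be a string")
        if len(name) > self.max_length:
            raise InvalidName("name is longer than %d characters" % self.max_length)
        if self.ascii_only and not name.isascii():
            raise InvalidName("name contains non-ascii characters")
        # fullmatch rather than match with '$': '$' also matches just before a
        # trailing newline, which would let "tag\n" slip through.
        if self.pattern and re.fullmatch(self.pattern, name) is None:
            raise InvalidName("name does not match %r" % self.pattern)

    def is_valid(self, name):
        """Boolean convenience wrapper around check()."""
        try:
            self.check(name)
        except InvalidName:
            return False
        return True


# Stricter rules for our own fields, looser ones for names coming from builds
INTERNAL = NameRules()
EXTERNAL = NameRules(max_length=2048, pattern=r"[^\x00-\x1f\x7f]+", ascii_only=False)

if __name__ == "__main__":
    print(INTERNAL.is_valid("f33-build"))
    print(INTERNAL.is_valid("f33-build\n"))
    print(EXTERNAL.is_valid("caf\u00e9-libs"))
```

Keeping the length and ascii checks outside the regex means an admin who only wants to change one of them does not have to rewrite the pattern.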

It's a little unclear just where we want to apply this. The title of this issue refers specifically to name fields, but we have many other TEXT fields that are not names, and package names might need special treatment (ditto version and release).

Metadata Update from @tkopecek:
- Issue untagged with: discussion
- Issue set to the milestone: 1.24 (was: 1.23)
- Issue tagged with: tech-debt

3 years ago

Metadata Update from @tkopecek:
- Issue set to the milestone: 1.25 (was: 1.24)

3 years ago

Metadata Update from @tkopecek:
- Issue set to the milestone: 1.26 (was: 1.25)

2 years ago

Metadata Update from @tkopecek:
- Issue set to the milestone: 1.27 (was: 1.26)

2 years ago

Metadata Update from @tkopecek:
- Issue tagged with: testing-ready

2 years ago

Metadata Update from @jobrauer:
- Issue tagged with: testing-done

2 years ago

Metadata Update from @jcupova:
- Issue untagged with: testing-done, testing-ready
- Issue set to the milestone: 1.28 (was: 1.27)

2 years ago

Metadata Update from @jcupova:
- Issue tagged with: testing-ready

2 years ago

Metadata Update from @jobrauer:
- Issue tagged with: testing-done

2 years ago


Metadata
Related Pull Requests
  • #3028 Merged 2 years ago