#43 Document naming policy.
Merged 7 years ago by asamalik. Opened 7 years ago by dmach.
dmach/modularity master  into  master

@@ -7,6 +7,7 @@ 

      :maxdepth: 1

  

  

+     building-modules/naming-policy

      building-modules/developing

      building-modules/module-guidelines

      building-modules/building-local

@@ -0,0 +1,112 @@ 

+ =============

+ Naming Policy

+ =============

+ This document defines naming policy for modulemd metadata of final (built) modules.

+ This policy does **NOT** apply on sources such as modulemd yaml in dist-git.

+ 

+ The goal is to provide unique identifiers for modules

+ that are both human readable and also suitable for machine processing.

+ 

+ 

+ Fields

+ ------

+ 

+ * **N** - Name

+ * **S** - Stream

+ * **V** - Version

+ * **C** - Context

+ * **A** - Arch

+ * **P** - Profile

+ 

+ 

+ Separators

+ ----------

+ | Fields are separated with ':' (colon): N:S:V:C:A.

+ | If P is specified, it's separated from N:S:V:C:A with '/' (forward slash): N:S:V:C:A/P.

+ 

+ Examples::

+ 

+     # N:S:V:C:A

+     base-runtime:fedora-26:1:0123abcd:x86_64

+ 

+     # N:S:V:C:A/P

+     base-runtime:fedora-26:1:0123abcd:x86_64/minimal

+ 

+ 

+ Forms

+ -----

+ A form is a sequence of fields that fully or partially identifies a module.

+ 

+ 

+ Full Forms

+ ~~~~~~~~~~

+ 

+ N:S:V:C:A

+     Unique identifier of a module.

+ N:S:V:C:A/P

+     Unique identifier of a module profile.

+ 

+ 

+ Partial Forms

+ ~~~~~~~~~~~~~

+ 

+ Supported partial forms are: N [ : S [ :V [ :C ] ] ] [ :A ] [ /P ]

+ 

+ Namely:

+ 

+ * N

+ * N::A

+ * N:S

+ * N:S::A

+ * N:S:V

+ * N:S:V::A

+ * N:S:V:C

+ * N:S:V:C:A (identical to N:S:V:C::A)

+ * and all combinations with /P

+ 

+ Missing fields **SHOULD** be populated with recommended defaults:

+ 

+ Stream

+     defaults to the enabled or system default stream for the module in this particular order

+ Version

+     defaults to the latest available version in the module stream

+ Context

+     defaults to a value matching with already installed modules or modules involved in the transaction (not yet installed)

+ Arch

+     defaults to the system arch (e.g. DNF's $basearch)

+ Profile

+     defaults to the system default or 'default' profile

+ 

+ 

+ Allowed Characters

+ ------------------

+ **N** - Name

+     a-z A-Z 0-9 . - _ +

+ **S** - Stream

+     a-z A-Z 0-9 . - _ +

+ **V** - Version

+     0-9

+ **C** - Context

+     0-9 a-f

+ **A** - Arch

+     a-z A-Z 0-9 . - _ +

+ **P** - Profile

+     a-z A-Z 0-9 . - _ +

+ 

+ All fields **MUST** start and end with an alphanumeric character:

+ a-z A-Z 0-9

+ 

+ 

+ Forbidden Characters

+ --------------------

+ This paragraph serves as a design decision for future changes.

+ 

+ Following characters **MUST NOT** be part of any field:

+ 

+ * ':' (colon) - separator

+ * '/' (forward slash) - profile separator

+ * '\\\\' (backslash) - comon control character

+ * '*' (asterisk) - common wildcard

+ * '?' (question mark) - common wildcard

+ * '@' (at) - grpspec in YUM and DNF

+ * ' ' (space) - common separator

no initial comment

I still don't see how this is useful and where it has been agreed on this. I still think that having same separator is wrong

since version is optional, what base-runtime:26 would mean? is 26 stream or version? I think this needs to be described very carefully which fields are optional and obviously change separator.

why does it matter how files are names? what matters is metadata inside.

obviously missing details here are:

  • ascending or descending
  • in which locale
  • natsort?
  • how to sort versions?

I think specifying version without specifying stream is terribly wrong.

so since I think this is wrong, my proposal here is Docker/pip style:
N[:S[/V]][.A][+P]

…except maybe if a module/stream is installed already and you want to update/downgrade while staying in the same stream

I think that's mostly to acommodate users encountering these files. Technically, RPM packages could be named anything, too, but the convention is N-V-R.A.rpm. IMO, the criterion could be downgraded from MUST to SHOULD.

  • yeah, "ascending" probably
  • what do you mean with "locale"? Language, encoding?
  • module versions are mere integers, so I don't get what the problem with sorting them is

For full identification of a module - all fields are mandatory (N:S:V:A).
Partial form is used by tools (namely DNF) to make a query, return best matching module and work with it.

I don't understand where's the problem.
It has been working with YUM/DNF and RPMs for ages,
this proposal is even stricter on delimiters and makes parsing easier.

This will be very hard to follow in dist-git. You want to name your modulemd as N.yaml (N states for "name"). If this is talking about DNF specific yaml files cached locally, I think it should be removed from the doc and moved to different one.

To have diffable output, the sorting of all fields in modulemd.dumps() output has to be the same. Should we mention it in this document too? Note that I have no idea if modulemd behaves this way or not...

Agreed. I don't think this part means anything else than the files DNF is operating on, but it doesn't make it clear.

@jkaluza, I don't think there are any promises about preserving the structure of a modulemd document, neither with the current Python implementation nor with its slated C/glib/gobject based successor.

I've talked to @psabata and @sgallagh and the main technical reason (leaving aside the discussion if it's necessary or not :stuck_out_tongue:) seems to be that the hash data structures available in glib/gobject don't have the functionality of an OrderedDict in Python.

We've discussed that an acceptable place to keep things in order is the Python wrapper around the gobject-introspected interface, i.e. Python would load the YAML document in a way that preserves order (e.g. using the pyaml instead of the PyYAML module), parse things up, use modulemd to "do things" on the structure, and then coerce things back into the old structure before writing out to disk. Or something like that :smile:.

Why can't we default to whichever stream matches (if only one).

I'm not sure we want to match to the default stream if another also matches, but I also don't understand why Igor doesn't want name:version matching anyway ... maybe we can talk about that tomorrow?

We did vote on this last week and non-same separator was a clear winner, no? Is this strawman to talk about tomorrow. Would be kind of useful to put some other text in here for what people voted on for comparison?

For ref. the votes were (roughly):
All dash: -4
All colons: +3 -3
'@' and dot (p.n@s.v.a/P): -2
Partial colon (p.n:s.v.a/P): -4
Current (n-s-v/P): +8 -2
PIP Style (N[:S[/V]][.A][+P]): +17 -0

I'd go the other way and say we should drop '.' and '_' from name/stream, unless we have known uses for either. Expanding "allowed" characters is easy, later.

Note that we should at least mention the stuff happening with platform/factory where we'll need to add a hash/checksum somewhere here. It's not 100% finalized yet, but still it will be confusing to talk about both things and not reference the other.

We've basically agreed to include a new field called "context", abbreviated with C. While we think this field shouldn't be displayed to the user in most situations, it might be specified in some other and will be necessary to truly uniquely identify a module build. Therefore I would like you to include it in this proposal. Its presence should about as rare as that of the base architecture.

Just for reference, James' comment here is about the context field noted above.

As noted in some other comments, some combinations of missing fields make more sense than others. In my opinion, the following should be considered as valid input. I'm omitting the profile since it should be possible to add profile to any and all of the combinations below. Examples provided in parens. I will use colons as a separator even though I don't agree with that choice -- and the examples below will just emphasize the weakness of this solution.

  • N (httpd)
  • NA (httpd:i686)
  • NS (httpd:2.4)
  • NSA (httpd:2.4:i686)
  • NSV (httpd:2.4:20170808123000)
  • NSVA (httpd:2.4:20170808123000:i686)
  • NSVC (httpd:2.4:20170808123000:deadbeef)
  • NSVCA (httpd:2.4:20170808123000:deadbeef:i686)

Stream is effectively a part of the structured name. It doesn't make sense to specify version or context without it. Relying on default stream while being extremely specific at the same time just feels weird and an unlikely use case.

Context serves as a unique flag (hash) identifying the compatibility context of the module's artifacts and most people shouldn't need it -- it is still useful for filing bug reports or cloning a system, however.

I don't understand what "defaults to enabled" means here. Defaulting to the one defined by system profiles (or other temporary configuration) if no stream is specified is reasonable and I think it's desired. See my comment above.

Also, as noted above, since stream is effectively part of the (structured) name, specifying just NV doesn't make much sense to me. Again, reasons above.

Besides the configured default, this should default to default unless specified or configured otherwise. Furthermore if the profile doesn't exist, consider it being empty. Empty profiles are fine. I repeat -- empty profiles are fine.

I'm a proponent of the PIP style as well. While the DNF team owns this problem, I don't see a reason to completely ignore the popular vote, especially since the proposed solution here (just colons) relies on dark magic and heuristics inside DNF rather than being deterministic. More trouble would arise once somebody names their stream x86_64 or something similar. You then need more policies to prevent that, in multiple places. Several tools that handle modules need to implement exactly the same logic as DNF... it just doesn't make sense to me.

Please, seriously reconsider going with the PIP style, even though it might seem complex at first. Also realize the most common use case for the average Joe will be just the following four: N, N+P, N:S, N:S+P.

The only thing that needs to be added to this option is the context hash value. Extending the proposal with "%" below ("%" is known as hash to certain programmers):

N [ :S [ /V [ %C ] ] ] [ .A ] [ +P ]

This proposal also clearly maps to my comment for line 43 above.

"." would need to be excluded if we go with the PIP style.

"." would need to be excluded if we go with the PIP-style.

Any reason to make this more restrictive than the rest? You could argue we don't have such symbols in architectures today. Well, we don't have architectures with dashes either.

Well, in this case neither "_" nor "." conflict with anything or any delimeter, even if we go with the PIP style. But "." in name and stream is problematic for the PIP-style, as I mentioned above.

This is in conflict with some of the definitions above that do not include uppercase letters (for some reason).

Once again, voting for the PIP-style here ;) I would extend this list to all the separators proposed -- :, /, %, . and +.

This cannot be a general rule for all the systems that handle modulemd. Besides there being no reason for that, it's not even possible to implement, as others have noted.

If this is about how DNF stores module metadata on the system, I don't think it belongs in this document.

This is an interesting idea but I don't think it belongs in this document. This should be an RFE for modulemd.

So I added several comments, many of which I feel fairly strongly about.

Note the modulemd specification is going to be updated within the next couple of days. We will be adding the context hash, base architecture and EOL tags. This is all pretty simple but two of these have an immediate effect on this document. More in the comments.

And just to ensure it isn't lost, I'm proposing this extended PIP-style notation that doesn't require any heuristics and policies all over the place:

N [ :S [ /V [ %C ] ] ] [ .A ] [ +P ]

The only difference from the popular winner is the addition of the context hash.

I'd like to propose a requirement: given a specification of a module, it must be possible to figure out what name/stream/version/etc. of the module has been specified based solely on on the text of the specification. [If you have to make make http requests to PDC to parse a name into fields, that's not acceptable.]

The text here seems to contradict that, though it may simply be saying that it should be possible to "partially" specify a module.

Owen, I believe this is about selecting a module without specifying the entire identifier, just like dnf install perl and dnf install perl-5.24.2-393.fc26.x86_64 install the same thing (assuming that NVRA is the latest, of course). The tooling should fill in the unspecified fields with defaults -- the default configured stream, the latest version, the matching context, the matching architecture and the default configured profile.

In the context of naming, what worries me about the opaque hash for a context is that you can either a) not specify the context and assume that the backend processing will "do what I mean" or b) do a look of the available contexts and pick one for the name. In the context of an existing set of enabled modules the DWIM is reasonably obvious - "pick the context that is compatible with the set of modules that are already enabled" (and hope there is exactly one) - but say you are specifying a module to build a Dockerfile against and the module is available in multiple contexts. What's the intent? Is it the module with the "latest" version of the dependencies, sorting the streams in some fashion?

This makes me wonder if it should be possible to specify context instead as (Platform=f27) or (Platform=f26,Python=3.7). This obviously takes the PIP naming scheme to new heights of complexity, but without it, I worry that different parts of our infrastructure will invent their own way of matching against available contexts.

One easy way to specify context in one invocation is to list multiple modules in that transaction. I will skip platform in the example below because it is always installed for you -- or you base your container on it. But let's say your application foo is built against python:3.6 and python:3.7. If you asked dnf to install foo (with the default stream and such), it wouldn't know which python to choose. Now you have two main options.

You either pick the default python stream selected by system configuration -- the system profiles. Or you explicitly ask the user to clarify what they want, saying that the input wasn't sufficient to select the software (some distros do this and it's not so bad).

Or, to force it in one go, the user could run something like dnf install @python:3.7 @foo. In this case, the proper context of both modules would be selected -- the python:3.7 in the latest version compatible with the platform you have, and foo in the default stream, the latest version and context compatible with your platform and python:3.7.

Dan, Adam and I discussed this in person today. We agreed Dan will be submitting an updated proposal with the string form using the following format:

N [ : S [ :V [ :C ] ] ] [ :A ] [ /P ]

To avoid ambiguity, specifying architecture with a partial form, one needs to use an IPv6-like syntax. Several examples of valid strings can be found below:

  • httpd
  • httpd:2.4
  • httpd:2.4::i686
  • httpd::ppc64le
  • httpd:2.4:20170809172200:deadbeef:s390x/minimal

The above implies selectors that specify only version or context (or both) but no stream are invalid. While it could be technically possible, just like it is in IPv6 addresses, it was deemed unnecessary and confusing. If there's a demand for that, we can add it later.

Thank you all for the feedback.
As Petr posted already, we met today to work on a new version
of this document and incorporate your thoughts.

First of all, we decided to avoid PIP notation,
because it's overly complex and requires users
to remember all separators.
Having one separator for fields and an additional
one for profile is more reasonable.
Keeping it simple is also a requirement from our management...

Changes from previous version:

Removed:
* file names (document scope)
* metadata ordering (document scope)

Added:
* C (Context) field - based on new modulemd specs

Changed:
* used IPv6 style syntax for ::A
* unified allowed characters across most fields

I'm hoping you'll find this version better,
but if you still have any suggestions, use either the comments in the "code"
or create one big comment for better readability of more complex suggestions.

rebased

7 years ago

Maybe this could be expanded a little bit -- perhaps mention modules involved in the transaction (not yet installed) affect the choice too?

@psabata good point, I've updated the text.

rebased

7 years ago

...or default if no configuration is present. Also the configuration (system profiles) should be per module, I think.

All the string fields should also allow +. We allow it in RPMs as well (gtk+ for instance).

rebased

7 years ago

rebased with all requested changes

unless there are any additional change requests,
can anyone merge this PR?

+1

Let's push this forward.

Can someone merge this pull request?
Or is there any blocker?

Pull-Request has been merged by asamalik

7 years ago