#45 implement repoquery-like commands
Merged 6 years ago by asamalik. Opened 6 years ago by asamalik.

file modified
+5 -55
@@ -6,6 +6,9 @@ 

  

  Currently, this consists of:

  

+ * `fedmod repoquery`: simple repoquery-like commands providing operations

+   like listing modules, resolving dependencies for packages, finding out

+   where a certain package is, etc.

  * `fedmod rpm2module`: generates a draft modulemd file based on

    the given RPM name (multiple RPM names can be given, but the resulting

    draft module will lack any descriptive metadata in that case)
@@ -30,62 +33,9 @@ 

  See the local development instructions below for info on running directly

  from a local development clone with `pipenv`.

  

- ## Modulemd creation

+ ## User documentation

  

- Before generating any draft modulemd files, first run the following command to

- fetch and locally cache the required metadata files:

- 

-     $ fedmod fetch-metadata

- 

- `fedmod rpm2module [RPM NAMES]` will then create a modulemd file from the

- given package names and emit it on `stdout`. The YAML metadata can be written

- directly to a file instead by passing the `--output` (or `-o`) option:

- 

-     $ fedmod rpm2module -o graphite-web.yaml graphite-web

- 

- Only module level build dependencies are generated by default - there is no

- attempt to make the generated module definition self-hosting. If a self-hosting

- module is desired, then the `--build-deps N` option can be passed, where `N` is

- the number of levels of recursive build dependencies to attempt to include (this

- can quickly become unmanageable due to dependencies on build tools that

- themselves have complex build requirements, but are not yet part of a defined

- module)

- 

- The following metadata is currently used as input to the draft module generation

- process:

- 

- * Package dependency definitions are pulled from the regular Fedora 27

-   release and updates repositories, with the metadata being downloaded for

-   local use via the `fedmod fetch-metadata` command

- 

- * Installable module definitions are pulled from the modular Fedora Bikeshed

-   repository, with the metadata being downloaded for local use via the

-   `fedmod fetch-metadata` command

- 

- * The definition of Fedora's build-only `bootstrap` module is retrieved

-   directly from the relevant

-   [dist-git repository](https://src.fedoraproject.org/modules/bootstrap/raw/master/f/bootstrap.yaml)

- 

- * Descriptive metadata is taken from the system running `fedmod`. Due to this,

-   `fedmod` currently only supports Fedora 26+. (This will be fixed to use

-   the same repository metadata as is used for package dependency resolution)

- 

- Module dependencies currently err on the side of making the generated modules

- smaller by permitting generated modules to depend on packages that aren't

- listed as part of the public API of other modules. This reflects the fact that

- those transitive dependencies are typically the reason for the lower level

- modules appearing in the dependency set in the first place, as well as the fact

- that true dependency isolation will start being enforced once modules begin

- including opaque container images, such that only the client libraries are

- installed into shared environments.

- 

- Other limitations in generated `modulemd` files:

- 

- * `components` are only given a name and rationale, relying on the default

-   settings for everything else

- * the stream for module level dependencies is currently hardcoded to `f27`.

-   This isn't right, but we can't set anything better until the mechanism for

-   depending on multiple streams without naming them specifically is defined.

+ Please see the [User docs of fedmod modularity tools](src/README.md).

  

  

  ## Local development

file added
+147
@@ -0,0 +1,147 @@ 

+ # User docs of fedmod modularity tools

+ 

+ fedmod provides tools for working with Fedora's modulemd metadata format

+ that aren't related to actually building them (for build commands, see

+ fedpkg and mbs-build).

+ 

+ Currently, this consists of:

+ 

+ * `fedmod repoquery`: simple repoquery-like commands providing operations

+   like listing modules, resolving dependencies for packages, finding out

+   where a certain package is, etc.

+ * `fedmod rpm2module`: generates a draft modulemd file based on

+   the given RPM name (multiple RPM names can be given, but the resulting

+   draft module will lack any descriptive metadata in that case)

+ * `fedmod fetch-metadata`: download the F27 package and module metadata needed

+   to generate draft module definitions (the metadata sets to use are not yet

+   configurable)

+ 

+ 

+ ## Using the repoquery-like commands

+ 

+ **List all modules**

+ 

+ Lists all modules available.

+ 

+ ```

+ $ fedmod list-modules

+ module1

+ module2

+ ...

+ ```

+ 

+ **List all modularized packages**

+ 

+ Lists all packages that have been modularized. Optionally, it can list only packages that appear in more than one module, and show which modules contain each package.

+ 

+ ```

+ $ fedmod list-rpms

+ pkg1

+ pkg2

+ ...

+ 

+ $ fedmod list-rpms --duplicate-only

+ pkg2

+ ...

+ 

+ $ fedmod list-rpms --list-modules

+ pkg1    (module1)

+ pkg2    (module2, module3)

+ ...

+ ```

+ 

+ **Resolve package dependencies**

+ 

+ Resolves package dependencies, which is useful when creating new modules. Modular dependencies can also be specified, in which case packages already provided by those modules are left out of the result.

+ 

+ ```

+ $ fedmod resolve-deps pkg

+ pkg2

+ pkg3

+ pkg4

+ ...

+ 

+ $ fedmod resolve-deps -m host -m platform pkg

+ pkg3

+ ``` 

+ 

+ **Find package in modules**

+ 

+ Finds out whether a certain package has been modularized and in which module(s).

+ 

+ ```

+ $ fedmod where-is-package pkg

+ module1

+ module2

+ ```

+ 

+ **List packages of a module**

+ 

+ Lists all packages in a given module. Can also list full NEVRAs.

+ 

+ ```

+ $ fedmod module-packages module

+ pkg1

+ pkg2

+ 

+ $ fedmod module-packages --full-nevra module 

+ pkg1-0:2.4.28-3.module_e7ab08d3.x86_64

+ pkg2-0:4.5.20-1.module_e7ab08d3.x86_64

+ ```

+ 

+ ## Modulemd creation

+ 

+ Before generating any draft modulemd files, first run the following command to

+ fetch and locally cache the required metadata files:

+ 

+     $ fedmod fetch-metadata

+ 

+ `fedmod rpm2module [RPM NAMES]` will then create a modulemd file from the

+ given package names and emit it on `stdout`. The YAML metadata can be written

+ directly to a file instead by passing the `--output` (or `-o`) option:

+ 

+     $ fedmod rpm2module -o graphite-web.yaml graphite-web

+ 

+ Only module level build dependencies are generated by default - there is no

+ attempt to make the generated module definition self-hosting. If a self-hosting

+ module is desired, then the `--build-deps N` option can be passed, where `N` is

+ the number of levels of recursive build dependencies to attempt to include (this

+ can quickly become unmanageable due to dependencies on build tools that

+ themselves have complex build requirements, but are not yet part of a defined

+ module)

+ 

+ The following metadata is currently used as input to the draft module generation

+ process:

+ 

+ * Package dependency definitions are pulled from the regular Fedora 27

+   release and updates repositories, with the metadata being downloaded for

+   local use via the `fedmod fetch-metadata` command

+ 

+ * Installable module definitions are pulled from the modular Fedora Bikeshed

+   repository, with the metadata being downloaded for local use via the

+   `fedmod fetch-metadata` command

+ 

+ * The definition of Fedora's build-only `bootstrap` module is retrieved

+   directly from the relevant

+   [dist-git repository](https://src.fedoraproject.org/modules/bootstrap/raw/master/f/bootstrap.yaml)

+ 

+ * Descriptive metadata is taken from the system running `fedmod`. Due to this,

+   `fedmod` currently only supports Fedora 26+. (This will be fixed to use

+   the same repository metadata as is used for package dependency resolution)

+ 

+ Module dependencies currently err on the side of making the generated modules

+ smaller by permitting generated modules to depend on packages that aren't

+ listed as part of the public API of other modules. This reflects the fact that

+ those transitive dependencies are typically the reason for the lower level

+ modules appearing in the dependency set in the first place, as well as the fact

+ that true dependency isolation will start being enforced once modules begin

+ including opaque container images, such that only the client libraries are

+ installed into shared environments.

+ 

+ Other limitations in generated `modulemd` files:

+ 

+ * `components` are only given a name and rationale, relying on the default

+   settings for everything else

+ * the stream for module level dependencies is currently hardcoded to `f27`.

+   This isn't right, but we can't set anything better until the mechanism for

+   depending on multiple streams without naming them specifically is defined.

file modified
+24
@@ -157,7 +157,10 @@ 

  

  _SRPM_REVERSE_LOOKUP = {}  # SRPM name : module name

  _RPM_REVERSE_LOOKUP = {}   # RPM name : module name

+ _BETTER_SRPM_REVERSE_LOOKUP = {}  # SRPM name : [module names]

+ _BETTER_RPM_REVERSE_LOOKUP = {}   # RPM name : [module names]

  _BOOTSTRAP_REVERSE_LOOKUP = {}

+ _MODULE_FORWARD_LOOKUP = {}

  def _populate_module_reverse_lookup():

      # TODO: Cache the reverse mapping as a JSON file, as with _BOOTSTRAP_REVERSE_LOOKUP_CACHE

      if _RPM_REVERSE_LOOKUP:
@@ -173,21 +176,39 @@ 

          modules_yaml = modules_yaml_gz.read()

      modules = modulemd.loads_all(modules_yaml)

      for module in modules:

+         _MODULE_FORWARD_LOOKUP[module.name] = module

          for srpmname in module.components.rpms:

              # This isn't entirely valid, as it doesn't account for multiple

              # modules that include the same source RPM with different output

              # filters (e.g. python3-ecosystem vs python2-ecosystem)

              _SRPM_REVERSE_LOOKUP[srpmname] = module.name

+             if srpmname not in _BETTER_SRPM_REVERSE_LOOKUP:

+                 _BETTER_SRPM_REVERSE_LOOKUP[srpmname] = []

+             _BETTER_SRPM_REVERSE_LOOKUP[srpmname].append(module.name)

          for rpmname in module.artifacts.rpms:

              # This is only valid for module sets that are guaranteed to be

              # fully coinstallable, and hence only allow any given RPM to be

              # published by at most one module

              rpmprefix = rpmname.split(":", 1)[0].rsplit("-", 1)[0]

              _RPM_REVERSE_LOOKUP[rpmprefix] = module.name

+             if rpmprefix not in _BETTER_RPM_REVERSE_LOOKUP:

+                 _BETTER_RPM_REVERSE_LOOKUP[rpmprefix] = []

+             _BETTER_RPM_REVERSE_LOOKUP[rpmprefix].append(module.name)

      # Read the extra RPM bootstrap metadata

      with open(_BOOTSTRAP_REVERSE_LOOKUP_CACHE, "r") as cachefile:

          _BOOTSTRAP_REVERSE_LOOKUP.update(json.load(cachefile))

  

+ def list_modules():

+     return _MODULE_FORWARD_LOOKUP.keys()

+ 

+ def get_rpms_in_module(module_name):

+     if module_name in _MODULE_FORWARD_LOOKUP:

+         return _MODULE_FORWARD_LOOKUP[module_name].artifacts.rpms

+     return set()

+ 

+ def get_modules_for_rpm(rpm_name):

+     result = _BETTER_RPM_REVERSE_LOOKUP.get(rpm_name)

+     return result

  

  def get_module_for_rpm(rpm_name, *, allow_bootstrap=False):

      result = _RPM_REVERSE_LOOKUP.get(rpm_name)
@@ -195,6 +216,9 @@ 

          _BOOTSTRAP_REVERSE_LOOKUP.get(rpm_name)

      return result

  

+ def get_rpm_reverse_lookup():

+     return _BETTER_RPM_REVERSE_LOOKUP

+ 

  class Repo(object):

      def __init__(self, name, metadata_path):

          self.name = name

file modified
+82
@@ -3,6 +3,7 @@ 

  import logging

  

  from .module_generator import ModuleGenerator

+ from .module_repoquery import ModuleRepoquery

  from . import _depchase, _repodata

  

  # TODO: Switch this over to click (already a dependency for progress bars)
@@ -55,6 +56,69 @@ 

              description="Caches needed repository metadata locally"

          )

  

+         parser_list_modules = subparsers.add_parser(

+             'list-modules', parents=[base_parser],

+             help='Lists all modules.',

+         )

+ 

+         parser_list_rpms = subparsers.add_parser(

+             'list-rpms', parents=[base_parser],

+             help='Lists all modularized rpm packages.',

+         )

+         parser_list_rpms.add_argument(

+             "--duplicate-only",

+             action='store_true',

+             help="Only packages that are in more than one module.",

+         )

+         parser_list_rpms.add_argument(

+             "--list-modules",

+             action='store_true',

+             help="List modules for every package.",

+         )

+ 

+         parser_resolve_deps = subparsers.add_parser(

+             'resolve-deps', parents=[base_parser],

+             help='List dependencies of given rpm packages.',

+         )

+         parser_resolve_deps.add_argument(

+             "--module-dependency",

+             "-m",

+             action="append",

+             metavar="MODULE",

+             help="Module to be used as a dependency. Can be used multiple times.",

+         )

+         parser_resolve_deps.add_argument(

+             "pkgs",

+             metavar='PKGS',

+             nargs='+',

+             help="Specify list of packages.",

+         )

+ 

+         parser_module_packages = subparsers.add_parser(

+             'module-packages', parents=[base_parser],

+             help='Lists packages in a given module',

+         )

+         parser_module_packages.add_argument(

+             "module",

+             metavar='MODULE',

+             help="Name of the module.",

+         )

+         parser_module_packages.add_argument(

+             "--full-nevra",

+             action='store_true',

+             help="Print the full NEVRA instead of only name.",

+         )

+ 

+         parser_where_is_package = subparsers.add_parser(

+             'where-is-package', parents=[base_parser],

+             help='Check if a package has been modularized and where',

+         )

+         parser_where_is_package.add_argument(

+             "pkg",

+             metavar='PKG',

+             help="Name of the package.",

+         )

+ 

          return parser

  

      def __init__(self, args=None):
@@ -78,6 +142,24 @@ 

              mg.run(cli.args.output_fname, cli.args.build_deps_iterations)

          elif cli.args.cmd_name == 'fetch-metadata':

              _repodata.download_repo_metadata()

+         elif cli.args.cmd_name == 'list-modules':

+             rq = ModuleRepoquery()

+             rq.list_modules()

+         elif cli.args.cmd_name == 'list-rpms':

+             rq = ModuleRepoquery()

+             rq.list_modularized_pkgs(

+                 duplicate_only=cli.args.duplicate_only,

+                 list_modules=cli.args.list_modules,

+             )

+         elif cli.args.cmd_name == 'resolve-deps':

+             rq = ModuleRepoquery()

+             rq.list_pkg_deps(cli.args.pkgs, cli.args.module_dependency)

+         elif cli.args.cmd_name == 'module-packages':

+             rq = ModuleRepoquery()

+             rq.list_rpms_in_module(cli.args.module, full_nevra=cli.args.full_nevra)

+         elif cli.args.cmd_name == 'where-is-package':

+             rq = ModuleRepoquery()

+             rq.list_modules_for_rpm(cli.args.pkg)


  

      except KeyboardInterrupt:

          print('\nInterrupted by user')

@@ -0,0 +1,66 @@ 

+ from __future__ import absolute_import

+ 

+ import sys

+ import modulemd

+ import logging

+ import dnf

+ from . import _depchase, _repodata

+ 

+ def _name_only(rpm_name):

+     name, version, release = rpm_name.rsplit("-", 2)

+     return name

+     

+ class ModuleRepoquery(object):

+     

+     def list_modules(self):

+         _repodata._populate_module_reverse_lookup()

+         module_names = _repodata.list_modules()

+         if module_names:

+             for name in module_names:

+                 print(name)

+ 

+     def list_modules_for_rpm(self, pkg):

+         _repodata._populate_module_reverse_lookup()

+         module_names = _repodata.get_modules_for_rpm(pkg)

+         if module_names:

+             for name in module_names:

+                 print(name)

+ 

+     def list_rpms_in_module(self, module, full_nevra=False):

+         _repodata._populate_module_reverse_lookup()

+         rpm_names = _repodata.get_rpms_in_module(module)

+         if rpm_names:

+             for name in rpm_names:

+                 if full_nevra:

+                     print(name)

+                 else:

+                     print(_name_only(name))

+     

+     def list_pkg_deps(self, pkgs, module_deps):

+         _repodata._populate_module_reverse_lookup()

+         pkgs_in_modules = set()

+         if module_deps:

+             for module in module_deps:

+                 rpm_names = _repodata.get_rpms_in_module(module)

+                 pkgs_in_modules |= set(map(lambda x: _name_only(x), rpm_names))

+                 

+         pool = _depchase.make_pool("x86_64")

+         run_deps = _depchase.ensure_installable(pool, pkgs)

+         rpm_names = run_deps - pkgs_in_modules

+         if rpm_names:

+             for name in rpm_names:

+                 print(name)

+ 

+     def list_modularized_pkgs(self, duplicate_only=False, list_modules=False):

+         _repodata._populate_module_reverse_lookup()

+         rpm_names = _repodata.get_rpm_reverse_lookup()

+         if rpm_names:

+             for name in rpm_names.keys():

+                 if not (duplicate_only and not len(rpm_names[name]) > 1):

+                     if list_modules:

+                         print(name + "      (" + ", ".join(rpm_names[name]) + ")")

+                     else:

+                         print(name)

+ 

+ 

+          

\ No newline at end of file

@@ -0,0 +1,82 @@ 

+ import pytest

+ import os.path

+ from _fedmod.cli import ModtoolsCLI

+ from _fedmod.module_repoquery import ModuleRepoquery

+ 

+ class TestListingModules(object):

+ 

+     def setup(self):

+         self.mr = ModuleRepoquery()

+     

+     def test_list_modules(self, capfd):

+         self.mr.list_modules()

+         out, err = capfd.readouterr()

+ 

+         assert "platform" in out

+         assert "host" in out

+ 

+ 

+ class TestListingPackages(object):

+ 

+     def setup(self):

+         self.mr = ModuleRepoquery()

+ 

+     def test_list_packages(self, capfd):

+         self.mr.list_modularized_pkgs()

+         out, err = capfd.readouterr()

+ 

+         assert "kernel" in out

+         assert "gcc" in out

+         assert "(platform)" not in out

+     

+     def test_list_packages_with_modules(self, capfd):

+         self.mr.list_modularized_pkgs(list_modules=True)

+         out, err = capfd.readouterr()

+ 

+         assert "kernel" in out

+         assert "gcc" in out

+         assert "(platform)" in out

+ 

+ 

+ class TestResolvingDependencies(object):

+ 

+     def setup(self):

+         self.mr = ModuleRepoquery()

+ 

+     def test_list_pkg_deps(self, capfd):

+         self.mr.list_pkg_deps(["nginx"], [])

+         out, err = capfd.readouterr()

+ 

+         assert "gzip" in out

+         assert "nginx" in out

+ 

+     def test_list_pkg_deps_with_module_deps(self, capfd):

+         self.mr.list_pkg_deps(["nginx"], ["platform", "host"])

+         out, err = capfd.readouterr()

+ 

+         assert "gzip" not in out

+         assert "nginx" in out

+ 

+ class TestListingPackagesInModule(object):

+ 

+     def setup(self):

+         self.mr = ModuleRepoquery()

+ 

+     def test_list_rpms_in_module(self, capfd):

+         self.mr.list_rpms_in_module("host")

+         out, err = capfd.readouterr()

+ 

+         assert "kernel" in out

+         assert "httpd" not in out

+ 

+ class TestListingModulesWithPackage(object):

+ 

+     def setup(self):

+         self.mr = ModuleRepoquery()

+ 

+     def test_list_modules_for_rpm(self, capfd):

+         self.mr.list_modules_for_rpm("kernel")

+         out, err = capfd.readouterr()

+ 

+         assert "host" in out

+         assert "httpd" not in out

This PR adds multiple commands that will be useful to module developers.

List all modules

Lists all modules available.

$ fedmod list-modules
module1
module2
...

List all modularized packages

Lists all packages that have been modularized. Optionally, it can list only packages that appear in more than one module, and show which modules contain each package.

$ fedmod list-rpms
pkg1
pkg2
...

$ fedmod list-rpms --duplicate-only
pkg2
...

$ fedmod list-rpms --list-modules
pkg1    (module1)
pkg2    (module2, module3)
...
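
The two `list-rpms` options can be illustrated with a small standalone sketch. It assumes a mapping from RPM name to a list of module names, similar in shape to the reverse lookup table this PR adds; the function name `select_packages` and the sample data are hypothetical and not part of fedmod.

```python
def select_packages(reverse_lookup, duplicate_only=False, list_modules=False):
    """Return the lines a list-rpms style command would print.

    reverse_lookup maps an RPM name to the list of modules shipping it.
    """
    lines = []
    for name, modules in sorted(reverse_lookup.items()):
        # With duplicate_only, skip packages shipped by a single module.
        if duplicate_only and len(modules) < 2:
            continue
        if list_modules:
            lines.append(f"{name}    ({', '.join(modules)})")
        else:
            lines.append(name)
    return lines


# Hypothetical sample data mirroring the example output above.
sample = {"pkg1": ["module1"], "pkg2": ["module2", "module3"]}

for line in select_packages(sample, list_modules=True):
    print(line)
```

Returning the lines rather than printing them directly keeps the filtering logic easy to test on its own.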

Resolve package dependencies

Resolves package dependencies, which is useful when creating new modules. Modular dependencies can also be specified, in which case packages already provided by those modules are left out of the result.

$ fedmod resolve-deps pkg
pkg2
pkg3
pkg4
...

$ fedmod resolve-deps -m host -m platform pkg
pkg3
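
The effect of `-m` amounts to a set difference: the resolved dependencies minus everything the named modules already ship. The sketch below uses made-up data in place of the real dependency solver; `filter_module_provided` and both sample structures are hypothetical names used only for illustration.

```python
def filter_module_provided(resolved_deps, module_contents, module_deps):
    """Drop dependencies that the given modules already provide.

    resolved_deps   -- set of package names produced by a dependency solver
    module_contents -- mapping of module name to the package names it ships
    module_deps     -- module names passed with -m (may be None or empty)
    """
    provided = set()
    for module in module_deps or ():
        # Unknown modules contribute nothing rather than raising an error.
        provided |= set(module_contents.get(module, ()))
    return set(resolved_deps) - provided


# Hypothetical data chosen to mirror the example output above.
resolved = {"pkg2", "pkg3", "pkg4"}
contents = {"host": {"pkg2"}, "platform": {"pkg4"}}

print(sorted(filter_module_provided(resolved, contents, ["host", "platform"])))
```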

Find package in modules

Finds out whether a certain package has been modularized and in which module(s).

$ fedmod where-is-package pkg
module1
module2

List packages of a module

Lists all packages in a given module. Can also list full NEVRAs.

$ fedmod module-packages module
pkg1
pkg2

$ fedmod module-packages --full-nevra module 
pkg1-0:2.4.28-3.module_e7ab08d3.x86_64
pkg2-0:4.5.20-1.module_e7ab08d3.x86_64
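
The difference between the two output forms comes down to trimming the NEVRA string to its name. A minimal sketch, assuming the `name-epoch:version-release.arch` layout shown above (the function name `name_only` is used here for illustration):

```python
def name_only(nevra):
    """Return the package name from a 'name-epoch:version-release.arch' string.

    Splitting from the right keeps hyphens inside the package name intact,
    since RPM versions and releases cannot themselves contain hyphens.
    """
    name, _version, _release_arch = nevra.rsplit("-", 2)
    return name


print(name_only("pkg1-0:2.4.28-3.module_e7ab08d3.x86_64"))
```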

_load_module_metadata_from_cache would be a clearer name :)

Rather than returning the module list here, how about populating a forward lookup table from module name to ModuleMD?

That will also work properly with the short circuit return above that checks for whether or not the metadata has already been read.

We should switch over to click for real at some point - right now we're using it for the progress bars in the metadata downloader, but I never got around to switching the actual arg parsing over.

Much trailing whitespace this file has :)

Mostly +1 from me, but the current structure of the metadata loading code looks like a bug magnet to me. I'd suggest adding a forward lookup table from module names to modulemd records, and using that in the affected functions.

Then you can just have a single helper function that the existing "load all the metadata" function used for modulemd generation also calls.
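
A rough sketch of what this suggestion could look like: one loader fills a forward lookup table keyed by module name, and every query function goes through it. `ModuleRecord`, `fake_loader`, and the sample data are stand-ins invented for this example; the real records would come from the modulemd library.

```python
from collections import namedtuple

# Stand-in for a parsed modulemd record (hypothetical, for illustration only).
ModuleRecord = namedtuple("ModuleRecord", ["name", "rpms"])

_MODULE_FORWARD_LOOKUP = {}  # module name -> ModuleRecord
load_calls = 0  # only here to show that the loader runs once


def fake_loader():
    """Pretend to read and parse the cached module metadata."""
    global load_calls
    load_calls += 1
    return [
        ModuleRecord("module1", {"pkg1-0:2.4.28-3.module_e7ab08d3.x86_64"}),
        ModuleRecord("module2", {"pkg2-0:4.5.20-1.module_e7ab08d3.x86_64"}),
    ]


def load_module_metadata(loader=fake_loader):
    # Short circuit: the table is only populated on the first call.
    if _MODULE_FORWARD_LOOKUP:
        return
    for record in loader():
        _MODULE_FORWARD_LOOKUP[record.name] = record


def list_modules():
    load_module_metadata()
    return sorted(_MODULE_FORWARD_LOOKUP)


def get_rpms_in_module(module_name):
    load_module_metadata()
    record = _MODULE_FORWARD_LOOKUP.get(module_name)
    return record.rpms if record else set()


print(list_modules())
print(get_rpms_in_module("module1"))
```

Because each query calls the same loader, callers cannot observe a half-populated table, which is the kind of bug the review comment is worried about.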

The usage docs you have above also need to go somewhere. Perhaps a src/README.md file that we pull into the PyPI package and RPM?

Thanks for the review. What you're proposing sounds reasonable, let me do that.

Now it even feels obvious I should have done it that way before. Thanks for noticing. Fix pushed.

1 new commit added

  • use forward lookup table for modules
6 years ago

2 new commits added

  • use forward lookup table for modules
  • implement repoquery-like commands
6 years ago

Nice, this version looks much cleaner. At some point we should eliminate the duplicated reverse lookup tables, but that can be in a later refactoring patch rather than complicating this one.
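
One possible shape for that later refactoring, sketched under assumptions: keep only the multi-valued table and derive the single-module answer from it. The function names echo the ones in this PR, but the signatures and sample data here are hypothetical.

```python
from collections import defaultdict


def build_reverse_lookup(module_rpms):
    """Map each RPM name to every module that ships it.

    module_rpms maps a module name to an iterable of RPM names.
    """
    lookup = defaultdict(list)
    for module_name, rpm_names in module_rpms.items():
        for rpm_name in rpm_names:
            lookup[rpm_name].append(module_name)
    return dict(lookup)


def get_modules_for_rpm(lookup, rpm_name):
    return lookup.get(rpm_name, [])


def get_module_for_rpm(lookup, rpm_name):
    # The single-valued table in the diff keeps the last module seen,
    # so the last list entry reproduces that behaviour.
    modules = lookup.get(rpm_name)
    return modules[-1] if modules else None


# Hypothetical sample data.
lookup = build_reverse_lookup({"module2": ["pkg2"], "module3": ["pkg2", "pkg5"]})
print(get_modules_for_rpm(lookup, "pkg2"))
print(get_module_for_rpm(lookup, "pkg2"))
```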

That leaves:

  • adding some new tests to the tests directory (they don't need to be comprehensive, but should at least ensure basic usage of each command works - your docs examples would be fine)
  • moving the docs from the PR into a src/README.md file (the rudimentary usage docs at the top of the dev README can also be moved there, and replaced with a relative file reference to [users docs](src/README.md))
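
For the first item, a basic test only needs to run a command and check what it prints. The sketch below captures standard output using the standard library alone; `list_modules` here is a hypothetical stand-in for a command that prints one module per line, not the fedmod implementation.

```python
import io
from contextlib import redirect_stdout


def list_modules(module_names):
    """Hypothetical stand-in for a command that prints one module per line."""
    for name in module_names:
        print(name)


def run_and_capture(func, *args):
    """Call func and return the lines it wrote to standard output."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func(*args)
    return buffer.getvalue().splitlines()


def test_list_modules_prints_each_module():
    out = run_and_capture(list_modules, ["platform", "host"])
    assert "platform" in out
    assert "host" in out


test_list_modules_prints_each_module()
```

Comparing against a list of lines checks for exact names, whereas a substring check on the whole output would also pass for names that merely contain the expected text.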

1 new commit added

  • add tests for module repoquery
6 years ago

1 new commit added

  • add user docs
6 years ago

I've completed both, the tests and the user documentation.

I will merge this PR as we've agreed on irc.

4 new commits added

  • add user docs
  • add tests for module repoquery
  • use forward lookup table for modules
  • implement repoquery-like commands
6 years ago

rebased onto 904ed92

6 years ago

Pull-Request has been merged by asamalik

6 years ago