#147 Fix bug introduced by recent patches and clean up the code a bit
Merged 2 years ago by zbyszek. Opened 2 years ago by zbyszek.
fedora-rust/ zbyszek/rust2rpm fixlets  into  master

file modified
+22 -13
@@ -3,6 +3,7 @@ 

  import contextlib

  from datetime import datetime, timezone

  import difflib

+ import functools

  import itertools

  import os

  import pathlib
@@ -44,6 +45,12 @@ 

           r"GNU-.*[0-9].*|GPL-.*[0-9].*|LGPL-.*[0-9].*|MIT-.*[0-9].*|"

           r"MPL-.*[0-9].*|OFL-.*[0-9].*)")

  

+ def sortify(func):

+     """Return a sorted list from a generator"""

+     def wrapper(*args, **kwargs):

+         return sorted(func(*args, **kwargs))

+     return functools.update_wrapper(wrapper, func)

+ 

  def get_default_target():

      try:

          os_release_file = open('/etc/os-release')
@@ -125,6 +132,9 @@ 

          doc_files = get_doc_files(toml)

          license_files = get_license_files(toml)

          toml = os.path.join(toml, "Cargo.toml")

+     else:

+         doc_files = []

+         license_files = []

  

      return toml, None, version, doc_files, license_files

  
@@ -205,18 +215,18 @@ 

  def _is_path(path):

      return "/" in path or path in {".", ".."}

  

+ @sortify

  def get_license_files(path):

-     license_files = []

      exclude = { "vendor", "example", "examples", "_example", "_examples",

-                    "testdata", "_testdata", ".github", "tests", "test"}

+                 "testdata", "_testdata", ".github", "tests", "test" }

      for root, dirs, files in os.walk(path, topdown=True):

          dirs[:] = [d for d in dirs if d not in exclude]

-         license_files.extend([os.path.relpath(os.path.join(root, f), path) for f in files if LICENSES.match(f)])

-     license_files.sort()

-     return license_files

+         for f in files:

+             if LICENSES.match(f):

+                 yield os.path.relpath(os.path.join(root, f), path)

  

+ @sortify

  def get_doc_files(path):

-     doc_files = []

      matcher = re.compile(

          r"(.*\.md|.*\.markdown|.*\.mdown|.*\.mkdn|.*\.rst|.*\.txt|AUTHORS|"

          r"AUTHORS[\.\-].*|CONTRIBUTORS|CONTRIBUTORS[\.\-].*|README|"
@@ -225,9 +235,9 @@ 

      matcherex = re.compile(r"CMakeLists\.txt")

      for root, dirs, files in os.walk(path, topdown=True):

          dirs[:] = []

-         doc_files.extend([os.path.relpath(os.path.join(root, f), path) for f in files if matcher.match(f) and not LICENSES.match(f) and not matcherex.match(f)])

-     doc_files.sort()

-     return doc_files

+         for f in files:

+             if matcher.match(f) and not LICENSES.match(f) and not matcherex.match(f):

+                 yield os.path.relpath(os.path.join(root, f), path)

  

  def make_diff_metadata(crate, version, patch=False, store=False):

      if _is_path(crate):
@@ -245,10 +255,9 @@ 

      else:

          cratef, crate, version = download(crate, version)

  

-     with files_from_crate(cratef, crate, version) as files:

-         diff = make_patch(files[0], enabled=patch)

-         metadata = Metadata.from_file(files[0])

-         doc_files, license_files = files[1:3]

+     with files_from_crate(cratef, crate, version) as (toml, doc_files, license_files):

+         diff = make_patch(toml, enabled=patch)

+         metadata = Metadata.from_file(toml)

      if store:

          shutil.copy2(cratef, os.path.join(os.getcwd(), f"{metadata.name}-{version}.crate"))

      return crate, diff, metadata, doc_files, license_files

@aplanas: maybe you could review this or #146 in return for #105.

nit: Python doc strings are """

If you put the initialization at the beginning, you can skip the else

mini-nit: space at the end of the dict

IMHO generators and decorators hide you intention: create a sorted list. I would drop sortify and use the explicit approach.

What is worst, there is no gain on using a generator. sorted requires to generate first the full list on memory. You can test that with this simple function:

@sortify
def rnd_sort():
    for i in range(10):
        print("R")
        yield random()

for i in rnd_sort():
    print(f"> {i}")

You can see that with @sortify the list is fully generated first (print lots of "R"), and later print the result. If you remove the decorator, you will see the normal behavior of a generator

Thank you for the review.

I think we'll have to agree to disagree on the issue of the decorator. I know that the decorator doesn't make things faster: it obviously cannot, since we need to generate a list to sort it. The reason for the refactoring is that (IMO) it makes the code much more readable. Even in the example you gave above, if I ask "what values does rnd_sort() yield?", you can immediately and without any thinking say, "oh, numbers returned by random()". Similarly for get_license_files(), it is obvious that we yield values produced by relpath(). But with the previous version, we do list manipulations, and you have to think that .extend() expands the list. Errors where you one uses .append instead of .extend on a list are fairly common. (Or .extend on sequence value like a string that was never supposed to be expanded.) With the generator approach, this kind of error is unlikely.

If you put the initialization at the beginning, you can skip the else

I don't think that's better. I like that it's clear that the empty lists are the fallback values.

I implemented the other suggestions.

3 new commits added

  • Rector explicit indexing into a tuple unpacking
  • Refactor the code to generate file lists
  • Fix crash when path to a toml file is given
2 years ago

The reason for the refactoring is that (IMO) it makes the code much more readable

It is fair, but is not true. The sort is not explicit (not in the caller not in the called), it is hidden under a decorator, outside were the data is generated or consumed.

The sorfify is also tricky to read, using a wrapper to export the name of a function that is never used outside, and a docstring that was never written. IMHO update_wrapper is used when we want to attach non-observable behavior into an exposed function, like memoization, log, validation, authorization, or something else that do not change the inner behavior of the decorated function. Here we are changing the order!

If you want to use a generator, I would still suggest to sort the content from the caller then. It is explicit and makes more clear that the generator is still delivering the full list.

In any case LGTM : ). Thanks!

The sort is (IMO) very explicit. When decorator syntax was introduced, one of the explicit design goals was to make the decoration visible. And I think the decorator name make it clear that it is about sorting. "Tricky to read" is relative — if someone is not familiar with how decorators are put together, there is indeed a lot of magic going on. But once they get past this general unfamiliarity, this particular decorator is relatively simple.

Also, even if the details of the decorator are confusing, the reader doesn't really need to understand them. The important information is in the docstring: it takes a generator and returns a sorted list. With that knowledge you can understand how the decorator is used and even use it in other places.

update_wrapper is used when we want to attach non-observable behavior

No, update wrapper is used to copy function signature and docstring. In general it's not supposed to hide anything, quite the opposite. Also, some decorators change behaviour of a function quite significantly… @property, @contextlib.contextmanager, @functools.lru_cache, etc, etc.

There is another benefit from using decorators like this: once written, the same decorator can be used in other places. In this relatively short codebase there are a few other places where we do similar construction of lists.

Dunno, I just like decorators. It's such a neat concept.

Commit f0de38a fixes this pull-request

Pull-Request has been merged by zbyszek

2 years ago

Pull-Request has been merged by zbyszek

2 years ago

FWIW, if you want a new, sorted list from existing values, you can just use [*sorted(foo)], which does the same thing under the hood, but does not require either decorators, nor writing new wrapper functions.

[*sorted(foo)]

Yes, but I'd still need to construct foo, which I want to avoid.

Metadata