#26 filterlist: filter jpg and png too
Closed 7 years ago by tibbs. Opened 7 years ago by adamwill.
adamwill/quick-fedora-mirror filter-png-jpg  into  master

file modified
+3 -3
@@ -58,8 +58,8 @@
      null = open(os.devnull, 'w')
      p = argparse.ArgumentParser(
          description='Generate a list of files and times, suitable for consumption by quick-fedora-mirror, '
-                     'and a much smaller list with packages, Device Tree boot files, HTML files and '
-                     'directories filtered out, for consumption by fedfind.')
+                     'and a much smaller list with various non-image file types and directories filtered '
+                     'out, for consumption by fedfind.')
      p.add_argument('-c', '--checksum', action='store_true',
                     help='Include checksums of all repomd.xml files in the file list.')
      p.add_argument('-C', '--checksum-file', action='append', dest='checksum_files',
@@ -112,7 +112,7 @@
          # opts.filelist.write(entry.path + '\n')
          print(entry.path, file=opts.filelist)
          # write to filtered list if appropriate
-         skips = ('.rpm', '.drpm', '.dtb', '.html')
+         skips = ('.rpm', '.drpm', '.dtb', '.html', '.png', '.jpg', '.hdr', '.filez', '.dirtree')
          if not any(entry.path.endswith(skip) for skip in skips) and not (entry.is_dir()):
              print(entry.path, file=opts.filterlist)
          if entry.name in opts.checksum_files:

alt has a ton of png files in it, turns out, so let's filter
these too.

rebased

7 years ago

Updated to also skip 'hdr', 'filez' and 'dirtree' files (there's quite a lot of those in archive).
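
As an aside on the 'skips' check in the diff above: `str.endswith` accepts a tuple of suffixes, so the `any(...)` generator could probably be collapsed into a single call. A minimal sketch, not part of this PR; the helper names and sample paths are made up for illustration:

```python
# The skip tuple as it stands after this PR's change.
SKIPS = ('.rpm', '.drpm', '.dtb', '.html', '.png', '.jpg', '.hdr', '.filez', '.dirtree')


def is_skipped_any(path, skips=SKIPS):
    """The form used in the diff: test each suffix in turn."""
    return any(path.endswith(skip) for skip in skips)


def is_skipped_tuple(path, skips=SKIPS):
    """Equivalent form: str.endswith accepts a tuple of suffixes directly."""
    return path.endswith(skips)


# Hypothetical paths, only here to compare the two forms.
samples = ['alt/art/logo.png', 'pub/Packages/foo.rpm', 'pub/images/boot.iso']

for path in samples:
    print(path, is_skipped_any(path), is_skipped_tuple(path))
```

Both helpers should agree on every path; the tuple form is just shorter.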

So the list of 'skips' is getting a bit long now, which triggers my worry factor, as does the fact that we now know that new file types seem to come and go. There's a danger here that some change to the compose process will suddenly inflate the filter list size and we won't immediately notice, which will mean people using fedfind might suddenly wind up doing more large file downloads than they bargained for.

So I've proposed https://pagure.io/quick-fedora-mirror/pull-request/27 as an alternative here, which switches to importing the list of 'known image types' from productmd and only including files that end with one of the extensions in the image list. fedfind currently uses the same productmd constant downstream to find the actual image files from either the rsync output (old) or the filterlist contents (new). I've mentioned my thoughts on the positives and negatives of each approach there.
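
To make the difference between the two approaches concrete, here is a rough sketch (not the code from either PR). `IMAGE_FORMATS` is a hard-coded, hypothetical stand-in for the productmd list, whose real contents are not shown in this thread; the function names and sample paths are also invented, and the directory check from the diff is left out for brevity:

```python
# Deny-list: the 'skips' tuple from this PR.
SKIPS = ('.rpm', '.drpm', '.dtb', '.html', '.png', '.jpg', '.hdr', '.filez', '.dirtree')

# Allow-list: hypothetical stand-in for productmd's known image formats.
IMAGE_FORMATS = ('iso', 'qcow2', 'raw.xz')


def keep_by_skips(path, skips=SKIPS):
    """Deny-list: keep everything that does not end with a skipped suffix."""
    return not path.endswith(skips)


def keep_by_formats(path, formats=IMAGE_FORMATS):
    """Allow-list: keep only paths ending with a known image extension."""
    return path.endswith(tuple('.' + fmt for fmt in formats))


# Hypothetical paths; '.newtype' plays the part of a file type that
# shows up after some future change to the compose process.
paths = [
    'alt/art/logo.png',
    'releases/Server/x86_64/iso/Server-dvd.iso',
    'archive/misc/data.newtype',
]

deny_result = [p for p in paths if keep_by_skips(p)]
allow_result = [p for p in paths if keep_by_formats(p)]
print('deny-list keeps: ', deny_result)
print('allow-list keeps:', allow_result)
```

With the deny-list, the unknown '.newtype' file lands in the filtered list until someone notices and extends 'skips'; with the allow-list it is dropped by default, at the cost of missing any new image format until the list of known formats is updated.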

Tagging @kevin @ausil @bowlofeggs for thoughts on which approach is preferred here.

I merged the other one; it makes the most sense to me.

For the record, in general I'd like to at least try to keep this all as generic as possible. I think there's a good general framework for doing faster rsync-based mirroring in here, but that will never come out if the actual tools are only useful for Fedora.

Of course, I kind of screwed that up with the name I picked, but....

Oh, pagure, why do you make it look like you ate my comment and trick me into resubmitting it?

Pull-Request has been closed by tibbs

7 years ago

It should be relatively easy to tweak the 'imagelist' thing to be more powerful and generic if desired: make it some kind of generic 'produce filtered lists' mechanism that can filter in different ways, produce multiple lists, etc. I just didn't want to build that as a castle in the air, with no real-world use and nothing I fundamentally cared about, since it would inevitably rot away. So I went with the more minimal approach.

Of course, at that point it might make sense to split it off from create-imagelist as an independent script called filter-imagelist or something.
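
For what it's worth, a generic 'produce filtered lists' mechanism along those lines might look roughly like the sketch below. Nothing here exists in quick-fedora-mirror; `suffix_filter`, `build_filtered_lists`, the list names and the sample paths are all hypothetical:

```python
def suffix_filter(suffixes, keep_matching=True):
    """Build a predicate that keeps (or drops) paths ending with any suffix."""
    suffixes = tuple(suffixes)

    def predicate(path):
        return path.endswith(suffixes) == keep_matching

    return predicate


def build_filtered_lists(paths, filters):
    """Walk the paths once and sort each into every list whose filter accepts it.

    `filters` maps an output list name to a predicate taking a path.
    """
    lists = {name: [] for name in filters}
    for path in paths:
        for name, accepts in filters.items():
            if accepts(path):
                lists[name].append(path)
    return lists


# Hypothetical configuration: one allow-style list and one deny-style list.
filters = {
    'imagelist': suffix_filter(('.iso', '.qcow2')),
    'no-packages': suffix_filter(('.rpm', '.drpm'), keep_matching=False),
}

paths = ['pub/a.iso', 'pub/b.rpm', 'pub/c.html']
result = build_filtered_lists(paths, filters)
for name, entries in result.items():
    print(name, entries)
```

Each output list is just a name plus a predicate, so filtering 'in different ways' means passing a different callable, and nothing in it is specific to Fedora.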
