#1 Parallelize
Merged 6 years ago by tibbs. Opened 6 years ago by zbyszek.
zbyszek/fedora-misc-package-utilities parallel  into  master

file modified
+9 -8
@@ -3,6 +3,7 @@
  import argparse
  import requests
  import sys
+ import multiprocessing
  
  VERBOSE = False
  
@@ -37,7 +38,6 @@
      vprint('Fetched maintainers of {}: {}'.format(pkg, ', '.join(maintainers)))
      return maintainers
  
- 
  def options_parse():
      p = argparse.ArgumentParser(
          description='Find maintainers given package names')
@@ -62,13 +62,14 @@
      by_package = {}
      opts = options_parse()
  
-     for package in opts.infile:
-         package = package.strip()
-         try:
-             by_package[package] = get_maintainers(package)
-         except PkgdbError as e:
-             print('ERR: {}'.format(e), file=sys.stderr)
-             continue
+     packages = [line.strip() for line in opts.infile]
+     with multiprocessing.Pool() as pool:
+         mapped = pool.map(get_maintainers, packages)
+     for package, mapping in zip(packages, mapped):
+         if isinstance(mapping, Exception):
+             print('ERR: {}: {}'.format(package, mapping), file=sys.stderr)
+         else:
+             by_package[package] = mapping
  
      if not by_package:
          print('No valid packages given.')

This brings the wall-clock time down approximately proportionally
to the number of CPUs (or possibly even more, e.g. here 16 min 44 s went
down to less than 1 min with 12 CPUs).
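
A speed-up larger than the CPU count is plausible if each lookup spends most of its time waiting on the server rather than computing. That is an assumption, not something measured in this PR. The sketch below illustrates the effect with a hypothetical `fake_fetch` that sleeps instead of doing a network request. It uses `multiprocessing.pool.ThreadPool`, which offers the same `map` interface as `multiprocessing.Pool`, purely to keep the example self-contained; it is not what the patch uses.

```python
import time
from multiprocessing.pool import ThreadPool


def fake_fetch(pkg):
    """Hypothetical stand-in for a network lookup: it only sleeps."""
    time.sleep(0.05)
    return [pkg + '-maintainer']


packages = ['pkg{}'.format(i) for i in range(8)]

# Serial run: the waits add up.
start = time.perf_counter()
serial = [fake_fetch(p) for p in packages]
serial_time = time.perf_counter() - start

# Pooled run: the workers wait concurrently, so the pool size
# (not the CPU count) bounds how many lookups overlap.
start = time.perf_counter()
with ThreadPool(8) as pool:
    pooled = pool.map(fake_fetch, packages)
pooled_time = time.perf_counter() - start

print('serial: {:.2f}s, pooled: {:.2f}s'.format(serial_time, pooled_time))
```

`map` returns results in input order, which is why zipping `packages` with the mapped results, as the patch does, pairs each package with its own result.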

I added a loop on failure. Unfortunately sometimes the server returns
an invalid answer. That also happens when running serially, but it
seems to happen more often in parallelized mode.
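
For illustration, a retry loop of this kind could look roughly like the sketch below. The names `with_retries`, `FetchError`, `flaky_fetch` and `broken_fetch` are hypothetical and not taken from the script; the actual loop in this PR may differ. After the last failed attempt the exception is returned rather than raised, so that a caller checking `isinstance(result, Exception)` can report it.

```python
import time


class FetchError(Exception):
    """Hypothetical error type standing in for an invalid server answer."""


def with_retries(fetch, pkg, attempts=3, delay=0.0):
    """Call fetch(pkg) up to `attempts` times.

    Returns the result of the first successful call, or the last
    exception (returned, not raised) if every attempt fails.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(pkg)
        except FetchError as e:
            last_error = e
            if attempt + 1 < attempts:
                time.sleep(delay)  # pause only between attempts
    return last_error


calls = {'count': 0}


def flaky_fetch(pkg):
    """Fails twice, then succeeds (simulates an occasional bad answer)."""
    calls['count'] += 1
    if calls['count'] < 3:
        raise FetchError('invalid answer for {}'.format(pkg))
    return ['alice', 'bob']


def broken_fetch(pkg):
    """Always fails."""
    raise FetchError('invalid answer for {}'.format(pkg))


good = with_retries(flaky_fetch, 'systemd')
bad = with_retries(broken_fetch, 'nosuchpkg')
print(good)
print('ERR: {}'.format(bad))
```

With a pool, something like `functools.partial(with_retries, some_fetch_function)` could be passed to `pool.map`, assuming the fetch function is defined at module level so that it can be pickled.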

I only just noticed this; it's far too easy to lose important notifications in the sheer flood of notification mail I get.

Of course the whole thing broke after pkgdb went away. I've just merged a PR to fix that, but it conflicts with this one. So the first question is whether this is even still needed after the switch to Pagure.

How long does it take to run on pagure?

For ~1700 packages it took ~30 mins, with parallelization ~8 mins (4 CPUs). So I think this would still be beneficial.
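
As a rough check of the numbers quoted in this thread, the speed-up and per-CPU efficiency can be worked out as below. The helper `speedup` is only for illustration. The pkgdb figure reads "16 min 44" as 16 min 44 s and takes 60 s for the parallel run, so it is a lower bound, since that run took less than 1 min.

```python
def speedup(serial_s, parallel_s, cpus):
    """Return (speed-up factor, efficiency per CPU)."""
    factor = serial_s / parallel_s
    return factor, factor / cpus


# Pagure: ~30 min serial, ~8 min on 4 CPUs.
pagure = speedup(30 * 60, 8 * 60, 4)

# pkgdb: 16 min 44 s serial, under 1 min on 12 CPUs (lower bound).
pkgdb = speedup(16 * 60 + 44, 60, 12)

print('pagure: {:.2f}x, efficiency {:.2f}'.format(*pagure))
print('pkgdb: at least {:.2f}x, efficiency at least {:.2f}'.format(*pkgdb))
```

An efficiency above 1 in the pkgdb case fits the "possibly even more" remark earlier in the thread; the Pagure run comes out slightly below 1.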

rebased onto 5e9b556

6 years ago

Pull-Request has been merged by tibbs

6 years ago