#2103 fix list-signed --tag memory issues
Merged 3 years ago by tkopecek. Opened 4 years ago by tkopecek.
tkopecek/koji issue2102  into  master

file modified
+13 -7
@@ -2101,6 +2101,8 @@ 

      rpm_idx = {}

      if options.key:

          qopts['sigkey'] = options.key

+ 

+     sigs = []

      if options.rpm:

          rpm_info = options.rpm

          try:
@@ -2112,7 +2114,7 @@ 

          if rinfo.get('external_repo_id'):

              parser.error(_("External rpm: %(name)s-%(version)s-%(release)s.%(arch)s@"

                             "%(external_repo_name)s") % rinfo)

-         qopts['rpm_id'] = rinfo['id']

+         sigs += session.queryRPMSigs(rpm_id=rinfo['id'], **qopts)

      if options.build:

          build = options.build

          try:
@@ -2121,13 +2123,10 @@ 

              pass

          binfo = session.getBuild(build, strict=True)

          build_idx[binfo['id']] = binfo

-         sigs = []

          rpms = session.listRPMs(buildID=binfo['id'])

          for rinfo in rpms:

              rpm_idx[rinfo['id']] = rinfo

              sigs += session.queryRPMSigs(rpm_id=rinfo['id'], **qopts)

-     else:

-         sigs = session.queryRPMSigs(**qopts)

      if options.tag:

          tag = options.tag

          try:
@@ -2138,9 +2137,16 @@ 

          tagged = {}

          for binfo in builds:

              build_idx.setdefault(binfo['id'], binfo)

-         for rinfo in rpms:

-             rpm_idx.setdefault(rinfo['id'], rinfo)

-             tagged[rinfo['id']] = 1

+         results = []

+         # use batched multicall as there could be potentially a lot of results

+         # so we don't exhaust server resources

+         with session.multicall(batch=5000) as m:

+             for rinfo in rpms:

+                 rpm_idx.setdefault(rinfo['id'], rinfo)

+                 tagged[rinfo['id']] = 1

+                 results.append(m.queryRPMSigs(rpm_id=rinfo['id']), **qopts)

+         sigs += [x.result[0] for x in results]

+ 

      # Now figure out which sig entries actually have live copies

      for sig in sigs:

          rpm_id = sig['rpm_id']

Just wondering about magic numbers here (eg "5000"). Are the defaults not enough? Why 5000?

Just wondering about magic numbers here (eg "5000"). Are the defaults not enough? Why 5000?

Default is None, which means do everything in one call, it could lead to either client or server-side memory exhaustion (or due to xmlrpc overhead to big request/response denial), so it is better to do that in batches. 5000 is more a guess of size, than a justified number. For simpler cases I used a 10000 and e.g. in clone_tag it can be set by CLI option as it is very API intensive and optimal value can be found per environment / size of tag.

Ok, cool. Thanks for the explanation. I wonder if we should put a small comment here explaining the significance of 5000, so the code is easier to understand. "Tested 5000 with XX GB of memory on the hub" or something so that users know the RAM dependency.

Metadata Update from @tkopecek:
- Pull-request tagged with: testing-ready

3 years ago

rebased onto eb7c33a

3 years ago

Commit 4b3c9f3 fixes this pull-request

Pull-Request has been merged by tkopecek

3 years ago

Metadata Update from @jcupova:
- Pull-request tagged with: testing-done

3 years ago