PR#1015: Get the RPM license headers from Koji and use it to fill the MMD content licenses field. - fm-orchestrator

#1015 Get the RPM license headers from Koji and use it to fill the MMD content licenses field.

Merged 5 years ago by jkaluza. Opened 5 years ago by jkaluza.

jkaluza/fm-orchestrator cg-fill-licenses into cg-final-mmds

Get the RPM license headers from Koji and use it to fill the MMD content licenses field.

Jan Kaluza • 5 years ago

3001a8a

module_build_service/builder/KojiContentGenerator.py

file modified

+40 -24

		`@@ -215,36 +215,46 @@`
		`# If the tag doesn't exist.. then there are no rpms in that tag.`
		`return []`

		`- # Get the exclusivearch and excludearch lists for each RPM.`
		`+ # Get the exclusivearch, excludearch and license data for each RPM.`
		`# The exclusivearch and excludearch lists are set in source RPM from which the RPM`
		`# was built.`
		`- # Create temporary dict with source RPMs in rpm_id:build_id format.`
		`- src_rpms_ids = {rpm["id"]: rpm["build_id"] for rpm in rpms if rpm["arch"] == "src"}`
		`+ # Create temporary dict with source RPMs in rpm_id:rpms_list_index format.`
		`+ src_rpms = {}`
		`+ binary_rpms = {}`
		`+ for rpm in rpms:`
		`+ if rpm["arch"] == "src":`
		`+ src_rpms[rpm["id"]] = rpm`
		`+ else:`
		`+ binary_rpms[rpm["id"]] = rpm`
		`# Prepare the arguments for Koji multicall.`
		`- # We will call session.getRPMHeaders(...) for each SRC RPM to get exclusivearch and`
		`- # excludearch headers.`
		`- multicall_kwargs = [{"rpmID": rpm_id, "headers": ["exclusivearch", "excludearch"]}`
		`- for rpm_id in src_rpms_ids.keys()]`
		`- src_rpms_headers = koji_retrying_multicall_map(`
		`+ # We will call session.getRPMHeaders(...) for each SRC RPM to get exclusivearch,`
		`+ # excludearch and license headers.`
		`+ multicall_kwargs = [{"rpmID": rpm_id,`
		`+ "headers": ["exclusivearch", "excludearch", "license"]}`
		`+ for rpm_id in src_rpms.keys()]`
		`+ # For each binary RPM, we only care about the "license" header.`
		`+ multicall_kwargs += [{"rpmID": rpm_id, "headers": ["license"]}`
		`+ for rpm_id in binary_rpms.keys()]`
		`+ rpms_headers = koji_retrying_multicall_map(`
		`session, session.getRPMHeaders, list_of_kwargs=multicall_kwargs)`

		`# Temporary dict with build_id as a key to find builds easily.`
		`builds = {build['build_id']: build for build in builds}`

		`# Handle the multicall result. For each build associated with the source RPM,`
		`- # store the exclusivearch and excludearch lists.`
		`- for build_id, headers in zip(src_rpms_ids.values(), src_rpms_headers):`
		`- builds[build_id]["exclusivearch"] = headers["exclusivearch"]`
		`- builds[build_id]["excludearch"] = headers["excludearch"]`
		`-`
		`- # Check each RPM and fill-in additional data from its build to get them`
		`- # easily in other methods.`
		`- for rpm in rpms:`
		`- idx = rpm['build_id']`
		`- rpm['srpm_name'] = builds[idx]['name']`
		`- rpm['srpm_nevra'] = builds[idx]['nvr']`
		`- rpm['exclusivearch'] = builds[idx]['exclusivearch']`
		`- rpm['excludearch'] = builds[idx]['excludearch']`
		`+ # store the exclusivearch and excludearch lists. For each RPM, store the 'license' and`
		`+ # also other useful data from the Build associated with the RPM.`
		`+ for rpm, headers in zip(src_rpms.values() + binary_rpms.values(), rpms_headers):`
		`+ build = builds[rpm["build_id"]]`
		`+ if "exclusivearch" in headers and "excludearch" in headers:`
		`+ build["exclusivearch"] = headers["exclusivearch"]`
		`+ build["excludearch"] = headers["excludearch"]`
		`+`
		`+ rpm["license"] = headers["license"]`
		`+ rpm['srpm_name'] = build['name']`
		`+ rpm['srpm_nevra'] = build['nvr']`
		`+ rpm['exclusivearch'] = build['exclusivearch']`
		`+ rpm['excludearch'] = build['excludearch']`

		`return rpms`

		`@@ -422,7 +432,7 @@`
		`def _fill_in_rpms_list(self, mmd, arch):`
		`"""`
		Fills in the list of built RPMs in architecture specific `mmd` for `arch`
		- using the data from `self.rpms_dict`.
		+ using the data from `self.rpms_dict` as well as the content licenses field.

		`:param Modulemd.Module mmd: MMD to add built RPMs to.`
		`:param str arch: Architecture for which to add RPMs.`
		`@@ -445,6 +455,9 @@`
		`# Modulemd.SimpleSet into which we will add the RPMs.`
		`rpm_artifacts = Modulemd.SimpleSet()`

		`+ # Modulemd.SimpleSet into which we will add licenses of all RPMs.`
		`+ rpm_licenses = Modulemd.SimpleSet()`
		`+`
		# Check each RPM in `self.rpms_dict` to find out if it can be included in mmd
		`# for this architecture.`
		`for nevra, rpm in self.rpms_dict.items():`
		`@@ -515,6 +528,11 @@`
		`# Add RPM to packages.`
		`rpm_artifacts.add(nevra)`

		`+ # Not all RPMs have licenses (for example debuginfo packages).`
		`+ if "license" in rpm and rpm["license"]:`
		`+ rpm_licenses.add(rpm["license"])`
		`+`
		`+ mmd.set_content_licenses(rpm_licenses)`
		`mmd.set_rpm_artifacts(rpm_artifacts)`
		`return mmd`

		`@@ -522,7 +540,6 @@`
		`"""`
		`Finalizes the modulemd:`
		`- Fills in the list of built RPMs respecting filters, whitelist and multilib.`
		`- - TODO: Fills in the list of licences.`

		`:param str arch: Name of arch to generate the final modulemd for.`
		`:rtype: str`
		`@@ -532,7 +549,6 @@`
		`# Fill in the list of built RPMs.`
		`mmd = self._fill_in_rpms_list(mmd, arch)`

		`- # TODO: Fill in the licences.`
		`return unicode(mmd.dumps())`

		`def _prepare_file_directory(self):`

tests/test_content_generator.py

file modified

+33 -3

		`@@ -19,7 +19,9 @@`
		`# SOFTWARE.`
		`#`
		`# Written by Stanislav Ochotnicky <sochotnicky@redhat.com>`
		`+ # Jan Kaluza <jkaluza@redhat.com>`

		`+ import pytest`
		`import json`

		`import os`
		`@@ -364,8 +366,10 @@`
		`koji_session.listTaggedRPMS.return_value = (rpms, builds)`
		`koji_session.multiCall.side_effect = [`
		`# getRPMHeaders response`
		`- [[{'excludearch': ["x86_64"], 'exclusivearch': []}],`
		`- [{'excludearch': [], 'exclusivearch': ["x86_64"]}]]`
		`+ [[{'excludearch': ["x86_64"], 'exclusivearch': [], 'license': 'MIT'}],`
		`+ [{'excludearch': [], 'exclusivearch': ["x86_64"], 'license': 'GPL'}],`
		`+ [{'license': 'MIT'}],`
		`+ [{'license': 'GPL'}]]`
		`]`
		`get_session.return_value = koji_session`

		`@@ -374,11 +378,14 @@`
		`# We want to mainly check the excludearch and exclusivearch code.`
		`if rpm["name"] == "module-build-macros":`
		`assert rpm["excludearch"] == ["x86_64"]`
		`+ assert rpm["license"] == "MIT"`
		`else:`
		`assert rpm["exclusivearch"] == ["x86_64"]`
		`+ assert rpm["license"] == "GPL"`

		`def _add_test_rpm(self, nevra, srpm_name=None, multilib=None,`
		`- koji_srpm_name=None, excludearch=None, exclusivearch=None):`
		`+ koji_srpm_name=None, excludearch=None, exclusivearch=None,`
		`+ license=None):`
		`"""`
		`Helper method to add test RPM to ModuleBuild used by KojiContentGenerator`
		`and also to Koji tag used to generate the Content Generator build.`
		`@@ -392,6 +399,7 @@`
		`srpm_name` is "httpd" but `koji_srpm_name` would be "httpd24-httpd".
		`:param list excludearch: List of architectures this package is excluded from.`
		`:param list exclusivearch: List of architectures this package is exclusive for.`
		`+ :param str license: License of this RPM.`
		`"""`
		`parsed_nevra = kobo.rpmlib.parse_nvra(nevra)`
		`parsed_nevra["payloadhash"] = "hash"`
		`@@ -401,6 +409,7 @@`
		`parsed_nevra["srpm_name"] = srpm_name`
		`parsed_nevra["excludearch"] = excludearch or []`
		`parsed_nevra["exclusivearch"] = exclusivearch or []`
		`+ parsed_nevra["license"] = license or ""`
		`self.cg.rpms.append(parsed_nevra)`
		`self.cg.rpms_dict[nevra] = parsed_nevra`

		`@@ -519,3 +528,24 @@`
		`"dhcp-libs-12:4.3.5-5.module_2118aef6.x86_64",`
		`"dhcp-libs-12:4.3.5-5.module_2118aef6.i686",`
		`"perl-Tangerine-12:4.3.5-5.module_2118aef6.x86_64"])`
		`+`
		`+ @pytest.mark.parametrize(`
		`+ "licenses, expected", (`
		`+ (["GPL", "MIT"], ["GPL", "MIT"]),`
		`+ (["GPL", ""], ["GPL"]),`
		`+ (["GPL", "GPL"], ["GPL"]),`
		`+ )`
		`+ )`
		`+ def test_fill_in_rpms_list_license(self, licenses, expected):`
		`+ self._add_test_rpm("dhcp-libs-12:4.3.5-5.module_2118aef6.x86_64", "dhcp",`
		`+ license=licenses[0])`
		`+ self._add_test_rpm("dhcp-libs-12:4.3.5-5.module_2118aef6.i686", "dhcp")`
		`+ self._add_test_rpm("perl-Tangerine-12:4.3.5-5.module_2118aef6.x86_64", "perl-Tangerine",`
		`+ license=licenses[1])`
		`+ self._add_test_rpm("perl-Tangerine-12:4.3.5-5.module_2118aef6.i686", "perl-Tangerine")`
		`+`
		`+ mmd = self.cg.module.mmd()`
		`+ mmd = self.cg._fill_in_rpms_list(mmd, "x86_64")`
		`+`
		`+ # Only x86_64 packages should be filled in, because we requested x86_64 arch.`
		`+ assert set(mmd.get_content_licenses().get()) == set(expected)`

jkaluza commented 5 years ago

This PR is against the cg-final-mmds branch and is only one part of the end goal, which is generating the final modulemd files in the content generator build. It is therefore known to be incomplete, but I decided to code this bigger feature in smaller chunks to make reviews easier.

This PR changes the KojiContentGenerator to get the "license" header from each RPM in the modular Koji tag and to include the list of all used licenses in the resulting per-architecture MMD file.
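The flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from `KojiContentGenerator.py`: `fake_get_rpm_headers` is a hypothetical stand-in for the Koji `getRPMHeaders` multicall with canned data, and a plain Python `set` stands in for `Modulemd.SimpleSet`.

```python
def fake_get_rpm_headers(rpm_id, headers):
    """Hypothetical stand-in for session.getRPMHeaders; returns canned data."""
    canned = {
        1: {"exclusivearch": [], "excludearch": ["x86_64"], "license": "MIT"},
        2: {"license": "MIT"},
        3: {"license": ""},  # an RPM without a usable license header
    }
    # Return only the headers that were asked for.
    return {name: canned[rpm_id][name] for name in headers}


def attach_licenses(rpms):
    """Store the 'license' header on every RPM dict (source and binary)."""
    for rpm in rpms:
        wanted = ["license"]
        if rpm["arch"] == "src":
            # The diff requests the arch restriction lists only for source RPMs.
            wanted = ["exclusivearch", "excludearch", "license"]
        headers = fake_get_rpm_headers(rpm["id"], wanted)
        rpm["license"] = headers["license"]
    return rpms


def content_licenses(rpms):
    """Collect the distinct, non-empty licenses of the given RPMs."""
    return {rpm["license"] for rpm in rpms if rpm.get("license")}


rpms = attach_licenses([
    {"id": 1, "arch": "src"},
    {"id": 2, "arch": "x86_64"},
    {"id": 3, "arch": "x86_64"},
])
licenses = content_licenses(rpms)
```

Using a set means a license shared by several RPMs is listed once, which matches the `(["GPL", "GPL"], ["GPL"])` case in the new test.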

mprahl commented on line 24 of module_build_service/builder/KojiContentGenerator.py 5 years ago

Optional: Instead of an OrderedDict, you could just use a dict with the same keys and have the values be the rpm instead of the index. It'd be easier to read and I believe it'd actually use less RAM (although minimal) since the value of each key would be a reference to a memory address instead of a new value.

mprahl commented on line 55 of module_build_service/builder/KojiContentGenerator.py 5 years ago

This should be:

`if "exclusivearch" in headers and "excludearch" in headers:`
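A small sketch of why this guard matters, assuming the response for a binary RPM contains only the `license` header (as the mocked responses in the test do). The earlier revision this comment refers to is not shown here, and `apply_headers` is a hypothetical helper, not a function from the PR.

```python
def apply_headers(build, rpm, headers):
    """Copy header data onto the build and RPM dicts."""
    # Only source RPM responses carry the arch lists; without the guard,
    # a binary RPM response would raise KeyError here.
    if "exclusivearch" in headers and "excludearch" in headers:
        build["exclusivearch"] = headers["exclusivearch"]
        build["excludearch"] = headers["excludearch"]

    rpm["license"] = headers["license"]
    # The RPM inherits the arch lists stored on its build.
    rpm["exclusivearch"] = build["exclusivearch"]
    rpm["excludearch"] = build["excludearch"]


build, src_rpm, binary_rpm = {}, {}, {}
# The source RPM is handled first so the build already has its arch lists
# by the time the binary RPM is processed.
apply_headers(build, src_rpm,
              {"exclusivearch": ["x86_64"], "excludearch": [], "license": "GPL"})
apply_headers(build, binary_rpm, {"license": "GPL"})
```

Note that this relies on every source RPM being processed before the binary RPMs of the same build, which is the order the merged loop uses.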

mprahl commented on line 52 of module_build_service/builder/KojiContentGenerator.py 5 years ago

Optional: I think you can consolidate the for rpm in rpms: loop into this one since this loop should be iterating over all the RPMs.

Edited 5 years ago by mprahl
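A consolidated loop of this kind depends on the responses coming back in the same order as the requests. The sketch below illustrates that pairing with made-up data; the `responses` list is a hypothetical placeholder for the multicall result. It wraps the dict views in `list(...)` so that the concatenation also works on Python 3, where `dict.values()` returns a view that does not support `+`.

```python
src_rpms = {10: {"id": 10, "name": "foo", "arch": "src"}}
binary_rpms = {
    11: {"id": 11, "name": "foo", "arch": "x86_64"},
    12: {"id": 12, "name": "foo-libs", "arch": "x86_64"},
}

# Request order: all source RPM ids first, then all binary RPM ids.
request_ids = list(src_rpms.keys()) + list(binary_rpms.keys())

# Hypothetical responses, one per request, in request order.
responses = [{"license": "license-of-%d" % rpm_id} for rpm_id in request_ids]

# Iterate the RPMs in the same order so zip() pairs each with its response.
# keys() and values() of an unmodified dict iterate in matching order.
ordered_rpms = list(src_rpms.values()) + list(binary_rpms.values())
for rpm, headers in zip(ordered_rpms, responses):
    rpm["license"] = headers["license"]
```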

mprahl commented on line 87 of module_build_service/builder/KojiContentGenerator.py 5 years ago

Optional: This can be optimized to:

`if rpm.get("license"):`

Edited 5 years ago by mprahl
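For reference, a quick check that the shorter form behaves the same as the `"license" in rpm and rpm["license"]` test for the cases that matter here (missing key, empty string, `None`, real value). The two helper names are made up for this illustration.

```python
def has_license_verbose(rpm):
    # Form used in the diff: explicit membership test, then truthiness.
    return bool("license" in rpm and rpm["license"])


def has_license_short(rpm):
    # Suggested form: dict.get() returns None for a missing key.
    return bool(rpm.get("license"))


samples = [{}, {"license": ""}, {"license": None}, {"license": "GPL"}]
results = [(has_license_verbose(r), has_license_short(r)) for r in samples]
```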

rebased onto 3001a8a 5 years ago

lholecek commented on line 18 of module_build_service/builder/KojiContentGenerator.py 5 years ago

Let's play code golf:

`src_rpms = {rpm for rpm in rpms if rpm["arch"] == "src"}`
`binary_rpms = {rpm for rpm in rpms if rpm["arch"] != "src"}`

mprahl commented 5 years ago

:thumbsup:

jkaluza commented on line 18 of module_build_service/builder/KojiContentGenerator.py 5 years ago

Iterates the list of RPMs twice :). Not a big deal, I had that code in previous versions of this PR locally, but decided I will use single iteration instead in the end.
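To make the trade-off concrete, here is a small comparison with made-up RPM entries. It assumes the entries are plain dicts, as the subscript access suggests. In that case the set comprehensions quoted above would raise `TypeError` (dicts are unhashable), so the two-pass variant is written here as dict comprehensions keyed by RPM id, which is presumably what was meant.

```python
rpms = [
    {"id": 1, "arch": "src"},
    {"id": 2, "arch": "x86_64"},
    {"id": 3, "arch": "noarch"},
]

# Two passes over the list, as dict comprehensions keyed by RPM id.
src_two_pass = {rpm["id"]: rpm for rpm in rpms if rpm["arch"] == "src"}
binary_two_pass = {rpm["id"]: rpm for rpm in rpms if rpm["arch"] != "src"}

# One pass, equivalent in effect to the if/else loop in the diff.
src_one_pass, binary_one_pass = {}, {}
for rpm in rpms:
    target = src_one_pass if rpm["arch"] == "src" else binary_one_pass
    target[rpm["id"]] = rpm

# A set comprehension over dicts fails because dicts are unhashable.
try:
    {rpm for rpm in rpms if rpm["arch"] == "src"}
    set_comprehension_works = True
except TypeError:
    set_comprehension_works = False
```

Both variants produce the same two dicts; the difference is only one iteration versus two.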

Pull-Request has been merged by jkaluza 5 years ago