#23 RFI: Merge identical arch FTIs
Closed 3 years ago by lbrabec. Opened 3 years ago by churchyard.


We were already discussing this, and it honestly is a can of worms, once we get to more complex situations. Merging "the absolutely same ones" would be fairly straightforward, but I'm now trying to put together something that finds "largest common subsets", so we can group the reasons "more better" :) (and yes, my math is weak :D)

Nice. Grouping reasons more better sounds awesome :D

Side note: with traditional libraries, the reasons will usually not be the same on 32-bit and 64-bit, but that at least means there will be 2 groups instead of 6 arches.

(Do you merge on backend or frontend?)

IDK, I'm fairly positive I have "the algorithm" figured out, but would be nice if somebody checked my "math" :D

import collections
import pprint

# toy input: keys stand for arches, values for the FTI reasons on that arch
d1 = dict(
  a = [1, 2, 3, 4, 5],
  b = [4, 5, 6, 7, 8],
  c = [1, 2, 3, 4, 5, 6, 7, 8]
)

pprint.pprint(d1)

# invert the mapping: for each reason, collect the arches it appears on
d2 = collections.defaultdict(list)
for k, v in d1.items():
    for i in v:
        d2[i].append(k)

pprint.pprint(d2)
"""
defaultdict(<class 'list'>,
            {1: ['a', 'c'],
             2: ['a', 'c'],
             3: ['a', 'c'],
             4: ['a', 'b', 'c'],
             5: ['a', 'b', 'c'],
             6: ['b', 'c'],
             7: ['b', 'c'],
             8: ['b', 'c']})
"""

# group the reasons by the (joined) set of arches they share
d3 = collections.defaultdict(list)
for k, v in d2.items():
    d3[', '.join(v)].append(k)

pprint.pprint(d3)
"""
defaultdict(<class 'list'>,
            {'a, b, c': [4, 5],
             'a, c': [1, 2, 3],
             'b, c': [6, 7, 8]})
"""

And I'm also pretty sure this can be condensed down to some map-reduce (so it would be JavaScript and multicore friendly), but I'm way too tired to do that now.
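Roughly this shape, maybe (an untested sketch reusing d1 from the snippet above; the helper names are made up):

from functools import reduce

# "map": emit one (reason, arch) pair per reason
pairs = [(reason, arch) for arch, reasons in d1.items() for reason in reasons]

# first "reduce": fold the pairs into {reason: [arches]}
def collect(acc, pair):
    reason, arch = pair
    acc.setdefault(reason, []).append(arch)
    return acc

# second "reduce": fold that into {'arch1, arch2': [reasons]}
def group(acc, item):
    reason, arches = item
    acc.setdefault(', '.join(arches), []).append(reason)
    return acc

grouped = reduce(group, reduce(collect, pairs, {}).items(), {})
pprint.pprint(grouped)
"""
{'a, b, c': [4, 5], 'a, c': [1, 2, 3], 'b, c': [6, 7, 8]}
"""

Each fold only touches one record at a time, which is the part that should translate to map-reduce.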

I see what you did there. This can, however, give you a "less better" result:

d1 = dict(
  a = [1, 8, 9, 10, 11],
  b = [2, 7, 8],
  c = [3, 7, 9],
  d = [4, 10, 12],
  e = [5, 11],
  f = [6, 12, 8],
)
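
For concreteness, this is what the earlier snippet prints as d3 for that input:

"""
defaultdict(<class 'list'>,
            {'a': [1],
             'a, b, f': [8],
             'a, c': [9],
             'a, d': [10],
             'a, e': [11],
             'b': [2],
             'b, c': [7],
             'c': [3],
             'd': [4],
             'd, f': [12],
             'e': [5],
             'f': [6]})
"""

All six arches end up with a singleton group of their own, and the shared reasons add six more groups, hence the 12.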

This has "only" 6 keys, but the result has 12 (and you can get more if you try harder). Is the idea to use the result only if it has less keys than the input, because for real cases, it's likely to create better results?

This has "only" 6 keys, but the result has 12 (and you can get more if you try harder). Is the idea to use the result only if it has less keys than the input, because for real cases, it's likely to create better results?

It honestly just did not occur to me.

I'm not necessarily sure "number of keys" is the best metric - take this as an example where the "number of keys on output" is bigger, yet the outcome is (IMO) more readable:

d1 = dict(
  a = [1, 2, 3, 4, 5, 6],
  b = [1, 2, 3, 7, 8],
  c = [1, 2, 3, 4, 8],
  d = [1, 2, 4, 7, 8],
)
...
pprint.pprint(d3)
"""
defaultdict(<class 'list'>,
            {'a, b, c, d': [1, 2],
             'a, b, c': [3],
             'a, c, d': [4],
             'b, c, d': [8],
             'b, d': [7],
             'a': [5, 6]})
"""

But I struggle to represent "readability" programmatically. Off the top of my head, I can think of these:
- number of lines produced on the output - aim for at least 25-30% reduction?
- average number of keys-per-section - aim above 2?

Given the code I already shared, this is what I mean:

# proportion of output lines to input lines
# (one line per reason, plus one header line per group/arch)
lines = (sum(len(v) for v in d3.values()) + len(d3)) \
      / (sum(len(v) for v in d1.values()) + len(d1))
# average number of arches per merged group
condensation = sum(len(k.split(', ')) for k in d3) / len(d3)

print("Lines proportion:", lines)
print("Key condensation:", condensation)

Obviously tailored to "suit my example and kill yours": my first example scores (0.52, 2.33), yours (0.96, 1.58).

I think the best way forward would be to test this out on top of the real FTI data, and see what happens, so we can tweak the metrics (or find out that we would not need them anyway).
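E.g. a small helper could bundle the merge plus both metrics, so it can be pointed at a real reasons-per-arch dict (a sketch; the function and argument names are made up):

import collections

def merge_with_metrics(per_arch):
    # per_arch: {arch: [reason, ...]}, e.g. straight from the FTI data
    by_reason = collections.defaultdict(list)
    for arch, reasons in per_arch.items():
        for reason in reasons:
            by_reason[reason].append(arch)
    merged = collections.defaultdict(list)
    for reason, arches in by_reason.items():
        merged[', '.join(arches)].append(reason)
    # the same two metrics as above
    lines = (sum(len(v) for v in merged.values()) + len(merged)) \
          / (sum(len(v) for v in per_arch.values()) + len(per_arch))
    condensation = sum(len(k.split(', ')) for k in merged) / len(merged)
    return merged, lines, condensation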

Also, in the end, it really depends on what you, as a packager, want to be seeing; since I'm not the target audience here, I kind of have a rough time deciding :)

Maybe this whole exercise just does not make sense in the real world, and we'd be better off with just de-duplicating when all the reasons are the same for the two (or more) arches. WDYT?


Right. Let's start with merging the arches if the sets are identical. And we might as well end there.
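I.e. something as simple as grouping arches whose reason sets are exactly equal (a sketch with made-up names, not necessarily what will get deployed):

import collections

def merge_identical(per_arch):
    groups = collections.defaultdict(list)
    for arch in sorted(per_arch):
        # arches with exactly the same set of reasons fall into one group
        groups[frozenset(per_arch[arch])].append(arch)
    return {', '.join(arches): sorted(reasons)
            for reasons, arches in groups.items()}

print(merge_identical({'i686': ['nothing provides foo'],
                       'x86_64': ['nothing provides foo'],
                       'aarch64': ['nothing provides bar']}))
# {'aarch64': ['nothing provides bar'], 'i686, x86_64': ['nothing provides foo']}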

Metadata Update from @lbrabec:
- Issue assigned to lbrabec

3 years ago


Deployed.

Metadata Update from @lbrabec:
- Issue status updated to: Closed (was: Open)

3 years ago


Metadata
Attachments: 2 png files, attached 3 years ago.