#130 Give ABRT some ❤
Opened 2 years ago by catanzaro. Modified 3 months ago

ABRT is very important for improving the quality of Fedora Workstation. So let's give it some ❤.

ABRT has two completely separate crash reporting paths currently, so let's talk about them separately.

❤ Automatic crash reports with truncated backtraces

This is already working more or less well.

The main problem here is that crash reports never seem to make their way from the retrace server to Bugzilla unless a user manually reports the crash to Bugzilla. Ideally, once a crash is reported by, say, 100 users, a Bugzilla report would be automatically created. (Mega bonus points if it could learn to create bug reports directly in GNOME GitLab.) I think we used to have this functionality, but if so, it's broken.

But on the whole, automatic crash reporting is in good shape.
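
As a rough illustration of the threshold idea above, here is a minimal sketch in Python. It is not how the retrace server or ABRT works: `CrashBucket`, `record_report`, and the `file_bug` callback are hypothetical names made up for this example, and the threshold of 100 is simply the "say, 100 users" figure from the text.

```python
from dataclasses import dataclass, field

# Assumed value, taken from the "say, 100 users" suggestion above.
REPORT_THRESHOLD = 100


@dataclass
class CrashBucket:
    """Hypothetical record for one deduplicated crash on the server."""
    signature: str
    reporters: set = field(default_factory=set)
    bug_filed: bool = False


def record_report(bucket, user_id, file_bug):
    """Count one user's report; call file_bug once when the threshold is hit.

    file_bug is a hypothetical callback standing in for whatever would
    actually create the Bugzilla (or GitLab) report.
    Returns True only on the call that triggers the filing.
    """
    bucket.reporters.add(user_id)  # a set, so repeat reports from one user count once
    if not bucket.bug_filed and len(bucket.reporters) >= REPORT_THRESHOLD:
        file_bug(bucket.signature)
        bucket.bug_filed = True
        return True
    return False
```

Counting distinct reporters rather than raw reports keeps one machine that crashes in a loop from triggering a bug on its own, and the `bug_filed` flag ensures at most one bug per crash.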

❤ Semi-automated manual crash reports with full backtraces

Automatic crash reports, while important and helpful, don't contain full backtraces (to protect the user's privacy, since full backtraces sometimes contain sensitive data in stack frames). Without a full backtrace, resolving a crash using just the automatic crash report can be quite difficult. Often, the only practical solution is to wait for a manual ABRT bug report to arrive, since ABRT's manual bug reports to Red Hat Bugzilla include detailed backtraces.

Sadly, ABRT's manual bug report process suffers from a very large number of quality issues.

Highest-priority issues:

Other major issues:

Each of these issues is, on its own, a serious quality problem for Fedora Workstation. Combined, it's really quite a lot.

Pet peeve quality issues:

The WG should work with the ABRT developers to understand their priorities, figure out how we can reduce the number of quality issues, and determine whether we have a plausible path forward for ABRT ❤.

❤ Future wishlist

  • Flatpak crash support. We're in trouble if we're betting on Flatpak but don't ensure it's possible for users to easily report crashes in Flatpak applications.
  • Report crashes directly to GitLab. This is required for Flatpak crash reporting, but also it's a pragmatic reaction to the fact that for GNOME packages, Red Hat Bugzilla is basically an unmonitored dumpster fire where bug reports go to be ignored. We're far better at dealing with crash reports upstream than we are on Red Hat Bugzilla. I'd like to start autoclosing downstream bug reports against GNOME components (with exceptions for blocker, freeze exception, or downstream packaging bugs) so to make that successful, ABRT should learn to report directly upstream.

Report crashes directly to GitLab. This is required for Flatpak crash reporting, but also it's a pragmatic reaction to the fact that for GNOME packages, Red Hat Bugzilla is basically an unmonitored dumpster fire where bug reports go to be ignored. We're far better at dealing with crash reports upstream than we are on Red Hat Bugzilla. I'd like to start autoclosing downstream bug reports against GNOME components (with exceptions for blocker, freeze exception, or downstream packaging bugs) so to make that successful, ABRT should learn to report directly upstream.

I do not think this will end well at all. I think we're going to see people just ignoring them upstream like they do downstream.

In practice, this does happen in some GNOME projects. But on the whole, we are far better at responding to bug reports upstream than we are downstream. Let's continue this thread of discussion in #131.

I believe most issues are pure engineering, but we could probably do something about the overall quality of reports if we were able to define what makes a good report.

FAF could also be involved in reporting to bug trackers for some finer-grained controls of whatever (I don’t know, attachments to include, something), but then there’s a level of indirection and some challenges to overcome to prevent spam (my thinking here is if we don’t require reporters to have a Bugzilla account, but now I see how that could be annoying, not being able to query for information).

I believe most issues are pure engineering, but we could probably do something about the overall quality of reports if we were able to define what makes a good report.

I think the reports that ABRT uploads are generally already of good quality. The problems I identified above are mainly user and developer experience issues.

  • Abrt bug reports depend on the retrace server properly processing a stack trace, which means the retrace server needs to have a consistent environment with all the relevant debug/debuginfo packages installed. This sometimes isn't the case, e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1806231 where it had no idea what the g-i-s package even was and questioned whether I really had Fedora installed, haha. So I had to install 6G of debug packages to process the stack trace locally. In a given cycle, this happens to me maybe 1/2 dozen times. The vast majority of users bail and don't file a bug at all.
  • Fedora QA tracks blockers via RHBZ, not upstream.
  • A real improvement would be automatically filing both RHBZ and upstream bug reports. At the least, one of them would be the primary report, with all necessary attachments, and the other a secondary report containing a summary and a "see also" link to the primary bug.

Probably auto-filing bugs upstream and linking them to downstream bugs in RHBZ would work better than straight-up avoiding filing bugs downstream.

Abrt bug reports depend on the retrace server properly processing a stack trace, which means the retrace server needs to have a consistent environment with all the relevant debug/debuginfo packages installed. This sometimes isn't the case, e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1806231 where it had no idea what the g-i-s package even was and questioned whether I really had Fedora installed, haha. So I had to install 6G of debug packages to process the stack trace locally. In a given cycle, this happens to me maybe 1/2 dozen times. The vast majority of users bail and don't file a bug at all.

The branching of Rawhide is always a special time of year and there’s always a bit of lag between that and us adding the version and repository. If that still happens after, say, the Bodhi activation point, it’s a bigger problem.

Probably auto-filing bugs upstream and linking them to downstream bugs in RHBZ would work better than straight-up avoiding filing bugs downstream.

Please, I created #131 for this, let's discuss it there.

Metadata Update from @catanzaro:
- Issue tagged with: meeting-request

a year ago

Metadata Update from @chrismurphy:
- Issue untagged with: meeting-request
- Issue tagged with: meeting

a year ago

Discussed at today's meeting. Ernestas will provide us with occasional progress updates in this issue, and we'll revisit a few months from now.

Metadata Update from @catanzaro:
- Issue untagged with: meeting

a year ago

ABRT deletes core dumps far too aggressively, even while user is trying to report the crash

https://github.com/abrt/abrt/pull/1481 should address this. https://github.com/abrt/abrt/issues/1475, if (properly) implemented, should prevent the issue from happening altogether. Might aim for F33 with the latter.

Retrace server should not fail

In the end, we’ve decided to wait until we get the new hardware to redeploy everything on RHEL 8. With the amount of crud on the current server, we cannot guarantee that we would be able to replicate the deployment and be able to roll back.

I don’t have the dates, but it’s still “soon”.

Bug 1878317 ("doesn't offer option to process stack traces locally") is now a Fedora 33 beta blocker.

The decision to classify this bug as an "AcceptedBlocker" was made on the grounds that it "hinders execution of required Beta test plans or dramatically reduces test coverage"...

Metadata Update from @chrismurphy:
- Issue tagged with: meeting

a year ago

Bug 1878317 ("doesn't offer option to process stack traces locally") is now a Fedora 33 beta blocker.

There are two more. Barring some miracle, we will almost surely slip another week.

Metadata Update from @chrismurphy:
- Issue untagged with: meeting

a year ago

Adding the F34 milestone, mostly to ensure that we continue to track this.

Metadata Update from @aday:
- Issue set to the milestone: Fedora 34

11 months ago

Metadata Update from @catanzaro:
- Issue tagged with: meeting

10 months ago

Discussed again at today's WG meeting.

  • I think the longstanding user interface problems are not worth the cost, and that we should ship only the non-GUI components of ABRT for automatic bug reporting, but not the report-to-Bugzilla GUI.
  • Neal and Langdon are strongly in favor of keeping ABRT in its entirety, including the report-to-Bugzilla GUI. In particular, Langdon hasn't noticed the problems Michael complains about.
  • Ernestas has left Red Hat. Tomas will invite Miroslav Suchy, head of ABRT team, to a future Working Group meeting.
  • Neal and Michael will further prioritize the existing list of priority issues in the first comment. It currently lists eight major issues, but we should further prioritize this so we have just a few top-priority issues.

Metadata Update from @catanzaro:
- Issue untagged with: meeting

9 months ago
  • Neal and Michael will further prioritize the existing list of priority issues in the first comment. It currently lists eight major issues, but we should further prioritize this so we have just a few top-priority issues.

We've split the "major issues" list in the first comment into two halves, "highest-priority" and "other major issues." The highest-priority list includes the three bullet points we found most important. It's actually five issues, but two of those are closely related to other issues.

  • Ernestas has left Red Hat. Tomas will invite Miroslav Suchy, head of ABRT team, to a future Working Group meeting.

He'll attend next week, so I'll put this topic first on the agenda for next week.

Metadata Update from @catanzaro:
- Issue tagged with: meeting

9 months ago

Kevin mentioned the Bugzilla dashboard, which I didn't know about.

https://bugzilla.redhat.com/page.cgi?id=productdashboard.html&tab=summary&product=Fedora&bug_status=open&assignee_table_length=25

As for QA, there isn't a systematic way of using Bugzilla for learning about bugs or problem areas. In no particular order, problem areas are uncovered by: the blocker bug app, openQA testing, and the various compose reports that get sent to the devel@ list.

@adamwill

Well, also significant in that context are human mailing list posts (and IRC discussion, forum discussion, etc). We do follow those forums and jump on any problems that seem to need jumping on. Major issues tend to 'bubble up' from Bugzilla to the mailing lists especially. Crash reports are often useful for figuring out what is going on with a bug at that point, or at least identifying duplicates and getting a sense of the scale of the problem.

We do haphazardly use the Bugzilla reports too. For instance a year or two back I just decided to try and triage all gnome-shell bug reports, and I did a bunch of work digging through abrt crash reports there (which made up a large chunk of all the bug reports). Some of them did turn out to be quite significant bugs we could isolate and fix, IIRC.

There is one kind of systematic GNOME-specific issue here, which is that the backtraces abrt produces often aren't much use in diagnosing crashes in the Shell because the bug really happened "somewhere else", e.g. in the javascript code. But even then, abrt does attach a snippet of the system logs from around the time of the crash, which can help.

Metadata Update from @catanzaro:
- Issue untagged with: meeting
- Issue tagged with: meeting-request

9 months ago

Discussed at today's WG meeting.

Other high-priority and major issues:

  • Work on Flatpak support is stalled, problems discussed here
  • Work on reporting GNOME bugs directly to upstream GitLab is underway.

@msuchy, would January 5 work OK for the next WG meeting? (If not, we can use a later date.)

@catanzaro I'd prefer a date a week later. On the 5th I have PTO planned, which may or may not happen. It depends on the current local pandemic situation. The 12th is definitely OK with me.

12th is booked too, so let's plan for the 17th.

Hi @msuchy, I guess you still want to attend? Does the 17th sound good?

I understand you are no longer managing the ABRT team. Perhaps we should invite the new manager as well? Who would that be? (Of course, the whole team is welcome to attend.)

I can attend. I will also notify the new tech lead, who is @msrb.

I will attend as well ;)

Metadata Update from @catanzaro:
- Issue untagged with: meeting-request
- Issue tagged with: meeting

8 months ago

Discussed at today's meeting. ABRT developers accepted some of my suggestions in "Possible sensitive data detected is almost always a false positive".

Partially discussed today, though we were short on time. Miroslav has posted a short summary in the issue.

Metadata Update from @catanzaro:
- Issue untagged with: meeting

8 months ago

Metadata Update from @catanzaro:
- Issue set to the milestone: None (was: Fedora 34)

7 months ago

Current status: retrace jobs seem to always fail. The bug was reported in October last year, but it's still broken.
