#1377 enable kdump addon by default
Closed None Opened 5 years ago by jianzhang.

= phenomenon =

= background analysis =

= implementation recommendation =
The kdump addon can be enabled in the Fedora installer by passing the boot option "inst.kdump_addon" during installation. But adding the option this way is inconvenient, and many people don't even know about it. So could this option be enabled by default, so that more people are able to use kdump?


With my FESCo hat on: You should probably discuss this with the kernel people given that they will be the most impacted, and then file a Change request.

With my kernel maintainer hat on: no. We're not staffed to make sure kdump works 100% of the time, and we know that it breaks often on non-server machines. One of our team members tried to use it this month to debug an upstream issue and it never worked. The closest it got to working was saying it booted, but then no vmcore was ever written. It seems to be very fragile. For kdump to be enabled by default, it would take several releases of it working continuously without breakage when we do a kernel rebase, etc.

The RHEL7 kdump is based on Fedora 19; most of the user-space code is the same.

Enabling the kdump addon by default does not mean kdump itself will be enabled by default. It just gives users a choice: while installing Fedora, they can enable kdump with the Anaconda addon, and the "enable" check button is unchecked by default.

The problem for Fedora kdump is that we do not get enough testing from the community; we only test it in house. If the kdump addon is enabled, we can encourage more people to try it if they are interested.

BTW, the current kdump Anaconda addon is included in boot.iso, but it is not in the live media yet. By default users cannot see it; you need to explicitly specify inst.kdump_addon on the GRUB command line while installing.

Replying to [comment:2 yangrr]:

> The RHEL7 kdump is based on Fedora 19; most of the user-space code is the same.

That isn't really relevant to how Fedora operates. The issues are rarely with the userspace tools and more often with the kernel. It seems kdump is tested heavily for RHEL, where the kernel remains fairly static and the machine types are similar in nature. Fedora's kernel rebases regularly, and it's used on machines ranging from low-end netbooks, ARM boards, and laptops all the way to servers.

> Enabling the kdump addon by default does not mean kdump itself will be enabled by default. It just gives users a choice: while installing Fedora, they can enable kdump with the Anaconda addon, and the "enable" check button is unchecked by default.

Why would we present the user with something to enable if there's no reasonable expectation for it to work?

> The problem for Fedora kdump is that we do not get enough testing from the community; we only test it in house. If the kdump addon is enabled, we can encourage more people to try it if they are interested.

That's true, but I don't think it's reasonable to offer it to every possible user and hope they'll report issues and work through them. I would recommend a few rounds of Fedora Test Days around kdump. That way you get the people interested in helping.

Replying to [comment:4 jwboyer]:

>> Enabling the kdump addon by default does not mean kdump itself will be enabled by default. It just gives users a choice: while installing Fedora, they can enable kdump with the Anaconda addon, and the "enable" check button is unchecked by default.

> Why would we present the user with something to enable if there's no reasonable expectation for it to work?

I think there is a reasonable expectation for it to work. We expect kexec/kdump to work upstream, and Fedora is based on the upstream kernel.

Secondly, that's the only way to test kdump, in the hope that issues will be reported, bugs will be opened, and developers will fix them. If you don't enable it at all, you have no chance of getting bug reports.

Thirdly, it will not be enabled by default. If a user chooses to enable it and it does not work, the only penalty is 128MB of reserved memory. If it works, the user gets useful debugging information to report.

>> The problem for Fedora kdump is that we do not get enough testing from the community; we only test it in house. If the kdump addon is enabled, we can encourage more people to try it if they are interested.

> That's true, but I don't think it's reasonable to offer it to every possible user and hope they'll report issues and work through them. I would recommend a few rounds of Fedora Test Days around kdump. That way you get the people interested in helping.

I think if you want to improve kdump upstream and in Fedora, we should enable it. Otherwise it will always be a RHEL-only feature: once in a while you will ask a user to enable kdump and provide a vmcore on Fedora, it will be less likely to work, and then there will be another round of blog posts saying kdump does not work with Fedora.

To me, the only way to improve the situation is to at least enable the kdump addon by default and make it easier for users to enable kdump if they want to.

Replying to [comment:5 vgoyal]:

> Replying to [comment:4 jwboyer]:

>>> Enabling the kdump addon by default does not mean kdump itself will be enabled by default. It just gives users a choice: while installing Fedora, they can enable kdump with the Anaconda addon, and the "enable" check button is unchecked by default.

>> Why would we present the user with something to enable if there's no reasonable expectation for it to work?

> I think there is a reasonable expectation for it to work. We expect kexec/kdump to work upstream, and Fedora is based on the upstream kernel.

And yet we see very mixed results because of the wide variety of platforms.

> Secondly, that's the only way to test kdump, in the hope that issues will be reported, bugs will be opened, and developers will fix them. If you don't enable it at all, you have no chance of getting bug reports.

kdump is enabled in Fedora. It has always been enabled.

> Thirdly, it will not be enabled by default. If a user chooses to enable it and it does not work, the only penalty is 128MB of reserved memory. If it works, the user gets useful debugging information to report.

This ticket is to enable it by default... so now I'm confused what we're talking about.

>>> The problem for Fedora kdump is that we do not get enough testing from the community; we only test it in house. If the kdump addon is enabled, we can encourage more people to try it if they are interested.

>> That's true, but I don't think it's reasonable to offer it to every possible user and hope they'll report issues and work through them. I would recommend a few rounds of Fedora Test Days around kdump. That way you get the people interested in helping.

> I think if you want to improve kdump upstream and in Fedora, we should enable it. Otherwise it will always be a RHEL-only feature: once in a while you will ask a user to enable kdump and provide a vmcore on Fedora, it will be less likely to work, and then there will be another round of blog posts saying kdump does not work with Fedora.

> To me, the only way to improve the situation is to at least enable the kdump addon by default and make it easier for users to enable kdump if they want to.

There are two sides to this, though.

1) Users are going to be very frustrated when they hit a bug already, and then if kdump (which is supposed to help debug things) also doesn't work, they now have 2 bugs to report. They won't see it as improving things. They'll see it as wasting their time because they get no direct benefit from kdump. It's for the kernel maintainers to use, not the users (in the general sense).

2) Getting and uploading a vmcore is time consuming and not something all users are even prepared to do. I believe their size prevents them from being attached to bugzilla, and Fedora has no easy infrastructure to store them. Even if a user overcomes all of that and provides one, it still might not get used by the kernel maintainers, which is again a waste of the user's time. If we had an automated way to do this for the user, so that they don't even need to think about it, then maybe that wouldn't be a huge issue.

Please don't misunderstand me. I think kdump is a valuable tool in certain situations. However, it's time consuming to do the dump, upload the vmcore, and on the kernel maintainer's side to actually use it. The issue I'm trying to avoid is it getting enabled by default and then providing very little value to anyone because it's either overkill or lost in the sea of bugs we have that are higher priority. Getting a vmcore for every kernel bug reported is not required and quite possibly overkill. So enabling it by default seems unnecessary.

I understand the catch-22 of kdump possibly being needed for a high priority bug in Fedora and then it isn't working because few people are testing it, but enabling it by default for all users is not the answer there. I think we can accomplish testing through test days, better communication with the kdump people when we rebase a Fedora kernel, etc.

Replying to [comment:5 vgoyal]:

> Replying to [comment:4 jwboyer]:

>> Why would we present the user with something to enable if there's no reasonable expectation for it to work?

> I think there is a reasonable expectation for it to work. We expect kexec/kdump to work upstream, and Fedora is based on the upstream kernel.

> Secondly, that's the only way to test kdump, in the hope that issues will be reported, bugs will be opened, and developers will fix them. If you don't enable it at all, you have no chance of getting bug reports.

For people like me who are generally aware of kdump but don’t know much about its implementation state, is there at least a vague consensus on what kinds of problems/bugs we are talking about? (And how would enabling the kdump addon help fix them?)

Is it general level of bugs in the code? Is it a strict hardware dependency that makes it impossible to use kdump on a large number of systems? Is it a loose hardware dependency that allows setting up kdump but frequently unexpectedly fails due to hardware variability? Is it a difficulty of setting it up correctly? Something else?

Replying to [comment:7 mitr]:

>>> Why would we present the user with something to enable if there's no reasonable expectation for it to work?

>> I think there is a reasonable expectation for it to work. We expect kexec/kdump to work upstream, and Fedora is based on the upstream kernel.

>> Secondly, that's the only way to test kdump, in the hope that issues will be reported, bugs will be opened, and developers will fix them. If you don't enable it at all, you have no chance of getting bug reports.

> For people like me who are generally aware of kdump but don’t know much about its implementation state, is there at least a vague consensus on what kinds of problems/bugs we are talking about? (And how would enabling the kdump addon help fix them?)

Nowadays, most of the problems happen because of drivers. Often, drivers are not equipped to initialize devices properly in the second kernel. That's a continuous maintenance effort.

From a code maturity point of view, the kdump code is stable. It has been in the kernel for close to 10 years now.

Enabling the addon by default allows people to enable kdump easily in Fedora (if they want to), which would also mean it gets tested more frequently. That should translate into bugs being reported and fixed, improving the state of kdump in Fedora and upstream in general.

> Is it general level of bugs in the code?

Nope.

> Is it a strict hardware dependency that makes it impossible to use kdump on a large number of systems? Is it a loose hardware dependency that allows setting up kdump but frequently unexpectedly fails due to hardware variability? Is it a difficulty of setting it up correctly? Something else?

There was a phase where configuring kdump was a problem, and things are much better in that respect now. Nowadays most problems happen because of device driver failures, or because something new shows up in the kernel and early boot of the second kernel fails.

Sometimes it becomes difficult to debug those problems, as failures happen very early in boot, the VGA console does not work in the second kernel, and the user is left guessing what happened with no messages on the console. That's why, most of the time, having a serial console to debug kdump problems is a must.

Right now the kdump team is testing with a USB serial console to see how well that works.

So to answer your question, kdump is a subsystem that requires constant maintenance, because new kernels and new drivers are often broken and need to be fixed to work with kdump. It is not a stability issue of the kdump code as such.

The only chance to detect these new problems early is to make it easy for users to enable kdump and allow them to report issues.

Otherwise, these issues will only be detected when things show up in RHEL and testing is done by RHEL developers and teams. I feel that's too late, and Fedora will not benefit from this model.

(This got forgotten… who is the right Anaconda maintainer to add to this ticket?)

Replying to [comment:8 vgoyal]:

> Sometimes it becomes difficult to debug those problems, as failures happen very early in boot, the VGA console does not work in the second kernel, and the user is left guessing what happened with no messages on the console. That's why, most of the time, having a serial console to debug kdump problems is a must.

A serial console is really not an expectation for default Fedora installs, though, especially for the Workstation product.

I suppose I am not that worried about the second boot failing or hanging (or is data corruption a risk?), as long as the screen gave a clear indication that the kernel had crashed before the second boot could fail or hang. But I suspect that is an impossible thing to ask for as well. (The second boot failing in a way that causes a reboot is not a problem.)

Replying to [comment:9 mitr]:

> (This got forgotten… who is the right Anaconda maintainer to add to this ticket?)

It should be vpodzime@redhat.com; how can I add him to the CC list?

Replying to [comment:6 jwboyer]:

> Replying to [comment:5 vgoyal]:

>> Replying to [comment:4 jwboyer]:

>>>> Enabling the kdump addon by default does not mean kdump itself will be enabled by default. It just gives users a choice: while installing Fedora, they can enable kdump with the Anaconda addon, and the "enable" check button is unchecked by default.

>>> Why would we present the user with something to enable if there's no reasonable expectation for it to work?

>> I think there is a reasonable expectation for it to work. We expect kexec/kdump to work upstream, and Fedora is based on the upstream kernel.

> And yet we see very mixed results because of the wide variety of platforms.

I think we can take care of the problems case by case?

>> Secondly, that's the only way to test kdump, in the hope that issues will be reported, bugs will be opened, and developers will fix them. If you don't enable it at all, you have no chance of getting bug reports.

> kdump is enabled in Fedora. It has always been enabled.

I assume you mean the kernel config option; if so, yes. But the kdump service, which loads the crash kernel and creates and loads the kdump initrd, is not enabled by default. We also need to manually set the kernel command line and then reboot to reserve memory for kdump.
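To make the distinction above concrete: a crash kernel is only usable once memory has been reserved with the `crashkernel=` boot parameter and a crash kernel has actually been loaded, which is the job of the kdump service. The Python sketch below checks both, assuming a Linux host where `/proc/cmdline` and `/sys/kernel/kexec_crash_loaded` are readable. It is a hypothetical helper, not part of the kdump tooling; the function names are invented, and it only parses the simple `crashkernel=<size>` and `crashkernel=<size>@<offset>` forms, not range syntax or values such as `auto`.

```python
"""Sketch: can this machine capture a vmcore right now? (hypothetical helper, not kdump tooling)"""
import re
from pathlib import Path
from typing import Optional

_UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def crashkernel_bytes(cmdline: str) -> Optional[int]:
    """Size requested by a simple crashkernel=<size>[@<offset>] option, else None."""
    values = [t.split("=", 1)[1] for t in cmdline.split() if t.startswith("crashkernel=")]
    if not values:
        return None  # nothing requested
    # If the option is repeated, this sketch looks at the last occurrence only.
    m = re.fullmatch(r"(\d+)([KMG])(?:@\d+[KMG]?)?", values[-1])
    if m is None:
        return None  # e.g. "auto" or range syntax: not handled here
    return int(m.group(1)) * _UNITS[m.group(2)]


def kdump_status(cmdline: str, crash_loaded: str) -> str:
    """Classify /proc/cmdline plus the contents of /sys/kernel/kexec_crash_loaded."""
    if not any(t.startswith("crashkernel=") for t in cmdline.split()):
        return "no crashkernel= option: set it on the kernel command line and reboot"
    if crash_loaded.strip() != "1":
        return "crashkernel= set, but no crash kernel loaded (is the kdump service enabled?)"
    return "ready"


def _read(path: str) -> str:
    """Return a file's contents, or "" if it is missing or unreadable."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""


if __name__ == "__main__":
    cmdline = _read("/proc/cmdline")
    print(kdump_status(cmdline, _read("/sys/kernel/kexec_crash_loaded")))
    print("requested bytes:", crashkernel_bytes(cmdline))
```

On a default install as described in the comment above, the first check is the one expected to fail, since no `crashkernel=` option has been added yet.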

>> Thirdly, it will not be enabled by default. If a user chooses to enable it and it does not work, the only penalty is 128MB of reserved memory. If it works, the user gets useful debugging information to report.

> This ticket is to enable it by default... so now I'm confused what we're talking about.

I think there's still a misunderstanding about "by default" (see comment #2), so let me explain it again:

The kdump addon is an Anaconda addon. It is shown in the installation GUI only when the kernel command-line parameter "inst.kdump_addon" is present, so it is disabled by default.

Suppose the kdump addon is enabled: users can see it in the installer GUI and can then choose to enable kdump with some radio buttons. Kdump itself is always disabled by default in the addon UI.
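The two-level opt-in described above (screen visibility versus kdump itself) can be modelled in a few lines. This is a hypothetical Python sketch of the behaviour as described in this comment, not the addon's or Anaconda's actual code: `KdumpSpokeState`, `kdump_spoke_state`, and the `addon_on_by_default` flag are invented names, and only the bare `inst.kdump_addon` token quoted in this ticket is recognised.

```python
"""Hypothetical model of the installer opt-in described above (not Anaconda's code)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class KdumpSpokeState:
    visible: bool          # is the kdump screen shown in the installer at all?
    enabled_default: bool  # initial state of the "enable kdump" control


def kdump_spoke_state(boot_cmdline: str, addon_on_by_default: bool = False) -> KdumpSpokeState:
    """Decide how the kdump screen appears for a given installer boot command line."""
    # Current behaviour: shown only if inst.kdump_addon was typed at the boot prompt.
    # The request in this ticket corresponds to addon_on_by_default=True.
    visible = addon_on_by_default or "inst.kdump_addon" in boot_cmdline.split()
    # Either way, kdump itself starts out disabled until the user opts in.
    return KdumpSpokeState(visible=visible, enabled_default=False)


if __name__ == "__main__":
    for cmdline in ("quiet", "quiet inst.kdump_addon"):
        print(repr(cmdline), kdump_spoke_state(cmdline))
```

The point the sketch makes is that the proposal changes only `visible`; `enabled_default` stays `False` either way.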

>>>> The problem for Fedora kdump is that we do not get enough testing from the community; we only test it in house. If the kdump addon is enabled, we can encourage more people to try it if they are interested.

>>> That's true, but I don't think it's reasonable to offer it to every possible user and hope they'll report issues and work through them. I would recommend a few rounds of Fedora Test Days around kdump. That way you get the people interested in helping.

>> I think if you want to improve kdump upstream and in Fedora, we should enable it. Otherwise it will always be a RHEL-only feature: once in a while you will ask a user to enable kdump and provide a vmcore on Fedora, it will be less likely to work, and then there will be another round of blog posts saying kdump does not work with Fedora.

>> To me, the only way to improve the situation is to at least enable the kdump addon by default and make it easier for users to enable kdump if they want to.

> There are two sides to this, though.

> 1) Users are going to be very frustrated when they hit a bug already, and then if kdump (which is supposed to help debug things) also doesn't work, they now have 2 bugs to report. They won't see it as improving things. They'll see it as wasting their time because they get no direct benefit from kdump. It's for the kernel maintainers to use, not the users (in the general sense).

It is indeed a problem, but we will have no way to improve kdump if we do not get enough bug reports.

> 2) Getting and uploading a vmcore is time consuming and not something all users are even prepared to do. I believe their size prevents them from being attached to bugzilla, and Fedora has no easy infrastructure to store them. Even if a user overcomes all of that and provides one, it still might not get used by the kernel maintainers, which is again a waste of the user's time. If we had an automated way to do this for the user, so that they don't even need to think about it, then maybe that wouldn't be a huge issue.

> Please don't misunderstand me. I think kdump is a valuable tool in certain situations. However, it's time consuming to do the dump, upload the vmcore, and on the kernel maintainer's side to actually use it. The issue I'm trying to avoid is it getting enabled by default and then providing very little value to anyone because it's either overkill or lost in the sea of bugs we have that are higher priority. Getting a vmcore for every kernel bug reported is not required and quite possibly overkill. So enabling it by default seems unnecessary.

> I understand the catch-22 of kdump possibly being needed for a high priority bug in Fedora and then it isn't working because few people are testing it, but enabling it by default for all users is not the answer there. I think we can accomplish testing through test days, better communication with the kdump people when we rebase a Fedora kernel, etc.

Speaking from the user experience pov, I don't think we want to add any more questions in the installer. It needs to be good enough to be on by default, otherwise we shouldn't offer it to our users. And it needs to be integrated in the existing problem reporting infrastructure of Fedora (abrt, retrace, etc).

This ticket will be discussed in the FESCo meeting on Wednesday at 18:00UTC in #fedora-meeting on irc.freenode.net.

From today’s FESCo meeting: Enabling the kdump addon by default was rejected (-5)
