#9273 rsyslog was not running on fedorapeople
Opened 21 days ago by smooge. Modified 8 days ago

Describe what you would like us to do:

Went to see why fedoraplanet was not updating and found that no logs had been written to /var/log/ since the reboot. Found that rsyslogd was dying on reboot due to too many open files. After much searching, found that the cause was the number of user journals plus the other files it tries to open, which together are too many.
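For anyone wanting to confirm a "too many open files" condition like this on another host, here is a minimal Python sketch (not something that was run for this ticket). It assumes a Linux /proc filesystem and enough privilege to read the target process's entries; the helper names `parse_soft_nofile` and `fd_usage` are hypothetical.

```python
import os


def parse_soft_nofile(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None if the line is missing or the limit is 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: Max, open, files, <soft>, <hard>, <units>
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    return None


def fd_usage(pid):
    """Return (open_fd_count, soft_limit) for a process (Linux only)."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as handle:
        soft_limit = parse_soft_nofile(handle.read())
    return open_fds, soft_limit
```

If `fd_usage(pid)` for the rsyslogd PID shows the first number close to the second, the daemon is near its file descriptor limit.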

  1. Need nagios to monitor that rsyslogd is running on all hosts
  2. Need to update rsyslog.conf to allow for the number of new hosts.
  3. Need to fix error spewing to console on fedorapeople from patch work
rsyslogd: imjournal: rename() failed for new path: '/var/lib/rsyslog/imjournal.state': No such file or directory [v8.24.0-52.el7_8.2 try http://www.rsyslog.com/e/0 ]
rsyslogd: imjournal: rename() failed for new path: '/var/lib/rsyslog/imjournal.state': No such file or directory [v8.24.0-52.el7_8.2 try http://www.rsyslog.com/e/0 ]
rsyslogd: imjournal: rename() failed for new path: '/var/lib/rsyslog/imjournal.state': No such file or directory [v8.24.0-52.el7_8.2 try http://www.rsyslog.com/e/0 ]
rsyslogd: imjournal: rename() failed for new path: '/var/lib/rsyslog/imjournal.state': No such file or directory [v8.24.0-52.el7_8.2 try http://www.rsyslog.com/e/0 ]
rsyslogd: imjournal: rename() failed for new path: '/var/lib/rsyslog/imjournal.state': No such file or directory [v8.24.0-52.el7_8.2 try http://www.rsyslog.com/e/0 ]

# audit2allow -a
#============= syslogd_t ==============
allow syslogd_t var_run_t:file { read unlink };

Ran restorecon and checked other variables, but that did not 'fix' anything. ls -lZ showed that all files have the same, appropriate SELinux context.


When do you need this to be done by? (YYYY/MM/DD)



@smooge could you give more context about 'monitor that rsyslogd is running on all hosts'?
Did you mean checking the status of the daemon?
I have restricted access to people02, so I cannot see how rsyslogd behaved during the incident.

Metadata Update from @mohanboddu:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: medium-gain, medium-trouble, ops

21 days ago

@smooge did you see my last comment?

Sorry, I am on vacation for a while. I am just looking for a status check that rsyslogd is running on the system.

So, I think we want something like the check_varnish_proc but a check_rsyslog_proc

And then we want it to run on all machines, so look at how say the 'mail_queue' check is set...

OK, I see.
I will work on it and open a PR.

@smooge @kevin which threshold ranges would you like to use?
The same as for the varnishd daemon?

Same as we use for varnishd I think:

-c 1:2

ie, if it's not running it's critical...
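
To illustrate what a `1:2` critical range means for a process count, here is a rough Python sketch. It is not the actual check used in the Fedora infrastructure. It assumes the range follows the usual Nagios plugin convention (alert when the value falls outside min:max) and that processes are counted by command name via a Linux /proc filesystem. The function names are hypothetical.

```python
import os

OK, CRITICAL = 0, 2  # conventional Nagios plugin exit codes


def parse_range(spec):
    """Parse a 'min:max' threshold; a bare 'N' is treated as 0..N."""
    low_text, sep, high_text = spec.partition(":")
    if not sep:
        return 0.0, float(low_text)
    if low_text == "~":
        low = float("-inf")  # '~' means no lower bound
    elif low_text == "":
        low = 0.0
    else:
        low = float(low_text)
    high = float("inf") if high_text == "" else float(high_text)
    return low, high


def evaluate(count, critical_spec, name="rsyslogd"):
    """Return (exit_code, message); critical when count is outside the range."""
    low, high = parse_range(critical_spec)
    if count < low or count > high:
        return CRITICAL, f"PROCS CRITICAL: {count} {name} processes (expected {critical_spec})"
    return OK, f"PROCS OK: {count} {name} processes"


def count_processes(name):
    """Count processes whose command name equals `name` (Linux only)."""
    count = 0
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as handle:
                if handle.read().strip() == name:
                    count += 1
        except OSError:
            continue  # process exited or is not readable
    return count
```

With these assumptions, `evaluate(count_processes("rsyslogd"), "1:2")` returns the critical code when no rsyslogd process is found, and also when more than two are running.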

