#22 investigate logging mechanisms for taskotron runner
Closed: Fixed. Opened 10 years ago by tflink.

At the moment, all output from the runner is done with Python 'print' statements. This is not the best approach, but before implementing anything better, some investigation needs to be done.

[[http://pythonhosted.org/Logbook/index.html|Logbook]] looks to be an interesting project that could suit our needs well. I don't think we need any of the enhancements it brings over the stdlib logging module yet but I'm interested in the idea of sending errors off to a central location. As the implementation of taskotron grows, I think that would help us keep on top of client failures.

Investigate logbook and determine:
- is logbook mature enough to use in production
- would the logging enhancements be useful enough to us to justify using a less-standard logging mechanism
- how much extra effort would it take to use logbook instead of logging


Here's a quick summary of my investigation:

logbook
* http://pythonhosted.org/Logbook/
* 3 years of development
* last version and last commit are 3 months old
* haven't found any comments on how stable and production-ready the module is -- either not many use it or there are no problems with it :)
* packaged in Fedora
* the documentation says that stdlib logging and logbook are two-way compatible: "Because of that, Logbook is two-way compatible with logging and one-way compatible with warnings. If you want, you can let all logging calls redirect to the logbook handlers or the other way round, depending on what your desired setup looks like. That way you can enjoy the best of both worlds."
-- using logbook instead of stdlib logging shouldn't take much extra effort, it seems; the two are very similar
* I might be wrong, but from the docs I got the feeling that logbook offers about the same functionality as stdlib logging, just somewhat easier to use
* I'm really not sure we'd need logbook's enhancements, and stdlib logging already offers central logging

logstash (central logging)
* http://logstash.net/
* I hear that it's production-ready
* not packaged in Fedora but available through ruby gems
* http://cookbook.logstash.net/recipes/logging-from-python/
* http://cookbook.logstash.net/recipes/syslog-pri/
* stdlib logging offers sending logs to a central server (http://docs.python.org/2/library/logging.handlers.html#sysloghandler)
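To illustrate that last point, shipping to a central syslog/logstash collector with the stdlib handler is only a few lines. This is a sketch, not our actual setup: the logger name is hypothetical, and it sends to localhost where a real deployment would point at the central server.

```python
import logging
import logging.handlers

log = logging.getLogger("taskotron")
log.setLevel(logging.INFO)

# UDP syslog handler; in production the address would be the central
# logstash/syslog host instead of localhost
handler = logging.handlers.SysLogHandler(address=("localhost", 514))
handler.setFormatter(logging.Formatter("taskotron: %(levelname)s %(message)s"))
log.addHandler(handler)

log.info("runner started")
```

logstash's syslog input (see the syslog-pri recipe above) can then parse these messages on the server side.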

Do you have different feeling about logbook or any other thoughts?

Thanks for doing the research.

== Logbook ==
That pretty much matches what I suspected. I was intrigued by the idea of being able to fire off events/tickets on log messages over a certain severity but I suppose that if we end up centralizing logging with something like logstash, that doesn't have so much appeal.

It sounds like you're of the opinion that while logbook may be easier to use, there doesn't seem to be enough benefit for us to justify using it over the stdlib logging?

== Logstash ==

I didn't know that logstash wasn't packaged yet. I know that lmacken has expressed interest in logstash and knows the original author. I'm not all that interested in maintaining yet another non-python package for part of our production infrastructure. [[http://logstash.net/docs/1.3.3/repositories|The logstash devs have created packages for centos]], though so it may not be a huge issue - especially if we run it on rhel instead of fedora.

I've said this to martin before, but I'm of the opinion that some form of centralized logging is pretty much a requirement for a production deployment of taskotron (assuming we can spare the storage and other resources). Those of us who have done the "search through all the clients one at a time for logs" dance before can speak to just how much fun that is :)

Another feature that is missing from your list is integration with other tools, specifically [[http://www.elasticsearch.org/overview/kibana/|kibana]] and [[http://www.elasticsearch.org/overview/|elasticsearch]]. These tools would give us log visualizations and indexing, respectively. Test log indexing is something that I've been thinking about long before taskotron came along and while I'm not sure if elasticsearch could provide all of what I have in mind, I do think that we'd benefit from the ability to search through all of the logs from taskotron when triaging issues.

== Other Log Collectors ==

Two alternatives to logstash are [[http://flume.apache.org/|Apache Flume]] and [[http://fluentd.org/|fluentd]]. From what I've read, flume is designed to work with hadoop clusters and, since it writes to HDFS, is pretty much a non-starter for us. Fluentd and logstash seem to be pretty comparable other than language differences; the summary I read was that they are identical for many uses -- fluentd requires a little less configuration and logstash is a little more flexible, but neither difference is large.

If all else fails, we could always use collectd or rsyslog to collect logs from all the hosts and services involved.

Any other thoughts on logbook or log collectors?

It sounds like you're of the opinion that while logbook may be easier to use, there doesn't seem to be enough benefit for us to justify using it over the stdlib logging?

Yes. In general I prefer anything stdlib, especially if the less-standard logging doesn't give us much of an advantage (at least from my POV).

Another feature that is missing from your list is integration with other tools, specifically kibana and elasticsearch.

I forgot to mention this. Thanks for bringing this up. It seems like both logstash and fluentd support these.

Regarding log collectors, I found a few good articles on the subject:
http://jasonwilder.com/blog/2012/01/03/centralized-logging/
http://jasonwilder.com/blog/2013/07/16/centralized-logging-architecture/
http://jasonwilder.com/blog/2013/11/19/fluentd-vs-logstash/

These are probably what you read, since they match your summary. I guess the choice between the two comes down to how easily (packaging, mostly) they can be installed.

BTW the thing that worries me about fluentd is (taken from one of the links above):

If you use the open-source version, you’ll need to install Fluentd from source or via gem install. Since Fluentd is primarily developed by a commercial company, their deb and rpm packages are configured to send data to their hosted centralized logging platform.

So I set up a centralized logstash instance [1] in a VM and then ran the rpmlint task to send logs to the server; it seems to be working fine. Setting it up was pretty straightforward. I used RPMs for logstash and elasticsearch [2][3][4], and for redis (from the Fedora repo) as well. We could use e.g. mongodb instead of redis as the database if we want, though redis seems to be recommended.

I have a few questions regarding logging:

What streams do we want to use for logging?
* stdout/stderr -- for debugging
* file (rotating) -- for production, logstash shipper sends its content to the central logstash server
* syslog -- do we want this as well? We use syslog for blockerbugs since infra requires it, if I am not mistaken
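For the record, the first two streams can be combined on a single logger with stdlib handlers. A sketch only; the logger name and file path are placeholders (a production path would be something under /var/log that the logstash shipper tails):

```python
import logging
import logging.handlers
import os
import sys
import tempfile

log = logging.getLogger("taskotron.runner")
log.setLevel(logging.DEBUG)

# stdout -- for debugging
console = logging.StreamHandler(sys.stdout)
console.setLevel(logging.DEBUG)
log.addHandler(console)

# rotating file -- for production; the logstash shipper would read this file
# (tempfile path is just a placeholder for the example)
logpath = os.path.join(tempfile.gettempdir(), "taskotron.log")
logfile = logging.handlers.RotatingFileHandler(
    logpath, maxBytes=5 * 1024 * 1024, backupCount=3)
logfile.setLevel(logging.INFO)
log.addHandler(logfile)

log.info("runner started")
```

A syslog handler (logging.handlers.SysLogHandler) could be added the same way if infra ends up requiring it.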

Are tasks running in stand-alone mode supposed to use taskotron's logger? I noticed task-rpmlint imports tap from libtaskotron anyway.

What fields do we want to be searchable on the central server?
* task name
* host
* file
* args
-- envr
-- arch
-- other?
The format of log messages needs to be adjusted a little so they can be searched on easily in kibana (logstash/elasticsearch web UI).
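One simple way to get those fields into every log line without touching each call site is a stdlib LoggerAdapter. Just a sketch of the idea; the field names (task, arch) and format are examples, not a decided schema:

```python
import io
import logging

# a buffer stands in for the real output stream so the result is easy to see
stream = io.StringIO()

# key=value pairs in the format make the fields easy for logstash to extract
fmt = "%(asctime)s task=%(task)s arch=%(arch)s %(levelname)s %(message)s"

logger = logging.getLogger("taskotron.task")
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(fmt))
logger.addHandler(handler)

# LoggerAdapter injects the task-specific fields into every record
task_log = logging.LoggerAdapter(logger, {"task": "rpmlint", "arch": "x86_64"})
task_log.info("task started")
```

kibana could then filter on those key=value pairs once logstash is told how to parse them.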

Would it make sense to configure logging through a YAML file? Or do it in the source/project's config file as usual?
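For what it's worth, stdlib logging.config.dictConfig takes a plain dict, and that dict maps one-to-one onto a YAML file loaded with yaml.safe_load, so the two options aren't mutually exclusive. A sketch of the dict form (names are illustrative):

```python
import logging
import logging.config

# this same structure could live in a YAML file and be passed through
# yaml.safe_load() before being handed to dictConfig()
LOGGING = {
    "version": 1,
    "formatters": {
        "simple": {"format": "%(name)s %(levelname)s %(message)s"},
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "simple",
            "level": "DEBUG",
        },
    },
    "loggers": {
        "taskotron": {"handlers": ["console"], "level": "DEBUG"},
    },
}

logging.config.dictConfig(LOGGING)
log = logging.getLogger("taskotron")
log.debug("configured from dict")
```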

[1] http://logstash.net/docs/1.3.3/tutorials/getting-started-centralized
[2] https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.10.noarch.rpm
[3] http://download.elasticsearch.org/logstash/logstash/packages/centos/logstash-1.3.3-1_centos.noarch.rpm
[4] http://www.elasticsearch.org/blog/apt-and-yum-repositories/

Would there be a good reason for using something other than redis? IIRC, they added redis to the suggested deployment to address issues that people were seeing in production.

I'm not sure I understand your question about streams, though. At a minimum, I want to collect the task logs (stderr and stdout). Beyond that, it really depends on how well we can keep the various log types separated. I've not spent enough time with logstash to really understand how information can be tagged and how it's represented after shipping to the server.

I had been planning for tasks to use their own logger, or at least differentiate from the libtaskotron logger so that we can separate the logs. That being said, I expect that some tasks will not be written in python and we need to have a way to monitor their logs. The first thing that comes to mind is to have the directive handle logging for the non-python case but I'm open to other suggestions.

As far as the fields to index, I'm most interested in host and taskname initially. Is it possible/practical to index the log messages themselves? It'd be nice to be able to search for some specific error message to see where else it happened.

Do you know what changes will need to be made to the log messages? We haven't started logging in the runner or the tasks yet and I'd like to start putting things in a format that logstash/kibana/elasticsearch works well with.

As far as configuration goes, I don't have my heart set on any particular method as long as everything works equally well in production, staging and locally on peoples' machines. The runner and tasks need to be able to run outside of the production environment without extra setup and configuration.

Overall, it sounds like you think logstash will work well for us? Do you have any insight on what kind of storage and memory overhead will be required for indexing? How configurable are the retention policies?

While we haven't implemented anything yet, I think that enough research has been done on this for now. Closing task.
