#4045 ansible cron job
Closed: Fixed None Opened 6 years ago by toshio.

= phenomenon =

We need to write a cron job to run ansible-playbook against all of our ansible-managed hosts to bring their configuration into sync with what's in the ansible config git repo.

= background analysis =

For our hosts managed by puppet, the hosts check in hourly for files that are different on the host than in puppet.

For our ansible hosts, there is currently no equivalent. This means that if the config files on the hosts get out of sync with what's in ansible, they won't be brought back in line until someone happens to run ansible-playbook on them manually.

We need to write a cron job to do this.

= implementation recommendation =

  • The cron job needs to run as root on lockbox01 and try to bring each host to the current config.
  • We decided this should run once daily instead of hourly.
  • We had also talked about having the script warn instead of modify, i.e. the script would invoke ansible-playbook in some sort of dry-run mode. If nothing would be modified, it would exit. If something would be modified, it would record which host would see which modifications and then email those to the admin@ alias. Admins would take care of propagating the changes.
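A minimal sketch of the warn-instead-of-modify idea, assuming the dry run's output ends with a PLAY RECAP line per host in the usual `ok=… changed=… unreachable=… failed=…` form. The function names and the `admin@example.org` placeholder address are made up for illustration; the real recipient would be the admin@ alias. The report is only built here, not sent.

```python
import re
from email.message import EmailMessage

# Matches recap lines such as:
#   209.132.184.162 : ok=2 changed=0 unreachable=1 failed=0
RECAP_RE = re.compile(
    r"^(?P<host>\S+)\s*:\s*ok=(?P<ok>\d+)\s+changed=(?P<changed>\d+)"
    r"\s+unreachable=(?P<unreachable>\d+)\s+failed=(?P<failed>\d+)"
)


def hosts_needing_attention(output):
    """Return {host: counts} for hosts whose recap shows anything besides ok."""
    flagged = {}
    for line in output.splitlines():
        match = RECAP_RE.match(line.strip())
        if not match:
            continue
        counts = {
            key: int(value)
            for key, value in match.groupdict().items()
            if key != "host"
        }
        if counts["changed"] or counts["unreachable"] or counts["failed"]:
            flagged[match.group("host")] = counts
    return flagged


def build_report(flagged, recipient="admin@example.org"):
    """Build (but do not send) a report; return None if there is nothing to say."""
    if not flagged:
        return None
    msg = EmailMessage()
    msg["To"] = recipient  # placeholder address, an assumption for this sketch
    msg["Subject"] = "ansible dry run: %d host(s) out of sync" % len(flagged)
    lines = [
        "%s: changed=%d unreachable=%d failed=%d"
        % (host, c["changed"], c["unreachable"], c["failed"])
        for host, c in sorted(flagged.items())
    ]
    msg.set_content("\n".join(lines))
    return msg
```

With this shape, the cron script would exit quietly when `build_report` returns None and hand the message to the local mailer otherwise.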

I will have a look and see what I can do.

Here is a first try. I still need some information to complete the job.

I would really appreciate if somebody could help me with these questions.

  1. What is the output if you run: '''ansible-playbook playbook.yml --check --diff'''?
  2. Who is the receiver of the email with the output from the above command? Is it just one or are there several receivers? Do they all get the same email?

Email should go to admin@fp.o which will forward it to all the administrators.

CC'ing sysadmin-main as well.

Output looks like this:
{{{
$ sudo -i ansible-playbook /home/fedora/toshio/ansible2/playbooks/hosts/elections-dev.cloud.fedoraproject.org.yml --check --diff

PLAY [check/create instance] ********

TASK: [check it out] **********
skipping: [209.132.184.162]

TASK: [spin it up] ************
skipping: [209.132.184.162]

TASK: [assign it a special ip] ********
skipping: [209.132.184.162]

TASK: [wait for the reassignation] ********
skipping: [209.132.184.162]

TASK: [attach volumes to the system] ******
skipping: [209.132.184.162] => (item=-d /dev/vdb vol-0000000e)

TASK: [add infra repo] ********
skipping: [209.132.184.162]

TASK: [install cloud-utils] *********

ok: [209.132.184.162]

TASK: [growpart the second partition (/) to full size] ******
skipping: [209.132.184.162]

TASK: [reboot the box] ********
fatal: [209.132.184.162] => error while evaluating conditional: {% if ${growpart.rc} == 0 %} True {% else %} False {% endif %}

FATAL: all hosts have already failed -- aborting

PLAY RECAP **********
logs written to: /var/log/ansible/elections-dev.cloud.fedoraproject.org/2013/10/18/16.16.59
to retry, use: --limit @/root/elections-dev.cloud.fedoraproject.org.retry

209.132.184.162 : ok=2 changed=0 unreachable=1 failed=0
}}}

OK, just one email then. But what about the output? Is it OK as is, or would you like to see something else in the email?

Well, in the "normal" case, all the playbooks would run and there would be 0 changes.

So, I would prefer the email to just note "CHANGED" or "FAILED" tasks. Just ignore the OK ones.
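As a rough sketch of that filtering, assuming each result line starts with a status word followed by the host in brackets. The sample output above only shows `skipping:`, `ok:` and `fatal:`; treating `changed:` and `failed:` as the other reportable prefixes is an assumption here, and the function name is made up for illustration.

```python
import re

# "TASK: [reboot the box] ********" -> task name "reboot the box"
TASK_RE = re.compile(r"^TASK:\s*\[(?P<name>.*?)\]")
# "fatal: [209.132.184.162] => ..." -> status "fatal", host "209.132.184.162"
RESULT_RE = re.compile(r"^(?P<status>\w+):\s*\[(?P<host>[^\]]+)\]")

# Statuses worth reporting; "changed" and "failed" are assumed prefixes.
REPORTABLE = {"changed", "failed", "fatal"}


def reportable_results(output):
    """Return (task, host, status) tuples, ignoring ok and skipped results."""
    current_task = None
    results = []
    for raw_line in output.splitlines():
        line = raw_line.strip()
        task = TASK_RE.match(line)
        if task:
            current_task = task.group("name")
            continue
        result = RESULT_RE.match(line)
        if result and result.group("status") in REPORTABLE:
            results.append(
                (current_task, result.group("host"), result.group("status"))
            )
    return results
```

Run over the output above, this would keep only the `reboot the box` failure and drop every `ok:` and `skipping:` line.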

One other thing in the mix is that we already have logging of all ansible actions that mails a report out daily of non OK stuff. So, we could just leverage that and not have this generate any output itself...

  1. How do you recognize "CHANGED" and "FAILED" tasks?

Do you see something like

TASK: [description of task]

changed : [host] => description of change

TASK : [description of task]

failed : [host] => error description

in output?

Which "tags" specify changed and failed tasks?


  2. Could you explain how we could leverage the logging?
    Are you suggesting that the script should just run the command

'''ansible-playbook playbook.yml --check --diff'''.

As seen in the output above, this command writes logs, and the logging then deals with emailing the non-OK stuff?

So, in our ansible repo under callback_plugins we have a logdetail.py. This gets called from any ansible actions.

It writes a log and keeps it and sends out a daily email with changed/failed things. We likely need to add it to our upcoming ansible documentation.

I was suggesting that we leverage this by simply running all of them via cron, but then letting the daily log handle mailing logs out about it.

So, it would simplify the cron job down to just figuring out what playbooks to run.
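A minimal sketch of the "figure out what playbooks to run" part, assuming the per-host playbooks live under `playbooks/hosts/*.yml` in the checkout, as the path in the sample command above suggests. The function name and the `dry_run` switch are made up for illustration; this is not the attached script.

```python
from pathlib import Path


def playbook_commands(repo_root, dry_run=False):
    """Return one ansible-playbook argv list per host playbook, in sorted order."""
    hosts_dir = Path(repo_root) / "playbooks" / "hosts"  # assumed layout
    commands = []
    for playbook in sorted(hosts_dir.glob("*.yml")):
        argv = ["ansible-playbook", str(playbook)]
        if dry_run:
            # Report what would change without touching the host.
            argv += ["--check", "--diff"]
        commands.append(argv)
    return commands
```

Each argv list could then be handed to `subprocess.call` from the cron script, with the existing logging callback left to report anything that was not OK.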

I will see what I can do. I will have a look at that script, rethink the situation and hopefully be back with a solution.

Thanks janeznemanic! :)

Did some investigation and here is how I see things.

If you run ''ansible-playbook playbookname.yml --check --diff'', the logdetail.py plugin gets called and does all the necessary logging. I think we just need to add a few lines of code to logdetail.py so that, if a playbook is run in dry-run mode, this fact gets recorded in the log. In my opinion the method playbook_on_start() would have to be defined so that a dry run gets logged as such. Besides that, we would need a simple script that runs all playbooks from cron. I think it is possible to use what I have posted. And we don't have to bother with mailing since that is already configured.

Kevin, is that what you're thinking?

Yep. Exactly.

I am thinking we can just run it normally too... if everything is working as it should be, we shouldn't get any changes from running playbooks like this. We could adjust that in the cron script though.

Cron script to run playbooks for all hosts.
run_ansible-playbook_cron.py

Guys, I would really appreciate some feedback on this ticket. I am not sure, but I think I need to add the cron job to modules/ansible/manifests/init.pp. Help needed.

I've had this open in a tab to look at for a few days... I just haven't gotten to it yet. ;)

I will try today. Please be patient.

Take your time Kevin. No hurry. I just wanted to remind others.

ok, finally got time to poke at this. ;)

It looks pretty good for a first cut... however, not sure at this point that we want to land it right before freeze. It's going to generate a lot of noise at first I fear.

I'm going to try and work on running it manually and cleaning things up, and we can hook it up after f20 is out the door. ;)

Thanks again for working on it.

No problem Kevin. I enjoyed it. Let me know if I need to change anything.

ok, finally committed this. ;)

We should start getting reports tomorrow.

Thanks again.
