Issue #9102: Do we audit log the playbook execution somewhere? - fedora-infrastructure


Closed: Fixed 3 years ago by cverna. Opened 3 years ago by praiskup.

From time to time something changes on a copr VM, and the only
explanation is that somebody did something manually or someone ran
the playbook.

Any manual action is hard to track, but it would be awesome to at least
be able to find the person who executed the playbook on batcave and
ask why it was run.

A not-so-important example: something or someone restarted
copr-backend.target a day ago on copr-be.aws for some reason (I was
measuring how long the service can run without a restart, and how
much memory it uses). This was probably done by our playbook because it
changed the backend configuration file (restart handler notified).

cverna commented 3 years ago

You can look at datagrepper https://apps.fedoraproject.org/datagrepper/raw?topic=org.fedoraproject.prod.ansible.playbook.start&delta=127800
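
For anyone who wants to script this lookup, here is a minimal sketch that only rebuilds the URL above from its two query parameters. The helper name `datagrepper_url` is made up for illustration, the `delta` value is assumed to be a look-back window in seconds, and nothing is fetched over the network.

```python
from urllib.parse import urlencode

# Endpoint and topic copied from the link in this comment.
DATAGREPPER_RAW = "https://apps.fedoraproject.org/datagrepper/raw"
PLAYBOOK_START_TOPIC = "org.fedoraproject.prod.ansible.playbook.start"


def datagrepper_url(topic: str, delta_seconds: int) -> str:
    """Build a datagrepper /raw query URL for one topic and a look-back window.

    Hypothetical helper: it only assembles the query string and does not
    contact the server or interpret the response.
    """
    query = urlencode({"topic": topic, "delta": delta_seconds})
    return f"{DATAGREPPER_RAW}?{query}"


if __name__ == "__main__":
    print(datagrepper_url(PLAYBOOK_START_TOPIC, 127800))
```

Opening the printed URL in a browser should show the same list of playbook start messages as the link above.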

Metadata Update from @cverna:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

3 years ago

praiskup commented 3 years ago

Thank you for the reference, so this is the reason:

ansible.playbook.start kevin started an ansible run of master.yml JSON 2 days ago - 2020-06-30 00:03:29 UTC

So nothing really specific to copr, and it is good to know there's such a wide-ranging playbook.

kevin commented 3 years ago

So, you should assume at ALL TIMES:

- The ansible git repo is the desired state of the applications/VMs.
- Anyone can run the ansible playbook at any time.

This is why it's important to make sure playbooks are idempotent. That is, if you run the playbook once it puts everything in the desired state. If you run the playbook again after that it does nothing at all (because the machine is in the desired state).
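
To make the idea concrete, here is a small sketch of an idempotent "task" outside of Ansible. It is not how Ansible is implemented; `ensure_content` and `apply_config` are hypothetical helpers, and the list of restarts is only a stand-in for a notified restart handler. The point is that the restart is recorded only when the file actually had to change.

```python
import tempfile
from pathlib import Path


def ensure_content(path: Path, desired: str) -> bool:
    """Put `path` into the desired state; return True only if it had to change."""
    if path.exists() and path.read_text() == desired:
        return False  # already in the desired state: do nothing
    path.write_text(desired)
    return True


def apply_config(path: Path, desired: str, restarts: list) -> None:
    """Write the config and record a restart only when something changed."""
    if ensure_content(path, desired):
        restarts.append("restart service")  # stand-in for a restart handler


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conf = Path(tmp) / "example.conf"
        restarts = []
        apply_config(conf, "workers = 4\n", restarts)  # first run changes the file
        apply_config(conf, "workers = 4\n", restarts)  # second run is a no-op
        print(restarts)  # only the first run recorded a restart
```

Even a whitespace-only difference between the file on disk and the desired content counts as a change here, so it would trigger the restart on the next run.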

One of the templates must have changed in order to cause the restart.

If you need to test something like this, please comment it out (with a commit message/note) and it would be skipped.

I normally exclude copr* from master playbook runs, but I guess I didn't in that case... sorry if it messed up your testing.

praiskup commented 3 years ago

> So, you should assume at ALL TIMES:

Absolutely. We have always expected that this can happen, but in the
past we were curious who did something and why (and that time we
concluded it was just the security cron doing package updates for CVEs).

I've heard that we should be running the Ansible playbooks on a daily
basis, automatically - but with the --check option. This is something I
was not able to confirm :-) so I suppose this is not happening.

But until now I didn't know that, with master.yml, the playbooks may be
run pretty often. This is not a problem at all! It was just news to me;
I hadn't noticed anything like this happening before.

The thing here was that I expected there were some problems with Copr,
so someone did some action to fix something - and I wanted to know why.

> One of the templates must have changed in order to cause the restart.

Agreed, probably a whitespace change. I am sometimes a bit lazy to run
playbooks for very small changes -- so I just push the change to
ansible.git and then make the change manually in production ... This is
probably the mistake that caused the respin. Sorry for the noise.

> I normally exclude copr* from master playbook runs, but I guess I didn't in that case... sorry if it messed up your testing.

Please don't exclude copr here just because you are afraid that it
will break something. It shouldn't. That playbook should just work no
matter what. Btw., now even more than before --- this issue was
originally triggered by my curiosity about stability, because I'm trying
to find out how long we can live without manually touching the copr
VMs.

kevin commented 3 years ago

> Absolutely. We have always expected that this can happen, but in the
> past we were curious who did something and why (and that time we
> concluded it was just the security cron doing package updates for CVEs).

Sure, good to always know why...

> I've heard that we should be running the Ansible playbooks on a daily
> basis, automatically - but with the --check option. This is something I
> was not able to confirm :-) so I suppose this is not happening.

It was running before the move; I think it might have been broken since then. Basically it runs over all playbooks under ansible/playbooks/ with --check --diff, then sends an email with all the changes after it finishes. (--check runs don't show up in datagrepper.)
I've long wished that someday we could get to a point where it would run and have no output. ;) Alas, that's hard to do.
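
The actual cron script is not shown in this thread, so the following is only a rough sketch of the idea under stated assumptions: it lists every `*.yml` file under a playbook directory and builds the matching `ansible-playbook --check --diff` command line for each. The helper name `check_commands` and the recursive `*.yml` match are assumptions; the sketch does not run anything or send email.

```python
from pathlib import Path


def check_commands(playbook_dir: str) -> list:
    """Return one `ansible-playbook --check --diff` argv per *.yml file found.

    Assumption: every *.yml file under `playbook_dir` (recursively) is a
    playbook. The real job may select playbooks differently.
    """
    playbooks = sorted(Path(playbook_dir).rglob("*.yml"))
    return [["ansible-playbook", "--check", "--diff", str(p)] for p in playbooks]


if __name__ == "__main__":
    # Dry listing only: print the commands instead of executing them.
    for argv in check_commands("ansible/playbooks"):
        print(" ".join(argv))
```

A run that prints no diff for any of these commands would be the "no output" state wished for above.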

> But until now I didn't know that, with master.yml, the playbooks may be
> run pretty often. This is not a problem at all! It was just news to me;
> I hadn't noticed anything like this happening before.

Fair enough. I have been trying to get it to complete over our new infrastructure in the new datacenter. It has shown problems with a number of things
(not copr, that I am aware of).

> The thing here was that I expected there were some problems with Copr,
> so someone did some action to fix something - and I wanted to know why.
>
> > One of the templates must have changed in order to cause the restart.
>
> Agreed, probably a whitespace change. I am sometimes a bit lazy to run
> playbooks for very small changes -- so I just push the change to
> ansible.git and then make the change manually in production ... This is
> probably the mistake that caused the respin. Sorry for the noise.
>
> > I normally exclude copr* from master playbook runs, but I guess I didn't in that case... sorry if it messed up your testing.
>
> Please don't exclude copr here just because you are afraid that it
> will break something. It shouldn't. That playbook should just work no
> matter what. Btw., now even more than before --- this issue was
> originally triggered by my curiosity about stability, because I'm trying
> to find out how long we can live without manually touching the copr
> VMs.

Ok, sounds good. I don't know off hand... are the copr playbooks idempotent? If you run them all now will it change 0 items?

praiskup commented 3 years ago

> Ok, sounds good. I don't know off hand... are the copr playbooks idempotent? If you run them all now will it change 0 items?

Yes. They should be.
