It happens from time to time that something happens on some copr VM, and the only explanation is that somebody did something manually or someone ran the playbook.
Any manual action is hard to track, but it would be awesome to at least be able to ask the person who executed the playbook on batcave why it was run.
A not-so-important example: something or someone restarted copr-backend.target a day ago on copr-be.aws for some reason (I was measuring how long the service is able to run without a restart, and how much memory it eats). This was probably done by our playbook because it changed the backend configuration file (and the restart handler was notified).
You can look at datagrepper https://apps.fedoraproject.org/datagrepper/raw?topic=org.fedoraproject.prod.ansible.playbook.start&delta=127800
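For anyone who wants to repeat this lookup from a script, here is a minimal Python sketch built on the same datagrepper query. It only reuses the `topic` and `delta` parameters from the link above; the shape of the JSON response (a `raw_messages` list whose `msg` bodies carry `userid` and `playbook` keys) is an assumption, not something confirmed in this ticket.

```python
import json
import urllib.parse
import urllib.request

DATAGREPPER_RAW = "https://apps.fedoraproject.org/datagrepper/raw"
TOPIC = "org.fedoraproject.prod.ansible.playbook.start"


def build_query_url(delta_seconds: int, topic: str = TOPIC) -> str:
    """Build the same kind of datagrepper /raw query as the link above."""
    params = {"topic": topic, "delta": delta_seconds}
    return DATAGREPPER_RAW + "?" + urllib.parse.urlencode(params)


def summarize(payload: dict) -> list:
    """Return sorted (timestamp, userid, playbook) tuples, one per message.

    ASSUMPTION: messages are listed under "raw_messages" and each message
    body ("msg") carries "userid" and "playbook" keys.
    """
    rows = []
    for message in payload.get("raw_messages", []):
        body = message.get("msg", {})
        rows.append((
            message.get("timestamp", 0.0),
            body.get("userid", "?"),
            body.get("playbook", "?"),
        ))
    return sorted(rows)


def fetch(delta_seconds: int) -> dict:
    """Download and decode the JSON answer (requires network access)."""
    with urllib.request.urlopen(build_query_url(delta_seconds), timeout=30) as resp:
        return json.load(resp)
```

Calling `summarize(fetch(127800))` would list who started which playbook in roughly the last 35 hours, assuming the endpoint and field names above still hold.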
Metadata Update from @cverna: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)
Thank you for the reference, so this is the reason:
ansible.playbook.start kevin started an ansible run of master.yml 2 days ago - 2020-06-30 00:03:29 UTC
So nothing really specific to copr, and it is good to know there is such a wide-ranging playbook.
So, you should assume at ALL TIMES:
This is why it's important to make sure playbooks are idempotent. That is, if you run the playbook once it puts everything in the desired state. If you run the playbook again after that it does nothing at all (because the machine is in the desired state).
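To make the idempotency point concrete, here is a small Python model (not Ansible code) of a template-like task: it rewrites a file only when the checksum of the desired content differs from what is on disk, and a hypothetical restart "handler" fires only when that task reports a change. The function names and the `restarts` list are illustrative assumptions.

```python
import hashlib
import os
import tempfile


def sha1(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def ensure_file(path: str, content: str) -> bool:
    """Write `content` to `path` only if it differs; return True if changed.

    Roughly mimics how a template-like task decides whether it "changed":
    compare checksums of the desired and the current file content.
    """
    desired = content.encode()
    if os.path.exists(path):
        with open(path, "rb") as f:
            if sha1(f.read()) == sha1(desired):
                return False  # already in the desired state: do nothing
    with open(path, "wb") as f:
        f.write(desired)
    return True


def converge(path: str, content: str, restarts: list) -> None:
    """One hypothetical 'playbook run': the 'handler' fires only on a change."""
    if ensure_file(path, content):
        restarts.append(path)  # stand-in for restarting a service
```

Running `converge` twice with identical content records a single restart; changing only whitespace in the content counts as a change and triggers another one, which is one way a seemingly harmless template edit can end up restarting a service.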
One of the templates must have changed in order to cause the restart.
If you need to test something like this, please comment it out (with a commit message/note) and it will be skipped.
I normally exclude copr* from master playbook runs, but I guess I didn't in that case... sorry if it messed up your testing.
Absolutely. We always expected that this could happen, but historically we have been curious who did something and why (that time, we concluded it was just the security cron doing package updates for CVEs).
I've heard that we should be running the Ansible playbooks automatically on a daily basis, but with the --check option. I was not able to confirm this :-) so I suppose it is not happening.
But until now I didn't know that with master.yml the playbooks may be run pretty often. This is not a problem at all! It was just news to me; I hadn't noticed anything like this happening before.
master.yml
The thing here was that I expected there were some problems with Copr, so someone had taken some action to fix something, and I wanted to know why.
Agreed, probably a whitespace change. I am sometimes too lazy to run playbooks for very small changes, so I just push the change to ansible.git and then make the change manually in production... This is probably the mistake that caused the respin. Sorry for the noise.
Please don't exclude copr here just because you are afraid that it will break something. It shouldn't. That playbook should just work no matter what. Btw., this matters now even more than before: this issue was originally triggered by my curiosity about stability, because I'm experimenting with how long we can live without manually touching the copr VMs.
Sure, good to always know why...
That was set up before the move; I think it might have been broken since then. Basically it runs all playbooks under ansible/playbooks/ with --check --diff (--check runs don't show up in datagrepper). After it runs, it sends out an email with all the changes. I've long wished that someday we could get to a point where it would run and have no output. ;) Alas, that's hard to do.
Fair enough. I have been trying to get it to complete on our new infrastructure in the new datacenter. It has shown problems with a number of things (not copr, as far as I am aware).
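A rough Python sketch of such a check-mode job, assuming it simply walks a playbook directory and runs `ansible-playbook --check --diff` on every `*.yml` file it finds. The recursive glob, the plain-text report format, and the helper names are assumptions; the email step is left out.

```python
import glob
import os
import subprocess


def check_commands(playbook_dir: str) -> list:
    """One `ansible-playbook --check --diff` command per playbook, sorted by path.

    ASSUMPTION: every *.yml file below `playbook_dir` is a runnable playbook.
    """
    pattern = os.path.join(playbook_dir, "**", "*.yml")
    playbooks = sorted(glob.glob(pattern, recursive=True))
    return [["ansible-playbook", "--check", "--diff", pb] for pb in playbooks]


def run_all(playbook_dir: str) -> str:
    """Run every check-mode command and concatenate the output into one report."""
    report = []
    for cmd in check_commands(playbook_dir):
        result = subprocess.run(cmd, capture_output=True, text=True)
        report.append(
            f"### {cmd[-1]} (exit {result.returncode})\n{result.stdout}{result.stderr}"
        )
    return "\n".join(report)
```

The text returned by `run_all` is what would then be mailed out; an empty diff for every playbook is the "no output" state wished for above.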
Ok, sounds good. I don't know offhand... are the copr playbooks idempotent? If you run them all now, will they change 0 items?
Yes. They should be.