#2 Deploy script
Closed 3 years ago by mattia. Opened 4 years ago by mattia.

So we have two choices for deployment. Currently the static page is built on the sundries boxes, then it is synced to the proxies, where httpd serves the page.

We can:
1. Package the new app as an rpm and build it in the .infra tag (so that we don't have to get it into Fedora proper), then install that rpm on the sundries.
2. Install the dependencies of the new app on the sundries and do a git clone of the repo on the box.

I think the best option would be to deploy by rpm and run it directly on the sundries. The following steps are needed:

  • draft a new release
  • package as rpm and create a review ticket
  • modify the Ansible playbook to deploy the new script

Optionally:

  • create a new API token in Bugzilla and use that instead of username/password authentication (see the sketch below)
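
For reference, python-bugzilla supports API-key authentication, so the script would not need to store a username/password pair on the box. A minimal sketch, assuming the script queries Bugzilla through that library (the URL and query fields are illustrative, not the app's actual code):

```python
import bugzilla

# API-key auth avoids keeping a username/password pair in the config.
bzapi = bugzilla.Bugzilla("bugzilla.redhat.com", api_key="REDACTED")

# Illustrative query for open package reviews.
query = bzapi.build_query(product="Fedora", component="Package Review")
bugs = bzapi.query(query)
print(f"{len(bugs)} open package reviews")
```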

Ah, I've just realized that I don't have rights to create a new release...

> Ah, I've just realized that I don't have rights to create a new release...

You should have the rights now.

Also you don't necessarily need a review ticket. We can just build and tag the rpm in the infra repo.

Also I think the sundries are RHEL7 boxes, so using the infra repo should be easier.

> Also you don't necessarily need a review ticket. We can just build and tag the rpm in the infra repo.
> Also I think the sundries are RHEL7 boxes, so using the infra repo should be easier.

I have created a release and packaged it; I've set up a test repository on COPR: https://copr.fedorainfracloud.org/coprs/mattia/review_stats/

I can't find any information about an infra repo: how can I build and tag the package into it?

> > Also you don't necessarily need a review ticket. We can just build and tag the rpm in the infra repo.
> > Also I think the sundries are RHEL7 boxes, so using the infra repo should be easier.
>
> I have created a release and packaged it; I've set up a test repository on COPR: https://copr.fedorainfracloud.org/coprs/mattia/review_stats/
> I can't find any information about an infra repo: how can I build and tag the package into it?

You can't build directly there, I think; only a few people from the Infra team can build against it. Basically it is just a Koji target (for example epel7-infra https://koji.fedoraproject.org/koji/buildtargetinfo?targetID=185) that we use to build packages against, and from it we generate a package repo that we use on the boxes in the infra (https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/files/common/rhel-infra-tags.repo).

If you can do a scratch build of your package in Koji and create an infra ticket with a link to the build, asking for the package to be built against the epel7-infra target, that should do it.

Hope that makes things more clear.

I have tried to deploy this in staging [0][1][2] but it seems that the app requires python-systemd, which is not available for Python 3 on RHEL7.

Do we really need bindings to systemd? It might be easier to just drop that requirement or make it optional.

[0] - https://infrastructure.fedoraproject.org/cgit/ansible.git/commit/?id=ba005c4364522007726a0fb80f954b21d9367830
[1] - https://infrastructure.fedoraproject.org/cgit/ansible.git/commit/?id=38904071a3a05bf16dd98a6e1293ea0da8ddff08
[2] - https://infrastructure.fedoraproject.org/cgit/ansible.git/commit/?id=41221aa4f084b7d4aca079ebd60852049f3eabfb

> I have tried to deploy this in staging [0][1][2] but it seems that the app requires python-systemd, which is not available for Python 3 on RHEL7.
> Do we really need bindings to systemd? It might be easier to just drop that requirement or make it optional.

To drop the python-systemd requirement I will need to modify the script to make logging to the journal optional.
If we decide so, we will lose the ability to log run information (and errors) to the journal. Is there any other method to inform infrastructure that something went wrong while running a script?
Currently it is also possible to send emails when the script encounters an error, but that can become annoying, since it sends one email per log message... or we can keep having cron send emails just like it works now.

I can try to drop python-systemd and use the method described here, but it refers to Python 2 and it cannot be tested locally...
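
A minimal sketch of what "optional" journal logging could look like, assuming the script uses the stdlib logging module (the function and identifier names are assumptions, not the actual code):

```python
import logging

def make_logger(journal=False):
    logger = logging.getLogger("review-stats")
    logger.setLevel(logging.INFO)
    if journal:
        # Imported lazily: python3-systemd is not available on RHEL7,
        # so the import only happens when journal logging is requested.
        from systemd.journal import JournalHandler
        logger.addHandler(JournalHandler(SYSLOG_IDENTIFIER="review-stats"))
    else:
        # Plain stderr keeps cron's mail-on-output error reporting working.
        logger.addHandler(logging.StreamHandler())
    return logger
```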

> > I have tried to deploy this in staging [0][1][2] but it seems that the app requires python-systemd, which is not available for Python 3 on RHEL7.
> > Do we really need bindings to systemd? It might be easier to just drop that requirement or make it optional.
>
> To drop the python-systemd requirement I will need to modify the script to make logging to the journal optional.
> If we decide so, we will lose the ability to log run information (and errors) to the journal. Is there any other method to inform infrastructure that something went wrong while running a script?
> Currently it is also possible to send emails when the script encounters an error, but that can become annoying, since it sends one email per log message... or we can keep having cron send emails just like it works now.

Yes, I think the current way is that the cron job will send emails only on errors. So if everything runs correctly, no emails; otherwise we get an email with the errors.

> I can try to drop python-systemd and use the method described here, but it refers to Python 2 and it cannot be tested locally...

Yeah, I am not sure how up to date that is, TBH.

I made logging via python-systemd optional (behind a -l flag) and rebuilt the package as version 5.0.3:
https://koji.fedoraproject.org/koji/taskinfo?taskID=42192505

I'm going to open another ticket to have it tagged in infra-repo: https://pagure.io/fedora-infrastructure/issue/8723
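
For reference, a minimal sketch of how such a flag could be wired up with argparse; the real option is -l, while the long name and help text here are assumptions:

```python
import argparse

# Hypothetical wiring for the optional journal logging flag.
parser = argparse.ArgumentParser(prog="review-stats")
parser.add_argument("-l", "--log-to-journal", action="store_true",
                    help="log to the systemd journal (requires python-systemd)")
args = parser.parse_args()
# args.log_to_journal would then select the journal handler at startup.
```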

@mattia I think you need to remove https://pagure.io/Fedora-Infra/review_stats/blob/master/f/setup.py#_23

Otherwise it will still try to find the dependency.

> @mattia I think you need to remove https://pagure.io/Fedora-Infra/review_stats/blob/master/f/setup.py#_23
> Otherwise it will still try to find the dependency.

I've put a conditional in the specfile:

```
# Disable python-systemd dependency on EPEL7
%if 0%{?el7}
sed -i "/'systemd-python',/d" ./setup.py
%endif
```

that will remove that line on EPEL7 during the %setup step, so the same code/specfile can eventually be used in the future on systems with python3-systemd.
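
A complementary runtime guard (a sketch, not the project's actual code) would keep the import itself optional wherever the bindings are absent:

```python
# Sketch only: guard the import at runtime so the module still loads on
# EPEL7, where the python3 systemd bindings are not packaged.
try:
    from systemd.journal import JournalHandler
except ImportError:
    JournalHandler = None  # journal logging simply unavailable
```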

It seems that the OpenShift deployment is failing because /etc/review-stats/config.cfg can't be found. Maybe the Ansible playbook just needs to be run again?
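
For background, a sketch of the sort of config loading that would fail this way; the path comes from the error above, everything else is an assumption:

```python
import configparser

config = configparser.ConfigParser()
# config.read() returns the list of files it could parse; an empty list
# means /etc/review-stats/config.cfg was not found in the container.
if not config.read("/etc/review-stats/config.cfg"):
    raise SystemExit("cannot read /etc/review-stats/config.cfg")
```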

> It seems that the OpenShift deployment is failing because /etc/review-stats/config.cfg can't be found. Maybe the Ansible playbook just needs to be run again?

Just ran the playbook again, let's see if that helps :-)

Ok, the issue was with the Dockerfile entrypoint [0]; now the cron job starts but fails because it does not have write permission to /tmp/review-stats.

This is only temporary; now we need to request an NFS persistent volume in OpenShift so that we can store the output of the cron job there, and then mount that NFS directory on the sundries to be served.

I'll file the infra ticket for that.

[0] - https://pagure.io/Fedora-Infra/review_stats/c/165b4d43e8602c3f5680df2f4c2a5d0d4d4a4c9d?branch=master

@mattia ah cool, I'll update that :)

Ok, so the cron job runs and the static files are generated on the sundries box. Now the cron job that rsyncs the files from the sundries to the proxies is failing because of permission issues.

I'll investigate that further on Monday :)

Thanks.
As far as I understand, we have some leftovers of the old deployment method that conflict with the new one, and the sync script moves files with root:root ownership instead of apache:apache.
I made two patches; please check whether they're correct (I also don't think the sync script is needed on prod, because files are generated directly on the sundries by the old script, but I left it for now).

0001-review-stats-remove-leftovers-of-the-old-deployment-.patch

0001-review-stats-sync-files-with-correct-permission-on-s.patch

Forgive me, I'm totally confused about how things are working.
From what I understand, the review-stats OpenShift app is run (for staging) by roles/openshift-apps/review-stats; here the app image is built, the script is run, and the output is put in the OpenShift persistent volume. Then what? Where's the cron script that syncs the persistent volume to the sundries?

So, everything related to staging should be removed from roles/review-stats/build. (?)

Looking in /srv/web/review-stats on sundries01.stg.phx2.fedoraproject.org I can see pages from the new script updated every hour, but I suppose these aren't coming from the OpenShift app but from the old deployment... however, https://stg.fedoraproject.org/PackageReviewStatus/ has not been updated since 21/3, so I don't understand where those pages come from.

> Forgive me, I'm totally confused about how things are working.

Sorry, I meant to look at this during the week, but I have been busy with other stuff :(

> From what I understand, the review-stats OpenShift app is run (for staging) by roles/openshift-apps/review-stats; here the app image is built, the script is run, and the output is put in the OpenShift persistent volume. Then what? Where's the cron script that syncs the persistent volume to the sundries?

So the persistent volume in OpenShift is an NFS shared drive; this is the same drive that is mounted on the sundries, so we don't need a cron script for that.

> So, everything related to staging should be removed from roles/review-stats/build. (?)

I wanted to have everything working in staging before moving to prod and then delete the old role.

> Looking in /srv/web/review-stats on sundries01.stg.phx2.fedoraproject.org I can see pages from the new script updated every hour, but I suppose these aren't coming from the OpenShift app but from the old deployment... however, https://stg.fedoraproject.org/PackageReviewStatus/ has not been updated since 21/3, so I don't understand where those pages come from.

No, these are coming from OpenShift; we have a cron job that syncs the files from the sundries to all of our proxies to be served by httpd. This is what is currently failing.

Does that help ?

The permissions of the generated files are too restrictive, I think:

```
-rw-------. 1 1000870000 root  77690 Apr  4 12:00 blockers.html
-rw-------. 1 1000870000 root 112654 Apr  4 12:00 hidden.html
-rw-------. 1 1000870000 root   6712 Apr  4 12:00 index.html
-rw-------. 1 1000870000 root 166446 Apr  4 12:00 in_progress.html
-rw-------. 1 1000870000 root  52537 Apr  4 12:00 needsponsor.html
-rw-------. 1 1000870000 root   4176 Apr  4 12:00 reviewable_epel.html
-rw-------. 1 1000870000 root 135971 Apr  4 12:00 reviewable.html
drwxr-xr-x. 2 1000870000 root   4096 Mar 21 18:20 static
-rw-------. 1 1000870000 root 445231 Apr  4 12:00 submitters.html
-rw-------. 1 1000870000 root   2828 Apr  4 12:00 trivial.html
```

I think the HTML files should be at least 644 instead of 600. I am not sure if this is due to how these files are created by the app.

Thanks. I think this was caused by using shutil.copy2, which preserves file permissions while copying from the temporary file to the destination. I've changed the script to use shutil.copyfile.

It is now working: I made the script set permissions to 644 on the produced files. The static content is copied with the proper permissions, so no need to make changes there.
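
A sketch of the fix described above (the paths are illustrative):

```python
import os
import shutil

# copyfile, unlike copy2, does not carry over the 0600 mode of the
# temporary file; an explicit chmod then makes the page world-readable
# so httpd can serve it.
shutil.copyfile("/tmp/review-stats/index.html", "/srv/web/review-stats/index.html")
os.chmod("/srv/web/review-stats/index.html", 0o644)
```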

> It is now working: I made the script set permissions to 644 on the produced files. The static content is copied with the proper permissions, so no need to make changes there.

Cool, I'll try to switch production later today or tomorrow :smile:

Thanks for working on this.

Ok, I have deployed this to production and it is working fine: https://fedoraproject.org/PackageReviewStatus/

I'll let you announce it on the different mailing lists if you want :-)

Thank you! I will write something on devel soon.

One last question: is there any chance to get access to os-master*, at least in staging, to be able to deploy and test any changes? Or to automatically redeploy the app whenever a new build is done?
I made a few changes yesterday on the staging branch and I was able to rebuild the image from the os web UI; I thought that was sufficient for the new code to run, but I think I also had to redeploy it? In that case, I can't find anything in the web UI...

> Thank you! I will write something on devel soon.

> One last question: is there any chance to get access to os-master*, at least in staging, to be able to deploy and test any changes? Or to automatically redeploy the app whenever a new build is done?

So yeah, I think you should be able to rebuild from the web console; you can also log in using the oc command-line tool, see https://fedora-infra-docs.readthedocs.io/en/latest/dev-guide/openshift.html#command-line-interface

> I made a few changes yesterday on the staging branch and I was able to rebuild the image from the os web UI; I thought that was sufficient for the new code to run, but I think I also had to redeploy it? In that case, I can't find anything in the web UI...

The cluster was in bad shape, so I guess some of the failures you experienced come from that. In the UI the cron jobs are in the cluster console. You should be able to access it by clicking on the "Application Console" drop-down menu at the top. Then under Workloads there will be info about the cron jobs.

```
$ oc rollout latest dc/build -n review-stats
Error from server (Forbidden): deploymentconfigs.apps.openshift.io "build" is forbidden: User "mattia" cannot update deploymentconfigs.apps.openshift.io in the namespace "review-stats": no RBAC policy matched
```

Should I request permissions by opening an infra ticket (at least for staging)? Or will I need to ask the infra team to deploy the app every time I create a new build?

We don't give people rights to deploymentconfigs, because that means they could be changed outside of Ansible and mess things up.

That said, if you run the playbook it should do a rollout at the end... or if you do that a lot we can make a special playbook for it.

Thank you for all the support in getting this done! Everything works well, so I'm going to close the ticket.

Metadata Update from @mattia:
- Issue status updated to: Closed (was: Open)

3 years ago

> We don't give people rights to deploymentconfigs, because that means they could be changed outside of Ansible and mess things up.
> That said, if you run the playbook it should do a rollout at the end... or if you do that a lot we can make a special playbook for it.

Yeah, also that deploymentconfig is not useful; I created it to debug the permission issues we had with the generated files. I will delete it.

What runs in OpenShift is a cron job every hour at HH:00, so all you need to do is trigger a new build (you can do that from the UI) and wait for the next run of the cron job.

Hope that clarifies some things :smile:

> Thank you for all the support in getting this done! Everything works well, so I'm going to close the ticket.

Thanks for working on this :fireworks:
