The combined httpd logs on log01 need to be compressed around once a year to maintain space for new logs. Currently the disk is at 98%, with most of the data being logs from 2023 and 2024. The following should compress things down a lot. [I find gzip saves almost as much space as xz, and is quicker and easier to grep through later.]
find /mnt/fedora_stats/combined-http/2023 -type f -name '*.log' | xargs gzip -v
find /mnt/fedora_stats/combined-http/2024/0[12345] -type f -name '*.log' | xargs gzip -v
I ran it for 2023.
Long term we should probably have a monthly cron job that does month-2 ... or maybe month-1 on the 15th.
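Something like this /etc/cron.d entry could do it (a sketch, nothing deployed; the script path is hypothetical):

# At 03:00 on the 15th, compress last month's combined httpd logs
0 3 15 * * root /usr/local/bin/compress-http-logs.sh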
Also might be worth seeing if bzip2 saves enough more that it's worth using.
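A quick way to check (a sketch; the sample log path is hypothetical) is to run each candidate compressor over the same log and compare output sizes:

# compare compressed sizes of one sample log for each compressor
F=/mnt/fedora_stats/combined-http/2024/05/some.log
for c in gzip bzip2 xz; do
    printf '%s: ' "$c"
    "$c" -c "$F" | wc -c
done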
With an easier method, we can simply compress all *.log files except those under the latest directory:
find /mnt/fedora_stats/combined-http ! -path '/mnt/fedora_stats/combined-http/latest/*' -type f -name '*.log' -print | xargs -n1 xz -1
PR created https://pagure.io/fedora-infra/ansible/pull-request/2132
I ended up doing months 1-5 for 2024 too; current status:
% du -sh /mnt/fedora_stats/combined-http
5.6T    /mnt/fedora_stats/combined-http
...however the netapps have a .snapshot directory, which has a bunch of the uncompressed files in it, so it still looks bad.
Metadata Update from @zlopez:
- Issue assigned to james
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: high-gain, low-trouble, ops
@seddik I believe that is a good start to what is needed, but further thought and review would be needed. The find script would run into a couple of problems:
1. The latest directory is just symlinks to the most recent log files. The result of that find would be to break all the symlinks, as they would point to now non-existent files.
2. It will also take a very long time to traverse all the directories under /mnt/fedora_stats/combined-http/ to find the few files which still need to be compressed.
3. All of the log files are compressed with gzip rather than xz. This was mainly because xz takes a very long time to compress these logs and was using enough CPU/memory on a previous incarnation of log01 to cause problems. gzip achieves 80-90% of the compression xz does, but takes far less time and uses less CPU/memory.
We could change 3, but we would need to uncompress and recompress the logs (or you can end up with .gz.xz files or some other problem). Would it be possible to write a script which figures out what the year and month-1 is (so a run in January 2025 would compress December 2024) and does something like:
find /mnt/fedora_stats/combined-http/${YEAR}/${MONTH} -type f -name "*.log" -print | xargs -n1 ${COMPRESSOR} -v
where COMPRESSOR could be gzip, xz, bzip2, or even good old compress :laughing:? I think it will need a script which 'knows' the last month and compresses just those logs under there. That could be put in
@james I believe that if the .snapshot becomes a problem, it can be cleared by @kevin when he gets back.
I believe the snapshots are auto-clearing things out as space is needed, and will clear out over time as well ... so I'm going to leave it unless it looks like we need to do something.
For previous (or two) month(s) GNU date can do that: $(date --date='2 Month ago' +%Y/%m)
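Putting that together with the find above, a minimal sketch of the monthly month-1 script (untested; the script name and the gzip choice are assumptions):

#!/bin/bash
# Hypothetical /usr/local/bin/compress-http-logs.sh:
# compress the previous month's combined httpd logs.
set -euo pipefail

LOGROOT=/mnt/fedora_stats/combined-http
COMPRESSOR=gzip

# Running mid-month (from the cron entry suggested earlier) sidesteps
# GNU date's end-of-month 'N month ago' edge cases.
YEARMONTH=$(date --date='1 month ago' +%Y/%m)

# Only traverse that one month's tree; the 'latest' symlinks are never touched.
find "${LOGROOT}/${YEARMONTH}" -type f -name '*.log' -print0 \
    | xargs -0 -r "${COMPRESSOR}" -v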
I believe countme would be fine if we compressed the logs immediately, but keeping 1-2 months uncompressed seems like not a big deal and marginally safer for everything.
The .snapshot dir is a way the netapp can expose snapshots to the user. It's read-only; you shouldn't be able to do anything except read the existing snapshots.
They should roll out over time... but if you need specific ones deleted or whatever we can do that via the cli.
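For reference, on clustered ONTAP that would be roughly (a sketch; the vserver name is an assumption, shown as a placeholder):

volume snapshot delete -vserver <svm> -volume fedora_stats -snapshot weekly_sun_0055.2024-06-16_0055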
I also can just disable it showing that for that volume. I suspect we enabled it sometime when we wanted to pick through old snapshots to get old data restored... we likely don't need it day to day.
Hi Kevin, sorry, I may have confused things. What I thought James was talking about is that when doing a very large compression of files, the snapshots will still contain references to all the original files, so there are no savings in disk space (and in fact we used up even more). This normally goes away as older snapshots get removed, but when we are at 98% or higher, I thought a netapp admin sometimes needed to manually remove a snapshot so the space would actually be freed.
I'm not 100% certain, but from what I saw over the weekend, when the netapp gets to 99% it starts to drop the old files from the snapshots so it doesn't run out of space due to snapshots. So I'm pretty sure it's just a cosmetic thing where df will show 98% usage for five weeks (the oldest current snapshot is a weekly from 2024-06-16) and then suddenly 35% usage when the last snapshot referencing the old uncompressed files goes away.
tl;dr +1 for doing nothing and getting a UI present when Flock is over.
I am pretty sure it does not delete any snapshots. ;)
What it does do is a lot more confusing, but makes sense from a 'keep everything online as long as possible' standpoint:
When a volume gets really close to full, the filer just automatically grows it so it's not full anymore. Of course it will reach a limit where it's N times the desired size and stop doing that, but in general it always just grows if it can.
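On ONTAP that behavior is configured per volume with autosize, something like (a sketch; the vserver name and size limit are assumptions):

volume autosize -vserver <svm> -volume fedora_stats -mode grow -maximum-size 12TB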
Look for example at this:
https://admin.fedoraproject.org/collectd/bin/graph.cgi?hostname=log01.iad2.fedoraproject.org;plugin=df;plugin_instance=mnt-fedora_stats;type=df_complex;begin=-604800
You can see the volume grew itself when it was close to full...
The snapshots will age out and eventually be deleted, but it keeps a fair number for a while.
Ahh, well in that case we might want to manually delete a bunch of old snapshots because the uncompressed logs use about 1TB a month (so it'll grow by at least that amount until the old files age out). Maybe talk about it in the regular meeting on Thursday? Unless you want to just do it.
Here are the snapshots:
fedora_stats
  weekly_sun_0055.2024-06-16_0055    33.48GB   0%   1%
  weekly_sun_0055.2024-06-23_0055    45.54GB   0%   1%
  daily_0055.2024-06-26_0055         271.7MB   0%   0%
  daily_0055.2024-06-27_0055         250.8MB   0%   0%
  daily_0055.2024-06-28_0055         40.25GB   0%   1%
  daily_0055.2024-06-29_0055         236.6MB   0%   0%
  daily_0055.2024-06-30_0055         404KB     0%   0%
  weekly_sun_0055.2024-06-30_0055    260.8MB   0%   0%
  daily_0055.2024-07-01_0055         280.2MB   0%   0%
  daily_0055.2024-07-02_0055         261.2MB   0%   0%
  daily_0055.2024-07-03_0055         39.07GB   0%   1%
  daily_0055.2024-07-04_0055         172.8MB   0%   0%
  daily_0055.2024-07-05_0055         209.9GB   1%   3%
  daily_0055.2024-07-06_0055         3.69TB    26%  39%
  daily_0055.2024-07-07_0055         1.20MB    0%   0%
  weekly_sun_0055.2024-07-07_0055    4.07TB    29%  41%
  daily_0055.2024-07-08_0055         230.8MB   0%   0%
  2hour_even_55.2024-07-09_0055      428KB     0%   0%
  daily_0055.2024-07-09_0055         500KB     0%   0%
  2hour_even_55.2024-07-09_0255      1.28MB    0%   0%
  2hour_even_55.2024-07-09_0455      214.9MB   0%   0%
  2hour_even_55.2024-07-09_0655      15.94MB   0%   0%
  2hour_even_55.2024-07-09_0855      512KB     0%   0%
  2hour_even_55.2024-07-09_1055      512KB     0%   0%
  2hour_even_55.2024-07-09_1255      504KB     0%   0%
  2hour_even_55.2024-07-09_1455      488KB     0%   0%
  2hour_even_55.2024-07-09_1655      480KB     0%   0%
  2hour_even_55.2024-07-09_1855      480KB     0%   0%
  2hour_even_55.2024-07-09_2055      468KB     0%   0%
  2hour_even_55.2024-07-09_2255      468KB     0%   0%
So, probably have to delete up through weekly_sun_0055.2024-07-07_0055? But sure, we can discuss.
ok, I deleted a bunch of snapshots and now:
ntap-iad2-c02-fedora01-nfs01a:/fedora_stats 8.7T 6.4T 2.4T 74% /mnt/fedora_stats
Should we close this? Or is there more to do?
The PR is still open.
FWIW I opened a new PR with the changes we'd talked about:
https://pagure.io/fedora-infra/ansible/pull-request/2149
The second PR got merged and rolled out.
Metadata Update from @james:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)