Normally the service updates on Thursday of each week with the previous week's data. Today the data did not update, and I am not sure what is 'broken' in any of the databases. The CentOS database did update, so it is probably an issue with the base DB, either /var/lib/countme/totals.db or /var/lib/countme/raw.db.
@mattdm I am not sure what is broken or what can be done to fix it. I am putting this ticket in so it can be tracked.
Update on the problem:
Usually the logs are updated in totals.db and totals.csv for the previous week on a Thursday. However this week, they were not updated until the Friday run. I don't see anything in the logs to say that dates were missed or anything else which might have caused the delay by 24 hours.
Metadata Update from @humaton: - Issue tagged with: low-gain, medium-trouble, ops
Huh, weird. Curious to see what happens this week!
So, what action should we take here? Or it's working, but behaving differently?
@kevin Seems like something took a really, really long time to process? Might be a symptom of something else wrong.
I can close this and if the problem occurs again on Thursday reopen it for a developer to look at it? I went looking at it a bit and could not find an 'ops' side which would have caused the problem.
Metadata Update from @smooge: - Issue untagged with: ops - Issue tagged with: dev
Sounds reasonable. Also it might be something that could be looked at by the folks working on it this next quarter.
Metadata Update from @smooge: - Issue close_status updated to: Initiative Worthy - Issue status updated to: Closed (was: Open)
It updated today, but... brokenly:
<img alt="2023-03-26-fedora_updates_systems-timeseries-line-release.png" src="/fedora-infrastructure/issue/raw/files/798d50ecc2a989d18c86a7bffa747837701d9a01e5c531267520460144baa94a-2023-03-26-fedora_updates_systems-timeseries-line-release.png" />
Metadata Update from @smooge: - Issue status updated to: Open (was: Closed)
Reopened. I expect that there are multiple issues going on here which will need to be dealt with. My guesses are the following:
a. proxy logs did not get synced?
b. raw.db has a problem and the last month will need to be rerun
c. something new.
I am not able to help on this currently.
OK looked at the logs in /mnt/fedora_stats/combined-http/2023/03/*/mirrors.fedoraproject.org-access.log and they are all there and similar sizes. So I am going with b or c.
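A check like the one above (every daily log present and of similar size) can be sketched roughly as below. This is only an illustration, not what was actually run: `find_suspect_logs` and the 50% `tolerance` are made-up names and values; only the per-day directory layout and the log file name come from the path above.

```python
from pathlib import Path
from statistics import median

# File name taken from the path quoted above.
LOG_NAME = "mirrors.fedoraproject.org-access.log"


def find_suspect_logs(month_dir, tolerance=0.5):
    """Return (day, size) pairs for daily logs that are missing or oddly sized.

    month_dir is a directory with one subdirectory per day (for example
    .../combined-http/2023/03). `tolerance` is an arbitrary illustrative
    threshold: a log is flagged when its size differs from the median size
    by more than that fraction. A missing log is reported with size None.
    """
    sizes = {}
    for day_dir in sorted(Path(month_dir).iterdir()):
        if not day_dir.is_dir():
            continue
        log = day_dir / LOG_NAME
        sizes[day_dir.name] = log.stat().st_size if log.is_file() else None

    present = [size for size in sizes.values() if size is not None]
    if not present:
        # Nothing to compare against: every day is missing its log.
        return sorted(sizes.items())

    mid = median(present)
    return [
        (day, size)
        for day, size in sorted(sizes.items())
        if size is None or abs(size - mid) > tolerance * mid
    ]
```

An empty result from something like `find_suspect_logs("/mnt/fedora_stats/combined-http/2023/03")` would match the observation that all logs are there and of similar size.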
Thanks Smooge. I'll escalate this with CPE.
I can start looking at this pretty soon, but I'm not sure what I need access to ... and I don't even seem to be able to assign myself to this ticket :).
You will need access to these systems:
- bastion01
- batcave01
- log01
Pagure:
- https://pagure.io/mirrors-countme
- https://pagure.io/fedora-infra/ansible/blob/main/f/roles/web-data-analysis
I've added your permissions now; you should be able to assign, etc.
The problem as far as I can tell is some sort of 'problem' in the raw.db file which creeps in and then causes the second job which updates the totals.db to fail. I normally have to take a 'fresh' date and create a new raw.db by
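    su -s /bin/bash countme
    cd /var/lib/countme/
    # Move the suspect raw.db aside and keep a dated copy of totals.db.
    mv raw.db raw.db-broke.$(date -I)
    cp totals.db totals.db-broke.$(date -I)
    rawdb="/var/lib/countme/raw.db"
    totsdb="/var/lib/countme/totals.db"
    totscsv="/var/lib/countme/totals.csv"
    # Log root as used elsewhere in this ticket; adjust if the logs live elsewhere.
    logroot="/mnt/fedora_stats/combined-http"
    # Re-import each daily log into a new raw.db.
    for year in 2023; do
      for month in 01 02 03 04; do  # zero-padded, matching directories such as 2023/03
        for day in $(seq -w 31); do
          logfile="${logroot}/${year}/${month}/${day}/mirrors.fedoraproject.org-access.log"
          if [[ -f "${logfile}" ]]; then
            parse-access-log.py --progress --sqlite "${rawdb}" "${logfile}"
          fi
        done
      done
    done
    # Rebuild totals.db and totals.csv from the new raw.db.
    bash countme-update-totals.sh --rawdb "${rawdb}" --totals-db "${totsdb}" --totals-csv "${totscsv}" --progress
That said, the long-term fix is to look at the existing 9.4GB raw.db file and figure out why it is broken.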
Metadata Update from @phsmoura: - Issue priority set to: Waiting on Assignee (was: Needs Review)
Edited the text, as I don't remember whether I used a fresh one or reused the old one.
Please retag this as something other than "low-gain".
As discussed elsewhere, infinitely-growing raw.db could / should be refactored away.
Metadata Update from @kevin: - Issue untagged with: low-gain - Issue tagged with: high-gain
So I've now finished the reimport for this year, not sure how to check the graphs etc.
Ok so what I did in the past was do a fresh import to a new raw.db and then did a
countme-totals.py --update-from /var/lib/countme/raw.db --csv-dump /var/lib/countme/totals1.csv --progress /var/lib/countme/totals1.db
to compare what it saw with the old /var/lib/countme/totals.db and /var/lib/countme/totals.csv. If the code I was hacking 'worked', then totals1.csv should have the results for the bad week fixed.
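For anyone repeating this comparison, a rough sketch of diffing the old and new CSV dumps follows. It assumes both files have a header row and a numeric count column; the column name `hits` and the helper names `load_counts` and `diff_totals` are assumptions for illustration, not something confirmed in this ticket.

```python
import csv


def load_counts(csv_path, count_col="hits"):
    """Map each row's non-count columns to its count.

    `count_col` is an assumed header name; change it to match the real file.
    """
    counts = {}
    with open(csv_path, newline="") as fh:
        for row in csv.DictReader(fh):
            hits = int(row.pop(count_col))
            # Everything except the count identifies the row (week, arch, ...).
            key = tuple(sorted(row.items()))
            counts[key] = counts.get(key, 0) + hits
    return counts


def diff_totals(old_csv, new_csv, count_col="hits"):
    """Return {key: (old_count, new_count)} for every key whose count differs.

    A key present in only one file is reported with None on the other side.
    """
    old = load_counts(old_csv, count_col)
    new = load_counts(new_csv, count_col)
    return {
        key: (old.get(key), new.get(key))
        for key in old.keys() | new.keys()
        if old.get(key) != new.get(key)
    }
```

If the reimport had fixed the bad week, `diff_totals("totals.csv", "totals1.csv")` would be expected to list rows for that week with a higher new count; an empty result means the two dumps agree.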
Since it had been a while since I had done this, I wanted to test that it was still valid. I started this run this morning as an example for people to look at later.
Ok so there is no difference in the data for the week of 2023-03-20 which was the problematic week before.
    ===== Fedora Base Stats =====
    2023-03-13 fedora-3. x86_64  398354
    2023-03-13 fedora-3. aarch64 10825
    2023-03-13 fedora-3. ppc64le 1759
    2023-03-13 fedora-3. s390x   74
    ===== Fedora Base Stats =====
    2023-03-20 fedora-3. x86_64  167674
    2023-03-20 fedora-3. aarch64 6936
    2023-03-20 fedora-3. ppc64le 1264
    2023-03-20 fedora-3. s390x   10
    ===== Fedora Base Stats =====
    2023-03-27 fedora-3. x86_64  401520
    2023-03-27 fedora-3. aarch64 11532
    2023-03-27 fedora-3. ppc64le 1694
    2023-03-27 fedora-3. s390x   78
So something is going on for that 'week' of data. The logs for the month all look the same size, and the centos.org CSV files for the equivalent period do not show a dip.
OK the 'raw' countme for that week would look to have been higher:
    $ for i in */mirrors.fedoraproject.org-access.log; do echo -n $i": "
    > grep -c countme= $i
    > done
    01/mirrors.fedoraproject.org-access.log: 693529
    02/mirrors.fedoraproject.org-access.log: 609849
    03/mirrors.fedoraproject.org-access.log: 560413
    04/mirrors.fedoraproject.org-access.log: 482537
    05/mirrors.fedoraproject.org-access.log: 388629
    06/mirrors.fedoraproject.org-access.log: 364415
    07/mirrors.fedoraproject.org-access.log: 3299640
    08/mirrors.fedoraproject.org-access.log: 670951
    09/mirrors.fedoraproject.org-access.log: 570565
    10/mirrors.fedoraproject.org-access.log: 537646
    11/mirrors.fedoraproject.org-access.log: 488814
    12/mirrors.fedoraproject.org-access.log: 389636
    13/mirrors.fedoraproject.org-access.log: 376449
    14/mirrors.fedoraproject.org-access.log: 3331590
    15/mirrors.fedoraproject.org-access.log: 692252
    16/mirrors.fedoraproject.org-access.log: 583591
    17/mirrors.fedoraproject.org-access.log: 573738
    18/mirrors.fedoraproject.org-access.log: 518574
    19/mirrors.fedoraproject.org-access.log: 412978
    20/mirrors.fedoraproject.org-access.log: 392952
    21/mirrors.fedoraproject.org-access.log: 3363989
    22/mirrors.fedoraproject.org-access.log: 708976
    23/mirrors.fedoraproject.org-access.log: 635738
    24/mirrors.fedoraproject.org-access.log: 583137
    25/mirrors.fedoraproject.org-access.log: 546008
    26/mirrors.fedoraproject.org-access.log: 412433
    27/mirrors.fedoraproject.org-access.log: 389356
    28/mirrors.fedoraproject.org-access.log: 3382676
    29/mirrors.fedoraproject.org-access.log: 719028
    30/mirrors.fedoraproject.org-access.log: 558641
    31/mirrors.fedoraproject.org-access.log: 488492
[Each log covers the previous day, so the one labeled 07 really covers 2023-03-06.] The numbers per week are all fairly similar, so something else is happening during that week in raw.db. Sadly, what that is, I don't know.
Just to double check I did:
    for i in */mirrors.fedoraproject.org-access.log; do
      echo -n $i": "
      grep countme= $i | grep repo=fedora-3 | grep -c arch=x86_64
    done
And the result was:
    13/mirrors.fedoraproject.org-access.log: 33985
    14/mirrors.fedoraproject.org-access.log: 275058
    15/mirrors.fedoraproject.org-access.log: 85379
    16/mirrors.fedoraproject.org-access.log: 65477
    17/mirrors.fedoraproject.org-access.log: 54937
    18/mirrors.fedoraproject.org-access.log: 47494
    19/mirrors.fedoraproject.org-access.log: 38657
    20/mirrors.fedoraproject.org-access.log: 35773
    21/mirrors.fedoraproject.org-access.log: 275680
    22/mirrors.fedoraproject.org-access.log: 85915
    23/mirrors.fedoraproject.org-access.log: 66847
    24/mirrors.fedoraproject.org-access.log: 55650
    25/mirrors.fedoraproject.org-access.log: 53508
    26/mirrors.fedoraproject.org-access.log: 38929
Which is 600,987 for the week of the 13th and 612,302 for the week of the 20th.
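Each of those two weekly figures is the sum of seven consecutive daily counts from the output above. As a small worked check (the name `window_total` is just for illustration):

```python
# Matching lines per daily log directory, copied from the output above.
daily_hits = {
    13: 33985, 14: 275058, 15: 85379, 16: 65477, 17: 54937, 18: 47494, 19: 38657,
    20: 35773, 21: 275680, 22: 85915, 23: 66847, 24: 55650, 25: 53508, 26: 38929,
}


def window_total(counts, first_day, days=7):
    """Sum `days` consecutive daily counts starting at `first_day`."""
    return sum(counts[day] for day in range(first_day, first_day + days))


for first_day in (13, 20):
    print(f"{first_day}: {window_total(daily_hits, first_day):,}")
# prints:
# 13: 600,987
# 20: 612,302
```

The two windows differ by under 2%, so the raw logs themselves do not show anything like the drop seen in the totals for that week.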
And the triple check:
    time_t $(( 1581292800+(604800*162) ))
    Sun Mar 19 20:00:00 2023
    Mon Mar 20 00:00:00 2023 GMT
RAW.db:
    sqlite> select count(*) from countme_raw where timestamp >= (1581292800+(604800*161)) AND timestamp <= (1581292800+(604800*162));
    6483599
    sqlite> select count(*) from countme_raw where timestamp >= (1581292800+(604800*162)) AND timestamp <= (1581292800+(604800*163));
    3262313
    sqlite> select count(*) from countme_raw where timestamp >= (1581292800+(604800*163)) AND timestamp <= (1581292800+(604800*164));
    6197846
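The arithmetic in those queries is a week index multiplied by 604800 seconds and added to the constant 1581292800 (2020-02-10 00:00:00 UTC, a Monday). A minimal sketch of the same calculation follows; the names `COUNTME_EPOCH`, `week_start`, `week_start_date` and `count_week` are chosen here for illustration, and only the constants, the `countme_raw` table and its `timestamp` column come from the queries above. Note that the queries use `<=` at both ends, so a row landing exactly on a week boundary is counted in two adjacent weeks; the sketch uses a half-open range instead, which should make no practical difference to counts of this size.

```python
import sqlite3
from datetime import datetime, timezone

COUNTME_EPOCH = 1581292800  # constant from the queries above (2020-02-10 00:00:00 UTC)
WEEK_LEN = 604800           # seconds in a week: 7 * 24 * 60 * 60


def week_start(weeknum):
    """Unix timestamp at which week `weeknum` begins."""
    return COUNTME_EPOCH + WEEK_LEN * weeknum


def week_start_date(weeknum):
    """ISO date (UTC) on which week `weeknum` begins."""
    stamp = datetime.fromtimestamp(week_start(weeknum), tz=timezone.utc)
    return stamp.date().isoformat()


def count_week(conn, weeknum):
    """Count countme_raw rows with start <= timestamp < start of next week."""
    cur = conn.execute(
        "SELECT COUNT(*) FROM countme_raw WHERE timestamp >= ? AND timestamp < ?",
        (week_start(weeknum), week_start(weeknum + 1)),
    )
    return cur.fetchone()[0]
```

With these helpers, `week_start_date(162)` gives `2023-03-20`, matching the `time_t` output, and `count_week(sqlite3.connect("raw.db"), 162)` would correspond to the second query.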
Ok, I think I've found the problem: https://pagure.io/mirrors-countme/pull-request/60
New stats.
    ===== Fedora Base Stats =====
    2023-03-13 fedora-3. x86_64  397174
    2023-03-13 fedora-3. aarch64 10798
    2023-03-13 fedora-3. ppc64le 1759
    2023-03-13 fedora-3. s390x   74
    ===== Fedora Base Stats =====
    2023-03-20 fedora-3. x86_64  402814
    2023-03-20 fedora-3. aarch64 11604
    2023-03-20 fedora-3. ppc64le 1703
    2023-03-20 fedora-3. s390x   72
Current week's run seems to have the missing data, but is instead missing everything before 2023. :)
I manually merged the old data back in to totals.csv, and did the copy to the public locations.
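How the merge was actually done is not recorded here. One way it could be sketched, assuming both files share the same header and have a week-start column holding ISO dates, is below; the column name `week_start` and the function `merge_totals` are assumptions for illustration only.

```python
import csv


def merge_totals(old_csv, new_csv, out_csv, week_col="week_start"):
    """Write out_csv with all rows of new_csv plus old rows for weeks new_csv lacks.

    Assumes both inputs have identical headers and that `week_col` (an
    assumed column name) holds ISO dates, so string order is date order.
    Returns (old_rows_kept, new_rows_kept).
    """
    with open(new_csv, newline="") as fh:
        reader = csv.DictReader(fh)
        fields = reader.fieldnames
        new_rows = list(reader)
    new_weeks = {row[week_col] for row in new_rows}

    # Keep only the old weeks that the new file does not already cover.
    with open(old_csv, newline="") as fh:
        old_rows = [row for row in csv.DictReader(fh) if row[week_col] not in new_weeks]

    # sorted() is stable, so rows within a week keep their original order.
    rows = sorted(old_rows + new_rows, key=lambda row: row[week_col])
    with open(out_csv, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    return len(old_rows), len(new_rows)
```

Writing to a separate output file first and comparing it with both inputs before copying it into place keeps the original totals.csv untouched if something looks wrong.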
I assume raw.db doesn't need all the old data (even though we are copying that file publicly too).
Oh -- I don't use or look at totals.csv at all. I use totals.db.
Anyway -- I see you have fixed it. Thanks @james!
Great. Let's close this then?
Metadata Update from @kevin: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)