#1 sqlite3.IntegrityError: NOT NULL constraint failed: countme_raw.repo_arch
Closed 3 years ago by wwoods. Opened 3 years ago by wwoods.

Something in the Fedora mirror logs from Sep 16 caused parse-access-log.py to crash:

[wwoods@log01 ~][PROD-IAD2]$ ./countme-update.sh --checkoutdir ~/mirrors-countme
log 1/20, date=13/Sep/2020:   0%|                                                              | 12.0k/6.44G [00:00<40:52, 2.63Mb/s]                                    
log 2/20, date=14/Sep/2020: 100%|██████████████████████████████████████████████████████████████| 6.56G/6.56G [02:58<00:00, 36.6Mb/s]                                    
log 3/20, date=15/Sep/2020: 100%|██████████████████████████████████████████████████████████████| 6.71G/6.71G [03:00<00:00, 37.1Mb/s]                                    
log 4/20, date=16/Sep/2020:  41%|█████████████████████████▋                                    | 2.80G/6.78G [01:15<01:46, 37.3Mb/s]Traceback (most recent call last):  
  File "/home/fedora/wwoods/mirrors-countme/parse-access-log.py", line 122, in <module>                                                                                 
    main()
  File "/home/fedora/wwoods/mirrors-countme/parse-access-log.py", line 114, in main
    args.writer.write_items(match_iter)
  File "/home/fedora/wwoods/mirrors-countme/countme/__init__.py", line 319, in write_items                                                                              
    self._con.executemany(self._insert_item, items)
sqlite3.IntegrityError: NOT NULL constraint failed: countme_raw.repo_arch

This means we tried to insert an item into the database that had a NULL repo_arch, which is required by the database schema - because it should always be present for well-formed requests from libdnf.

This means we either matched a malformed log entry - a countme=N line that's missing the arch=XXX item - or we mangled/lost the repo_arch item during parsing.

The countme log matcher/parser doesn't require arch=XXX item in the URL's query string, but it does only match log lines where the status is 200 or 302. In theory, those should only be returned if the mirrormanager server accepts the query - and the server is supposed to reject a URL that has no arch. But perhaps this assumption is invalid.

I'm currently trying to find the exact log line that caused the error.

(NOTE: this is a little tricky, because It's unclear from the output which exact file this error corresponds to. This is partly because of how the log files are named - log entries for 16/Sep/2020 are found in /mnt/fedora_stats/combined-http/2020/09/17/mirrors.fedoraproject.org-access.log. It would be helpful for parse-access-log.py to give the exact file / line number corresponding to the error!)

I'm also unsure whether the rawdb is now in an inconsistent state, with part of 16/Sep/2020 logs inserted (and part missing) - which would cause future runs to skip this day, thus dropping a bunch of data. Will investigate that next; might be a separate issue.


Ah, found it in /mnt/fedora_stats/combined-http/2020/09/17/mirrors.fedoraproject.org-access.log.

Doing grep 'countme=' $log | grep -v 'arch=' gives:

240.159.140.173 - - [16/Sep/2020:09:58:29 +0000] "GET /metalink?repo=fedora-modular-32&x86_64&country=nl&countme=1 HTTP/2.0" 200 304 "-" "libdnf (Fedora 32; container; Linux.x86_64)"
240.159.140.173 - - [16/Sep/2020:09:59:30 +0000] "GET /metalink?repo=fedora-modular-32&x86_64&countme=1 HTTP/2.0" 200 304 "-" "libdnf (Fedora 32; container; Linux.x86_64)"

(actual IP address redacted)

The returned result for that URL is:

<?xml version="1.0" encoding="utf-8"?>
<metalink version="3.0" xmlns="http://www.metalinker.org/" type="dynamic" pubdate="Mon, 05 Oct 2020 17:14:55 GMT" generator="mirrormanager" xmlns:mm0="http://fedorahosted.org/mirrormanager">
<!-- # either path=, or repo= and arch= must be specified
-->
</metalink>

So, the assumption that mirrormanager won't return a 200 for an error/invalid request (i.e. one without a arch= is bad. The question is: should we ignore valid countme requests for misconfigured/missing repos, or should we change the database schema to accommodate them?

In terms of overall statistics, it's currently irrelevant - this is the first time this has happened in 7 months of logging - but ignoring it means we don't know how often it's happening, which seems like a useful data point.

Either way... crashing is definitely the wrong answer, so that needs to be fixed.

Crash fixed in commit 04d3a2f by dropping countme requests without arch=XXX or repo=XXX. Tested with the above logs and all seems fine.

Metadata Update from @wwoods:
- Issue status updated to: Closed (was: Open)

3 years ago

Login to comment on this ticket.

Metadata