#7 rpmlint is hammering KDE infra (and probably other projects as well)
Closed: Fixed None Opened 7 years ago by kparal.

See this:
https://mail.kde.org/pipermail/kde-distro-packagers/2016-March/000142.html

Every time there's a new build of KDE-related packages, we run rpmlint on every new rpm (for every arch) and srpm. Each time, the URL field and the Source field are tested for accessibility. Many KDE packages have old URL links, which result in errors like these:

kf5-kdbusaddons.x86_64: W: invalid-url URL: http://projects.kde.org/kdbusaddons The read operation timed out

http://projects.kde.org/kdbusaddons redirects to https://quickgit.kde.org/ (when it doesn't time out), which is an expensive call for KDE infra.

Example spec file:
http://pkgs.fedoraproject.org/cgit/rpms/kf5-kdbusaddons.git/tree/kf5-kdbusaddons.spec
Example log:
https://taskotron.fedoraproject.org/artifacts/all/b25fbde0-eef1-11e5-b4e1-525400120b80/task_output/kf5-kdbusaddons-5.20.0-1.fc24.log

Obviously the KDE packages should get fixed, but the question is whether we want to run these network checks at all as part of our automation suite, because it is stressing third-party web servers and even slowing down our checks.

Here's an IRC log:

<puiterwijk> tflink: https://mail.kde.org/pipermail/kde-distro-packagers/2016-March/000142.html  This seems to be taskotron
<puiterwijk> rdieter: ^
<puiterwijk> Well, it's most likely a Fedora box. And the only thing I can find in our ansible repo that has rpmlint installed is taskotron
<puiterwijk> tflink, rdieter: yep. Check https://taskotron.fedoraproject.org/resultsdb/jobs?page=292   (or by the time you're reading this, probably later), around the "2016-03-20 23:18:23.763190" block of results: lots and lots of rpmlint jobs for kde-based things
<puiterwijk> rdieter: so, as the email says, you might want to look at changing your package so it doesn't do that. (though I'm not sure what rpmlint does that does this)
<puiterwijk> Or, if you think they're speaking nonsense, talk with them
<rdieter> puiterwijk: I'm not sure where that automated running of rpmlint is coming from
<rdieter> what is initiating all those taskotron jobs?
<kparal> kf5-kdbusaddons.x86_64: W: invalid-url URL: http://projects.kde.org/kdbusaddons The read operation timed out
<puiterwijk> rdieter: that's from bodhi submissions
<kparal> it redirects to https://quickgit.kde.org/
<puiterwijk> As soon as you submit a build to bodhi, it's also scheduled for tests in taskotron
<kparal> for every new build we run rpmlint on every rpm.arch
<kparal> puiterwijk: it's for every koji build
<puiterwijk> Ah, okay. I thought that bodhi triggered it, okay
<rdieter> ok, so how to get it to *not* do that?
<rdieter> change all our pacakges to have bogus URL: tags ?
<kparal> I think we can configure rpmlint to avoid source url checks
<kparal> or fix the urls
<puiterwijk> kparal: well, I think this is not Source url check, but URL: check
<puiterwijk> disabling those should be just fine I'd say
<kparal> http://pkgs.fedoraproject.org/cgit/rpms/kf5-kdbusaddons.git/tree/kf5-kdbusaddons.spec
<kparal> URL:            http://projects.kde.org/kdbusaddons
<kparal> you're right
<puiterwijk> Yeah. That's the URL that we don't need to check
<puiterwijk> I would say checking Source: urls is useful. URL: entries, not so much
<puiterwijk> Worst case, we can do: rpmlint -o "NetworkEnabled False"
* kparal shrugs. in taskotron, we can create a global configuration. we still don't have a way to specify a package-local configuration for rpmlint
<rdieter> puiterwijk: I think that is reasonable
<rdieter> I doubt the intention of adding these checks per build intended to hit the network significantly

@rdieter thinks we should avoid running URL checks as part of rpmlint. We can even disable all network checks with "NetworkEnabled False". We should figure out what we want to do here and then fix task-rpmlint to contain a default rpmlint config file and point rpmlint to it when running it. That should be very easy to implement.


This ticket had some Differential requests assigned:
D805

So, the URL check runs for every binary rpm and once for the source rpm. If the package has two subpackages (say -docs and -devel) in addition to the main package and there are 3 main arches, we check the URL 10 times in total (3x3+1), in quick succession. If a large number of packages pointing to the same umbrella project or repo (gnome, kde, perl, pypi, latex) are rebuilt simultaneously, we can put quite a load on the website.
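
The arithmetic above can be sketched as a small Python function. The function and parameter names are made up for illustration, and the sketch assumes that every binary package is built for every arch (no noarch subpackages) and that each rpm triggers exactly one URL check.

```python
def url_checks_per_build(binary_packages: int, arches: int, srpms: int = 1) -> int:
    """Count how many times the URL tag is fetched for one build.

    Assumes every binary package exists for every arch and that each
    rpm (binary or source) causes exactly one URL check.
    """
    return binary_packages * arches + srpms


# Main package plus -docs and -devel, built on 3 arches, plus one srpm.
print(url_checks_per_build(binary_packages=3, arches=3))  # prints 10
```

The count grows linearly with both the number of subpackages and the number of arches, which is why a mass rebuild of one umbrella project adds up quickly.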

I have looked at the rpmlint configuration. I haven't found any good documentation for the available options, but from digging in the source code, it seems that using -o "NetworkEnabled False" would disable URL checking for the URL, DistURL and BugURL tags, and for Source and Patch source definitions.

After thinking about this a bit, I think it's reasonable to completely disable network-based checking for automated testing which occurs very frequently (our case). It's true that we will lose some coverage (package maintainers will not be notified about unreachable homepages or sources), OTOH we will not create any false alarms either (when those webpages are just temporarily unavailable or overloaded, that's really not something we want to bother the maintainer about), and most importantly we will not unnecessarily stress various web resources just because we run our tests frequently. We will also execute the checks faster (no waiting for network requests and timeouts). It would be useful to enable the network checks once in a while (let's say check the homepage URL at most once per month), but that seems very complicated to implement. And of the two choices of having it on or off, disabling network checks seems like the better option to me.

Thoughts?

I'm not quite understanding why it's a bad thing to check the URLs in a spec file. I'm aware of the incident which triggered this ticket but wasn't the root cause of that a bad URL in the spec file?

I'm not crazy about hitting other projects' infra hard like that but I'm also not crazy about removing coverage after a single incident in 6 years.

But I agree with @kparal that adding the logic to turn it off and on based on time is not worth the time and complexity.

Supporting per-srpm rpmlint rules feels like a better solution for this than just removing the testing. That'd take a while to do, but I also think we could use it in more places than just rpmlint.

I implemented network checking just for SRPMs, but not for other RPMs, in D805. I had to split the execution into two parts, otherwise I'd have to modify rpmlint internals (I didn't find a way to make it work with rpmlint configuration alone).

Has the question of whether we should really disable network checking been answered?

I should have been more clear. The new behavior is that network checking is still performed, but with considerably fewer network requests (we do it just once per SRPM). Test coverage is not reduced, because the network checks on SRPMs are a superset of the network checks on other RPMs.
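
A rough sketch of the two-part split described above, not the actual D805 code. It assumes rpmlint accepts -o "NetworkEnabled True" the same way as the -o "NetworkEnabled False" form quoted earlier in this ticket; the function name build_rpmlint_commands and the file names are hypothetical.

```python
from typing import List


def build_rpmlint_commands(rpm_paths: List[str]) -> List[List[str]]:
    """Split one rpmlint run into an SRPM pass and a binary-RPM pass."""
    srpms = [p for p in rpm_paths if p.endswith(".src.rpm")]
    binaries = [p for p in rpm_paths if not p.endswith(".src.rpm")]

    commands = []
    if srpms:
        # Network checks stay enabled only here (assumed option value).
        commands.append(["rpmlint", "-o", "NetworkEnabled True"] + srpms)
    if binaries:
        # Binary rpms are checked without any network requests.
        commands.append(["rpmlint", "-o", "NetworkEnabled False"] + binaries)
    return commands


# Hypothetical file names for one build on a single arch.
example = [
    "foo-1.0-1.fc24.src.rpm",
    "foo-1.0-1.fc24.x86_64.rpm",
    "foo-devel-1.0-1.fc24.x86_64.rpm",
]
for command in build_rpmlint_commands(example):
    print(command)
```

Each inner list is an argument vector that could be handed to subprocess.run. With this split, the URL is requested once per build regardless of how many subpackages or arches there are.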

D805 has been pushed and will be deployed to dev and stg machines. If everything works well, we will deploy it to production as well in some time.

I'll also try to mass-file bugs against KDE packages containing invalid URLs as described above.

I have mass-filed bugs against all packages which contain a URL redirecting to https://quickgit.kde.org (exactly, not any subproject). Their tracker is here:
https://bugzilla.redhat.com/show_bug.cgi?id=1325128
and their list view is here:
https://bugzilla.redhat.com/buglist.cgi?classification=Fedora&f1=blocked&list_id=4920573&o1=casesubstring&order=component%2Cchangeddate%20DESC&product=Fedora&query_based_on=&query_format=advanced&v1=1325128

! In #760#10441, @mkrizek wrote:
Seems to be working - http://taskotron-dev.fedoraproject.org/artifacts/all/d5bc38f4-fd7e-11e5-b5cd-525400571835/task_output/gegl-0.2.0-28.fc24.log ?

Yes, the output is in the new format and it's not crashing, so hopefully we also eliminated the extra network requests :-) Let's wait over the weekend and if it looks good, we can deploy to production.

Let's leave this ticket open until we have this deployed in production.

This has been pushed to production, closing the ticket.

Metadata Update from @kparal:
- Issue tagged with: easyfix

6 years ago
