#251 double slashes in URLs don't work with local resultsdb frontend
Closed: Fixed None Opened 8 years ago by kparal.

When working on support for #502 in specific tasks, I found out that my fake_fedorainfra instance contains comments like these:

Taskotron: upgradepath test PASSED on noarch. Result log: http://127.0.0.1:5001//results/230 (results are informative only)

Those links don't work by default with the resultsdb frontend, because of the double slash. We already had that issue in D373, it is caused by werkzeug used by Flask as an internal http server. So theoretically it should not affect regular deployment with Apache. We will see once we deploy this on our dev machine.

Anyway, I'd like to fix this even for development setups. This is currently in bodhi_comment_directive:

            resultsdb_id = detail._internal.get('resultsdb_result_id', None)
            if resultsdb_id:
                log_url = "%s/results/%s" % (Config.resultsdb_frontend, resultsdb_id)

So it is trivial to fix in this instance, but I guess we will see more issues like these pop up from time to time. Maybe it would be worth it to think about some more generic and less error-prone solution, like stripping the slashes from all urls in our config by default, or using something like os.path.join() but for URLs.


This ticket had assigned some Differential requests:
D410

I've done some searching, and to sum it up - the only real way to "solve" this is to decide on a scheme for the "base"-urls in the config files, and just stick with it.
Python's urlparse module just splits the string to netloc and path (~ does not split path to components), and I have not found anything like abspath() for URLs in the standard library.

I'd support going with the "no trailing slashes" way - when joining URL's by hand, I find "%s/%s" % (base, path) somewhat better looking than "%s%s" % (base, path), and it IMO is even less error-prone, as an left-over trailing slash will just "break" the Werkzeug usecase, but Apache (~ our deployment target) handles double slashes gracefully.

Thoughts?

We can stick to a certain format, but that does not fix the issue for people who provide a different URL into the config file and fail to adhere to this format (and the errors might be quite confusing in this case and hard to figure out). I was thinking about one of these approaches:

  1. Somewhere in config.py manage a list of config keys containing URLs and always strip (or add) a trailing slash when loading the config file.

  2. Instead of joining strings like %s/%s, use a clever method everywhere, which will eliminate double slashes. Unfortunately it seems it has to be custom made, because urlparse.urljoin() does what we want only for trailing slashes:

In [9]: urlparse.urljoin('http://linux.org/api/','show')
Out[9]: 'http://linux.org/api/show'

In [11]: urlparse.urljoin('http://linux.org/api','show')
Out[11]: 'http://linux.org/show'

I would strongly advise against urllib.urljoin() as it can cause even worse problems than double-slash (which are absolutely complying with the RFC):

>>> urlparse.urljoin('http://domain.net/foo', '/bar')
'http://domain.net/bar'
>>> urlparse.urljoin('http://domain.net/foo', 'bar')
'http://domain.net/bar'
>>> urlparse.urljoin('http://domain.net/foo/', 'bar')
'http://domain.net/foo/bar'
>>> urlparse.urljoin('http://domain.net/foo/', '/bar')
'http://domain.net/bar'
  1. Somewhere in config.py manage a list of config keys containing URLs and always strip (or add) a trailing slash when loading the config file.
    Although it seems a bit like an overkill, it IMHO is the only way to make sure we sanitize the user input everywhere. It does not really feel like "the thing to do", but I can't think of anything else, that would automagically make sure we strip all the trailing slashes in our base-urls.

  2. Instead of joining strings like %s/%s, use a clever method everywhere, which will eliminate double slashes. Unfortunately it seems it has to be custom made, because urlparse.urljoin() does what we want only for trailing slashes:

lang=python
def urljoin(base, *args):
    return('%s/%s' % (base.rstrip('/'), '/'.join([a.strip('/') for a in args]))

But then again, these only solve libtaskotron, we would have to make sure to do the same for all the other currently existing, and new projects. I'm not against using one of the solutions above, I just wonder, whether it would not be "cleaner" to keep the urls trailing-slash-less, and add a comment explaining how Werkzeug 'misbehaves', and why it is important to keep the URLs that way, when one changes the configs for local non-Apache "deployments".

Login to comment on this ticket.

Metadata