Our RSS 1.0, RSS 2.0, and Atom feeds do not specify the character encoding (the charset parameter of the Content-Type header) in the HTTP response headers.
Content-Encoding
Do you have a test I can use to verify that I've fixed it?
{{{
import httplib
from pprint import pprint

def printheaders(host, path, port=80):
    # Print the URL, then every response header returned for a GET request.
    print host + path
    h = httplib.HTTPConnection(host, port)
    h.request('GET', path)
    r = h.getresponse()
    pprint(r.getheaders())

printheaders('planet.fedoraproject.org', '/rss20.xml')
printheaders('feeds.washingtonpost.com', '/wp-dyn/rss/politics/index_xml')
================================================
planet.fedoraproject.org/rss20.xml
[('content-length', '139120'),
 ('accept-ranges', 'bytes'),
 ('server', 'Apache/2.2.3'),
 ('last-modified', 'Wed, 18 Feb 2009 21:26:19 GMT'),
 ('connection', 'close'),
 ('etag', '"8e01b4-21f70-46338120cf4c0"'),
 ('date', 'Wed, 18 Feb 2009 21:29:55 GMT'),
 ('content-type', 'text/xml')]
feeds.washingtonpost.com/wp-dyn/rss/politics/index_xml
[('x-content-type-options', 'nosniff'),
 ('transfer-encoding', 'chunked'),
 ('expires', 'Wed, 18 Feb 2009 21:29:56 GMT'),
 ('server', 'GFE/1.3'),
 ('last-modified', 'Wed, 18 Feb 2009 07:21:47 GMT'),
 ('etag', 'idw5V9pVKPxoUfy4ck4opdZSAqs'),
 ('cache-control', 'private, max-age=0'),
 ('date', 'Wed, 18 Feb 2009 21:29:56 GMT'),
 ('content-type', 'text/xml; charset=iso-8859-1')]
}}}
Specifying <?xml version="1.0" encoding="UTF-8"?> in our feeds would probably be sufficient, although I don't think that adding the charset to the Content-Type header would hurt.
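To illustrate why either fix could work, here is a minimal Python 3 sketch of how a feed consumer might pick an encoding: it uses the charset parameter of the Content-Type header when one is present and otherwise falls back to the encoding named in the XML declaration. The function names and the regular expression are illustrative assumptions, not the logic of any particular parser.

```python
import re
from email.message import Message

# Matches an XML declaration at the very start of the document and
# captures the value of its encoding pseudo-attribute.
_XML_DECL = re.compile(rb'^<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\']')


def charset_from_content_type(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg['Content-Type'] = content_type
    return msg.get_content_charset()


def encoding_from_xml_declaration(body):
    """Return the lowercased encoding named in the XML declaration, or None."""
    match = _XML_DECL.match(body)
    return match.group(1).decode('ascii').lower() if match else None


def detect_encoding(content_type, body):
    """Prefer the HTTP header; fall back to the XML declaration (assumed policy)."""
    return (charset_from_content_type(content_type)
            or encoding_from_xml_declaration(body))
```

With the headers shown above, the Washington Post feed would resolve to iso-8859-1 from the header alone, while a feed served as plain text/xml would depend entirely on its XML declaration.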
Okay, I added AddCharset UTF-8 .xml to the planet Apache config, and now I see ('content-type', 'text/xml; charset=utf-8').
Is that sufficient to get your parser to work?
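For anyone repeating this check on Python 3, where httplib has become http.client, a rough equivalent of the test script might look like the sketch below. The helper names fetch_content_type and declares_charset are hypothetical, and the parameter matching is deliberately simple.

```python
import http.client


def fetch_content_type(host, path, port=80):
    """Issue a GET request and return the Content-Type header ('' if absent)."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request('GET', path)
        return conn.getresponse().getheader('content-type', '')
    finally:
        conn.close()


def declares_charset(content_type, expected='utf-8'):
    """Return True if the header value carries charset=<expected> (case-insensitive)."""
    # Everything after the first ';' is a parameter; drop whitespace and quotes.
    params = [p.strip().lower().replace('"', '')
              for p in content_type.split(';')[1:]]
    return 'charset=' + expected.lower() in params
```

Calling declares_charset(fetch_content_type('planet.fedoraproject.org', '/rss20.xml')) would be expected to return True while the AddCharset change is in effect, and False for a bare text/xml response like the one in the original output.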
Excellent, this did the trick. Thanks a lot, Seth!