#7161 content-encoding header incorrectly set for .tar.gz files
Closed: Fixed 5 years ago Opened 5 years ago by bmbouter.

  • Describe what you need us to do:

Please set the content-encoding response header only when the webserver itself is compressing the response. When serving .tar.gz files from http://repos.fedorapeople.org/, content-encoding: gzip is set. Compare the exact same file served from http://repos.fedorapeople.org/ versus S3.

From http://repos.fedorapeople.org/

[bmbouter@localhost pulp_python]$ curl -I https://repos.fedorapeople.org/pulp/pulp/fixtures/python-pypi/packages/shelf_reader-0.1-py2-none-any.whl
HTTP/1.1 200 OK
Date: Mon, 13 Aug 2018 18:57:54 GMT
Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Last-Modified: Mon, 13 Aug 2018 02:53:10 GMT
ETag: "57b7-57348319d1186"
Accept-Ranges: bytes
Content-Length: 22455
Cache-Control: max-age=1800
Expires: Mon, 13 Aug 2018 19:27:54 GMT
Vary: Accept-Encoding,User-Agent
X-GitProject: (null)
AppTime: D=266
AppServer: people02.fedoraproject.org

From S3, notice there is no content-encoding set:

[bmbouter@localhost pulp_python]$ curl -I https://files.pythonhosted.org/packages/77/e0/2156a3da94ee16466a5936394caf7e89873a9b46eed72a9912bc90e42dbf/shelf_reader-0.1-py2-none-any.whl
HTTP/2 200 
x-amz-id-2: B6mC7AYwpc9DeSHPmMvUZeAFmdDW2DYXui6R/W8rXZhMGO40eyQLIfB65WB4SVQD0ub4ADhrqXw=
x-amz-request-id: 550F8F66900F994D
last-modified: Thu, 19 May 2016 18:59:09 GMT
etag: "69b867d206f1ff984651aeef25fc54f9"
x-amz-version-id: 4TGoYTc_51lerne.zoXHNsLphLO4s7Xh
content-type: application/octet-stream
server: AmazonS3
cache-control: max-age=365000000, immutable, public
accept-ranges: bytes
date: Mon, 13 Aug 2018 18:58:45 GMT
age: 1806714
x-served-by: cache-sea1025-SEA, cache-dca17732-DCA
x-cache: HIT, HIT
x-cache-hits: 1, 1
x-timer: S1534186725.016088,VS0,VE1
strict-transport-security: max-age=31536000; includeSubDomains; preload
x-frame-options: deny
x-xss-protection: 1; mode=block
x-content-type-options: nosniff
x-permitted-cross-domain-policies: none
x-robots-header: noindex
content-length: 22455

Interestingly, this blog post seems to describe this problem exactly: https://blogs.msdn.microsoft.com/wndp/2006/08/21/content-encoding-content-type/
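
For anyone wanting to check this mechanically, here is a minimal sketch, not part of the original report. It parses `curl -I` style output and flags a content-encoding of gzip on a file whose name says it is already compressed. The function names, the extension list, and the sample response with the extra header are assumptions made up for illustration. The concern is that many HTTP clients transparently decode a body marked content-encoding: gzip, so the file saved to disk may no longer be the .tar.gz that was published.

```python
from typing import Dict

# Hypothetical list of extensions that are already compressed on disk.
ALREADY_COMPRESSED = (".gz", ".tgz", ".bz2", ".xz", ".zip", ".whl")


def parse_head_response(raw: str) -> Dict[str, str]:
    """Turn `curl -I` output into a dict keyed by lower-cased header name."""
    headers = {}
    for line in raw.splitlines()[1:]:  # first line is the status line
        name, sep, value = line.partition(":")
        if sep:  # ignore blank or malformed lines
            headers[name.strip().lower()] = value.strip()
    return headers


def has_suspect_encoding(filename: str, headers: Dict[str, str]) -> bool:
    """True if an already-compressed file is also labelled as gzip-encoded."""
    encoding = headers.get("content-encoding", "").lower()
    return filename.lower().endswith(ALREADY_COMPRESSED) and encoding in ("gzip", "x-gzip")


# Illustrative input only: a trimmed response with the header in question added.
sample = (
    "HTTP/1.1 200 OK\n"
    "Content-Length: 19097\n"
    "Content-Type: application/x-gzip\n"
    "Content-Encoding: gzip\n"
)
print(has_suspect_encoding("shelf-reader-0.1.tar.gz", parse_head_response(sample)))
```

The check only looks at the file name and one header, so it is a rough filter rather than proof that the server is misconfigured.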

  • When do you need this? (YYYY/MM/DD)
    When you can. Soon would be great because we currently have to sync from production mirrors instead.

  • When is this no longer needed or useful? (YYYY/MM/DD)
    N/A

  • If we cannot complete your request, what is the impact?
    We would have to move off of fedorapeople for our fixture data hosting needs I guess.


Metadata Update from @smooge:
- Issue assigned to smooge

5 years ago

Commit 45626dc9e22a476e4fa7bd67705cd743c7e0300e caused .tar.gz files on people to be seen as text files. Commit dbd5d1419cf228e428dd549b42a13df82f9eda96 tries to fix this using the logic from 4039e6bc32429c0e8014ba835f167f074b04d1e3 and others.

The bug has been confirmed fixed by the reporter.

Metadata Update from @smooge:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

5 years ago

I just tested a change and now I get the correct result:

[bmbouter@localhost pulp_python]$ curl -I https://repos.fedorapeople.org/pulp/pulp/fixtures/python-pypi/packages/shelf-reader-0.1.tar.gz
HTTP/1.1 200 OK
Date: Mon, 13 Aug 2018 20:30:07 GMT
Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Last-Modified: Mon, 13 Aug 2018 02:53:10 GMT
ETag: "4a99-57348319d1186"
Accept-Ranges: bytes
Content-Length: 19097
Cache-Control: max-age=1800
Expires: Mon, 13 Aug 2018 21:00:07 GMT
X-GitProject: (null)
AppTime: D=216
AppServer: people02.fedoraproject.org
Content-Type: application/x-gzip

Metadata Update from @bmbouter:
- Issue status updated to: Open (was: Closed)

5 years ago

Metadata Update from @bmbouter:
- Issue status updated to: Closed (was: Open)

5 years ago

We're experiencing the same (or a similar) issue again. Here are the headers:

FedoraPeople

(pulp) [vagrant@pulp3 pulp]$ curl -I https://repos.fedorapeople.org/repos/pulp/pulp/fixtures/python-pypi/packages/Django-1.10.3-py2.py3-none-any.whl
HTTP/1.1 200 OK
Date: Fri, 04 Jan 2019 15:51:56 GMT
Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Last-Modified: Sun, 30 Dec 2018 04:30:35 GMT
ETag: "6844b7-57e35c1ef82a1"
Accept-Ranges: bytes
Content-Length: 6833335
Cache-Control: max-age=1800
Expires: Fri, 04 Jan 2019 16:21:56 GMT
Vary: Accept-Encoding,User-Agent
X-GitProject: (null)
AppTime: D=181
AppServer: people02.fedoraproject.org

PyPI

(pulp) [vagrant@pulp3 pulp]$ curl -I https://files.pythonhosted.org/packages/b5/33/1ab8727270fa6b354545d8100fe15bc23c9b57950c49a72919f34216f167/pulpcore-3.0.0a1.tar.gz
HTTP/2 200 
x-amz-id-2: woBRRF6CjzELaipjNMIU6SKoFLuDRVJ6eaqfdfM53uzhYfa9h4OxWCxWc8LdoKqtwaXn6HBl1O8=
x-amz-request-id: 61BE7FDA5AB5D7C3
last-modified: Tue, 26 Sep 2017 15:27:13 GMT
etag: "c65450d831e33ef4fca83a3dc73b5c2d"
x-amz-version-id: MGB_aefFFadyJxhzh0EH1oleOBQfPi0s
content-type: binary/octet-stream
server: AmazonS3
cache-control: max-age=365000000, immutable, public
accept-ranges: bytes
date: Fri, 04 Jan 2019 15:52:11 GMT
age: 2804
x-served-by: cache-sea1030-SEA, cache-mdw17382-MDW
x-cache: HIT, HIT
x-cache-hits: 1, 1
x-timer: S1546617131.183972,VS0,VE2
strict-transport-security: max-age=31536000; includeSubDomains; preload
x-frame-options: deny
x-xss-protection: 1; mode=block
x-content-type-options: nosniff
x-permitted-cross-domain-policies: none
x-robots-header: noindex
content-length: 74559

Metadata Update from @bmbouter:
- Issue status updated to: Open (was: Closed)

5 years ago

I am not sure what problem you are running into. The original problem was dealing with .tar.gz files, and if I do a curl against a .tar.gz file it says:

[smooge@smoogen-laptop tmp]$ curl -I https://repos.fedorapeople.org/pulp/pulp/fixtures/python-pypi/packages/shelf-reader-0.1.tar.gz
HTTP/1.1 200 OK
Date: Sat, 05 Jan 2019 20:15:21 GMT
Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Last-Modified: Sun, 30 Dec 2018 04:30:37 GMT
ETag: "4a99-57e35c20c1709"
Accept-Ranges: bytes
Content-Length: 19097
Cache-Control: max-age=1800
Expires: Sat, 05 Jan 2019 20:45:21 GMT
X-GitProject: (null)
AppTime: D=44672
AppServer: people02.fedoraproject.org
Content-Type: application/x-gzip

A .whl file is not listed in /etc/mime.types so it is going to be treated as a general file.
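
As a rough illustration of that lookup (a sketch only, not how Apache is implemented), the snippet below reads text in the mime.types format, one type followed by its extensions per line, and maps a file name to a type by its last extension. The sample entries and function names are made up for the example and are not copied from the server's real /etc/mime.types. Apache's own handling of names with several extensions is more involved than this.

```python
from typing import Dict, Optional

# Illustrative entries only, in "type ext1 ext2 ..." format.
SAMPLE_MIME_TYPES = """\
# MIME type            extensions
application/x-gzip     gz tgz
application/x-tar      tar
text/plain             txt
"""


def load_mime_types(text: str) -> Dict[str, str]:
    """Build an extension -> MIME type mapping from mime.types-style text."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        mime_type, *extensions = line.split()
        for ext in extensions:
            mapping[ext.lower()] = mime_type
    return mapping


def content_type_for(filename: str, mapping: Dict[str, str]) -> Optional[str]:
    """Return the type for the last extension, or None if it is not listed."""
    _, dot, ext = filename.rpartition(".")
    return mapping.get(ext.lower()) if dot else None


types = load_mime_types(SAMPLE_MIME_TYPES)
print(content_type_for("shelf-reader-0.1.tar.gz", types))
print(content_type_for("shelf_reader-0.1-py2-none-any.whl", types))
```

With these sample entries the .whl name has no match, which lines up with the .whl responses in this ticket carrying no Content-Type header.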

After more investigation, I think you're right: I don't think this has anything to do with the headers. Whereas our previous issue caused the file to be saved in the wrong format, I think what we're actually experiencing here is a corrupted download due to an SSL error.

I went digging in the logs and found these errors:

https://paste.fedoraproject.org/paste/lbsRHzrBH2j77FdoXqBPWg

We will continue investigating the root cause, but I believe this issue can be closed. Thanks @smooge
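
One way to tell a truncated or corrupted transfer apart from a header problem is to compare the raw bytes received against the Content-Length the server advertised and, when a known-good digest is available, against that digest. The helper below is a minimal sketch under those assumptions. Its name and parameters are hypothetical and it is not taken from Pulp.

```python
import hashlib
from typing import Optional


def download_looks_intact(
    body: bytes,
    content_length: int,
    expected_sha256: Optional[str] = None,
) -> bool:
    """Check the received bytes against the advertised length and, optionally, a digest."""
    if len(body) != content_length:
        return False  # short or over-long body: the transfer did not complete cleanly
    if expected_sha256 is not None:
        return hashlib.sha256(body).hexdigest() == expected_sha256.lower()
    return True


# Illustrative data: a body of the advertised size, then a truncated copy of it.
payload = b"\x00" * 19097
print(download_looks_intact(payload, 19097))
print(download_looks_intact(payload[:4096], 19097))
```

The length comparison only makes sense on the bytes as they came off the wire, before any decoding the client may apply.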

I will close this one. If there is an issue with how the webserver should treat .whl files, please open a different ticket so we can track it appropriately.

Metadata Update from @smooge:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

5 years ago

Thanks @smooge for helping make the fedora infra great.
