#76 stores gzipped file although content-type is plain
Opened 4 years ago by maha. Modified 4 years ago

When using spectool -g specfile to download sources, one of the sources is saved as a gzipped file, although the Content-Type header says text/plain and the file is expected to be plain text.

You can use the following spec to reproduce it: https://src.fedoraproject.org/rpms/tor/raw/c734b9e2bd65408ca3df4e591d83e68a22262f6d/f/tor.spec

You can also see with curl that the content is clearly text/plain, and curl fetches it properly.

But with spectool, the saved file ends up containing the gzip-compressed content.

$ podman run -it fedora:33
# dnf install -y wget rpmdevtools
[...]
[root@22149421e988 /]# wget https://src.fedoraproject.org/rpms/tor/raw/c734b9e2bd65408ca3df4e591d83e68a22262f6d/f/tor.spec
--2021-01-23 10:18:27--  https://src.fedoraproject.org/rpms/tor/raw/c734b9e2bd65408ca3df4e591d83e68a22262f6d/f/tor.spec
Resolving src.fedoraproject.org (src.fedoraproject.org)... 38.145.60.21, 38.145.60.20
Connecting to src.fedoraproject.org (src.fedoraproject.org)|38.145.60.21|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 29935 (29K) [text/plain]
Saving to: ‘tor.spec’

tor.spec                           100%[=============================================================>]  29.23K   151KB/s    in 0.2s    

2021-01-23 10:18:28 (151 KB/s) - ‘tor.spec’ saved [29935/29935]

[root@22149421e988 /]# spectool -g tor.spec 
Downloading: https://dist.torproject.org/tor-0.4.5.4-rc.tar.gz
100% of   7.5 MiB |###############################################################################| Elapsed Time: 0:00:00 Time:  0:00:00
Downloaded: tor-0.4.5.4-rc.tar.gz
Downloading: https://dist.torproject.org/tor-0.4.5.4-rc.tar.gz.asc
100% of 670.0 B |#################################################################################| Elapsed Time: 0:00:00 Time:  0:00:00
Downloaded: tor-0.4.5.4-rc.tar.gz.asc
[root@22149421e988 /]# file tor-0.4.5.4-rc.tar.gz.asc
tor-0.4.5.4-rc.tar.gz.asc: gzip compressed data, from Unix, original size modulo 2^32 833
[root@22149421e988 /]# zcat tor-0.4.5.4-rc.tar.gz.asc 
-----BEGIN PGP SIGNATURE-----

iQIzBAABCgAdFiEEegKzUh3HXFQroBVFav7m1J6StgEFAmALAOoACgkQav7m1J6S
tgFw4w/6AzmpCbd6r7Xk1XtpXE9MnGoJFdWKAhyIoCWcPLB+LNjRERgFCWcnGXqg
nkr0lPIrhvJ6T0k72Wkn8Tp9v4GlxIGxBew2KA2ImTNDw8Uf0wTDOqHQ5ulVdaEP
fvV8dY91lOnXPK9sMjpobeK9zzFjzg5CQc0fUtrQNy9o4o9D2/gy1dz2ZTEYsPxX
/UgDtyhoAD7T9CG9m3zUO5ORM38pKoPlFn3SGFz2Syv0gGTmaiMUniEZUT2y4Jtq
0S9lg631OVnRF672QkgIqV9Vn1JOSh3Ykhx9V7mEKLSUhgHYNllPP8ooy7C/zVUV
vNi5cZJ4NEXL3kFELGXq85VXHn8yY8LDD2PuxPJz3qFscGSL2TkdZR3QjqP5cica
QEzgT0z3Ga3eZ5GDvlPGrYh4fNpuBPP4pbsn+qSYSQSMz07xssbnDSs9ovF9gedg
tcQhF1FnnV4XBd/m+4RJyBjvo84HRekibaFhSokcE56uw4a2CDU5i0ABdzQTUaXr
lc6GONkxAEdMTCok61r0NlH5bBVwkMYEpw66C99MJtu2ZrrEOL0RNCGQwqsQveDO
qNXL7Uj3JAUZYyBKM9cAWwF2lS6HVAFkaCnnynFSfxBsymHJUFvdJGtZdMBCfhCS
6AK95423J/jqYJHqdhGZPOjaKrtCHRqI7es29njyaFaykQ8juuw=
=ELYZ
-----END PGP SIGNATURE-----
[root@22149421e988 /]# curl -v https://dist.torproject.org/tor-0.4.5.4-rc.tar.gz.asc
*   Trying 116.202.120.166:443...
* Connected to dist.torproject.org (116.202.120.166) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: CN=dist.torproject.org
*  start date: Nov 27 00:55:50 2020 GMT
*  expire date: Feb 25 00:55:50 2021 GMT
*  subjectAltName: host "dist.torproject.org" matched cert's "dist.torproject.org"
*  issuer: C=US; O=Let's Encrypt; CN=Let's Encrypt Authority X3
*  SSL certificate verify ok.
> GET /tor-0.4.5.4-rc.tar.gz.asc HTTP/1.1
> Host: dist.torproject.org
> User-Agent: curl/7.71.1
> Accept: */*
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Sat, 23 Jan 2021 10:19:34 GMT
< Server: Apache
< X-Content-Type-Options: nosniff
< X-Frame-Options: sameorigin
< X-Xss-Protection: 1
< Referrer-Policy: no-referrer
< Strict-Transport-Security: max-age=15768000; preload
< Content-Security-Policy: default-src 'self';
< Last-Modified: Fri, 22 Jan 2021 16:45:45 GMT
< ETag: "341-5b97feb7409ec"
< Accept-Ranges: bytes
< Content-Length: 833
< Cache-Control: max-age=3600
< Expires: Sat, 23 Jan 2021 11:19:34 GMT
< Vary: Accept-Encoding
< Content-Type: text/plain
< 
-----BEGIN PGP SIGNATURE-----

iQIzBAABCgAdFiEEegKzUh3HXFQroBVFav7m1J6StgEFAmALAOoACgkQav7m1J6S
tgFw4w/6AzmpCbd6r7Xk1XtpXE9MnGoJFdWKAhyIoCWcPLB+LNjRERgFCWcnGXqg
nkr0lPIrhvJ6T0k72Wkn8Tp9v4GlxIGxBew2KA2ImTNDw8Uf0wTDOqHQ5ulVdaEP
fvV8dY91lOnXPK9sMjpobeK9zzFjzg5CQc0fUtrQNy9o4o9D2/gy1dz2ZTEYsPxX
/UgDtyhoAD7T9CG9m3zUO5ORM38pKoPlFn3SGFz2Syv0gGTmaiMUniEZUT2y4Jtq
0S9lg631OVnRF672QkgIqV9Vn1JOSh3Ykhx9V7mEKLSUhgHYNllPP8ooy7C/zVUV
vNi5cZJ4NEXL3kFELGXq85VXHn8yY8LDD2PuxPJz3qFscGSL2TkdZR3QjqP5cica
QEzgT0z3Ga3eZ5GDvlPGrYh4fNpuBPP4pbsn+qSYSQSMz07xssbnDSs9ovF9gedg
tcQhF1FnnV4XBd/m+4RJyBjvo84HRekibaFhSokcE56uw4a2CDU5i0ABdzQTUaXr
lc6GONkxAEdMTCok61r0NlH5bBVwkMYEpw66C99MJtu2ZrrEOL0RNCGQwqsQveDO
qNXL7Uj3JAUZYyBKM9cAWwF2lS6HVAFkaCnnynFSfxBsymHJUFvdJGtZdMBCfhCS
6AK95423J/jqYJHqdhGZPOjaKrtCHRqI7es29njyaFaykQ8juuw=
=ELYZ
-----END PGP SIGNATURE-----
* Connection #0 to host dist.torproject.org left intact
[root@22149421e988 /]# rpm -qi rpmdevtools
Name        : rpmdevtools
Version     : 9.3
Release     : 1.fc33
Architecture: noarch
Install Date: Sat Jan 23 10:17:03 2021
Group       : Unspecified
Size        : 222089
License     : GPLv2+ and GPLv2
Signature   : RSA/SHA256, Wed Jan 20 12:26:53 2021, Key ID 49fd77499570ff31
Source RPM  : rpmdevtools-9.3-1.fc33.src.rpm
Build Date  : Wed Jan 20 12:10:23 2021
Build Host  : buildvm-x86-07.iad2.fedoraproject.org
Packager    : Fedora Project
Vendor      : Fedora Project
URL         : https://pagure.io/rpmdevtools
Bug URL     : https://bugz.fedoraproject.org/rpmdevtools
Summary     : RPM Development Tools

What fresh hell is this? :cry:

Looks like this is fallout from fixing files that should be compressed being uncompressed during download ( #72 ) ... and it leads to files that should not be compressed staying compressed after download. Nice.

I see that the server sets Content-Encoding: gzip when downloading this .asc file ...

But how is spectool supposed to know what to do?

  • some servers claim to ship gzip-encoded content when serving .tar.gz files, leading to on-the-fly decompression during download unless decode_content=False
  • some servers gzip-encode plain-text files, and those should be decompressed on-the-fly ...

I see that in the first case, Content-Encoding: gzip and Content-Type: application/x-gzip is set, and in the second, Content-Encoding: gzip and Content-Type: text/plain is set.

I hope there's a smarter way to distinguish those than to check if the Content-Type header is set to a hard-coded list of known plain-text or compressed file formats :(
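For illustration, the header-based check described above could be sketched roughly as follows. This is only a sketch of the hard-coded-list approach, not spectool's actual code: the function name should_decode and the contents of COMPRESSED_TYPES are assumptions, and headers is assumed to be a plain mapping keyed by canonical header names.

```python
# Hypothetical helper; the type list below is an assumption and would need
# to grow to cover other archive formats.
COMPRESSED_TYPES = {
    "application/gzip",
    "application/x-gzip",
}


def should_decode(headers):
    """Return True if the body should be gunzipped while downloading."""
    encoding = headers.get("Content-Encoding", "").strip().lower()
    if encoding not in ("gzip", "x-gzip"):
        return False  # no gzip content encoding, nothing to undo
    # Drop parameters such as "; charset=utf-8" before comparing.
    content_type = headers.get("Content-Type", "").split(";")[0].strip().lower()
    # A gzip file that is also labelled gzip-encoded is most likely just the
    # file itself, so it is left alone.
    return content_type not in COMPRESSED_TYPES
```

With the two header combinations mentioned above, this returns True for Content-Type: text/plain and False for Content-Type: application/x-gzip. The obvious weakness is the one already pointed out: the list has to be maintained by hand.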

Does sending the request with Accept-Encoding: identity solve this problem?
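As a rough sketch of what that would mean, here is a request built with the standard library's urllib.request. The helper name build_request is made up for this example, and it only constructs the request without sending it. A server is free to ignore the header, so this may not be sufficient on its own.

```python
import urllib.request


def build_request(url):
    """Build a GET request that asks the server not to compress the body."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "identity"})


req = build_request("https://dist.torproject.org/tor-0.4.5.4-rc.tar.gz.asc")
# urllib stores header names in capitalized form, i.e. "Accept-encoding".
print(req.get_header("Accept-encoding"))
```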

@churchyard That was the advice I got on #fedora-python, but it did not solve #72 ... so I'm not sure what to do here. On the one hand, we apparently need to work around weird server configurations that claim double gzip compression, and on the other hand, we should successfully decompress gzip-encoded plain text. :cry:

I wonder how curl does this.

I wonder if we should use curl (or wget) to do this :)

urlgrabber is a wrapper around pycurl, and that might work better?

https://pagure.io/rpmdevtools/pull-request/77 fixes the problem for me. However, if the server sends gzipped content anyway, it will stay gzipped.
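One possible fallback for that remaining case, shown purely as an illustration and not as what the pull request does: check the downloaded bytes for the gzip magic number and decompress only when the file name does not suggest a gzip file. The names maybe_gunzip and GZIP_SUFFIXES are hypothetical, and the suffix check is a heuristic that can guess wrong.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"          # first two bytes of every gzip stream
GZIP_SUFFIXES = (".gz", ".tgz")   # assumed list of "expected gzip" names


def maybe_gunzip(data, filename):
    """Decompress data if it is gzip but the file name says it should not be."""
    if filename.endswith(GZIP_SUFFIXES):
        return data  # a gzip file is expected, keep it as downloaded
    if data[:2] != GZIP_MAGIC:
        return data  # not gzip, nothing to do
    return gzip.decompress(data)
```

For a name like tor-0.4.5.4-rc.tar.gz.asc this would unpack a gzipped body, while tor-0.4.5.4-rc.tar.gz would be left untouched.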

Isn't urlgrabber abandoned?

No, it's still maintained by @brejoc and myself.

Alright, sorry for the confusion.
