#9989 MBS denial-of-service with submitting modulemd-packager-v3 documents
Closed: Upstream 2 years ago by kevin. Opened 2 years ago by ppisar.

MBS does not accept new builds:

$ time fedpkg module-build 
Submitting the module build...
Could not execute module_build: HTTPSConnectionPool(host='mbs.fedoraproject.org', port=443): Read timed out. (read timeout=120)

real    2m1,900s
user    0m0,503s
sys     0m0,114s

Though the end-point can be read:

$ HEAD https://mbs.fedoraproject.org:443/
200 OK
Connection: close
Date: Thu, 27 May 2021 12:27:06 GMT
Server: Apache
Content-Length: 38688
Content-Type: application/json
Access-Control-Allow-Origin: *
Apptime: D=1542490
Client-Date: Thu, 27 May 2021 12:27:07 GMT

@breilly Could you please take a look at this? Thanks.

Metadata Update from @mohanboddu:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: medium-gain, medium-trouble, ops

2 years ago

Metadata Update from @mizdebsk:
- Issue tagged with: mbs

2 years ago

I see a number of builds now... is this problem still happening?

What module were you trying to build?

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 years ago

That peculiar. I still experience the problem. I'm building perl-CGI:4.51:

$ fedpkg --verbose --debug  module-build 
Creating repo object from /home/petr/fedora-modules/perl-CGI
Submitting the module build...
Could not execute module_build: HTTPSConnectionPool(host='mbs.fedoraproject.org', port=443): Read timed out. (read timeout=120)
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 421, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 416, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib64/python3.9/http/client.py", line 1345, in getresponse
    response.begin()
  File "/usr/lib64/python3.9/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python3.9/http/client.py", line 268, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib64/python3.9/socket.py", line 704, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib64/python3.9/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib64/python3.9/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 719, in urlopen
    retries = retries.increment(
  File "/usr/lib/python3.9/site-packages/urllib3/util/retry.py", line 400, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/lib/python3.9/site-packages/urllib3/packages/six.py", line 703, in reraise
    raise value
  File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 665, in urlopen
    httplib_response = self._make_request(
  File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 423, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 330, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='mbs.fedoraproject.org', port=443): Read timed out. (read timeout=120)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/fedpkg", line 33, in <module>
    sys.exit(load_entry_point('fedpkg==1.40', 'console_scripts', 'fedpkg')())
  File "/usr/lib/python3.9/site-packages/fedpkg/__main__.py", line 87, in main
    sys.exit(client.args.command())
  File "/usr/lib/python3.9/site-packages/pyrpkg/cli.py", line 2285, in module_build
    build_ids = self._cmd.module_submit_build(
  File "/usr/lib/python3.9/site-packages/pyrpkg/__init__.py", line 3782, in module_submit_build
    resp = self.module_send_authorized_request(
  File "/usr/lib/python3.9/site-packages/pyrpkg/__init__.py", line 3693, in module_send_authorized_request
    resp = oidc.send_request(
  File "/usr/lib/python3.9/site-packages/openidc_client/__init__.py", line 232, in send_request
    resp = requests.request(method, *args, **ckwargs)
  File "/usr/lib/python3.9/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3.9/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3.9/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3.9/site-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='mbs.fedoraproject.org', port=443): Read timed out. (read timeout=120)

While debugging it I started to get 503 HTTP failure from [2605:bc80:3010:600:dead:beef:cafe:fed9]:https:

$ fedpkg module-build 
Could not execute module_build: The following error occurred while trying to determine the API versions supported by MBS: 
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
[...]
  <body class="503">
[...]

And then it again timed out and now it again returns 503. What does happen? Did I kill the backend by repeating the timing out requests?

I suspected an IPv6 Patch MTU discovery issues, so I enforced IPv4 connection, but I still experience the time out. Unfortunatelly this is HTTPS, so I have no idea what happens inside:

16:50:53.025630 IP 10.0.0.226.55084 > 67.219.144.68.https: Flags [S], seq 2013105343, win 64240, options [mss 1460,sackOK,TS val 3344247745 ecr 0,nop,wscale 7], length 0
16:50:53.137014 IP 67.219.144.68.https > 10.0.0.226.55084: Flags [S.], seq 2767071507, ack 2013105344, win 65160, options [mss 1460,sackOK,TS val 708511604 ecr 3344247745,nop,wscale 7], length 0
16:50:53.137110 IP 10.0.0.226.55084 > 67.219.144.68.https: Flags [.], ack 1, win 502, options [nop,nop,TS val 3344247856 ecr 708511604], length 0
16:50:53.162548 IP 10.0.0.226.55084 > 67.219.144.68.https: Flags [P.], seq 1:518, ack 1, win 502, options [nop,nop,TS val 3344247881 ecr 708511604], length 517
16:50:53.282437 IP 67.219.144.68.https > 10.0.0.226.55084: Flags [.], ack 518, win 506, options [nop,nop,TS val 708511750 ecr 3344247881], length 0
16:50:53.309915 IP 67.219.144.68.https > 10.0.0.226.55084: Flags [P.], seq 1:2897, ack 518, win 506, options [nop,nop,TS val 708511773 ecr 3344247881], length 2896
16:50:53.309972 IP 10.0.0.226.55084 > 67.219.144.68.https: Flags [.], ack 2897, win 496, options [nop,nop,TS val 3344248029 ecr 708511773], length 0
16:50:53.309916 IP 67.219.144.68.https > 10.0.0.226.55084: Flags [P.], seq 2897:4057, ack 518, win 506, options [nop,nop,TS val 708511773 ecr 3344247881], length 1160
16:50:53.310013 IP 10.0.0.226.55084 > 67.219.144.68.https: Flags [.], ack 4057, win 489, options [nop,nop,TS val 3344248029 ecr 708511773], length 0
16:50:53.312122 IP 10.0.0.226.55084 > 67.219.144.68.https: Flags [P.], seq 518:598, ack 4057, win 501, options [nop,nop,TS val 3344248031 ecr 708511773], length 80
16:50:53.312834 IP 10.0.0.226.55084 > 67.219.144.68.https: Flags [P.], seq 598:801, ack 4057, win 501, options [nop,nop,TS val 3344248032 ecr 708511773], length 203
16:50:53.424472 IP 67.219.144.68.https > 10.0.0.226.55084: Flags [.], ack 598, win 506, options [nop,nop,TS val 708511892 ecr 3344248031], length 0
16:50:53.424473 IP 67.219.144.68.https > 10.0.0.226.55084: Flags [.], ack 801, win 505, options [nop,nop,TS val 708511892 ecr 3344248032], length 0
16:50:53.424473 IP 67.219.144.68.https > 10.0.0.226.55084: Flags [P.], seq 4057:4360, ack 801, win 505, options [nop,nop,TS val 708511893 ecr 3344248032], length 303
16:50:53.424553 IP 10.0.0.226.55084 > 67.219.144.68.https: Flags [.], ack 4360, win 501, options [nop,nop,TS val 3344248143 ecr 708511893], length 0
16:50:53.424775 IP 67.219.144.68.https > 10.0.0.226.55084: Flags [P.], seq 4360:4663, ack 801, win 505, options [nop,nop,TS val 708511893 ecr 3344248032], length 303
16:50:53.424810 IP 10.0.0.226.55084 > 67.219.144.68.https: Flags [.], ack 4663, win 501, options [nop,nop,TS val 3344248144 ecr 708511893], length 0
16:50:53.481515 IP 67.219.144.68.https > 10.0.0.226.55084: Flags [P.], seq 4663:5343, ack 801, win 505, options [nop,nop,TS val 708511949 ecr 3344248032], length 680
16:50:53.481583 IP 10.0.0.226.55084 > 67.219.144.68.https: Flags [.], ack 5343, win 501, options [nop,nop,TS val 3344248200 ecr 708511949], length 0
16:50:53.483708 IP 10.0.0.226.55084 > 67.219.144.68.https: Flags [F.], seq 801, ack 5343, win 501, options [nop,nop,TS val 3344248203 ecr 708511949], length 0
16:50:53.535871 IP 10.0.0.226.55088 > 67.219.144.68.https: Flags [S], seq 1195844511, win 64240, options [mss 1460,sackOK,TS val 3344248255 ecr 0,nop,wscale 7], length 0
16:50:53.595535 IP 67.219.144.68.https > 10.0.0.226.55084: Flags [P.], seq 5343:5367, ack 802, win 505, options [nop,nop,TS val 708512064 ecr 3344248203], length 24
16:50:53.595595 IP 10.0.0.226.55084 > 67.219.144.68.https: Flags [R], seq 2013106145, win 0, length 0
16:50:53.596109 IP 67.219.144.68.https > 10.0.0.226.55084: Flags [F.], seq 5367, ack 802, win 505, options [nop,nop,TS val 708512064 ecr 3344248203], length 0
16:50:53.596124 IP 10.0.0.226.55084 > 67.219.144.68.https: Flags [R], seq 2013106145, win 0, length 0
16:50:53.646953 IP 67.219.144.68.https > 10.0.0.226.55088: Flags [S.], seq 977630917, ack 1195844512, win 65160, options [mss 1460,sackOK,TS val 708512115 ecr 3344248255,nop,wscale 7], length 0
16:50:53.647038 IP 10.0.0.226.55088 > 67.219.144.68.https: Flags [.], ack 1, win 502, options [nop,nop,TS val 3344248366 ecr 708512115], length 0
16:50:53.670813 IP 10.0.0.226.55088 > 67.219.144.68.https: Flags [P.], seq 1:518, ack 1, win 502, options [nop,nop,TS val 3344248390 ecr 708512115], length 517
16:50:53.784041 IP 67.219.144.68.https > 10.0.0.226.55088: Flags [.], ack 518, win 506, options [nop,nop,TS val 708512251 ecr 3344248390], length 0
16:50:53.805542 IP 67.219.144.68.https > 10.0.0.226.55088: Flags [.], seq 1:1449, ack 518, win 506, options [nop,nop,TS val 708512273 ecr 3344248390], length 1448
16:50:53.805600 IP 10.0.0.226.55088 > 67.219.144.68.https: Flags [.], ack 1449, win 501, options [nop,nop,TS val 3344248524 ecr 708512273], length 0
16:50:53.805800 IP 67.219.144.68.https > 10.0.0.226.55088: Flags [P.], seq 1449:2897, ack 518, win 506, options [nop,nop,TS val 708512273 ecr 3344248390], length 1448
16:50:53.805825 IP 10.0.0.226.55088 > 67.219.144.68.https: Flags [.], ack 2897, win 496, options [nop,nop,TS val 3344248525 ecr 708512273], length 0
16:50:53.805800 IP 67.219.144.68.https > 10.0.0.226.55088: Flags [P.], seq 2897:4057, ack 518, win 506, options [nop,nop,TS val 708512273 ecr 3344248390], length 1160
16:50:53.805840 IP 10.0.0.226.55088 > 67.219.144.68.https: Flags [.], ack 4057, win 489, options [nop,nop,TS val 3344248525 ecr 708512273], length 0
16:50:53.807660 IP 10.0.0.226.55088 > 67.219.144.68.https: Flags [P.], seq 518:598, ack 4057, win 501, options [nop,nop,TS val 3344248527 ecr 708512273], length 80
16:50:53.808481 IP 10.0.0.226.55088 > 67.219.144.68.https: Flags [P.], seq 598:956, ack 4057, win 501, options [nop,nop,TS val 3344248527 ecr 708512273], length 358
16:50:53.808725 IP 10.0.0.226.55088 > 67.219.144.68.https: Flags [P.], seq 956:1120, ack 4057, win 501, options [nop,nop,TS val 3344248528 ecr 708512273], length 164
16:50:53.916962 IP 67.219.144.68.https > 10.0.0.226.55088: Flags [.], ack 598, win 506, options [nop,nop,TS val 708512386 ecr 3344248527], length 0
16:50:53.918100 IP 67.219.144.68.https > 10.0.0.226.55088: Flags [P.], seq 4057:4360, ack 598, win 506, options [nop,nop,TS val 708512386 ecr 3344248527], length 303
16:50:53.918149 IP 10.0.0.226.55088 > 67.219.144.68.https: Flags [.], ack 4360, win 501, options [nop,nop,TS val 3344248637 ecr 708512386], length 0
16:50:53.918392 IP 67.219.144.68.https > 10.0.0.226.55088: Flags [P.], seq 4360:4663, ack 598, win 506, options [nop,nop,TS val 708512386 ecr 3344248527], length 303
16:50:53.918418 IP 10.0.0.226.55088 > 67.219.144.68.https: Flags [.], ack 4663, win 501, options [nop,nop,TS val 3344248637 ecr 708512386], length 0
16:50:53.929275 IP 67.219.144.68.https > 10.0.0.226.55088: Flags [.], ack 956, win 504, options [nop,nop,TS val 708512398 ecr 3344248527], length 0
16:50:53.929275 IP 67.219.144.68.https > 10.0.0.226.55088: Flags [.], ack 1120, win 503, options [nop,nop,TS val 708512398 ecr 3344248528], length 0

[Here it times out]

16:52:53.910519 IP 10.0.0.226.55088 > 67.219.144.68.https: Flags [F.], seq 1120, ack 4663, win 501, options [nop,nop,TS val 3344368629 ecr 708512398], length 0
16:52:54.065205 IP 67.219.144.68.https > 10.0.0.226.55088: Flags [.], ack 1121, win 503, options [nop,nop,TS val 708632532 ecr 3344368629], length 0

Is there a way od debugging the fedpkg communication?

Actually just a few minutes ago it stopped responding at all... and I had to reboot the vm. ;(

Does it seem any better now?

Actually just a few minutes ago it stopped responding at all... and I had to reboot the vm. ;(

Does it seem any better now?

It completely crashed again. ;(

I doubled it's memory and cpus.

See now?

Now it's alive. I located the only place in /usr/lib/python3.9/site-packages/pyrpkg/init.py which sets timeout to 120 second. It's in module_send_authorized_request() method. I added a debugging like this:

        url = self.module_get_url(build_id=None, action='POST')
→       print("HIT module_send_authorized_request()", 'POST', url, body,
                auth_method, oidc_id_provider, oidc_client_id,
                            oidc_client_secret, oidc_scopes, 120)
        resp = self.module_send_authorized_request(
            'POST', url, body, auth_method, oidc_id_provider, oidc_client_id,
            oidc_client_secret, oidc_scopes, timeout=120)
        print("FINISHED")

and executed it:

$ fedpkg module-build 
Submitting the module build...
HIT module_send_authorized_request() POST https://mbs.fedoraproject.org/module-build-service/2/module-builds/ {'scmurl': 'https://src.fedoraproject.org/modules/perl-CGI.git?#eb6b8198854b1e57a7cac9ae191b23e76a9961f5', 'branch': '4.51', 'scratch': False} oidc https://id.fedoraproject.org/openidc/ mbs-authorizer notsecret ['openid', 'https://id.fedoraproject.org/scope/groups', 'https://mbs.fedoraproject.org/oidc/submit-build'] 120

And it halts there and after a timeout fedpkg prints:

Could not execute module_build: HTTPSConnectionPool(host='mbs.fedoraproject.org', port=443): Read timed out. (read timeout=120)

So I identified the problematic request. I won't retry it again to keep it working for others. But please forward this issue to MBS maintainers. The https://src.fedoraproject.org/modules/perl-CGI.git?#eb6b8198854b1e57a7cac9ae191b23e76a9961f5 exhibits a new module format and MBS might experience a leak or a loop on it.

I see what the problem is - this is a build using the newest packagerv3 format. MBS Fedora deployments are a bit behind, so the changes required are yet to be deployed. I'll see about getting a newer MBS deployed ASAP.

Metadata Update from @ppisar:
- Issue status updated to: Open (was: Closed)

2 years ago

@ppisar the latest MBS has been deployed to prod, it should be able to handle PackagerV3 module builds now.

Thanks. I confirm that now it accepts the build. Though it's sluggish – it took 74 seconds to accept it. Maybe there is still something suboptimal in MBS.

MBS tasks 12151 and 12152 got stuck after building all components. When cancelling 12151, I get a timeout:

$ fedpkg module-build-cancel 12151
Cancelling module build #12151...
Could not execute module_build_cancel: HTTPSConnectionPool(host='mbs.fedoraproject.org', port=443): Read timed out. (read timeout=60)

Cancelling 12152 succeeded.

Can you resolve it?

I tried canceling 12151 again and it reported that's already cancelled. Then I tried resuming the builds again to finish them and the submission failed with "500 Internal Server Error". Submitted again and it accepted. MBS is faulty.

And again. Cancelling 12151 passed. Cancelling 12152 timed out 4 times. And now 500 for any request.

@ppisar I'm seeing a repeated error which is likely causing the task to hang:
TagError: Package perl-CGI not in list for eln-modular-updates-candidate

Can you confirm that package is in the list?

Not sure about the 500s yet, investigating that.

perl-CGI component is not on the list:

$ koji list-pkgs --tag eln-modular-updates-candidate --show-blocked |grep perl
perl-Module-Install     eln-modular                              releng         
perl-Module-Install-bootstrap eln-modular                              releng         
perl-Module-Package     eln-modular                              releng         
perl                    eln-modular                              releng         
perl-IO-Socket-SSL      eln-modular                              releng         
perl-Moo                eln-modular                              releng         
perl-Unicode-EastAsianWidth eln-modular                              releng         
perl-libintl-perl       eln-modular                              releng         
perl-bootstrap          eln-modular                              releng         
perl-DBD-MySQL          eln-modular                              releng         
perl-FCGI               eln-modular                              releng         
perl-DBI                eln-modular                              releng         
perl-DBD-Pg             eln-modular                              releng         
perl-DBD-SQLite         eln-modular                              releng         
perl-YAML-bootstrap     eln-modular                              releng         
perl-App-cpanminus      eln-modular                              releng         
perl-YAML               eln-modular                              releng         
perl-libwww-perl        eln-modular                              releng         

I did not know that. I will stop building modules for components not listed there.
It would be great if MBS forwarded the tagging error instead of hanging on it.

Agreed, I filed an issue to change the behavior there. @ppisar are you still experiencing issues with the build?

After excluding platform:eln context, I do not experience issues with the build.

Shall we keep this open to track things? Or close it in favor of the above issue?

@breilly any preference?

Lets just close this and track upstream...

Metadata Update from @kevin:
- Issue close_status updated to: Upstream
- Issue status updated to: Closed (was: Open)

2 years ago

Login to comment on this ticket.

Metadata
Boards 1
ops Status: Done