#9821 curl timeout in fedpkg sources when building a big package
Closed: Upstream 3 years ago by kevin. Opened 3 years ago by bellet.

Here is the failing task:

https://kojipkgs.fedoraproject.org//work/tasks/1235/64931235/root.log

(The tarball to be fetched is big, 1.8GB)


For future refrence: please provide a link to the top level task. Pointing to just one log makes it difficult to track things down. :)

Metadata Update from @humaton:
- Issue tagged with: low-gain, medium-trouble, ops

3 years ago

FWIW, the same package successfully built in branch f34, as FlightGear-data-2020.3.8-1.fc34. Rawhide only seems to have this problem.

Metadata Update from @smooge:
- Issue priority set to: Waiting on Assignee (was: Needs Review)

3 years ago

Oddly, I am not seeing a failed rawhide build. ;(

Can you please retry again? or if you like I can try from here.

I retried today, and I still have the problem with curl:

[bellet@bonobo FlightGear-data (rawhide %)]$ fedpkg build
Building FlightGear-data-2020.3.8-1.fc35 for rawhide
Created task: 65022826
Task info: https://koji.fedoraproject.org/koji/taskinfo?taskID=65022826
Watching tasks (this may be safely interrupted)...
65022826 build (rawhide, /rpms/FlightGear-data.git:0a0a53a0f21c84d8221e07b71528ae4bbce1ae7f): free
65022826 build (rawhide, /rpms/FlightGear-data.git:0a0a53a0f21c84d8221e07b71528ae4bbce1ae7f): free -> open (buildvm-ppc64le-37.iad2.fedoraproject.org)
  65022828 buildSRPMFromSCM (/rpms/FlightGear-data.git:0a0a53a0f21c84d8221e07b71528ae4bbce1ae7f): open (buildvm-ppc64le-37.iad2.fedoraproject.org)
65022826 build (rawhide, /rpms/FlightGear-data.git:0a0a53a0f21c84d8221e07b71528ae4bbce1ae7f): open (buildvm-ppc64le-37.iad2.fedoraproject.org) -> FAILED: BuildError: error retrieving sources, mock exited with status 28; see root.log for more information
  0 free  1 open  0 done  1 failed
  65022828 buildSRPMFromSCM (/rpms/FlightGear-data.git:0a0a53a0f21c84d8221e07b71528ae4bbce1ae7f): open (buildvm-ppc64le-37.iad2.fedoraproject.org) -> FAILED: BuildError: error retrieving sources, mock exited with status 28; see root.log for more information
  0 free  0 open  0 done  2 failed

65022826 build (rawhide, /rpms/FlightGear-data.git:0a0a53a0f21c84d8221e07b71528ae4bbce1ae7f) failed

could it be related to #9433 although it doesn't fail exactly at the same stage in the building process (it concerns the same package, whose characteristic is an unusual big source file)

I fired off another build of it, going to try and watch it on the builder to see if I can see anything...

I retried today, with a newer curl version, 7.76.1-1.fc35, and the 15 seconds timeout is still there. It looks strange, because the underlying file transfer seems to work fine (at 70-80 Mbits avg) until the abrupt interrupt.

Hey @kdudka do you have any ideas on this strange issue? curl related or something in our proxy stack?

What I find confusing is that the same source has been successfully fetched, and built a package in the f34 branch (almost at the same time I initially tried and failed with the rawhide branch):

https://koji.fedoraproject.org/koji/buildinfo?buildID=1730641
https://koji.fedoraproject.org/koji/taskinfo?taskID=64927893
and
https://kojipkgs.fedoraproject.org//work/tasks/7893/64927893/root.log
The 'fedpkg sources' command worked fine in that environment.

I tried with a newer version of FlightGear-data for rawhide, and I still have this problem:

[bellet@bonobo FlightGear-data (rawhide %)]$ fedpkg build
Building FlightGear-data-2020.3.9-1.fc35 for rawhide
Created task: 70307923
Task info: https://koji.fedoraproject.org/koji/taskinfo?taskID=70307923
Watching tasks (this may be safely interrupted)...
70307923 build (rawhide, /rpms/FlightGear-data.git:afd507953a0dc29330532f91c755a1a31ac12696): free
70307923 build (rawhide, /rpms/FlightGear-data.git:afd507953a0dc29330532f91c755a1a31ac12696): free -> open (buildvm-ppc64le-06.iad2.fedoraproject.org)
  70307944 buildSRPMFromSCM (/rpms/FlightGear-data.git:afd507953a0dc29330532f91c755a1a31ac12696): open (buildvm-x86-20.iad2.fedoraproject.org)
70307923 build (rawhide, /rpms/FlightGear-data.git:afd507953a0dc29330532f91c755a1a31ac12696): open (buildvm-ppc64le-06.iad2.fedoraproject.org) -> FAILED: BuildError: error retrieving sources, mock exited with status 28; see root.log for more information
  0 free  1 open  0 done  1 failed
  70307944 buildSRPMFromSCM (/rpms/FlightGear-data.git:afd507953a0dc29330532f91c755a1a31ac12696): open (buildvm-x86-20.iad2.fedoraproject.org) -> FAILED: BuildError: error retrieving sources, mock exited with status 28; see root.log for more information
  0 free  0 open  0 done  2 failed

70307923 build (rawhide, /rpms/FlightGear-data.git:afd507953a0dc29330532f91c755a1a31ac12696) failed

It seems that some mechanism (at which level?) cuts off the download between the builder and the lookaside cache when building the src.rpm file, if it takes more than 15 seconds to complete, which inevitably is what happens when downloading more than 1.5GB of data on a gigabit network link...

Yeah, I can't seem to find anything that should be set to 15s... very strange.

The flow should be:

builder -> proxy01 or proxy10 httpd -> pkgs01 httpd

It's all http/1.1, not h2.

And actually those are all vm's but on hosts on 10G...

curl would not return CURLE_OPERATION_TIMEDOUT (28) unless you set the CURLOPT_*TIMEOUT or CURLOPT_LOW_SPEED_{LIMIT,TIME} options. AFAIK, only CURLOPT_CONNECTTIMEOUT is set to 300 seconds by default.

As far as I understand the workflow, the error (Operation timed out after xxx milliseconds...) comes from here
https://github.com/curl/curl/blob/master/lib/transfer.c#L1279
and so we are in the case where duringconnect is false in
https://github.com/curl/curl/blob/master/lib/connect.c#L184
The possibility I see is that a 15s timeout is set per a curl useroption with CURLOPT_TIMEOUT or CURLOPT_TIMEOUT_MS

On the builder, fedpkg sources calls pyrpkg/lookaside.py from python3-rpkg, and could set pycurl options in
https://pagure.io/rpkg/blob/master/f/pyrpkg/lookaside.py#_164

Yeah, I was going to look at rpkg next. ;(

@onosek Do you have any idea what might be going on here?

I tested again this afternoon, and the problem is still there:
https://koji.fedoraproject.org/koji/taskinfo?taskID=72004625

CGILookasideCache::download() in pyrpkg seems to use only this:

            c.setopt(pycurl.LOW_SPEED_LIMIT, 1000)
            c.setopt(pycurl.LOW_SPEED_TIME, 300)

That should not trigger a timeout after 15 secs. I tried to run fedpkg sources for FlightGear-2020.3.9-data.txz on my f35 VM. It took 3 minutes and 10 seconds of wall time without any timeout. Are you sure that builders use pristine Fedora packages of libcurl, pycurl, pyrpkg, and the like?

Looking at https://kojipkgs.fedoraproject.org//work/tasks/4627/72004627/root.log now again, there is another suspicious thing: the progress meter. If i run fedpkg sources locally, I can see the custom progress meter implemented by pyrpkg. However root.log from the task in question captures the built-in progress meter of libcurl.

Yeah, the difference there is likely because the koji build is using 'fedpkg-minimal', perhaps our issue is there...

Yes! curl has a --max-time 15 parameter in /usr/bin/fedpkg-base !

And bingo... thats the problem. ;(

Would someone be willing to submit a PR on it? Should we drop --max-time or just increase it a bunch?

https://pagure.io/fedpkg-minimal is the upstream repo.

According to commit https://pagure.io/fedpkg-minimal/c/85345d7 the timeout was expected in minutes, not in seconds.

Ooops. ;(

ok, so let me push a fedpkg-minimal that has max-time set to 900 in infra so we can at least get this worked around.

ok, there's a patched fedpkg-minimal in rawhide/f35 now, can you re-try?

Awesome. Sorry this took so long to realize where the problem was. ;(

Lets close this now and keep the upstream one to make sure we adjust it upstream...

Metadata Update from @kevin:
- Issue close_status updated to: Upstream
- Issue status updated to: Closed (was: Open)

3 years ago

Log in to comment on this ticket.

Metadata
Boards 1
ops Status: Done
Related Pull Requests
  • #14 Merged 3 years ago