Here is the failing task:
https://kojipkgs.fedoraproject.org//work/tasks/1235/64931235/root.log
(The tarball to be fetched is big, 1.8GB)
For future refrence: please provide a link to the top level task. Pointing to just one log makes it difficult to track things down. :)
Metadata Update from @humaton: - Issue tagged with: low-gain, medium-trouble, ops
FWIW, the same package successfully built in branch f34, as FlightGear-data-2020.3.8-1.fc34. Rawhide only seems to have this problem.
Metadata Update from @smooge: - Issue priority set to: Waiting on Assignee (was: Needs Review)
Oddly, I am not seeing a failed rawhide build. ;(
Can you please retry again? or if you like I can try from here.
I retried today, and I still have the problem with curl:
[bellet@bonobo FlightGear-data (rawhide %)]$ fedpkg build Building FlightGear-data-2020.3.8-1.fc35 for rawhide Created task: 65022826 Task info: https://koji.fedoraproject.org/koji/taskinfo?taskID=65022826 Watching tasks (this may be safely interrupted)... 65022826 build (rawhide, /rpms/FlightGear-data.git:0a0a53a0f21c84d8221e07b71528ae4bbce1ae7f): free 65022826 build (rawhide, /rpms/FlightGear-data.git:0a0a53a0f21c84d8221e07b71528ae4bbce1ae7f): free -> open (buildvm-ppc64le-37.iad2.fedoraproject.org) 65022828 buildSRPMFromSCM (/rpms/FlightGear-data.git:0a0a53a0f21c84d8221e07b71528ae4bbce1ae7f): open (buildvm-ppc64le-37.iad2.fedoraproject.org) 65022826 build (rawhide, /rpms/FlightGear-data.git:0a0a53a0f21c84d8221e07b71528ae4bbce1ae7f): open (buildvm-ppc64le-37.iad2.fedoraproject.org) -> FAILED: BuildError: error retrieving sources, mock exited with status 28; see root.log for more information 0 free 1 open 0 done 1 failed 65022828 buildSRPMFromSCM (/rpms/FlightGear-data.git:0a0a53a0f21c84d8221e07b71528ae4bbce1ae7f): open (buildvm-ppc64le-37.iad2.fedoraproject.org) -> FAILED: BuildError: error retrieving sources, mock exited with status 28; see root.log for more information 0 free 0 open 0 done 2 failed 65022826 build (rawhide, /rpms/FlightGear-data.git:0a0a53a0f21c84d8221e07b71528ae4bbce1ae7f) failed
could it be related to #9433 although it doesn't fail exactly at the same stage in the building process (it concerns the same package, whose characteristic is an unusual big source file)
I fired off another build of it, going to try and watch it on the builder to see if I can see anything...
I retried today, with a newer curl version, 7.76.1-1.fc35, and the 15 seconds timeout is still there. It looks strange, because the underlying file transfer seems to work fine (at 70-80 Mbits avg) until the abrupt interrupt.
Hey @kdudka do you have any ideas on this strange issue? curl related or something in our proxy stack?
What I find confusing is that the same source has been successfully fetched, and built a package in the f34 branch (almost at the same time I initially tried and failed with the rawhide branch):
https://koji.fedoraproject.org/koji/buildinfo?buildID=1730641 https://koji.fedoraproject.org/koji/taskinfo?taskID=64927893 and https://kojipkgs.fedoraproject.org//work/tasks/7893/64927893/root.log The 'fedpkg sources' command worked fine in that environment.
I tested again today, and the problem is still there: https://koji.fedoraproject.org/koji/taskinfo?taskID=67228629
I tried with a newer version of FlightGear-data for rawhide, and I still have this problem:
[bellet@bonobo FlightGear-data (rawhide %)]$ fedpkg build Building FlightGear-data-2020.3.9-1.fc35 for rawhide Created task: 70307923 Task info: https://koji.fedoraproject.org/koji/taskinfo?taskID=70307923 Watching tasks (this may be safely interrupted)... 70307923 build (rawhide, /rpms/FlightGear-data.git:afd507953a0dc29330532f91c755a1a31ac12696): free 70307923 build (rawhide, /rpms/FlightGear-data.git:afd507953a0dc29330532f91c755a1a31ac12696): free -> open (buildvm-ppc64le-06.iad2.fedoraproject.org) 70307944 buildSRPMFromSCM (/rpms/FlightGear-data.git:afd507953a0dc29330532f91c755a1a31ac12696): open (buildvm-x86-20.iad2.fedoraproject.org) 70307923 build (rawhide, /rpms/FlightGear-data.git:afd507953a0dc29330532f91c755a1a31ac12696): open (buildvm-ppc64le-06.iad2.fedoraproject.org) -> FAILED: BuildError: error retrieving sources, mock exited with status 28; see root.log for more information 0 free 1 open 0 done 1 failed 70307944 buildSRPMFromSCM (/rpms/FlightGear-data.git:afd507953a0dc29330532f91c755a1a31ac12696): open (buildvm-x86-20.iad2.fedoraproject.org) -> FAILED: BuildError: error retrieving sources, mock exited with status 28; see root.log for more information 0 free 0 open 0 done 2 failed 70307923 build (rawhide, /rpms/FlightGear-data.git:afd507953a0dc29330532f91c755a1a31ac12696) failed
It seems that some mechanism (at which level?) cuts off the download between the builder and the lookaside cache when building the src.rpm file, if it takes more than 15 seconds to complete, which inevitably is what happens when downloading more than 1.5GB of data on a gigabit network link...
Yeah, I can't seem to find anything that should be set to 15s... very strange.
The flow should be:
builder -> proxy01 or proxy10 httpd -> pkgs01 httpd
It's all http/1.1, not h2.
And actually those are all vm's but on hosts on 10G...
curl would not return CURLE_OPERATION_TIMEDOUT (28) unless you set the CURLOPT_*TIMEOUT or CURLOPT_LOW_SPEED_{LIMIT,TIME} options. AFAIK, only CURLOPT_CONNECTTIMEOUT is set to 300 seconds by default.
CURLE_OPERATION_TIMEDOUT (28)
CURLOPT_*TIMEOUT
CURLOPT_LOW_SPEED_{LIMIT,TIME}
CURLOPT_CONNECTTIMEOUT
As far as I understand the workflow, the error (Operation timed out after xxx milliseconds...) comes from here https://github.com/curl/curl/blob/master/lib/transfer.c#L1279 and so we are in the case where duringconnect is false in https://github.com/curl/curl/blob/master/lib/connect.c#L184 The possibility I see is that a 15s timeout is set per a curl useroption with CURLOPT_TIMEOUT or CURLOPT_TIMEOUT_MS
duringconnect
CURLOPT_TIMEOUT
CURLOPT_TIMEOUT_MS
On the builder, fedpkg sources calls pyrpkg/lookaside.py from python3-rpkg, and could set pycurl options in https://pagure.io/rpkg/blob/master/f/pyrpkg/lookaside.py#_164
fedpkg sources
pyrpkg/lookaside.py
Yeah, I was going to look at rpkg next. ;(
@onosek Do you have any idea what might be going on here?
I tested again this afternoon, and the problem is still there: https://koji.fedoraproject.org/koji/taskinfo?taskID=72004625
CGILookasideCache::download() in pyrpkg seems to use only this:
CGILookasideCache::download()
c.setopt(pycurl.LOW_SPEED_LIMIT, 1000) c.setopt(pycurl.LOW_SPEED_TIME, 300)
That should not trigger a timeout after 15 secs. I tried to run fedpkg sources for FlightGear-2020.3.9-data.txz on my f35 VM. It took 3 minutes and 10 seconds of wall time without any timeout. Are you sure that builders use pristine Fedora packages of libcurl, pycurl, pyrpkg, and the like?
FlightGear-2020.3.9-data.txz
Looking at https://kojipkgs.fedoraproject.org//work/tasks/4627/72004627/root.log now again, there is another suspicious thing: the progress meter. If i run fedpkg sources locally, I can see the custom progress meter implemented by pyrpkg. However root.log from the task in question captures the built-in progress meter of libcurl.
pyrpkg
Yeah, the difference there is likely because the koji build is using 'fedpkg-minimal', perhaps our issue is there...
Yes! curl has a --max-time 15 parameter in /usr/bin/fedpkg-base !
--max-time 15
/usr/bin/fedpkg-base
And bingo... thats the problem. ;(
Would someone be willing to submit a PR on it? Should we drop --max-time or just increase it a bunch?
https://pagure.io/fedpkg-minimal is the upstream repo.
According to commit https://pagure.io/fedpkg-minimal/c/85345d7 the timeout was expected in minutes, not in seconds.
Ooops. ;(
ok, so let me push a fedpkg-minimal that has max-time set to 900 in infra so we can at least get this worked around.
ok, there's a patched fedpkg-minimal in rawhide/f35 now, can you re-try?
yes, it works:
https://koji.fedoraproject.org/koji/taskinfo?taskID=72030125
Awesome. Sorry this took so long to realize where the problem was. ;(
Lets close this now and keep the upstream one to make sure we adjust it upstream...
Metadata Update from @kevin: - Issue close_status updated to: Upstream - Issue status updated to: Closed (was: Open)
Log in to comment on this ticket.