#7787 Intermittent build failures due to git clone failure
Opened 5 months ago by mbooth. Modified 18 days ago

Two of my modular RPM builds failed today so far:

https://koji.fedoraproject.org/koji/taskinfo?taskID=34828402
https://koji.fedoraproject.org/koji/taskinfo?taskID=34828404

This is an example of the error reported by koji:

$ git clone -n https://src.fedoraproject.org/rpms/jna /var/lib/mock/module-eclipse-2019-03-3120190513095549-5101262b-build-16233253-1162021/root/chroot_tmpdir/scmroot/jna
Cloning into '/var/lib/mock/module-eclipse-2019-03-3120190513095549-5101262b-build-16233253-1162021/root/chroot_tmpdir/scmroot/jna'...
fatal: repository 'https://src.fedoraproject.org/rpms/jna/' not found

I am resubmitting the builds and hoping for the best :-(


This is a bug in how the build gets submitted.
The build is submitted with "SCM URL: git+https://src.fedoraproject.org/rpms/jna?#53cd350fc6b7c458bdff4fd9f7dcb85291648982".
But there needs to a be a ".git" after the repo name.
E.g.: "SCM URL: git+https://src.fedoraproject.org/rpms/jna.git?#53cd350fc6b7c458bdff4fd9f7dcb85291648982"

So I don't know what is submitting this build for you, but if it's MBS (i.e. module), check your modulemd whether it contains the correct git repo url.

@puiterwijk I don't think that's the issue here. All module component builds are submitted from URLs like that, without .git suffix, and they complete just fine. Moreover, overriding SCM URL in modulemd is not allowed in Fedora MBS deployment.

This is a bug in how the build gets submitted.
The build is submitted with "SCM URL: git+https://src.fedoraproject.org/rpms/jna?#53cd350fc6b7c458bdff4fd9f7dcb85291648982".
But there needs to a be a ".git" after the repo name.
E.g.: "SCM URL: git+https://src.fedoraproject.org/rpms/jna.git?#53cd350fc6b7c458bdff4fd9f7dcb85291648982"
So I don't know what is submitting this build for you, but if it's MBS (i.e. module), check your modulemd whether it contains the correct git repo url.

That is not the issue, this build failed one in three times, which means that twice it worked with identical modulemd (hence intermittent)

I don't see any obvious errors in logs. Not sure what could be happening...

Metadata Update from @kevin:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: src.fp.o

5 months ago

Could "eventual consistency" model of repoSpanner be the cause of this? Koji builder could've pulled from a repoSpanner node that hasn't received the commit yet, even though the push succeeded (majority of nodes received the commit).

@mizdebsk I don't think it could be. src.fedoraproject.org does not have repoSpanner. If this were happening on src.stg.fedoraproject.org then I would say it was possible.

@mbooth are you still seeing this?

@mbooth are you still seeing this?

Yes I saw it again this week, although I don't have the link to the build at hand because I was in a rush.

This still happens.
Example: https://koji.fedoraproject.org/koji/taskinfo?taskID=35125115

$ git clone -n https://src.fedoraproject.org/rpms/junit5 /var/lib/mock/module-javapackages-tools-201901-2820190529110025-f469ab01-build-16375846-1171750/root/chroot_tmpdir/scmroot/junit5
Cloning into '/var/lib/mock/module-javapackages-tools-201901-2820190529110025-f469ab01-build-16375846-1171750/root/chroot_tmpdir/scmroot/junit5'...
fatal: repository 'https://src.fedoraproject.org/rpms/junit5/' not found

The same issue just happened to me, outsides of Koji, when I tried to pull from an existing, already cloned repository:

$ git pull
fatal: repository 'https://src.fedoraproject.org/forks/mbi/rpms/velocity.git/' not found

Another occurance today:

https://koji.fedoraproject.org/koji/taskinfo?taskID=35408245

$ git clone -n https://src.fedoraproject.org/rpms/jackson-modules-base /var/lib/mock/module-eclipse-2019-03-3020190608151621-bd986a65-build-16503575-1182591/root/chroot_tmpdir/scmroot/jackson-modules-base
Cloning into '/var/lib/mock/module-eclipse-2019-03-3020190608151621-bd986a65-build-16503575-1182591/root/chroot_tmpdir/scmroot/jackson-modules-base'...
fatal: repository 'https://src.fedoraproject.org/rpms/jackson-modules-base/' not found

It's keeping me on my toes; module builds require serious babysitting:

https://koji.fedoraproject.org/koji/taskinfo?taskID=35521528

BuildError: Error running GIT command "git clone -n https://src.fedoraproject.org/rpms/protobuf /var/lib/mock/module-eclipse-2019-06-2920190613133453-72bf7ed8-build-16546230-1188136/root/chroot_tmpdir/scmroot/protobuf", see checkout.log for details

So, looking at the most recent case of this, I can see pagure on src returning the 404...

10.5.126.52 - - [21/Jul/2019:09:25:20 +0000] "GET /rpms/rust-regex-syntax/info/refs?service=git
-upload-pack HTTP/1.1" 200 534 "-" "git/2.20.1"
10.5.126.52 - - [21/Jul/2019:09:25:21 +0000] "POST /rpms/rust-regex-syntax/git-upload-pack HTTP
/1.1" 404 233 "-" "git/2.20.1"

Then later on the next build it's normal:
10.5.126.51 - - [21/Jul/2019:10:11:55 +0000] "GET /rpms/rust-regex-syntax/info/refs?service=git
-upload-pack HTTP/1.1" 200 534 "-" "git/2.20.1"
10.5.126.51 - - [21/Jul/2019:10:11:56 +0000] "POST /rpms/rust-regex-syntax/git-upload-pack HTTP
/1.1" 200 20149 "-" "git/2.20.1"

@pingou any ideas on this?

It would be nice if someone can look at this. If I rebuild my Eclipse module I have to babysit it closely because it can fail half a dozen times over the course of ~130 package builds:

https://koji.fedoraproject.org/koji/taskinfo?taskID=36757760

BuildError: Error running GIT command "git clone -n https://src.fedoraproject.org/rpms/eclipse-m2e-buildhelper /var/lib/mock/module-eclipse-2019-06-3020190801230108-f19b8f68-build-17119396-1229162/root/chroot_tmpdir/scmroot/eclipse-m2e-buildhelper", see checkout.log for details

I'm already breaking my back trying to keep Eclipse in the distro, having to babysit module builds seems like such a waste of time!

@kevin @pingou How can I best escalate this issue?

@kevin @pingou How can I best escalate this issue?

You just did :)

@kevin @pingou How can I best escalate this issue?

You just did :)

Any progress?

Just trying to do some rebuilds today: Kablammo! https://koji.fedoraproject.org/koji/taskinfo?taskID=37353518

Sadly no. ;( I too have seen this just hitting various src.fedoraproject.org pages...

But looking at this one, I see in logs:

10.5.126.52 - - [30/Aug/2019:13:29:19 +0000] "POST /rpms/jnr-x86asm/git-upload-pack HTTP/1.1" 404 233 "-" "git/2.20.1"

But thats it. Nothing else anywhere.

@pingou can you think of any way to track this down? Could we add some debugging somewhere?

So https://koji.fedoraproject.org/koji/taskinfo?taskID=37407001 fails to clone json_simple, so I wrote a script (attached) that does a git clone -n https://src.fedoraproject.org/rpms/json_simple 500 times.

It completed without error on my machine (took ~14 minutes)
It completed without error from fedorapeople (took ~9 minutes)
It errored from the batcave at the run 138, so there seems to be something network/location related.

debug_dist_git.py

I've added a little debugging and of course I just ran the script twice without error :)

Fun, I keep on trying to reproduce this and I cannot anymore :(

Could you let me know once you could reproduce it again?

Fun, I keep on trying to reproduce this and I cannot anymore :(
Could you let me know once you could reproduce it again?

Yes, of course.

Well that's not helping:

grep nekohtml /var/log/httpd/error_log
[Wed Sep 04 08:09:20.555500 2019] [:error] [pid 21808] 2019-09-04 08:09:20,555 [INFO] pagure.ui.clone: None proxying to raw git cloning project: Project(10738, name:nekohtml, namespace:rpms, url:None, is_fork:False,                parent_id:None)
[Wed Sep 04 08:09:20.555880 2019] [:error] [pid 21808] 2019-09-04 08:09:20,555 [INFO] pagure.ui.clone: Calling: [u'/usr/bin/git', u'http-backend'] - /rpms/nekohtml.git/info/refs
[Wed Sep 04 08:09:20.572544 2019] [:error] [pid 21808] 2019-09-04 08:09:20,572 [INFO] pagure.ui.clone: Headers: {'expires': 'Fri, 01 Jan 1980 00:00:00 GMT', 'content-type': 'application/x-git-upload-pack-advertisement', 'pragma': 'no-cache', 'cache-control': 'no-cache, max-age=0, must-revalidate'} - /rpms/nekohtml.git/info/refs
[Wed Sep 04 11:23:47.653830 2019] [:error] [pid 8989] 2019-09-04 11:23:47,653 [INFO] pagure.ui.clone: None proxying to raw git cloning project: Project(10738, name:nekohtml, namespace:rpms, url:None, is_fork:False,                parent_id:None)
[Wed Sep 04 11:23:47.653916 2019] [:error] [pid 8989] 2019-09-04 11:23:47,653 [INFO] pagure.ui.clone: None proxying to raw git cloning project: Project(10738, name:nekohtml, namespace:rpms, url:None, is_fork:False,                parent_id:None)
[Wed Sep 04 11:23:47.654176 2019] [:error] [pid 8989] 2019-09-04 11:23:47,654 [INFO] pagure.ui.clone: Calling: [u'/usr/bin/git', u'http-backend'] - /rpms/nekohtml/info/refs
[Wed Sep 04 11:23:47.654262 2019] [:error] [pid 8989] 2019-09-04 11:23:47,654 [INFO] pagure.ui.clone: Calling: [u'/usr/bin/git', u'http-backend'] - /rpms/nekohtml/info/refs
[Wed Sep 04 11:23:47.670996 2019] [:error] [pid 8989] 2019-09-04 11:23:47,670 [INFO] pagure.ui.clone: Headers: {'expires': 'Fri, 01 Jan 1980 00:00:00 GMT', 'content-type': 'application/x-git-upload-pack-advertisement', 'pragma': 'no-cache', 'cache-control': 'no-cache, max-age=0, must-revalidate'} - /rpms/nekohtml/info/refs
[Wed Sep 04 11:23:47.671102 2019] [:error] [pid 8989] 2019-09-04 11:23:47,670 [INFO] pagure.ui.clone: Headers: {'expires': 'Fri, 01 Jan 1980 00:00:00 GMT', 'content-type': 'application/x-git-upload-pack-advertisement', 'pragma': 'no-cache', 'cache-control': 'no-cache, max-age=0, must-revalidate'} - /rpms/nekohtml/info/refs
[Wed Sep 04 11:23:47.785337 2019] [:error] [pid 6172] 2019-09-04 11:23:47,785 [INFO] pagure.ui.clone: None proxying to raw git cloning project: Project(10738, name:nekohtml, namespace:rpms, url:None, is_fork:False,                parent_id:None)
[Wed Sep 04 11:23:47.785761 2019] [:error] [pid 6172] 2019-09-04 11:23:47,785 [INFO] pagure.ui.clone: Calling: [u'/usr/bin/git', u'http-backend'] - /rpms/nekohtml/git-upload-pack
[Wed Sep 04 11:23:47.800302 2019] [:error] [pid 6172] 2019-09-04 11:23:47,799 [INFO] pagure.ui.clone: Headers: {'expires': 'Fri, 01 Jan 1980 00:00:00 GMT', 'content-type': 'application/x-git-upload-pack-result', 'pragma': 'no-cache', 'cache-control': 'no-cache, max-age=0, must-revalidate'} - /rpms/nekohtml/git-upload-pack

None of the lines seem to correspond to the build which according to koji failed at: Completed Wed, 04 Sep 2019 08:57:27 UTC.

:(

I'm going to add more debugging which will make the logs very verbose/large but at least it should give us some more tracking info for the next time this happens.

Didn't happen again?

I am currently unable to build anything because of a bad interaction of this bug:

https://pagure.io/fm-orchestrator/issue/1243

With the resolution of this issue:

https://pagure.io/fedora-infrastructure/issue/7862

I am waiting for a infra fix to be deployed before I can proceed:

https://lists.fedoraproject.org/archives/list/infrastructure@lists.fedoraproject.org/thread/LCJO7Q7HUFOOIHDHZOOLEGPVLU43B3OD/

Rest assured I will report here when I finally get another occurrence ;-)

@mbooth sorry I was AFK all of last week (and will be again all of next week :s)

https://koji.fedoraproject.org/koji/taskinfo?taskID=37717036

I see something in the logs:

[Wed Sep 18 10:16:49.077861 2019] [:error] [pid 12741] 2019-09-18 10:16:49,077 [INFO] pagure.ui.clone: Receive a cloning request for project: xmlrpc, username: None, namespace: rpms
[Wed Sep 18 10:16:49.083516 2019] [:error] [pid 12741] 2019-09-18 10:16:49,083 [INFO] pagure.ui.clone: None proxying to raw git cloning project: Project(23516, name:xmlrpc, namespace:rpms, url:None, is_fork:False,                parent_id:None)
[Wed Sep 18 10:16:49.083817 2019] [:error] [pid 12741] 2019-09-18 10:16:49,083 [INFO] pagure.ui.clone: Calling: [u'/usr/bin/git', u'http-backend'] - /rpms/xmlrpc/info/refs
[Wed Sep 18 10:16:49.099123 2019] [:error] [pid 12741] 2019-09-18 10:16:49,098 [INFO] pagure.ui.clone: Headers: {'expires': 'Fri, 01 Jan 1980 00:00:00 GMT', 'content-type': 'application/x-git-upload-pack-advertisement', 'pragma': 'no-cache', 'cache-control': 'no-cache, max-age=0, must-revalidate'} - /rpms/xmlrpc/info/refs
[Wed Sep 18 10:16:49.213392 2019] [:error] [pid 6379] 2019-09-18 10:16:49,213 [INFO] pagure.ui.clone: Receive a cloning request for project: xmlrpc, username: None, namespace: rpms
[Wed Sep 18 10:16:49.213477 2019] [:error] [pid 6379] 2019-09-18 10:16:49,213 [INFO] pagure.ui.clone: Receive a cloning request for project: xmlrpc, username: None, namespace: rpms
[Wed Sep 18 10:16:49.218894 2019] [:error] [pid 6379] 2019-09-18 10:16:49,218 [INFO] pagure.ui.clone: None proxying to raw git cloning project: Project(23516, name:xmlrpc, namespace:rpms, url:None, is_fork:False,                parent_id:None)
[Wed Sep 18 10:16:49.218964 2019] [:error] [pid 6379] 2019-09-18 10:16:49,218 [INFO] pagure.ui.clone: None proxying to raw git cloning project: Project(23516, name:xmlrpc, namespace:rpms, url:None, is_fork:False,                parent_id:None)
[Wed Sep 18 10:16:49.219307 2019] [:error] [pid 6379] 2019-09-18 10:16:49,219 [INFO] pagure.ui.clone: Calling: [u'/usr/bin/git', u'http-backend'] - /rpms/xmlrpc/git-upload-pack
[Wed Sep 18 10:16:49.219364 2019] [:error] [pid 6379] 2019-09-18 10:16:49,219 [INFO] pagure.ui.clone: Calling: [u'/usr/bin/git', u'http-backend'] - /rpms/xmlrpc/git-upload-pack
[Wed Sep 18 10:16:49.231751 2019] [:error] [pid 6379] 2019-09-18 10:16:49,231 [INFO] pagure.ui.clone: Headers: {'expires': 'Fri, 01 Jan 1980 00:00:00 GMT', 'content-type': 'application/x-git-upload-pack-result', 'pragma': 'no-cache', 'cache-control': 'no-cache, max-age=0, must-revalidate'} - /rpms/xmlrpc/git-upload-pack
[Wed Sep 18 10:16:49.231841 2019] [:error] [pid 6379] 2019-09-18 10:16:49,231 [INFO] pagure.ui.clone: Headers: {'expires': 'Fri, 01 Jan 1980 00:00:00 GMT', 'content-type': 'application/x-git-upload-pack-result', 'pragma': 'no-cache', 'cache-control': 'no-cache, max-age=0, must-revalidate'} - /rpms/xmlrpc/git-upload-pack

But the timestamp is 10:16 while in koji it says that the task was created at 10:19 UTC, so three minutes later.

The next entry matching xmlrpc is at 10:21 and is about the trac-xmlrpc-plugin package.

Task failed at Thu, 26 Sep 2019 14:07:00 UTC

Logs show:

# grep eclipse-mpc /var/log/httpd/error_log
[Thu Sep 26 07:19:55.245169 2019] [:error] [pid 7094] 2019-09-26 07:19:55,245 [INFO] pagure.ui.clone: Receive a cloning request for project: eclipse-mpc, username: None, namespace: rpms
[Thu Sep 26 07:19:55.245253 2019] [:error] [pid 7094] 2019-09-26 07:19:55,245 [INFO] pagure.ui.clone: Receive a cloning request for project: eclipse-mpc, username: None, namespace: rpms
[Thu Sep 26 07:19:55.254775 2019] [:error] [pid 7094] 2019-09-26 07:19:55,254 [INFO] pagure.ui.clone: None proxying to raw git cloning project: Project(3081, name:eclipse-mpc, namespace:rpms, url:None, is_fork:False,                parent_id:None)
[Thu Sep 26 07:19:55.254840 2019] [:error] [pid 7094] 2019-09-26 07:19:55,254 [INFO] pagure.ui.clone: None proxying to raw git cloning project: Project(3081, name:eclipse-mpc, namespace:rpms, url:None, is_fork:False,                parent_id:None)
[Thu Sep 26 07:19:55.255052 2019] [:error] [pid 7094] 2019-09-26 07:19:55,254 [INFO] pagure.ui.clone: Calling: [u'/usr/bin/git', u'http-backend'] - /rpms/eclipse-mpc.git/info/refs
[Thu Sep 26 07:19:55.255095 2019] [:error] [pid 7094] 2019-09-26 07:19:55,254 [INFO] pagure.ui.clone: Calling: [u'/usr/bin/git', u'http-backend'] - /rpms/eclipse-mpc.git/info/refs
[Thu Sep 26 07:19:55.269350 2019] [:error] [pid 7094] 2019-09-26 07:19:55,269 [INFO] pagure.ui.clone: Headers: {'expires': 'Fri, 01 Jan 1980 00:00:00 GMT', 'content-type': 'application/x-git-upload-pack-advertisement', 'pragma': 'no-cache', 'cache-control': 'no-cache, max-age=0, must-revalidate'} - /rpms/eclipse-mpc.git/info/refs
[Thu Sep 26 07:19:55.269429 2019] [:error] [pid 7094] 2019-09-26 07:19:55,269 [INFO] pagure.ui.clone: Headers: {'expires': 'Fri, 01 Jan 1980 00:00:00 GMT', 'content-type': 'application/x-git-upload-pack-advertisement', 'pragma': 'no-cache', 'cache-control': 'no-cache, max-age=0, must-revalidate'} - /rpms/eclipse-mpc.git/info/refs

And in the access log:

# grep eclipse-mpc /var/log/httpd/access_log
 - - [26/Sep/2019:05:12:21 +0000] "GET /api/0/rpms/eclipse-mpc HTTP/1.1" 200 696 "-" "python-requests/2.20.0"
 - - [26/Sep/2019:07:19:55 +0000] "GET /rpms/eclipse-mpc.git/info/refs?service=git-upload-pack HTTP/1.1" 200 1426 "-" "git/2.17.0"
 - - [26/Sep/2019:07:54:12 +0000] "GET /api/0/rpms/eclipse-mpc HTTP/1.1" 200 696 "-" "python-requests/2.20.0"
 - - [26/Sep/2019:10:34:39 +0000] "GET /api/0/rpms/eclipse-mpc HTTP/1.1" 200 696 "-" "python-requests/2.20.0"
 - - [26/Sep/2019:13:15:46 +0000] "GET /api/0/rpms/eclipse-mpc HTTP/1.1" 200 696 "-" "python-requests/2.20.0"
 - - [26/Sep/2019:14:06:33 +0000] "GET /rpms/eclipse-mpc/info/refs?service=git-upload-pack HTTP/1.1" 404 233 "-" "git/2.20.1"

So it's not network related as the request is seen in the access log, but not debugging info show in error_log.

I've added a little more debugging so let's see

After much trials and error I think I've found the issue, but not entirely the cause.

I'd be interested to validate my finding next time this happens :)

Login to comment on this ticket.

Metadata
Attachments 1
Attached a month ago View Comment