#2559 Unable to clone RHQ Git repo
Closed: Fixed None Opened 9 years ago by jsanda.

= phenomenon =
For a little more than 24 hours now, I have not been able to clone the RHQ repo (http://git.fedorahosted.org/git/?p=rhq/rhq.git;a=summary). Others have experienced this issue as well. I have tried both the git and ssh protocols. The output I get is:

Cloning into rhq...
remote: Counting objects: 272706, done.
Write failed: Broken pipe
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed

= reason =

= recommendation =


ok it works for me... so we will need more info:

traceroute git.fedorahosted.org
ssh -v jsanda@git.fedorahosted.org

$ git clone ssh://smooge@git.fedorahosted.org/git/rhq/rhq.git
Initialized empty Git repository in /home/ssmoogen/Sources/rhq/.git/
remote: Counting objects: 272706, done.
remote: Compressing objects: 100% (124420/124420), done.
remote: Total 272706 (delta 129995), reused 191854 (delta 88032)
Receiving objects: 100% (272706/272706), 53.06 MiB | 1.20 MiB/s, done.
Resolving deltas: 100% (129995/129995), done.

I am experiencing this on a Fedora 13 box with git version 1.7.3.4. I also experience the problem on Mac OS X with git version 1.7.0. I am trying to clone into an empty directory. Is there anything I can do to further debug the issue and hopefully gain a little more insight into what is going on?
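One way to get more insight on the client side is to have git write a trace file while cloning with verbose progress output. The sketch below is only an illustration: it builds a tiny throwaway repository as a stand-in so that it runs without network access, and the `work` and `REPO_URL` names are assumptions, not anything from this ticket. For the real case, `REPO_URL` would be the git:// or ssh:// address of the RHQ repo.

```shell
set -eu

# Throwaway stand-in repository (assumption: replaces the real remote).
work="$(mktemp -d)"
git init -q "$work/src"
(
    cd "$work/src"
    echo hello > README
    git add README
    git -c user.name=demo -c user.email=demo@example.invalid commit -q -m "initial"
)

# Hypothetical URL; substitute the real clone address when debugging.
REPO_URL="file://$work/src"

# GIT_TRACE set to an absolute path makes git append its trace messages there.
# -v and --progress force progress reporting; stderr is kept in clone.log.
GIT_TRACE="$work/trace.log" \
    git clone -v --progress "$REPO_URL" "$work/dest" 2> "$work/clone.log"

echo "trace written to $work/trace.log"
```

The trace shows which helper commands git runs and in what order, which helps tell a stalled transfer apart from a client that never got past connection setup.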

I forgot to mention that I was able to clone the repo on Monday(1/10/11) but not yesterday. So it seems whatever happened may have occurred some time between Monday evening and Tuesday 10:00-ish AM EST.

Well, it would seem to be a network issue more than anything with the server. The logs show you getting in and then the connection closing. The directories are readable by your account. I am wondering if there is some firewall or ssh timeout you are running into from RoadRunner at the moment, because it takes a while to check out (and I mean a long while; it is a 981 MB archive).

I agree that this seems to be a networking issue, but I am not convinced that it is just on my end because others are experiencing this as well. I have had two other people on my team reproduce this. One person is in New Jersey and the other person in Czech Republic. We also had someone report this in the community forums at http://community.jboss.org/thread/161100. A little bit ago, I tried cloning the repo from two machines inside the Red Hat network (with no success) where I definitely should have greater bandwidth than I do from my house. Are there any issues that could come into play due to the fact that the repo is nearly 1 GB in size other than what you have already mentioned?

I stand corrected on one thing from my previous comment. I was able to successfully clone the repo from machines inside the Red Hat network. What are our options for compacting our repo?

smooge, can you please provide an update with info on the following:

  • What is the largest repo or at least some larger repos (than RHQ) on fedorahosted? I am trying to get some sense of how big the RHQ repo is relative to others.

  • What if any proxies are sitting between fedorahosted and me?

  • If we need to compact/trim the size of the repo, what are some suggestions on how best to do it so that we can avoid this type of situation in the future?

Thanks

For the record, I'm inside the RH network and it has been about 60 minutes since I've issued my git clone command in my shell and this git clone process is sitting there apparently doing nothing. I have no network traffic that I can see and here's what the git output has looked like for the past hour:

$ git clone git://git.fedorahosted.org/rhq/rhq.git
Initialized empty Git repository in /tmp/rhq/.git/
remote: Counting objects: 272870, done.

  • waiting on du so can't answer size of projects yet.

  • Since you are doing this over ssh, there are no "proxies"; however, every NAT firewall in between is going to have timeout counters on connections. If a connection goes longer than an hour, they will usually disconnect it. You can work around that with the various ssh keepalive options (man ssh_config). The ones I find useful are ServerAliveInterval and TCPKeepAlive. It is black magic, though: sometimes setting TCPKeepAlive to no works better for slow connections and sometimes it does not. ServerAliveInterval should be set so that ssh keeps sending "you alive?" blips, which tell any firewalls in between "hey, keep this connection in cache".

  • I don't have any connections at the moment. Will research and try to answer.
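As a minimal sketch of the keepalive settings mentioned above: the snippet appends a host-specific stanza to an ssh client config file. The values 60 and 5 are illustrative assumptions, not recommendations from this ticket; the only firm requirement is that the interval be shorter than whatever idle timeout the firewalls in between enforce. By default it writes to a scratch file (a hypothetical `SSH_CONFIG` variable) so nothing real is modified.

```shell
set -eu

# Assumption: a scratch file by default. Run with SSH_CONFIG=~/.ssh/config
# to apply the settings to the real client configuration.
SSH_CONFIG="${SSH_CONFIG:-$(mktemp)}"

cat >> "$SSH_CONFIG" <<'EOF'
Host git.fedorahosted.org
# Probe the server after 60 seconds without traffic (illustrative value).
    ServerAliveInterval 60
# Give up after 5 unanswered probes (illustrative value).
    ServerAliveCountMax 5
# TCP-level keepalives; yes is already the default.
    TCPKeepAlive yes
EOF

echo "keepalive stanza appended to $SSH_CONFIG"
```

Scoping the options to one `Host` entry keeps the extra traffic limited to connections to that server.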

some more test results follows:

1) I am not using ssh: protocol - I always used git: in these tests (hence I think that eliminates ssh as the problem). My command is: git clone git://git.fedorahosted.org/rhq/rhq.git

2) I tried this on three machines, all with the same behavior, which will eliminate any possible firewall issues on my local network (see c below):
a) on my Fedora desktop, on the Red Hat internal network via VPN
b) on my Windows XP laptop, NOT on Red Hat internal network, inside my home network
c) on FedoraHosted. I log onto my fedorahosted.org shell (which takes me to people01.fedorahosted.org)

In all three cases, my shell stops at:

remote: Counting objects: 272870, done.

and that's it. I never get any error messages, or any progress messages. It just sits there.

3) Note that a) is on RH internal network and b) is not. That eliminates possible problems stemming from RH network.

Question, as a sanity check: is that git clone command correct? I think it is. I take that git: URL straight from the fedorahosted git webpage here: http://git.fedorahosted.org/git/?p=rhq/rhq.git which says the GIT URL is "git://git.fedorahosted.org/rhq/rhq.git", and that's the one I use in the git clone command. And of course, git clone does at least count the objects from the remote repo, so it's "doing something".

OK, more results: it's been over 20 minutes, and now my fedorahosted.org shell is starting to compress the data. So this looks like it's working, albeit slowly. However, my desktop has been running for over 3 hours and still nothing. My Windows laptop has been running for over 30 minutes.

So, this doesn't eliminate issues on my home network (however, I don't have any firewall on my router, and my windows box has the firewall disabled. I also have been able to clone prior to this week without problems, and I know I haven't changed any firewall settings this week).

I spent time last night and this morning cloning from a machine internal to Red Hat's network and it has been taking upwards of two hours for the clone operation to complete. I understand that RHQ has the fourth largest repo on fedorahosted and as a result cloning will take some time; however, prior to Tuesday, I was able to clone the repo locally from my home network in under an hour.

I did manage to clone the spacewalk repo yesterday, which I believe is comparable in size to the rhq repo. I fully understand that this is not an issue for everyone. In fact, I had a community user successfully clone the repo last night; however, something must have changed between Monday night and Tuesday morning of this week that has resulted in myself and others being unable to clone the repo.

I also added the following to ~/.ssh/config on my local box,

ServerAliveInterval = 7200

The man page for ssh_config indicated that TCPKeepAlive is on by default.

Ok here are the system wide changes and timeline for the system:

1) Thursday Jan 06 updates were applied to system.

2) Tuesday after problem seems to have started, system was rebooted into new kernel.

Looking at the files created from the 10th to the 12th, I see that the following users created N diffs during that time.

      2 ips
      3 jsanda
    132 jshaughn
     96 mazz
     25 pilhuhn

The ips ones pack objects which might be important or not. I am going to have to look and get some outside help on this.

Ok I have done the following in a copied version of the repository.

{{
[root@hosted2 rhq2.git]# git repack --depth=250 --window=250 -a -d -f
Counting objects: 273355, done.
Compressing objects: 100% (212924/212924), done.
Writing objects: 100% (273355/273355), done.
Total 273355 (delta 132916), reused 0 (delta 0)
Removing duplicate objects: 100% (256/256), done.
[root@hosted2 rhq2.git]# du -sc ./*
8 ./branches
8 ./commit-list
8 ./commit-list-prefix
8 ./config
8 ./description
8 ./HEAD
84 ./hooks
32 ./info
72236 ./objects
912 ./refs
73312 total
[root@hosted2 rhq2.git]# du -sc /srv/git/rhq/rhq.git/*

8 /srv/git/rhq/rhq.git/branches
8 /srv/git/rhq/rhq.git/commit-list
8 /srv/git/rhq/rhq.git/commit-list-prefix
8 /srv/git/rhq/rhq.git/config
8 /srv/git/rhq/rhq.git/description
8 /srv/git/rhq/rhq.git/HEAD
84 /srv/git/rhq/rhq.git/hooks
32 /srv/git/rhq/rhq.git/info
1005204 /srv/git/rhq/rhq.git/objects
912 /srv/git/rhq/rhq.git/refs
1006280 total
}}

This makes things a LOT easier to clone: the objects directory shrinks from 1005204 to 72236 in du's units, roughly a 14x reduction. But to confirm for you, please try the following:

time git clone ssh://smooge@git.fedorahosted.org/git/rhq/rhq2.git

If it works, then we can see about what needs to be done to repack the main archive.
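The effect of an aggressive repack like the one above can be tried out on any repository by comparing `git count-objects -v` before and after. The following is a sketch on a small throwaway repository (the `work` directory and the demo commits are assumptions for illustration only); the repack flags are the same ones used on the copied RHQ repo.

```shell
set -eu

# Throwaway repository with a few commits (illustration only).
work="$(mktemp -d)"
git init -q "$work/demo"
cd "$work/demo"
for i in 1 2 3; do
    echo "revision $i" >> file.txt
    git add file.txt
    git -c user.name=demo -c user.email=demo@example.invalid \
        commit -q -m "commit $i"
done

# Object statistics before: everything is still stored as loose objects.
git count-objects -v > "$work/before.txt"

# -a: put all objects in one pack, -d: delete redundant packs and loose copies,
# -f: recompute deltas from scratch, --depth/--window: search harder for deltas.
git repack -a -d -f --depth=250 --window=250 -q

# Object statistics after: loose count drops to zero, one pack remains.
git count-objects -v > "$work/after.txt"

# Integrity check of the repacked object store.
git fsck --full

cat "$work/before.txt" "$work/after.txt"
```

On a repository as small as this demo the size difference is negligible; the point is the workflow of measuring, repacking, and checking integrity, which is best rehearsed on a copy as was done here.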

w00t! Not only was I able to clone your repo, but the operation completed in under 3 minutes. Are you going to try and repack the main repo?

I can do so, but I need to make sure

a) this didn't mess with the main repo.. did you lose anything etc? [it should not but they also say check.]

b) that I can 'close' your repo for a bit so I don't end up with a broken repo when doing it (it will take about an hour to do this).

c) I have your express permission to do this.
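For point a), one possible check is to compare the refs and the reachable objects of the original against the repacked copy; if both lists match, nothing was lost. This is a sketch under stated assumptions: the two repositories below are throwaway stand-ins created on the spot, and the paths are hypothetical, so for the real case the `orig` and `copy` variables would point at the main repo and its repacked copy.

```shell
set -eu

# Throwaway "original" repository (stand-in for the real one).
work="$(mktemp -d)"
git init -q "$work/orig"
(
    cd "$work/orig"
    echo one > a.txt
    git add a.txt
    git -c user.name=demo -c user.email=demo@example.invalid commit -q -m first
    echo two >> a.txt
    git -c user.name=demo -c user.email=demo@example.invalid commit -q -a -m second
)
orig="$work/orig/.git"
copy="$work/copy.git"

# Make a full copy of all refs and objects, then repack only the copy.
git clone -q --mirror "$work/orig" "$copy"
git --git-dir="$copy" repack -a -d -f -q

# List every ref with the object it points to, and every reachable object ID.
for repo in orig copy; do
    eval "dir=\$$repo"
    git --git-dir="$dir" for-each-ref --format='%(objectname) %(refname)' \
        > "$work/refs.$repo"
    git --git-dir="$dir" rev-list --all --objects | cut -d' ' -f1 | sort \
        > "$work/objects.$repo"
done

# cmp exits non-zero (and the script stops) if anything differs.
cmp "$work/refs.orig" "$work/refs.copy"
cmp "$work/objects.orig" "$work/objects.copy"
git --git-dir="$copy" fsck --full
echo "refs and objects match"
```

Comparing object IDs rather than file sizes is the design choice here: a repack changes how objects are stored on disk but must not change which objects exist.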

Provided we have a full backup of the repo that's on the server before you proceed, we should be covered for a). As for b), I can send out an email to the appropriate lists notifying people about the maintenance that is taking place. Considering the situation and since we are outside of normal business hours, temporarily 'closing' the repo should be fine. And on behalf of the RHQ/JBossON team, you have my permission to proceed. The only thing I ask is that you let me know prior to starting so I can send out an email.

Thanks

Item has been repacked and should be functional now.

{{

$ time git clone -v --progress ssh://smooge@git.fedorahosted.org/git/rhq/rhq.git
Initialized empty Git repository in /home/ssmoogen/Sources/rhq/.git/
remote: Counting objects: 273464, done.
remote: Compressing objects: 100% (80036/80036), done.
remote: Total 273464 (delta 132960), reused 273464 (delta 132960)
Receiving objects: 100% (273464/273464), 46.26 MiB | 950 KiB/s, done.
Resolving deltas: 100% (132960/132960), done.

real 1m25.048s
user 0m9.625s
sys 0m3.795s

}}

Outstanding. I just cloned the repo and cloning took less than two minutes. Thank you for the hard work and extra effort on this. We greatly appreciate it. Feel free to close the ticket.
