Using the image Fedora-Atomic-26-20170821.0.x86_64.qcow2 with Magnum to deploy a Kubernetes cluster, I found some networking issues, and I'm not sure whether something is wrong with the image.
First of all, if I create a single instance from the above image and SSH into it, networking is fine and as fast as it should be. The problems begin when I use that image with Magnum to deploy a cluster: networking becomes so slow that it is useless. It takes 25 minutes to docker pull even the smallest images from Docker Hub, and it takes 10 seconds to SSH from one minion to another, etc.
I can verify that there are no networking problems in the OpenStack cloud, and all other instances outside of Kubernetes have no such problems.
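For anyone who wants to compare numbers between a standalone instance and a cluster minion, a rough sketch like the one below could be used to time TCP connections to another host. It is only an illustration: the function names, the target address in the usage comment, and the number of attempts are assumptions, and it measures connection setup time only, not download throughput.

```python
import socket
import statistics
import time


def time_tcp_connect(host, port, timeout=30.0):
    """Return the seconds taken to open (and close) one TCP connection."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return time.monotonic() - start


def sample_connect_times(host, port, attempts=5, timeout=30.0):
    """Time several connections and return (min, median, max) in seconds."""
    samples = [time_tcp_connect(host, port, timeout) for _ in range(attempts)]
    return min(samples), statistics.median(samples), max(samples)


# Example usage (hypothetical address; replace with another minion's IP):
#   low, mid, high = sample_connect_times("10.0.0.5", 22)
#   print(f"min={low:.3f}s median={mid:.3f}s max={high:.3f}s")
```

Running the same measurement from an instance outside the cluster and from a minion gives two sets of numbers that can be compared directly.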
There is plenty of RAM and disk space available on the master and on the 2 minions as well.
So I would like to ask whether anyone else has experienced a similar issue.
Some filesystem logs are here: http://paste.openstack.org/show/621099/ and they look good to me.
@dimtheo, hmm - I have not.
@strigazi have you seen anything like this?
I don't think that Magnum has anything to do with this problem. It is a Fedora Atomic host running Kubernetes in system containers and flannel from the host binaries.
Could it be something in the configuration?
I have never seen this, but I'll have more input once we do a scale test with 100s of VMs.
I will be adding a few more compute hosts this week and expanding the Kubernetes cluster via Magnum to 10 nodes. If it is still slow, I will redo the whole Kubernetes cluster from scratch.
I will leave this ticket open.
Thanks @dimtheo - let us know what you find.
@dimtheo - any updates?
I tried 2 Fedora images and recreated the cluster via Magnum at least 20 times. No change.
Performance is still very slow. I also had strigazi take a look, and he confirmed the same.
The hardware behind the Kubernetes cluster is fine, with no hardware errors.
I can think of only one reason why this is happening: I have Magnum Pike on OpenStack Ocata :) and this makes me think the problem comes from these 2 versions trying to coexist.
So I think this has nothing to do with the Fedora image.
I can't reproduce this behavior in our cloud or in my development environment (a full OpenStack cloud running inside a single virtual machine).
Running different versions of OpenStack services side by side is fully compatible and almost standard practice. Magnum and many other OpenStack services are just API services; they don't affect the performance of compute resources.
The problem is caused by this OpenStack bug,
so I will close this issue.
Metadata Update from @dustymabe:
- Issue close_status updated to: Invalid
- Issue status updated to: Closed (was: Open)
- Issue tagged with: F26, bug