The official Fedora cloud images are built with a 3GB disk. testcloud handles these images without issue, but when I started using custom images with a larger disk (10G in this particular case), I hit a few problems.
The first time I try to create an instance with the larger image, I get:
$ testcloud instance create disktest -u file:///home/tflink/taskotron-cloud/f22/20151123-taskotron-f22-10G.qcow2
DEBUG:create instance
DEBUG:Local downloads will be stored in /var/lib/testcloud/cache.
DEBUG:successfully changed SELinux context for image /var/lib/testcloud/cache/20151123-taskotron-f22-10G.qcow2
DEBUG:Creating instance directories
DEBUG:Generated user-data for instance disktest
DEBUG:Generated meta-data for instance disktest
DEBUG:creating seed image /var/lib/testcloud/instances/disktest/disktest-seed.img
libvirt: XML-RPC error : Cannot write data: Transport endpoint is not connected
libguestfs: error: could not connect to libvirt (URI = qemu:///session): Cannot write data: Transport endpoint is not connected [code=38 domain=7]
ERROR:Seed image generation failed. Exiting
Traceback (most recent call last):
  File "/usr/bin/testcloud", line 9, in <module>
    load_entry_point('testcloud==0.1.5', 'console_scripts', 'testcloud')()
  File "/usr/lib/python2.7/site-packages/testcloud/cli.py", line 277, in main
    args.func(args)
  File "/usr/lib/python2.7/site-packages/testcloud/cli.py", line 84, in _create_instance
    tc_instance.prepare()
  File "/usr/lib/python2.7/site-packages/testcloud/instance.py", line 169, in prepare
    self._generate_seed_image()
  File "/usr/lib/python2.7/site-packages/testcloud/instance.py", line 240, in _generate_seed_image
    raise TestcloudInstanceError("Failure during seed image generation")
testcloud.exceptions.TestcloudInstanceError: Failure during seed image generation
If I clean up the instance dir and try again, I've been seeing:
$ testcloud instance create disktest -u file:///home/tflink/taskotron-cloud/f22/20151123-taskotron-f22-10G.qcow2
DEBUG:create instance
DEBUG:Local downloads will be stored in /var/lib/testcloud/cache.
DEBUG:successfully changed SELinux context for image /var/lib/testcloud/cache/20151123-taskotron-f22-10G.qcow2
DEBUG:Creating instance directories
DEBUG:Generated user-data for instance disktest
DEBUG:Generated meta-data for instance disktest
DEBUG:creating seed image /var/lib/testcloud/instances/disktest/disktest-seed.img
INFO:Seed image generated successfully
Formatting '/var/lib/testcloud/instances/disktest/disktest-local.qcow2', fmt=qcow2 size=10737418240 backing_file='/var/lib/testcloud/cache/20151123-taskotron-f22-10G.qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
INFO:Successfully booted your local cloud image!
Traceback (most recent call last):
  File "/usr/bin/testcloud", line 9, in <module>
    load_entry_point('testcloud==0.1.5', 'console_scripts', 'testcloud')()
  File "/usr/lib/python2.7/site-packages/testcloud/cli.py", line 277, in main
    args.func(args)
  File "/usr/lib/python2.7/site-packages/testcloud/cli.py", line 93, in _create_instance
    tc_instance.create_ip_file(vm_ip)
  File "/usr/lib/python2.7/site-packages/testcloud/instance.py", line 294, in create_ip_file
    ip_file.write(ip)
TypeError: expected a character buffer object
Unfortunately, this error is transient - I don't hit it 100% of the time. That being said, with a 30 second delay added to instance.vm_spawn() just before it returns, I haven't seen the second issue at all.
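For what it's worth, the TypeError at the end of the second traceback is what Python 2 raises when file.write() is handed something other than a string. That would be consistent with vm_ip still being None when create_ip_file() is called, but this is an inference from the traceback and not something confirmed here. A minimal sketch of a guard that fails with a clearer message follows; write_ip_file and its arguments are hypothetical stand-ins, not testcloud's actual code.

```python
import os
import tempfile


def write_ip_file(path, ip):
    """Write the instance IP to path, refusing to write a missing IP.

    Hypothetical stand-in for instance.create_ip_file(); not testcloud's API.
    """
    if not ip:
        # Fail with a clear message instead of a TypeError from file.write()
        raise ValueError("no IP address available for the instance yet")
    with open(path, "w") as ip_file:
        ip_file.write(str(ip))


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "ip")
    write_ip_file(target, "192.168.122.10")
    try:
        write_ip_file(target, None)
    except ValueError as err:
        print("refused: %s" % err)
```

A guard like this would not fix the underlying timing problem; it would only make the failure easier to read when the address isn't available yet.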
I suspect both of these issues are related to the large image size - specifically how the operations surrounding that larger image take longer than they would for the official images.
Triage the issue and propose a fix. If the two symptoms listed above do not trace back to the same root cause, file new issues.
After some poking, I have a potential solution.
We can't check for domain state because that turns to 'running' as soon as the domain is created. However, the domain appears to have no interface until after cloud-init has run so we can poll and return once we can find an interface in the domain.
The remaining question is whether we want to delay "boot" completion in testcloud until after the interface exists, or whether we only want to do that for instance creation. I'm leaning towards adding the delay to instance.start() but introducing a flag or config value which would skip the polling (mostly for debug purposes if there are problems).
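A rough sketch of the polling idea with a skip option, to make the proposal concrete. Everything named here is an assumption for illustration: wait_for_interface, its parameters and the get_interfaces callable are hypothetical, and how the domain's interfaces are actually queried is deliberately left out.

```python
import time


def wait_for_interface(get_interfaces, timeout=60, interval=1, skip=False,
                       sleep=time.sleep, clock=time.time):
    """Poll until get_interfaces() returns a non-empty result.

    get_interfaces is a hypothetical callable that returns the domain's
    interfaces, or an empty value while none exist yet.
    Returns the interfaces, or None if skip is set.
    Raises RuntimeError if timeout seconds pass without an interface.
    """
    if skip:
        # Debug escape hatch: return immediately without polling
        return None
    deadline = clock() + timeout
    while True:
        interfaces = get_interfaces()
        if interfaces:
            return interfaces
        if clock() >= deadline:
            raise RuntimeError("no interface found after %s seconds" % timeout)
        sleep(interval)


if __name__ == "__main__":
    # Fake domain that only reports an interface on the third poll
    answers = iter([[], [], ["52:54:00:12:34:56"]])
    print(wait_for_interface(lambda: next(answers), timeout=5, interval=0.01))
```

Bounding the loop with a timeout means a guest that never brings up an interface produces an error rather than a hang, and the skip argument stands in for the proposed flag or config value.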
Thoughts?
> I'm leaning towards adding the delay to instance.start() but introducing a flag or config value which would skip the polling (mostly for debug purposes if there are problems).
That sounds good to me.