#648 Improving contributor experience with virt-install --cloud-init
Closed: Fixed 3 years ago by kparal. Opened 3 years ago by sumantrom.

For some time now, virt-install has had the capability to run local qcow2 images [0], which gives us an opportunity to evaluate whether we still want to rely on testcloud.
A lot of regular validation tests for virtualization are being run on KVM and it might be good to have the local cloud test use virt-install.
@kparal and I discussed this and figured we should go with virt-install's --cloud-init capabilities for the upcoming testing of base cloud qcow2 images.
It would be great if people could vote on this, and I will make the relevant changes if everyone agrees.

[0] https://blog.wikichoon.com/2020/09/virt-install-cloud-init.html


Well... why?

Testcloud handles downloading the image, preparing it with cloud init and booting it up... everything in a single command.

Apart from that, @sumantrom have you seen how virt-install command looks? You have to specify a ton of parameters and options there. For testcloud, you need only link to qcow2 image...

And for ssh, I am almost sure you won't get ip address for ssh displayed after virt-install and you need to find out. Testcloud displays that nicely after booting the image up.

A lot of regular validation tests for virtualization are being run on KVM and it might be good to have the local cloud test use virt-install.

Testcloud is also using KVM/libvirt.

Also, testcloud is used by tmt/FedoraCI, I think we should keep using it.

Also, testcloud is used by tmt/FedoraCI, I think we should keep using it.

I asked Sumantro to file this ticket. This is not about dropping the testcloud project, that's a misunderstanding. Here we're talking just about replacing testcloud instructions in Fedora QA's wiki matrices, in particular:
https://fedoraproject.org/wiki/Template:Cloud_test_matrix#Cloud_Provider_Setup
(see Testcloud Local box)

Testcloud handles downloading the image, preparing it with cloud init and booting it up... everything in a single command.
Apart from that, @sumantrom have you seen how virt-install command looks? You have to specify a ton of parameters and options there. For testcloud, you need only link to qcow2 image...
And for ssh, I am almost sure you won't get ip address for ssh displayed after virt-install and you need to find out. Testcloud displays that nicely after booting the image up.

All of this is good feedback. I'll try virt-install --cloud-init myself, to compare the experience. In general I think that cmdline options are not a problem, because they can be specified in the testcase/wiki guide. Downloading the image is a trifle. But showing the machine IP address can be a big help for people. If you see any more differences, please add them, thanks!

Overall I'd like to move away from testcloud at least in our release validation testcases. Testcloud has never had a good user experience nor at least basic 'set up' documentation. If we can move to a tool that is well-known and well maintained, I see that as a very positive thing. You don't need to figure out how to control virt-install or virt-manager, because most people already know. There is lots of documentation available. You don't need to figure out where the downloaded image went and how to delete it, because you do it yourself. The VM instance is stored in its usual location, alongside all your other VMs. There's no more manpage to study (oh wait, testcloud doesn't have a manpage).

This might also allow us to stop developing/maintaining testcloud, if we choose to do so (yes, tmt currently uses it, but they can take over the project or convert to virt-install as well -- I'm not saying we need to do this right now, and it is out of scope of this ticket, but it allows us to do it if we want, because there will be one less dependency). The less custom stuff we have and the more upstream projects we use, the better. We can create a separate ticket for this discussion in the future, but let's keep this one just related to release validation testcases.

I'll play with virt-install and post my impressions afterwards.

Testcloud handles downloading the image, preparing it with cloud init and booting it up... everything in a single command.

One more thing. It's great how testcloud is powerful, but as you can see in #647, it doesn't really matter, if people are unable to use it. And that's the purpose of this ticket. Either convert to something better, or improve the tool or documentation so that release validation contributors can actually use testcloud. I know that I grind my teeth every time I try to use (and configure) testcloud and that's not a good sign.

I think we can improve the documentation first. I will gladly do it.

I reported an annoying problem in virt-install --cloud-init here:
https://github.com/virt-manager/virt-manager/issues/178

I believe we want to wait for it to be resolved before we start recommending virt-install --cloud-init to people.

I believe we want to wait for it to be resolved before we start recommending virt-install --cloud-init to people.

I don't think we would start recommending virt-install --cloud-init to people. However, no harm done in improving virt-install experience :)

There are more QoL improvements coming to testcloud and I hope we'll have some discussion about this in test mailing list before making any changes to matrices.

I played with virt-install --cloud-init extensively. The annoying bug is fixed now (there's still one little nitpick remaining), and it works quite well. I think virt-install is easier to use out of the box, because by default it uses the qemu session scope, and if you decide to use the qemu system scope, it asks you for authorization (no need to assign yourself to a group). It also has superior documentation and is widely known. Testcloud has a simpler command line because it's tailored towards the Fedora Cloud use case, it's not a generic tool. It seems both have some ups and downs, and Frantisek really loves testcloud, so let's refer to both tools, there should be no harm in that.

I moved the testcloud wiki page to a better name:
https://fedoraproject.org/wiki/Testcloud_quickstart_guide
and got rid of the newgrp command that was mentioned. I don't intend to work on improving those instructions any more, let's leave that to the testcloud fanbase ;-)

I also created virt-install --cloud-init instructions page:
https://fedoraproject.org/wiki/QA:Local_cloud_testing_with_virt-install

The page is quite verbose, but I wanted to mention the most common actions and also some corner cases. I decided to go with the default 'fedora' user that is present on Fedora Cloud images, instead of having virt-install generate the root user password automatically, which also added more steps. If we wanted, we could create a few lines in bash which would basically do the same as testcloud does now (download the image, prepare the cloud-init config and boot the machine), but I think there's not much need for that. It's still basically a simple two-step process for anyone performing this regularly.
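For illustration, the "few lines in bash" mentioned above might look roughly like the sketch below. It is not taken from the wiki guide: the image URL, working directory, VM name, memory size and the contents of the user-data file are hypothetical placeholders. By default the script only prints the download and virt-install commands; it runs them only when DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch of a testcloud-like wrapper around virt-install --cloud-init.
# Assumptions (not from the wiki guide): IMAGE_URL, WORKDIR, VM_NAME, the
# memory size and the user-data contents are hypothetical placeholders.

IMAGE_URL="${IMAGE_URL:-https://example.org/images/Fedora-Cloud-Base.qcow2}"
WORKDIR="${WORKDIR:-$HOME/cloud-test}"
VM_NAME="${VM_NAME:-fedora-cloud-test}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = really run them

run() {
    # Echo the command, and execute it only when DRY_RUN=0.
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

write_user_data() {
    # Give the image's default user a password (placeholder value).
    cat > "$1" <<'EOF'
#cloud-config
password: changeme
chpasswd: { expire: false }
ssh_pwauth: true
EOF
}

local_cloud_test() {
    local image="$WORKDIR/${IMAGE_URL##*/}"
    local userdata="$WORKDIR/cloudinit-user-data.yaml"

    mkdir -p "$WORKDIR" || return 1
    # Step 1: download the image.
    run curl -L -o "$image" "$IMAGE_URL" || return 1
    # Step 2: prepare the cloud-init config.
    write_user_data "$userdata" || return 1
    # Step 3: import the image and boot it.
    run virt-install --name "$VM_NAME" --memory 2048 \
        --os-variant detect=on,name=fedora-unknown \
        --import --disk "$image" \
        --cloud-init "user-data=$userdata"
}
```

Calling `local_cloud_test` with the defaults writes the user-data file and prints the two commands, so they can be reviewed before rerunning with `DRY_RUN=0`.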

And as the last step, I linked both guides from the Cloud test matrix:
https://fedoraproject.org/w/index.php?title=Template%3ACloud_Setup&type=revision&diff=595272&oldid=557997
which is visible here:
https://fedoraproject.org/wiki/Template:Cloud_test_matrix#Cloud_Provider_Setup

Feedback welcome. If there's no feedback, I'll close this ticket in a few days.

Metadata Update from @kparal:
- Issue assigned to kparal

3 years ago

I tweaked the language in the matrix template very slightly, otherwise that all sounds good.

I played with virt-install --cloud-init extensively. The annoying bug is fixed now (there's still one little nitpick remaining), and it works quite well. I think virt-install is easier to use out of the box, because by default it uses the qemu session scope, and if you decide to use the qemu system scope, it asks you for authorization (no need to assign yourself to a group).

I don't think so. Take a look at the wiki pages you've mentioned [ https://fedoraproject.org/wiki/QA:Local_cloud_testing_with_virt-install ] vs [ https://fedoraproject.org/wiki/Testcloud_quickstart_guide ] . It doesn't seem to me that virt-install is an easier thing to do. What's the issue with adding yourself to the group? Testcloud tells you exactly what you need to type in... it's just one command compared to creating cloudinit-user-data.yaml, copy-pasting long and cryptic commands to your shell and then trying to find IP address of the VM you've just created.

Using testcloud to create a vm from scratch... it's just three simple commands.

How is virt-install easier :) ?

(Also, Testcloud handles downloading the image and importing it, so you don't have to track where the downloaded image is. )

It also has superior documentation and is widely known.

Huh? Testcloud has complete documentation and manual file, for all of its options and use cases.

I moved the testcloud wiki page to a better name:
https://fedoraproject.org/wiki/Testcloud_quickstart_guide
and got rid of the newgrp command that was mentioned. I don't intend to work on improving those instructions any more, let's leave that to the testcloud fanbase ;-)

Anything you'd want to improve there? It seems it has everything you need, I might go ahead and add instance stop/instance remove commands, but that's it.

Feedback welcome. If there's no feedback, I'll close this ticket in a few days.

I am okay with having two options to test cloud mentioned in the matrix. However, testcloud case might be the first one/maybe in bold to direct users to the easier and more straightforward route?

What about others? @lruzicka

Take a look at the wiki pages you've mentioned

I already covered that in my previous comment. If I wanted, I could have written the virt-install guide as short as the testcloud guide. But I wanted to write a better guide.

It doesn't seem to me that virt-install is an easier thing to do

The commands are definitely longer. Many of those options don't need to be there. --os-variant detect=on,name=fedora-unknown is there just to tune performance and some other tweaks (testcloud misses that tweak, so in this case it's doing a worse job here). --noreboot is there just temporarily until this is fixed (if you don't mind a reboot after the first shutdown, you don't need it). And user-data="/path/to/cloudinit-user-data.yaml" doesn't need to be there either; without it you'll get a functional root account generated (I already mentioned that). So yes, it can be simplified a lot, but I decided to go with the long version, because the extra options have some benefits.
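To make the comparison concrete, here is a rough sketch of the short and long forms being discussed. The VM name, memory size and image path are hypothetical, and the option set is reconstructed from this comment rather than copied from the wiki page. The function only prints the command line. Depending on the virt-install version, some of the options dropped in the short form (for example --os-variant) may still be required.

```shell
# Sketch: print the short or the long virt-install invocation.
# VM name, memory size and paths are hypothetical placeholders.
virt_install_cmd() {
    mode="$1"    # "short" or "long"
    image="$2"   # path to the downloaded qcow2 image
    base="virt-install --name cloud-test --memory 2048 --import --disk $image"
    if [ "$mode" = "long" ]; then
        # Long form: OS tuning, no automatic reboot, custom user-data file.
        echo "$base --os-variant detect=on,name=fedora-unknown --noreboot --cloud-init user-data=/path/to/cloudinit-user-data.yaml"
    else
        # Short form: bare --cloud-init, which generates a root password.
        echo "$base --cloud-init"
    fi
}
```

For example, `virt_install_cmd short ~/Downloads/image.qcow2` prints the minimal command, and replacing `short` with `long` prints the variant with the three optional pieces added.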

But if I think about the actual process of testing Cloud images, in both cases you basically launch a single command. Sure, you might remember the testcloud one by heart, and you'll likely need to invoke the virt-install one from bash history or copy it from the wiki, but the "difficulty" seems the same.
If you want to create the cloudinit config each time and delete it afterwards, then it's an extra step, yes (I'm going to keep it stored in my ISO directory, so it's a one-time action for me).

What's the issue with adding yourself to the group?

There's no issue. But virt-install gives you two options (system and session scope), and for neither do you need to add yourself to a group.
I'm not saying it's a big issue, it's not.

and then trying to find IP address of the VM you've just created.

You don't need to, you're automatically connected to the serial console. You can even see the boot messages, unlike with testcloud, and so you can spot if something goes wrong, like a frozen boot process. So that's a plus for virt-install. At the same time, testcloud gives you the IP automatically without running any command, so that's a plus for testcloud.

(Also, Testcloud handles downloading the image and importing it, so you don't have to track where the downloaded image is. )

That's a matter of taste, I guess. I prefer explicit to implicit, and if a tool wastes my disk space because it stores large data silently and fails to prune them automatically, I get angry at it :-) (That's what testcloud does). Also, the process of downloading the ISO you want to test is everywhere in all our test cases, and so I don't see this as a huge benefit or anything. If you have different preferences, sure.

Huh? Testcloud has complete documentation and manual file, for all of its options and use cases.

Well, most of the "complete documentation" got created very recently (in the last few weeks) as a direct response to me looking at virt-install. It had terrible documentation for years. So don't Huh me ;-)

It's great that testcloud documentation improved. I hoped we wouldn't need to spend our energy on that, but I'm not going to rehash my past remarks.

Anything you'd want to improve there? It seems it has everything you need, I might go ahead and add instance stop/instance remove commands, but that's it.

It doesn't say how to stop the machine, start it again if needed, remove it, and reclaim the disk space of the ISO. I think it could also mention that people can operate it through virt-manager or virsh, if they're familiar with that. It's sometimes needed for debugging. And if I remember correctly, there was some problem when trying to use it from virt-manager, the VM was missing a graphics adapter or something. I just remember it was a big pain when there was some boot or cloud-init issue.
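As a rough illustration of the virsh route mentioned here, the sketch below prints the usual libvirt commands for listing, stopping, starting, attaching to the serial console of, and removing a VM. The VM name and connection URI are assumptions, and this thread does not confirm how well testcloud-created VMs behave with each of them.

```shell
# Sketch: print the virsh commands for common lifecycle actions.
# The URI and VM name are assumptions; check "virsh list --all" for real names.
vm_lifecycle_cmds() {
    uri="$1"    # e.g. qemu:///session or qemu:///system
    name="$2"   # libvirt domain name
    echo "virsh --connect $uri list --all"
    echo "virsh --connect $uri shutdown $name"
    echo "virsh --connect $uri start $name"
    echo "virsh --connect $uri console $name"
    # Careful: this also deletes the disk image attached to the VM.
    echo "virsh --connect $uri undefine $name --remove-all-storage"
}
```

Running `vm_lifecycle_cmds qemu:///session cloud-test` prints the five commands so they can be copied one at a time.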

I am okay with having two options to test cloud mentioned in the matrix. However, testcloud case might be the first one/maybe in bold to direct users to the easier and more straightforward route?

Bold red blinking text, perhaps? :-) I'm still not sure what game we're playing here. If you're willing to spend the time on improving the wiki guide, and also look at how well testcloud VMs work in virt-manager especially when you need to see the boot messages, and include some instructions about that, I don't really care about the order in which the options are written. Both tools are pretty comparable in this specific use case, each having some strong points and downsides.

I added virt-install instructions for three major reasons:
a) it's a well-known upstream tool with good documentation and maintenance
b) I considered testcloud user experience terrible (that is now improved due to better documentation)
c) we could drop the development of a custom tool and invest the time elsewhere

I already covered that in my previous comment. If I wanted, I could have written the virt-install guide as short as the testcloud guide. But I wanted to write a better guide.

Shouldn't test cases on wiki be as simple as possible? It might be just a matter of taste, but looking at other TCs, I get the impression that we're trying to make them simple. Linking elsewhere for additional features that aren't needed to complete the test case seems to be what we've been doing, and what we should continue to do.

The commands are definitely longer. Many of those options don't need to be there. --os-variant detect=on,name=fedora-unknown is there just to tune performance and some other tweaks (testcloud misses that tweak, so in this case it's doing a worse job here).

What performance tweaking would that do? I didn't dig too much into it, but it doesn't seem to be the case. It's just generic x86 qemu virtual machine after all...

Well, most of the "complete documentation" got created very recently (in the last few weeks) as a direct response to me looking at virt-install. It had terrible documentation for years. So don't Huh me ;-)

Documentation was already there, just not installed by default (@lruzicka improved that bit, but everything needed to get testcloud running was there). Yeah, it was as a direct response, but that's why we create tickets... to have stuff improved and bugs fixed ;)

It's great that testcloud documentation improved. I hoped we wouldn't need to spend our energy on that, but I'm not going to rehash my past remarks.

I had hoped we wouldn't have spent energy on writing virt-install documentation... so, matter of perspective I'd say.

Bold red blinking text, perhaps? :-) I'm still not sure what game we're playing here.

"Bold red blinking text" would work too...

I am trying to make testing of Fedora Cloud images easy and accessible to our community, so we can have as many happy community contributors as possible helping us make Fedora as stable and as bug free as possible. I think that should be a goal of us all. That's the game I am playing here.

While using testcloud (leaving easy to understand and remember command line invocation aside), you don't need to care about finding out and pasting path of the downloaded image to the cli (and changing that path in the middle of the command every time you download a different image). Testcloud does that for you. Yeah, might be a small thing, but it saves you time, makes the entire process simpler and lowers the barrier to entry a bit.

From the guide on our wiki, it seemed like creating cloudinit-user-data.yaml was necessary. If it's not the case, I think it doesn't need to be on top of the wiki page.

If you're willing to spend the time on improving the wiki guide, and also look at how well testcloud VMs work in virt-manager especially when you need to see the boot messages, and include some instructions about that, I don't really care about the order in which the options are written. Both tools are pretty comparable in this specific use case, each having some strong points and downsides.

Yep, I'll take a look at the instructions and add some more to the testcloud page.

a) it's a well-known upstream tool with good documentation and maintenance

Testcloud is as upstream as virt-install in my opinion. With good documentation and maintenance too.

b) I considered testcloud user experience terrible (that is now improved due to better documentation)

:)

c) we could drop the development of a custom tool and invest the time elsewhere

How much time does maintaining testcloud consume? It's a tiny bit of effort I am (and @lruzicka recently) putting into it. For having all the benefits testcloud gives us mentioned earlier? It seems like a great gain/effort ratio.

Let's have two options. I'd prefer listing testcloud first, as it's something we've been using for a pretty long time, we know it works, and users are used to it. We can add more tips and hints into the command line output, change defaults to jump into serial output, etc. It seems better tailored to our needs.

In the end... it seems like we can agree to disagree here.

Shouldn't test cases on wiki be as simple as possible?

It's not a test case, it's a guide. See the Cloud matrix, that one contains test cases. Those are generic across any cloud provider and expect that you can work with the provider. That's why this is a guide to teach you how to work with "local deployment".

What performance tweaking would that do? I didn't dig too much into it, but it doesn't seem to be the case. It's just generic x86 qemu virtual machine after all...

From man virt-install:

--os-variant, --osinfo
Syntax: --os-variant [OS_VARIANT|OPT1=VAL1,...]
Optimize the guest configuration for a specific operating system (ex. 'fedora29', 'rhel7', 'win10'). While not required, specifying this options is HIGHLY RECOMMENDED, as it can greatly increase performance by specifying virtio among other guest tweaks.

I included it because it was "HIGHLY RECOMMENDED" :smile: I'm not sure if it is needed in this case. Perhaps it will make some podman testing a bit faster? I'm fine with dropping it if we see no benefit and want to make the cmdline shorter.

While using testcloud (leaving easy to understand and remember command line invocation aside), you don't need to care about finding out and pasting path of the downloaded image to the cli (and changing that path in the middle of the command every time you download a different image). Testcloud does that for you. Yeah, might be a small thing, but it saves you time, makes the entire process simpler and lowers the barrier to entry a bit.

I reordered the command arguments so that the ISO path is at the very end. So with testcloud you need to edit the command and insert a new URL, and with virt-install you need to edit the command and insert a new file path.

From the guide on our wiki, it seemed like creating cloudinit-user-data.yaml was necessary. If it's not the case, I think it doesn't need to be on top of the wiki page.

Let me expand on it more. If you just use --cloud-init, virt-install will create a root user with a temporary password, tell you the password, and when you log in, you're asked to change the password. It's very simple and works for any cloudinit-enabled image, not just Fedora Cloud images. That's how I worked with it initially. However, then I dug into Fedora Cloud instructions, and it seems they expect people to activate the pre-created fedora user - at least I found those instructions in some cloud providers' documentation (Fedora Cloud's native documentation misses any info about users, and forums are flooded with users asking how to log in... oh well). So I decided to synchronize with the other places which talked about the fedora user and activate that one instead of root. That requires creating a short cloudinit config file and using --cloud-init user-data="/path/to/cloudinit-user-data.yaml". It's more complicated, but I felt that is the right choice here. I want testers to use Fedora Cloud images as they're supposed to be used, and they seem (I'm not sure!) to be expected to use the fedora user. Since this is a Fedora guide tailored towards Fedora Cloud images, I don't want to make the guide longer and more complex to mention both options (root and fedora users), because there's no point documenting a non-recommended approach.

Does this make sense? Any thoughts about that?

Testcloud is as upstream as virt-install in my opinion. With good documentation and maintenance too.

Upstream might have been a wrong word here. The idea is that virt-manager/virt-install are generic tools which mostly everybody in the field knows. Testcloud is a Fedora-specific tool (thus I thought about it as "downstream").

How much time does maintaining testcloud consume? It's a tiny bit of effort I am (and @lruzicka recently) putting into it. For having all the benefits testcloud gives us mentioned earlier? It seems like a great gain/effort ratio.

Well I spent a lot of time on testcloud development in the past. There is still a heap of technical debt in it. There are no "issues" with it because hardly anybody uses it as the end user. It was developed to be used inside Taskotron (now it is used inside Fedora CI) and for that use case it worked OK. We never really tried hard to make it user friendly, because we didn't need to. The first push for user-friendliness was the documentation changes two weeks back. Speaking for myself, I wanted to poke my eye instead of using it every time I ran into some edge case and needed to stray from the beaten path "run it, everything works, shut it down". I did fix some of the pain points, but not all.

If real end users start to use it, you'll see issues created. If something changes in the cloud image/cloud init stack, you'll need to react. The project still has almost no unit tests.

But yes, maintaining it at the current state has low costs (which are still higher than writing one wiki guide). If you want to do it, sure 🤷‍♂️️ But I don't. And next time something breaks, I can point to virt-install and say "here's an alternative tool, please use that one instead". And I don't need to hurry fixing testcloud (I don't need to fix it at all). That's the biggest benefit for me from writing one wiki guide :smiley: (I can also now stop watching the testcloud project on Pagure, yay).

I think the purpose of this ticket is now done - the virt-install instructions were created and can be found at the right places. The discussion seems to be over as well. So I'll mark this ticket as resolved. Frantisek, you are of course free to improve testcloud and its guide further, and if you see it as a better solution, promote it however you see fit, I won't stop you :)

Metadata Update from @kparal:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

3 years ago
