#224 refine formula syntax from PoC
Closed: Fixed None Opened 9 years ago by tflink.

The task formula syntax from the no-cloud PoC was pretty ugly and needs some refining, to say the least. With disposable clients, we'll be adding the concept of taskrunner clients (that run the actual task) and tasktarget clients (which are the system under test for certain use cases, like cloud image testing). We'll also start actually reading and using the dependency rpms which have been listed (but ignored) for some time.

An example of the environment syntax used by the PoC is:

```lang=yaml
environment:
    # the machine VM is used to execute this task
    machine:
        image: ''
        size: ''
    # the client VM is used as an execution target
    client:
        name: 'taskotroncloudtestcloud1'
        image: 'http://download.fedoraproject.org/pub/fedora/linux/releases/21/Cloud/Images/x86_64/Fedora-Cloud-Base-20141203-21.x86_64.qcow2'
        size: ''
    rpm:
        - python-requests
        - python-paramiko
        - python-pytest
```

Note how:
* What `machine` and `client` mean in this case is really not user-friendly and is very hard-coded. At the very least, this syntax needs to be improved to make more sense.
* `size` isn't used - the idea was an ec2/openstack-style pre-defined list of VM sizes that could be specified (small, medium, large, etc.)
* Image specification is a bit awkward; not sure how we want to improve that.

This ticket is a bit vague at the moment, but should be enough to get the process and discussion started.


This ticket had some Differential requests assigned:
* D474
* D603
* D461
* D489

I was trying to come up with a refined formula proposal for disposable client execution, and realized that it will probably need to touch more areas - formula, config and cli. I hope I'm not hijacking this ticket; I can create a separate one if needed. Here's my current proposal:

= config =
```lang=yaml
runtask_mode: libvirt           # or local (local probably default for dev, libvirt for prod)
libvirt_uri: qemu:///session    # or e.g. qemu:///system (session default for dev, system for prod)
libvirt_image_dir: /var/lib/taskotron/images/
libvirt_image_templates:
    workstation: f21-workstation-latest.qcow2
    server: f21-server-latest.qcow2
    default: f21-minimal-latest.qcow2    # defaults to 'default.qcow2'
ssh_privkey: /home/user/.ssh/id_taskotron    # defaults to None, meaning the ssh connection works automatically
```

The config defines the default mode under which to run all tasks. Developers can opt in to running everything in a VM as well, but they can use qemu:///session, which is a more convenient way to launch VMs - you don't need root, it runs under your account, and therefore `runtask` doesn't need root either.

Theoretically, people could use a libvirt link to a completely different machine, like `qemu+ssh://root@example.com/system`. However, I'm not sure whether this would complicate image snapshotting, so that might be unsupported.

We could also add `runtask_mode: ssh` and `ssh_uri: user@machine` if you think it would be desirable for people (in this proposal it is only possible from cli).
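To make the defaulting behavior concrete, here is a toy sketch of merging such a config over built-in defaults. This is hypothetical code, not the actual libtaskotron implementation: it handles only flat `key: value` lines (nested mappings like `libvirt_image_templates` are ignored for brevity), whereas the real config file would be parsed as full YAML.

```python
# Hypothetical sketch: merge a flat "key: value" config snippet over defaults.
# Defaults mirror the proposal above; real code would use a YAML parser.
DEFAULTS = {
    'runtask_mode': 'local',             # 'local' for dev, 'libvirt' for prod
    'libvirt_uri': 'qemu:///session',
    'libvirt_image_dir': '/var/lib/taskotron/images/',
    'ssh_privkey': None,                 # None = ssh keys picked up automatically
}

def load_config(text):
    """Return DEFAULTS overridden by top-level 'key: value' lines in text."""
    config = dict(DEFAULTS)
    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()   # strip comments and whitespace
        if ':' in line:
            key, _, value = line.partition(':')
            if value.strip():
                config[key.strip()] = value.strip()
    return config
```

With this, a developer's config could simply omit most keys and inherit the development-friendly defaults.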

= cli =
Runtask mode selection:

--libvirt
--local

This overrides the config for this single run. `--local` is always required when running destructive tasks in local mode (more on that below), so that the machine is not destroyed by accident.

Running a single task with a different libvirt configuration (`--libvirt*` implies remote execution):

--libvirt-uri qemu:///session --libvirt-image /var/lib/taskotron/images/f21-minimal-latest.qcow2
--libvirt-uri qemu:///system --libvirt-ssh-privkey ~/.ssh/id_test

Keeping the machine running and not destroying the snapshot after execution (for task debugging purposes):

--no-destroy

Running a single task with a direct connection to a remote machine (`--ssh` implies remote execution):

--ssh user@machine

This allows you to pre-create a specific VM in advance (e.g. to prepare some needed environment, install debugging tools when working on a new task, or temporarily re-use some data for efficiency) and then use that particular machine.

= formula =
```lang=yaml
name: newcheck
desc: Example check for disposable clients
maintainer: kparal
destructive: True        # new

input:
    args:
        - koji_tag

environment:
    rpm:
        - gcc
        - make
    image_template: server        # new

actions:
    - name: run newcheck
      python:
          file: newcheck.py
          callable: main
          custom_args: [--debug, --storedir, "${artifactsdir}", "${koji_tag}"]
      export: newcheck_output

    - name: report results to resultsdb
      resultsdb:
          results: ${newcheck_output}
```

The new things here are:
* `destructive`: Any task that can damage or inconveniently alter the system is considered (potentially) destructive. This includes the possibility of crashing mid-execution and not running cleanup steps, even if there are some. A different word might be more appropriate. The safe choice here is probably to default to True. Any destructive task is run only in remote execution and never locally, unless overridden with --local.
* `image_template`: Says which image template should be used for execution. If not present, defaults to 'default'.
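The decision rules above could be sketched like this (the function name and error wording are hypothetical illustrations, not actual libtaskotron code):

```python
# Hypothetical sketch of the proposed run-mode rules:
# - the CLI option (--local/--libvirt) overrides the config default
# - a destructive task may only run locally when --local was given explicitly
def decide_run_mode(config_mode, cli_mode, destructive):
    """Return the effective run mode ('local' or 'libvirt'), or raise if a
    destructive task would run locally without an explicit --local."""
    mode = cli_mode or config_mode
    if destructive and mode == 'local' and cli_mode != 'local':
        raise RuntimeError(
            'destructive task: pass --local to really run it on this machine')
    return mode
```

So a developer whose config defaults to local execution is stopped before accidentally running a destructive task, while `--local` on the command line expresses explicit consent.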

You're probably wondering where the rest of those important changes to the task formula are. The basic use case is covered by `destructive`, I believe. It tells us when we need to use disposable clients. I'll try to cover the rest in Q&A.

= Q&A =
//Q: How do I force a task to run locally (for performance / other reasons)?//
A: I believe we cannot allow task formulas to force local runtask mode, because then some untrusted third-party check could run directly on our buildslave. If we have a set of tasks we trust and we know we always want to run them in local mode, we'll need to define them in taskotron-trigger and it will schedule them in buildbot with --local option.

//Q: What happened to the "boot Cloud image for testing" task, as demonstrated in ticket description? How to run multi-VM tasks?//
A: I have a feeling we're getting slightly ahead of ourselves. Multi-VM tasks are not required for the disposable clients feature, and I didn't want to complicate it any further; it adds another layer of complexity.

But it's not just that. I have some concerns about the workflow proposed in the PoC. Its idea is to define the target image in the formula, and then let taskotron prepare everything and just hand over the IP. But that was tailored just to the Base Cloud image. The initialization and booting of other types of images will be different. Say Atomic Cloud could be somewhat different (not sure), but a Fedora Live image will definitely be different. A GNOME Continuous image will be different. A regular disk image will be different. We would have to implement code to start up anything that our users will want to start up. That seems like a lot of work that could be better left to the authors themselves (e.g. Cloud folks writing their tasks will definitely know better how to boot those images, and will do a better job maintaining it - they already have [[http://fedoramagazine.org/using-tunir-test-fedora-cloud-images/ | tools for that]]). We would also need to provide an extensive set of options in the task formula to let folks configure every aspect of that VM, if needed (number of disks, disk size, memory, graphics type, boot order, etc).

So, instead, I tried to come up with a different, more universal solution. I imagined two:
* libvirt link - We'll give the task a libvirt URL (running on one of our bare metal slaves) and they'll be able to create a VM exactly as they need, and do whatever they need with it. We would have to make sure it's somehow separated from other VMs, so that they can't influence anything else. I haven't studied libvirt permissions yet, not sure how easy that would be.
* nested virt - If we enabled nested virt for those tasks that need it, they can run VMs inside, everything is contained, and totally in their control. I have experimented with nested virt in the last few days, enabled it on my laptop and home desktop, and it seems to just work. There seems to be a slight performance decrease, but nothing serious. For example, creating an updated F21 image using virt-builder takes me 2min30s on bare metal, 3min0s with nested virt, and over 30min with emulated virt. Nested virt is enabled by default on AMD, but not yet on Intel. You also need to enable it in VM properties. I investigated why it is enabled by default, and here's my chat with kashyap in #virt on irc.oftc.net: F22477. It seems to be worth a try. And it would allow Cloud folks to reuse their Tunir tests (as linked above) out of the box.

We can definitely try to find other better solutions, or go back thinking about the original one. But I think we don't need to have it solved for the initial disposable clients implementation.


Thoughts? Concerns? Ideas? Thanks.

I have talked to @jskladan and @mkrizek shortly about this in person yesterday. Adding some notes to have it written down.

* The general feedback was that the syntax seems quite reasonable.
* Martin asked why `destructive: True` was not part of the environment section - we concluded that it makes sense in either of those two places.
* Josef thought that it was a good idea to separate generic disposable client-related changes from the two-VMs testing use case (i.e. what I tried to do in the proposal - solve just the first thing and not both at once).
* Josef noted that we will first need to have code capable of starting our disposable clients before we can properly design the multi-VM testing use case (which might share some of that code).

Multiple VM testing remarks (food for thought for the future):
* In general, the idea of giving the tester more control over VMs by allowing them to control libvirt directly (nested virt, libvirt link) seemed well received.
* Related to that, Josef suggested that having a formula directive that would create a working VM from a supported disk/iso image and give back an IP would be great, because not everyone will want to program that themselves like the cloud folks (I assumed there would be some tool/library for that, but exposing it through a directive can be helpful).
* Josef noted that with direct libvirt control, it's simple to create even more VMs and do some multihost tests - a use case not considered before.
* It was mentioned that once we start supporting multi-VM testing, there will be additional changes needed in the environment section of the task formula, e.g. need-virt: True and memory: 8G for specifying whether you need to run on a host supporting nested virt and how much RAM you need as the minimum.

More feedback welcome, ideally before I start working on those patches :-)

I'd prefer to see most of the specific references to libvirt made more generic - with the exception of libvirt_uri; that data could be used for other things and we may end up using something different. image_dir, image_templates and ssh_privkey seem specific enough to me without referring to libvirt, but those are just off-the-top-of-my-head suggestions.

For the image templates - how would requiring a specific fedora release work? Would we need to have templates declared for f21-server, f22-server, rawhide-server, etc.? When we start doing dist-git tasks, will we be able to figure out which fedora release needs to be used without requiring that change in every branch of the git repo? Would adding a cli option to override the template solve some of that (behaving kinda like how you described --local and --libvirt), or would that just add more complexity that we'd be better off avoiding?

In the cli options, what would --libvirt do? Force a task to be run in a disposable client regardless of whether it's configured for that or not?

I'm not sure I like the idea of giving tasks a libvirt uri to do with what they like, especially if that's a qemu:///system uri. I'm not saying that any of the PoC syntax was/is perfect - I think that the cloud stuff was an overly complicated mess, and I wrote it :) However, giving tasks direct access to libvirt and a shell account on our virthosts makes me really nervous. Given the choice between a libvirt uri and nested virt as possible solutions, I'd much rather look into nested virt if we start going this route. It was really slow last time I tried it, but it sounds like that stuff has been changing quickly as of late and it's been a while since I tried it last.

For all the multi-host/cloud-ish stuff, I'm of the same mind as @jskladan - I doubt that most folks will be interested in the details behind how an instance is booted; they will be looking for something more along the lines of "I want an instance with this image, give me credentials". I'll also note that tunir uses the same backing code to launch instances that we do. I'm not disagreeing that "booting images" for a semi-arbitrary set of images is a boatload of complexity, but I'm not sure that waiting until we have a perfect abstraction is a) practical or b) the best way forward - as you noted, there is a huge variation in booting images, from cloud and atomic to workstation or some form of gnome continuous. How would you feel about one of these approaches:
# support cloud (I suspect atomic wouldn't be much different) for now, making sure people know that our syntax will probably change
# treat image launching like we're treating directives - separate them out and load them like modules, which would allow interested parties to more easily submit patches to support new image types we aren't looking at or haven't heard of yet.

Overall, I like the direction this is going in - the destructive part is a good idea, your method of specifying image templates instead of specific images has promise and the multi-host/cloud-test stuff needed more discussion :)

! In #408#6481, @tflink wrote:
I'd prefer to see most of the specific references to libvirt made more generic - with the exception of libvirt_uri; that data could be used for other things and we may end up using something different. image_dir, image_templates and ssh_privkey seem specific enough to me without referring to libvirt, but those are just off-the-top-of-my-head suggestions.

No objections. I prefixed it with libvirt for a similar reason - if we use generic names and then come up with yet another execution engine with incompatible data files, we might need to rename it to libvirt_image_dir and foo_image_dir. But it's hard to guess which is more likely, so either way works :)

For the image templates - how would requiring a specific fedora release work? Would we need to have templates declared for f21-server, f22-server, rawhide-server, etc.? When we start doing dist-git tasks, will we be able to figure out which fedora release needs to be used without requiring that change in every branch of the git repo?

This is a topic I avoided in the proposal because I didn't want to complicate it further. But let's dive into it:
* For our current basic use cases (all tasks run on the same fedora release), the proposed templates are sufficient.
* A simple change would be to add a number to the template name, e.g. f21-workstation. You would then be able to hardcode your check to run on a specific release (the same thing we did with older depcheck, which worked only on Fedora 18). Of course, if we decided to deprecate a certain image (likely due to that release going EOL), we would have to notify all task authors who use it and ask them to bump it (or bump it ourselves).
* It would be great to be able to dynamically determine the version in certain cases. I imagine we could add an input argument to the image_template keyword, similarly to what we do with directives. It would crunch either a release number directly, or maybe even parse an NVR (depending on what we will have as input arguments in these use cases), and then generate the template name out of it. So maybe something like this:

    image_template: fedora-${version}-server
        version: ${item}

or this:

    image_template: server
        version: ${item}

(and the translation would be processed internally).
That's definitely up for discussion once we need it, but it seems that there are possible ways to implement this nicely.
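For illustration, the template-name substitution could work roughly like this (a hypothetical sketch; the real directive-style variable substitution may well differ):

```python
import re

# Hypothetical sketch: expand ${var} placeholders in an image_template name,
# e.g. 'fedora-${version}-server' + {'version': 21} -> 'fedora-21-server'.
def resolve_template(template, variables):
    """Replace every ${name} in template with str(variables[name])."""
    def substitute(match):
        return str(variables[match.group(1)])
    return re.sub(r'\$\{(\w+)\}', substitute, template)
```

A template without placeholders (like plain `server`) would pass through unchanged, so both proposed syntaxes could share one code path.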

Would adding a cli option to override the template solve some of that (behaving kinda like how you described --local and --libvirt) or would that just add more complexity that we'd be better off avoiding?

It is already in the proposal:

--libvirt-image /var/lib/taskotron/images/f21-minimal-latest.qcow2

This overrides the image template mentioned in the formula. It seems simple enough and useful to me, mostly for development and debugging - sometimes you might need to run your test on a customized image, and this is easier than editing the formula every time.

In the cli options, what would --libvirt do? Force a task to be run in a disposable client regardless of whether it's configured for that or not?

The idea is that a task cannot configure whether it runs locally or remotely. It can only signify that it is destructive (and then we ask for double confirmation before we run it locally). --libvirt makes the task run remotely even if your libtaskotron configuration specifies local execution as the default execution mode. I assume that local execution will be the default for developers, because not everyone will want to configure their libvirt, create image templates, etc. And most tasks will probably be safe to execute locally. But if the developer wants to execute a specific destructive task remotely, using --libvirt allows her to do so, instead of editing runtask_mode in taskotron.conf back and forth all the time.

I'm not sure I like the idea of giving tasks a libvirt uri to do with what they like, especially if that's a qemu:///system uri. I'm not saying that any of the PoC syntax was/is perfect - I think that the cloud stuff was an overly complicated mess, and I wrote it :) However, giving tasks direct access to libvirt and a shell account on our virthosts makes me really nervous. Given the choice between a libvirt uri and nested virt as possible solutions, I'd much rather look into nested virt if we start going this route. It was really slow last time I tried it, but it sounds like that stuff has been changing quickly as of late and it's been a while since I tried it last.

I haven't studied the libvirt permission model yet, so this was basically just a shot in the dark saying "this could be possible". In my optimistic dreams, I imagined we could perhaps even create a separate libvirt link for each task, such that they would be isolated from each other and from the host system. It might or might not be possible; we would have to study that more.

I wanted to make a separation between disposable clients execution and this multi-vm testing, because it seems to me we don't need to solve both at the same time. I think the latter is an extension of the former, and we don't need to complicate the formula/cli/conf design with it right now, because we don't know yet how we will implement it anyway.

For all the multi-host/cloud-ish stuff, I'm of the same mind as @jskladan - I doubt that most folks will be interested in the details behind how an instance is booted; they will be looking for something more along the lines of "I want an instance with this image, give me credentials". I'll also note that tunir uses the same backing code to launch instances that we do. I'm not disagreeing that "booting images" for a semi-arbitrary set of images is a boatload of complexity, but I'm not sure that waiting until we have a perfect abstraction is a) practical or b) the best way forward - as you noted, there is a huge variation in booting images, from cloud and atomic to workstation or some form of gnome continuous. How would you feel about one of these approaches:
# support cloud (I suspect atomic wouldn't be much different) for now, making sure people know that our syntax will probably change
# treat image launching like we're treating directives - separate them out and load them like modules, which would allow interested parties to more easily submit patches to support new image types we aren't looking at or haven't heard of yet.

Yes, supporting the most common image types in the form of directives sounds reasonable to me. I didn't want us to be the only provider of booting a custom image, because then we would be pressed to support and maintain every imaginable image type. If we allow people to run their own tools (i.e. through nested virt) but offer a simplified approach for the most common image types that we decide to support, that seems like a win-win situation to me.

Regarding syntax: I imagine that there will be a set of cloud tests that will be run every time the cloud/atomic image gets updated. We would have a custom trigger for this, and the image URL can then be the ${item}, so it would easily plug into our current formula directive syntax.

Do you think we need to flesh out all the details now, before we start implementing all the other proposed changes, or do you see it as a next step/separate task from the rest?

Overall, I like the direction this is going in - the destructive part is a good idea, your method of specifying image templates instead of specific images has promise and the multi-host/cloud-test stuff needed more discussion :)

Thanks, looking forward to further comments.

Arguments --local, --libvirt and --ssh seem to be mutually exclusive; the question is what we do when a combination of these arguments is passed to runtask. Do we specify a priority for these arguments (this would be confusing, I think), or do we raise an exception? Another approach that comes to my mind is to change these arguments to something like --runmode=local|libvirt|ssh. However, this would complicate specifying the user and machine for the ssh option.

Considering all formulae potentially dangerous is imo a good idea, but should we rely on destructive: False in the formula? I mean, this sounds like "for a faster computer, delete system32; destructive: False". So do we need this at all? A remote machine executes the task no matter what, and local is forced by --local or --runmode=local.

One good suggestion from @kparal: we could wait for 5 seconds and print a warning:

WARNING: RUNNING A TASK MARKED AS DESTRUCTIVE (POTENTIALLY HARMFUL FOR THE SYSTEM). HIT CTRL+C NOW IF THIS IS NOT INTENDED

! In #408#6709, @lbrabec wrote:
Arguments --local, --libvirt and --ssh seem to be mutually exclusive; the question is what we do when a combination of these arguments is passed to runtask. Do we specify a priority for these arguments (this would be confusing, I think), or do we raise an exception? Another approach that comes to my mind is to change these arguments to something like --runmode=local|libvirt|ssh. However, this would complicate specifying the user and machine for the ssh option.

Yes, they are mutually exclusive. I'd raise a parser error if the user specifies more than one.
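A sketch of how argparse could enforce that exclusivity (the option set follows the proposal; the real runtask parser has many more options than shown here):

```python
import argparse

# Hypothetical sketch: let argparse itself reject combinations of the three
# mutually exclusive run-mode options from the proposal.
def build_parser():
    parser = argparse.ArgumentParser(prog='runtask')
    group = parser.add_mutually_exclusive_group()
    group.add_argument('--local', action='store_true',
                       help='run the task on this machine')
    group.add_argument('--libvirt', action='store_true',
                       help='run the task in a disposable VM')
    group.add_argument('--ssh', metavar='USER@MACHINE',
                       help='run the task on an existing remote machine')
    return parser
```

Passing e.g. `--local --ssh user@host` then produces a standard argparse error instead of needing a hand-rolled priority scheme.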

Considering all formulae potentially dangerous is imo a good idea, but should we rely on destructive: False in the formula? I mean, this sounds like "for a faster computer, delete system32; destructive: False". So do we need this at all? A remote machine executes the task no matter what, and local is forced by --local or --runmode=local.

The idea was that local execution mode is the default in the config file for the development profile, yes, but that does not permit running destructive tasks. You need to have --local on the command line in order to run a destructive task locally. Most task developers won't need that, so they will use the default from the config, they will not specify it on the command line, and therefore they will be protected against running a destructive task accidentally (let's say they specify a wrong path to the task formula, and a different task would be executed).

If we leave destructive: True/False out of the formula, consider everything potentially destructive, and either run it anyway or always require --local, then developers have no way to distinguish destructive from (supposedly) non-destructive tasks, and will be even less protected.

The idea was not about providing a secure environment in which nothing can happen (bad things can always happen, and people can make mistakes in the destructive option), but rather to allow task creators to raise a flag saying "my task should not be run without explicit consent, you might lose data". Being a developer, you have to understand that running someone else's code under a privileged user (or your own user) might be a bad idea, and we have no sandbox to protect you (we should make this very clear). Marking some tasks as destructive helps you //a bit//, but certainly not all the way.

Or maybe I just misunderstood your comment?

One good suggestion from @kparal: we could wait for 5 seconds and print a warning:

WARNING: RUNNING A TASK MARKED AS DESTRUCTIVE (POTENTIALLY HARMFUL FOR THE SYSTEM). HIT CTRL+C NOW IF THIS IS NOT INTENDED

Just an idea. I'd use this only in development mode, and it offers another layer of protection against destructive tasks. But I can imagine some people might complain that it slows them down when developing a destructive task. I think we could implement this and deal with any concerns once/if they appear.

I realize that I'm a little late to the party on this, but is there a reason that --ssh was chosen over --remote?

You're not late, I plan to propose some changes to the last committed diff. I want to remove the ssh section from the config file; I think that the cmdline argument is enough. And adjust the ssh privkey defaults.

But I'm not sure what you mean by --remote. I don't see it mentioned anywhere in this ticket. How should it work, same as --ssh (i.e. --remote user@machine:port) or somehow differently?

I see value in using the concrete protocol name, because it implies what it requires (a running sshd server on the machine) and how it works (i.e. you need to have ssh keys installed in ~/.ssh if you don't want to configure ssh_privkey option). Using "remote" would be quite generic and non-explanatory.

! In #408#7016, @kparal wrote:
You're not late, I plan to propose some changes to the last committed diff. I want to remove the ssh section from the config file; I think that the cmdline argument is enough. And adjust the ssh privkey defaults.

It's not enough as written, but I'm planning to update my last diff with code to make it work better. I agree that the config file stuff is a bit awkward, but without it, how will a remote task be indicated? Cmdline option only, with a blocking "this task is marked as destructive, are you SURE you want to do this? Y/N", or not even allowing local execution of destructive tasks?

But I'm not sure what you mean by --remote. I don't see it mentioned anywhere in this ticket. How should it work, same as --ssh (i.e. --remote user@machine:port) or somehow differently?

It's similar to the current --ssh command line option, yes. I mention the new option because 'remote' has more of a relation to 'local' than 'ssh' does - it's a bit more logical in my mind but I'm not dead set on it.

My primary issue with the current implementation of the --ssh option (ignoring the option name itself) is that it can't work with VM spawning - you have to give it user@machine which is impossible to know if the VM hasn't even been spawned yet and honestly, I don't think that's something that users should have to deal with. At a minimum, we'll need to change how the --ssh arg is parsed so that a bare --ssh would be allowed, indicating that a VM should be spawned.

I see value in using the concrete protocol name, because it implies what it requires (a running sshd server on the machine) and how it works (i.e. you need to have ssh keys installed in ~/.ssh if you don't want to configure ssh_privkey option).

If we were only working with pre-defined remote machines as a non-primary use case, I'd agree with you. However, I don't see how --ssh relates to the primary use case that we have in mind for it - indicating that a disposable client should be used for task execution.

Using "remote" would be quite generic and non-explanatory.

Which is one of the reasons that I like it, honestly. Going forward, I want to see more non-taskotron-devs interacting with the cli, and I want them to be able to easily switch between local and remote execution paradigms without needing to know that we happen to use ssh currently. What happens if we start using something other than ssh in the future? Do we change the cli to match whatever method we switch to and/or duplicate the command line option and the parsing logic? I don't think it's terribly likely that we'd end up using something other than ssh in the near future, but I also think the question is still valid.

As an alternative, I'd prefer to see something like the following:

  • --remote would force the use of a disposable client, regardless of whether it's indicated in the recipe
  • --remote user@host would force connection to user@host using a default communications mechanism to do the execution instead of executing locally or spawning a disposable client
  • --remote protocol://user@host would force connection to user@host using protocol (ssh, telnet, carrier pigeon, etc.) to do the execution instead of executing locally or spawning a disposable client

In my mind, this helps us keep the interface simple for users while still giving us some future-proofing and the ability to not spawn VMs every time, without being too complicated or vague for users. It is a little more vague if you know how the inner workings tick, but I'd rather optimize for user convenience than our own.
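Parsing the three proposed `--remote` forms could look roughly like this (a hypothetical sketch; the function name and the `ssh` default protocol are assumptions, not settled design):

```python
# Hypothetical sketch of parsing the three proposed forms of --remote:
#   (no value)             -> spawn a disposable client
#   user@host              -> existing machine via the default protocol
#   protocol://user@host   -> existing machine via an explicit protocol
def parse_remote(value, default_protocol='ssh'):
    """Return (protocol, user, host); (None, None, None) means 'spawn a VM'."""
    if value is None:
        return (None, None, None)
    protocol = default_protocol
    if '://' in value:
        protocol, value = value.split('://', 1)
    user, _, host = value.rpartition('@')
    return (protocol, user or None, host)
```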

Thoughts?

My primary issue with the current implementation of the --ssh option (ignoring the option name itself) is that it can't work with VM spawning - you have to give it user@machine which is impossible to know if the VM hasn't even been spawned yet

I think this is the core of the misunderstanding. My idea was to have --ssh completely orthogonal to disposable clients; it's not related at all to VM spawning. The purpose of --ssh is to run the code on a remote (VM or non-VM) machine; it will most probably be used rarely and mostly for debugging. For example, when you need to hand-craft something in advance, install debugging tools, set up monitoring, etc. Or when you want to run a destructive task once in a while and you have some existing machine lying around (physical or VM), and you don't want to spend 30 minutes installing our taskotron-vmcreator and creating the disk images in order to set up disposable clients support. In these cases, you'll simply do --ssh user@machine for an already existing, running and configured (with ssh keys) machine and we'll just run it in there.

For the basic use case used in other 99% of times, you'll either use runtask_mode: local/--local, or runtask_mode: libvirt/--libvirt (or we can name it as --remote, if you prefer, but then I guess we should also name it runtask_mode: remote).

Out of these three, --local and --ssh have been implemented, --libvirt (or --remote) has not. The last one is the one you need for testcloud integration. At least that was my idea.

! In #408#7044, @kparal wrote:
My primary issue with the current implementation of the --ssh option (ignoring the option name itself) is that it can't work with VM spawning - you have to give it user@machine which is impossible to know if the VM hasn't even been spawned yet

I think this is the core of the misunderstanding. My idea was to have --ssh completely orthogonal to disposable clients; it's not related at all to VM spawning. The purpose of --ssh is to run the code on a remote (VM or non-VM) machine; it will most probably be used rarely and mostly for debugging. For example, when you need to hand-craft something in advance, install debugging tools, set up monitoring, etc. Or when you want to run a destructive task once in a while and you have some existing machine lying around (physical or VM), and you don't want to spend 30 minutes installing our taskotron-vmcreator and creating the disk images in order to set up disposable clients support. In these cases, you'll simply do --ssh user@machine for an already existing, running and configured (with ssh keys) machine and we'll just run it in there.

OK, that makes sense. I was under the impression that the --ssh flag was to be used for the VM spawning case as well due to how the code was written.

! For the basic use case, used the other 99% of the time, you'll use either runtask_mode: local/--local or runtask_mode: libvirt/--libvirt (or we can name it --remote, if you prefer, but then I guess we should also name it runtask_mode: remote).

A couple of questions:

  • With the scheme outlined above, how would we handle using OpenStack if we ever added that support? --openstack?

  • If the --ssh option is for debugging and not commonly used, do we want it to be a standalone option, or would it make sense to have one --remote flag? With the way the code is currently written, I think a single --remote flag would fit better, if it isn't too obscure/confusing/difficult-to-discover.

! Out of these three, --local and --ssh have been implemented; --libvirt (or --remote) has not. The last one is the one you need for testcloud integration. At least that was my idea.

OK, I'll write up a ticket for this and get started on it, then. Looking at the --help output from the runner, I think it's getting a bit unwieldy, and we should put some time into cleaning it up by splitting the args into groups.
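The grouping idea above maps directly onto argparse's argument groups, which only affect how --help is rendered. A small sketch (option names mirror this discussion; the grouping itself is illustrative, not the runner's actual code):

```python
import argparse

# Sketch: grouping runner options so --help stays readable.
# add_argument_group() changes only the help layout, not parsing.
parser = argparse.ArgumentParser(prog="runtask")

modes = parser.add_argument_group("execution mode",
                                  "where the task actually runs")
modes.add_argument("--local", action="store_true")
modes.add_argument("--libvirt", action="store_true")
modes.add_argument("--ssh", metavar="USER@MACHINE")

debug = parser.add_argument_group("debugging")
# --no-destroy is a hypothetical example flag, not an existing option
debug.add_argument("--no-destroy", action="store_true",
                   help="keep the disposable VM around after the run")
```

Running `runtask --help` would then print the options under "execution mode" and "debugging" headings instead of one flat list.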

We discussed this yesterday on IRC. There were some differences in our assumptions about how things will work in the future, which shaped how we imagine the CLI arguments working. I assumed that developers would run tasks locally by default, but there would be a set of very simple instructions in the readme which would allow the developer to install taskotron-vmbuilder and create a default set of images, after which he could use disposable clients. So there would be two very distinct run modes (local and libvirt) and an orthogonal ssh run mode where we just connect to a machine and run the code there, but the developer has to set up everything manually (configure ssh, install libtaskotron). Also, I did not expect cloud images to be used at all in the future (maybe by a certain directive, but not directly by us, because custom images would serve us better).

Tim assumed that remote execution would be available even for unconfigured developer machines (no taskotron-vmbuilder, no default set of images created). If none of our images were available, he would download a cloud image on the fly and use that for execution. He also assumed that automagical setup (as much as we can sanely implement) would be used for all kinds of remote connections (disposable and direct ssh), which is why he's picturing generic options like --remote (as opposed to the very specific --libvirt) and --remote user@machine (as opposed to --ssh user@machine). In the future, if we extend our supported frameworks, he would use protocol-like syntax to distinguish them (so e.g. --remote openstack://user@machine).
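The protocol-like syntax can be parsed with the standard library alone, since values like openstack://user@machine are well-formed URLs. A sketch under the assumption (from this discussion, not implemented anywhere) that a bare user@machine falls back to plain ssh:

```python
from urllib.parse import urlparse

# Sketch: interpret a --remote value using protocol-like syntax.
# An explicit scheme selects the backend; a bare user@machine means ssh.
def parse_remote(value):
    parsed = urlparse(value)
    if parsed.scheme:                  # e.g. openstack://user@machine
        return parsed.scheme, parsed.netloc
    return "ssh", value                # e.g. user@machine
```

This leaves room for future backends without new CLI flags: any new framework just becomes a new scheme in the same option.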

So that's to clarify the differences; I hope I did not misrepresent anything. In the near future, let's make something work so that it's ready for Flock, and then we can continue the discussion and try to find the best solution for us and our users. The --remote option sounds much better to me now that I see the different use cases.

The current code in D482 works well enough to demo using --ssh and --libvirt. I'm for just leaving it alone - there's enough stuff left to do before this demo will work as it is :)

Talking to @kparal, this is done enough for disposable clients. There are a few smaller features here which we may or may not want in the future, but they can be broken off into smaller tickets.

Once those have been broken out, this can be closed.

I think that the remaining bits we still want to implement have been extracted (#731, #732) so I'm closing the ticket. Please re-open if I missed something.
