Idea: Deployment plans to converge states #316

Open

trentonstrong opened this issue Jul 12, 2015 · 3 comments

Comments

@trentonstrong
Contributor

While working on adding Amazon RDS resource support to nixops, I couldn't help but notice that the implementation was a bit tricky, mostly due to state management. There are also quite a few different "approaches" among the different backends and resources, which makes it difficult to figure out the right balance between handling state differences for the user and forcing them to rectify those differences out of channel. Nixops is overall great, by the way, and I am happy it exists.

Problem Description

Most of the complexity in adding a new type of resource lives in the create function and concerns the possible differences between our local state and the real state, and what to do in the myriad ways they can diverge.

Take the SQS queue resource, which is relatively simple compared to something like the ec2 backend. The majority of the create logic is contained in these lines:

```python
if self.state == self.UP and (self.queue_name != defn.queue_name or self.region != defn.region):
    self.log("queue definition changed, recreating...")
    self._destroy()
    self._conn = None # necessary if region changed

if check or self.state != self.UP:

    self.region = defn.region
    self.connect()

    q = self._conn.lookup(defn.queue_name)

    if not q or self.state != self.UP:
        if q:
            # SQS requires us to wait for 60 seconds to
            # recreate a queue.
            self.log("deleting queue ‘{0}’ (and waiting 60 seconds)...".format(defn.queue_name))
            self._conn.delete_queue(q)
            time.sleep(61)
        self.log("creating SQS queue ‘{0}’...".format(defn.queue_name))
        q = nixops.ec2_utils.retry(lambda: self._conn.create_queue(defn.queue_name, defn.visibility_timeout), error_codes = ['AWS.SimpleQueueService.QueueDeletedRecently'])

    with self.depl._db:
        self.state = self.UP
        self.queue_name = defn.queue_name
        self.url = q.url
        self.arn = q.get_attributes()['QueueArn']
```

While not particularly long or impossible to understand, I would argue that even for this simple example it takes a bit to wrap your head around the state logic and prove to yourself there aren't any serious logic errors.

A couple of other issues touch on similar concerns: #123 and #250.

I think I understand the motivations for having a local state file, and don't argue against it. The point I would like to make is about the distinction between essential and incidental complexity. The fact that our local state and the state of the world can diverge is an essential complexity of declarative configuration tools. The fact that we implement the logic to converge those states imperatively seems like incidental complexity to me.

One Possible Approach: Deployment Plans

Perhaps one could take inspiration from other declarative languages by keeping the What distinctly separate from the How: introduce some reusable abstractions for comparing states and generating "plans" for how to converge to the desired state, if possible.

This could help standardize the code, the way the ResourceState classes move through their states, and how they interact with options such as allow_reboot, allow_recreate, and so forth.
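To make the idea concrete, here is a minimal sketch of that separation; everything in it (the `Step` class, the `diff` function, the attribute names) is hypothetical and not part of the nixops API. The planner diffs the recorded state against the desired definition and only emits steps; executing them is a separate concern.

```python
from dataclasses import dataclass

@dataclass
class Step:
    action: str       # "update" or "recreate" (hypothetical actions)
    attribute: str
    old: object
    new: object

def diff(state, defn, recreate_on=("queue_name", "region")):
    """Compare local state with the desired definition and emit plan steps.

    Attributes listed in recreate_on cannot be changed in place, so a
    difference there forces recreation instead of a plain update.
    """
    steps = []
    for key, new in defn.items():
        old = state.get(key)
        if old != new:
            action = "recreate" if key in recreate_on else "update"
            steps.append(Step(action, key, old, new))
    return steps

# Example: region changed (forces recreation), timeout changed (plain update).
plan = diff({"queue_name": "q1", "region": "us-east-1", "visibility_timeout": 30},
            {"queue_name": "q1", "region": "eu-west-1", "visibility_timeout": 60})
for step in plan:
    print("{0} {1}: {2} -> {3}".format(step.action, step.attribute, step.old, step.new))
```

With this split, the executor becomes a dumb loop over steps, and --dry-run reduces to printing the plan instead of running it.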

This is a vague idea at the moment, but generating concrete plans has some added benefits, such as:

  • Dependency ordering as a DAG
  • Optimization of complex plans
  • --dry-run takes on more meaning, being able to list the concrete steps nixops will take before a developer or sysadmin executes the deploy. This can prevent a lot of "Oh shit." moments.
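The dependency-ordering point above can be sketched with a plain topological sort (here via Python's standard-library graphlib; the resource names and dependency edges are made up for illustration):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Each resource maps to the set of resources it depends on (a DAG).
deps = {
    "iam_role": set(),
    "sqs_queue": {"iam_role"},                  # queue policy references the role
    "ec2_instance": {"iam_role", "sqs_queue"},  # instance uses both
}

# static_order() yields every resource only after all of its
# dependencies, which is the order a plan executor would create them in.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

Independent resources (those with no path between them in the DAG) could also be deployed in parallel, which is one form the "optimization of complex plans" could take.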

It might be worth taking a look at a tool like Terraform (https://www.terraform.io/), which already generates an execution plan and shows it before applying.

Hopefully this doesn't come off as excessive criticism or unsolicited advice; I just wanted to share my thoughts on the development experience while they were still fresh.

@rbvermaa
Member

No worries about giving unsolicited advice; your remarks are very welcome. @Phreedom or @aszlig have suggested similar changes before, if I remember correctly.

I have always liked the Terraform feature that shows exactly which steps it will perform. I would love to see this feature in nixops as well. I will ponder this a bit during my holiday, to think through what issues we would run into.

@danbst
Contributor

danbst commented Jul 15, 2015

NixOS would benefit from having "switch plans" too, for example when I change the fs type of a partition. The particular plan must be inferred, or stated explicitly in configuration.nix.

@moretea
Contributor

moretea commented Jun 14, 2017

I would really like to see the following workflow in nixops:

```shell
# Manually generate a plan from the current state
nixops plan --output-plan ./plan.nix

# Print the operations in the plan
nixops explain-plan --plan ./plan.nix

# Apply the plan
nixops apply --plan ./plan.nix
```

The nixops deploy command would run the plan and apply steps one after the other automatically.
