Idea: Deployment plans to converge states #316

Open

trentonstrong opened this issue Jul 12, 2015 · 3 comments

Comments

@trentonstrong
Contributor

While working on adding Amazon RDS resource support to nixops, I couldn't help but notice that the implementation was a bit tricky, mostly due to state management. There are also quite a few different "approaches" among the different backends and resources, which makes it difficult to figure out the right balance between handling state differences for the user and forcing them to rectify those differences out of channel. Nixops is overall great, by the way, and I am happy it exists.

Problem Description

Most of the complexity in adding a new type of resource lives in the create function and concerns the possible differences between our local state and the real state, and what to do in the myriad ways they can diverge.

Take the SQS queue resource, which is relatively simple compared to something like the ec2 backend. The majority of the create logic is contained in these lines:

```python
if self.state == self.UP and (self.queue_name != defn.queue_name or self.region != defn.region):
    self.log("queue definition changed, recreating...")
    self._destroy()
    self._conn = None # necessary if region changed

if check or self.state != self.UP:

    self.region = defn.region
    self.connect()

    q = self._conn.lookup(defn.queue_name)

    if not q or self.state != self.UP:
        if q:
            # SQS requires us to wait for 60 seconds to
            # recreate a queue.
            self.log("deleting queue ‘{0}’ (and waiting 60 seconds)...".format(defn.queue_name))
            self._conn.delete_queue(q)
            time.sleep(61)
        self.log("creating SQS queue ‘{0}’...".format(defn.queue_name))
        q = nixops.ec2_utils.retry(lambda: self._conn.create_queue(defn.queue_name, defn.visibility_timeout), error_codes = ['AWS.SimpleQueueService.QueueDeletedRecently'])

    with self.depl._db:
        self.state = self.UP
        self.queue_name = defn.queue_name
        self.url = q.url
        self.arn = q.get_attributes()['QueueArn']
```

While not particularly long or impossible to understand, I would argue that even for this simple example it takes a bit to wrap your head around the state logic and prove to yourself there aren't any serious logic errors.

A couple of other issues touch on similar concerns: #123 and #250.

I think I understand the motivations for having a local state file, and don't argue against it. The point I would like to make is about the distinction between essential and incidental complexity. The fact that our local state and the state of the world can diverge is an essential complexity of declarative configuration tools. The fact that we implement the logic to converge those states imperatively seems like incidental complexity to me.

One Possible Approach: Deployment Plans

Perhaps one could take inspiration from other declarative languages by keeping the What distinctly separate from the How: introduce some reusable abstractions for comparing states and generating "plans" for how to converge to the desired state, if possible.

This could help standardize the code, the way the ResourceState classes move through their states, and how they interact with options such as allow_reboot, allow_recreate, and so forth.
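To make the idea concrete, here is a minimal sketch of that separation; everything in it (the `Step` class, the `diff` function, the attribute names) is hypothetical and not part of the nixops API. The planner diffs the recorded state against the desired definition and only emits steps; executing them is a separate concern.

```python
from dataclasses import dataclass

@dataclass
class Step:
    action: str       # "update" or "recreate" (hypothetical actions)
    attribute: str
    old: object
    new: object

def diff(state, defn, recreate_on=("queue_name", "region")):
    """Compare local state with the desired definition and emit plan steps.

    Attributes listed in recreate_on cannot be changed in place, so a
    difference there forces recreation instead of a plain update.
    """
    steps = []
    for key, new in defn.items():
        old = state.get(key)
        if old != new:
            action = "recreate" if key in recreate_on else "update"
            steps.append(Step(action, key, old, new))
    return steps

# Example: region changed (forces recreation), timeout changed (plain update).
plan = diff({"queue_name": "q1", "region": "us-east-1", "visibility_timeout": 30},
            {"queue_name": "q1", "region": "eu-west-1", "visibility_timeout": 60})
for step in plan:
    print("{0} {1}: {2} -> {3}".format(step.action, step.attribute, step.old, step.new))
```

With this split, the executor becomes a dumb loop over steps, and --dry-run reduces to printing the plan instead of running it.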

This is a vague idea at the moment, but generating concrete plans has some added benefits, such as:

  • Dependency ordering as a DAG
  • Optimization of complex plans
  • --dry-run takes on more meaning, being able to list the concrete steps nixops will take before a developer or sysadmin executes the deploy. This can prevent a lot of "Oh shit." moments.
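The dependency-ordering point above can be sketched with a plain topological sort (here via Python's standard-library graphlib; the resource names and dependency edges are made up for illustration):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Each resource maps to the set of resources it depends on (a DAG).
deps = {
    "iam_role": set(),
    "sqs_queue": {"iam_role"},                  # queue policy references the role
    "ec2_instance": {"iam_role", "sqs_queue"},  # instance uses both
}

# static_order() yields every resource only after all of its
# dependencies, which is the order a plan executor would create them in.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

Independent resources (those with no path between them in the DAG) could also be deployed in parallel, which is one form the "optimization of complex plans" could take.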

It might be worth taking a look at a tool like Terraform (https://www.terraform.io/), which already generates an execution plan and shows it before applying.

Hopefully this doesn't come off as excessive criticism or unsolicited advice; I just wanted to share my thoughts on the development experience while they were still fresh.

@rbvermaa
Member

No worries about giving unsolicited advice; your remarks are very welcome. @Phreedom or @aszlig have suggested similar changes before, if I remember correctly.

I have always liked the Terraform feature that shows exactly which steps it will perform. I would love to see this feature in nixops as well. I will ponder this a bit during my holiday, to think through what issues we would run into.

@danbst
Contributor

danbst commented Jul 15, 2015

NixOS would benefit from having "switch plans" too, for example when I change the fs type of a partition. The particular plan must be inferred, or stated explicitly in configuration.nix.

@moretea
Contributor

moretea commented Jun 14, 2017

I would really like to see the following workflow in nixops:

```shell
# Manually generate a plan from the current state
nixops plan --output-plan ./plan.nix

# Print the operations in the plan
nixops explain-plan --plan ./plan.nix

# Apply the plan
nixops apply --plan ./plan.nix
```

The nixops deploy command would run the plan and apply steps one after the other automatically.
