Industrial-strength Deployments in Three Commands

Posted on 22 August 2019

Tags: nix, programming, devops

If your deployment target is running NixOS, a full-system deployment is only three commands:

$ nix-copy-closure --to --use-substitutes <target> <path>                                #1
$ ssh <target> -- "sudo nix-env --profile /nix/var/nix/profiles/system --set <path>"     #2
$ ssh <target> -- "sudo /nix/var/nix/profiles/system/bin/switch-to-configuration switch" #3

Here’s what each command does:

1. Copies the transitive closure of the new system configuration to the target, using binary caches (--use-substitutes) where possible.
2. Sets the current system profile to the new system configuration. This isn’t strictly necessary, but it allows us to roll back to this configuration later.
3. Switches to the new system configuration.
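The three commands need only an SSH destination and a store path, so they fit into a small shell function. The sketch below makes some assumptions. deploy_closure and run are hypothetical helper names. Setting DRY_RUN=1 prints each command instead of executing it, which is useful for checking what a deploy would do.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the three deployment commands.
# Set DRY_RUN=1 to print each command instead of running it.

SYSTEM_PROFILE=/nix/var/nix/profiles/system

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# deploy_closure TARGET STORE_PATH
#   TARGET:     SSH destination, e.g. root@203.0.113.10
#   STORE_PATH: system closure produced by nix-build
deploy_closure() {
  local target=$1 path=$2
  # 1: copy the closure, substituting from binary caches where possible
  run nix-copy-closure --to --use-substitutes "$target" "$path" || return 1
  # 2: make the closure a new generation of the system profile
  run ssh "$target" -- "sudo nix-env --profile $SYSTEM_PROFILE --set $path" || return 1
  # 3: activate the new configuration
  run ssh "$target" -- "sudo $SYSTEM_PROFILE/bin/switch-to-configuration switch"
}
```

Calling DRY_RUN=1 deploy_closure <target> <path> prints the three commands without touching the target. Dropping DRY_RUN runs them in order and stops at the first failure.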

This workflow has been described before in Typeclasses and by Gabriella Gonzalez, but I thought one more post demonstrating how to use these commands wouldn’t hurt. Since the AWS use case has been covered so thoroughly by Typeclasses, I’m going to use the packet.net cloud instead.

Provisioning

I logged on to the Packet console and launched a t1.small.x86 instance running NixOS 19.03 (the latest as of this writing). It was assigned the IP address 147.75.38.113. Since I added my SSH keys when I first created my Packet account, I was able to SSH into this instance at root@147.75.38.113 without any further configuration.

Copying the existing configuration

The next step is to copy the existing configuration, especially the instance-specific hardware configuration:

$ scp -r 'root@147.75.38.113:/etc/nixos/*' .

There’s probably a better way to do this, but for a quick one-off demonstration this is fine. Here’s the commit adding those files.

We’ll only be making changes to configuration.nix, which for me looks like this (after all commented-out lines have been removed):

{ config, pkgs, ... }:

{
  imports =
    [
      ./packet.nix
    ];

  boot.loader.grub.enable = true;
  boot.loader.grub.version = 2;

  system.stateVersion = "19.03";

}

Building a system closure

The Nix expression to build a whole system is pretty straightforward (as described in the Typeclasses article):

let
  nixos = import <nixpkgs/nixos> {
    configuration = import ./configuration.nix;
  };
in
  nixos.system

but this uses whatever <nixpkgs> resolves to via NIX_PATH on the building machine, so it provides no way of pinning nixpkgs. Another way (as described by Gabriella Gonzalez) is to explicitly depend on a particular revision of nixpkgs:

let
  nixpkgs = builtins.fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/b74b1cdb2fecc31ff7a127c5bc89771f887c93bb.tar.gz";
    sha256 = "0ncr4g29220amqm4riaa1xf4jz55v2nmh9fi16f1gzhww1gplk8h";
  };

in
  import "${nixpkgs}/nixos" {
    configuration = {
      imports = [
        /etc/nixos/configuration.nix
      ];
    };

    system = "x86_64-linux";
  }

but the downside is that there’s no automated way to update the pinned revision. I have my own approach to pinning nixpkgs, where a versions.json file stores the version information:

{
  "nixpkgs": {
    "owner": "NixOS",
    "repo": "nixpkgs-channels",
    "branch": "nixos-19.03",
    "rev": "77295b0bd26555c39a1ba9c1da72dbdb651fd280",
    "sha256": "18v866h12xk6l1s37nk1vns869pvzphmnnlhrnm2b1zklg2hd1nq"
  }
}

and a script that uses jq to update this file. My (slightly more complex) expression then looks like this:
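The jq script itself isn’t shown here. Below is a rough sketch of what such a script could look like; it is not the author’s script, and the function names set_pin and update_pin are hypothetical. git ls-remote finds the latest commit on the pinned branch. nix-prefetch-url --unpack prints the hash that builtins.fetchTarball expects for that tarball.

```shell
#!/usr/bin/env bash
# Hypothetical pin updater for versions.json (not the author's actual script).
# Needs jq; update_pin additionally needs git and nix-prefetch-url.

# set_pin FILE NAME REV SHA256: rewrite .NAME.rev and .NAME.sha256 in FILE.
set_pin() {
  local file=$1 name=$2 rev=$3 sha=$4 tmp
  tmp=$(mktemp) || return 1
  if jq --arg n "$name" --arg rev "$rev" --arg sha "$sha" \
       '.[$n].rev = $rev | .[$n].sha256 = $sha' "$file" > "$tmp"; then
    mv "$tmp" "$file"
  else
    rm -f "$tmp"
    return 1
  fi
}

# update_pin FILE NAME: pin NAME to the current head of its branch.
update_pin() {
  local file=$1 name=$2 owner repo branch rev sha
  owner=$(jq -r --arg n "$name" '.[$n].owner' "$file") || return 1
  repo=$(jq -r --arg n "$name" '.[$n].repo' "$file") || return 1
  branch=$(jq -r --arg n "$name" '.[$n].branch' "$file") || return 1
  # ls-remote prints "<sha><TAB><ref>"; keep only the commit hash.
  rev=$(git ls-remote "https://github.com/$owner/$repo" "refs/heads/$branch" | cut -f1)
  [ -n "$rev" ] || { echo "no such branch: $branch" >&2; return 1; }
  # --unpack hashes the unpacked tree, matching builtins.fetchTarball.
  sha=$(nix-prefetch-url --unpack "https://github.com/$owner/$repo/tarball/$rev") || return 1
  set_pin "$file" "$name" "$rev" "$sha"
}
```

Running update_pin versions.json nixpkgs would move the pin to the current head of nixos-19.03. Because the jq rewrite lives in its own function, the file handling can be checked without network access.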

default.nix

let
  fetcher = { owner, repo, rev, sha256, ... }: builtins.fetchTarball {
    inherit sha256;
    url = "https://github.com/${owner}/${repo}/tarball/${rev}";
  };
  nixpkgs = fetcher (builtins.fromJSON (builtins.readFile ./versions.json)).nixpkgs;
  nixos = import "${nixpkgs}/nixos" {
    configuration = import ./configuration.nix;
  };
in
  nixos.system

and this lets me both be explicit about which nixpkgs I’m using and update it easily when necessary. Here’s the commit that adds those files.

Building the closure locally is also straightforward (as described here):

$ nix-build --no-out-link default.nix

Deploying the system closure

With all of our prerequisites taken care of, deploying the system closure is straightforward:

deploy.sh

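#!/usr/bin/env bash

set -euxo pipefail

TARGET="root@147.75.38.113"

# Build the system closure locally; nix-build prints the resulting store path.
PROFILE_PATH="$(nix-build --no-out-link default.nix)"

# Copy the closure to the target, using binary caches where possible.
nix-copy-closure --to --use-substitutes "$TARGET" "$PROFILE_PATH"

# Record a new generation of the system profile, then activate it.
ssh "$TARGET" -- "nix-env --profile /nix/var/nix/profiles/system --set $PROFILE_PATH && /nix/var/nix/profiles/system/bin/switch-to-configuration switch"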

This takes care of both building the new system closure and deploying it.

Here’s the commit that adds deploy.sh.
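switch is only one of the actions switch-to-configuration accepts. It also understands three others:

- boot: make the configuration the boot default without activating it.
- test: activate it without adding a boot entry.
- dry-activate: print what would change.

Below is a hypothetical variant of the final ssh step that takes the action as a parameter; activate and run are illustrative names. Only switch and boot record a new profile generation. The other two run the script straight from the store path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run switch-to-configuration with a chosen action.
# Set DRY_RUN=1 to print the remote command instead of running it.

SYSTEM_PROFILE=/nix/var/nix/profiles/system

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# activate TARGET STORE_PATH [ACTION]
activate() {
  local target=$1 path=$2 action=${3:-switch} cmd
  case $action in
    switch|boot)
      # Record a new generation so the boot menu and rollbacks can see it.
      cmd="nix-env --profile $SYSTEM_PROFILE --set $path && $SYSTEM_PROFILE/bin/switch-to-configuration $action" ;;
    test|dry-activate)
      # Leave the profile untouched and run the closure's own script.
      cmd="$path/bin/switch-to-configuration $action" ;;
    *)
      echo "unknown action: $action" >&2
      return 1 ;;
  esac
  run ssh "$target" -- "$cmd"
}
```

For instance, activate root@147.75.38.113 "$PROFILE_PATH" dry-activate previews a deploy before committing to it.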

Adding a service

Let’s deploy the final version of the small Haskell web service from my Functional DevOps post. The application consists of two files:

Main.hs

{-# LANGUAGE OverloadedStrings #-}

import Web.Scotty
import System.Environment (getArgs)

import Data.Monoid (mconcat)

main = getArgs >>= \(port:_) -> scotty (read port) $ do
    get "/:word" $ do
        beam <- param "word"
        html $ mconcat ["<h1>Scotty, ", beam, " me up!</h1>"]

blank-me-up.cabal

name:                blank-me-up
version:             0.1.0.0
license:             BSD3
build-type:          Simple
cabal-version:       >=1.10

executable blank-me-up
  main-is:             Main.hs
  build-depends:       base >=4.9 && <5
                     , scotty
  default-language:    Haskell2010

and the NixOS service module is one file:

service.nix

{ config, lib, pkgs, ... }:

let
  blank-me-up = pkgs.haskellPackages.callCabal2nix "blank-me-up" ../app {};
  cfg = config.services.blank-me-up;
in {
  options.services.blank-me-up.enable = lib.mkEnableOption "Blank Me Up";
  options.services.blank-me-up.port = lib.mkOption {
    default = 3000;
    type = lib.types.int;
  };

  config = lib.mkIf cfg.enable {
    networking.firewall.allowedTCPPorts = [ cfg.port ];

    systemd.services.blank-me-up = {
      description = "Blank Me Up";
      after = [ "network.target" ];
      wantedBy = [ "multi-user.target" ];
      serviceConfig = {
        ExecStart = "${blank-me-up}/bin/blank-me-up ${toString cfg.port}";
        Restart = "always";
        KillMode = "process";
      };
    };
  };
}

For more information about what’s happening in service.nix, see the relevant section of my Functional DevOps post.

Here’s the commit that adds these files.

Enabling the service is as easy as adding two lines to configuration.nix:

configuration.nix

{ config, pkgs, ... }:

{
  imports =
    [
      ./packet.nix
      ./deploy/nix/service.nix        #1
    ];

  boot.loader.grub.enable = true;
  boot.loader.grub.version = 2;

  services.blank-me-up.enable = true; #2

  system.stateVersion = "19.03";

}

Here’s the commit that makes that change.

Deploying the service

$ ./deploy.sh
+ TARGET=root@147.75.38.113
++ nix-build --no-out-link default.nix
+ PROFILE_PATH=/nix/store/<hash>-nixos-system-nixos-19.03pre-git
+ nix-copy-closure --to --use-substitutes root@147.75.38.113 /nix/store/<hash>-nixos-system-nixos-19.03pre-git
<...>
+ ssh root@147.75.38.113 -- 'nix-env --profile /nix/var/nix/profiles/system --set /nix/store/<hash>-nixos-system-nixos-19.03pre-git && /nix/var/nix/profiles/system/bin/switch-to-configuration switch'
updating GRUB 2 menu...
activating the configuration...
setting up /etc...
reloading user units for root...
setting up tmpfiles

$ curl http://147.75.38.113:3000/beam
<h1>Scotty, beam me up!</h1>
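The curl check can also be scripted, so that a deploy only counts as done once the service answers. Below is a sketch with a hypothetical wait_for_http helper. The unit may take a moment to come up after switch-to-configuration returns. The retry count and two-second delay are arbitrary choices.

```shell
#!/usr/bin/env bash
# Hypothetical post-deploy smoke test: poll a URL until it returns
# the expected body, or give up after TRIES attempts.

# wait_for_http URL EXPECTED_BODY [TRIES]
wait_for_http() {
  local url=$1 expected=$2 tries=${3:-10} i=1 body
  while [ "$i" -le "$tries" ]; do
    # -f: fail on HTTP errors; -sS: quiet, but still report errors
    if body=$(curl -fsS "$url") && [ "$body" = "$expected" ]; then
      echo "ok: $url"
      return 0
    fi
    # Only sleep if another attempt follows.
    [ "$i" -lt "$tries" ] && sleep 2
    i=$((i + 1))
  done
  echo "smoke test failed: $url" >&2
  return 1
}
```

For this service, that would be wait_for_http http://147.75.38.113:3000/beam '<h1>Scotty, beam me up!</h1>'. Running ssh root@147.75.38.113 -- systemctl is-active --quiet blank-me-up checks the unit itself rather than the HTTP endpoint.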

But this is just a janky bash script!!???

It’s definitely the case that deploy.sh is short and unsophisticated, but the three commands it invokes are what’s really important here. Once you begin looking for them, you will find them everywhere, since they’re the standard way of deploying to NixOS (nixos-rebuild itself finishes by calling switch-to-configuration). They’re used in NixOps, nix-deploy, and obelisk, and a quick GitHub search for “switch-to-configuration” turns up many more examples. At a previous job, our deployment platform used these three commands as well, and we routinely deployed to hundreds of servers without any deployment-related issues, so I’m comfortable calling this an industrial-grade deployment solution.
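Scaling this to many servers mostly means building once and repeating the copy and activate steps for each host. The loop below is a hypothetical sketch: deploy_all and run are made-up names, and hosts are deployed one at a time. It keeps going when one host fails and returns a non-zero status at the end.

```shell
#!/usr/bin/env bash
# Hypothetical multi-host deploy: one build, then copy + activate per host.
# Set DRY_RUN=1 to print commands instead of running them.

SYSTEM_PROFILE=/nix/var/nix/profiles/system

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# deploy_all STORE_PATH HOST...
deploy_all() {
  local path=$1 failed=0 host
  shift
  for host in "$@"; do
    if run nix-copy-closure --to --use-substitutes "$host" "$path" &&
       run ssh "$host" -- "nix-env --profile $SYSTEM_PROFILE --set $path && $SYSTEM_PROFILE/bin/switch-to-configuration switch"; then
      echo "ok: $host"
    else
      echo "FAILED: $host" >&2
      failed=$((failed + 1))
    fi
  done
  # Succeed only if every host deployed.
  [ "$failed" -eq 0 ]
}
```

Usage would be deploy_all "$(nix-build --no-out-link default.nix)" root@host1 root@host2. A parallel version could background each iteration with & and wait for all of them, at the cost of interleaved output. The sequential loop keeps the log readable.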

What about provisioning?

These tools don’t care how you provision your servers, as long as you end up with NixOS targets you can SSH into. For quick demonstrations and small deployments, manual provisioning is fine, but for anything beyond that, I’d recommend using a tool like Terraform. You can even write your Terraform configuration in Nix using something like terranix. This is in fact what we did at the previous job I mentioned: Nix makes a great templating language and has excellent support for producing JSON, which can then be fed into Terraform. It’s also possible to output YAML from Nix, which makes it easy to interoperate with most infrastructure tooling.
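The only real requirement is a NixOS target you can SSH into, so it can be worth checking that before deploying, whatever provisioned the machine. This is a minimal sketch with a hypothetical check_target helper. It assumes key-based SSH access; BatchMode=yes makes ssh fail instead of prompting for a password. It relies on nixos-version, a command that exists only on NixOS.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: is TARGET reachable over SSH and running NixOS?

# check_target TARGET
check_target() {
  local target=$1 version
  # nixos-version is missing on non-NixOS hosts, so the remote command fails there.
  if ! version=$(ssh -o BatchMode=yes -o ConnectTimeout=5 "$target" -- nixos-version 2>/dev/null); then
    echo "$target: unreachable or not NixOS" >&2
    return 1
  fi
  echo "$target: NixOS $version"
}
```

Running check_target root@147.75.38.113 && ./deploy.sh would only deploy once the check passes.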

Should I use this instead of my current deployment solution?

My aim with this post is not to convince you to drop whatever you’re currently using in favour of a hand-rolled bash script and NixOS, especially if your current solution works well for you. I do, however, want to encourage you to think about how the process I’ve outlined here compares. In which ways is it better or worse?

Since this is the workflow I’ve had the most experience with, it was a rude shock to start working with container-based deployments where even tiny changes require a full (slow) rebuild, and the actual deployment lifecycle is more complex and error-prone. I think it’s important to point out that things don’t have to be this way.

In my Functional DevOps post, I outlined some characteristics of an ideal DevOps workflow, and I think the process described here meets them all:

Automatic: The process is completely scriptable.
Repeatable: I can leverage NixOS to get the same results every time.
Idempotent: Deploying the same thing a second time is a no-op.
Reversible: Rolling back is very easy.
Atomic: A deploy either fails or succeeds; there’s no weird in-between¹.
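Command #2 records every deploy as a generation of the system profile. That is why rolling back takes only two remote commands: step the profile back one generation with nix-env --rollback, then activate whatever it now points to. Below is a hypothetical sketch; rollback, list_generations and run are illustrative helper names.

```shell
#!/usr/bin/env bash
# Hypothetical rollback helpers for a NixOS target.
# Set DRY_RUN=1 to print commands instead of running them.

SYSTEM_PROFILE=/nix/var/nix/profiles/system

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# list_generations TARGET: show the system profile's generations.
list_generations() {
  run ssh "$1" -- "nix-env --profile $SYSTEM_PROFILE --list-generations"
}

# rollback TARGET: point the profile at the previous generation and activate it.
rollback() {
  run ssh "$1" -- "nix-env --profile $SYSTEM_PROFILE --rollback && $SYSTEM_PROFILE/bin/switch-to-configuration switch"
}
```

To jump to a specific generation rather than the previous one, nix-env also accepts --switch-generation N together with --profile.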

I think this is pretty great for three commands. I hope this blog post can help move us towards better systems by making this corner of NixOS more approachable!

Thanks to Brian Hicks for comments and feedback!

¹ As ElvishJerricco points out on Reddit, this isn’t quite true in the case of broken services.