GenServers please behave, or I’ll call the Supervisor

In my previous post I explored the basics of the Elixir GenServer behaviour. It’s now time to jump to the next big topic: Supervisors.

If GenServers are the building blocks of the OTP world, Supervisors are the glue that keeps them together.

First, let’s get a definition of supervisors:
A supervisor is a process which supervises other processes, called child processes. Supervisors are used to build a hierarchical process structure called a supervision tree, a nice way to structure fault-tolerant applications.

And that’s all there is to it! Supervisors are used to spawn processes (typically GenServers), to manage them, to restart them when they crash, and to handle errors.

Supervisor is another OTP behaviour, just like GenServer; in fact, it’s built on top of it. Why do we need supervisors in the first place? Because processes can crash for any reason, and in the concurrent world it’s better to handle failure and recover from it than to let the whole application crash.

Try this, start your iex session and create a linked process that keeps waiting indefinitely. We then ask if the process is alive, and check the pid number of the current iex main process:


pid = spawn_link fn -> receive do end end
#PID<0.59.0>

Process.alive? pid
true

current = self()
#PID<0.68.0>

Now, let’s see what happens when we kill the process:


Process.exit pid, :kill
** (EXIT from #PID<0.68.0>) killed

self()
#PID<0.74.0>

See how the pid of the iex process has changed. What happened? Killing the linked process took down the iex main process as well (which was actually restarted behind the scenes, but you can assume that if the iex process were your application, it would be dead right now).
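
A supervisor survives the death of a linked child because it traps exits: the exit signal arrives as an ordinary message instead of killing the receiver. The sketch below illustrates that mechanism only. It is not how Supervisor is implemented, and the module name TrapExitSketch is made up for this example.

```elixir
defmodule TrapExitSketch do
  # Runs the experiment in a throwaway process and returns the exit reason it observed.
  def run do
    caller = self()

    spawn(fn ->
      # Turn incoming exit signals into ordinary {:EXIT, pid, reason} messages.
      Process.flag(:trap_exit, true)

      # Same kind of idle linked process as in the iex session above.
      child = spawn_link(fn -> Process.sleep(:infinity) end)
      Process.exit(child, :kill)

      # The linked process is still alive and can react to the child's death.
      receive do
        {:EXIT, ^child, reason} -> send(caller, {:child_exited, reason})
      end
    end)

    receive do
      {:child_exited, reason} -> reason
    after
      1_000 -> :timeout
    end
  end
end

IO.inspect(TrapExitSketch.run())
```

Because the intermediate process traps exits, killing its child does not take it down, and the calling process is never linked to either of them.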

It’s now time to build our first supervisor. I promise it’s going to be easy. Let’s start with the bare bones:


defmodule A.Supervisor do
  use Supervisor

  def start_link do
    Supervisor.start_link __MODULE__, :ok
  end

  def init _ do
    children = []
    supervise children, strategy: :one_for_all
  end
end

{:ok, sup} = A.Supervisor.start_link
{:ok, #PID<0.103.0>}

Supervisor.count_children sup
%{active: 0, specs: 0, supervisors: 0, workers: 0}

Just like with GenServer, we define a start_link function that eventually calls the init function. The init function handles all the gory details of the supervision. We want to start simple, so there is nothing to supervise yet and the function looks quite empty (children is an empty list).

The function Supervisor.count_children confirms that this supervisor is not very busy at the moment: it is supervising no workers (that’s what generic supervised processes are called) and no supervisors (because supervisors can supervise other supervisors as well).

Still, we have already decided a very important behavior of this supervisor: its children are supervised with the one_for_all strategy, which means that if one of its child processes dies, the supervisor terminates all the others and then restarts all of them.
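
To see one_for_all in action without writing a worker module first, here is a minimal sketch. It makes a few assumptions that are not part of this post: it uses two throwaway Agent processes as children, the made-up ids :a and :b, and the map-based child specifications accepted by Supervisor.start_link/2 in Elixir 1.5 and later.

```elixir
# Two throwaway Agent children stand in for real workers (hypothetical ids :a and :b).
children = [
  %{id: :a, start: {Agent, :start_link, [fn -> 0 end]}},
  %{id: :b, start: {Agent, :start_link, [fn -> 0 end]}}
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_all)

# Helper: map each child id to its current pid.
pids = fn ->
  for {id, pid, _type, _modules} <- Supervisor.which_children(sup), into: %{}, do: {id, pid}
end

before = pids.()

# Kill only child :a.
Process.exit(before.a, :kill)

# Crude pause so the supervisor has time to restart its children.
Process.sleep(100)
restarted = pids.()

IO.inspect(before.a != restarted.a, label: "a restarted")
IO.inspect(before.b != restarted.b, label: "b restarted")
```

Only :a is killed, yet with one_for_all the pid of :b should change as well, because the supervisor stops and restarts every sibling.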

Let’s define a simple GenServer:


defmodule A.Worker do
  use GenServer

  def start_link value \\ "" do
    GenServer.start_link __MODULE__, value
  end

  def state pid do
    GenServer.call pid, :state
  end

  def handle_call :state, _, state do
    {:reply, state, state}
  end
end

It’s very basic: we can start it with an explicit state if we’re not happy with the default empty string, and then query it for its state:


{:ok, pid} = A.Worker.start_link "my value"
{:ok, #PID<0.125.0>}

A.Worker.state pid
"my value"

That’s it. In order to add an A.Worker child to the supervisor we need to build a worker specification, which is basically a tuple of tuples that contains the “recipe” for spawning the child. Luckily enough, Supervisor.Spec knows how to build this contrived tuple:


import Supervisor.Spec

spec = worker(A.Worker, ["first value"])
{A.Worker, {A.Worker, :start_link, ["first value"]}, :permanent, 5000, :worker, [A.Worker]}

At a minimum, the required ingredients for the “recipe” are the module name (A.Worker) and the argument list (["first value"]).
The most notable pieces of information in the tuple are:

  • :start_link: the function of the module A.Worker that will be called to bootstrap the worker
  • :permanent: means that if the child dies, it will always be respawned (the other options are :temporary, which never restarts the process, and :transient, which restarts it only after an abnormal exit)
  • 5000: the shutdown timeout, i.e. the milliseconds the supervisor waits for the child to terminate gracefully before killing it
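
For comparison, the same recipe can be written as a map. Supervisor.Spec is deprecated from Elixir 1.5 onwards in favour of this form. This is only a sketch of the equivalent data, and the field values are copied from the tuple above.

```elixir
# Map-based child specification mirroring the tuple built by worker/2.
spec = %{
  id: A.Worker,                                     # unique id of the child within its supervisor
  start: {A.Worker, :start_link, ["first value"]},  # {module, function, arguments} used to start it
  restart: :permanent,                              # always restart the child when it dies
  shutdown: 5000,                                   # ms to wait for a graceful stop before killing
  type: :worker                                     # :worker or :supervisor
}

IO.inspect(spec)
```

Only :id and :start are required; the other keys are spelled out here to line up with the positions in the tuple.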

You can find a more detailed explanation here.
With our specification we’re now ready to create our child:


{:ok, worker} = Supervisor.start_child sup, spec
{:ok, #PID<0.133.0>}

Supervisor.count_children sup
%{active: 1, specs: 1, supervisors: 0, workers: 1}
Supervisor.which_children sup                        
[{A.Worker, #PID<0.133.0>, :worker, [A.Worker]}]

We can see now that 1 worker is reported and active.

It’s time to see our supervisor do its job, that is, restart the supervised process when it dies:


Process.exit worker, :kill
true

Process.alive? worker
false

Supervisor.count_children sup
%{active: 1, specs: 1, supervisors: 0, workers: 1}
Supervisor.which_children sup
[{A.Worker, #PID<0.146.0>, :worker, [A.Worker]}]

As you can see, after killing the worker process the supervisor started another worker to take its place. This is confirmed by the different pid number (146 instead of the previous 133).

What happens if we try to add a new worker with the same specification? We get an error, because every child of a supervisor needs a unique id, and by default the id is the module name. We can work around this by supplying a different :id:


{:ok, worker} = Supervisor.start_child sup, spec
** (MatchError) no match of right hand side value: {:error, {:already_started, #PID<0.146.0>}}

spec = worker(A.Worker, [], id: :extra_worker)
{:extra_worker, {A.Worker, :start_link, []}, :permanent, 5000, :worker, [A.Worker]}
{:ok, worker} = Supervisor.start_child sup, spec
{:ok, #PID<0.153.0>}

Supervisor.count_children sup                   
%{active: 2, specs: 2, supervisors: 0, workers: 2}

There is a better way when we need to supervise multiple processes from the same module. Let’s create a new supervisor module that can transparently handle all the A.Worker processes we want:


defmodule A.WorkerSupervisor do
  use Supervisor

  def start_link do
    Supervisor.start_link __MODULE__, :ok
  end

  def init _ do
    children = [worker(A.Worker, [])]
    supervise children, strategy: :simple_one_for_one
  end
end

{:ok, w_sup} = A.WorkerSupervisor.start_link
{:ok, pid} = Supervisor.start_child w_sup, ["value"]

A.Worker.state pid
"value"

Supervisor.start_child w_sup, []
Supervisor.start_child w_sup, []

Supervisor.count_children w_sup
%{active: 3, specs: 1, supervisors: 0, workers: 3}

This time the supervisor defines the worker specification up front, inside the init function. The strategy we chose, :simple_one_for_one, lets us create as many children as we want for this supervisor, because it assumes that every child uses the very same specification. The arguments we pass to Supervisor.start_child are appended to the ones in the specification. By the way, this looks very much like a Factory object in OOP languages.

In the code above, after creating the supervisor, we add one child with a given value, then we check for its value to be there. A few lines later we add 2 more children, and we verify via count_children that 3 active workers exist and they’re all with the same specification (specs: 1).
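
In recent Elixir versions (1.6 and later) the :simple_one_for_one strategy is deprecated and the same factory-like role is played by DynamicSupervisor. The following is a rough sketch of that alternative, not a drop-in replacement for the module above: it uses Agent processes as stand-in workers and a made-up helper named worker_spec.

```elixir
# Start a supervisor with no children; they are all added at runtime.
{:ok, dyn_sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

# Hypothetical helper: builds a child spec for an Agent holding the given value.
# DynamicSupervisor requires the :id key but does not use it to tell children apart.
worker_spec = fn value ->
  %{id: :worker, start: {Agent, :start_link, [fn -> value end]}}
end

# First child with an explicit value, then check that the value is there.
{:ok, pid} = DynamicSupervisor.start_child(dyn_sup, worker_spec.("value"))
IO.inspect(Agent.get(pid, fn state -> state end))

# Two more children from the same recipe.
for i <- 1..2 do
  {:ok, _pid} = DynamicSupervisor.start_child(dyn_sup, worker_spec.(i))
end

IO.inspect(DynamicSupervisor.count_children(dyn_sup))
```

Unlike :simple_one_for_one, the full child specification is passed on every start_child call, so different kinds of children can live under the same supervisor.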

Let’s go back to our first supervisor, the one from the A.Supervisor module. It currently has 2 children from the same module:


Supervisor.which_children sup
[{:extra_worker, #PID<0.153.0>, :worker, [A.Worker]},
 {A.Worker, #PID<0.146.0>, :worker, [A.Worker]}]

We said we could supervise supervisors as well, thus creating a supervision tree, so let’s do it:


Process.exit w_sup, :normal

spec = Supervisor.Spec.supervisor A.WorkerSupervisor, []
{A.WorkerSupervisor, {A.WorkerSupervisor, :start_link, []}, :permanent,
 :infinity, :supervisor, [A.WorkerSupervisor]}
 
{:ok, w_sup} = Supervisor.start_child sup, spec
{:ok, #PID<0.177.0>}

Supervisor.count_children sup
%{active: 3, specs: 3, supervisors: 1, workers: 2}

Just like before, we created a new child for our A.Supervisor process, but this time we built the specification with supervisor instead of worker, because the new child is itself a supervisor.

count_children confirms that we now have 3 active supervised processes, and one is a supervisor type.
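
The same two-level tree can also be declared in one go. This is a small sketch under the same assumptions as before (map-based child specs, an Agent as a stand-in worker, and the made-up ids :some_worker and :inner_sup), with an empty inner supervisor to keep it short.

```elixir
# A plain worker child.
worker = %{id: :some_worker, start: {Agent, :start_link, [fn -> 0 end]}}

# A child that is itself a supervisor: note the type: :supervisor field.
inner = %{
  id: :inner_sup,
  start: {Supervisor, :start_link, [[], [strategy: :one_for_one]]},
  type: :supervisor
}

{:ok, top} = Supervisor.start_link([worker, inner], strategy: :one_for_one)

IO.inspect(Supervisor.count_children(top))
```

The type field is what makes count_children report the inner process under supervisors rather than workers.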

Of course we can attach as many workers as we want to our w_sup supervisor:


for i <- 0..10 do
  Supervisor.start_child w_sup, [i]
end

Supervisor.count_children w_sup
%{active: 11, specs: 1, supervisors: 0, workers: 11}

And that’s all for now. I hope you enjoyed this introduction to supervisors!
