In my previous post I explored the basics of the Elixir GenServer behaviour; it’s now time to move on to the next big topic: Supervisors.
If GenServers are the building blocks of the OTP world, Supervisors are the glue that holds them together.
First, let’s get a definition of supervisors:
A supervisor is a process which supervises other processes, called child processes. Supervisors are used to build a hierarchical process structure called a supervision tree, a nice way to structure fault-tolerant applications.
And that’s all there is to it! Supervisors are used to spawn processes (typically GenServers), to manage them, to restart them when they crash, and to handle errors.
Supervisor is another OTP behaviour, just like GenServer; in fact, it’s built on top of it. Why do we need supervisors in the first place? Because processes can crash for any reason, and in the concurrent world it’s better to handle failure and recover from it than to let the whole application crash.
Try this: start an iex session and create a linked process that waits indefinitely. We then ask whether the process is alive, and check the pid of the current iex process:
pid = spawn_link fn -> receive do end end
#PID<0.59.0>
Process.alive? pid
true
current = self()
#PID<0.68.0>
Now, let’s see what happens when we kill the process:
Process.exit pid, :kill
** (EXIT from #PID<0.68.0>) killed
self()
#PID<0.74.0>
See how the pid of the iex process has changed. What happened? Because the two processes were linked, killing the spawned process took down the iex process as well. (iex transparently started a new shell process behind the scenes, but if that process had been your application, it would be dead right now.)
It’s now time to build our first supervisor. I promise it’s going to be easy. Let’s start with a bare-bones version:
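Supervisors themselves survive this kind of crash because they trap exits: an exit signal from a linked process is converted into an ordinary {:EXIT, pid, reason} message instead of killing the receiver. The sketch below shows that mechanism in isolation. It is meant to be run as a plain script (for example with the elixir command), and the one-second timeout is an arbitrary choice of the sketch, not part of the session above.

```elixir
# Turn incoming exit signals into {:EXIT, pid, reason} messages,
# which is (roughly) what a supervisor does for its children.
Process.flag(:trap_exit, true)

# A linked process that waits for a message that never arrives.
pid =
  spawn_link(fn ->
    receive do
      :stop -> :ok
    end
  end)

# Kill it, just like in the iex session above.
Process.exit(pid, :kill)

# Instead of dying, the caller receives the exit as a message.
exit_reason =
  receive do
    {:EXIT, ^pid, reason} -> reason
  after
    1_000 -> :timeout
  end

IO.inspect(exit_reason, label: "linked process exited with")
IO.inspect(Process.alive?(self()), label: "caller still alive?")
```

Here exit_reason should come back as :killed while the calling process stays alive, which is the property a supervisor relies on to notice a crash and react to it.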
defmodule A.Supervisor do
  use Supervisor
  def start_link do
    Supervisor.start_link __MODULE__, :ok
  end
  def init _ do
    children = []
    supervise children, strategy: :one_for_all
  end
end
{:ok, sup} = A.Supervisor.start_link
{:ok, #PID<0.103.0>}
Supervisor.count_children sup
%{active: 0, specs: 0, supervisors: 0, workers: 0}
Just like with GenServer, we define a start_link function that eventually calls the init function. The init function handles all the gory details of supervision. Since we want to start simple, there is nothing to supervise yet, so the function looks quite empty (children is an empty list).
The function Supervisor.count_children confirms that this supervisor is not very busy at the moment: it is supervising nothing yet, no worker (that’s what generic supervised processes are called) and no supervisor (because supervisors can supervise other supervisors as well).
Still, we have already decided a very important behaviour of this supervisor: its children are supervised with the one_for_all strategy, which means that if one of its child processes dies, the supervisor terminates all the others as well and then restarts all of them.
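To see :one_for_all in action, here is a small sketch. It uses the child-spec API available in Elixir 1.5 and later (Supervisor.start_link/2 with a list of children) rather than the Supervisor.Spec helpers used in this post, and two Agent processes stand in for real workers; the ids :a and :b and the 100 ms pause are arbitrary choices of the sketch.

```elixir
# Two placeholder children: Agents holding :a and :b, each with a distinct id.
children = [
  Supervisor.child_spec({Agent, fn -> :a end}, id: :a),
  Supervisor.child_spec({Agent, fn -> :b end}, id: :b)
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_all)

# Helper: map each child id to its current pid.
pids_by_id = fn supervisor ->
  supervisor
  |> Supervisor.which_children()
  |> Map.new(fn {id, pid, _type, _modules} -> {id, pid} end)
end

before = pids_by_id.(sup)

# Kill only child :a ...
Process.exit(before[:a], :kill)
# ... and give the supervisor a moment to handle the exit signal.
Process.sleep(100)

after_restart = pids_by_id.(sup)

# With :one_for_all, both children were restarted and have new pids.
IO.inspect(after_restart[:a] != before[:a], label: ":a restarted")
IO.inspect(after_restart[:b] != before[:b], label: ":b restarted")
```

Switching the strategy to :one_for_one should leave :b with its original pid, since only the crashed child is restarted.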
Let’s define a simple GenServer:
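defmodule A.Worker do
  use GenServer
  # With no argument, the initial state is the empty string
  def start_link(value \\ "") do
    GenServer.start_link __MODULE__, value
  end
  def state(pid) do
    GenServer.call pid, :state
  end
  # Use the start_link argument as the initial state
  def init(value) do
    {:ok, value}
  end
  # Reply with the current state, leaving it unchanged
  def handle_call(:state, _from, state) do
    {:reply, state, state}
  end
end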
It’s very basic: we can start it with an explicit state, if we’re not happy with the default empty string, and query its state:
{:ok, pid} = A.Worker.start_link "my value"
{:ok, #PID<0.125.0>}
A.Worker.state pid
"my value"
That’s it. In order to add an A.Worker child to the supervisor, we need to build a worker specification, which is basically a tuple (with further tuples nested inside) containing the “recipe” for spawning the child. Luckily enough, Supervisor.Spec knows how to build this convoluted tuple:
import Supervisor.Spec
spec = worker(A.Worker, ["first value"])
{A.Worker, {A.Worker, :start_link, ["first value"]}, :permanent, 5000, :worker, [A.Worker]}
At a minimum, the required ingredients for the “recipe” are the module name (A.Worker) and the argument list (["first value"]).
The most notable pieces of information included in the tuple are:
:start_link: the function of the A.Worker module that will be called to bootstrap the worker
:permanent: the restart value, meaning that if the child dies it will always be restarted (the alternatives are :transient, which restarts the child only if it terminates abnormally, and :temporary, which never restarts it)
5000: the shutdown timeout, i.e. how many milliseconds the supervisor waits for the child to terminate gracefully when shutting it down, before killing it
You can find a more detailed explanation here.
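In Elixir 1.5 and later the same recipe is usually expressed as a child specification map with the keys :id, :start, :restart, :shutdown, :type and :modules; when omitted, :restart defaults to :permanent, :shutdown to 5000 milliseconds for workers, and :type to :worker. A minimal sketch, assuming a hypothetical SketchSpec.Worker module that stands in for A.Worker:

```elixir
# Hypothetical stand-in for A.Worker, written for Elixir 1.5+.
defmodule SketchSpec.Worker do
  use GenServer

  def start_link(value \\ ""), do: GenServer.start_link(__MODULE__, value)

  @impl true
  def init(value), do: {:ok, value}
end

# `use GenServer` generates child_spec/1, which plays the role of worker/2:
# the :start entry corresponds to the {module, function, args} part of the tuple.
spec = SketchSpec.Worker.child_spec("first value")
IO.inspect(spec)

# Individual fields can be overridden, e.g. the restart and shutdown values.
custom = Supervisor.child_spec(spec, restart: :transient, shutdown: 10_000)
IO.inspect(custom)
```

Supervisor.child_spec/2 only sets the keys it is given; the remaining ones are filled in with their defaults when a supervisor starts the child.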
With our specification we’re now ready to create our child:
{:ok, worker} = Supervisor.start_child sup, spec
{:ok, #PID<0.133.0>}
Supervisor.count_children sup
%{active: 1, specs: 1, supervisors: 0, workers: 1}
Supervisor.which_children sup
[{A.Worker, #PID<0.133.0>, :worker, [A.Worker]}]
We can now see that one worker is reported, and that it is active.
It’s time to see our supervisor do its job, namely restarting the supervised process when it dies:
Process.exit worker, :kill
true
Process.alive? worker
false
Supervisor.count_children sup
%{active: 1, specs: 1, supervisors: 0, workers: 1}
Supervisor.which_children sup
[{A.Worker, #PID<0.146.0>, :worker, [A.Worker]}]
As you can see, after we killed the worker process, the supervisor started another worker to take its place, as confirmed by the different pid (146 instead of the previous 133).
What happens if we try to add a new worker with the same specification? We get an error because, with the strategy we chose, every child must have a unique id, and the id A.Worker is already taken. We can work around this by supplying a different :id option:
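Restarts are not unlimited, though. A supervisor tolerates only a certain number of restarts within a time window (the max_restarts and max_seconds options in Elixir, which default to 3 restarts in 5 seconds); beyond that it terminates its children and then itself, escalating the failure to its own parent. The sketch below, written against the Elixir 1.5+ child-spec API with an Agent as a placeholder child, lowers the limit to a single restart to make this visible; the 100 ms pauses and the one-second timeout are arbitrary.

```elixir
# The supervisor is linked to us: trap exits so its final exit arrives
# as a message instead of taking this script down.
Process.flag(:trap_exit, true)

# One Agent child; allow at most 1 restart within 5 seconds.
opts = [strategy: :one_for_one, max_restarts: 1, max_seconds: 5]
{:ok, sup} = Supervisor.start_link([{Agent, fn -> 0 end}], opts)

kill_only_child = fn ->
  [{_id, pid, _type, _modules}] = Supervisor.which_children(sup)
  Process.exit(pid, :kill)
  # Give the supervisor time to react to the exit signal.
  Process.sleep(100)
end

# First crash: within the limit, so the child is restarted.
kill_only_child.()
alive_after_first_crash = Process.alive?(sup)

# Second crash within 5 seconds: the limit is exceeded and the supervisor gives up.
kill_only_child.()
alive_after_second_crash = Process.alive?(sup)

supervisor_exit =
  receive do
    {:EXIT, ^sup, reason} -> reason
  after
    1_000 -> :timeout
  end

IO.inspect({alive_after_first_crash, alive_after_second_crash, supervisor_exit})
```

The result should be {true, false, :shutdown}: the first crash is absorbed by a restart, while the second exceeds the limit, so the supervisor shuts down. In a real supervision tree its own parent would then decide what to do.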
{:ok, worker} = Supervisor.start_child sup, spec
** (MatchError) no match of right hand side value: {:error, {:already_started, #PID<0.146.0>}}
spec = worker(A.Worker, [], id: :extra_worker)
{:extra_worker, {A.Worker, :start_link, []}, :permanent, 5000, :worker, [A.Worker]}
{:ok, worker} = Supervisor.start_child sup, spec
{:ok, #PID<0.153.0>}
Supervisor.count_children sup
%{active: 2, specs: 2, supervisors: 0, workers: 2}
There is a better way when we need to supervise multiple processes from the same module. Let’s create a new supervisor module that can transparently handle as many A.Worker processes as we want:
defmodule A.WorkerSupervisor do
  use Supervisor
  def start_link do
    Supervisor.start_link __MODULE__, :ok
  end
  def init _ do
    children = [worker(A.Worker, [])]
    supervise children, strategy: :simple_one_for_one
  end
end
{:ok, w_sup} = A.WorkerSupervisor.start_link
{:ok, pid} = Supervisor.start_child w_sup, ["value"]
A.Worker.state pid
"value"
Supervisor.start_child w_sup, []
Supervisor.start_child w_sup, []
Supervisor.count_children w_sup
%{active: 3, specs: 1, supervisors: 0, workers: 3}
This time the supervisor defines the worker specification upfront, inside the init function. The strategy we chose, :simple_one_for_one, allows us to create as many children as we want for this supervisor, because this strategy assumes that every child is started from the very same specification. And by the way, this looks very much like a Factory object in OOP languages.
In the code above, after creating the supervisor, we add one child with a given value and then check that the value is really there. Note that with :simple_one_for_one the list passed to Supervisor.start_child is appended to the arguments in the worker specification, so ["value"] ends up as a call to A.Worker.start_link("value"), while [] falls back to the default empty string. A few lines later we add two more children, and we verify via count_children that three workers are active and that they all share the same specification (specs: 1).
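Newer Elixir versions (1.6 onwards) provide the DynamicSupervisor module for this use case, and :simple_one_for_one is deprecated in their Supervisor module. A rough equivalent of A.WorkerSupervisor, assuming a hypothetical SketchDyn.Worker module that mirrors A.Worker:

```elixir
# Hypothetical stand-in for A.Worker, written for Elixir 1.6+.
defmodule SketchDyn.Worker do
  use GenServer

  def start_link(value \\ ""), do: GenServer.start_link(__MODULE__, value)
  def state(pid), do: GenServer.call(pid, :state)

  @impl true
  def init(value), do: {:ok, value}

  @impl true
  def handle_call(:state, _from, state), do: {:reply, state, state}
end

# A DynamicSupervisor starts with no children; they are added at runtime.
{:ok, dyn_sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

# Each call passes a full child spec, here in {module, argument} form.
{:ok, pid} = DynamicSupervisor.start_child(dyn_sup, {SketchDyn.Worker, "value"})
value = SketchDyn.Worker.state(pid)

{:ok, _} = DynamicSupervisor.start_child(dyn_sup, {SketchDyn.Worker, ""})
{:ok, _} = DynamicSupervisor.start_child(dyn_sup, {SketchDyn.Worker, ""})

counts = DynamicSupervisor.count_children(dyn_sup)
IO.inspect({value, counts})
```

Unlike :simple_one_for_one, DynamicSupervisor.start_child/2 receives a complete child spec for every child instead of extra arguments appended to a shared one.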
Let’s go back to our first supervisor, the one from the A.Supervisor module. It currently has 2 children from the same module:
Supervisor.which_children sup
[{:extra_worker, #PID<0.153.0>, :worker, [A.Worker]},
{A.Worker, #PID<0.146.0>, :worker, [A.Worker]}]
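We said that supervisors can supervise other supervisors as well, thus creating a supervision tree, so let’s do it. The first line gets rid of the standalone w_sup we started earlier; then we add a new A.WorkerSupervisor as a child of sup: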
Process.exit w_sup, :normal
spec = Supervisor.Spec.supervisor A.WorkerSupervisor, []
{A.WorkerSupervisor, {A.WorkerSupervisor, :start_link, []}, :permanent,
:infinity, :supervisor, [A.WorkerSupervisor]}
{:ok, w_sup} = Supervisor.start_child sup, spec
{:ok, #PID<0.177.0>}
Supervisor.count_children sup
%{active: 3, specs: 3, supervisors: 1, workers: 2}
Just like before, we created a new child for our A.Supervisor process, but this time we used the supervisor specification instead of the worker one, because the child we needed was itself a supervisor.
count_children confirms that we now have three active supervised processes, one of which is a supervisor.
Of course we can attach as many workers as we want to our w_sup supervisor:
for i <- 0..10 do
  Supervisor.start_child w_sup, [i]
end
Supervisor.count_children w_sup
%{active: 11, specs: 1, supervisors: 0, workers: 11}
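For comparison, in Elixir 1.6 and later the same shape of tree can be declared upfront in a single list of children. The sketch below uses an Agent as a placeholder worker and registers the nested DynamicSupervisor under a hypothetical name, Sketch.WorkerSupervisor, so that children can be added to it without holding on to its pid:

```elixir
# Top-level children: one placeholder worker and one nested supervisor.
children = [
  Supervisor.child_spec({Agent, fn -> "first value" end}, id: :first_worker),
  {DynamicSupervisor, strategy: :one_for_one, name: Sketch.WorkerSupervisor}
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_all)

# The DynamicSupervisor child is reported with type :supervisor.
top_counts = Supervisor.count_children(sup)

# Attach 11 workers to the nested supervisor, as in the loop above.
for i <- 0..10 do
  {:ok, _pid} = DynamicSupervisor.start_child(Sketch.WorkerSupervisor, {Agent, fn -> i end})
end

nested_counts = DynamicSupervisor.count_children(Sketch.WorkerSupervisor)
IO.inspect({top_counts, nested_counts})
```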
And that’s all for now, I hope you enjoyed this introduction to supervisors!