Use Elixir/OTP for the Main Cloud Host Runtime

This ADR proposes replacing all of the high-level dispatch code currently written in Rust on top of Actix with Elixir/OTP. The new Elixir host runtime would replace the existing one and would reuse as much of the existing low-level code as possible by calling into the Rust crates natively (via NIFs).

ℹ️ NOTE

The chronology for this ADR does not correspond to its number. This ADR was migrated from an RFC from June of 2021.

Context and Problem Statement

There are a number of motivations for this, with varying degrees of urgency. The main one is that the current codebase, despite having been rewritten to be easier to consume and contribute to, is difficult to understand and even more difficult to extend with new features. Untangling the web of Actix interactions is hard enough that, even though this runtime is better than the old one, it remains inscrutable to all but the original project creator and lead maintainer. In short, no one is contributing to the core because doing so is too much work and imposes too high a cognitive burden.

Another motivation is related to hindsight. As we’ve been adding more and more dispatch and communication-related features, we’ve been asking ourselves, “Are we building just another Kubernetes?”. Thankfully, the answer continues to be “no”. On the other hand, from a certain point of view we could easily say that what we’re building looks an awful lot like OTP. The philosophy here is that if we’re building “our own” OTP, then we should stop doing so and use the real OTP to gain the benefit of its decades of increasing maturity, thriving ecosystem, and production deployment in some of the largest and most scalable environments in the industry.
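To make the "our own OTP" comparison concrete, here is a minimal sketch of the kind of supervision logic OTP supplies out of the box. It is written in plain Rust using only the standard library. The function `supervise` and its `max_restarts` parameter are hypothetical names chosen for illustration and are not taken from the wasmCloud codebase. The sketch approximates a one-for-one restart strategy: a crashed worker is restarted until a restart limit is reached.

```rust
use std::thread;

/// Hypothetical one-for-one supervisor (illustration only, not wasmCloud code).
/// Runs `worker` on its own thread and restarts it whenever it panics,
/// up to `max_restarts` times. Returns the number of restarts performed,
/// or an error once the restart limit is exhausted.
fn supervise<F>(max_restarts: u32, worker: F) -> Result<u32, String>
where
    F: Fn(u32) + Send + Clone + 'static,
{
    let mut restarts = 0;
    loop {
        let w = worker.clone();
        let attempt = restarts;
        // A panic inside the worker thread surfaces as Err from join().
        let handle = thread::spawn(move || w(attempt));
        match handle.join() {
            Ok(()) => return Ok(restarts),
            Err(_) if restarts < max_restarts => restarts += 1,
            Err(_) => return Err(format!("gave up after {} restarts", restarts)),
        }
    }
}

fn main() {
    // A worker that crashes on its first two attempts, then succeeds.
    let result = supervise(3, |attempt| {
        if attempt < 2 {
            panic!("simulated crash on attempt {}", attempt);
        }
    });
    println!("{:?}", result);
}
```

Even this toy version has to decide on restart limits, failure reporting, and worker isolation. OTP supervisors handle those concerns, along with supervision trees and configurable restart strategies, without any of this code living in the host.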

Considered Options

One of our biggest rationales is the deletion of code. By deciding that what we’ve done is build a pretty poor impersonation of OTP, we get to delete all of that code and rely on OTP, thereby following our own philosophy of eschewing boilerplate in all its forms.

There are very few, if any, alternatives to this. We could continue with the existing codebase, but the evidence shows that contributions to the host core would remain stagnant or non-existent. We could rewrite the Rust codebase on a different actor framework, but we feel we’d end up with the same problem we have now, with the rewrite merely deferring the pain.

Adding new and powerful features today is untenable for all but the contributors who wrote the core in the first place. This is not a strategy for open source success or community building. We think the elegant, declarative syntax of Elixir and the dramatic reduction in code will make the core more approachable and thus more likely to draw contributions and constructive feedback in the form of issues.

The following options were considered for this ADR:

Continue with Actix+Rust
Use Another Framework
Switch to Elixir/OTP

Continue with Actix+Rust

As mentioned, the main drawback with continuing with Actix+Rust is that the codebase will continue to become more complicated, more spaghetti-ridden, and more difficult to contribute to and maintain.

Use Another Framework

Another option is to adopt a framework other than Elixir/OTP/BEAM. An incomplete list of the other candidates considered:

Akka
Orleans
Bastion - barely maintained at the moment.
CAF - C++-based, and therefore incurs far too large a penalty in complexity and lack of safety. It also limits potential contributors to a C++ audience.
Pony - very limited audience, very difficult to find contributors, and no access to a robust set of community libraries.

In all of these cases, we felt that the benefits of those frameworks did not outweigh the cost of embracing them and in some cases the frameworks wouldn’t support the kind of features we need in the future.

Switch to Elixir/OTP

This design would include several important facets:

NATS becomes mandatory. In order for us to support polyglot providers we need to be able to communicate with them using a reliable, free, and highly performant mechanism. NATS was chosen because of our existing investment in it for lattice support.
The wasmcloud binary, which is basically a CLI designed to be stopped and started, gets replaced by an OTP application (or application suite) designed to run as a daemon/background process much like a database or other server. This process is always up, always available, and remotely controllable via CLI and web application dashboard.
Existing integrations with the wasmcloud-host crate would be required to switch to using remote control over lattice. This includes the krustlet provider.

Drawbacks

The most obvious drawback to this solution is the loss of the wasmcloud-host crate. This is the high-level functionality for embedders to insert a running wasmCloud host inside their own process. This would no longer be available and integrators would have to use lattice for remote control/interaction rather than crate embedding.

NATS becomes mandatory, which has the potential drawback of making the installation process a little more time-consuming. We think there are a number of ways to mitigate this, but it’s still listed here as a drawback for completeness.

See also Saša Jurić’s talk, “The Soul of Erlang and Elixir”.

Decision Outcome

We chose to go with Elixir/OTP, putting the new host runtime here.
