Datalog in Morel
This week we added Datalog support to Morel — not by building a Datalog engine, but by adding a language feature called predicate inversion.
You can now write queries in the Soufflé dialect of Datalog and execute them using Morel’s usual runtime.
This demonstrates that Morel now supports both query paradigms — Datalog’s relational calculus and Morel’s native relational algebra — and you can freely switch between them. But what are these paradigms, and why does it matter?
The two paradigms
The two paradigms originate in set theory, and continue through the relational model into modern query languages.
Set theory provides two ways to define a set: the intensional method defines the set by its properties (for example, “red cars” is the set of all cars whose color is red), and the extensional method creates the set by performing operations on existing sets (intersect the set of all cars with the set of all red objects).
The relational model for databases provides two ways to specify a query which mirror intensional and extensional set definitions. In relational calculus, one specifies the logical properties of the tuples to retrieve from the input relations; in relational algebra, one specifies the input relations and a sequence of operations (intersect, join, filter, project) to apply to them. Codd’s Theorem proves that these languages have equivalent expressive power.
Query languages are generally based on one of those paradigms. SQL is
largely based on algebra (although its EXISTS keyword shows the
influence of calculus). Datalog is based on calculus. Functional
programming languages (including Morel) are in the algebra camp; they
provide relational operators via higher-order functions like map,
filter and reduce, and sometimes provide syntactic sugar like
list-comprehensions.
If the languages are equivalent, why does it matter? The languages have different strengths.
Algebra’s strengths:
- Algebra naturally extends to bags and lists (collections with ordering and/or duplicate values), while calculus only works on sets;
- Aggregate functions are a more natural extension to algebra than calculus;
- Mainstream programming languages are functional or procedural, so there is lower impedance mismatch embedding a query in a program or writing a user-defined function to be called from a query;
- Developers familiar with mainstream programming languages find the calculus paradigm difficult to learn.
Calculus (epitomized by Datalog) excels at graph and deductive queries, such as queries that iterate until they reach a fixed point. As we shall see, it is just easier to write recursive queries if they return a boolean than if they return a complex data type like a set of tuples.
For simple fixed-point queries such as computing the transitive closure of a relation, the algebra query returns a set that is the union of the points that are one step away, two steps away, and so forth. In calculus, the value is boolean: whether there is a path from one point to another.
For more complex fixed-point queries, the algebra programmer must
define a data type with a semilattice structure. Consider, for
example, a query to find all pairs of nodes connected by no more than
five steps. In algebra, the data type is now a set of (source,
destination, distance) triples combined by taking the minimum
distance. In calculus, the data type remains boolean: the result of
the function has_path_within(source, destination, distance). The
boolean function is easier to write, and easier for the query planner
to understand.
Until now, if a programmer had to solve a problem with mixed workload, they would need to switch languages. Because of a new feature called predicate inversion, Morel now supports both paradigms.
The Datalog interface
The following program, in the
Soufflé dialect of Datalog,
computes the transitive closure of an edge relation.
.decl edge(x:number, y:number)
.decl path(x:number, y:number)
edge(1,2).
edge(2,3).
path(X,Y) :- edge(X,Y).
path(X,Z) :- path(X,Y), edge(Y,Z).
.output path
In a graph with nodes 1, 2 and 3, the edge relation defines edges
from 1 → 2 and 2 → 3. The derived path relation says that
there is a path between two nodes if (a) there is an edge, or (b)
there is an edge to an intermediate node and a path from that
intermediate node to the destination node. From the edges {1 →
2, 2 → 3} it deduces the paths {1 → 2, 2 → 3, 1 →
3}.
You can now run the following program from Morel’s shell:
Datalog.execute "
.decl edge(x:int, y:int)
.decl path(x:int, y:int)
edge(1,2).
edge(2,3).
path(X,Y) :- edge(X,Y).
path(X,Z) :- path(X,Y), edge(Y,Z).
.output path";
> val it = {path=[{x=1,y=2},{x=2,y=3},{x=1,y=3}]}
> : {path:{x:int, y:int} list} variant
The program is passed (as a string literal) as an argument to the
Datalog.execute function, and the Soufflé symbol and
number types in the .decl directive have been mapped to Morel
string and int types, but is otherwise unchanged.
(Adding a Datalog structure, with functions execute, translate
and validate, seemed preferable to writing a whole Datalog shell and
testing framework. Facts and rules have the same syntax as
Soufflé, as does the .output directive. The .input
directive, not shown in this example, has a new optional filePath
argument.)
Translating Datalog to Morel
The translation makes concrete the equivalence that Codd’s Theorem promises: each Datalog construct has a direct counterpart in Morel.
One way to support Datalog would have been to implement a Datalog engine, but this would have been a major task and would not have benefited the rest of Morel. Instead, we have extended the Morel language with Datalog-like constructs; this has made the Morel language more powerful, and made Datalog translation straightforward.
The Datalog-to-Morel translator has a structure that will be familiar to anyone who has implemented a compiler that translates a high-level language to a lower-level language. Three steps are executed in succession:
- The parser converts a Datalog string to a parse tree.
- The validator makes sure that the program is valid (that rules are safe, grounded and stratified) and deduces its type.
- The translator generates a Morel program that is equivalent to the Datalog program.
Parsing and validation follow standard patterns, but let’s look at the translation algorithm in a little more detail. Here is the translation to Morel of the earlier Datalog program:
let
val edge_facts = [(1, 2), (2, 3)]
fun edge (x, y) = (x, y) elem edge_facts
fun path (x, y) =
edge (x, y) orelse
(exists v0 where path (x, v0) andalso edge (v0, y))
in
{path = from x, y where path (x, y)}
end
You’ll notice that the Datalog and Morel programs have the same
structure. Datalog rules without a body (such as edge(1,2) and
edge(2,3)) are gathered into a list of tuples (edge_facts).
Each rule becomes a boolean function. If there are several
comma-separated predicates in a rule’s body, they are combined using
andalso. If there are several rules of the same name, their
conditions are combined using orelse. Invocations of a rule become
function calls, which, like rules, may be recursive.
The body of the rule path(X,Z) :- path(X,Y), edge(Y,Z) has a
variable, Y, that does not occur in the head. It is translated to
exists v0.
A Datalog program may have several .output directives. The Morel
program returns a single value, a record with one field for each
directive. This program has one directive, .output path, so the
record has a single field named path that is a list of
{x:int, y:int} records.
How Morel does it
The magic lies not in the Datalog-to-Morel converter but in the Morel language itself. Over the last few months, we have added to Morel a capability called predicate inversion, the ability to deduce a set from a boolean expression.
At the heart of the generated Morel program is a query: from x, y
where path (x, y). It differs from a regular query in that the
variables x and y are unbounded. (In a conventional query,
every variable is bounded, meaning it iterates over a collection, as
do d and e in from d in departments, e in employees.)
In principle, an unbounded variable iterates over every possible value
of its data type. This is fine for “small” data types like boolean,
char, and enum Color { RED | GREEN | BLUE }, but problematic for
“large” data types like int and {b: boolean, i: int} and infinite
data types like string and int list.
Morel allows unbounded variables in a program as long as there is a
predicate like where x > 0 andalso x < 10 or where e elem
employees that connects it with a finite set. Invertible predicates
provide a way to generate the values of the variable. In Datalog
parlance, they ensure that the variable is grounded.
Morel’s predicate inversion algorithm recognizes various predicate
patterns, including boolean functions that check collection membership
(like edge) or compute transitive closure (like path).
Mixing styles
The net result is that predicate inversion allows you to freely mix
Datalog-style queries (defined by boolean expressions and functions)
with the relational algebra-style queries (defined by from,
exists, join and set operations).
The following query is in a hybrid style.
(* Calculus style: recursive reachability *)
fun edge (x, y) = (x, y) elem [(1,2), (2,3), (3,4), (2,4)];
fun reachable (x, y) =
edge (x, y) orelse
exists z where edge (x, z) andalso reachable (z, y);
(* Algebra style: count reachable nodes per source *)
from source in [1, 2, 3, 4]
yield {source,
reachable_count = count (from target
where reachable (source, target))}
The edge and reachable functions define graph reachability in a
Datalog style, using recursion and boolean return values. The from
query is in the algebra style, but uses predicate inversion to
generate all values of the unbounded target variable for which
reachable (source, target) is true. Predicate inversion provides the
junction between the two styles.
Conclusion
Morel now unifies the calculus and algebra styles of writing queries. The new Datalog interface showcases this capability, but you can also use the calculus style in Morel programs, where you can freely mix it with the algebra style and functional programming.
Notice that we have made no claims about the efficiency of the implementation. Our goal was to increase the expressive power of the language, and we have achieved that goal. Morel’s internal representation is algebraic — using relational operators, other operators provided by functions, and iteration to a fixed point — and from this point we can apply conventional query-optimization techniques.
To keep things simple, we have not discussed evaluation models. Datalog uses forward chaining (bottom-up evaluation) while boolean functions give the impression that backwards chaining (top-down evaluation) is being used. For most queries both approaches are valid, and the planner would ideally consider both strategies along with optimizations such as join re-ordering, magic sets, semi-naïve evaluation, and materialized views. But there are queries where the evaluation model matters (say, they would terminate under one model but not another), and for these cases it is important that we define Morel’s operational semantics.
The predicate inversion algorithm needs to evolve and mature. It has been tested over a wide array of queries, but there are still cases where it fails to invert a predicate, or fails to remove a condition that has been fully satisfied by a generator. (We hope to write more about predicate inversion, generators, and subsuming predicates, in a future article.)
Please download Morel and give it a try! (Morel has both Java and Rust versions, but Datalog and predicate inversion require the Java version for now.)
If you have comments, please reply on Bluesky @julianhyde.bsky.social or Twitter:
How we added Datalog support to @morel_lang... and why you might want to just write a Morel query. https://t.co/bws0HF4xHl pic.twitter.com/z6wfZGyamn
— Julian Hyde (@julianhyde) March 9, 2026