<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="http://blog.hydromatic.net/feed.xml" rel="self" type="application/atom+xml" /><link href="http://blog.hydromatic.net/" rel="alternate" type="text/html" /><updated>2026-03-23T21:33:46-07:00</updated><id>http://blog.hydromatic.net/feed.xml</id><title type="html">Julian Hyde on Streaming Data, Open Source OLAP. And stuff.</title><subtitle>Julian Hyde&apos;s blog</subtitle><entry><title type="html">Datalog in Morel</title><link href="http://blog.hydromatic.net/2026/03/09/datalog-in-morel.html" rel="alternate" type="text/html" title="Datalog in Morel" /><published>2026-03-09T13:00:00-07:00</published><updated>2026-03-09T13:00:00-07:00</updated><id>http://blog.hydromatic.net/2026/03/09/datalog-in-morel</id><content type="html" xml:base="http://blog.hydromatic.net/2026/03/09/datalog-in-morel.html"><![CDATA[<p>This week we
<a href="https://github.com/hydromatic/morel/commit/62581437ac9c8dc415b159fdc9d6abc7eb588e9a">added Datalog support</a>
to Morel — not by building a Datalog engine,
but by adding a language feature called predicate inversion.</p>

<p>You can now write queries in the
<a href="https://souffle-lang.github.io/">Soufflé</a> dialect of Datalog
and execute them using Morel’s usual runtime.</p>

<p>This demonstrates that Morel now supports both query paradigms —
Datalog’s relational calculus and Morel’s native relational algebra
— and you can freely switch between them. But what are these
paradigms, and why does it matter?</p>

<h2 id="the-two-paradigms">The two paradigms</h2>

<p>The two paradigms originate in set theory, and continue
through the relational model into modern query languages.</p>

<p>Set theory provides two ways to define a set: the <strong>intensional</strong>
method defines the set by its properties (for example, “red cars” is
the set of all cars whose color is red), and the <strong>extensional</strong>
method creates the set by performing operations on existing sets
(intersect the set of all cars with the set of all red objects).</p>

<p>The relational model for databases provides two ways to specify a
query which mirror intensional and extensional set definitions. In
<strong>relational calculus</strong>, one specifies the logical properties of the
tuples to retrieve from the input relations; in <strong>relational
algebra</strong>, one specifies the input relations and a sequence of
operations (intersect, join, filter, project) to apply to them.
<a href="https://en.wikipedia.org/wiki/Codd%27s_theorem">Codd’s Theorem</a>
proves that these languages have equivalent expressive power.</p>

<p>Query languages are generally based on one of those paradigms. SQL is
largely based on algebra (although its <code class="language-plaintext highlighter-rouge">EXISTS</code> keyword shows the
influence of calculus). Datalog is based on calculus. Functional
programming languages (including Morel) are in the algebra camp; they
provide relational operators via higher-order functions like <code class="language-plaintext highlighter-rouge">map</code>,
<code class="language-plaintext highlighter-rouge">filter</code> and <code class="language-plaintext highlighter-rouge">reduce</code>, and sometimes provide syntactic sugar like
list-comprehensions.</p>

<p>If the languages are equivalent, why does it matter? The languages
have different strengths.</p>

<p>Algebra’s strengths:</p>
<ul>
  <li>Algebra naturally extends to <strong>bags and lists</strong> (collections with
ordering and/or duplicate values), while calculus only works on
sets;</li>
  <li><strong>Aggregate functions</strong> are a more natural extension to algebra
than calculus;</li>
  <li>Mainstream programming languages are functional or procedural, so
there is less <strong>impedance mismatch</strong> when embedding a query in a
program or writing a user-defined function to be called from a
query;</li>
  <li>Developers familiar with mainstream programming languages
find the calculus paradigm <strong>difficult to learn</strong>.</li>
</ul>

<p>Calculus (epitomized by Datalog) excels at graph and deductive
queries, such as queries that iterate until they reach a fixed
point. As we shall see, it is just easier to write recursive queries
if they return a boolean than if they return a complex data type like
a set of tuples.</p>

<p>For simple fixed-point queries such as computing the transitive
closure of a relation, the algebra query returns a set that is the
union of the points that are one step away, two steps away, and so
forth. In calculus, the value is boolean: whether there is a path from
one point to another.</p>

<p>For more complex fixed-point queries, the algebra programmer must
define a data type with a semilattice structure. Consider, for
example, a query to find all pairs of nodes connected by no more than
five steps. In algebra, the data type is now a set of <code class="language-plaintext highlighter-rouge">(source,
destination, distance)</code> triples combined by taking the minimum
distance. In calculus, the data type remains boolean: the result of
the function <code class="language-plaintext highlighter-rouge">has_path_within(source, destination, distance)</code>. The
boolean function is easier to write, and easier for the query planner
to understand.</p>
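
<p>To make the contrast concrete, here is a minimal Python sketch (not Morel
code, and not how Morel evaluates the query). The names <code class="language-plaintext highlighter-rouge">min_distances</code>
and <code class="language-plaintext highlighter-rouge">edges</code> are invented for illustration, and <code class="language-plaintext highlighter-rouge">has_path_within</code>
takes the edge set as an extra argument. The first function maintains the
<code class="language-plaintext highlighter-rouge">(source, destination, distance)</code> data combined by minimum; the second
simply returns a boolean.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def min_distances(edges, limit):
    """Algebra style: map each connected pair to its minimum distance.

    Assumes limit is at least 1. Pairs further apart than limit are omitted.
    """
    dist = {edge: 1 for edge in edges}
    changed = True
    while changed:
        changed = False
        for (x, y), d in list(dist.items()):
            if d == limit:
                continue  # extending this path would exceed the limit
            for (y2, z) in edges:
                if y != y2:
                    continue
                best = min(d + 1, dist.get((x, z), d + 1))
                if dist.get((x, z)) != best:
                    dist[(x, z)] = best  # combine by taking the minimum
                    changed = True
    return dist


def has_path_within(edges, x, y, n):
    """Calculus style: is there a path from x to y of at most n steps?"""
    if n == 0:
        return False
    return (x, y) in edges or any(
        has_path_within(edges, z, y, n - 1) for (x2, z) in edges if x2 == x
    )
</code></pre></div></div>

<p>Both functions agree on which pairs are connected within the limit, but
only the first has to decide how to merge competing distances.</p>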

<p>Until now, a programmer who had to solve a problem with a mixed workload
needed to switch languages. Because of a new feature called
predicate inversion, Morel now supports both paradigms.</p>

<h2 id="the-datalog-interface">The Datalog interface</h2>

<p>The following program, in the
<a href="https://souffle-lang.github.io/">Soufflé</a> dialect of Datalog,
computes the transitive closure of an <code class="language-plaintext highlighter-rouge">edge</code> relation.</p>

<div class="language-prolog highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">.</span><span class="ss">decl</span> <span class="ss">edge</span><span class="p">(</span><span class="ss">x</span><span class="o">:</span><span class="ss">number</span><span class="p">,</span> <span class="ss">y</span><span class="o">:</span><span class="ss">number</span><span class="p">)</span>
<span class="p">.</span><span class="ss">decl</span> <span class="ss">path</span><span class="p">(</span><span class="ss">x</span><span class="o">:</span><span class="ss">number</span><span class="p">,</span> <span class="ss">y</span><span class="o">:</span><span class="ss">number</span><span class="p">)</span>
<span class="ss">edge</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">2</span><span class="p">).</span>
<span class="ss">edge</span><span class="p">(</span><span class="m">2</span><span class="p">,</span><span class="m">3</span><span class="p">).</span>
<span class="ss">path</span><span class="p">(</span><span class="nv">X</span><span class="p">,</span><span class="nv">Y</span><span class="p">)</span> <span class="p">:-</span> <span class="ss">edge</span><span class="p">(</span><span class="nv">X</span><span class="p">,</span><span class="nv">Y</span><span class="p">).</span>
<span class="ss">path</span><span class="p">(</span><span class="nv">X</span><span class="p">,</span><span class="nv">Z</span><span class="p">)</span> <span class="p">:-</span> <span class="ss">path</span><span class="p">(</span><span class="nv">X</span><span class="p">,</span><span class="nv">Y</span><span class="p">),</span> <span class="ss">edge</span><span class="p">(</span><span class="nv">Y</span><span class="p">,</span><span class="nv">Z</span><span class="p">).</span>
<span class="p">.</span><span class="ss">output</span> <span class="ss">path</span>
</code></pre></div></div>

<p>In a graph with nodes 1, 2 and 3, the <code class="language-plaintext highlighter-rouge">edge</code> relation defines edges
from 1 → 2 and 2 → 3. The derived <code class="language-plaintext highlighter-rouge">path</code> relation says that
there is a path between two nodes if (a) there is an edge between them, or (b)
there is a path to an intermediate node and an edge from that
intermediate node to the destination node. From the edges {1 →
2, 2 → 3} it deduces the paths {1 → 2, 2 → 3, 1 →
3}.</p>
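
<p>The deduction can be reproduced by hand with a naive bottom-up loop. The
following Python sketch is only an illustration of the two rules (the function
name <code class="language-plaintext highlighter-rouge">transitive_closure</code> is an assumption, and this is not
how Soufflé or Morel is implemented): it applies the rules repeatedly until no
new facts appear.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def transitive_closure(edges):
    """Naive bottom-up evaluation of the two path rules."""
    # Rule 1: path(X,Y) :- edge(X,Y).
    path = set(edges)
    while True:
        # Rule 2: path(X,Z) :- path(X,Y), edge(Y,Z).
        derived = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2}
        if derived.issubset(path):
            return path  # fixed point reached: nothing new was derived
        path |= derived


print(sorted(transitive_closure({(1, 2), (2, 3)})))
# prints [(1, 2), (1, 3), (2, 3)]
</code></pre></div></div>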

<p>You can now run the following program from Morel’s shell:</p>

<!-- morel
Datalog.execute "
.decl edge(x:int, y:int)
.decl path(x:int, y:int)
edge(1,2).
edge(2,3).
path(X,Y) :- edge(X,Y).
path(X,Z) :- path(X,Y), edge(Y,Z).
.output path";
> val it = {path=[{x=1,y=2},{x=2,y=3},{x=1,y=3}]}
>   : {path:{x:int, y:int} list} variant
-->

<div class="code-block">
<div class="code-input"><span class="nn">Datalog</span><span class="p">.</span><span class="n">execute</span> <span class="s2">"
.decl edge(x:int, y:int)
.decl path(x:int, y:int)
edge(1,2).
edge(2,3).
path(X,Y) :- edge(X,Y).
path(X,Z) :- path(X,Y), edge(Y,Z).
.output path"</span><span class="p">;</span></div>
<div class="code-output">val it = {path=[{x=1,y=2},{x=2,y=3},{x=1,y=3}]}
  : {path:{x:int, y:int} list} variant</div>
</div>

<p>The program is passed as a string literal to the
<code class="language-plaintext highlighter-rouge">Datalog.execute</code> function. The Soufflé <code class="language-plaintext highlighter-rouge">symbol</code> and
<code class="language-plaintext highlighter-rouge">number</code> types in the <code class="language-plaintext highlighter-rouge">.decl</code> directive have been mapped to Morel
<code class="language-plaintext highlighter-rouge">string</code> and <code class="language-plaintext highlighter-rouge">int</code> types, but the program is otherwise unchanged.</p>

<p>(Adding a <code class="language-plaintext highlighter-rouge">Datalog</code> structure, with functions <code class="language-plaintext highlighter-rouge">execute</code>, <code class="language-plaintext highlighter-rouge">translate</code>
and <code class="language-plaintext highlighter-rouge">validate</code>, seemed preferable to writing a whole Datalog shell and
testing framework. Facts and rules have the same syntax as
Soufflé, as does the <code class="language-plaintext highlighter-rouge">.output</code> directive. The <code class="language-plaintext highlighter-rouge">.input</code>
directive, not shown in this example, has a new optional <em>filePath</em>
argument.)</p>

<h2 id="translating-datalog-to-morel">Translating Datalog to Morel</h2>

<p>The translation makes concrete the equivalence that Codd’s Theorem
promises: each Datalog construct has a direct counterpart in Morel.</p>

<p>One way to support Datalog would have been to implement a Datalog
engine, but this would have been a major task and would not have
benefited the rest of Morel. Instead, we have extended the Morel
language with Datalog-like constructs; this has made the Morel
language more powerful, and made Datalog translation straightforward.</p>

<p>The Datalog-to-Morel translator has a structure that will be familiar
to anyone who has implemented a compiler that translates a high-level
language to a lower-level language. Three steps are executed in
succession:</p>

<ol>
  <li>The <em>parser</em> converts a Datalog string to a parse tree.</li>
  <li>The <em>validator</em> makes sure that the program is valid (that rules
are safe, grounded and stratified) and deduces its type.</li>
  <li>The <em>translator</em> generates a Morel program that is equivalent to
the Datalog program.</li>
</ol>

<p>Parsing and validation follow standard patterns, but let’s look at
the translation algorithm in a little more detail.
Here is the translation to Morel of the earlier Datalog program:</p>

<!-- morel skip
let
  val edge_facts = [(1, 2), (2, 3)]
  fun edge (x, y) = (x, y) elem edge_facts
  fun path (x, y) =
    edge (x, y) orelse
    (exists v0 where path (x, v0) andalso edge (v0, y))
in
  {path = from x, y where path (x, y)}
end
-->

<div class="code-block">
<div class="code-input"><span class="kr">let</span>
  <span class="kr">val</span> <span class="nv">edge_facts</span> <span class="p">=</span> <span class="p">[(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">),</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">)]</span>
  <span class="kr">fun</span> <span class="nf">edge</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="p">=</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="kr">elem</span> <span class="n">edge_facts</span>
  <span class="kr">fun</span> <span class="nf">path</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="p">=</span>
    <span class="n">edge</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="kr">orelse</span>
    <span class="p">(</span><span class="kr">exists</span> <span class="n">v0</span> <span class="kr">where</span> <span class="n">path</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">v0</span><span class="p">)</span> <span class="kr">andalso</span> <span class="n">edge</span> <span class="p">(</span><span class="n">v0</span><span class="p">,</span> <span class="n">y</span><span class="p">))</span>
<span class="kr">in</span>
  <span class="p">{</span><span class="n">path</span> <span class="p">=</span> <span class="kr">from</span> <span class="nv">x</span><span class="p">,</span> <span class="nv">y</span> <span class="kr">where</span> <span class="n">path</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)}</span>
<span class="kr">end</span></div>
</div>

<p>You’ll notice that the Datalog and Morel programs have the same
structure. Datalog rules without a body (such as <code class="language-plaintext highlighter-rouge">edge(1,2)</code> and
<code class="language-plaintext highlighter-rouge">edge(2,3)</code>) are gathered into a list of tuples (<code class="language-plaintext highlighter-rouge">edge_facts</code>).</p>

<p>Each rule becomes a boolean function. If there are several
comma-separated predicates in a rule’s body, they are combined using
<code class="language-plaintext highlighter-rouge">andalso</code>. If there are several rules of the same name, their
conditions are combined using <code class="language-plaintext highlighter-rouge">orelse</code>. Invocations of a rule become
function calls, which, like rules, may be recursive.</p>

<p>The body of the rule <code class="language-plaintext highlighter-rouge">path(X,Z) :- path(X,Y), edge(Y,Z)</code> has a
variable, <code class="language-plaintext highlighter-rouge">Y</code>, that does not occur in the head. It is translated to
<code class="language-plaintext highlighter-rouge">exists v0</code>.</p>

<p>A Datalog program may have several <code class="language-plaintext highlighter-rouge">.output</code> directives. The Morel
program returns a single value, a record with one field for each
directive. This program has one directive, <code class="language-plaintext highlighter-rouge">.output path</code>, so the
record has a single field named <code class="language-plaintext highlighter-rouge">path</code> that is a list of
<code class="language-plaintext highlighter-rouge">{x:int, y:int}</code> records.</p>

<h2 id="how-morel-does-it">How Morel does it</h2>

<p>The magic lies not in the Datalog-to-Morel converter but in the Morel
language itself. Over the last few months, we have added to Morel a
capability called <em>predicate inversion</em>, the ability to deduce a set
from a boolean expression.</p>

<p>At the heart of the generated Morel program is a query: <code class="language-plaintext highlighter-rouge">from x, y
where path (x, y)</code>. It differs from a regular query in that the
variables <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> are <em>unbounded</em>. (In a conventional query,
every variable is <em>bounded</em>, meaning it iterates over a collection, as
do <code class="language-plaintext highlighter-rouge">d</code> and <code class="language-plaintext highlighter-rouge">e</code> in <code class="language-plaintext highlighter-rouge">from d in departments, e in employees</code>.)</p>

<p>In principle, an unbounded variable iterates over every possible value
of its data type. This is fine for “small” data types like <code class="language-plaintext highlighter-rouge">boolean</code>,
<code class="language-plaintext highlighter-rouge">char</code>, and <code class="language-plaintext highlighter-rouge">enum Color { RED | GREEN | BLUE }</code>, but problematic for
“large” data types like <code class="language-plaintext highlighter-rouge">int</code> and <code class="language-plaintext highlighter-rouge">{b: boolean, i: int}</code> and infinite
data types like <code class="language-plaintext highlighter-rouge">string</code> and <code class="language-plaintext highlighter-rouge">int list</code>.</p>

<p>Morel allows unbounded variables in a program as long as there is a
predicate like <code class="language-plaintext highlighter-rouge">where x &gt; 0 andalso x &lt; 10</code> or <code class="language-plaintext highlighter-rouge">where e elem
employees</code> that connects it with a finite set. Invertible predicates
provide a way to generate the values of the variable. In Datalog
parlance, they ensure that the variable is <em>grounded</em>.</p>
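
<p>As a rough illustration of the idea, the Python sketch below turns two
predicate shapes into finite generators. The tuple encoding and the function
name <code class="language-plaintext highlighter-rouge">invert</code> are invented for this example, and Morel's actual
algorithm is certainly more general. <code class="language-plaintext highlighter-rouge">("between", 0, 10)</code> stands for
<code class="language-plaintext highlighter-rouge">x &gt; 0 andalso x &lt; 10</code>, and <code class="language-plaintext highlighter-rouge">("elem", c)</code> stands for
<code class="language-plaintext highlighter-rouge">x elem c</code>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def invert(predicate):
    """Turn a recognized predicate on x into a finite list of x values.

    The tuple encoding of predicates is invented for this sketch.
    """
    kind = predicate[0]
    if kind == "between":  # x strictly between lo and hi
        _, lo, hi = predicate
        return list(range(lo + 1, hi))
    if kind == "elem":  # x elem collection
        _, collection = predicate
        return list(collection)
    # No finite set is connected to x, so x is not grounded.
    raise ValueError("cannot invert predicate " + repr(predicate))


print(invert(("between", 0, 10)))
# prints [1, 2, 3, 4, 5, 6, 7, 8, 9]
</code></pre></div></div>

<p>A predicate that the sketch does not recognize raises an error, which
corresponds to a variable that is not grounded.</p>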

<p>Morel’s predicate inversion algorithm recognizes various predicate
patterns, including boolean functions that check collection membership
(like <code class="language-plaintext highlighter-rouge">edge</code>) or compute transitive closure (like <code class="language-plaintext highlighter-rouge">path</code>).</p>

<h2 id="mixing-styles">Mixing styles</h2>

<p>The net result is that predicate inversion allows you to freely mix
Datalog-style queries (defined by boolean expressions and functions)
with relational algebra-style queries (defined by <code class="language-plaintext highlighter-rouge">from</code>,
<code class="language-plaintext highlighter-rouge">exists</code>, <code class="language-plaintext highlighter-rouge">join</code> and set operations).</p>

<p>The following query is in a hybrid style.</p>

<!-- morel skip
(* Calculus style: recursive reachability *)
fun edge (x, y) = (x, y) elem [(1,2), (2,3), (3,4), (2,4)];
fun reachable (x, y) =
  edge (x, y) orelse
  exists z where edge (x, z) andalso reachable (z, y);

(* Algebra style: count reachable nodes per source *)
from source in [1, 2, 3, 4]
  yield {source,
         reachable_count = count (from target
                                    where reachable (source, target))}
-->

<div class="code-block">
<div class="code-input"><span class="c">(*</span><span class="cm"> Calculus style: recursive reachability *)</span>
<span class="kr">fun</span> <span class="nf">edge</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="p">=</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="kr">elem</span> <span class="p">[(</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">),</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">),</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">),</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="mi">4</span><span class="p">)];</span>
<span class="kr">fun</span> <span class="nf">reachable</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="p">=</span>
  <span class="n">edge</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="kr">orelse</span>
  <span class="kr">exists</span> <span class="n">z</span> <span class="kr">where</span> <span class="n">edge</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">z</span><span class="p">)</span> <span class="kr">andalso</span> <span class="n">reachable</span> <span class="p">(</span><span class="n">z</span><span class="p">,</span> <span class="n">y</span><span class="p">);</span>

<span class="c">(*</span><span class="cm"> Algebra style: count reachable nodes per source *)</span>
<span class="kr">from</span> <span class="nv">source</span> <span class="kr">in</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">]</span>
  <span class="kr">yield</span> <span class="p">{</span><span class="n">source</span><span class="p">,</span>
         <span class="n">reachable_count</span> <span class="p">=</span> <span class="n">count</span> <span class="p">(</span><span class="kr">from</span> <span class="nv">target</span>
                                    <span class="kr">where</span> <span class="n">reachable</span> <span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">target</span><span class="p">))}</span></div>
</div>

<p>The <code class="language-plaintext highlighter-rouge">edge</code> and <code class="language-plaintext highlighter-rouge">reachable</code> functions define graph reachability in a
Datalog style, using recursion and boolean return values. The <code class="language-plaintext highlighter-rouge">from</code>
query is in the algebra style, but uses predicate inversion to
generate all values of the unbounded <code class="language-plaintext highlighter-rouge">target</code> variable for which
<code class="language-plaintext highlighter-rouge">reachable (source, target)</code> is true. Predicate inversion provides the
junction between the two styles.</p>
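
<p>For readers who want to check the expected counts, here is a Python sketch
of what the hybrid query computes, assuming that <code class="language-plaintext highlighter-rouge">reachable</code>
means "connected by one or more edges". The helper <code class="language-plaintext highlighter-rouge">reachable_from</code>
is an assumption of this sketch: it plays the role that predicate inversion
plays in Morel, generating targets rather than testing them.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>edges = {(1, 2), (2, 3), (3, 4), (2, 4)}


def reachable_from(source):
    """Generate every target for which reachable(source, target) holds."""
    found = set()
    frontier = {source}
    while frontier:
        step = {z for (x, z) in edges if x in frontier}
        frontier = step - found  # keep only newly discovered targets
        found |= frontier
    return found


# Algebra style: count reachable nodes per source.
result = [
    {"source": s, "reachable_count": len(reachable_from(s))}
    for s in [1, 2, 3, 4]
]
print(result)
</code></pre></div></div>

<p>With these four edges, the counts for sources 1, 2, 3 and 4 are 3, 2, 1
and 0 respectively.</p>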

<h2 id="conclusion">Conclusion</h2>

<p>Morel now unifies the calculus and algebra styles of writing queries.
The new Datalog interface showcases this capability, but you can also
use the calculus style in Morel programs, where you can freely mix it
with the algebra style and functional programming.</p>

<p>Notice that we have made no claims about the <strong>efficiency</strong> of the
implementation. Our goal was to increase the expressive power of the
language, and we have achieved that goal. Morel’s internal
representation is algebraic — using relational operators, other
operators provided by functions, and iteration to a fixed point
— and from this point we can apply conventional
query-optimization techniques.</p>

<p>To keep things simple, we have not discussed <strong>evaluation models</strong>.
Datalog uses forward chaining (bottom-up evaluation) while boolean
functions give the impression that backward chaining (top-down
evaluation) is being used. For most queries both approaches are valid,
and the planner would ideally consider both strategies along with
optimizations such as join re-ordering, magic sets, semi-naïve
evaluation, and materialized views. But there are queries where the
evaluation model matters (say, they would terminate under one model
but not another), and for these cases it is important that we define
Morel’s operational semantics.</p>
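
<p>To illustrate why the model can matter, consider the left-recursive
<code class="language-plaintext highlighter-rouge">path</code> function from earlier. Evaluated naively top-down, as an
ordinary recursive call, it would call itself with the same first argument
and never return. A bottom-up strategy terminates even on a cyclic graph. The
Python sketch below shows semi-naïve evaluation, one of the optimizations
mentioned above. It is a textbook-style illustration under that assumption,
not a description of Morel's planner, and the function name is invented.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def semi_naive_closure(edges):
    """Bottom-up transitive closure that joins only the newest facts."""
    path = set(edges)
    delta = set(edges)  # facts derived in the previous round
    while delta:
        # path(X,Z) :- path(X,Y), edge(Y,Z), restricted to new path facts.
        new = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2}
        delta = new - path  # discard facts that are already known
        path |= delta
    return path


# A two-node cycle: evaluation still reaches a fixed point.
print(sorted(semi_naive_closure({(1, 2), (2, 1)})))
# prints [(1, 1), (1, 2), (2, 1), (2, 2)]
</code></pre></div></div>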

<p>The predicate inversion algorithm needs to evolve and mature. It has
been tested over a wide array of queries, but there are still cases
where it fails to invert a predicate, or fails to remove a condition
that has been fully satisfied by a generator. (We hope to write more
about predicate inversion, generators, and subsuming predicates, in a
future article.)</p>

<p>Please download Morel and give it a try! (Morel has both
<a href="https://github.com/hydromatic/morel">Java</a> and
<a href="https://github.com/hydromatic/morel-rust">Rust</a> versions, but Datalog
and predicate inversion require the Java version for now.)</p>

<p>If you have comments, please reply on
<a href="https://bsky.app/profile/julianhyde.bsky.social">Bluesky @julianhyde.bsky.social</a>
or Twitter:</p>

<div data_dnt="true">
<div class="jekyll-twitter-plugin"><blockquote class="twitter-tweet" data-cards="hidden"><p lang="en" dir="ltr">How we added Datalog support to <a href="https://twitter.com/morel_lang?ref_src=twsrc%5Etfw">@morel_lang</a>... and why you might want to just write a Morel query. <a href="https://t.co/bws0HF4xHl">https://t.co/bws0HF4xHl</a> <a href="https://t.co/z6wfZGyamn">pic.twitter.com/z6wfZGyamn</a></p>&mdash; Julian Hyde (@julianhyde) <a href="https://twitter.com/julianhyde/status/2031116250278211833?ref_src=twsrc%5Etfw">March 9, 2026</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

</div>
</div>

<p>This article
<a href="https://github.com/julianhyde/share/commits/main/blog/_posts/2026-03-09-datalog-in-morel.md">has been updated</a>.</p>]]></content><author><name>Julian Hyde</name></author><summary type="html"><![CDATA[This week we added Datalog support to Morel — not by building a Datalog engine, but by adding a language feature called predicate inversion.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://blog.hydromatic.net/assets/img/OldDesignShop_MushroomSpringMorel-240x240.jpg" /><media:content medium="image" url="http://blog.hydromatic.net/assets/img/OldDesignShop_MushroomSpringMorel-240x240.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">History of lambda syntax</title><link href="http://blog.hydromatic.net/2025/10/26/history-of-lambda-syntax.html" rel="alternate" type="text/html" title="History of lambda syntax" /><published>2025-10-26T13:00:00-07:00</published><updated>2025-10-26T13:00:00-07:00</updated><id>http://blog.hydromatic.net/2025/10/26/history-of-lambda-syntax</id><content type="html" xml:base="http://blog.hydromatic.net/2025/10/26/history-of-lambda-syntax.html"><![CDATA[<p>Lambda syntax varies widely across languages; more widely, I think, than
other language features. I wish it weren’t so. It’s difficult to see the
elegance in a new language if the syntax is unfamiliar.</p>

<p>The following table lists the year that various programming languages
introduced lambda syntax (not always the year in which the language
was born). If a language introduced an alternate syntax at a different
date, I have noted the year of introduction.</p>

<table>
  <thead>
    <tr>
      <th>Language</th>
      <th>Year</th>
      <th>Syntax</th>
      <th>Alternate(s)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Lambda calculus</td>
      <td>1930s<sup id="fnref:32"><a href="#fn:32" class="footnote" rel="footnote" role="doc-noteref">1</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">λx.x + 1</code></td>
      <td> </td>
    </tr>
    <tr>
      <td>Lisp</td>
      <td>1960<sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">2</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">(lambda (x) (+ x 1))</code></td>
      <td> </td>
    </tr>
    <tr>
      <td>ML</td>
      <td>1978<sup id="fnref:2"><a href="#fn:2" class="footnote" rel="footnote" role="doc-noteref">3</a></sup></td>
      <td><code>&lambda;x.x+1</code></td>
      <td>Evolved to <code class="language-plaintext highlighter-rouge">fun x.x+1</code> (1983), then <code class="language-plaintext highlighter-rouge">fn x =&gt; x + 1</code> (1985)</td>
    </tr>
    <tr>
      <td>Hope</td>
      <td>1980<sup id="fnref:36"><a href="#fn:36" class="footnote" rel="footnote" role="doc-noteref">4</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">lambda x =&gt; x + 1</code></td>
      <td> </td>
    </tr>
    <tr>
      <td>Smalltalk</td>
      <td>1981<sup id="fnref:35"><a href="#fn:35" class="footnote" rel="footnote" role="doc-noteref">5</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">[ :x | x + 1 ]</code></td>
      <td> </td>
    </tr>
    <tr>
      <td>Erlang</td>
      <td>1987<sup id="fnref:3"><a href="#fn:3" class="footnote" rel="footnote" role="doc-noteref">6</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">fun(X) -&gt; X + 1 end</code></td>
      <td> </td>
    </tr>
    <tr>
      <td>Haskell</td>
      <td>1990<sup id="fnref:4"><a href="#fn:4" class="footnote" rel="footnote" role="doc-noteref">7</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">\x -&gt; x + 1</code></td>
      <td><code class="language-plaintext highlighter-rouge">(+ 1)</code> (1999<sup id="fnref:5"><a href="#fn:5" class="footnote" rel="footnote" role="doc-noteref">8</a></sup>)</td>
    </tr>
    <tr>
      <td>Python</td>
      <td>1991<sup id="fnref:6"><a href="#fn:6" class="footnote" rel="footnote" role="doc-noteref">9</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">lambda x: x + 1</code></td>
      <td> </td>
    </tr>
    <tr>
      <td>Lua</td>
      <td>1993<sup id="fnref:33"><a href="#fn:33" class="footnote" rel="footnote" role="doc-noteref">10</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">function (x) return x + 1 end</code></td>
      <td> </td>
    </tr>
    <tr>
      <td>Perl</td>
      <td>1994<sup id="fnref:7"><a href="#fn:7" class="footnote" rel="footnote" role="doc-noteref">11</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">sub { $_[0] + 1 }</code></td>
      <td><code class="language-plaintext highlighter-rouge">sub { my $x = shift; $x + 1 }</code></td>
    </tr>
    <tr>
      <td>JavaScript</td>
      <td>1995<sup id="fnref:8"><a href="#fn:8" class="footnote" rel="footnote" role="doc-noteref">12</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">function(x) { return x + 1; }</code></td>
      <td><code class="language-plaintext highlighter-rouge">x =&gt; x + 1</code> (2015 <sup id="fnref:9"><a href="#fn:9" class="footnote" rel="footnote" role="doc-noteref">13</a></sup>)</td>
    </tr>
    <tr>
      <td>Ruby</td>
      <td>1995<sup id="fnref:10"><a href="#fn:10" class="footnote" rel="footnote" role="doc-noteref">14</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">Proc.new { | x | x + 1 }</code><br /><code class="language-plaintext highlighter-rouge">proc { | x | x + 1 }</code></td>
      <td><code class="language-plaintext highlighter-rouge">lambda { |x| x + 1 }</code> (2003 <sup id="fnref:11"><a href="#fn:11" class="footnote" rel="footnote" role="doc-noteref">15</a></sup>)<br /><code class="language-plaintext highlighter-rouge">-&gt;(x) { x + 1 }</code> (2007 <sup id="fnref:12"><a href="#fn:12" class="footnote" rel="footnote" role="doc-noteref">16</a></sup>)</td>
    </tr>
    <tr>
      <td>OCaml</td>
      <td>1996<sup id="fnref:13"><a href="#fn:13" class="footnote" rel="footnote" role="doc-noteref">17</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">fun x -&gt; x + 1</code></td>
      <td><code class="language-plaintext highlighter-rouge">(+) 1</code> (1996 <sup id="fnref:14"><a href="#fn:14" class="footnote" rel="footnote" role="doc-noteref">18</a></sup>)</td>
    </tr>
    <tr>
      <td>APL</td>
      <td>1996<sup id="fnref:20"><a href="#fn:20" class="footnote" rel="footnote" role="doc-noteref">19</a></sup></td>
      <td><code>{&omega;+1}</code></td>
      <td><code class="language-plaintext highlighter-rouge">+∘1</code> (1978 <sup id="fnref:21"><a href="#fn:21" class="footnote" rel="footnote" role="doc-noteref">20</a></sup>)</td>
    </tr>
    <tr>
      <td>Groovy</td>
      <td>2003<sup id="fnref:16"><a href="#fn:16" class="footnote" rel="footnote" role="doc-noteref">21</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">{ x -&gt; x + 1 }</code></td>
      <td> </td>
    </tr>
    <tr>
      <td>Scala</td>
      <td>2003<sup id="fnref:17"><a href="#fn:17" class="footnote" rel="footnote" role="doc-noteref">22</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">x =&gt; x + 1</code></td>
      <td><code class="language-plaintext highlighter-rouge">_ + 1</code> (2007 <sup id="fnref:18"><a href="#fn:18" class="footnote" rel="footnote" role="doc-noteref">23</a></sup>)</td>
    </tr>
    <tr>
      <td>MATLAB</td>
      <td>2004<sup id="fnref:19"><a href="#fn:19" class="footnote" rel="footnote" role="doc-noteref">24</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">@(x) x + 1</code></td>
      <td> </td>
    </tr>
    <tr>
      <td>C#</td>
      <td>2007<sup id="fnref:15"><a href="#fn:15" class="footnote" rel="footnote" role="doc-noteref">25</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">x =&gt; x + 1</code></td>
      <td> </td>
    </tr>
    <tr>
      <td>Clojure</td>
      <td>2007<sup id="fnref:22"><a href="#fn:22" class="footnote" rel="footnote" role="doc-noteref">26</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">(fn [x] (+ x 1))</code></td>
      <td><code class="language-plaintext highlighter-rouge">#(+ % 1)</code><br /><code class="language-plaintext highlighter-rouge">(partial + 1)</code></td>
    </tr>
    <tr>
      <td>Go</td>
      <td>2009<sup id="fnref:23"><a href="#fn:23" class="footnote" rel="footnote" role="doc-noteref">27</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">func(x int) int { return x + 1 }</code></td>
      <td> </td>
    </tr>
    <tr>
      <td>Delphi</td>
      <td>2009<sup id="fnref:34"><a href="#fn:34" class="footnote" rel="footnote" role="doc-noteref">28</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">f := function(x: Integer): Integer begin Result := x + 1; end;</code></td>
      <td> </td>
    </tr>
    <tr>
      <td>Rust</td>
      <td>2010<sup id="fnref:24"><a href="#fn:24" class="footnote" rel="footnote" role="doc-noteref">29</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">|x| x + 1</code></td>
      <td> </td>
    </tr>
    <tr>
      <td>Dart</td>
      <td>2011<sup id="fnref:25"><a href="#fn:25" class="footnote" rel="footnote" role="doc-noteref">30</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">(x) =&gt; x + 1</code></td>
      <td> </td>
    </tr>
    <tr>
      <td>Elixir</td>
      <td>2011<sup id="fnref:26"><a href="#fn:26" class="footnote" rel="footnote" role="doc-noteref">31</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">fn x -&gt; x + 1 end</code></td>
      <td><code class="language-plaintext highlighter-rouge">&amp;(&amp;1 + 1)</code></td>
    </tr>
    <tr>
      <td>Kotlin</td>
      <td>2011<sup id="fnref:27"><a href="#fn:27" class="footnote" rel="footnote" role="doc-noteref">32</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">{ x -&gt; x + 1 }</code></td>
      <td> </td>
    </tr>
    <tr>
      <td>C++</td>
      <td>2011<sup id="fnref:28"><a href="#fn:28" class="footnote" rel="footnote" role="doc-noteref">33</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">[](int x) { return x + 1; }</code></td>
      <td> </td>
    </tr>
    <tr>
      <td>Julia</td>
      <td>2012<sup id="fnref:29"><a href="#fn:29" class="footnote" rel="footnote" role="doc-noteref">34</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">x -&gt; x + 1</code></td>
      <td> </td>
    </tr>
    <tr>
      <td>Swift</td>
      <td>2014<sup id="fnref:30"><a href="#fn:30" class="footnote" rel="footnote" role="doc-noteref">35</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">{ x in x + 1 }</code></td>
      <td><code class="language-plaintext highlighter-rouge">{ $0 + 1 }</code></td>
    </tr>
    <tr>
      <td>Java</td>
      <td>2014<sup id="fnref:31"><a href="#fn:31" class="footnote" rel="footnote" role="doc-noteref">36</a></sup></td>
      <td><code class="language-plaintext highlighter-rouge">x -&gt; x + 1</code></td>
      <td> </td>
    </tr>
  </tbody>
</table>

<p>(Please let me know if there are mistakes in syntax or year of
introduction. Claude was my research assistant. I omitted languages in
the same family with the same syntax, e.g. Lisp-Scheme-Racket,
OCaml-F#. Did I miss any early, major languages?)</p>

<p>Here is the original tweet:</p>

<div data_dnt="true">
<div class="jekyll-twitter-plugin"><blockquote class="twitter-tweet" data-cards="hidden"><p lang="en" dir="ltr">Lambda syntax varies widely across languages; more widely, I think, than other language features. I wish it weren&#39;t so. It&#39;s difficult to see the elegance in a new language if the syntax is unfamiliar. <a href="https://t.co/kz1KrtsrbU">pic.twitter.com/kz1KrtsrbU</a></p>&mdash; Julian Hyde (@julianhyde) <a href="https://twitter.com/julianhyde/status/1950681730568143094?ref_src=twsrc%5Etfw">July 30, 2025</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

</div>
</div>

<p>This article
<a href="https://github.com/julianhyde/share/commits/main/blog/_posts/2025-10-26-history-of-lambda-syntax.md">has been updated</a>.</p>

<h3 id="footnotes">Footnotes</h3>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:32">
      <p>The lambda calculus was invented in the 1930s by Alonzo Church.
   The original notation used a Greek letter lambda (λ) to denote
   anonymous functions. It is a mathematical formalism rather than
   a programming language. <a href="#fnref:32" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:1">
      <p>Lisp was invented in 1958, but the lambda syntax appeared in the
  1960 paper “Recursive Functions of Symbolic Expressions and
  Their Computation by Machine, Part I” by John McCarthy. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2">
      <p>ML was invented in 1973 by Robin Milner et al. In
  <a href="https://dl.acm.org/doi/pdf/10.1145/512760.512773">“A Metalanguage for Interactive Proof in LCF”</a>
  (Gordon, Milner et al., 1978), the syntax was
  “<code>&lambda;x.x+1</code>”. By
  <a href="https://smlfamily.github.io/history/SML-proposal-6-83.pdf">“A Proposal for Standard ML (second draft)”</a>
  (Milner, 1983), the syntax was “<code class="language-plaintext highlighter-rouge">fun x . x + 1</code>”.
  The final syntax “<code>fn x =&gt; x + 1</code>” first appeared in
  <a href="https://smlfamily.github.io/history/SML-proposal-9-85.pdf">“The Standard ML Core Language (Revised)”</a>
  (Milner, 1985). <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:36">
      <p>Per “<a href="https://dl.acm.org/doi/pdf/10.1145/3386336">The History of Standard ML</a>”
   (MacQueen, Harper, Reppy, 2020), HOPE was developed in
   Edinburgh just after LCF/ML from 1977 to 1980. See
   “<a href="https://dl.acm.org/doi/pdf/10.1145/800087.802799">HOPE: An experimental applicative language</a>”
   (Burstall, MacQueen, Sannella, 1980). <a href="#fnref:36" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:35">
      <p>Smalltalk was created in the early 1970s at Xerox PARC by Alan
   Kay, Dan Ingalls, Adele Goldberg, and others. Smalltalk-76
   added block literals with no arguments. Smalltalk-80 (1981)
   allowed code blocks to have arguments. See
   “<a href="http://stephane.ducasse.free.fr/FreeBooks/BlueBook/Bluebook.pdf">Smalltalk-80: The Language and its implementation</a>”
   by Adele Goldberg and David Robson, page 35. <a href="#fnref:35" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3">
      <p>Erlang was created in 1987 by Joe Armstrong et al. The syntax
  appeared in the 1993 book “Concurrent Programming in Erlang” by Armstrong et al. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4">
      <p>Haskell was first defined in 1990 by a committee. The syntax
  appeared in the 1990 paper “Haskell: A Non-strict, Purely
  Functional Language” by Simon Peyton Jones et al. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5">
      <p>The operator section syntax <code class="language-plaintext highlighter-rouge">(+ 1)</code> was introduced in the
  Haskell 98 Report (1999). <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:6">
      <p>Python introduced the <code class="language-plaintext highlighter-rouge">lambda</code> syntax in version 1.0, released
  in January 1994. The syntax was present in the 1991 “Python
  Tutorial” by Guido van Rossum. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:33">
      <p>Lua was created in 1993 by Roberto Ierusalimschy, Luiz Henrique
   de Figueiredo, and Waldemar Celes. The function syntax appeared
   in the original Lua documentation. <a href="#fnref:33" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:7">
      <p>Perl introduced anonymous subroutines in version 5.0, released
  in 1994. The syntax was documented in the “Programming Perl”
  book by Larry Wall et al. The use of <code class="language-plaintext highlighter-rouge">my $x = shift;</code> within
  such a block is a standard way to access arguments passed to the
  subroutine. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:8">
      <p>JavaScript was created in 1995 by Brendan Eich. The <code class="language-plaintext highlighter-rouge">function</code>
  syntax appeared in the original specification “JavaScript
  Language Specification” by Netscape. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:9">
      <p>The arrow function syntax <code class="language-plaintext highlighter-rouge">x =&gt; x + 1</code> was introduced in
  ECMAScript 6 (2015). Prior to that, JavaScript did not have a
  concise lambda syntax. <a href="#fnref:9" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:10">
      <p>Ruby was created in 1995 by Yukihiro Matsumoto. The initial
   release, Ruby 0.95, contained the <code class="language-plaintext highlighter-rouge">Proc</code> class and block
   syntax. The <code class="language-plaintext highlighter-rouge">Kernel#proc</code> method was equivalent to <code class="language-plaintext highlighter-rouge">Proc.new</code>. <a href="#fnref:10" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:11">
      <p>Ruby introduced <code class="language-plaintext highlighter-rouge">lambda</code> in Ruby 1.8 (2003) as a way to create
   lambda functions with stricter argument checking. <a href="#fnref:11" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:12">
      <p>The stabby lambda syntax <code class="language-plaintext highlighter-rouge">-&gt;</code> was introduced in Ruby 1.9 (2007)
   as a more concise way to define lambdas. <code class="language-plaintext highlighter-rouge">Kernel#proc</code> was
   changed to be equivalent to <code class="language-plaintext highlighter-rouge">Proc.new</code>, which has slightly
   different behavior than a lambda. <a href="#fnref:12" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:13">
      <p>OCaml was created in 1996 by Xavier Leroy et al. The syntax
   appeared in the 1996 paper “The Objective Caml System” by
   Leroy. <a href="#fnref:13" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:14">
      <p>OCaml supports partial application of functions, so <code class="language-plaintext highlighter-rouge">(+) 1</code> is
   valid syntax for a function that adds 1. <a href="#fnref:14" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:20">
      <p><a href="https://en.wikipedia.org/wiki/John_M._Scholes">John Scholes</a>
   invented direct functions or dfns (pronounced “dee funs”) in
   1996. <a href="#fnref:20" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:21">
      <p>The tacit programming style (also known as point-free style)
   was introduced by Kenneth E. Iverson in the 1978 book “APL: An
   Interactive Approach” co-authored with Philip S. Abrams.  See
   also <a href="https://aplwiki.com/wiki/Bind">Bind</a>. <a href="#fnref:21" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:16">
      <p>Groovy was created in 2003 by James Strachan. The closure
   syntax appeared in the original Groovy documentation. <a href="#fnref:16" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:17">
      <p>Scala was created in 2003 by Martin Odersky. The syntax
   appeared in the 2004 paper “The Scala Language Specification”
   by Odersky et al. <a href="#fnref:17" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:18">
      <p>Scala “placeholder syntax” was introduced around 2007, and
   appears in the 2008 “Programming in Scala” book by Odersky,
   Spoon, and Venners. <a href="#fnref:18" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:19">
      <p>MATLAB introduced function handles in release R12 (MATLAB 6.0),
   which was released in November 2000. However, in this version,
   calling them still required the use of the <code class="language-plaintext highlighter-rouge">feval</code>
   function. Anonymous and nested functions, which expanded the
   capabilities related to function handles, were introduced later
   in release R14 (MATLAB 7.0), released in June 2004. <a href="#fnref:19" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:15">
      <p>While C# had <code class="language-plaintext highlighter-rouge">delegate</code> in version 2.0 (2005), lambda
   expressions did not arrive until version 3.0 (2007). The syntax
   appeared in the “C# Language Specification” by Anders Hejlsberg
   et al. <a href="#fnref:15" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:22">
      <p>Clojure was created in 2007 by Rich Hickey. The <code class="language-plaintext highlighter-rouge">fn</code> syntax,
   function literal syntax <code class="language-plaintext highlighter-rouge">#(+ % 1)</code>, and <code class="language-plaintext highlighter-rouge">partial</code> all
   appeared in the original Clojure documentation. <a href="#fnref:22" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:23">
      <p>Go was created in 2009 by Robert Griesemer, Rob Pike, and Ken
   Thompson.  The <code class="language-plaintext highlighter-rouge">func</code> syntax appeared in the original Go
   specification. <a href="#fnref:23" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:34">
      <p>Delphi introduced anonymous methods in Delphi 2009. The syntax
   appeared in the “Delphi Language Guide” by Embarcadero. Anonymous
   methods must be used immediately (assigned to a variable, passed
   as a parameter, or applied to arguments). <a href="#fnref:34" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:24">
      <p>Rust was created in 2010 by Graydon Hoare. The closure syntax
   appeared in the original Rust documentation. <a href="#fnref:24" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:25">
      <p>Dart was created in 2011 by Lars Bak and Kasper Lund. The arrow
   syntax appeared in the original Dart language specification. <a href="#fnref:25" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:26">
      <p>Elixir was created in 2011 by José Valim. The <code class="language-plaintext highlighter-rouge">fn</code> syntax and
   capture operator <code class="language-plaintext highlighter-rouge">&amp;</code> appeared in the original Elixir
   documentation. <a href="#fnref:26" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:27">
      <p>Kotlin was created in 2011 by JetBrains. The lambda syntax
   appeared in the original Kotlin documentation. <a href="#fnref:27" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:28">
      <p>C++ introduced lambda expressions in C++11 (2011). The syntax
   appeared in the “C++11 Standard” by the ISO/IEC JTC1/SC. <a href="#fnref:28" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:29">
      <p>Julia was created in 2012 by Jeff Bezanson, Stefan Karpinski,
   Viral B. Shah, and Alan Edelman. The arrow syntax appeared in
   the original Julia documentation. <a href="#fnref:29" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:30">
      <p>Swift was created in 2014 by Apple Inc. The closure syntax, and
   shorthand argument names like <code class="language-plaintext highlighter-rouge">$0</code>, have been part of Swift
   since version 1.0. <a href="#fnref:30" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:31">
      <p>Java 8 introduced lambda expressions in 2014. The syntax
   appeared in the “Java Language Specification, Java SE 8
   Edition” by James Gosling et al. <a href="#fnref:31" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Julian Hyde</name></author><summary type="html"><![CDATA[Lambda syntax varies widely across languages; more widely, I think, than other language features. I wish it weren’t so. It’s difficult to see the elegance in a new language if the syntax is unfamiliar.]]></summary></entry><entry><title type="html">Morel Rust release 0.2.0</title><link href="http://blog.hydromatic.net/2025/10/23/morel-rust-release-0-2-0.html" rel="alternate" type="text/html" title="Morel Rust release 0.2.0" /><published>2025-10-23T02:30:00-07:00</published><updated>2025-10-23T02:30:00-07:00</updated><id>http://blog.hydromatic.net/2025/10/23/morel-rust-release-0-2-0</id><content type="html" xml:base="http://blog.hydromatic.net/2025/10/23/morel-rust-release-0-2-0.html"><![CDATA[<p>I am pleased to announce
<a href="https://github.com/hydromatic/morel-rust/blob/main/CHANGELOG.md#020--2025-10-23">release 0.2.0</a>
of <a href="https://github.com/hydromatic/morel-rust/">Morel Rust</a>.</p>

<p>The Morel language has an existing implementation in Java
(<a href="https://github.com/hydromatic/morel/">Morel Java</a> version 0.7 was
<a href="/2025/06/08/morel-release-0-7-0.html">released in June</a> and
0.8 is coming soon) but this is the beginning of a brand-new Rust
runtime.</p>

<h3 id="whats-in-release-020">What’s in release 0.2.0</h3>

<p>This release focuses on Morel’s underpinnings as a functional
programming language. It can parse any program, and execute
simple programs that consist of expressions, function
declarations, and lambdas (closures).</p>

<p>Here’s a quick example showing what works today.
First, use <code class="language-plaintext highlighter-rouge">cargo</code> to build Morel and start a shell:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>cargo run
morel-rust version 0.2.0 <span class="o">(</span>rust version 1.90.0<span class="o">)</span>
-
</code></pre></div></div>

<p>Next, you can enter some commands:</p>

<!-- morel skip
(* Define a recursive function *)
fun factorial n =
  if n <= 1 then 1
  else n * factorial (n - 1);

(* Use lambdas and higher-order functions *)
val squares = List.map (fn x => x * x) [1, 2, 3, 4, 5];

(* Compose functions *)
val sumOfSquares =
  List.foldl (fn (x, y) => x + y) 0 (List.map (fn x => x * x) [1, 2, 3, 4, 5]);
-->

<div class="code-block">
<div class="code-input"><span class="c">(*</span><span class="cm"> Define a recursive function *)</span>
<span class="kr">fun</span> <span class="nf">factorial</span> <span class="n">n</span> <span class="p">=</span>
  <span class="kr">if</span> <span class="n">n</span> <span class="o">&lt;</span><span class="p">=</span> <span class="mi">1</span> <span class="kr">then</span> <span class="mi">1</span>
  <span class="kr">else</span> <span class="n">n</span> <span class="o">*</span> <span class="n">factorial</span> <span class="p">(</span><span class="n">n</span> <span class="o">-</span> <span class="mi">1</span><span class="p">);</span>

<span class="c">(*</span><span class="cm"> Use lambdas and higher-order functions *)</span>
<span class="kr">val</span> <span class="nv">squares</span> <span class="p">=</span> <span class="nn">List</span><span class="p">.</span><span class="n">map</span> <span class="p">(</span><span class="kr">fn</span> <span class="n">x</span> <span class="o">=&gt;</span> <span class="n">x</span> <span class="o">*</span> <span class="n">x</span><span class="p">)</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">];</span>

<span class="c">(*</span><span class="cm"> Compose functions *)</span>
<span class="kr">val</span> <span class="nv">sumOfSquares</span> <span class="p">=</span>
  <span class="nn">List</span><span class="p">.</span><span class="n">foldl</span> <span class="p">(</span><span class="kr">fn</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="p">)</span> <span class="mi">0</span> <span class="p">(</span><span class="nn">List</span><span class="p">.</span><span class="n">map</span> <span class="p">(</span><span class="kr">fn</span> <span class="n">x</span> <span class="o">=&gt;</span> <span class="n">x</span> <span class="o">*</span> <span class="n">x</span><span class="p">)</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">]);</span></div>
</div>

<p>This demonstrates core functional programming: recursion,
higher-order functions, and composition. Support for programs
that contain queries—the <code class="language-plaintext highlighter-rouge">from</code>, <code class="language-plaintext highlighter-rouge">exists</code>, and <code class="language-plaintext highlighter-rouge">forall</code>
keywords—and user-defined types will come later.</p>

<p>A caveat: this is pre-alpha software. Expect bugs, crashes, and
minimal error handling. We’ve focused on getting the foundations
right (the Hindley-Milner type inference algorithm, and an
evaluation environment that handles recursive functions and
closures) rather than polish. If you’d like to
contribute—fixing bugs, adding features, or improving
documentation—please join us!</p>

<h3 id="why-rust">Why Rust?</h3>

<p>Why create a Rust runtime for Morel when there is already a
Java runtime? Rust brings significant advantages for data
processing workloads.</p>

<p>Rust processes in-memory data at exceptional speed, with
zero-cost abstractions and no garbage collection pauses. It
integrates naturally with modern data infrastructure: Apache
DataFusion for query execution, Arrow for columnar processing,
and Parquet for efficient storage. Memory safety comes without
runtime overhead, and the resulting binaries are ideal for
cloud-native deployments.</p>

<p>Having multiple runtimes also underscores a key design principle: when
writing a Morel program, you don’t need to think about implementation
details. Your choice of runtime is separate from your choice of
language. Choose Java for its ecosystem and JVM integration, or Rust
for performance and modern infrastructure. Programs are portable
across both.</p>

<p>But the most important “runtime” is wherever your data already
lives—Iceberg tables on object storage, Kafka topics, or SQL
engines like Snowflake, BigQuery, or Postgres. That’s why query
planning, federation, and SQL dialect translation are central to
Morel’s design. The compiler can push computation to the data,
regardless of which Morel implementation you’re using.</p>

<h3 id="morel-is-a-language-not-a-framework">Morel is a language, not a framework</h3>

<p>Morel is a complete language, not a framework. When you have a
data problem, you can solve it entirely in Morel—no jumping
between languages, no glue code, and no framework boundaries to
cross.</p>

<p>With a framework, you’re constantly context-switching: Python
for orchestration, SQL for queries, Java for business logic,
and Spark for transformations. Morel lets you express the entire
solution in one language, bringing the benefits of functional
programming—type safety, composability, and refactoring—to data
engineering.</p>

<h3 id="choose-your-runtime-keep-your-code">Choose your runtime, keep your code</h3>

<p>Because Morel is a language with multiple implementations, your
Morel programs are portable across runtimes. Write your code
once, and run it on either Java or Rust—whichever fits your
deployment needs. Users shouldn’t notice—and don’t need to
care—which implementation they’re running.</p>

<p>One reason that Morel Rust has developed quickly is that we can
run Morel Java’s test scripts unchanged. We are gradually
enabling tests as functionality comes online, proving
portability in practice.</p>

<h3 id="learn-more">Learn more</h3>

<p>To find out more about Morel, read about its
<a href="/2020/02/25/morel-a-functional-language-for-data.html">goals</a>
and <a href="/2020/03/03/morel-basics.html">basic language</a>,
and find a full definition of the language in the
<a href="https://github.com/hydromatic/morel/blob/main/docs/query.md">query reference</a>
and the
<a href="https://github.com/hydromatic/morel/blob/main/docs/reference.md">language reference</a>.</p>

<p>If you have comments, please reply on
<a href="https://bsky.app/profile/julianhyde.bsky.social">Bluesky @julianhyde.bsky.social</a>
or Twitter:</p>

<div data_dnt="true">
<div class="jekyll-twitter-plugin"><blockquote class="twitter-tweet" data-cards="hidden"><p lang="en" dir="ltr">Morel is now in Rust! I just made the first release of the new Rust toolchain for <a href="https://twitter.com/morel_lang?ref_src=twsrc%5Etfw">@morel_lang</a>. Morel-Rust implements same language as Morel-Java. It&#39;s early days, but potentially performance will be much better. <a href="https://t.co/15BJXA8lLe">https://t.co/15BJXA8lLe</a> <a href="https://t.co/cysSLvMbPP">pic.twitter.com/cysSLvMbPP</a></p>&mdash; Julian Hyde (@julianhyde) <a href="https://twitter.com/julianhyde/status/1981440836467642880?ref_src=twsrc%5Etfw">October 23, 2025</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

</div>
</div>

<p>This article
<a href="https://github.com/julianhyde/share/commits/main/blog/_posts/2025-10-23-morel-rust-release-0-2-0.md">has been updated</a>.</p>

<p><small>Apache Arrow, Apache DataFusion, Apache Iceberg, Apache
Parquet, and Apache Kafka are trademarks of the Apache Software
Foundation.</small></p>]]></content><author><name>Julian Hyde</name></author><summary type="html"><![CDATA[I am pleased to announce release 0.2.0 of Morel Rust.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://blog.hydromatic.net/assets/img/OldDesignShop_MushroomSpringMorel-240x240.jpg" /><media:content medium="image" url="http://blog.hydromatic.net/assets/img/OldDesignShop_MushroomSpringMorel-240x240.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Sorting on expressions</title><link href="http://blog.hydromatic.net/2025/06/20/sorting-on-expressions.html" rel="alternate" type="text/html" title="Sorting on expressions" /><published>2025-06-20T13:00:00-07:00</published><updated>2025-06-20T13:00:00-07:00</updated><id>http://blog.hydromatic.net/2025/06/20/sorting-on-expressions</id><content type="html" xml:base="http://blog.hydromatic.net/2025/06/20/sorting-on-expressions.html"><![CDATA[<p>Morel’s design philosophy of “everything is an expression” has
transformed how we think about queries, making them more composable
and flexible than traditional SQL.  One stubborn holdout was the
<code class="language-plaintext highlighter-rouge">order</code> step, which required a special syntax with comma-separated
order-items rather than a single expression. In this post, we describe
how we
<a href="https://github.com/hydromatic/morel/issues/244">evolved the syntax of the <code class="language-plaintext highlighter-rouge">order</code> step</a>
in Morel release 0.7, and the benefits of this change.</p>

<h2 id="why-expressions">Why expressions?</h2>

<p>In release 0.6, Morel’s
<a href="https://github.com/hydromatic/morel/blob/main/docs/query.md#syntax">query syntax</a>
(simplified a little) looked like this:</p>

<pre><code><i>query</i> &rarr; <b>from</b> <i>scan</i> [ , <i>scan</i> ... ] [ <i>step</i> ... ]

<i>step</i> &rarr; <b>distinct</b>
    | <b>except</b> [ <b>distinct</b> ] <i>exp</i> [ , <i>exp</i> ... ]
    | <b>group</b> <i>groupKey</i> [ , <i>groupKey</i> ... ] [ <b>compute</b> <i>agg</i> [ , <i>agg</i> ... ] ]
    | <b>intersect</b> [ <b>distinct</b> ] <i>exp</i> [ , <i>exp</i> ... ]
    | <b>join</b> <i>scan</i> [ , <i>scan</i> ... ]
    | <b>order</b> <i>orderItem</i> [ , <i>orderItem</i> ... ]
    | <b>skip</b> <i>exp</i>
    | <b>take</b> <i>exp</i>
    | <b>union</b> [ <b>distinct</b> ] <i>exp</i> [ , <i>exp</i> ... ]
    | <b>where</b> <i>exp</i>
    | <b>yield</b> <i>exp</i>

<i>scan</i> &rarr; <i>pat</i> <b>in</b> <i>exp</i> [ <b>on</b> <i>exp</i> ]

<i>orderItem</i> &rarr; <i>exp</i> [ <b>desc</b> ]

<i>groupKey</i> &rarr; [ <i>id</i> <b>=</b> ] <i>exp</i>

<i>agg</i> &rarr; [ <i>id</i> <b>=</b> ] <i>exp</i> [ <b>of</b> <i>exp</i> ]</code></pre>

<p>Almost everything is an expression. The argument to the <code class="language-plaintext highlighter-rouge">yield</code> step
is an expression (whereas SQL’s <code class="language-plaintext highlighter-rouge">SELECT</code> has a list of expressions
with optional <code class="language-plaintext highlighter-rouge">AS</code> aliases); the scan in a <code class="language-plaintext highlighter-rouge">from</code> query or <code class="language-plaintext highlighter-rouge">join</code> step
is over an expression (which, unlike SQL, is not necessarily a query);
the arguments to the <code class="language-plaintext highlighter-rouge">where</code>, <code class="language-plaintext highlighter-rouge">skip</code>, <code class="language-plaintext highlighter-rouge">take</code>, <code class="language-plaintext highlighter-rouge">except</code>, <code class="language-plaintext highlighter-rouge">intersect</code>,
and <code class="language-plaintext highlighter-rouge">union</code> steps are also expressions.</p>

<p>(The <em>groupKey</em> and <em>agg</em> items in <code class="language-plaintext highlighter-rouge">group</code> and <code class="language-plaintext highlighter-rouge">compute</code> have some way
to go, and we will be looking at those for Morel 0.8, but at least the
aggregate function (before <code class="language-plaintext highlighter-rouge">of</code>) may be a (function-valued)
expression.)</p>
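
<p>As a sketch of how the <em>groupKey</em> and <em>agg</em> productions are used, the
following query sums salaries by department. The inline list of records
and the field names <code class="language-plaintext highlighter-rouge">deptno</code>, <code class="language-plaintext highlighter-rouge">sal</code> and <code class="language-plaintext highlighter-rouge">total</code> are invented for
illustration, and the syntax is the one in the grammar above, so it may
not match later releases.</p>

<div class="code-block">
<div class="code-input">(* Hypothetical data: three employees in two departments. *)
from e in [{deptno = 10, sal = 100.0},
           {deptno = 10, sal = 250.0},
           {deptno = 20, sal = 300.0}]
  group deptno = e.deptno
  compute total = sum of e.sal;</div>
</div>

<p>Here <code class="language-plaintext highlighter-rouge">deptno = e.deptno</code> is a <em>groupKey</em>, and
<code class="language-plaintext highlighter-rouge">total = sum of e.sal</code> is an <em>agg</em>: the aggregate function
<code class="language-plaintext highlighter-rouge">sum</code> comes before <code class="language-plaintext highlighter-rouge">of</code> and its argument comes after.</p>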

<p>Making everything an expression pays dividends. Queries can return a
collection of any value, not just records. You can easily join a
collection to a set of nested records (say an order to its nested
order-lines). If you need a custom aggregate function, you can roll
your own. And each of these expressions can be made into function
arguments, so that you can parameterize your query.</p>
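
<p>A minimal sketch of the first and last of these points, using made-up
names (<code class="language-plaintext highlighter-rouge">bigOnes</code>, <code class="language-plaintext highlighter-rouge">xs</code>, <code class="language-plaintext highlighter-rouge">limit</code>): the source collection and the
filter threshold are ordinary function arguments, and the query yields
plain <code class="language-plaintext highlighter-rouge">int</code> values rather than records.</p>

<div class="code-block">
<div class="code-input">(* Hypothetical helper: doubles the elements of xs that exceed limit. *)
fun bigOnes (xs: int list, limit: int) =
  from x in xs
    where x &gt; limit
    yield x * 2;

bigOnes ([1, 5, 10], 4);</div>
</div>

<p>With these arguments, only 5 and 10 pass the filter, so the call should
return a list containing 10 and 20.</p>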

<p>From Morel 0.7 onwards, the syntax of the <code class="language-plaintext highlighter-rouge">order</code> step is simpler:</p>

<pre><code><i>step</i> &rarr; ...
  | <b>order</b> <i>exp</i></code></pre>

<p>The argument is now just an expression, and the <em>orderItem</em> concept
has disappeared.</p>

<p>Let’s look at how we got here. What was wrong with the previous
syntax, which alternatives did we consider for the new syntax, and
what changes were necessary in order to make it possible?</p>

<h2 id="the-order-step">The <code class="language-plaintext highlighter-rouge">order</code> step</h2>

<p>In the previous syntax, the argument of the <code class="language-plaintext highlighter-rouge">order</code> step was a
comma-separated list of order-items, each of which is an expression
with an optional <code class="language-plaintext highlighter-rouge">desc</code> keyword.</p>

<p>One problem is the commas. In the expression</p>

<!-- morel skip
let
  val pairs = [(1, "a"), (2, "b"), (1, "c")];
in
  foo (from (i, j) in pairs order i desc, j)
end;
-->

<div class="code-block">
<div class="code-input"><span class="kr">let</span>
  <span class="kr">val</span> <span class="nv">pairs</span> <span class="p">=</span> <span class="p">[(</span><span class="mi">1</span><span class="p">,</span> <span class="s2">"a"</span><span class="p">),</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s2">"b"</span><span class="p">),</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s2">"c"</span><span class="p">)];</span>
<span class="kr">in</span>
  <span class="n">foo</span> <span class="p">(</span><span class="kr">from</span> <span class="p">(</span><span class="nv">i</span><span class="p">,</span> <span class="nv">j</span><span class="p">)</span> <span class="kr">in</span> <span class="n">pairs</span> <span class="kr">order</span> <span class="n">i</span> <span class="kr">desc</span><span class="p">,</span> <span class="n">j</span><span class="p">)</span>
<span class="kr">end</span><span class="p">;</span></div>
</div>

<p>it is not immediately clear whether <code class="language-plaintext highlighter-rouge">j</code> is a second argument for the
call to the function <code class="language-plaintext highlighter-rouge">foo</code> or the second item in the <code class="language-plaintext highlighter-rouge">order</code> clause.</p>

<p>Another problem was the fact that the <code class="language-plaintext highlighter-rouge">order</code> clause could not be
empty. The
<a href="https://github.com/hydromatic/morel/issues/273">ordered and unordered collections</a>
feature introduced an <code class="language-plaintext highlighter-rouge">unorder</code> step to convert a <code class="language-plaintext highlighter-rouge">list</code> to a <code class="language-plaintext highlighter-rouge">bag</code>,
and we need the opposite of that, a trivial sort whose
key has the same value for every element.</p>

<p>We can’t just get rid of the <code class="language-plaintext highlighter-rouge">desc</code> keyword and convert the list to a
singleton. Real queries require complex sorting behaviors like
composite keys, descending keys, and nulls-first or nulls-last
specifications. So, how can we put all that complexity in a single
expression?</p>

<p>One approach is to do what many programming languages do, and use a
comparator function. Let’s explore this approach.</p>

<h2 id="comparator-functions">Comparator functions</h2>

<p>In Standard ML, a comparator function is any function that takes a
pair of arguments of the same type and returns a value of the <code class="language-plaintext highlighter-rouge">order</code>
enum (<code class="language-plaintext highlighter-rouge">LESS</code>, <code class="language-plaintext highlighter-rouge">EQUAL</code>, <code class="language-plaintext highlighter-rouge">GREATER</code>). Its type is
<code class="language-plaintext highlighter-rouge">alpha * alpha -&gt; order</code>.</p>

<p>For <code class="language-plaintext highlighter-rouge">int</code>, I can write a simple function:</p>

<!-- morel
fun compareInt (x: int, y: int) =
  if x < y then LESS
  else if x > y then GREATER
  else EQUAL;
> val compareInt = fn : int * int -> order
-->

<div class="code-block">
<div class="code-input"><span class="kr">fun</span> <span class="nf">compareInt</span> <span class="p">(</span><span class="n">x</span><span class="p">:</span> <span class="n">int</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span> <span class="n">int</span><span class="p">)</span> <span class="p">=</span>
  <span class="kr">if</span> <span class="n">x</span> <span class="o">&lt;</span> <span class="n">y</span> <span class="kr">then</span> <span class="n">LESS</span>
  <span class="kr">else</span> <span class="kr">if</span> <span class="n">x</span> <span class="o">&gt;</span> <span class="n">y</span> <span class="kr">then</span> <span class="n">GREATER</span>
  <span class="kr">else</span> <span class="n">EQUAL</span><span class="p">;</span></div>
<div class="code-output">val compareInt = fn : int * int -&gt; order</div>
</div>
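
<p>As a quick check, applying <code class="language-plaintext highlighter-rouge">compareInt</code> to a few sample pairs
produces each of the three values of <code class="language-plaintext highlighter-rouge">order</code>:</p>

<div class="code-block">
<div class="code-input">compareInt (1, 2);</div>
<div class="code-output">val it = LESS : order</div>
<div class="code-input">compareInt (2, 2);</div>
<div class="code-output">val it = EQUAL : order</div>
<div class="code-input">compareInt (3, 2);</div>
<div class="code-output">val it = GREATER : order</div>
</div>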

<p>In fact, most data types have a built-in <code class="language-plaintext highlighter-rouge">compare</code> function:</p>

<!-- morel
Int.compare;
> val it = fn : int * int -> order
Real.compare;
> val it = fn : real * real -> order
String.compare;
> val it = fn : string * string -> order
-->

<div class="code-block">
<div class="code-input"><span class="nn">Int</span><span class="p">.</span><span class="n">compare</span><span class="p">;</span></div>
<div class="code-output">val it = fn : int * int -&gt; order</div>
<div class="code-input"><span class="nn">Real</span><span class="p">.</span><span class="n">compare</span><span class="p">;</span></div>
<div class="code-output">val it = fn : real * real -&gt; order</div>
<div class="code-input"><span class="nn">String</span><span class="p">.</span><span class="n">compare</span><span class="p">;</span></div>
<div class="code-output">val it = fn : string * string -&gt; order</div>
</div>

<p>For more complex orderings, I can write a comparator that combines
other comparators. For example, this function compares two
<code class="language-plaintext highlighter-rouge">string * real</code> pairs, by the <code class="language-plaintext highlighter-rouge">string</code> first, then by the <code class="language-plaintext highlighter-rouge">real</code>
descending:</p>

<!-- morel
fun compareStringRealPair ((s1, r1), (s2, r2)) =
    case String.compare (s1, s2) of
        EQUAL => Real.compare (r2, r1)
      | result => result;
> val compareStringRealPair = fn : string * real * (string * real) -> order
-->

<div class="code-block">
<div class="code-input"><span class="kr">fun</span> <span class="nf">compareStringRealPair</span> <span class="p">((</span><span class="n">s1</span><span class="p">,</span> <span class="n">r1</span><span class="p">),</span> <span class="p">(</span><span class="n">s2</span><span class="p">,</span> <span class="n">r2</span><span class="p">))</span> <span class="p">=</span>
    <span class="kr">case</span> <span class="nn">String</span><span class="p">.</span><span class="n">compare</span> <span class="p">(</span><span class="n">s1</span><span class="p">,</span> <span class="n">s2</span><span class="p">)</span> <span class="kr">of</span>
        <span class="n">EQUAL</span> <span class="o">=&gt;</span> <span class="nn">Real</span><span class="p">.</span><span class="n">compare</span> <span class="p">(</span><span class="n">r2</span><span class="p">,</span> <span class="n">r1</span><span class="p">)</span>
      <span class="p">|</span> <span class="n">result</span> <span class="o">=&gt;</span> <span class="n">result</span><span class="p">;</span></div>
<div class="code-output">val compareStringRealPair = fn : string * real * (string * real) -&gt; order</div>
</div>
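
<p>Two sample calls illustrate the two branches of the <code class="language-plaintext highlighter-rouge">case</code>. The job
titles and salaries are made up for illustration. In the first call the
strings differ, so they decide the result; in the second the strings are
equal, and the swapped arguments to <code class="language-plaintext highlighter-rouge">Real.compare</code> make the higher
salary sort first.</p>

<div class="code-block">
<div class="code-input">compareStringRealPair (("ANALYST", 3000.0), ("CLERK", 950.0));</div>
<div class="code-output">val it = LESS : order</div>
<div class="code-input">compareStringRealPair (("CLERK", 1300.0), ("CLERK", 950.0));</div>
<div class="code-output">val it = LESS : order</div>
</div>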

<p>If we were to add comparators to Morel, we could add <code class="language-plaintext highlighter-rouge">order using</code>
syntax like this:</p>

<!-- morel skip
(* Sort employees by job, and then by descending salary. *)
from e in scott.emps
  order using fn (emp1, emp2) =>
    case String.compare (emp1.job, emp2.job) of
       EQUAL => Real.compare (emp2.sal, emp1.sal)
     | result => result;
-->

<div class="code-block">
<div class="code-input"><span class="c">(*</span><span class="cm"> Sort employees by job, and then by descending salary. *)</span>
<span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span>
  <span class="kr">order</span> <span class="n">using</span> <span class="kr">fn</span> <span class="p">(</span><span class="n">emp1</span><span class="p">,</span> <span class="n">emp2</span><span class="p">)</span> <span class="o">=&gt;</span>
    <span class="kr">case</span> <span class="nn">String</span><span class="p">.</span><span class="n">compare</span> <span class="p">(</span><span class="nn">emp1</span><span class="p">.</span><span class="n">job</span><span class="p">,</span> <span class="nn">emp2</span><span class="p">.</span><span class="n">job</span><span class="p">)</span> <span class="kr">of</span>
       <span class="n">EQUAL</span> <span class="o">=&gt;</span> <span class="nn">Real</span><span class="p">.</span><span class="n">compare</span> <span class="p">(</span><span class="nn">emp2</span><span class="p">.</span><span class="n">sal</span><span class="p">,</span> <span class="nn">emp1</span><span class="p">.</span><span class="n">sal</span><span class="p">)</span>
     <span class="p">|</span> <span class="n">result</span> <span class="o">=&gt;</span> <span class="n">result</span><span class="p">;</span></div>
</div>

<p>(The comparator expression in this query is basically an inline
version of the <code class="language-plaintext highlighter-rouge">compareStringRealPair</code> function, but working on <code class="language-plaintext highlighter-rouge">emp</code>
records rather than <code class="language-plaintext highlighter-rouge">string * real</code> pairs.)</p>

<p>But this is much longer than the equivalent in SQL. Comparator
functions are clearly powerful, but they fail the “make simple things
simple” test – forcing developers to write complex code for common
sorting patterns.</p>

<p>Let’s look instead at value-based sorting, which is simpler, but
provides most of the flexibility of comparator functions.</p>

<h2 id="structured-values-for-complex-orderings">Structured values for complex orderings</h2>

<p>The idea behind value-based sorting is that any values of the same
type can be compared, and that the Morel system generates comparison
logic for any type. If you require a complex sorting behavior, you can
construct an expression with a complex type.</p>

<p>Previously, if you wanted a composite ordering, with one of the keys
descending, you would write something like this:</p>

<!-- morel skip
(* Old syntax. *)
from e in scott.emps
  order e.job, e.sal desc;
-->

<div class="code-block">
<div class="code-input"><span class="c">(*</span><span class="cm"> Old syntax. *)</span>
<span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span>
  <span class="kr">order</span> <span class="nn">e</span><span class="p">.</span><span class="n">job</span><span class="p">,</span> <span class="nn">e</span><span class="p">.</span><span class="n">sal</span> <span class="kr">desc</span><span class="p">;</span></div>
</div>

<p>As of Morel 0.7, you can write the same query using a single
expression:</p>

<!-- morel skip
(* New syntax. *)
from e in scott.emps
  order (e.job, DESC e.sal);
-->

<div class="code-block">
<div class="code-input"><span class="c">(*</span><span class="cm"> New syntax. *)</span>
<span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span>
  <span class="kr">order</span> <span class="p">(</span><span class="nn">e</span><span class="p">.</span><span class="n">job</span><span class="p">,</span> <span class="n">DESC</span> <span class="nn">e</span><span class="p">.</span><span class="n">sal</span><span class="p">);</span></div>
</div>

<p>Note that:</p>
<ul>
  <li>For a composite ordering, we use a tuple type. Morel compares the
values lexicographically.</li>
  <li>For a descending ordering, we wrap the value in the <code class="language-plaintext highlighter-rouge">descending</code>
data type using its <code class="language-plaintext highlighter-rouge">DESC</code> constructor. Morel compares the values
in the usual way, then reverses the direction.</li>
</ul>
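
<p>As a small sketch of both rules on data that needs no database, the
following query sorts a made-up list of <code class="language-plaintext highlighter-rouge">string * real</code> pairs by the
string ascending and then by the real descending:</p>

<div class="code-block">
<div class="code-input">(* Composite key: s ascending, then r descending. *)
from (s, r) in [("b", 1.0), ("a", 1.0), ("a", 2.0)]
  order (s, DESC r);</div>
</div>

<p>The two <code class="language-plaintext highlighter-rouge">"a"</code> pairs should come first, with <code class="language-plaintext highlighter-rouge">("a", 2.0)</code> ahead of
<code class="language-plaintext highlighter-rouge">("a", 1.0)</code> because of <code class="language-plaintext highlighter-rouge">DESC</code>, followed by <code class="language-plaintext highlighter-rouge">("b", 1.0)</code>.</p>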

<p>Sorting is defined for all other data types, including tuples,
records, sum-types such as <code class="language-plaintext highlighter-rouge">option</code> and <code class="language-plaintext highlighter-rouge">descending</code>, lists, bags, and
any combination thereof.</p>

<p>Morel’s compiler has two tricks to make this powerful and efficient.</p>

<p>First, Morel is effectively generating a comparator function at
compile time based on the type of the <code class="language-plaintext highlighter-rouge">order</code> expression.  This makes
value-based sorting as powerful as comparator functions, but with less
code for the user to write.</p>

<p>(The change included a new library function, <code class="language-plaintext highlighter-rouge">Relational.compare</code>,
that allows you to compare any two values of the same type, even if
you are not performing a sort. This is a somewhat strange function,
because it takes the type as an implicit argument, then drives its
behavior by introspecting that type.)</p>
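
<p>A sketch of calling it directly, assuming that <code class="language-plaintext highlighter-rouge">Relational.compare</code>
takes its two values as a pair, like <code class="language-plaintext highlighter-rouge">Int.compare</code> and the other
<code class="language-plaintext highlighter-rouge">compare</code> functions shown earlier:</p>

<div class="code-block">
<div class="code-input">(* Tuples compare lexicographically: the first fields tie,
   so "b" versus "a" decides. *)
Relational.compare ((1, "b"), (1, "a"));

(* DESC reverses the direction of the underlying int comparison. *)
Relational.compare (DESC 1, DESC 2);</div>
</div>

<p>Under the tuple and <code class="language-plaintext highlighter-rouge">DESC</code> rules described above, both calls should
return <code class="language-plaintext highlighter-rouge">GREATER</code>.</p>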

<p>Second, the <code class="language-plaintext highlighter-rouge">order</code> clause uses a form of lazy evaluation. If the
query</p>

<!-- morel skip
from e in scott.emps
  order (e.job, DESC e.sal);
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span>
  <span class="kr">order</span> <span class="p">(</span><span class="nn">e</span><span class="p">.</span><span class="n">job</span><span class="p">,</span> <span class="n">DESC</span> <span class="nn">e</span><span class="p">.</span><span class="n">sal</span><span class="p">);</span></div>
</div>

<p>created a tuple <code class="language-plaintext highlighter-rouge">(e.job, DESC(e.sal))</code> for every element, we would
worry about the impact on performance, but those tuples are never
constructed. Morel operates on the employee records <code class="language-plaintext highlighter-rouge">e</code> directly,
and the performance is the same as if we had specified the ordering
using a list of order-items or a comparator function.</p>

<h2 id="benefits-of-sorting-on-expressions">Benefits of sorting on expressions</h2>

<p>Now that the <code class="language-plaintext highlighter-rouge">order</code> step takes an expression, what is possible that
wasn’t before?</p>

<p>We can pass the expression as an argument to a function, like this:</p>

<!-- morel skip
fun rankedEmployees extractKey =
  from e in scott.emps
    order extractKey e;

rankedEmployees (fn e => e.ename);
rankedEmployees (fn e => (e.job,  DESC e.sal));
-->

<div class="code-block">
<div class="code-input"><span class="kr">fun</span> <span class="nf">rankedEmployees</span> <span class="n">extractKey</span> <span class="p">=</span>
  <span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span>
    <span class="kr">order</span> <span class="n">extractKey</span> <span class="n">e</span><span class="p">;</span>

<span class="n">rankedEmployees</span> <span class="p">(</span><span class="kr">fn</span> <span class="n">e</span> <span class="o">=&gt;</span> <span class="nn">e</span><span class="p">.</span><span class="n">ename</span><span class="p">);</span>
<span class="n">rankedEmployees</span> <span class="p">(</span><span class="kr">fn</span> <span class="n">e</span> <span class="o">=&gt;</span> <span class="p">(</span><span class="nn">e</span><span class="p">.</span><span class="n">job</span><span class="p">,</span>  <span class="n">DESC</span> <span class="nn">e</span><span class="p">.</span><span class="n">sal</span><span class="p">));</span></div>
</div>

<p>We can also achieve the trivial sort required to convert a <code class="language-plaintext highlighter-rouge">bag</code> to a
<code class="language-plaintext highlighter-rouge">list</code>. You can sort by any constant value, such as the integer <code class="language-plaintext highlighter-rouge">0</code> or
the <code class="language-plaintext highlighter-rouge">option</code> constructor <code class="language-plaintext highlighter-rouge">NONE</code>, but the norm would be to sort by the
empty tuple <code class="language-plaintext highlighter-rouge">()</code>:</p>

<!-- morel skip
from e in scott.emps
  yield e.ename
  order ();
> val it =
>   ["SMITH","ALLEN","WARD","JONES","MARTIN","BLAKE","CLARK",
>    "SCOTT","KING","TURNER","ADAMS","JAMES","FORD","MILLER"]
>   : string list
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span>
  <span class="kr">yield</span> <span class="nn">e</span><span class="p">.</span><span class="n">ename</span>
  <span class="kr">order</span> <span class="p">();</span></div>
<div class="code-output">val it =
  ["SMITH","ALLEN","WARD","JONES","MARTIN","BLAKE","CLARK",
   "SCOTT","KING","TURNER","ADAMS","JAMES","FORD","MILLER"]
  : string list</div>
</div>

<p>Note that the result is a <code class="language-plaintext highlighter-rouge">list</code>, even though <code class="language-plaintext highlighter-rouge">scott.emps</code> (a relational
database table) is a <code class="language-plaintext highlighter-rouge">bag</code>.  The elements are in
arbitrary order (because any order is consistent with the empty sort
key) but in converting the collection to a <code class="language-plaintext highlighter-rouge">list</code> the arbitrary order
has become frozen and repeatable.</p>

<h2 id="future-work">Future work</h2>

<p>Several challenges remain to be addressed.</p>

<h3 id="nulls-first-and-nulls-last">NULLS FIRST and NULLS LAST</h3>

<p>Real-world data sets often contain null values, and at various times
you wish to sort nulls low (as if they were zero or negative infinity)
or high (as if they were positive infinity). Morel uses the <code class="language-plaintext highlighter-rouge">option</code>
type rather than <code class="language-plaintext highlighter-rouge">NULL</code> to represent optional values, but the same
requirement exists.</p>

<p>SQL has <code class="language-plaintext highlighter-rouge">NULLS FIRST</code> and <code class="language-plaintext highlighter-rouge">NULLS LAST</code> keywords to control how nulls
are sorted, but Morel does not have an equivalent syntax.</p>

<p>Currently, the behavior is the same as SQL’s <code class="language-plaintext highlighter-rouge">NULLS FIRST</code>.  This
happens because Morel sorts datatype values based on the declaration
order of their constructors. The <code class="language-plaintext highlighter-rouge">option</code> type is declared as:</p>

<!-- morel skip
datatype 'a option = NONE | SOME of 'a;
-->

<div class="code-block">
<div class="code-input"><span class="kr">datatype</span> <span class="nn">'a</span> <span class="n">option</span> <span class="p">=</span> <span class="n">NONE</span> <span class="p">|</span> <span class="n">SOME</span> <span class="kr">of</span> <span class="nn">'a</span><span class="p">;</span></div>
</div>

<p>Since <code class="language-plaintext highlighter-rouge">NONE</code> appears before <code class="language-plaintext highlighter-rouge">SOME</code> in this declaration, the <code class="language-plaintext highlighter-rouge">NONE</code>
value sorts lower than all <code class="language-plaintext highlighter-rouge">SOME</code> values:</p>

<!-- morel
from i in [SOME 1, SOME ~100, NONE]
  order i;
> val it = [NONE,SOME ~100,SOME 1] : int option list
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">i</span> <span class="kr">in</span> <span class="p">[</span><span class="n">SOME</span> <span class="mi">1</span><span class="p">,</span> <span class="n">SOME</span> ~<span class="mi">100</span><span class="p">,</span> <span class="n">NONE</span><span class="p">]</span>
  <span class="kr">order</span> <span class="n">i</span><span class="p">;</span></div>
<div class="code-output">val it = [NONE,SOME ~100,SOME 1] : int option list</div>
</div>

<p>We haven’t yet figured out how to express the equivalent of <code class="language-plaintext highlighter-rouge">NULLS
LAST</code>.  One idea is to add a <code class="language-plaintext highlighter-rouge">noneLast</code> datatype</p>

<!-- morel skip
datatype 'a noneLast = NONE_LAST of 'a;
-->

<div class="code-block">
<div class="code-input"><span class="kr">datatype</span> <span class="nn">'a</span> <span class="n">noneLast</span> <span class="p">=</span> <span class="n">NONE_LAST</span> <span class="kr">of</span> <span class="nn">'a</span><span class="p">;</span></div>
</div>

<p>and use it in a query like this:</p>

<!-- morel skip
from i in [SOME 1, SOME ~100, NONE]
  order NONE_LAST i;
> val it = [SOME ~100, SOME 1, NONE] : int option list
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">i</span> <span class="kr">in</span> <span class="p">[</span><span class="n">SOME</span> <span class="mi">1</span><span class="p">,</span> <span class="n">SOME</span> ~<span class="mi">100</span><span class="p">,</span> <span class="n">NONE</span><span class="p">]</span>
  <span class="kr">order</span> <span class="n">NONE_LAST</span> <span class="n">i</span><span class="p">;</span></div>
<div class="code-output">val it = [SOME ~100, SOME 1, NONE] : int option list</div>
</div>

<p>When we use <code class="language-plaintext highlighter-rouge">NONE_LAST</code> and <code class="language-plaintext highlighter-rouge">DESC</code> together in a query</p>

<!-- morel skip
from i in [SOME 1, SOME ~100, NONE]
  order DESC (NONE_LAST i);
> val it = [NONE, SOME 1, SOME ~100] : int option list
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">i</span> <span class="kr">in</span> <span class="p">[</span><span class="n">SOME</span> <span class="mi">1</span><span class="p">,</span> <span class="n">SOME</span> ~<span class="mi">100</span><span class="p">,</span> <span class="n">NONE</span><span class="p">]</span>
  <span class="kr">order</span> <span class="n">DESC</span> <span class="p">(</span><span class="n">NONE_LAST</span> <span class="n">i</span><span class="p">);</span></div>
<div class="code-output">val it = [NONE, SOME 1, SOME ~100] : int option list</div>
</div>

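<p>the <code class="language-plaintext highlighter-rouge">NONE</code> value appears first, because <code class="language-plaintext highlighter-rouge">DESC</code> reverses the
entire ordering, including the position of <code class="language-plaintext highlighter-rouge">NONE</code>. That is what we
asked for, but it is not what you would expect if you assumed that <code class="language-plaintext highlighter-rouge">DESC</code>
and <code class="language-plaintext highlighter-rouge">NONE_LAST</code> commute.</p>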

<p>Until we figure out something intuitive, Morel will not have a
solution for <code class="language-plaintext highlighter-rouge">NULLS LAST</code>.</p>
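
<p>In the meantime, a query can spell out the ordering by hand with a
composite key. This is only a sketch of a workaround, relying on nothing
beyond the tuple and <code class="language-plaintext highlighter-rouge">DESC</code> rules described earlier; the leading <code class="language-plaintext highlighter-rouge">int</code>
flag is an arbitrary choice that places <code class="language-plaintext highlighter-rouge">SOME</code> values (flag 0) before
<code class="language-plaintext highlighter-rouge">NONE</code> (flag 1).</p>

<div class="code-block">
<div class="code-input">(* Ascending, with NONE last. *)
from i in [SOME 1, SOME ~100, NONE]
  order ((case i of NONE =&gt; 1 | SOME _ =&gt; 0), i);

(* Descending, with NONE still last: the flag is compared
   before the DESC-wrapped value. *)
from i in [SOME 1, SOME ~100, NONE]
  order ((case i of NONE =&gt; 1 | SOME _ =&gt; 0), DESC i);</div>
</div>

<p>The first query should return <code class="language-plaintext highlighter-rouge">SOME ~100</code>, <code class="language-plaintext highlighter-rouge">SOME 1</code>, <code class="language-plaintext highlighter-rouge">NONE</code>, and the
second <code class="language-plaintext highlighter-rouge">SOME 1</code>, <code class="language-plaintext highlighter-rouge">SOME ~100</code>, <code class="language-plaintext highlighter-rouge">NONE</code>. It works, but it is a long way
from the brevity of the SQL keywords.</p>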

<h3 id="comparator-functions-1">Comparator functions</h3>

<p>Under the “make hard things possible” principle, we might still want
to support comparator functions at some point. The syntax could be as
follows:</p>

<pre><code><i>step</i> &rarr; ...
  | <b>order</b> <i>exp</i>
  | <b>order using</b> <i>comparator</i></code></pre>

<p>Is value-based sorting strictly less powerful than comparator
functions? It’s an interesting theoretical question, and I honestly
don’t know. A comparator function can be an arbitrarily complex piece
of code — but perhaps it is always possible to create a value that
matches the structure of the code.</p>

<h3 id="aggregation-syntax">Aggregation syntax</h3>

<p>The syntax for <code class="language-plaintext highlighter-rouge">group</code> and <code class="language-plaintext highlighter-rouge">compute</code> steps is still not an expression.
For Morel 0.8 and beyond, we will be looking at
<a href="https://github.com/hydromatic/morel/issues/288">several improvements</a>.</p>

<p>First, making the group-key and compute-items an expression, with
field aliasing provided via record syntax, as in the current <code class="language-plaintext highlighter-rouge">yield</code>
step.</p>

<p>Second, allowing complex compute expressions with expressions both
inside and outside the aggregate function, as in the SQL expression
“<code class="language-plaintext highlighter-rouge">1 + AVG(sal * 2)</code>”. This will mean the <code class="language-plaintext highlighter-rouge">of</code> keyword, which is
currently part of the <em>agg</em> syntax, will be transitioned to a new
keyword that is part of the expression syntax, possibly <code class="language-plaintext highlighter-rouge">over</code>.</p>

<p>Third, exploring further the relationship between the argument to an
aggregate function and a query. SQL’s aggregate function syntax by now
includes most relational operators (<code class="language-plaintext highlighter-rouge">FILTER</code>,
<code class="language-plaintext highlighter-rouge">DISTINCT</code>, <code class="language-plaintext highlighter-rouge">WITHIN DISTINCT</code>, <code class="language-plaintext highlighter-rouge">ORDER BY</code>), so we will consider making the
argument (the part after the <code class="language-plaintext highlighter-rouge">over</code> keyword just mentioned) a kind of query
expression.</p>

<h2 id="conclusion">Conclusion</h2>

<p>Making sorting expression-based represents more than just a syntax
change – it exemplifies Morel’s commitment to principled language
design. By eliminating the special-case syntax for <code class="language-plaintext highlighter-rouge">order</code>, we’ve
resolved parsing ambiguities and enabled new forms of query
composition.</p>

<p>In the next few releases, we shall continue to evolve Morel to make it
more uniform and composable. The result, we hope, will be a query
language that feels both familiar to SQL users and naturally
functional to developers who think in terms of higher-order functions
and data transformation pipelines.</p>

<p>To find out more about Morel, read about its
<a href="/2020/02/25/morel-a-functional-language-for-data.html">goals</a>
and <a href="/2020/03/03/morel-basics.html">basic language</a>, peruse the
<a href="https://github.com/hydromatic/morel/blob/main/docs/query.md">query reference</a>
or
<a href="https://github.com/hydromatic/morel/blob/main/docs/reference.md">language reference</a>,
or download it from <a href="https://github.com/hydromatic/morel/">GitHub</a> and
give it a try.</p>

<p>If you have comments, please reply on
<a href="https://bsky.app/profile/julianhyde.bsky.social">Bluesky @julianhyde.bsky.social</a>
or Twitter:</p>

<div data_dnt="true">
<div class="jekyll-twitter-plugin"><blockquote class="twitter-tweet" data-cards="hidden"><p lang="en" dir="ltr">How we simplified the syntax of <a href="https://twitter.com/morel_lang?ref_src=twsrc%5Etfw">@morel_lang</a>&#39;s &quot;order&quot; step <a href="https://t.co/pLUHBVoURN">https://t.co/pLUHBVoURN</a></p>&mdash; Julian Hyde (@julianhyde) <a href="https://twitter.com/julianhyde/status/1936229301621604772?ref_src=twsrc%5Etfw">June 21, 2025</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

</div>
</div>

<p>This article
<a href="https://github.com/julianhyde/share/commits/main/blog/_posts/2025-06-20-sorting-on-expressions.md">has been updated</a>.</p>]]></content><author><name>Julian Hyde</name></author><summary type="html"><![CDATA[Morel’s design philosophy of “everything is an expression” has transformed how we think about queries, making them more composable and flexible than traditional SQL. One stubborn holdout was the order step, which required a special syntax with comma-separated order-items rather than a single expression. In this post, we describe how we evolved the syntax of the order step in Morel release 0.7, and the benefits of this change.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://blog.hydromatic.net/assets/img/OldDesignShop_MushroomSpringMorel-240x240.jpg" /><media:content medium="image" url="http://blog.hydromatic.net/assets/img/OldDesignShop_MushroomSpringMorel-240x240.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Morel release 0.7.0</title><link href="http://blog.hydromatic.net/2025/06/08/morel-release-0-7-0.html" rel="alternate" type="text/html" title="Morel release 0.7.0" /><published>2025-06-08T13:00:00-07:00</published><updated>2025-06-08T13:00:00-07:00</updated><id>http://blog.hydromatic.net/2025/06/08/morel-release-0-7-0</id><content type="html" xml:base="http://blog.hydromatic.net/2025/06/08/morel-release-0-7-0.html"><![CDATA[<p>I am pleased to announce Morel
<a href="https://github.com/hydromatic/morel/blob/main/HISTORY.md#070--2025-06-07">release 0.7.0</a>,
just one month after
<a href="https://github.com/hydromatic/morel/blob/main/HISTORY.md#060--2025-05-02">release 0.6.0</a>.</p>

<p>This release has actually been under development for a long time.
<a href="#1-ordered-and-unordered-collections-and-queries">Ordered and unordered collections and queries</a>,
which are the centerpiece of this release, required major changes to
the type inference algorithm, not to mention a new
<a href="https://github.com/hydromatic/morel/issues/235">data type</a> (<code class="language-plaintext highlighter-rouge">bag</code>),
<a href="https://github.com/hydromatic/morel/issues/277">query step</a> (<code class="language-plaintext highlighter-rouge">unorder</code>),
and
<a href="https://github.com/hydromatic/morel/issues/276">expression</a> (<code class="language-plaintext highlighter-rouge">ordinal</code>).
The type inference changes have been under development for six months
(during which time there were two other Morel releases), and were so
extensive that we got
<a href="#2-function-overloading">function overloading</a> practically for free.</p>

<p>There are other changes to query syntax:
<a href="#3-sorting-on-expressions">sorting on expressions</a>,
<a href="#4-atomic-yield-steps">atomic <code class="language-plaintext highlighter-rouge">yield</code> steps</a>, and
<a href="#5-set-operators-in-pipelines">set operators in pipelines</a>.</p>

<p>Morel aims to be a solid implementation of Standard ML and a good
general-purpose programming language, in addition to being a
revolutionary query language, which means gradually completing our
implementation of Standard ML’s
<a href="https://smlfamily.github.io/Basis/">Basis Library</a>. This release we
have completed the
<a href="#6-string-and-char-structures"><code class="language-plaintext highlighter-rouge">String</code> and <code class="language-plaintext highlighter-rouge">Char</code> structures</a>.</p>

<p>Let’s explore the key features. For complete details, see the
<a href="https://github.com/hydromatic/morel/blob/main/HISTORY.md#070--2025-06-07">official release notes</a>.</p>

<h2 id="1-ordered-and-unordered-collections-and-queries">1. Ordered and unordered collections and queries</h2>

<p>The biggest change in 0.7.0 is the introduction of
<a href="https://github.com/hydromatic/morel/issues/273">ordered and unordered collections and queries</a>.
Previously, every query was over a <code class="language-plaintext highlighter-rouge">list</code> type, whose elements are
ordered and may include duplicates.</p>

<p>But saying that every collection and query is over a <code class="language-plaintext highlighter-rouge">list</code> type
is a white lie. Consider this query:</p>

<!-- morel skip
from e in scott.emps
  where e.sal > 1000.0
  yield e.ename;
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span>
  <span class="kr">where</span> <span class="nn">e</span><span class="p">.</span><span class="n">sal</span> <span class="o">&gt;</span> <span class="mi">1000</span><span class="p">.</span><span class="mi">0</span>
  <span class="kr">yield</span> <span class="nn">e</span><span class="p">.</span><span class="n">ename</span><span class="p">;</span></div>
</div>

<p>The collection <code class="language-plaintext highlighter-rouge">scott.emps</code> maps to the <code class="language-plaintext highlighter-rouge">EMP</code> table in the <code class="language-plaintext highlighter-rouge">scott</code>
database, and Morel’s goal is to push as much of the processing as
possible to where the data resides. In this case, Morel can generate
the SQL query</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">ENAME</span>
<span class="k">FROM</span> <span class="n">SCOTT</span><span class="p">.</span><span class="n">EMP</span>
<span class="k">WHERE</span> <span class="n">SAL</span> <span class="o">&gt;</span> <span class="mi">1000</span><span class="p">.</span><span class="mi">0</span><span class="p">;</span>
</code></pre></div></div>

<p>SQL makes no guarantees about the order of results. If you execute
the query twice, a DBMS is free to return the results in a different
order each time. So Morel is being dishonest if it says that the result
is a <code class="language-plaintext highlighter-rouge">list</code>.</p>

<p>Could we redefine <code class="language-plaintext highlighter-rouge">list</code> so that its iteration order is undefined?
Yes, but then we would be short-changing queries such as</p>

<!-- morel
from i in ["a", "b"],
    j in [1, 2, 3]
  yield (i, j);
> val it = [("a",1),("a",2),("a",3),("b",1),("b",2),("b",3)]
>   : (string * int) list
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">i</span> <span class="kr">in</span> <span class="p">[</span><span class="s2">"a"</span><span class="p">,</span> <span class="s2">"b"</span><span class="p">],</span>
    <span class="nv">j</span> <span class="kr">in</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
  <span class="kr">yield</span> <span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">);</span></div>
<div class="code-output">val it = [("a",1),("a",2),("a",3),("b",1),("b",2),("b",3)]
  : (string * int) list</div>
</div>

<p>which do have a defined order.</p>

<p>The fact is – even though the relational model tells us it ain’t so
– some data sets are ordered, and some are unordered. Adding distinct
<code class="language-plaintext highlighter-rouge">bag</code> and <code class="language-plaintext highlighter-rouge">list</code> types, relational operators that can work on both,
and relational operators to convert between them, was the way to go.</p>
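<p>As a minimal sketch of those conversions, using only the <code class="language-plaintext highlighter-rouge">unorder</code> and
<code class="language-plaintext highlighter-rouge">order</code> steps that appear later in this post: the first query should
produce a <code class="language-plaintext highlighter-rouge">bag</code>, the second a <code class="language-plaintext highlighter-rouge">list</code>. The literal values are
illustrative, and the output is omitted rather than guessed.</p>

<div class="code-block">
<div class="code-input"><span class="c">(*</span><span class="cm"> Convert a list to a bag; the original order is forgotten. *)</span>
<span class="kr">from</span> <span class="nv">i</span> <span class="kr">in</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">]</span>
  <span class="kr">unorder</span><span class="p">;</span>

<span class="c">(*</span><span class="cm"> Convert back to a list by sorting on an explicit key. *)</span>
<span class="kr">from</span> <span class="nv">i</span> <span class="kr">in</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">]</span>
  <span class="kr">unorder</span>
  <span class="kr">order</span> <span class="n">i</span><span class="p">;</span></div>
</div>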

<p>The features that we implemented are described in the article
“<a href="http://blog.hydromatic.net/2025/06/06/ordered-unordered.html">Ordered and unordered data</a>”.</p>

<h2 id="2-function-overloading">2. Function overloading</h2>

<p>In Standard ML, and in Morel until recently, a name could only have
one binding.  Functions are values, and therefore inhabit the same
namespace as regular values.  If I declare <code class="language-plaintext highlighter-rouge">x</code> to be an <code class="language-plaintext highlighter-rouge">int</code> value</p>

<!-- morel skip
val x = 42;
-->

<div class="code-block">
<div class="code-input"><span class="kr">val</span> <span class="nv">x</span> <span class="p">=</span> <span class="mi">42</span><span class="p">;</span></div>
</div>

<p>and then later try to declare <code class="language-plaintext highlighter-rouge">x</code> to be a function</p>

<!-- morel skip
val x = fn y => y + 1;
-->

<div class="code-block">
<div class="code-input"><span class="kr">val</span> <span class="nv">x</span> <span class="p">=</span> <span class="kr">fn</span> <span class="n">y</span> <span class="o">=&gt;</span> <span class="n">y</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span></div>
</div>

<p>then the previous declaration of <code class="language-plaintext highlighter-rouge">x</code> is no longer accessible.</p>

<!-- morel fail
val z = x - 2;
> stdIn:1.9 Error: unbound variable or constructor: x
>   raised at: stdIn:1.9
-->

<div class="code-block">
<div class="code-input"><span class="kr">val</span> <span class="nv">z</span> <span class="p">=</span> <span class="n">x</span> <span class="o">-</span> <span class="mi">2</span><span class="p">;</span></div>
<div class="code-error">stdIn:1.9 Error: unbound variable or constructor: x
  raised at: stdIn:1.9</div>
</div>

<p>To create
<a href="https://github.com/hydromatic/morel/issues/237">overloaded functions</a>,
we need to declare that an identifier is special; we do this using the
new <code class="language-plaintext highlighter-rouge">over</code> keyword:</p>

<!-- morel
over f;
> over f
-->

<div class="code-block">
<div class="code-input"><span class="kr">over</span> <span class="n">f</span><span class="p">;</span></div>
<div class="code-output">over f</div>
</div>

<p>Now we can define several instances of <code class="language-plaintext highlighter-rouge">f</code>:</p>

<!-- morel
val inst f = fn (x : int, y : int) => x + y;
> val f = fn : int * int -> int
val inst f = fn list => length list;
> val f = fn : 'a list -> int
val inst f = fn SOME x => x ^ "!" | NONE => ":(";
> val f = fn : string option -> string
-->

<div class="code-block">
<div class="code-input"><span class="kr">val</span> <span class="kr">inst</span> <span class="nv">f</span> <span class="p">=</span> <span class="kr">fn</span> <span class="p">(</span><span class="n">x</span> <span class="p">:</span> <span class="n">int</span><span class="p">,</span> <span class="n">y</span> <span class="p">:</span> <span class="n">int</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="p">;</span></div>
<div class="code-output">val f = fn : int * int -&gt; int</div>
<div class="code-input"><span class="kr">val</span> <span class="kr">inst</span> <span class="nv">f</span> <span class="p">=</span> <span class="kr">fn</span> <span class="n">list</span> <span class="o">=&gt;</span> <span class="n">length</span> <span class="n">list</span><span class="p">;</span></div>
<div class="code-output">val f = fn : 'a list -&gt; int</div>
<div class="code-input"><span class="kr">val</span> <span class="kr">inst</span> <span class="nv">f</span> <span class="p">=</span> <span class="kr">fn</span> <span class="n">SOME</span> <span class="n">x</span> <span class="o">=&gt;</span> <span class="n">x</span> ^ <span class="s2">"!"</span> <span class="p">|</span> <span class="n">NONE</span> <span class="o">=&gt;</span> <span class="s2">":("</span><span class="p">;</span></div>
<div class="code-output">val f = fn : string option -&gt; string</div>
</div>

<p>All instances must be functions, because the overloads are resolved based on
the type of the first argument.</p>

<p>Calls to <code class="language-plaintext highlighter-rouge">f</code> will be resolved based on the types of the arguments:</p>

<!-- morel
(* Call the "int * int -> int" overload. *)
f (7, 8);
> val it = 15 : int
(* Call the "'a list -> int" overload. *)
f ["a", "b", "c"];
> val it = 3 : int
f [1, 2, 3, 4];
> val it = 4 : int
f [];
> val it = 0 : int
(* Call the "string option -> string" overload. *)
f (SOME "happy");
> val it = "happy!" : string
f NONE;
> val it = ":(" : string
-->

<div class="code-block">
<div class="code-input"><span class="c">(*</span><span class="cm"> Call the "int * int -&gt; int" overload. *)</span>
<span class="n">f</span> <span class="p">(</span><span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">);</span></div>
<div class="code-output">val it = 15 : int</div>
<div class="code-input"><span class="c">(*</span><span class="cm"> Call the "'a list -&gt; int" overload. *)</span>
<span class="n">f</span> <span class="p">[</span><span class="s2">"a"</span><span class="p">,</span> <span class="s2">"b"</span><span class="p">,</span> <span class="s2">"c"</span><span class="p">];</span></div>
<div class="code-output">val it = 3 : int</div>
<div class="code-input"><span class="n">f</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">];</span></div>
<div class="code-output">val it = 4 : int</div>
<div class="code-input"><span class="n">f</span> <span class="p">[];</span></div>
<div class="code-output">val it = 0 : int</div>
<div class="code-input"><span class="c">(*</span><span class="cm"> Call the "string option -&gt; string" overload. *)</span>
<span class="n">f</span> <span class="p">(</span><span class="n">SOME</span> <span class="s2">"happy"</span><span class="p">);</span></div>
<div class="code-output">val it = "happy!" : string</div>
<div class="code-input"><span class="n">f</span> <span class="n">NONE</span><span class="p">;</span></div>
<div class="code-output">val it = ":(" : string</div>
</div>

<!-- morel fail
(* No overloads match "int option" or "(int, int, int)" arguments. *)
f (SOME 42);
> 0.0-0.0 Error: Cannot deduce type: no valid overloads
>   raised at: 0.0-0.0
f (1, 2, 3);
> 0.0-0.0 Error: Cannot deduce type: no valid overloads
>   raised at: 0.0-0.0
-->

<div class="code-block">
<div class="code-input"><span class="c">(*</span><span class="cm"> No overloads match "int option" or "(int, int, int)" arguments. *)</span>
<span class="n">f</span> <span class="p">(</span><span class="n">SOME</span> <span class="mi">42</span><span class="p">);</span></div>
<div class="code-error">0.0-0.0 Error: Cannot deduce type: no valid overloads
  raised at: 0.0-0.0</div>
<div class="code-input"><span class="n">f</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">);</span></div>
<div class="code-error">0.0-0.0 Error: Cannot deduce type: no valid overloads
  raised at: 0.0-0.0</div>
</div>

<h2 id="3-sorting-on-expressions">3. Sorting on expressions</h2>

<p>There are only a few places in Morel syntax where you do not use an
expression, and the <code class="language-plaintext highlighter-rouge">order</code> step used to be one of them.  Previously,
<code class="language-plaintext highlighter-rouge">order</code> was followed by a list of “order items”, each an expression
optionally followed by <code class="language-plaintext highlighter-rouge">desc</code>. The items were separated by commas, and
the list could not be empty.</p>

<p>The commas were a problem. In the expression</p>

<!-- morel skip
foo (from i in [1, 2, 3] order i desc, j);
-->

<div class="code-block">
<div class="code-input"><span class="n">foo</span> <span class="p">(</span><span class="kr">from</span> <span class="nv">i</span> <span class="kr">in</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span> <span class="kr">order</span> <span class="n">i</span> <span class="kr">desc</span><span class="p">,</span> <span class="n">j</span><span class="p">);</span></div>
</div>

<p>it is not clear whether <code class="language-plaintext highlighter-rouge">j</code> is a second argument for the call to the
function <code class="language-plaintext highlighter-rouge">foo</code> or the second item in the <code class="language-plaintext highlighter-rouge">order</code> clause.</p>

<p>Another problem was the fact that the <code class="language-plaintext highlighter-rouge">order</code> clause could not be
empty. The
<a href="#1-ordered-and-unordered-collections-and-queries">ordered/unordered collections</a>
feature introduced an <code class="language-plaintext highlighter-rouge">unorder</code> step to convert a <code class="language-plaintext highlighter-rouge">list</code> to a <code class="language-plaintext highlighter-rouge">bag</code>,
and we need the opposite of that, a trivial sort whose
key has the same value for every element.</p>

<p>The answer was to
<a href="https://github.com/hydromatic/morel/issues/244">make the argument to <code class="language-plaintext highlighter-rouge">order</code> an expression</a>.
A composite sort specification is now a tuple, still separated by
commas, but now enclosed in parentheses.  If a sort key is descending,
you now wrap it in the <code class="language-plaintext highlighter-rouge">Descending</code> data type by preceding it with the
<code class="language-plaintext highlighter-rouge">DESC</code> constructor.  Thus:</p>

<!-- morel skip
(* Old syntax *)
from e in scott.emps
  order e.job, e.sal desc;

(* New syntax *)
from e in scott.emps
  order (e.job, DESC e.sal);
-->

<div class="code-block">
<div class="code-input"><span class="c">(*</span><span class="cm"> Old syntax *)</span>
<span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span>
  <span class="kr">order</span> <span class="nn">e</span><span class="p">.</span><span class="n">job</span><span class="p">,</span> <span class="nn">e</span><span class="p">.</span><span class="n">sal</span> <span class="kr">desc</span><span class="p">;</span>

<span class="c">(*</span><span class="cm"> New syntax *)</span>
<span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span>
  <span class="kr">order</span> <span class="p">(</span><span class="nn">e</span><span class="p">.</span><span class="n">job</span><span class="p">,</span> <span class="n">DESC</span> <span class="nn">e</span><span class="p">.</span><span class="n">sal</span><span class="p">);</span></div>
</div>

<p>You can now sort by any data type, including tuples, records,
sum-types such as <code class="language-plaintext highlighter-rouge">Option</code> and <code class="language-plaintext highlighter-rouge">Descending</code>, lists, bags, and any
combination thereof.</p>
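<p>For instance, a sort key can be a tuple of computed expressions. The
following sketch sorts strings by length and then alphabetically; the word
list is made up for illustration, and <code class="language-plaintext highlighter-rouge">String.size</code> is the Basis Library
function.</p>

<div class="code-block">
<div class="code-input"><span class="c">(*</span><span class="cm"> Sort by a (length, value) tuple. *)</span>
<span class="kr">from</span> <span class="nv">s</span> <span class="kr">in</span> <span class="p">[</span><span class="s2">"morel"</span><span class="p">,</span> <span class="s2">"sml"</span><span class="p">,</span> <span class="s2">"datalog"</span><span class="p">]</span>
  <span class="kr">order</span> <span class="p">(</span><span class="nn">String</span><span class="p">.</span><span class="n">size</span> <span class="n">s</span><span class="p">,</span> <span class="n">s</span><span class="p">);</span></div>
</div>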

<p>To achieve the trivial sort, you can sort by any constant value, such
as the integer <code class="language-plaintext highlighter-rouge">0</code> or the <code class="language-plaintext highlighter-rouge">Option</code> constructor <code class="language-plaintext highlighter-rouge">NONE</code>, but
conventionally you would sort by the empty tuple <code class="language-plaintext highlighter-rouge">()</code>:</p>

<!-- morel
from e in scott.emps
  yield e.ename
  order ();
> val it =
>   ["SMITH","ALLEN","WARD","JONES","MARTIN","BLAKE","CLARK","SCOTT","KING",
>    "TURNER","ADAMS","JAMES",...] : string list
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span>
  <span class="kr">yield</span> <span class="nn">e</span><span class="p">.</span><span class="n">ename</span>
  <span class="kr">order</span> <span class="p">();</span></div>
<div class="code-output">val it =
  ["SMITH","ALLEN","WARD","JONES","MARTIN","BLAKE","CLARK","SCOTT","KING",
   "TURNER","ADAMS","JAMES",...] : string list</div>
</div>

<p>The key thing is that the result is a <code class="language-plaintext highlighter-rouge">list</code>.  The elements are in
arbitrary order (because any order is consistent with the empty sort
key) but in converting the collection to a <code class="language-plaintext highlighter-rouge">list</code> the arbitrary order
has become frozen and repeatable.</p>

<h2 id="4-atomic-yield-steps">4. Atomic yield steps</h2>

<p>At any step in a Morel query, there are generally several named fields
you can use to reference parts of the current row.  For example, the
<code class="language-plaintext highlighter-rouge">where</code> step in the following query refers to both fields, <code class="language-plaintext highlighter-rouge">i</code> and
<code class="language-plaintext highlighter-rouge">j</code>.</p>

<!-- morel silent
Sys.set ("output", "tabular");
> val it = () : unit
-->
<!-- morel
from i in [1, 2, 3],
    j in [4, 5, 6]
  where i + j > 7;
> i j
> - -
> 2 6
> 3 5
> 3 6
>
> val it : {i:int, j:int} list
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">i</span> <span class="kr">in</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span>
    <span class="nv">j</span> <span class="kr">in</span> <span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">]</span>
  <span class="kr">where</span> <span class="n">i</span> <span class="o">+</span> <span class="n">j</span> <span class="o">&gt;</span> <span class="mi">7</span><span class="p">;</span></div>
<div class="code-output">i j
- -
2 6
3 5
3 6

val it : {i:int, j:int} list</div>
</div>

<p>But there is one circumstance where a step does not produce any named
fields: a <code class="language-plaintext highlighter-rouge">yield</code> whose expression is not a record, what we call an
“atomic yield”. Here is an example:</p>

<!-- morel skip
from i in [1, 2, 3],
    j in [4, 5, 6]
  yield i + j;
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">i</span> <span class="kr">in</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span>
    <span class="nv">j</span> <span class="kr">in</span> <span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">]</span>
  <span class="kr">yield</span> <span class="n">i</span> <span class="o">+</span> <span class="n">j</span><span class="p">;</span></div>
</div>

<p>That query is valid, but suppose we wished to sort or filter the
results.  If we added an <code class="language-plaintext highlighter-rouge">order</code> or <code class="language-plaintext highlighter-rouge">where</code> step it would have no way
to refer to the current row. We allowed atomic yields because we
needed queries with non-record elements, but we made a rule that the
atomic yield had to be the last step.</p>

<p>That restriction was becoming more of a burden, and the final straw
was ordered/unordered queries, which often end in <code class="language-plaintext highlighter-rouge">order</code> or
<code class="language-plaintext highlighter-rouge">unorder</code>. So we decided to fix the problem.</p>

<p>We
<a href="https://github.com/hydromatic/morel/issues/265">added a new expression, <code class="language-plaintext highlighter-rouge">current</code></a>,
that refers to the current element. (It is only available in query
steps, but you can use it inside a sub-expression or sub-query.)  If
the value is atomic, <code class="language-plaintext highlighter-rouge">current</code> is that value; if there are named
fields, <code class="language-plaintext highlighter-rouge">current</code> is a record consisting of those fields. (In the
previous example, <code class="language-plaintext highlighter-rouge">current</code> would be equivalent to <code class="language-plaintext highlighter-rouge">{i, j}</code>.)</p>

<p>If a <code class="language-plaintext highlighter-rouge">yield</code> is atomic but the expression has a clear name, as in
<code class="language-plaintext highlighter-rouge">yield i</code> or <code class="language-plaintext highlighter-rouge">yield e.deptno</code>, you can also use that name.  (The
expression is still considered atomic, and the result of the query
will be a collection of that type, not a collection of records.)</p>

<p>Here are some examples of <code class="language-plaintext highlighter-rouge">current</code> in action.</p>

<!-- morel
from i in [1, 2, 3],
    j in [4, 5, 6]
  yield i + j
  order DESC current;
> val it = [9,8,8,7,7,7,6,6,5] : int list
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">i</span> <span class="kr">in</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span>
    <span class="nv">j</span> <span class="kr">in</span> <span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">]</span>
  <span class="kr">yield</span> <span class="n">i</span> <span class="o">+</span> <span class="n">j</span>
  <span class="kr">order</span> <span class="n">DESC</span> <span class="kr">current</span><span class="p">;</span></div>
<div class="code-output">val it = [9,8,8,7,7,7,6,6,5] : int list</div>
</div>

<!-- morel
from maker in ["ford", "ferrari"],
    color in ["red", "green"]
  order current.color;
> color maker
> ----- -------
> green ford
> green ferrari
> red   ford
> red   ferrari
>
> val it : {color:string, maker:string} list
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">maker</span> <span class="kr">in</span> <span class="p">[</span><span class="s2">"ford"</span><span class="p">,</span> <span class="s2">"ferrari"</span><span class="p">],</span>
    <span class="nv">color</span> <span class="kr">in</span> <span class="p">[</span><span class="s2">"red"</span><span class="p">,</span> <span class="s2">"green"</span><span class="p">]</span>
  <span class="kr">order</span> <span class="kr">current</span><span class="p">.</span><span class="n">color</span><span class="p">;</span></div>
<div class="code-output">color maker
----- -------
green ford
green ferrari
red   ford
red   ferrari

val it : {color:string, maker:string} list</div>
</div>

<!-- morel
from i in [1, 2, 3, 4]
  yield 4 * (i mod 2) + (i div 2)
  order current;
> val it = [1,2,4,5] : int list
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">i</span> <span class="kr">in</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">]</span>
  <span class="kr">yield</span> <span class="mi">4</span> <span class="o">*</span> <span class="p">(</span><span class="n">i</span> <span class="kr">mod</span> <span class="mi">2</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="n">i</span> <span class="kr">div</span> <span class="mi">2</span><span class="p">)</span>
  <span class="kr">order</span> <span class="kr">current</span><span class="p">;</span></div>
<div class="code-output">val it = [1,2,4,5] : int list</div>
</div>

<!-- morel
from e in scott.emps
  yield e.deptno
  distinct
  order current;
> val it = [10,20,30] : int list
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span>
  <span class="kr">yield</span> <span class="nn">e</span><span class="p">.</span><span class="n">deptno</span>
  <span class="kr">distinct</span>
  <span class="kr">order</span> <span class="kr">current</span><span class="p">;</span></div>
<div class="code-output">val it = [10,20,30] : int list</div>
</div>

<!-- morel
from e in scott.emps
  yield e.deptno
  distinct
  order deptno;
> val it = [10,20,30] : int list
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span>
  <span class="kr">yield</span> <span class="nn">e</span><span class="p">.</span><span class="n">deptno</span>
  <span class="kr">distinct</span>
  <span class="kr">order</span> <span class="n">deptno</span><span class="p">;</span></div>
<div class="code-output">val it = [10,20,30] : int list</div>
</div>

<h2 id="5-set-operators-in-pipelines">5. Set operators in pipelines</h2>

<p>The set operators (<code class="language-plaintext highlighter-rouge">union</code>, <code class="language-plaintext highlighter-rouge">intersect</code> and <code class="language-plaintext highlighter-rouge">except</code>) were previously
available via functions but now have
<a href="https://github.com/hydromatic/morel/issues/253">dedicated steps</a> in
the query pipeline.</p>

<p>The steps have slightly different semantics for ordered and unordered
collections, and have an optional <code class="language-plaintext highlighter-rouge">distinct</code> keyword to eliminate
duplicates.</p>

<p>For example, here is a query that finds all employees in departments
10 and 20, but excludes those who are managers or clerks:</p>

<!-- morel skip
from e in scott.emps
  where e.deptno = 10
  union (from e in scott.emps where e.deptno = 20)
  except (from e in scott.emps where e.job = "MANAGER"),
     (from e in scott.emps where e.job = "CLERK");
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span>
  <span class="kr">where</span> <span class="nn">e</span><span class="p">.</span><span class="n">deptno</span> <span class="p">=</span> <span class="mi">10</span>
  <span class="kr">union</span> <span class="p">(</span><span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span> <span class="kr">where</span> <span class="nn">e</span><span class="p">.</span><span class="n">deptno</span> <span class="p">=</span> <span class="mi">20</span><span class="p">)</span>
  <span class="kr">except</span> <span class="p">(</span><span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span> <span class="kr">where</span> <span class="nn">e</span><span class="p">.</span><span class="n">job</span> <span class="p">=</span> <span class="s2">"MANAGER"</span><span class="p">),</span>
     <span class="p">(</span><span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span> <span class="kr">where</span> <span class="nn">e</span><span class="p">.</span><span class="n">job</span> <span class="p">=</span> <span class="s2">"CLERK"</span><span class="p">);</span></div>
</div>
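<p>The optional <code class="language-plaintext highlighter-rouge">distinct</code> keyword can be sketched as follows. This assumes
that the keyword directly follows the step name, which this post does not
spell out, so check the query reference before relying on it. The integer
lists are illustrative.</p>

<div class="code-block">
<div class="code-input"><span class="c">(*</span><span class="cm"> Union of two lists, with duplicates eliminated (assumed syntax). *)</span>
<span class="kr">from</span> <span class="nv">i</span> <span class="kr">in</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
  <span class="kr">union</span> <span class="kr">distinct</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">];</span></div>
</div>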

<p>If you have ever wondered about the semantics of <code class="language-plaintext highlighter-rouge">intersect</code> and
<code class="language-plaintext highlighter-rouge">except</code> with duplicates, wonder no more!
<a href="/2025/06/03/intersect-fractions.html">INTERSECT ALL, EXCEPT ALL, and the arithmetic of fractions</a>
explains everything using a fun example.</p>

<h2 id="6-string-and-char-structures">6. String and Char structures</h2>

<p>Morel now includes complete
<a href="https://github.com/hydromatic/morel/issues/279"><code class="language-plaintext highlighter-rouge">String</code></a> and
<a href="https://github.com/hydromatic/morel/issues/264"><code class="language-plaintext highlighter-rouge">Char</code></a> structures
following the
<a href="https://smlfamily.github.io/Basis/">Standard ML Basis Library</a>
specification.</p>

<p>This gives you comprehensive text manipulation capabilities:</p>

<!-- morel
String.size "hello world";
> val it = 11 : int

String.substring ("hello world", 6, 5);
> val it = "world" : string

String.tokens (fn c => c = #" ") "hello world morel";
> val it = ["hello","world","morel"] : string list

Char.isAlpha #"a";
> val it = true : bool

Char.toUpper #"a";
> val it = #"A" : char

String.map Char.toUpper "hello";
> val it = "HELLO" : string
-->

<div class="code-block">
<div class="code-input"><span class="nn">String</span><span class="p">.</span><span class="n">size</span> <span class="s2">"hello world"</span><span class="p">;</span></div>
<div class="code-output">val it = 11 : int</div>
<div class="code-input">
<span class="nn">String</span><span class="p">.</span><span class="n">substring</span> <span class="p">(</span><span class="s2">"hello world"</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">5</span><span class="p">);</span></div>
<div class="code-output">val it = "world" : string</div>
<div class="code-input">
<span class="nn">String</span><span class="p">.</span><span class="n">tokens</span> <span class="p">(</span><span class="kr">fn</span> <span class="n">c</span> <span class="o">=&gt;</span> <span class="n">c</span> <span class="p">=</span> #<span class="s2">" "</span><span class="p">)</span> <span class="s2">"hello world morel"</span><span class="p">;</span></div>
<div class="code-output">val it = ["hello","world","morel"] : string list</div>
<div class="code-input">
<span class="nn">Char</span><span class="p">.</span><span class="n">isAlpha</span> #<span class="s2">"a"</span><span class="p">;</span></div>
<div class="code-output">val it = true : bool</div>
<div class="code-input">
<span class="nn">Char</span><span class="p">.</span><span class="n">toUpper</span> #<span class="s2">"a"</span><span class="p">;</span></div>
<div class="code-output">val it = #"A" : char</div>
<div class="code-input">
<span class="nn">String</span><span class="p">.</span><span class="n">map</span> <span class="nn">Char</span><span class="p">.</span><span class="n">toUpper</span> <span class="s2">"hello"</span><span class="p">;</span></div>
<div class="code-output">val it = "HELLO" : string</div>
</div>

<p>These structures cover the common text-processing tasks, from
basic operations like substring extraction to tokenization and
character classification.</p>

<h2 id="7-breaking-changes">7. Breaking changes</h2>

<p>This release includes some breaking changes to be aware of.</p>

<h3 id="database-schema-updates">Database schema updates</h3>

<p>The <code class="language-plaintext highlighter-rouge">scott</code> sample database now uses
<a href="https://github.com/hydromatic/morel/issues/255">pluralized table names</a>:
the <code class="language-plaintext highlighter-rouge">emps</code> value maps to the <code class="language-plaintext highlighter-rouge">EMP</code> table, and <code class="language-plaintext highlighter-rouge">depts</code> to the
<code class="language-plaintext highlighter-rouge">DEPT</code> table.</p>

<!-- morel skip
(* Old *)
from e in scott.emp
  join d in scott.dept on e.deptno = d.deptno;

(* New *)
from e in scott.emps
  join d in scott.depts on e.deptno = d.deptno;
-->

<div class="code-block">
<div class="code-input"><span class="c">(*</span><span class="cm"> Old *)</span>
<span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emp</span>
  <span class="kr">join</span> <span class="nv">d</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">dept</span> <span class="kr">on</span> <span class="nn">e</span><span class="p">.</span><span class="n">deptno</span> <span class="p">=</span> <span class="nn">d</span><span class="p">.</span><span class="n">deptno</span><span class="p">;</span>

<span class="c">(*</span><span class="cm"> New *)</span>
<span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span>
  <span class="kr">join</span> <span class="nv">d</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">depts</span> <span class="kr">on</span> <span class="nn">e</span><span class="p">.</span><span class="n">deptno</span> <span class="p">=</span> <span class="nn">d</span><span class="p">.</span><span class="n">deptno</span><span class="p">;</span></div>
</div>

<p>This change aligns with the modern programming convention that
collections have plural names.</p>

<h3 id="type-based-orderings">Type-based orderings</h3>

<p>The previous <code class="language-plaintext highlighter-rouge">order</code> syntax no longer works.</p>

<p>You should replace a trailing <code class="language-plaintext highlighter-rouge">desc</code> with a leading <code class="language-plaintext highlighter-rouge">DESC</code>:</p>

<!-- morel skip
(* Old syntax *)
from e in scott.emps
  order e.sal desc;

(* New syntax *)
from e in scott.emps
  order DESC e.sal;
-->

<div class="code-block">
<div class="code-input"><span class="c">(*</span><span class="cm"> Old syntax *)</span>
<span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span>
  <span class="kr">order</span> <span class="nn">e</span><span class="p">.</span><span class="n">sal</span> <span class="kr">desc</span><span class="p">;</span>

<span class="c">(*</span><span class="cm"> New syntax *)</span>
<span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span>
  <span class="kr">order</span> <span class="n">DESC</span> <span class="nn">e</span><span class="p">.</span><span class="n">sal</span><span class="p">;</span></div>
</div>

<p>and put parentheses around composite orderings:</p>

<!-- morel skip
(* Old syntax *)
from e in scott.emps
  order e.job, e.sal desc;

(* New syntax *)
from e in scott.emps
  order (e.job, DESC e.sal);
-->

<div class="code-block">
<div class="code-input"><span class="c">(*</span><span class="cm"> Old syntax *)</span>
<span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span>
  <span class="kr">order</span> <span class="nn">e</span><span class="p">.</span><span class="n">job</span><span class="p">,</span> <span class="nn">e</span><span class="p">.</span><span class="n">sal</span> <span class="kr">desc</span><span class="p">;</span>

<span class="c">(*</span><span class="cm"> New syntax *)</span>
<span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span>
  <span class="kr">order</span> <span class="p">(</span><span class="nn">e</span><span class="p">.</span><span class="n">job</span><span class="p">,</span> <span class="n">DESC</span> <span class="nn">e</span><span class="p">.</span><span class="n">sal</span><span class="p">);</span></div>
</div>

<h2 id="conclusion">Conclusion</h2>

<p>Release 0.7.0 represents a major evolution in Morel’s
capabilities. Extensions to the query language, type system, and
standard library make Morel a good solution for a wide range of
data processing tasks, from simple queries to complex data
transformations.</p>

<p>As always, you can get started with Morel by visiting
<a href="https://github.com/hydromatic/morel">GitHub</a>.
For more background, read about its
<a href="/2020/02/25/morel-a-functional-language-for-data.html">goals</a>
and <a href="/2020/03/03/morel-basics.html">basic language</a>,
and find a full definition of the language in the
<a href="https://github.com/hydromatic/morel/blob/main/docs/query.md">query reference</a>
and the
<a href="https://github.com/hydromatic/morel/blob/main/docs/reference.md">language reference</a>.</p>

<p>If you have comments, please reply on
<a href="https://bsky.app/profile/julianhyde.bsky.social">Bluesky @julianhyde.bsky.social</a>
or Twitter:</p>

<div data_dnt="true">
<div class="jekyll-twitter-plugin"><blockquote class="twitter-tweet" data-cards="hidden"><p lang="en" dir="ltr">I&#39;m pleased to announce release 0.7 of <a href="https://twitter.com/morel_lang?ref_src=twsrc%5Etfw">@morel_lang</a>! This is a huge release, adding support for ordered/unordered data, set operators, and revised order syntax. A major rework of Morel&#39;s type inference algorithm delivered function overloading. <a href="https://t.co/hERffT3Kxn">https://t.co/hERffT3Kxn</a></p>&mdash; Julian Hyde (@julianhyde) <a href="https://twitter.com/julianhyde/status/1931931352729079968?ref_src=twsrc%5Etfw">June 9, 2025</a></blockquote>

</div>
</div>

<p>This article
<a href="https://github.com/julianhyde/share/commits/main/blog/_posts/2025-06-08-morel-release-0-7-0.md">has been updated</a>.</p>]]></content><author><name>Julian Hyde</name></author><summary type="html"><![CDATA[I am pleased to announce Morel release 0.7.0, just one month after release 0.6.0.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://blog.hydromatic.net/assets/img/OldDesignShop_MushroomSpringMorel-240x240.jpg" /><media:content medium="image" url="http://blog.hydromatic.net/assets/img/OldDesignShop_MushroomSpringMorel-240x240.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Ordered and unordered data</title><link href="http://blog.hydromatic.net/2025/06/06/ordered-unordered.html" rel="alternate" type="text/html" title="Ordered and unordered data" /><published>2025-06-06T13:00:00-07:00</published><updated>2025-06-06T13:00:00-07:00</updated><id>http://blog.hydromatic.net/2025/06/06/ordered-unordered</id><content type="html" xml:base="http://blog.hydromatic.net/2025/06/06/ordered-unordered.html"><![CDATA[<p>Despite what the relational model says, some data is <em>ordered</em>.</p>

<p>I’m not talking about <em>sorted</em> data. If you sort a collection,
applying a comparator function to its elements, then you have no
more information than you had before.</p>

<p>No, the integer list</p>

<!-- morel skip
[3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
> val it = [3,1,4,1,5,9,2,6,5,3] : int list
-->

<div class="code-block">
<div class="code-input"><span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">9</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span></div>
<div class="code-output">val it = [3,1,4,1,5,9,2,6,5,3] : int list</div>
</div>

<p>and the string list</p>

<!-- morel skip
["Shall I compare thee to a summer's day?",
  "Thou art more lovely and more temperate",
  "Rough winds do shake the darling buds of May",
  "And summer's lease hath all too short a date"]
> val it =
>   ["Shall I compare thee to a summer's day?",
>    "Thou art more lovely and more temperate",
>    "Rough winds do shake the darling buds of May",
>    "And summer's lease hath all too short a date"] : string list
-->

<div class="code-block">
<div class="code-input"><span class="p">[</span><span class="s2">"Shall I compare thee to a summer's day?"</span><span class="p">,</span>
  <span class="s2">"Thou art more lovely and more temperate"</span><span class="p">,</span>
  <span class="s2">"Rough winds do shake the darling buds of May"</span><span class="p">,</span>
  <span class="s2">"And summer's lease hath all too short a date"</span><span class="p">]</span></div>
<div class="code-output">val it =
  ["Shall I compare thee to a summer's day?",
   "Thou art more lovely and more temperate",
   "Rough winds do shake the darling buds of May",
   "And summer's lease hath all too short a date"] : string list</div>
</div>

<p>depend on the order of their elements for their meaning.</p>

<p>But of course, some data is <em>unordered</em>, for good reason. A relational
database would be foolish to guarantee that if you write rows into a
table in a particular order, they will be read back in the same
order. Such a guarantee would seriously limit the database’s
scalability.</p>

<p>This post is about how we allow ordered and unordered data to coexist
in <a href="https://github.com/hydromatic/morel">Morel</a>.</p>

<p>We achieved this with a collection of new features, including
<a href="https://github.com/hydromatic/morel/issues/235">adding a <code class="language-plaintext highlighter-rouge">bag</code> type</a>,
the
<a href="https://github.com/hydromatic/morel/issues/273">ordered relational operators</a>,
the
<a href="https://github.com/hydromatic/morel/issues/276"><code class="language-plaintext highlighter-rouge">ordinal</code> keyword</a>,
and a new
<a href="https://github.com/hydromatic/morel/issues/277"><code class="language-plaintext highlighter-rouge">unorder</code> step</a>.
All of these features will appear shortly in Morel release 0.7.</p>

<h2 id="list-and-bag-types">List and bag types</h2>

<p>As a functional query language, Morel spans the worlds of database and
functional programming.</p>

<p>Databases’ fundamental type, the relation, is an unordered collection
of records.  (Though curiously, modern SQL allows columns to contain
“nested tables”, which can be either of the ordered <code class="language-plaintext highlighter-rouge">ARRAY</code> type or
the unordered <code class="language-plaintext highlighter-rouge">MULTISET</code> type.)</p>

<p>Functional programming languages’ fundamental type is the list, an
ordered type. Functional programs are often defined by structural
induction on lists.  For example, the function</p>

<!-- morel
fun allPositive [] = true
  | allPositive (x::xs) = x > 0 andalso allPositive xs;
> val allPositive = fn : int list -> bool
-->

<div class="code-block">
<div class="code-input"><span class="kr">fun</span> <span class="nf">allPositive</span> <span class="p">[]</span> <span class="p">=</span> <span class="n">true</span>
  <span class="p">|</span> <span class="n">allPositive</span> <span class="p">(</span><span class="n">x</span><span class="o">::</span><span class="n">xs</span><span class="p">)</span> <span class="p">=</span> <span class="n">x</span> <span class="o">&gt;</span> <span class="mi">0</span> <span class="kr">andalso</span> <span class="n">allPositive</span> <span class="n">xs</span><span class="p">;</span></div>
<div class="code-output">val allPositive = fn : int list -&gt; bool</div>
</div>

<p>inductively defines that a list of numbers is “all-positive” if it is
empty, or if its first element is positive and the rest of the list is
“all-positive”. This kind of inductive definition requires a firm
distinction between the first element of a list and the rest of the
list, a distinction that is not present in an unordered collection.</p>
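
<p>As a quick check of the definition, here is a sketch of two calls
(the argument lists are our own illustration). The first call recurses
all the way to the empty list; the second stops at the negative
element.</p>

<div class="code-block">
<div class="code-input">allPositive [3, 1, 4];</div>
<div class="code-output">val it = true : bool</div>
<div class="code-input">allPositive [3, ~1, 4];</div>
<div class="code-output">val it = false : bool</div>
</div>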

<p>So, Morel needs to support both ordered and unordered collections.</p>

<p>Earlier versions of Morel papered over the difference. All collections
had type <code class="language-plaintext highlighter-rouge">list</code>, even the unordered collections backed by database
tables. Morel’s relational operators produced results in deterministic
order if you applied them to in-memory collections using the
in-process interpreter, but order was not guaranteed when Morel
converted the query to SQL for execution in a DBMS.</p>

<p>To fix the problem, the first step was to add a <code class="language-plaintext highlighter-rouge">bag</code> type.  (Bag is a
synonym for <a href="https://en.wikipedia.org/wiki/Multiset">multiset</a>:
a given element may occur more than once, and iteration order
is not defined.) <code class="language-plaintext highlighter-rouge">bag</code> is the unordered counterpart to the ordered
<code class="language-plaintext highlighter-rouge">list</code> type, and has similar operations.</p>

<!-- morel
val b = bag [3, 1, 4, 1, 5];
> val b = [3,1,4,1,5] : int bag
Bag.length b;
> val it = 5 : int
Bag.toList b;
> val it = [3,1,4,1,5] : int list
Bag.fromList [3, 1, 4, 1, 5];
> val it = [3,1,4,1,5] : int bag
-->

<div class="code-block">
<div class="code-input"><span class="kr">val</span> <span class="nv">b</span> <span class="p">=</span> <span class="n">bag</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">];</span></div>
<div class="code-output">val b = [3,1,4,1,5] : int bag</div>
<div class="code-input"><span class="nn">Bag</span><span class="p">.</span><span class="n">length</span> <span class="n">b</span><span class="p">;</span></div>
<div class="code-output">val it = 5 : int</div>
<div class="code-input"><span class="nn">Bag</span><span class="p">.</span><span class="n">toList</span> <span class="n">b</span><span class="p">;</span></div>
<div class="code-output">val it = [3,1,4,1,5] : int list</div>
<div class="code-input"><span class="nn">Bag</span><span class="p">.</span><span class="n">fromList</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">];</span></div>
<div class="code-output">val it = [3,1,4,1,5] : int bag</div>
</div>

<p>Order-dependent operations from the <code class="language-plaintext highlighter-rouge">list</code> type, such as <code class="language-plaintext highlighter-rouge">hd</code> and
<code class="language-plaintext highlighter-rouge">drop</code>, are defined for <code class="language-plaintext highlighter-rouge">bag</code> instances, but they are not guaranteed
to return the same result every time you call them.</p>

<!-- morel
Bag.hd b;
> val it = 3 : int
Bag.drop (b, 2);
> val it = [4,1,5] : int bag
-->

<div class="code-block">
<div class="code-input"><span class="nn">Bag</span><span class="p">.</span><span class="n">hd</span> <span class="n">b</span><span class="p">;</span></div>
<div class="code-output">val it = 3 : int</div>
<div class="code-input"><span class="nn">Bag</span><span class="p">.</span><span class="n">drop</span> <span class="p">(</span><span class="n">b</span><span class="p">,</span> <span class="mi">2</span><span class="p">);</span></div>
<div class="code-output">val it = [4,1,5] : int bag</div>
</div>

<p>Collections backed by database tables now have type <code class="language-plaintext highlighter-rouge">bag</code>:</p>

<!-- morel skip
from e in scott.depts;
> deptno dname      loc
> ------ ---------- --------
> 10     ACCOUNTING NEW YORK
> 20     RESEARCH   DALLAS
> 30     SALES      CHICAGO
> 40     OPERATIONS BOSTON
>
> val it : {deptno:int, dname:string, loc:string} bag
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">depts</span><span class="p">;</span></div>
<div class="code-output">deptno dname      loc
------ ---------- --------
10     ACCOUNTING NEW YORK
20     RESEARCH   DALLAS
30     SALES      CHICAGO
40     OPERATIONS BOSTON

val it : {deptno:int, dname:string, loc:string} bag</div>
</div>

<p>(You may notice that the <code class="language-plaintext highlighter-rouge">scott.depts</code> collection, backed by the <code class="language-plaintext highlighter-rouge">DEPT</code>
table of the <code class="language-plaintext highlighter-rouge">SCOTT</code> JDBC data source, has changed its name as well
as its type. It used to be called <code class="language-plaintext highlighter-rouge">scott.dept</code>. Morel collection names
should be plural and lower-case, and improvements to the
<a href="https://github.com/hydromatic/morel/issues/255">name mapping system</a>
make it easier to derive proper collection names.)</p>

<p>Next, we provide relational operators to convert between <code class="language-plaintext highlighter-rouge">list</code> and
<code class="language-plaintext highlighter-rouge">bag</code>.</p>

<h2 id="converting-between-ordered-and-unordered">Converting between ordered and unordered</h2>

<p>Now that queries can reference <code class="language-plaintext highlighter-rouge">list</code> and <code class="language-plaintext highlighter-rouge">bag</code> collections, we need
operators to convert from one to the other. To do this, we use the
existing <code class="language-plaintext highlighter-rouge">order</code> step, and add an <code class="language-plaintext highlighter-rouge">unorder</code> step and an <code class="language-plaintext highlighter-rouge">ordinal</code>
expression.</p>

<p>In previous versions of Morel, the <code class="language-plaintext highlighter-rouge">order</code> step converted a list to a
<code class="language-plaintext highlighter-rouge">list</code> with a different ordering; now its input can be a list <em>or</em> a
bag:</p>

<!-- morel
from i in [3, 1, 4, 1, 5]
  order DESC i;
> val it = [5,4,3,1,1] : int list
from i in bag [3, 1, 4, 1, 5]
  order DESC i;
> val it = [5,4,3,1,1] : int list
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">i</span> <span class="kr">in</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">]</span>
  <span class="kr">order</span> <span class="n">DESC</span> <span class="n">i</span><span class="p">;</span></div>
<div class="code-output">val it = [5,4,3,1,1] : int list</div>
<div class="code-input"><span class="kr">from</span> <span class="nv">i</span> <span class="kr">in</span> <span class="n">bag</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">]</span>
  <span class="kr">order</span> <span class="n">DESC</span> <span class="n">i</span><span class="p">;</span></div>
<div class="code-output">val it = [5,4,3,1,1] : int list</div>
</div>

<p>If the sort key does not create a total ordering, the results will be
nondeterministic but still a list. For example, we can sort integers
so that even numbers occur before odd numbers</p>

<!-- morel skip
from i in bag [3, 1, 4, 1, 5]
  order i mod 2;
> val it = [4,1,5,1,3] : int list
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">i</span> <span class="kr">in</span> <span class="n">bag</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">]</span>
  <span class="kr">order</span> <span class="n">i</span> <span class="kr">mod</span> <span class="mi">2</span><span class="p">;</span></div>
<div class="code-output">val it = [4,1,5,1,3] : int list</div>
</div>

<p>or convert a bag to a list in arbitrary order.</p>

<!-- morel skip
from i in bag [3, 1, 4, 1, 5]
  order ();
> val it = [5,4,1,1,3] : int list
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">i</span> <span class="kr">in</span> <span class="n">bag</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">]</span>
  <span class="kr">order</span> <span class="p">();</span></div>
<div class="code-output">val it = [5,4,1,1,3] : int list</div>
</div>

<p>To go the opposite direction, the new <code class="language-plaintext highlighter-rouge">unorder</code> step converts a list
to a bag:</p>

<!-- morel
from i in [3, 1, 4, 1, 5]
  unorder;
> val it = [3,1,4,1,5] : int bag
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">i</span> <span class="kr">in</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">]</span>
  <span class="kr">unorder</span><span class="p">;</span></div>
<div class="code-output">val it = [3,1,4,1,5] : int bag</div>
</div>

<p>(You are also free to apply <code class="language-plaintext highlighter-rouge">unorder</code> to a <code class="language-plaintext highlighter-rouge">bag</code>; it will have no
effect.)</p>

<p>As we said above, a <code class="language-plaintext highlighter-rouge">bag</code> contains less information than its
corresponding <code class="language-plaintext highlighter-rouge">list</code>. If you plan to convert the <code class="language-plaintext highlighter-rouge">bag</code> to a <code class="language-plaintext highlighter-rouge">list</code>
at a later stage, you need to store the ordering in an extra field.
The new <code class="language-plaintext highlighter-rouge">ordinal</code> expression lets us do this:</p>

<!-- morel
from i in [3, 1, 4, 1, 5]
  yield {i, j = ordinal}
  unorder
  order j
  yield i;
> val it = [3,1,4,1,5] : int list
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">i</span> <span class="kr">in</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">]</span>
  <span class="kr">yield</span> <span class="p">{</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span> <span class="p">=</span> <span class="kr">ordinal</span><span class="p">}</span>
  <span class="kr">unorder</span>
  <span class="kr">order</span> <span class="n">j</span>
  <span class="kr">yield</span> <span class="n">i</span><span class="p">;</span></div>
<div class="code-output">val it = [3,1,4,1,5] : int list</div>
</div>

<p>You can use <code class="language-plaintext highlighter-rouge">ordinal</code> in any expression of a
step whose input is ordered (except in the steps whose expressions are
evaluated before the query starts: <code class="language-plaintext highlighter-rouge">except</code>, <code class="language-plaintext highlighter-rouge">intersect</code>, <code class="language-plaintext highlighter-rouge">skip</code>,
<code class="language-plaintext highlighter-rouge">take</code>, and <code class="language-plaintext highlighter-rouge">union</code>). <code class="language-plaintext highlighter-rouge">ordinal</code> evaluates to 0 for the first element,
1 for the next element, and so on. But as we shall see, the optimizer
avoids evaluating <code class="language-plaintext highlighter-rouge">ordinal</code> if it can.</p>
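
<p>To see the values that <code class="language-plaintext highlighter-rouge">ordinal</code> takes, a minimal sketch is
to yield it alongside each element of the list used earlier. Going by
the description above, we would expect <code class="language-plaintext highlighter-rouge">j</code> to run from 0 to 4,
following list order; we have not shown the output here.</p>

<div class="code-block">
<div class="code-input">from i in [3, 1, 4, 1, 5]
  yield {i, j = ordinal};</div>
</div>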

<p>Here is a query that computes the salary rank of each employee,
then returns only the poorly-paid clerks.</p>

<!-- morel skip
from e in scott.emps
  order DESC e.sal
  yield {e, rank = 1 + ordinal}
  where e.job = "CLERK"
  yield {e.ename, rank};
> ename  rank
> ------ ----
> MILLER 9
> ADAMS  12
> JAMES  13
> SMITH  14
>
> val it : {ename:string, rank:int} list
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span>
  <span class="kr">order</span> <span class="n">DESC</span> <span class="nn">e</span><span class="p">.</span><span class="n">sal</span>
  <span class="kr">yield</span> <span class="p">{</span><span class="n">e</span><span class="p">,</span> <span class="n">rank</span> <span class="p">=</span> <span class="mi">1</span> <span class="o">+</span> <span class="kr">ordinal</span><span class="p">}</span>
  <span class="kr">where</span> <span class="nn">e</span><span class="p">.</span><span class="n">job</span> <span class="p">=</span> <span class="s2">"CLERK"</span>
  <span class="kr">yield</span> <span class="p">{</span><span class="nn">e</span><span class="p">.</span><span class="n">ename</span><span class="p">,</span> <span class="n">rank</span><span class="p">};</span></div>
<div class="code-output">ename  rank
------ ----
MILLER 9
ADAMS  12
JAMES  13
SMITH  14

val it : {ename:string, rank:int} list</div>
</div>

<p>The main reason to apply <code class="language-plaintext highlighter-rouge">order</code> and <code class="language-plaintext highlighter-rouge">unorder</code> in a query is
to control the target collection type. But there is a more subtle
reason which relates to performance. The ordered and unordered
versions of the relational operators may produce the same results
(modulo ordering) but ordered execution may be less efficient (such
as running with reduced parallelism). If a query contains an <code class="language-plaintext highlighter-rouge">order</code>
or <code class="language-plaintext highlighter-rouge">unorder</code>, the order of the input to that step is irrelevant, and
the optimizer can use a more efficient execution plan.</p>
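
<p>As an illustration (a sketch of the idea, not a statement about the
plan the optimizer actually chooses), a trailing <code class="language-plaintext highlighter-rouge">unorder</code> declares
that list order does not matter to the consumer, which leaves the
<code class="language-plaintext highlighter-rouge">where</code> and <code class="language-plaintext highlighter-rouge">yield</code> steps free to run unordered. The result is
an <code class="language-plaintext highlighter-rouge">int bag</code> holding 6, 8 and 10 in no particular order.</p>

<div class="code-block">
<div class="code-input">from i in [3, 1, 4, 1, 5]
  where i &gt; 1
  yield i * 2
  unorder;</div>
</div>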

<p>This, by the way, is why the specification of the <code class="language-plaintext highlighter-rouge">order</code> step does
not guarantee stability. If <code class="language-plaintext highlighter-rouge">order</code> were stable, the optimizer would
have to use ordered execution of upstream steps whenever the sort key
does not define a total ordering.</p>

<p>If you want <code class="language-plaintext highlighter-rouge">order</code> to be stable, you can append <code class="language-plaintext highlighter-rouge">ordinal</code> as the
last component of the sort key:</p>

<!-- morel skip
from e in scott.emps
  order DESC e.sal
  where e.deptno <> 20
  yield {e.ename, e.job, e.sal}
  order (e.job, ordinal);
> val it =
>   [{ename="MILLER",job="CLERK",sal=1300.0},
>    {ename="JAMES",job="CLERK",sal=950.0},
>    {ename="BLAKE",job="MANAGER",sal=2850.0},
>    {ename="CLARK",job="MANAGER",sal=2450.0},
>    {ename="KING",job="PRESIDENT",sal=5000.0},
>    {ename="ALLEN",job="SALESMAN",sal=1600.0},
>    {ename="TURNER",job="SALESMAN",sal=1500.0},
>    {ename="WARD",job="SALESMAN",sal=1250.0},
>    {ename="MARTIN",job="SALESMAN",sal=1250.0}]
>   : {ename:string, job:string, sal:real} list
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span>
  <span class="kr">order</span> <span class="n">DESC</span> <span class="nn">e</span><span class="p">.</span><span class="n">sal</span>
  <span class="kr">where</span> <span class="nn">e</span><span class="p">.</span><span class="n">deptno</span> <span class="o">&lt;</span><span class="o">&gt;</span> <span class="mi">20</span>
  <span class="kr">yield</span> <span class="p">{</span><span class="nn">e</span><span class="p">.</span><span class="n">ename</span><span class="p">,</span> <span class="nn">e</span><span class="p">.</span><span class="n">job</span><span class="p">,</span> <span class="nn">e</span><span class="p">.</span><span class="n">sal</span><span class="p">}</span>
  <span class="kr">order</span> <span class="p">(</span><span class="nn">e</span><span class="p">.</span><span class="n">job</span><span class="p">,</span> <span class="kr">ordinal</span><span class="p">);</span></div>
<div class="code-output">val it =
  [{ename="MILLER",job="CLERK",sal=1300.0},
   {ename="JAMES",job="CLERK",sal=950.0},
   {ename="BLAKE",job="MANAGER",sal=2850.0},
   {ename="CLARK",job="MANAGER",sal=2450.0},
   {ename="KING",job="PRESIDENT",sal=5000.0},
   {ename="ALLEN",job="SALESMAN",sal=1600.0},
   {ename="TURNER",job="SALESMAN",sal=1500.0},
   {ename="WARD",job="SALESMAN",sal=1250.0},
   {ename="MARTIN",job="SALESMAN",sal=1250.0}]
  : {ename:string, job:string, sal:real} list</div>
</div>

<p>Materializing <code class="language-plaintext highlighter-rouge">ordinal</code> as a 0-based, contiguous sequence of integers
is expensive because it forces sequential execution, and the
optimizer will avoid this if it can. In this case, because <code class="language-plaintext highlighter-rouge">ordinal</code>
is used for sorting but is not returned, the optimizer downgrades
<code class="language-plaintext highlighter-rouge">ordinal</code> to a virtual expression. The plan might use an ordered
implementation of the <code class="language-plaintext highlighter-rouge">where</code> and <code class="language-plaintext highlighter-rouge">yield</code> steps followed by a stable
sort, or it might replace <code class="language-plaintext highlighter-rouge">ordinal</code> with the previous sort key
(<code class="language-plaintext highlighter-rouge">DESC e.sal</code>).</p>
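
<p>The second option amounts to something like the following
hand-written query. This is our own sketch of an equivalent query, not
output from the optimizer: it drops the first sort and folds
<code class="language-plaintext highlighter-rouge">DESC e.sal</code> into a composite key, placed ahead of the <code class="language-plaintext highlighter-rouge">yield</code>.
Employees with the same job and salary (such as WARD and MARTIN) may
come out in either order.</p>

<div class="code-block">
<div class="code-input">from e in scott.emps
  where e.deptno &lt;&gt; 20
  order (e.job, DESC e.sal)
  yield {e.ename, e.job, e.sal};</div>
</div>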

<h2 id="ordered-relational-operators">Ordered relational operators</h2>

<p>We need to define the semantics of the relational operators
over all types of collection.</p>

<p>Part of the job has been done already:</p>
<ul>
  <li>The relational model defines the semantics of operators over
<strong>sets</strong> (unordered collections without duplicates).</li>
  <li>The SQL standard specifies the relational operators
over <strong>tables</strong> (unordered collections with duplicates).</li>
  <li>Previous versions of Morel defined semantics for (and implemented)
relational operators over <strong>multisets</strong> (unordered collections with
duplicates).  While the collection type was at the time called
<code class="language-plaintext highlighter-rouge">list</code>, we were actually defining the current <code class="language-plaintext highlighter-rouge">bag</code> type.  Unlike
SQL, elements need not be records.</li>
</ul>

<p>What remains is to define the semantics of queries over <strong>lists</strong>
(ordered collections with duplicates), and of hybrid queries that
combine lists and multisets. (We define hybrid semantics in the <a href="#hybrid-relational-operators">next
section</a>.)</p>

<p>Because a
<a href="https://github.com/hydromatic/morel/blob/main/docs/query.md">query</a>
consists of a sequence of steps, each corresponding to a relational
operator, we define the semantics of each step over input that is a
<code class="language-plaintext highlighter-rouge">list</code>:</p>

<ul>
  <li>The first step in a query – <code>from <i>pat</i> in
<i>exp</i></code>, <code>forall <i>pat</i> in <i>exp</i></code>, or
<code>exists <i>pat</i> in <i>exp</i></code> – returns elements in
the same order that they are emitted from <i>exp</i>.</li>
  <li><code>join <i>pat</i> in <i>exp</i> [ on <i>condition</i> ]</code>
evaluates <i>exp</i> for each element of its input, in input order;
for each element of <i>exp</i>, in order, it emits a record consisting
of the fields of the two elements, skipping records for which
<i>condition</i> is false.</li>
  <li>If a <code class="language-plaintext highlighter-rouge">from</code>, <code class="language-plaintext highlighter-rouge">forall</code>, <code class="language-plaintext highlighter-rouge">exists</code> or <code class="language-plaintext highlighter-rouge">join</code> step has more than one
scan, each subsequent scan behaves as if it were a separate <code class="language-plaintext highlighter-rouge">join</code>
step.</li>
  <li><code>yield <i>exp</i></code> preserves order.</li>
  <li><code>where <i>condition</i></code> preserves order, dropping rows
for which <i>condition</i> is false.</li>
  <li><code>skip <i>count</i></code> and <code>take <i>count</i></code>
preserve order (respectively dropping the first <i>count</i> rows,
or taking the first <i>count</i> rows).</li>
  <li><code class="language-plaintext highlighter-rouge">distinct</code> preserves order, emitting only the first occurrence
of each element.</li>
  <li><code>group <i>groupKey<sub>1</sub></i>, ...,
<i>groupKey<sub>g</sub></i> [ compute <i>agg<sub>1</sub></i>, ...,
<i>agg<sub>a</sub></i> ]</code> preserves order, emitting groups in
the order that the first element in the group was seen; each
aggregate function <code><i>agg<sub>i</sub></i></code> is invoked
with a list of the input elements that belong to that group, in
arrival order.</li>
  <li><code>compute <i>agg<sub>1</sub></i>, ...,
<i>agg<sub>a</sub></i></code> behaves as a <code class="language-plaintext highlighter-rouge">group</code> step where all
input elements are in the same group.</li>
  <li><code>union [ distinct ] <i>exp<sub>1</sub></i>, ...,
<i>exp<sub>n</sub></i></code> outputs the elements of the input in
order, followed by the elements of each <i>exp<sub>i</sub></i>
argument in order (just like the UNIX <code class="language-plaintext highlighter-rouge">cat</code> command). If <code class="language-plaintext highlighter-rouge">distinct</code>
is specified, outputs only the first occurrence of each element.</li>
  <li><code>intersect [ distinct ] <i>exp<sub>1</sub></i>, ...,
<i>exp<sub>n</sub></i></code> outputs the elements of the input in
order, provided that every <i>exp<sub>i</sub></i> argument contains
at least the number of occurrences of this element so far.  If
<code class="language-plaintext highlighter-rouge">distinct</code> is specified, outputs only the first occurrence of each
element.</li>
  <li><code>except [ distinct ] <i>exp<sub>1</sub></i>, ...,
<i>exp<sub>n</sub></i></code> outputs the elements of the input in
order, provided that the number of occurrences of that element so
far is greater than the number of occurrences of that element in all
the <i>exp<sub>i</sub></i> arguments.  If <code class="language-plaintext highlighter-rouge">distinct</code> is specified,
outputs only the first occurrence of each element.</li>
  <li><code>require <i>condition</i></code> (which can only occur in a
<code class="language-plaintext highlighter-rouge">forall</code> query) has the same behavior as the unordered case.</li>
  <li><code class="language-plaintext highlighter-rouge">order</code> and <code class="language-plaintext highlighter-rouge">unorder</code>, as discussed earlier, have the same
semantics as in the unordered case.</li>
</ul>
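<p>The “first occurrence” rules for <code class="language-plaintext highlighter-rouge">distinct</code> and <code class="language-plaintext highlighter-rouge">group</code> can be
modeled in a few lines of Python. This is only a sketch of the
semantics described above, not Morel’s implementation; the function
names <code class="language-plaintext highlighter-rouge">ordered_distinct</code> and <code class="language-plaintext highlighter-rouge">ordered_group</code> are hypothetical.</p>

```python
def ordered_distinct(items):
    """Keep only the first occurrence of each element, in arrival order."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


def ordered_group(items, key, agg):
    """Emit groups in the order their first element was seen.

    Each aggregate receives the group's members as a list, in arrival order.
    """
    groups = {}  # Python dicts preserve insertion order
    for item in items:
        groups.setdefault(key(item), []).append(item)
    return [(k, agg(members)) for k, members in groups.items()]


digits = [3, 1, 4, 1, 5, 9, 2, 6]
distinct_digits = ordered_distinct(digits)
parity_sums = ordered_group(digits, key=lambda d: d % 2, agg=sum)
```

<p>Here the odd group comes first in <code class="language-plaintext highlighter-rouge">parity_sums</code> because the first
digit, 3, is odd.</p>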

<p>The rules for <code class="language-plaintext highlighter-rouge">from</code> and <code class="language-plaintext highlighter-rouge">join</code> produce the same familiar ordering as
a nested “for” loop in a language such as C, Python or Java:</p>

<!-- morel silent
Sys.set ("printLength", ~1);
> val it = () : unit
-->
<!-- morel
from hundreds in [100, 200, 300],
    tens in [10, 20, 30]
  join units in [1, 2, 3]
  yield hundreds + tens + units;
> val it =
>   [111,112,113,121,122,123,131,132,133,211,212,213,221,222,223,231,232,233,311,
>    312,313,321,322,323,331,332,333] : int list
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">hundreds</span> <span class="kr">in</span> <span class="p">[</span><span class="mi">100</span><span class="p">,</span> <span class="mi">200</span><span class="p">,</span> <span class="mi">300</span><span class="p">],</span>
    <span class="nv">tens</span> <span class="kr">in</span> <span class="p">[</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">30</span><span class="p">]</span>
  <span class="kr">join</span> <span class="nv">units</span> <span class="kr">in</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
  <span class="kr">yield</span> <span class="n">hundreds</span> <span class="o">+</span> <span class="n">tens</span> <span class="o">+</span> <span class="n">units</span><span class="p">;</span></div>
<div class="code-output">val it =
  [111,112,113,121,122,123,131,132,133,211,212,213,221,222,223,231,232,233,311,
   312,313,321,322,323,331,332,333] : int list</div>
</div>
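<p>For comparison, the equivalent nested loop in Python (a sketch, with
each scan becoming one loop level) visits the combinations in the same
order:</p>

```python
result = []
for hundreds in [100, 200, 300]:      # first scan: outermost loop
    for tens in [10, 20, 30]:         # second scan
        for units in [1, 2, 3]:       # join: innermost loop
            result.append(hundreds + tens + units)
```

<p>The outermost scan varies slowest and the innermost varies fastest,
so <code class="language-plaintext highlighter-rouge">result</code> begins 111, 112, 113, 121 and ends with 333.</p>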

<p>The rules for <code class="language-plaintext highlighter-rouge">union</code>, <code class="language-plaintext highlighter-rouge">intersect</code> and <code class="language-plaintext highlighter-rouge">except</code> are rather subtle, and
are best illustrated by example:</p>

<!-- morel
from i in [3, 1, 4, 1, 5, 9, 2, 6]
  union [2, 7, 1, 8, 2, 8, 1, 8];
> val it = [3,1,4,1,5,9,2,6,2,7,1,8,2,8,1,8] : int list
from i in [3, 1, 4, 1, 5, 9, 2, 6]
  intersect [2, 7, 1, 8, 2, 8, 1, 8];
> val it = [1,1,2] : int list
from i in [3, 1, 4, 1, 5, 9, 2, 6]
  except [2, 7, 1, 8, 2, 8, 1, 8];
> val it = [3,4,5,9,6] : int list
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">i</span> <span class="kr">in</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">9</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">6</span><span class="p">]</span>
  <span class="kr">union</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">8</span><span class="p">];</span></div>
<div class="code-output">val it = [3,1,4,1,5,9,2,6,2,7,1,8,2,8,1,8] : int list</div>
<div class="code-input"><span class="kr">from</span> <span class="nv">i</span> <span class="kr">in</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">9</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">6</span><span class="p">]</span>
  <span class="kr">intersect</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">8</span><span class="p">];</span></div>
<div class="code-output">val it = [1,1,2] : int list</div>
<div class="code-input"><span class="kr">from</span> <span class="nv">i</span> <span class="kr">in</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">9</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">6</span><span class="p">]</span>
  <span class="kr">except</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">8</span><span class="p">];</span></div>
<div class="code-output">val it = [3,4,5,9,6] : int list</div>
</div>
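<p>The counting behind these results can be sketched in Python. This
models the rules as stated above, without the <code class="language-plaintext highlighter-rouge">distinct</code> option, and
is not Morel’s implementation. The function names are hypothetical, and
reading “in all the arguments” for <code class="language-plaintext highlighter-rouge">except</code> as the total across
arguments is an assumption.</p>

```python
from collections import Counter


def ordered_union(xs, *others):
    """Input elements in order, then each argument's elements in order."""
    out = list(xs)
    for other in others:
        out.extend(other)
    return out


def ordered_intersect(xs, *others):
    """Emit x if every argument holds at least as many x as seen so far."""
    limits = [Counter(other) for other in others]
    seen = Counter()
    out = []
    for x in xs:
        seen[x] += 1
        if all(limit[x] >= seen[x] for limit in limits):
            out.append(x)
    return out


def ordered_except(xs, *others):
    """Emit x once its running count exceeds its count in the arguments."""
    removed = Counter()
    for other in others:
        removed.update(other)  # assumption: counts add up across arguments
    seen = Counter()
    out = []
    for x in xs:
        seen[x] += 1
        if seen[x] > removed[x]:
            out.append(x)
    return out


xs = [3, 1, 4, 1, 5, 9, 2, 6]
ys = [2, 7, 1, 8, 2, 8, 1, 8]
union_result = ordered_union(xs, ys)
intersect_result = ordered_intersect(xs, ys)
except_result = ordered_except(xs, ys)
```

<p>Both occurrences of 1 survive the <code class="language-plaintext highlighter-rouge">intersect</code> because the argument
also contains two 1s, and for the same reason both are removed by the
<code class="language-plaintext highlighter-rouge">except</code>.</p>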

<h2 id="hybrid-relational-operators">Hybrid relational operators</h2>

<p>We have specified the behavior of queries where input collections are
all lists or all bags. But what if a query has a mix of list and bag
inputs?</p>

<p>The mixing can occur if the first step of the query (<code class="language-plaintext highlighter-rouge">from</code>, <code class="language-plaintext highlighter-rouge">exists</code>,
or <code class="language-plaintext highlighter-rouge">forall</code>) has more than one scan, or in steps that introduce
another collection (<code class="language-plaintext highlighter-rouge">join</code>, <code class="language-plaintext highlighter-rouge">union</code>, <code class="language-plaintext highlighter-rouge">intersect</code>, or <code class="language-plaintext highlighter-rouge">except</code>). In all
cases, unordered wins: if any input is a <code class="language-plaintext highlighter-rouge">bag</code>, the step becomes
unordered, and unordered semantics apply from then on.</p>

<p>For example, if we join a <code class="language-plaintext highlighter-rouge">list</code> of department numbers (ordered) to a
table of employees (unordered), selecting only the clerks and
managers, the result is a <code class="language-plaintext highlighter-rouge">bag</code> (unordered):</p>

<!-- morel skip
from deptno in [10, 20, 30]
  join e in scott.emps on e.deptno = deptno
  where e.job elem ["CLERK", "MANAGER"]
  yield {deptno, e.ename};
> deptno ename
> ------ ------
> 30     JAMES
> 10     CLARK
> 20     ADAMS
> 10     MILLER
> 20     SMITH
> 30     BLAKE
> 20     JONES
>
> val it : {deptno:int, ename:string} bag
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">deptno</span> <span class="kr">in</span> <span class="p">[</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">30</span><span class="p">]</span>
  <span class="kr">join</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span> <span class="kr">on</span> <span class="nn">e</span><span class="p">.</span><span class="n">deptno</span> <span class="p">=</span> <span class="n">deptno</span>
  <span class="kr">where</span> <span class="nn">e</span><span class="p">.</span><span class="n">job</span> <span class="kr">elem</span> <span class="p">[</span><span class="s2">"CLERK"</span><span class="p">,</span> <span class="s2">"MANAGER"</span><span class="p">]</span>
  <span class="kr">yield</span> <span class="p">{</span><span class="n">deptno</span><span class="p">,</span> <span class="nn">e</span><span class="p">.</span><span class="n">ename</span><span class="p">};</span></div>
<div class="code-output">deptno ename
------ ------
30     JAMES
10     CLARK
20     ADAMS
10     MILLER
20     SMITH
30     BLAKE
20     JONES

val it : {deptno:int, ename:string} bag</div>
</div>

<h2 id="type-inference-challenges">Type inference challenges</h2>

<p>This feature was challenging to implement because it required
major changes to Morel’s type inference algorithm. (We mention this
only in the spirit of sharing war-stories, and for the interest of
those who understand the internal workings of Morel’s compiler.
Hopefully, the changes to the type-inference algorithm will be invisible
to the casual user.)</p>

<p>The problem is evident in a program such as</p>

<!-- morel skip
let
  fun f (xs, ys) =
    from i in xs
      intersect ys
in
  f ((from e in scott.emps yield e.empno), [7521, 7782, 8000])
end;
> val it = [7521,7782] : int bag
-->

<div class="code-block">
<div class="code-input"><span class="kr">let</span>
  <span class="kr">fun</span> <span class="nf">f</span> <span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">ys</span><span class="p">)</span> <span class="p">=</span>
    <span class="kr">from</span> <span class="nv">i</span> <span class="kr">in</span> <span class="n">xs</span>
      <span class="kr">intersect</span> <span class="n">ys</span>
<span class="kr">in</span>
  <span class="n">f</span> <span class="p">((</span><span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span> <span class="kr">yield</span> <span class="nn">e</span><span class="p">.</span><span class="n">empno</span><span class="p">),</span> <span class="p">[</span><span class="mi">7521</span><span class="p">,</span> <span class="mi">7782</span><span class="p">,</span> <span class="mi">8000</span><span class="p">])</span>
<span class="kr">end</span><span class="p">;</span></div>
<div class="code-output">val it = [7521,7782] : int bag</div>
</div>

<p>While resolving the type of function <code class="language-plaintext highlighter-rouge">f</code> and its embedded query, the
types of the arguments <code class="language-plaintext highlighter-rouge">xs</code> and <code class="language-plaintext highlighter-rouge">ys</code> have not yet been
determined. Morel’s previous type inference algorithm allowed us to
say “<i><code class="language-plaintext highlighter-rouge">xs</code> and <code class="language-plaintext highlighter-rouge">ys</code> must be lists with the same element type</i>” or
“<i><code class="language-plaintext highlighter-rouge">xs</code> and <code class="language-plaintext highlighter-rouge">ys</code> must be bags with the same element type</i>”. It was
based on
<a href="https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_system#Algorithm_W">Hindley-Milner’s Algorithm W</a>
and unification, which basically means finding an assignment of
logical variables so that two trees are structurally identical.</p>

<p>But the type inference rules for queries with a mixture of ordered and
unordered collections require conditions that contain the word
‘or’. For example, resolving the <code class="language-plaintext highlighter-rouge">intersect</code> expression above requires
that we say “<i>we can allow <code class="language-plaintext highlighter-rouge">xs</code> and <code class="language-plaintext highlighter-rouge">ys</code> to both be bags, or both be
lists, or one to be a bag and the other a list, but they must have the
same element type</i>”.  Furthermore, we need to derive the result
type, saying “<i>the result of the query is a list if both arguments
are lists, otherwise a bag, with the same element type as the
arguments</i>”.</p>

<p>We needed a system where we can place a number of constraints on type
variables, and then solve for those constraints. The new type
inference algorithm extends Hindley-Milner with constraints, using the
approach described in
<a href="https://dl.acm.org/doi/pdf/10.1145/224164.224195">“A Second Look at Overloading” by Odersky, Wadler &amp; Wehr (1995)</a>.
As the title of that paper suggests, we have <a href="https://github.com/hydromatic/morel/issues/237">added a kind of
overloading</a> to Morel;
it is as if the <code class="language-plaintext highlighter-rouge">intersect</code> operator now has four forms:</p>

<ul>
  <li><code>intersect: &alpha; bag * &alpha; bag &rarr; &alpha; bag</code></li>
  <li><code>intersect: &alpha; bag * &alpha; list &rarr; &alpha; bag</code></li>
  <li><code>intersect: &alpha; list * &alpha; bag &rarr; &alpha; bag</code></li>
  <li><code>intersect: &alpha; list * &alpha; list &rarr; &alpha; list</code></li>
</ul>

<p>(and similar overloads for the other relational operators) and the
type inference algorithm solves the constraints to land on one valid
assignment of types.</p>
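<p>As a toy illustration only, the four forms can be written as a
Python lookup table. This does not model type variables or constraint
solving, and the names <code class="language-plaintext highlighter-rouge">INTERSECT_FORMS</code>, <code class="language-plaintext highlighter-rouge">result_collection</code> and
<code class="language-plaintext highlighter-rouge">unordered_wins</code> are hypothetical; it merely checks that the table
agrees with the rule “a list only if both arguments are lists”.</p>

```python
# The four forms of intersect, keyed by the collection kinds of its arguments.
INTERSECT_FORMS = {
    ("bag", "bag"): "bag",
    ("bag", "list"): "bag",
    ("list", "bag"): "bag",
    ("list", "list"): "list",
}


def result_collection(left, right):
    """Look up the result kind for a pair of argument kinds."""
    return INTERSECT_FORMS[(left, right)]


def unordered_wins(left, right):
    """The same rule stated as a condition rather than a table."""
    return "list" if left == "list" and right == "list" else "bag"


agreement = all(
    result_collection(left, right) == unordered_wins(left, right)
    for left, right in INTERSECT_FORMS
)
```

<p>In the program above, <code class="language-plaintext highlighter-rouge">xs</code> is a bag and <code class="language-plaintext highlighter-rouge">ys</code> is a list, so the
second form applies and the result is a bag.</p>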

<p>The algorithm took several months of hard work to implement, but the
results are pleasing.  Morel retains the key benefits of a
Hindley-Milner type system: strong static typing, runtime efficiency,
and type inference without the need for type annotations.</p>

<p>Like any other major change in architecture, constraint-based type
inference will take a while to mature;
[<a href="https://github.com/hydromatic/morel/issues/270">MOREL-270</a>]
and
[<a href="https://github.com/hydromatic/morel/issues/271">MOREL-271</a>]
describe some of the remaining issues.</p>

<h2 id="conclusion">Conclusion</h2>

<p>The ability to combine ordered and unordered data sets, and process
both using relational operators, is a major new feature in Morel. It
allows Morel to handle, with equal ease, data from files and
relational databases, and data that is generated programmatically.</p>

<p>This feature will be available in Morel release 0.7.</p>

<p>To find out more about Morel, read about its
<a href="/2020/02/25/morel-a-functional-language-for-data.html">goals</a>
and <a href="/2020/03/03/morel-basics.html">basic language</a>, peruse the
<a href="https://github.com/hydromatic/morel/blob/main/docs/query.md">query reference</a>
or
<a href="https://github.com/hydromatic/morel/blob/main/docs/reference.md">language reference</a>,
or download it from <a href="https://github.com/hydromatic/morel/">GitHub</a> and
give it a try.</p>

<p>If you have comments, please reply on
<a href="https://bsky.app/profile/julianhyde.bsky.social">Bluesky @julianhyde.bsky.social</a>
or Twitter:</p>

<div data_dnt="true">
<div class="jekyll-twitter-plugin"><blockquote class="twitter-tweet" data-cards="hidden"><p lang="en" dir="ltr">Database tables are unordered, but functional programming languages work best over ordered lists. Which should <a href="https://twitter.com/morel_lang?ref_src=twsrc%5Etfw">@morel_lang</a> prefer? Both! We now have &quot;list&quot; and &quot;bag&quot; types, and full relational algebra over both. <a href="https://t.co/n8vZx0pUmG">https://t.co/n8vZx0pUmG</a></p>&mdash; Julian Hyde (@julianhyde) <a href="https://twitter.com/julianhyde/status/1931153173097660591?ref_src=twsrc%5Etfw">June 7, 2025</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

</div>
</div>

<!--
This article
[has been updated](https://github.com/julianhyde/share/commits/main/blog/_posts/2025-06-06-ordered-unordered.md).
-->]]></content><author><name>Julian Hyde</name></author><summary type="html"><![CDATA[Despite what the relational model says, some data is ordered.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://blog.hydromatic.net/assets/img/OldDesignShop_MushroomSpringMorel-240x240.jpg" /><media:content medium="image" url="http://blog.hydromatic.net/assets/img/OldDesignShop_MushroomSpringMorel-240x240.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">INTERSECT ALL, EXCEPT ALL, and the arithmetic of fractions</title><link href="http://blog.hydromatic.net/2025/06/03/intersect-fractions.html" rel="alternate" type="text/html" title="INTERSECT ALL, EXCEPT ALL, and the arithmetic of fractions" /><published>2025-06-03T13:00:00-07:00</published><updated>2025-06-03T13:00:00-07:00</updated><id>http://blog.hydromatic.net/2025/06/03/intersect-fractions</id><content type="html" xml:base="http://blog.hydromatic.net/2025/06/03/intersect-fractions.html"><![CDATA[<p>SQL’s <code class="language-plaintext highlighter-rouge">INTERSECT ALL</code> and <code class="language-plaintext highlighter-rouge">EXCEPT ALL</code> operators rarely get attention,
but they elegantly solve a classic math problem. The problem is
computing the <strong>greatest common divisor (GCD)</strong> and <strong>least common
multiple (LCM)</strong> of two integers, using the prime factors of those
integers.  In this post we show how to do this using <code class="language-plaintext highlighter-rouge">intersect</code> and
<code class="language-plaintext highlighter-rouge">except</code>, Morel’s equivalent of <code class="language-plaintext highlighter-rouge">INTERSECT ALL</code> and <code class="language-plaintext highlighter-rouge">EXCEPT ALL</code>.</p>

<p>SQL’s set operators (<code class="language-plaintext highlighter-rouge">UNION</code>, <code class="language-plaintext highlighter-rouge">INTERSECT</code>, and <code class="language-plaintext highlighter-rouge">EXCEPT</code>) have set and
multiset variants.  The multiset variants retain duplicates and use
the <code class="language-plaintext highlighter-rouge">ALL</code> keyword; the set variants discard duplicates, and you can
use the optional <code class="language-plaintext highlighter-rouge">DISTINCT</code> keyword if you want to be explicit.</p>

<p>Morel has <a href="https://github.com/hydromatic/morel/issues/253">just added</a>
<code class="language-plaintext highlighter-rouge">union</code>, <code class="language-plaintext highlighter-rouge">intersect</code> and <code class="language-plaintext highlighter-rouge">except</code> query steps, achieving parity
with both Standard SQL and
<a href="https://cloud.google.com/bigquery/docs/reference/standard-sql/pipe-syntax#union_pipe_operator">GoogleSQL’s pipe syntax</a>.
(This post is about multiset mode, which retains duplicate values; to
use the set mode, which discards duplicates and is far more common,
use the <code class="language-plaintext highlighter-rouge">distinct</code> keyword, for example <code class="language-plaintext highlighter-rouge">intersect distinct</code>.)</p>

<p>Using these steps, we can compute GCD and LCM. The queries are even more
concise because Morel queries over integer values do not require
column names.</p>

<h2 id="adding-fractions">Adding fractions</h2>

<p>Remember how – probably in middle school – you learned how to add
two fractions, and to reduce a fraction to its lowest terms?</p>

<p>Suppose you need to add 5/36 and 7/120. First, find the least
common multiple (LCM) of their denominators (36 and 120).</p>

<p>Next, convert each fraction to an equivalent fraction with the LCM
(360) as the denominator.</p>
<ul>
  <li>For 5/36: Multiply the numerator and denominator by 10
(since 36 * 10 = 360).</li>
  <li>For 7/120: Multiply the numerator and denominator by 3
(since 120 * 3 = 360).</li>
</ul>

<p>Last, add the fractions.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  5      7        5 * 10    7 * 3        50      21       71
---- + -----  =  ------- + -------  =  ----- + -----  =  -----
 36     120      36 * 10   120 * 3      360     360       360
</code></pre></div></div>
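<p>The arithmetic can be checked with Python’s standard library (a
quick sketch; <code class="language-plaintext highlighter-rouge">math.lcm</code> needs Python 3.9 or later, and the variable
names are arbitrary):</p>

```python
from fractions import Fraction
from math import lcm

common_denominator = lcm(36, 120)

# Scale each numerator by the factor that takes its denominator to the LCM.
left_numerator = 5 * (common_denominator // 36)
right_numerator = 7 * (common_denominator // 120)

total = Fraction(left_numerator + right_numerator, common_denominator)

# Fraction reduces to lowest terms automatically, so adding directly agrees.
direct = Fraction(5, 36) + Fraction(7, 120)
```

<p>Because 71 is prime and does not divide 360, the sum is already in
lowest terms.</p>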

<h2 id="computing-gcd-and-lcm">Computing GCD and LCM</h2>

<p>To compute the GCD of two numbers, you start by finding their prime
factors.  Prime factors can be repeated, so these are multisets, not
sets.  Let’s find the GCD of 36 and 120.</p>

<ul>
  <li>36 is 2<sup>2</sup> * 3<sup>2</sup>, so has factors [2, 2, 3, 3]</li>
  <li>120 is 2<sup>3</sup> * 3 * 5, so has factors [2, 2, 2, 3, 5]</li>
</ul>

<p>Where a factor is common to both numbers, we take the lower
repetition count: two 2s, one 3, and no 5s. The GCD is therefore
2<sup>2</sup> * 3, which is 12.</p>

<p>The crucial step of the algorithm is to combine two multisets and take
the minimum repetition count; that is exactly what <code class="language-plaintext highlighter-rouge">intersect</code> does.</p>

<p>The LCM is similar, but takes the higher repetition count.
This can be achieved by taking the union of both factor multisets,
then subtracting their intersection. Here’s why: The union gives us
all factors from both numbers, but it adds the counts together. Since
we want the maximum count (not the sum), we subtract the intersection,
which contains the overlapping factors we double-counted.
The LCM is therefore 2<sup>3</sup> * 3<sup>2</sup> * 5, which is 360.</p>
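<p>The identity “maximum = sum − minimum” can be checked per factor
with Python’s <code class="language-plaintext highlighter-rouge">collections.Counter</code>, whose <code class="language-plaintext highlighter-rouge">&amp;</code>, <code class="language-plaintext highlighter-rouge">|</code>, <code class="language-plaintext highlighter-rouge">+</code> and
<code class="language-plaintext highlighter-rouge">-</code> operators take the minimum, maximum, sum and difference of
counts. This is a sketch of the reasoning, not of Morel’s operators,
and the factor lists are typed in by hand from the bullets above.</p>

```python
from collections import Counter
from math import prod

factors_36 = Counter([2, 2, 3, 3])
factors_120 = Counter([2, 2, 2, 3, 5])

common = factors_36 & factors_120    # minimum count of each prime
summed = factors_36 + factors_120    # counts added together
largest = summed - common            # sum minus minimum

gcd_value = prod(common.elements())
lcm_value = prod(largest.elements())
```

<p>Subtracting the minimum from the sum leaves the maximum count of
each prime, which is why <code class="language-plaintext highlighter-rouge">largest</code> equals
<code class="language-plaintext highlighter-rouge">factors_36 | factors_120</code>.</p>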

<h2 id="using-morel-to-compute-lcm-and-gcd">Using Morel to compute LCM and GCD</h2>

<p>To convert this algorithm to code, we will need three things:</p>
<ul>
  <li>a <code class="language-plaintext highlighter-rouge">factorize</code> function that splits each number into a multiset of
prime factors;</li>
  <li>the <code class="language-plaintext highlighter-rouge">intersect</code> step, which combines the multisets;</li>
  <li>a <code class="language-plaintext highlighter-rouge">product</code> function that converts a multiset back to a number.</li>
</ul>

<p>Here are the <code class="language-plaintext highlighter-rouge">factorize</code> and <code class="language-plaintext highlighter-rouge">product</code> functions.</p>

<!-- morel
fun factorize n =
  let
    fun factorize' n d =
      if n < d then [] else
      if n mod d = 0 then d :: (factorize' (n div d) d)
      else factorize' n (d + 1)
  in
    factorize' n 2
  end;
> val factorize = fn : int -> int list

fun product [] = 1
  | product (x::xs) = x * (product xs);
> val product = fn : int list -> int
-->

<div class="code-block">
<div class="code-input"><span class="kr">fun</span> <span class="nf">factorize</span> <span class="n">n</span> <span class="p">=</span>
  <span class="kr">let</span>
    <span class="kr">fun</span> <span class="nf">factorize'</span> <span class="n">n</span> <span class="n">d</span> <span class="p">=</span>
      <span class="kr">if</span> <span class="n">n</span> <span class="o">&lt;</span> <span class="n">d</span> <span class="kr">then</span> <span class="p">[]</span> <span class="kr">else</span>
      <span class="kr">if</span> <span class="n">n</span> <span class="kr">mod</span> <span class="n">d</span> <span class="p">=</span> <span class="mi">0</span> <span class="kr">then</span> <span class="n">d</span> <span class="o">::</span> <span class="p">(</span><span class="n">factorize'</span> <span class="p">(</span><span class="n">n</span> <span class="kr">div</span> <span class="n">d</span><span class="p">)</span> <span class="n">d</span><span class="p">)</span>
      <span class="kr">else</span> <span class="n">factorize'</span> <span class="n">n</span> <span class="p">(</span><span class="n">d</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
  <span class="kr">in</span>
    <span class="n">factorize'</span> <span class="n">n</span> <span class="mi">2</span>
  <span class="kr">end</span><span class="p">;</span></div>
<div class="code-output">val factorize = fn : int -&gt; int list</div>
<div class="code-input">
<span class="kr">fun</span> <span class="nf">product</span> <span class="p">[]</span> <span class="p">=</span> <span class="mi">1</span>
  <span class="p">|</span> <span class="n">product</span> <span class="p">(</span><span class="n">x</span><span class="o">::</span><span class="n">xs</span><span class="p">)</span> <span class="p">=</span> <span class="n">x</span> <span class="o">*</span> <span class="p">(</span><span class="n">product</span> <span class="n">xs</span><span class="p">);</span></div>
<div class="code-output">val product = fn : int list -&gt; int</div>
</div>

<p>Here’s how they work:</p>

<!-- morel
factorize 120;
> val it = [2,2,2,3,5] : int list
product (factorize 120);
> val it = 120 : int
-->

<div class="code-block">
<div class="code-input"><span class="n">factorize</span> <span class="mi">120</span><span class="p">;</span></div>
<div class="code-output">val it = [2,2,2,3,5] : int list</div>
<div class="code-input"><span class="n">product</span> <span class="p">(</span><span class="n">factorize</span> <span class="mi">120</span><span class="p">);</span></div>
<div class="code-output">val it = 120 : int</div>
</div>

<p>So, we can compute GCD like this:</p>

<!-- morel skip
fun gcd (m, n) =
  from f in factorize m
    intersect factorize n
    compute product;
> val gcd = fn : int * int -> int
-->

<div class="code-block">
<div class="code-input"><span class="kr">fun</span> <span class="nf">gcd</span> <span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">n</span><span class="p">)</span> <span class="p">=</span>
  <span class="kr">from</span> <span class="nv">f</span> <span class="kr">in</span> <span class="n">factorize</span> <span class="n">m</span>
    <span class="kr">intersect</span> <span class="n">factorize</span> <span class="n">n</span>
    <span class="kr">compute</span> <span class="n">product</span><span class="p">;</span></div>
<div class="code-output">val gcd = fn : int * int -&gt; int</div>
</div>

<p>The last step uses <code class="language-plaintext highlighter-rouge">compute</code> because <code class="language-plaintext highlighter-rouge">product</code> fulfills Morel’s only
criterion to be an aggregate function: its argument is a collection
of values. (At least one SQL dialect agrees with us, and has a
<a href="https://duckdb.org/docs/stable/sql/functions/aggregates#productarg">PRODUCT</a>
aggregate function.)</p>

<p>LCM can be computed from GCD:</p>

<!-- morel skip
fun lcm (m, n) =
  (m * n) div gcd (m, n);
> val lcm = fn : int * int -> int
-->

<div class="code-block">
<div class="code-input"><span class="kr">fun</span> <span class="nf">lcm</span> <span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">n</span><span class="p">)</span> <span class="p">=</span>
  <span class="p">(</span><span class="n">m</span> <span class="o">*</span> <span class="n">n</span><span class="p">)</span> <span class="kr">div</span> <span class="n">gcd</span> <span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">n</span><span class="p">);</span></div>
<div class="code-output">val lcm = fn : int * int -&gt; int</div>
</div>

<p>But, as we discussed above, it can also be computed directly using
<code class="language-plaintext highlighter-rouge">union</code>, <code class="language-plaintext highlighter-rouge">except</code> and <code class="language-plaintext highlighter-rouge">intersect</code>:</p>

<!-- morel skip
fun lcm' (m, n) =
  let
    val m_factors = factorize m
    val n_factors = factorize n
  in
    from f in m_factors
      union (n_factors)
      except (from f in m_factors
        intersect n_factors)
    compute product
  end;
> val lcm' = fn : int * int -> int
-->

<div class="code-block">
<div class="code-input"><span class="kr">fun</span> <span class="nf">lcm'</span> <span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">n</span><span class="p">)</span> <span class="p">=</span>
  <span class="kr">let</span>
    <span class="kr">val</span> <span class="nv">m_factors</span> <span class="p">=</span> <span class="n">factorize</span> <span class="n">m</span>
    <span class="kr">val</span> <span class="nv">n_factors</span> <span class="p">=</span> <span class="n">factorize</span> <span class="n">n</span>
  <span class="kr">in</span>
    <span class="kr">from</span> <span class="nv">f</span> <span class="kr">in</span> <span class="n">m_factors</span>
      <span class="kr">union</span> <span class="p">(</span><span class="n">n_factors</span><span class="p">)</span>
      <span class="kr">except</span> <span class="p">(</span><span class="kr">from</span> <span class="nv">f</span> <span class="kr">in</span> <span class="n">m_factors</span>
        <span class="kr">intersect</span> <span class="n">n_factors</span><span class="p">)</span>
    <span class="kr">compute</span> <span class="n">product</span>
  <span class="kr">end</span><span class="p">;</span></div>
<div class="code-output">val lcm' = fn : int * int -&gt; int</div>
</div>

<p>Let’s test them:</p>

<!-- morel skip
gcd (36, 120);
> val it = 12 : int
lcm (36, 120);
> val it = 360 : int
lcm' (36, 120);
> val it = 360 : int
-->

<div class="code-block">
<div class="code-input"><span class="n">gcd</span> <span class="p">(</span><span class="mi">36</span><span class="p">,</span> <span class="mi">120</span><span class="p">);</span></div>
<div class="code-output">val it = 12 : int</div>
<div class="code-input"><span class="n">lcm</span> <span class="p">(</span><span class="mi">36</span><span class="p">,</span> <span class="mi">120</span><span class="p">);</span></div>
<div class="code-output">val it = 360 : int</div>
<div class="code-input"><span class="n">lcm'</span> <span class="p">(</span><span class="mi">36</span><span class="p">,</span> <span class="mi">120</span><span class="p">);</span></div>
<div class="code-output">val it = 360 : int</div>
</div>
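
<p>For readers who want to check the arithmetic outside Morel, here is a
minimal Python sketch of the same idea. It is not how Morel implements
these steps. It assumes that <code class="language-plaintext highlighter-rouge">intersect</code> and <code class="language-plaintext highlighter-rouge">except</code> behave like
SQL’s INTERSECT ALL and EXCEPT ALL, and that <code class="language-plaintext highlighter-rouge">union</code> adds
multiplicities, as the result of <code class="language-plaintext highlighter-rouge">lcm'</code> above suggests. The helper
names (<code class="language-plaintext highlighter-rouge">intersect_all</code>, <code class="language-plaintext highlighter-rouge">lcm_direct</code> and so on) and the
trial-division <code class="language-plaintext highlighter-rouge">factorize</code> are ours, chosen for illustration.</p>

```python
from collections import Counter
from math import prod


def factorize(n):
    """Prime factors of n as a multiset; 36 gives Counter({2: 2, 3: 2})."""
    factors = Counter()
    p = 2
    while n >= p * p:
        while n % p == 0:
            factors[p] += 1
            n //= p
        p += 1
    if n > 1:
        factors[n] += 1  # whatever remains is itself prime
    return factors


def intersect_all(a, b):
    """Multiset intersection: each element keeps the smaller of its two counts."""
    return Counter({k: min(a[k], b[k]) for k in a if min(a[k], b[k]) > 0})


def union_all(a, b):
    """Multiset union that adds the counts (assumed to match Morel's union)."""
    return a + b


def except_all(a, b):
    """Multiset difference: counts are subtracted and never go below zero."""
    return a - b


def product(factors):
    """Multiply every element of the multiset, repeats included."""
    return prod(factors.elements())


def gcd(m, n):
    return product(intersect_all(factorize(m), factorize(n)))


def lcm_from_gcd(m, n):
    return (m * n) // gcd(m, n)


def lcm_direct(m, n):
    fm, fn = factorize(m), factorize(n)
    return product(except_all(union_all(fm, fn), intersect_all(fm, fn)))


print(gcd(36, 120))           # 12
print(lcm_from_gcd(36, 120))  # 360
print(lcm_direct(36, 120))    # 360
```

<p>The repeated factors are what make this work: 36 contributes two 2s and
120 contributes three, so the intersection keeps two of them, and the
union minus the intersection keeps three.</p>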

<h2 id="conclusion">Conclusion</h2>

<p>The <code class="language-plaintext highlighter-rouge">intersect</code>, <code class="language-plaintext highlighter-rouge">except</code>, and <code class="language-plaintext highlighter-rouge">union</code> steps neatly solve the problem
of computing GCD and LCM because they handle repeated factors in
exactly the way we need.</p>

<p>These steps will be available shortly in Morel release 0.7.</p>

<p>If you have comments, please reply on
<a href="https://bsky.app/profile/julianhyde.bsky.social">Bluesky @julianhyde.bsky.social</a>
or Twitter:</p>

<div data_dnt="true">
<div class="jekyll-twitter-plugin"><blockquote class="twitter-tweet" data-cards="hidden"><p lang="en" dir="ltr">Be honest, did you ever find a real-world use for SQL&#39;s &quot;INTERSECT ALL&quot; operator? Now we did! This post explains how you can use <a href="https://twitter.com/morel_lang?ref_src=twsrc%5Etfw">@morel_lang</a>&#39;s &quot;intersect&quot; to compute GCD (greatest common divisor). <a href="https://t.co/bulN6RHr96">https://t.co/bulN6RHr96</a></p>&mdash; Julian Hyde (@julianhyde) <a href="https://twitter.com/julianhyde/status/1930104410375630901?ref_src=twsrc%5Etfw">June 4, 2025</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

</div>
</div>

<p>This article
<a href="https://github.com/julianhyde/share/commits/main/blog/_posts/2025-06-03-intersect-fractions.md">has been updated</a>.</p>]]></content><author><name>Julian Hyde</name></author><summary type="html"><![CDATA[SQL’s INTERSECT ALL and EXCEPT ALL operators rarely get attention, but they elegantly solve a classic math problem. The problem is computing the greatest common divisor (GCD) and least common multiple (LCM) of two integers, using the prime factors of those integers. In this post we show how to do this using intersect and except, Morel’s equivalent of INTERSECT ALL and EXCEPT ALL.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://blog.hydromatic.net/assets/img/OldDesignShop_MushroomSpringMorel-240x240.jpg" /><media:content medium="image" url="http://blog.hydromatic.net/assets/img/OldDesignShop_MushroomSpringMorel-240x240.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Should Morel be rewritten in Rust?</title><link href="http://blog.hydromatic.net/2025/05/11/rewrite-morel.html" rel="alternate" type="text/html" title="Should Morel be rewritten in Rust?" /><published>2025-05-11T13:00:00-07:00</published><updated>2025-05-11T13:00:00-07:00</updated><id>http://blog.hydromatic.net/2025/05/11/rewrite-morel</id><content type="html" xml:base="http://blog.hydromatic.net/2025/05/11/rewrite-morel.html"><![CDATA[<p>There are many excellent and innovative projects happening in the Rust
data ecosystem. I am frequently asked whether
<a href="https://github.com/hydromatic/morel">Morel</a> should be one of
them. Thinking about this question gave me some insights into Morel’s
strengths, and where our priorities should be.</p>

<h1 id="rust-and-its-ecosystem">Rust and its ecosystem</h1>

<p>Rust is an excellent language for building a data processing
system. Though it is a high-level language with type safety and memory
protection, Rust avoids the tax that is paid by most high-level
languages, namely garbage collection and the unpredictable pauses that
it can cause.</p>

<p>Rust also has a vibrant ecosystem; enthusiasts are rebuilding the
whole stack because it seems like fun. That’s mostly a good thing,
because a rebuild is a chance to do things better. But sometimes it’s
a chance to make <em>different</em> mistakes. (I say this having been around
for the <a href="https://hadoop.apache.org/">last revolution</a>, rewriting the
DBMS as a Java-based distributed system.)</p>

<p>Rewriting in Rust would incur an opportunity cost. It would take away
resources from other goals, which I describe below. (If there were no
opportunity cost – if one or two engineers were prepared to spend 3-6
months of their spare time translating Morel’s 50K-or-so lines of Java
to high-quality Rust – then I would probably welcome such a
contribution. However, such dedicated volunteer efforts are
understandably rare.) Let’s remind ourselves of the goal.</p>

<h1 id="morel-and-its-goals">Morel and its goals</h1>

<p>Morel is a language, not a framework or a library. It necessarily has a
reference implementation – the only implementation, at present – and
that happens to be written in Java. Building a language – as opposed
to a framework or a library – means focusing on the design of that
language, as opposed to its initial reference implementation.  Key
elements of the design are syntax, type system, and semantics.
Semantics are especially important because they determine what
execution plans are valid for its programs, and in what ways its
compiler/optimizer is allowed to transform that program.</p>

<p>Think of some great languages –
<a href="https://web.archive.org/web/20250115055354/https://www.bell-labs.com/usr/dmr/www/chist.html">C</a>,
Lisp,
<a href="https://stackoverflow.com/questions/16020999/was-the-original-sql-written-in-assembly-or-c">SQL</a>,
Java, Python, Standard ML, and
<a href="https://en.wikipedia.org/wiki/Rust_(programming_language)#:~:text=During%20the%20early%20years%2C%20the,about%2038%2C000%20lines%20of%20OCaml">Rust</a>
are some examples – and tell me what language their original compiler
was written in. Probably you can’t. The initial implementation served
its purpose – letting the language evolve to something useful and
usable – and in time was discarded.</p>

<p>So, what is Morel for? Morel is a functional query language. It has
the expressive power and type system of a functional programming
language, and as a query language has built-in support for relational
algebra. Its optimizer uses algebraic laws to radically transform
programs to exploit parallelism, distributed processing, data
organization, and pre-computed results.</p>

<p>Some implications of this:</p>
<ul>
  <li><strong>Morel must support multiple runtimes</strong>. Programs can run locally
(using the interpreter written in Java), but also on a distributed
framework such as Apache Spark. It also supports federated
execution, translating programs (or portions of programs) into the
language of the system that contains the data (such as SQL).  Tying
Morel tightly to a particular runtime, even an excellent one such as
Rust-based <a href="https://datafusion.apache.org/">Apache DataFusion</a>,
could detract from that goal.</li>
  <li><strong>Morel’s optimizer must be extensible in Morel</strong>. When Morel
connects to new data sources, these data sources have their own
algebra (operators, transformation rules, constraints, and
statistics). To effectively optimize programs on these data sources,
Morel’s optimizer must understand that algebra. In Morel’s current
optimizer, rules are defined as Java classes that implement
interfaces defined by the
<a href="https://calcite.apache.org/">Apache Calcite</a> planning engine.
That means that to add a data source, bulk data type (say vector,
matrix or geospatial polygon), operator (say multiplying matrices,
or finding sets of overlapping polygons), or transformation rule
(pushing filters into matrices or polygon sets), you need to leave
Morel.  Those things are all more difficult to write without Morel’s
expressive power; Morel should allow these kinds of extensibility
without leaving the language.</li>
  <li><strong>Morel should bootstrap its compiler but not its runtime</strong>. Every
programming language aspires to implement its own compiler and
runtime, but those aspirations need to be matched to the actual
strengths of the language. An efficient runtime is not a major goal
of Morel (see first point) and therefore the runtime should be written
in another language. But Morel, in the same family of languages as
<a href="https://en.wikipedia.org/wiki/ML_(programming_language)">Standard ML</a>,
<a href="https://en.wikipedia.org/wiki/OCaml">OCaml</a> and
<a href="https://en.wikipedia.org/wiki/Rocq">Rocq</a>, is a good language for
building compilers and optimizers. Defining the abstract syntax tree
(AST), algebra, and transformation rules in Morel will make
extensibility possible (see second point).</li>
</ul>

<h1 id="convergence">Convergence</h1>

<p>These features put Morel in a good position to take advantage of an
exciting trend in databases and programming languages: the convergence
of compilation, query optimization, and query execution.</p>

<p>In the early 2000s,
<a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2002/07/inline.pdf">functional programming language compilers</a>
started to look a bit like query optimizers, deciding (albeit without
statistics) whether to inline computations.</p>

<p>Around 2010, the programming language community discovered
<a href="https://arxiv.org/abs/1012.1802">equality saturation</a>, a program
optimization technique that fires transformation rules and maintains
sets of semantically equivalent expressions. They
were apparently unaware of
<a href="https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Cascades-graefe.pdf">Cascades</a>,
a virtually identical algorithm that has powered most query optimizers
for three decades (including Calcite).</p>

<p>More recent work has
<a href="https://arxiv.org/abs/2304.04332">drawn parallels</a> between equality
saturation and Datalog-style query execution, in particular
<a href="https://en.wikipedia.org/wiki/Worst-case_optimal_join_algorithm">worst-case optimal joins</a>.</p>

<p>Morel’s support for
<a href="https://github.com/hydromatic/morel/discussions/106">Datalog-like recursion</a>
allows it to take on these hard problems. Its functional programming
language roots, in particular its strong type system with
<a href="https://en.wikipedia.org/wiki/Algebraic_data_type">algebraic data types</a>
and polymorphism, make it a delightful language in which to build
compilers and optimization rules.</p>

<h1 id="conclusion">Conclusion</h1>

<p>Although Rust has a compelling ecosystem, re-implementing
Morel in Rust would be a distraction from its core goals: enabling
queries and data-intensive programs on a variety of runtimes.  Morel’s
promise will be realized when it, not the underlying language,
provides the tools for extensibility.</p>

<p>If you have comments, please reply on
<a href="https://bsky.app/profile/julianhyde.bsky.social">Bluesky @julianhyde.bsky.social</a>
or Twitter:</p>

<div data_dnt="true">
<div class="jekyll-twitter-plugin"><blockquote class="twitter-tweet" data-cards="hidden"><p lang="en" dir="ltr">Should <a href="https://twitter.com/morel_lang?ref_src=twsrc%5Etfw">@morel_lang</a> be rewritten in <a href="https://twitter.com/rustlang?ref_src=twsrc%5Etfw">@rustlang</a>? The Rust data community is compelling, but moving to Rust might  cause us to lose focus on Morel&#39;s bigger goals. I ponder the question in a blog post. <a href="https://t.co/yhZabmzU26">https://t.co/yhZabmzU26</a></p>&mdash; Julian Hyde (@julianhyde) <a href="https://twitter.com/julianhyde/status/1921752150553608204?ref_src=twsrc%5Etfw">May 12, 2025</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

</div>
</div>

<p>This article
<a href="https://github.com/julianhyde/share/commits/main/blog/_posts/2025-05-11-rewrite-morel.md">has been updated</a>.</p>]]></content><author><name>Julian Hyde</name></author><summary type="html"><![CDATA[There are many excellent and innovative projects happening in the Rust data ecosystem. I am frequently asked whether Morel should be one of them. Thinking about this question gave me some insights into Morel’s strengths, and where our priorities should be.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://blog.hydromatic.net/assets/img/OldDesignShop_MushroomSpringMorel-240x240.jpg" /><media:content medium="image" url="http://blog.hydromatic.net/assets/img/OldDesignShop_MushroomSpringMorel-240x240.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Morel release 0.6.0</title><link href="http://blog.hydromatic.net/2025/05/02/morel-release-0-6-0.html" rel="alternate" type="text/html" title="Morel release 0.6.0" /><published>2025-05-02T13:00:00-07:00</published><updated>2025-05-02T13:00:00-07:00</updated><id>http://blog.hydromatic.net/2025/05/02/morel-release-0-6-0</id><content type="html" xml:base="http://blog.hydromatic.net/2025/05/02/morel-release-0-6-0.html"><![CDATA[<p>I am pleased to announce Morel
<a href="https://github.com/hydromatic/morel/blob/main/HISTORY.md#060--2025-05-02">release 0.6.0</a>,
just two months after
<a href="https://github.com/hydromatic/morel/blob/main/HISTORY.md#050--2025-03-04">release 0.5.0</a>.</p>

<p>It’s been a busy couple of months. I attended
<a href="https://www.datacouncil.ai/bay-2025">Data Council 2025</a>, gave a talk called
“<a href="https://www.datacouncil.ai/talks/more-than-query-future-directions-of-query-langages-from-sql-to-morel?hsLang=en">More than Query</a>,”
and had discussions with a lot of smart people. In a follow-up
chat, Julien Le Dem and I
<a href="https://www.youtube.com/watch?v=zpdbEvhhne8">went deeper</a> into the
topics I raised at Data Council.  A piece about how Morel could do
<a href="/2025/04/14/dml-in-morel.html">DML and data engineering</a>
generated a lot of discussion. And behind the scenes I’ve been writing
a lot of code, doing fundamental work on Morel’s
<a href="https://github.com/hydromatic/morel/issues/237">type system</a> and
<a href="https://github.com/hydromatic/morel/issues/235">collection types</a>
that is not yet fully baked but will appear in the next release.</p>

<p>Even though this is a slim release, I’m pleased to be able to add
language features for <a href="#1-logic-extensions">logic</a> and
<a href="#2-record-update">updating records</a>,
<a href="#3-tabular-mode">improvements to the shell</a>, and
<a href="#4-unifier-performance-improvements">performance improvements for the type-inference algorithm</a>.</p>

<p>Let’s explore a few of the new features.  For more information, see
the
<a href="https://github.com/hydromatic/morel/blob/main/HISTORY.md#060--2025-05-02">official release notes</a>.</p>

<h1 id="1-logic-extensions">1. Logic extensions</h1>

<p>In logic, it is common to ask whether a predicate is true for all
elements of a set, or for at least one element of a set. These questions are
essentially queries that return a boolean value. Morel already has
ways to express those queries, but in 0.6.0 we add
<a href="https://github.com/hydromatic/morel/issues/241">syntax that is closer to logic</a>.</p>

<p>The new <code class="language-plaintext highlighter-rouge">exists</code>, <code class="language-plaintext highlighter-rouge">forall</code> and <code class="language-plaintext highlighter-rouge">implies</code> keywords have the following
syntax.</p>

<pre>
exp &rarr;
  ...
| <b>from</b> [ <i>scan<sub>1</sub></i> , ... , <i>scan<sub>s</sub></i> ] step*
                                relational expression (s ≥ 0)
| <b>exists</b> [ <i>scan<sub>1</sub></i> , ... , <i>scan<sub>s</sub></i> ] step*
                                existential quantification (s ≥ 0)
| <b>forall</b> [ <i>scan<sub>1</sub></i> , ... , <i>scan<sub>s</sub></i> ] <b>require</b> <i>exp</i>
                                universal quantification (s ≥ 0)
| <i>exp<sub>1</sub></i> <b>implies</b> <i>exp<sub>2</sub></i>
</pre>

<p>As you can see, <code class="language-plaintext highlighter-rouge">exists</code> and <code class="language-plaintext highlighter-rouge">forall</code> have similar syntax to the
existing <code class="language-plaintext highlighter-rouge">from</code> expression. Collectively, these are called <em>query
expressions</em>, and are all documented in the new
<a href="https://github.com/hydromatic/morel/blob/main/docs/query.md">query reference</a>.</p>

<p>The new constructs are specified in terms of existing <code class="language-plaintext highlighter-rouge">from</code>, <code class="language-plaintext highlighter-rouge">count</code>,
<code class="language-plaintext highlighter-rouge">not</code>, and <code class="language-plaintext highlighter-rouge">orelse</code> operators, as shown in the following table. (Morel
currently uses the rewrites in the table, but that doesn’t prevent us
from using an equivalent but more efficient rewrite – say using a
semi-join rather than <code class="language-plaintext highlighter-rouge">count</code> – in future.)</p>

<table>
<tr>
<td>
Original <code>exists</code>
</td>
<td>
<pre>
<b>exists</b> <i>scan<sub>1</sub></i> , ... , <i>scan<sub>s</sub></i> <i>steps</i>
</pre>
</td>
</tr>

<tr>
<td>
Rewritten using <code>count</code>
</td>
<td>
<pre>
<b>count</b> (<b>from</b> <i>scan<sub>1</sub></i> , ... , <i>scan<sub>s</sub></i> <i>steps</i> ) &gt; 0
</pre>
</td>
</tr>

<tr>
<td>
Original <code>forall</code>
</td>
<td>
<pre>
<b>forall</b> <i>scan<sub>1</sub></i> , ... , <i>scan<sub>s</sub></i>
  <b>require</b> <i>exp</i>
</pre>
</td>
</tr>

<tr>
<td>
Rewritten using <code>not exists</code>
</td>
<td>
<pre>
<b>not</b> (<b>exists</b> <i>scan<sub>1</sub></i> , ... , <i>scan<sub>s</sub></i>
  <b>where</b> <b>not</b> <i>exp</i>)
</pre>
</td>
</tr>

<tr>
<td>
Original <code>implies</code>
</td>
<td>
<pre>
<i>exp<sub>1</sub></i> <b>implies</b> <i>exp<sub>2</sub></i>
</pre>
</td>
</tr>

<tr>
<td>
Rewritten using <code>not</code> and <code>orelse</code>
</td>
<td>
<pre>
<b>not</b> <i>exp<sub>1</sub></i> <b>orelse</b> <i>exp<sub>2</sub></i>
</pre>
</td>
</tr>

</table>
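
<p>The rewrites in the table can be mimicked in a few lines of Python. This
is only a sketch of the semantics, not Morel’s implementation; the
function names and the sample salaries are ours, and a predicate is
modeled as an ordinary function on a row.</p>

```python
def exists(rows, predicate):
    """exists ... where predicate, rewritten as count (...) > 0."""
    return sum(1 for r in rows if predicate(r)) > 0


def forall(rows, predicate):
    """forall ... require predicate, rewritten as not (exists ... where not predicate)."""
    return not exists(rows, lambda r: not predicate(r))


def implies(a, b):
    """a implies b, rewritten as (not a) orelse b."""
    return (not a) or b


salaries = [800.0, 1600.0, 1250.0]  # hypothetical sample data

print(exists(salaries, lambda sal: sal > 1000.0))  # True
print(forall(salaries, lambda sal: sal > 1000.0))  # False
print(forall([], lambda sal: sal > 1000.0))        # True
print(implies(False, False))                       # True
```

<p>Note that the rewrite of <code class="language-plaintext highlighter-rouge">forall</code> gives <code class="language-plaintext highlighter-rouge">true</code> for an empty input
without any special case: there is no row for which the negated
predicate holds.</p>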

<p>To avoid confusion, the previous <code class="language-plaintext highlighter-rouge">Relational.exists</code> function, which
tests whether a list is non-empty (the opposite of <code class="language-plaintext highlighter-rouge">List.null</code>),
has been renamed <code class="language-plaintext highlighter-rouge">Relational.nonEmpty</code>.</p>

<h3 id="example-1-existential-quantification">Example 1. Existential quantification</h3>

<p>Are there any employees with a salary greater than 1,000?</p>

<!-- morel silent
val emps = scott.emps;
> val emps = <relation>
>   :
>     {comm:real, deptno:int, empno:int, ename:string, hiredate:string,
>      job:string, mgr:int, sal:real} bag
val depts = scott.depts;
> val depts = <relation> : {deptno:int, dname:string, loc:string} bag
-->
<!-- morel skip
(* Using new "exists" keyword. *)
exists e in emps
  where e.sal > 1000.0;
> val it = true : bool

(* Equivalent using "from" and "List.null". *)
not (List.null (from e in emps where e.sal > 1000.0));
> val it = true : bool

(* Equivalent using "from" and "compute". *)
(from e in emps
  where e.sal > 1000.0
  compute count) > 0
> val it = true : bool
-->

<div class="code-block">
<div class="code-input"><span class="c">(*</span><span class="cm"> Using new "exists" keyword. *)</span>
<span class="kr">exists</span> <span class="n">e</span> <span class="kr">in</span> <span class="n">emps</span>
  <span class="kr">where</span> <span class="nn">e</span><span class="p">.</span><span class="n">sal</span> <span class="o">&gt;</span> <span class="mi">1000</span><span class="p">.</span><span class="mi">0</span><span class="p">;</span></div>
<div class="code-output">val it = true : bool</div>
<div class="code-input">
<span class="c">(*</span><span class="cm"> Equivalent using "from" and "List.null". *)</span>
<span class="kr">not</span> <span class="p">(</span><span class="nn">List</span><span class="p">.</span><span class="n">null</span> <span class="p">(</span><span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="n">emps</span> <span class="kr">where</span> <span class="nn">e</span><span class="p">.</span><span class="n">sal</span> <span class="o">&gt;</span> <span class="mi">1000</span><span class="p">.</span><span class="mi">0</span><span class="p">));</span></div>
<div class="code-output">val it = true : bool</div>
<div class="code-input">
<span class="c">(*</span><span class="cm"> Equivalent using "from" and "compute". *)</span>
<span class="p">(</span><span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="n">emps</span>
  <span class="kr">where</span> <span class="nn">e</span><span class="p">.</span><span class="n">sal</span> <span class="o">&gt;</span> <span class="mi">1000</span><span class="p">.</span><span class="mi">0</span>
  <span class="kr">compute</span> <span class="n">count</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">0</span></div>
<div class="code-output">val it = true : bool</div>
</div>

<p>The logic expression “∃ <em>e</em> ∈ <em>set</em>. <em>predicate</em>” maps to the
Morel expression “<code><b>exists</b> <i>e</i> <b>in</b> <i>set</i>
<b>where</b> <i>predicate</i></code>”.</p>

<p><code class="language-plaintext highlighter-rouge">exists</code> has a syntax very similar to <code class="language-plaintext highlighter-rouge">from</code>; it starts a query
pipeline, and therefore you may add steps such as <code class="language-plaintext highlighter-rouge">join</code>, <code class="language-plaintext highlighter-rouge">group</code>,
and <code class="language-plaintext highlighter-rouge">order</code>. The expression yields <code class="language-plaintext highlighter-rouge">true</code> if the query returns at
least one row.</p>

<h3 id="example-2-universal-quantification">Example 2. Universal quantification</h3>

<p>Do all programmers have a salary greater than 900?</p>

<!-- morel skip
(* Using new "forall" keyword. *)
forall e in emps
  where e.job = "PROGRAMMER"
  require e.sal > 900.0;
> val it = true : bool

(* Equivalent using "exists". *)
not (exists e in emps
  where e.job = "PROGRAMMER"
  andalso not (e.sal > 900.0));
> val it = true : bool

(* Equivalent using "from" and "List.null". *)
List.null (from e in emps
           where e.job = "PROGRAMMER" andalso not (e.sal > 900.0));
> val it = true : bool
-->

<div class="code-block">
<div class="code-input"><span class="c">(*</span><span class="cm"> Using new "forall" keyword. *)</span>
<span class="kr">forall</span> <span class="n">e</span> <span class="kr">in</span> <span class="n">emps</span>
  <span class="kr">where</span> <span class="nn">e</span><span class="p">.</span><span class="n">job</span> <span class="p">=</span> <span class="s2">"PROGRAMMER"</span>
  <span class="kr">require</span> <span class="nn">e</span><span class="p">.</span><span class="n">sal</span> <span class="o">&gt;</span> <span class="mi">900</span><span class="p">.</span><span class="mi">0</span><span class="p">;</span></div>
<div class="code-output">val it = true : bool</div>
<div class="code-input">
<span class="c">(*</span><span class="cm"> Equivalent using "exists". *)</span>
<span class="kr">not</span> <span class="p">(</span><span class="kr">exists</span> <span class="n">e</span> <span class="kr">in</span> <span class="n">emps</span>
  <span class="kr">where</span> <span class="nn">e</span><span class="p">.</span><span class="n">job</span> <span class="p">=</span> <span class="s2">"PROGRAMMER"</span>
  <span class="kr">andalso</span> <span class="kr">not</span> <span class="p">(</span><span class="nn">e</span><span class="p">.</span><span class="n">sal</span> <span class="o">&gt;</span> <span class="mi">900</span><span class="p">.</span><span class="mi">0</span><span class="p">));</span></div>
<div class="code-output">val it = true : bool</div>
<div class="code-input">
<span class="c">(*</span><span class="cm"> Equivalent using "from" and "List.null". *)</span>
<span class="nn">List</span><span class="p">.</span><span class="n">null</span> <span class="p">(</span><span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="n">emps</span>
           <span class="kr">where</span> <span class="nn">e</span><span class="p">.</span><span class="n">job</span> <span class="p">=</span> <span class="s2">"PROGRAMMER"</span> <span class="kr">andalso</span> <span class="kr">not</span> <span class="p">(</span><span class="nn">e</span><span class="p">.</span><span class="n">sal</span> <span class="o">&gt;</span> <span class="mi">900</span><span class="p">.</span><span class="mi">0</span><span class="p">));</span></div>
<div class="code-output">val it = true : bool</div>
</div>

<p>The logic expression “∀ <em>e</em> ∈ <em>set</em>. <em>predicate</em>” maps to
the Morel expression “<code><b>forall</b> <i>e</i> <b>in</b>
<i>set</i> <b>require</b> <i>predicate</i></code>”.</p>

<p>Like <code class="language-plaintext highlighter-rouge">from</code> and <code class="language-plaintext highlighter-rouge">exists</code>, <code class="language-plaintext highlighter-rouge">forall</code> starts a pipeline, but that
pipeline must end with a <code class="language-plaintext highlighter-rouge">require</code> step. The query evaluates to <code class="language-plaintext highlighter-rouge">true</code>
if the predicate returns true for all rows that make it into
<code class="language-plaintext highlighter-rouge">require</code>. (If there are no rows, the result is trivially <code class="language-plaintext highlighter-rouge">true</code>.)</p>

<h3 id="example-3-implication">Example 3. Implication</h3>

<p>Why did we add <code class="language-plaintext highlighter-rouge">require</code> when <code class="language-plaintext highlighter-rouge">where</code> does almost the same thing?  Our
initial syntax only had <code class="language-plaintext highlighter-rouge">where</code>, and to solve the previous query many
people would end up writing something like this:</p>

<!-- morel skip
(* A query using "where" is invalid and not equivalent to the
   original query. *)
forall e in emps
  where e.job = "PROGRAMMER" andalso e.sal > 900.0;
-->

<div class="code-block">
<div class="code-input"><span class="c">(*</span><span class="cm"> A query using "where" is invalid and not equivalent to the
   original query. *)</span>
<span class="kr">forall</span> <span class="n">e</span> <span class="kr">in</span> <span class="n">emps</span>
  <span class="kr">where</span> <span class="nn">e</span><span class="p">.</span><span class="n">job</span> <span class="p">=</span> <span class="s2">"PROGRAMMER"</span> <span class="kr">andalso</span> <span class="nn">e</span><span class="p">.</span><span class="n">sal</span> <span class="o">&gt;</span> <span class="mi">900</span><span class="p">.</span><span class="mi">0</span><span class="p">;</span></div>
</div>

<p>This query is invalid (we decided that a <code class="language-plaintext highlighter-rouge">forall</code> query must end in
<code class="language-plaintext highlighter-rouge">require</code>, for reasons that will become clear) but even if it were
valid, it would be incorrect.  It is telling us whether all employees
are programmers who earn more than 900. If just one employee were a
manager earning 1,200, the query would return false. Not what we wanted!</p>

<p>So we added the <code class="language-plaintext highlighter-rouge">require</code> step. You can use <code class="language-plaintext highlighter-rouge">where</code> steps (and any other
relational operators you like) to narrow down to a population you wish
to check (in this case, programmers), and finish with <code class="language-plaintext highlighter-rouge">require</code> to
check each member of that population.</p>
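
<p>The difference between narrowing the population and checking it can be
seen in a small Python sketch. The three employees below are made up for
illustration and are not rows of the <code class="language-plaintext highlighter-rouge">scott.emps</code> table; the list
comprehension stands in for a <code class="language-plaintext highlighter-rouge">where</code> step and <code class="language-plaintext highlighter-rouge">all</code> stands in
for <code class="language-plaintext highlighter-rouge">require</code>.</p>

```python
# Hypothetical employees, invented for this sketch.
emps = [
    {"ename": "ADA", "job": "PROGRAMMER", "sal": 950.0},
    {"ename": "BOB", "job": "PROGRAMMER", "sal": 1100.0},
    {"ename": "CAT", "job": "MANAGER", "sal": 1200.0},
]

# "where" narrows the population; "require" checks every row that remains.
programmers = [e for e in emps if e["job"] == "PROGRAMMER"]
narrowed = all(e["sal"] > 900.0 for e in programmers)

# Folding both conditions into one predicate over all employees asks a
# different question: is everyone a programmer earning more than 900?
conjoined = all(e["job"] == "PROGRAMMER" and e["sal"] > 900.0 for e in emps)

# An empty population passes trivially.
analysts = [e for e in emps if e["job"] == "ANALYST"]
no_analysts = all(e["sal"] > 900.0 for e in analysts)

print(narrowed)     # True
print(conjoined)    # False, because CAT is a manager
print(no_analysts)  # True
```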

<p>But it got us thinking about other ways to write the query. How would
you write it if you were a logician? Probably like this:</p>

<!-- morel skip
(* Valid, and equivalent to the original query. *)
forall e in emps
  require not (e.job = "PROGRAMMER") orelse e.sal > 900.0;
> val it = true : bool
-->

<div class="code-block">
<div class="code-input"><span class="c">(*</span><span class="cm"> Valid, and equivalent to the original query. *)</span>
<span class="kr">forall</span> <span class="n">e</span> <span class="kr">in</span> <span class="n">emps</span>
  <span class="kr">require</span> <span class="kr">not</span> <span class="p">(</span><span class="nn">e</span><span class="p">.</span><span class="n">job</span> <span class="p">=</span> <span class="s2">"PROGRAMMER"</span><span class="p">)</span> <span class="kr">orelse</span> <span class="nn">e</span><span class="p">.</span><span class="n">sal</span> <span class="o">&gt;</span> <span class="mi">900</span><span class="p">.</span><span class="mi">0</span><span class="p">;</span></div>
<div class="code-output">val it = true : bool</div>
</div>

<p>The query considers the whole population (all employees, including
programmers and managers) and crafts the predicate so that rows not in
the population always pass. The new <code class="language-plaintext highlighter-rouge">implies</code> operator lets you do
just that: <code><i>a</i> <b>implies</b> <i>b</i></code> evaluates to
true if <code><i>a</i></code> is false, <code><i>b</i></code>
otherwise.</p>

<!-- morel skip
(* Valid, equivalent to the original query, and we think
   quite nice even if you're not a logician. *)
forall e in emps
  require e.job = "PROGRAMMER" implies e.sal > 900.0;
> val it = true : bool
-->

<div class="code-block">
<div class="code-input"><span class="c">(*</span><span class="cm"> Valid, equivalent to the original query, and we think
   quite nice even if you're not a logician. *)</span>
<span class="kr">forall</span> <span class="n">e</span> <span class="kr">in</span> <span class="n">emps</span>
  <span class="kr">require</span> <span class="nn">e</span><span class="p">.</span><span class="n">job</span> <span class="p">=</span> <span class="s2">"PROGRAMMER"</span> <span class="kr">implies</span> <span class="nn">e</span><span class="p">.</span><span class="n">sal</span> <span class="o">&gt;</span> <span class="mi">900</span><span class="p">;</span></div>
<div class="code-output">val it = true : bool</div>
</div>

<p>The <code class="language-plaintext highlighter-rouge">implies</code> operator is left-associative, like most operators (including <code class="language-plaintext highlighter-rouge">orelse</code>,
<code class="language-plaintext highlighter-rouge">andalso</code> and arithmetic <code class="language-plaintext highlighter-rouge">-</code>, <code class="language-plaintext highlighter-rouge">mod</code> and <code class="language-plaintext highlighter-rouge">/</code>).
“<code><i>a</i> <b>implies</b> <i>b</i> <b>implies</b>
<i>c</i></code>” is equivalent to “<code>(<i>a</i> <b>implies</b>
<i>b</i>) <b>implies</b> <i>c</i></code>,” and hence to
“<code><b>not</b> (<b>not</b> <i>a</i> <b>orelse</b> <i>b</i>)
<b>orelse</b> <i>c</i></code>.”</p>
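
<p>Associativity matters here because the two groupings can disagree.
As a small check, with explicit parentheses so that the result does not
depend on how the parser groups the operators, take all three operands
to be <code class="language-plaintext highlighter-rouge">false</code>:</p>

<div class="code-block">
<div class="code-input">(* Left-associative grouping: (false implies false) is true,
   and true implies false is false. *)
(false implies false) implies false;</div>
<div class="code-output">val it = false : bool</div>
</div>

<div class="code-block">
<div class="code-input">(* Right-associative grouping, for contrast: false implies
   anything is true. *)
false implies (false implies false);</div>
<div class="code-output">val it = true : bool</div>
</div>

<p>If you intend the right-associative reading, which is the usual
convention in logic textbooks, write the parentheses explicitly.</p>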

<h1 id="2-record-update">2. Record update</h1>

<p>While thinking about
<a href="/2025/04/14/dml-in-morel.html">syntax for updating tables</a>
we needed a way to change just one field of a record.</p>

<p>Suppose that the <code class="language-plaintext highlighter-rouge">emps</code> table has eight fields and <code class="language-plaintext highlighter-rouge">emp</code> represents
one row from that table:</p>

<!-- morel skip
val emp = List.hd scott.emps;
> val emp =
>   {comm=0.0,deptno=20,empno=7369,ename="SMITH",hiredate="1980-12-16",
>    job="CLERK",mgr=7902,sal=800.0}
>   : {comm:real, deptno:int, empno:int, ename:string, hiredate:string,
>      job:string, mgr:int, sal:real}
-->

<div class="code-block">
<div class="code-input"><span class="kr">val</span> <span class="nv">emp</span> <span class="p">=</span> <span class="nn">List</span><span class="p">.</span><span class="n">hd</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span><span class="p">;</span></div>
<div class="code-output">val emp =
  {comm=0.0,deptno=20,empno=7369,ename="SMITH",hiredate="1980-12-16",
   job="CLERK",mgr=7902,sal=800.0}
  : {comm:real, deptno:int, empno:int, ename:string, hiredate:string,
     job:string, mgr:int, sal:real}</div>
</div>

<p>If you want to change just the <code class="language-plaintext highlighter-rouge">sal</code> field, conventional record syntax
requires you to copy the other seven fields:</p>

<!-- morel skip
val emp2 = {emp.comm, emp.deptno, emp.empno, emp.ename, emp.hiredate,
    emp.job, emp.mgr, sal = emp.sal * 2.0};
> val emp2 =
>   {comm=0.0,deptno=20,empno=7369,ename="SMITH",hiredate="1980-12-16",
>    job="CLERK",mgr=7902,sal=1600.0}
>   : {comm:real, deptno:int, empno:int, ename:string, hiredate:string,
>      job:string, mgr:int, sal:real}
-->

<div class="code-block">
<div class="code-input"><span class="kr">val</span> <span class="nv">emp2</span> <span class="p">=</span> <span class="p">{</span><span class="nn">emp</span><span class="p">.</span><span class="n">comm</span><span class="p">,</span> <span class="nn">emp</span><span class="p">.</span><span class="n">deptno</span><span class="p">,</span> <span class="nn">emp</span><span class="p">.</span><span class="n">empno</span><span class="p">,</span> <span class="nn">emp</span><span class="p">.</span><span class="n">ename</span><span class="p">,</span> <span class="nn">emp</span><span class="p">.</span><span class="n">hiredate</span><span class="p">,</span>
    <span class="nn">emp</span><span class="p">.</span><span class="n">job</span><span class="p">,</span> <span class="nn">emp</span><span class="p">.</span><span class="n">mgr</span><span class="p">,</span> <span class="n">sal</span> <span class="p">=</span> <span class="nn">emp</span><span class="p">.</span><span class="n">sal</span> <span class="o">*</span> <span class="mi">2</span><span class="p">.</span><span class="mi">0</span><span class="p">};</span></div>
<div class="code-output">val emp2 =
  {comm=0.0,deptno=20,empno=7369,ename="SMITH",hiredate="1980-12-16",
   job="CLERK",mgr=7902,sal=1600.0}
  : {comm:real, deptno:int, empno:int, ename:string, hiredate:string,
     job:string, mgr:int, sal:real}</div>
</div>

<p>Being a functional programming language, Morel avoids direct field
mutation since mutation often leads to bugs. But creating an entirely
new record doesn’t have to be so verbose.</p>

<p>We have borrowed OCaml’s
<a href="https://github.com/hydromatic/morel/issues/249">syntax for functional update of records</a>:</p>

<!-- morel skip
val emp3 = {emp with sal = emp.sal * 2.0};
> val emp3 =
>   {comm=0.0,deptno=20,empno=7369,ename="SMITH",hiredate="1980-12-16",
>    job="CLERK",mgr=7902,sal=1600.0}
>   : {comm:real, deptno:int, empno:int, ename:string, hiredate:string,
>      job:string, mgr:int, sal:real}
-->

<div class="code-block">
<div class="code-input"><span class="kr">val</span> <span class="nv">emp3</span> <span class="p">=</span> <span class="p">{</span><span class="n">emp</span> <span class="kr">with</span> <span class="n">sal</span> <span class="p">=</span> <span class="nn">emp</span><span class="p">.</span><span class="n">sal</span> <span class="o">*</span> <span class="mi">2</span><span class="p">.</span><span class="mi">0</span><span class="p">};</span></div>
<div class="code-output">val emp3 =
  {comm=0.0,deptno=20,empno=7369,ename="SMITH",hiredate="1980-12-16",
   job="CLERK",mgr=7902,sal=1600.0}
  : {comm:real, deptno:int, empno:int, ename:string, hiredate:string,
     job:string, mgr:int, sal:real}</div>
</div>

<p>The <code class="language-plaintext highlighter-rouge">with</code> keyword in a record expression tells Morel to copy the
original record (<code class="language-plaintext highlighter-rouge">emp</code> in this case) and modify only the fields
specified.</p>
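
<p>Because the update is functional, the original record should be
left untouched. Here is a minimal sketch that checks this, using a
made-up two-field record <code class="language-plaintext highlighter-rouge">r</code> (an assumption for illustration, not
part of the <code class="language-plaintext highlighter-rouge">scott</code> database) so that it runs without any data set:</p>

<div class="code-block">
<div class="code-input">(* r is a hypothetical record used only for this illustration. *)
val r = {ename = "SMITH", sal = 800.0};</div>
<div class="code-output">val r = {ename="SMITH",sal=800.0} : {ename:string, sal:real}</div>
</div>

<div class="code-block">
<div class="code-input">(* Copy r, replacing only the sal field. *)
val r2 = {r with sal = r.sal * 2.0};</div>
<div class="code-output">val r2 = {ename="SMITH",sal=1600.0} : {ename:string, sal:real}</div>
</div>

<div class="code-block">
<div class="code-input">(* The original record still has the old salary. *)
(r.sal, r2.sal);</div>
<div class="code-output">val it = (800.0,1600.0) : real * real</div>
</div>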

<h1 id="3-tabular-mode">3. Tabular mode</h1>

<p>When we give demos of Morel queries and programs, people quickly
understand that the programs are reading data from a database (such as
the <code class="language-plaintext highlighter-rouge">emps</code> table in the
<a href="https://github.com/julianhyde/scott-data-hsqldb"><code class="language-plaintext highlighter-rouge">scott</code></a> database
that is so ubiquitous in Morel’s examples and documentation). But it
takes them a bit longer to realize that Morel is actually executing
queries. Why is that?</p>

<p>Maybe it’s because the output doesn’t look much like the result of a query.</p>

<!-- morel skip
from d in scott.depts
    join e in scott.emps on e.deptno = d.deptno
  where e.job = "CLERK"
  yield {d.dname, e.empno};
> val it =
>   [{dname="ACCOUNTING",empno=7934},{dname="RESEARCH",empno=7369},
>    {dname="RESEARCH",empno=7876},{dname="SALES",empno=7900}]
>   : {dname:string, empno:int} list
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">d</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">depts</span>
    <span class="kr">join</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span> <span class="kr">on</span> <span class="nn">e</span><span class="p">.</span><span class="n">deptno</span> <span class="p">=</span> <span class="nn">d</span><span class="p">.</span><span class="n">deptno</span>
  <span class="kr">where</span> <span class="nn">e</span><span class="p">.</span><span class="n">job</span> <span class="p">=</span> <span class="s2">"CLERK"</span>
  <span class="kr">yield</span> <span class="p">{</span><span class="nn">d</span><span class="p">.</span><span class="n">dname</span><span class="p">,</span> <span class="nn">e</span><span class="p">.</span><span class="n">empno</span><span class="p">};</span></div>
<div class="code-output">val it =
  [{dname="ACCOUNTING",empno=7934},{dname="RESEARCH",empno=7369},
   {dname="RESEARCH",empno=7876},{dname="SALES",empno=7900}]
  : {dname:string, empno:int} list</div>
</div>

<p>By default, result sets are condensed into a compact representation,
which works well for many of Morel’s supported data types. However,
this format obscures the natural structure of tabular data. Let’s use
the new <a href="https://github.com/hydromatic/morel/issues/259">tabular mode</a>:</p>

<!-- morel skip
set ("output", "tabular");
> val it = () : unit
-->

<div class="code-block">
<div class="code-input"><span class="n">set</span> <span class="p">(</span><span class="s2">"output"</span><span class="p">,</span> <span class="s2">"tabular"</span><span class="p">);</span></div>
<div class="code-output">val it = () : unit</div>
</div>

<p>Now the tabular structure is clear:</p>

<!-- morel skip
from d in scott.depts
    join e in scott.emps on e.deptno = d.deptno
  where e.job = "CLERK"
  yield {d.dname, e.empno};
> dname      empno
> ---------- -----
> ACCOUNTING  7934
> RESEARCH    7369
> RESEARCH    7876
> SALES       7900
>
> val it : {dname:string, empno:int} list
-->

<div class="code-block">
<div class="code-input"><span class="kr">from</span> <span class="nv">d</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">depts</span>
    <span class="kr">join</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span> <span class="kr">on</span> <span class="nn">e</span><span class="p">.</span><span class="n">deptno</span> <span class="p">=</span> <span class="nn">d</span><span class="p">.</span><span class="n">deptno</span>
  <span class="kr">where</span> <span class="nn">e</span><span class="p">.</span><span class="n">job</span> <span class="p">=</span> <span class="s2">"CLERK"</span>
  <span class="kr">yield</span> <span class="p">{</span><span class="nn">d</span><span class="p">.</span><span class="n">dname</span><span class="p">,</span> <span class="nn">e</span><span class="p">.</span><span class="n">empno</span><span class="p">};</span></div>
<div class="code-output">dname      empno
---------- -----
ACCOUNTING  7934
RESEARCH    7369
RESEARCH    7876
SALES       7900

val it : {dname:string, empno:int} list</div>
</div>

<p>Tabular mode only activates if the data set is a list of records;
otherwise it falls back to the “classic” mode.  There’s still much
room for improvement – such as how values are formatted, and handling
types like <code class="language-plaintext highlighter-rouge">option</code>, nested records, and nested lists of integers and
strings – but at least your tabular data now looks tabular.</p>
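
<p>To illustrate the fallback, here is a small sketch with tabular mode
switched on. The value is a list of integers rather than a list of
records, so we would expect it to be printed in the compact format:</p>

<div class="code-block">
<div class="code-input">set ("output", "tabular");</div>
<div class="code-output">val it = () : unit</div>
</div>

<div class="code-block">
<div class="code-input">(* Not a list of records, so tabular mode does not apply. *)
[1, 2, 3];</div>
<div class="code-output">val it = [1,2,3] : int list</div>
</div>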

<h1 id="4-unifier-performance-improvements">4. Unifier performance improvements</h1>

<p>Type inference is at the heart of Morel. We believe that a
well-designed, strong type system is the foundation for maintainable
software, but also that programmers shouldn’t need to sprinkle types
all over their program to make it compile. Therefore the language needs
to be able to infer types for itself, and the gold standard is the
<a href="https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_system">Hindley-Milner</a>
type inference algorithm.</p>

<p>That algorithm needs continuous improvement. For the upcoming
<a href="https://github.com/hydromatic/morel/issues/235">ordered and unordered multisets</a>
feature we need
<a href="https://github.com/hydromatic/morel/issues/237">overloaded operators</a>,
and so we are <a href="https://dl.acm.org/doi/pdf/10.1145/224164.224195">extending the type system</a>.</p>

<p>In this release, we have
<a href="https://github.com/hydromatic/morel/issues/246">tuned the internal data structures</a>
of the unification algorithm that underlies type inference. Expect
further evolution in future releases.</p>

<h1 id="conclusion">Conclusion</h1>

<p>It has been two months since the previous release and, with luck,
another release of Morel will follow in a few weeks.  Work continues on the features for that
release; we are especially looking forward to launching
<a href="https://github.com/hydromatic/morel/issues/235">multisets</a>, when they
are ready.</p>

<p>Until then, give Morel a try. Go to
<a href="https://github.com/hydromatic/morel">GitHub</a> and you can have Morel
built and running in under a minute.</p>

<p>If you have comments, please reply on
<a href="https://bsky.app/profile/julianhyde.bsky.social">Bluesky @julianhyde.bsky.social</a>
or Twitter:</p>

<div data_dnt="true">
<div class="jekyll-twitter-plugin"><blockquote class="twitter-tweet" data-cards="hidden"><p lang="en" dir="ltr">I&#39;m please to announce release 0.6 of <a href="https://twitter.com/morel_lang?ref_src=twsrc%5Etfw">@morel_lang</a>, just 2 months after 0.5. The release adds &#39;forall&#39;, &#39;exists&#39; and &#39;implies&#39; keywords for logic programming, &#39;with&#39; for easy record updates, and a tabular output mode to the shell. <a href="https://t.co/XDlFQLcv4K">https://t.co/XDlFQLcv4K</a></p>&mdash; Julian Hyde (@julianhyde) <a href="https://twitter.com/julianhyde/status/1918550796335227174?ref_src=twsrc%5Etfw">May 3, 2025</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

</div>
</div>

<p>This article
<a href="https://github.com/julianhyde/share/commits/main/blog/_posts/2025-05-02-morel-release-0-6-0.md">has been updated</a>.</p>]]></content><author><name>Julian Hyde</name></author><summary type="html"><![CDATA[I am pleased to announce Morel release 0.6.0, just two months after release 0.5.0.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://blog.hydromatic.net/assets/img/OldDesignShop_MushroomSpringMorel-240x240.jpg" /><media:content medium="image" url="http://blog.hydromatic.net/assets/img/OldDesignShop_MushroomSpringMorel-240x240.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">DML in Morel</title><link href="http://blog.hydromatic.net/2025/04/14/dml-in-morel.html" rel="alternate" type="text/html" title="DML in Morel" /><published>2025-04-14T13:00:00-07:00</published><updated>2025-04-14T13:00:00-07:00</updated><id>http://blog.hydromatic.net/2025/04/14/dml-in-morel</id><content type="html" xml:base="http://blog.hydromatic.net/2025/04/14/dml-in-morel.html"><![CDATA[<p>Until last week, I hadn’t thought much about
<a href="https://en.wikipedia.org/wiki/Data_manipulation_language">DML</a> in
<a href="/2020/02/25/morel-a-functional-language-for-data.html">Morel</a>.
I knew we would support DML, of course, but I had assumed that we
would just add commands analogous to SQL’s <code class="language-plaintext highlighter-rouge">INSERT</code>, <code class="language-plaintext highlighter-rouge">UPDATE</code>,
<code class="language-plaintext highlighter-rouge">DELETE</code> commands.</p>

<p>But then, over a beer, my brother asked me whether Morel would have a
<code class="language-plaintext highlighter-rouge">MERGE</code> command. I almost choked on my
<a href="https://www.originalpatternbeer.com/#draft-section">pale ale</a>. I
realized that <code class="language-plaintext highlighter-rouge">MERGE</code> (also known as <code class="language-plaintext highlighter-rouge">UPSERT</code>) and the other SQL DML
commands brought a complexity that could damage the fragile design of
a new language. I started thinking about the whole DML problem. DML is
about mutation, Morel is a functional programming language, and
mutation and FP have always been
<a href="https://en.wikipedia.org/wiki/Monad_(functional_programming)#State_monads">uneasy bedfellows</a>.
So I devised some language extensions that I think fit both Morel and
modern data engineering workflows.</p>

<p>In this post, I describe those DML extensions, and related ideas about
transactions, incremental processing, and view maintenance.</p>

<p>I would love to hear feedback from people who do data engineering, ELT
and ETL, and love or hate how current tools and query languages solve
the problem. My extensions to Morel are at an early stage, so there is
plenty of time to change course. I will be speaking next week at
<a href="https://www.datacouncil.ai/talks/more-than-query-future-directions-of-query-langages-from-sql-to-morel?hsLang=en">Data Council 2025</a>,
so if you are in Oakland, please tell me what you think!</p>

<h1 id="translate-sql-dml-commands-to-morel">Translate SQL DML commands to Morel</h1>

<p>The obvious thing would be to directly translate SQL’s DML commands
into Morel. To see how that would look, let’s consider a simple SQL
script that executes the three DML commands on the <code class="language-plaintext highlighter-rouge">emps</code> table of the
<a href="https://github.com/julianhyde/scott-data-hsqldb">scott</a> data set, and
then commits the changes.</p>

<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="c1">-- Delete employees who earn more than 1,000.</span>
<span class="k">DELETE</span> <span class="k">FROM</span> <span class="n">scott</span><span class="p">.</span><span class="n">emps</span>
<span class="k">WHERE</span> <span class="n">sal</span> <span class="o">&gt;</span> <span class="mi">1000</span><span class="p">;</span>

<span class="c1">-- Add one employee.</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">scott</span><span class="p">.</span><span class="n">emps</span> <span class="p">(</span><span class="n">empno</span><span class="p">,</span> <span class="n">deptno</span><span class="p">,</span> <span class="n">ename</span><span class="p">,</span> <span class="n">job</span><span class="p">,</span> <span class="n">sal</span><span class="p">)</span>
<span class="k">VALUES</span> <span class="p">(</span><span class="mi">100</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="s1">'HYDE'</span><span class="p">,</span> <span class="s1">'ANALYST'</span><span class="p">,</span> <span class="mi">1150</span><span class="p">);</span>

<span class="c1">-- Double the salary of all managers.</span>
<span class="k">UPDATE</span> <span class="n">scott</span><span class="p">.</span><span class="n">emps</span>
<span class="k">SET</span> <span class="n">sal</span> <span class="o">=</span> <span class="n">sal</span> <span class="o">*</span> <span class="mi">2</span>
<span class="k">WHERE</span> <span class="n">job</span> <span class="o">=</span> <span class="s1">'MANAGER'</span><span class="p">;</span>

<span class="c1">-- Commit.</span>
<span class="k">COMMIT</span><span class="p">;</span></code></pre></figure>

<p>If we added <code class="language-plaintext highlighter-rouge">insert</code>, <code class="language-plaintext highlighter-rouge">update</code>, and <code class="language-plaintext highlighter-rouge">delete</code> operators to Morel, this
could become the following code.</p>

<!-- morel skip
(* Delete employees who earn more than 1,000. *)
delete e in scott.emps
  where e.sal > 1000;

(* Add one employee. *)
insert scott.emps
  [{empno = 100, deptno = 20, ename = "HYDE", job = "ANALYST", sal = 1150}];

(* Double the salary of all managers. *)
update e in scott.emps
  where e.job = "MANAGER"
  assign (e, {e with sal = e.sal * 2});

(* Commit. *)
commit;
-->

<div class="code-block">
<div class="code-input"><span class="c">(*</span><span class="cm"> Delete employees who earn more than 1,000. *)</span>
<span class="kr">delete</span> <span class="n">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span>
  <span class="kr">where</span> <span class="nn">e</span><span class="p">.</span><span class="n">sal</span> <span class="o">&gt;</span> <span class="mi">1000</span><span class="p">;</span>

<span class="c">(*</span><span class="cm"> Add one employee. *)</span>
<span class="kr">insert</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span>
  <span class="p">[{</span><span class="n">empno</span> <span class="p">=</span> <span class="mi">100</span><span class="p">,</span> <span class="n">deptno</span> <span class="p">=</span> <span class="mi">20</span><span class="p">,</span> <span class="n">ename</span> <span class="p">=</span> <span class="s2">"HYDE"</span><span class="p">,</span> <span class="n">job</span> <span class="p">=</span> <span class="s2">"ANALYST"</span><span class="p">,</span> <span class="n">sal</span> <span class="p">=</span> <span class="mi">1150</span><span class="p">}];</span>

<span class="c">(*</span><span class="cm"> Double the salary of all managers. *)</span>
<span class="kr">update</span> <span class="n">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span>
  <span class="kr">where</span> <span class="nn">e</span><span class="p">.</span><span class="n">job</span> <span class="p">=</span> <span class="s2">"MANAGER"</span>
  <span class="kr">assign</span> <span class="p">(</span><span class="n">e</span><span class="p">,</span> <span class="p">{</span><span class="n">e</span> <span class="kr">with</span> <span class="n">sal</span> <span class="p">=</span> <span class="nn">e</span><span class="p">.</span><span class="n">sal</span> <span class="o">*</span> <span class="mi">2</span><span class="p">});</span>

<span class="c">(*</span><span class="cm"> Commit. *)</span>
<span class="kr">commit</span><span class="p">;</span></div>
</div>

<p>Note that <code class="language-plaintext highlighter-rouge">update</code> has an <code class="language-plaintext highlighter-rouge">assign</code> clause that updates
the current record, and we borrow the <code class="language-plaintext highlighter-rouge">with</code>
<a href="https://github.com/hydromatic/morel/issues/249">functional update notation</a>
from <a href="https://ocaml.org/manual/5.3/coreexamples.html#s:tut-recvariants">OCaml</a>.</p>

<p>But this doesn’t look right to me. The <code class="language-plaintext highlighter-rouge">insert</code>, <code class="language-plaintext highlighter-rouge">update</code>, and <code class="language-plaintext highlighter-rouge">delete</code>
operators all mutate their target data set – which conflicts with
functional programming’s immutability ethos – and their syntax only
uses parts of Morel’s elegant <code class="language-plaintext highlighter-rouge">from</code> expression.</p>

<p>Also, let’s remind ourselves what we are trying to achieve.</p>

<h1 id="requirements-for-a-data-engineering-language">Requirements for a data engineering language</h1>

<p>Things a data engineering language needs to do well:</p>

<ul>
  <li>Refer to any previous version of a data set (committed or uncommitted).</li>
  <li>Work in terms of deltas between data sets, or versions of a data
set, if that is the most natural way to frame the problem.</li>
  <li>Support data sets other than tables: local files, data frames, cloud
object storage, non-SQL databases.</li>
  <li>Allow you to commit atomically when you are ready, and not force you
to commit before you are ready.</li>
  <li>Create intermediate data sets (temporary tables), and materialize
them if it improves performance.</li>
  <li>Give the compiler freedom to re-order commands, parallelize
commands, and create (or suggest) intermediate data sets.</li>
  <li>Provide static typing, so that you can find and refactor all uses of a
particular table or field across all queries, scripts, programs, and
jobs.</li>
  <li>Parameterize scripts.</li>
  <li>Refer to previous types of a table, e.g. when the table did not have
a <code class="language-plaintext highlighter-rouge">bonus</code> column, and when the <code class="language-plaintext highlighter-rouge">hiredate</code> column was a string.</li>
</ul>

<p>SQL DML commands fail most of these requirements. (For example, they
can refer to only one previous version of a data set, namely the
contents of the table at the start of the current command.) DML looks
better when considered through the lens of functional programming.</p>

<h1 id="dml-as-functional-programming">DML as functional programming</h1>

<p>Functional programming languages really don’t like assignment: they
would rather create a new variable than mutate an existing variable.</p>

<p>This makes sense for a functional language’s compiler – if a program
has no mutations, the compiler has more freedom to re-order and
parallelize the program – but the controller running a data engineering
script would also like to re-order and parallelize wherever possible.</p>

<p>SQL’s DML commands are weird. (Though many of us have been writing SQL
for so long that we’ve gotten used to the semantics.) In a command
that modifies the <code class="language-plaintext highlighter-rouge">emps</code> table, you can also read from a table called
<code class="language-plaintext highlighter-rouge">emps</code>, but it is a different table! It is the <em>before image</em>, the
state of the table before this command started inserting, modifying or
deleting rows.</p>

<p>It is reasonable, and useful, to refer to the before image, but it is
only the before image of the current command. Suppose your script has
three commands – a <code class="language-plaintext highlighter-rouge">DELETE</code>, an <code class="language-plaintext highlighter-rouge">INSERT</code> and an <code class="language-plaintext highlighter-rouge">UPDATE</code>, as in the
previous example – and the <code class="language-plaintext highlighter-rouge">UPDATE</code> command wishes to reference the
state of the <code class="language-plaintext highlighter-rouge">emps</code> table before the <code class="language-plaintext highlighter-rouge">INSERT</code> command was run. In SQL,
you’re out of luck. In the DML syntax I’m proposing for Morel, it is
straightforward:</p>

<!-- morel skip
(* Delete employees who earn more than 1,000. *)
val emps2 =
  from e in scott.emps
    where not (e.sal > 1000);

(* Add one employee. *)
val emps3 = emps2 union
  [{empno = 100, deptno = 20, ename = "HYDE", job = "ANALYST", sal = 1150}];

(* Double the salary of all managers. *)
val emps4 =
  from e in emps3
    yield if e.job = "MANAGER" then {e with sal = e.sal * 2} else e;

(* Commit. *)
commit {scott with emps = emps4};
-->

<div class="code-block">
<div class="code-input"><span class="c">(*</span><span class="cm"> Delete employees who earn more than 1,000. *)</span>
<span class="kr">val</span> <span class="nv">emps2</span> <span class="p">=</span>
  <span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="nn">scott</span><span class="p">.</span><span class="n">emps</span>
    <span class="kr">where</span> <span class="kr">not</span> <span class="p">(</span><span class="nn">e</span><span class="p">.</span><span class="n">sal</span> <span class="o">&gt;</span> <span class="mi">1000</span><span class="p">);</span>

<span class="c">(*</span><span class="cm"> Add one employee. *)</span>
<span class="kr">val</span> <span class="nv">emps3</span> <span class="p">=</span> <span class="n">emps2</span> <span class="kr">union</span>
  <span class="p">[{</span><span class="n">empno</span> <span class="p">=</span> <span class="mi">100</span><span class="p">,</span> <span class="n">deptno</span> <span class="p">=</span> <span class="mi">20</span><span class="p">,</span> <span class="n">ename</span> <span class="p">=</span> <span class="s2">"HYDE"</span><span class="p">,</span> <span class="n">job</span> <span class="p">=</span> <span class="s2">"ANALYST"</span><span class="p">,</span> <span class="n">sal</span> <span class="p">=</span> <span class="mi">1150</span><span class="p">}];</span>

<span class="c">(*</span><span class="cm"> Double the salary of all managers. *)</span>
<span class="kr">val</span> <span class="nv">emps4</span> <span class="p">=</span>
  <span class="kr">from</span> <span class="nv">e</span> <span class="kr">in</span> <span class="n">emps3</span>
    <span class="kr">yield</span> <span class="kr">if</span> <span class="nn">e</span><span class="p">.</span><span class="n">job</span> <span class="p">=</span> <span class="s2">"MANAGER"</span> <span class="kr">then</span> <span class="p">{</span><span class="n">e</span> <span class="kr">with</span> <span class="n">sal</span> <span class="p">=</span> <span class="nn">e</span><span class="p">.</span><span class="n">sal</span> <span class="o">*</span> <span class="mi">2</span><span class="p">}</span> <span class="kr">else</span> <span class="n">e</span><span class="p">;</span>

<span class="c">(*</span><span class="cm"> Commit. *)</span>
<span class="kr">commit</span> <span class="p">{</span><span class="n">scott</span> <span class="kr">with</span> <span class="n">emps</span> <span class="p">=</span> <span class="n">emps4</span><span class="p">};</span></div>
</div>

<p>The <code class="language-plaintext highlighter-rouge">insert</code>, <code class="language-plaintext highlighter-rouge">update</code>, and <code class="language-plaintext highlighter-rouge">delete</code> commands are no more, but we use
the new <code class="language-plaintext highlighter-rouge">with</code> operator during update and commit.</p>

<p>The <code class="language-plaintext highlighter-rouge">commit</code> command has changed considerably, and deserves at least
one blog post all to itself. It is now the only command that mutates
(although what mutation means in a language without side effects has
yet to be nailed down). Its argument is the value of a database (in
this case the <code class="language-plaintext highlighter-rouge">scott</code> database), which is a record with a field for
each table, and therefore allows us to commit changes to all tables
atomically; if we have changed only a few tables, <code class="language-plaintext highlighter-rouge">with</code> provides a
convenient shorthand. Representing a database as a record will (with
the <code class="language-plaintext highlighter-rouge">forall</code>
<a href="https://github.com/hydromatic/morel/issues/241">quantifier just added</a>,
and the proposed <code class="language-plaintext highlighter-rouge">check</code>
<a href="https://github.com/hydromatic/morel/issues/239">keyword on type specifications</a>)
allow us to define
<a href="https://github.com/hydromatic/morel/issues/240">complex constraints</a>
such as foreign keys.</p>

<p>Because we use a new variable for each version of the <code class="language-plaintext highlighter-rouge">emps</code> table,
the previous versions are still available.</p>
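
<p>The script above never actually reads an earlier version, so here
is a minimal sketch of an update that does. It uses the same proposed
syntax, and the three-field rows and the name <code class="language-plaintext highlighter-rouge">emps1</code> are made up
for illustration, so treat it as a sketch under those assumptions rather
than something guaranteed to run in a released Morel:</p>

<div class="code-block">
<div class="code-input">(* Hypothetical three-field version of the emps table. *)
val emps1 =
  [{empno = 1, job = "MANAGER", sal = 900},
   {empno = 2, job = "CLERK", sal = 1200}];

(* Delete employees who earn more than 1,000. *)
val emps2 =
  from e in emps1
    where not (e.sal &gt; 1000);

(* Add one employee. *)
val emps3 = emps2 union
  [{empno = 3, job = "MANAGER", sal = 950}];

(* Double the salary of managers who were already present in emps2,
   that is, before the insert. *)
val emps4 =
  from e in emps3
    yield if e.job = "MANAGER"
        andalso (exists e2 in emps2 where e2.empno = e.empno)
      then {e with sal = e.sal * 2}
      else e;</div>
</div>

<p>In this sketch, employee 1 is a manager who appears in <code class="language-plaintext highlighter-rouge">emps2</code>, so
the salary should be doubled to 1,800; employee 3 is a manager who was
added afterwards, so the salary should stay at 950.</p>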

<p>If you’re worried about the performance and storage requirements of
all of these copies of the <code class="language-plaintext highlighter-rouge">emps</code> table, don’t be. These copies are
virtual. For example, if your program asks for <code class="language-plaintext highlighter-rouge">emps2</code>, this might
expand to the contents of <code class="language-plaintext highlighter-rouge">emps3</code> minus the “delta” records
inserted. Or if you are using a table format such as
<a href="https://iceberg.apache.org/">Apache Iceberg</a> or
<a href="https://hudi.apache.org/">Hudi</a>, each version is probably just a
different subset of the underlying files. The proposed Morel syntax is
closer to how databases implement transactions internally (using
<a href="https://en.wikipedia.org/wiki/Multiversion_concurrency_control">snapshots</a>
and <a href="https://en.wikipedia.org/wiki/Write-ahead_logging">journals</a>).</p>

<h1 id="incremental-computation">Incremental computation</h1>

<p>Incremental computation is essential when you are maintaining large
data sets on disk. If you have a table with a billion rows and you
want to insert, update or delete 1% of those rows, it makes no sense
to write the 99% of rows that are not affected. The SQL <code class="language-plaintext highlighter-rouge">INSERT</code>,
<code class="language-plaintext highlighter-rouge">UPDATE</code> and <code class="language-plaintext highlighter-rouge">DELETE</code> commands are clearly designed with that in mind.</p>

<p>But modifying 1% of a collection is mutation, which is antithetical to
the functional programming ethos. It is more complicated to say “here
are the rows added, modified and removed” than to say “here is the new
relation”.</p>

<p>This proposal offers the best of both worlds. A Morel modification command
returns the new set but is implemented by sending incremental commands
to the database. For example, if the “Delete employees who earn more
than 1,000” command above deletes 1% of employees, then <code class="language-plaintext highlighter-rouge">emps2</code> will
contain the remaining 99% of employees, but Morel will update the
database efficiently by sending a <code class="language-plaintext highlighter-rouge">DELETE</code> command.</p>
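
<p>The following Python sketch illustrates the idea under some loud
assumptions: the <code class="language-plaintext highlighter-rouge">Emp</code> record, the <code class="language-plaintext highlighter-rouge">diff</code> and
<code class="language-plaintext highlighter-rouge">to_commands</code> helpers, and the sample data are all hypothetical,
and a real planner would more likely push the predicate into a single
<code class="language-plaintext highlighter-rouge">DELETE ... WHERE</code> than compare materialized sets. The program
states the new relation declaratively, and the commands sent to the
database are derived from the difference between the old and new
versions.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
```python
from typing import NamedTuple


class Emp(NamedTuple):
    id: int
    name: str
    sal: int


def diff(old, new):
    """Return (inserted, deleted): the rows that turn `old` into `new`."""
    return new - old, old - new


def to_commands(table, old, new):
    """Translate the difference between two versions into SQL text.

    Illustration only: real code would use bind parameters, not string
    formatting, and would also handle updates.
    """
    inserted, deleted = diff(old, new)
    commands = []
    if deleted:
        ids = ", ".join(str(e.id) for e in sorted(deleted))
        commands.append(f"DELETE FROM {table} WHERE id IN ({ids})")
    for e in sorted(inserted):
        commands.append(
            f"INSERT INTO {table} VALUES ({e.id}, '{e.name}', {e.sal})")
    return commands


emps = frozenset({Emp(1, "Alice", 900), Emp(2, "Bob", 1200),
                  Emp(3, "Carol", 800)})

# Declarative step: "the employees who do not earn more than 1,000".
emps2 = frozenset(e for e in emps if not e.sal > 1000)

# Incremental step: only the changed rows reach the database.
commands = to_commands("emps", emps, emps2)
print(commands)
```
</code></pre></div></div>

<p>Here <code class="language-plaintext highlighter-rouge">emps2</code> holds the two remaining employees, while
<code class="language-plaintext highlighter-rouge">commands</code> contains a single <code class="language-plaintext highlighter-rouge">DELETE</code> for the one row that
changed.</p>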

<p>Incremental computation also arises during incremental view
maintenance (IVM), or more generally, when maintaining some
materialized data set that has been created (explicitly by the user,
or implicitly by Morel’s planner) earlier in this script run or in a
previous run. It is notoriously hard to write the incremental logic
correctly; it is preferable to declare that the data set is
always equivalent to running a particular query, and have the planner
compute the deltas.</p>
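
<p>As a small illustration of that contract, here is a Python sketch (again
not Morel, and not Morel’s planner). The view is declared as a query,
“number of employees per department”, and a hypothetical
<code class="language-plaintext highlighter-rouge">apply_delta</code> function maintains it from inserted and deleted
rows. The final check states the invariant that makes the incremental
logic correct: the maintained view must equal the query run from
scratch on the new data. The function names and sample rows are
assumptions made for this example.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
```python
from collections import Counter


def count_by_dept(rows):
    """The declared query: number of employees per department."""
    return Counter(dept for dept, _name in rows)


def apply_delta(view, inserted, deleted):
    """Maintain the view from a delta instead of rescanning the table."""
    updated = Counter(view)
    updated.update(count_by_dept(inserted))
    updated.subtract(count_by_dept(deleted))
    # Unary plus drops departments whose count has fallen to zero.
    return +updated


emps = frozenset({("eng", "Alice"), ("eng", "Bob"), ("ops", "Carol")})
view = count_by_dept(emps)

# A later version of the table, described by its delta (made-up data).
inserted = frozenset({("ops", "Dave")})
deleted = frozenset({("eng", "Bob")})
emps2 = (emps - deleted) | inserted

view2 = apply_delta(view, inserted, deleted)

# The invariant: the maintained view equals the query on the new version.
print(view2 == count_by_dept(emps2))
```
</code></pre></div></div>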

<p>For both kinds of incremental computation, we believe that people
should work in the “declarative space,” saying “this is the relation
that I want,” and Morel should figure out the incremental operations
to compute and store that relation efficiently.</p>

<h1 id="conclusion">Conclusion</h1>

<p>This article outlines how DML commands could be implemented in Morel,
in a way that we believe is superior to SQL. Rather than creating
direct analogs of the SQL commands, our approach is to use queries to
create named intermediate results: <code class="language-plaintext highlighter-rouge">INSERT</code> becomes a query with
<code class="language-plaintext highlighter-rouge">union</code>, <code class="language-plaintext highlighter-rouge">DELETE</code> becomes a query with <code class="language-plaintext highlighter-rouge">minus</code> or <code class="language-plaintext highlighter-rouge">where not</code>, and
<code class="language-plaintext highlighter-rouge">UPDATE</code> becomes a query with a conditional <code class="language-plaintext highlighter-rouge">yield</code>. The only new
command, <code class="language-plaintext highlighter-rouge">commit</code>, has the magical ability to update all tables in a
database atomically.</p>

<p>This is consistent with Morel’s ethos as a functional relational
language, and makes some of data engineering’s challenges more
tractable.</p>

<p>If you have comments, come see me
<a href="https://www.datacouncil.ai/talks/more-than-query-future-directions-of-query-langages-from-sql-to-morel?hsLang=en">next week in Oakland</a>
or reply on
<a href="https://bsky.app/profile/julianhyde.bsky.social">Bluesky @julianhyde.bsky.social</a>
or Twitter:</p>

<div data_dnt="true">
<div class="jekyll-twitter-plugin"><blockquote class="twitter-tweet" data-cards="hidden"><p lang="en" dir="ltr">Until last week, I hadn’t thought much about DML in <a href="https://twitter.com/morel_lang?ref_src=twsrc%5Etfw">@morel_lang</a>. I knew we would support DML, of course, but I had assumed that we would just add commands analogous to SQL’s INSERT, UPDATE, DELETE commands.<br /><br />Read the blog post: <a href="https://t.co/P92FBqZTYK">https://t.co/P92FBqZTYK</a></p>&mdash; Julian Hyde (@julianhyde) <a href="https://twitter.com/julianhyde/status/1911938964627574890?ref_src=twsrc%5Etfw">April 15, 2025</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

</div>
</div>

<p>This article
<a href="https://github.com/julianhyde/share/commits/main/blog/_posts/2025-04-14-dml-in-morel.md">has been updated</a>.</p>]]></content><author><name>Julian Hyde</name></author><summary type="html"><![CDATA[Until last week, I hadn’t thought much about DML in Morel. I knew we would support DML, of course, but I had assumed that we would just add commands analogous to SQL’s INSERT, UPDATE, DELETE commands.]]></summary></entry></feed>