File reader and progressive types in Morel version 0.4
Abstract
In Morel 0.4 we have added a file reader, to make it easy to browse your file system for data sets in CSV files. A directory appears to Morel a record value, and the fields of that record are the files or subdirectories. If a file is a list of records, such as a CSV file, its type in Morel will be a list of records.
To make this work, we extended Morel’s type system with a feature we call progressive types. Progressive types give you the benefits of static typing when it’s not possible (or is inefficient or inconvenient) to gather all the type information up front.
There is also a screencast based on the original post; [MOREL-209] is the feature description.
File reader
I wanted to give a quick demo of a feature we’ve just added to Morel.
We’ve added a file reader so that you can easily load and work on data sets such as CSV files. We’ll be using a data set in a directory called “data”.
$ ls -lR src/test/resources/data
src/test/resources/data/:
total 8
drwxrwxr-x 2 jhyde jhyde 4096 Dec 31 14:23 scott
drwxrwxr-x 2 jhyde jhyde 4096 Dec 31 14:38 wordle
src/test/resources/data/scott:
total 16
-rw-rw-r-- 1 jhyde jhyde 50 Dec 24 19:10 bonus.csv
-rw-rw-r-- 1 jhyde jhyde 131 Dec 31 14:23 dept.csv
-rw-rw-r-- 1 jhyde jhyde 420 Dec 24 19:10 emp.csv.gz
-rw-rw-r-- 1 jhyde jhyde 127 Dec 24 19:10 salgrade.csv
src/test/resources/data/wordle:
total 92
-rw-rw-r-- 1 jhyde jhyde 11944 Dec 24 19:10 answers.csv
-rw-rw-r-- 1 jhyde jhyde 77844 Dec 24 19:10 words.csv
That directory has subdirectories scott
and wordle
. Each of
those has some CSV files, and there’s one compressed CSV file.
Now let’s start the Morel shell:
$ ./morel --directory=src/test/resources/data
morel version 0.4.0 (java version "21.0.1", JLine terminal, xterm-256color)
-
Morel is a functional programming language that is also a query language. It is statically typed, and its main type constructors are lists and records.
In the following, we create two record values, a list of records, and write a simple query.
- val fred = {name="Fred", age=27};
val fred = {age=27,name="Fred"} : {age:int, name:string}
- val velma = {name="Velma", age=20};
val velma = {age=20,name="Velma"} : {age:int, name:string}
- val employees = [fred, velma];
val employees = [{age=27,name="Fred"},{age=20,name="Velma"}]
: {age:int, name:string} list
- from e in employees yield e.age;
val it = [27,20] : int list
We wanted to make the file reader interactive. You shouldn’t have to leave the Morel shell to see what files are available.
So you can browse the whole file system as if you are looking at the
fields of a record. The file
object is where you start.
- file;
val it = {scott={},wordle={}} : {scott:{...}, wordle:{...}, ...}
As you can see, it is a record with fields scott
and wordle
. In
the file reader, every directory is a record, and the fields are the
files or subdirectories.
Now let’s look at file.scott
:
- file.scott;
val it = {bonus=<relation>,dept=<relation>,emp=<relation>,salgrade=<relation>}
: {bonus:{...} list, dept:{...} list, emp:{...} list, salgrade:{...} list,
...}
It, too, is a record, but fields such as dept
and emp
are listed
as relations, because they are CSV files. We can run queries on those
data sets. Here is a query to compute the total salary budget for each
department. You could write a similar query in SQL using JOIN
and
GROUP BY
:
- from d in file.scott.dept
= join e in file.scott.emp on d.deptno = e.deptno
= group d.dname compute sum of e.sal;
val it =
[{dname="RESEARCH",sum=10875.0},{dname="SALES",sum=9400.0},
{dname="ACCOUNTING",sum=8750.0}] : {dname:string, sum:real} list
After we have traversed into scott
and dept
, the type of the
file
value has changed:
- file;
val it =
{scott={bonus=<relation>,dept=<relation>,emp=<relation>,salgrade=<relation>},
wordle={}}
: {
scott:{bonus:{...} list, dept:{deptno:int, dname:string, loc:string} list,
emp:
{comm:real, deptno:int, empno:int, ename:string,
hiredate:string, job:string, mgrno:int, sal:real} list,
salgrade:{...} list, ...}, wordle:{...}, ...}
Note that the scott
field has been expanded, and so have the dept
and emp
fields. This is called progressive typing. What is it, and
why did we add it?
Static, dynamic and progressive typing
Progressive typing defers collecting the type of certain record fields until they are referencing in a program. Adding it to Morel was the hardest part of building the file reader.
Why is it necessary? Imagine if that data
directory had contained a
thousand subdirectories and a million files. The type of the file
object would be so large that it would fill many screens. More to the
point, the Morel system would take an age to start up, because it is
opening all those files and directories to find out their types.
Static and dynamic typing each have their strengths (and legions of passionate fans). Static typing improves performance, code quality and maintenance, and helps auto-suggestion in IDEs, but requires a ‘closed world’ where everything is known. Dynamic typing is better for interacting with the ‘open world’ of loosely coupled systems, where not everything is known, and things are forever changing. Reading a file system is definitely in its sweet spot.
But using dynamic typing is not an option in a strong, statically
typed language like Morel. By deferring the collection of types,
progressive typing can handle the ‘open world’ of the file system. By
only ever expanding types, progressive typing retains the guarantees
of static typing. Say I compile my program and file
has a
particular type. Later, the type of file
later expands due to
progressive typing. My program will still be valid, because all the
fields and sub-fields it needs are still there.
Morel’s type system remains static. Progressive types can be injected
into programs at particular points (to date, the file
value is the
only injection point) and do not affect how the rest of the program is
typed.
Variables
You don’t lose the benefits of progressive typing if you use variables.
For example, this query replaces the expression file.scott
in the
previous query with a variable s
to make the query more concise. It
gives the same results as the previous query.
- val s = file.scott;
val s = {bonus=<relation>,dept=<relation>,emp=<relation>,salgrade=<relation>}
: {bonus:{...} list, dept:{...} list, emp:{...} list, salgrade:{...} list,
...}
- from d in s.dept
= join e in s.emp on d.deptno = e.deptno
= group d.dname compute sum of e.sal;
val it =
[{dname="RESEARCH",sum=10875.0},{dname="SALES",sum=9400.0},
{dname="ACCOUNTING",sum=8750.0}] : {dname:string, sum:real} list
Conclusion
The file reader lets you browse the file system starting
from a single variable called file
, and load data sets from CSV
files. Progressive types give you the benefits of static typing but
without filling your screen with useless type information.
If you have comments, please reply on BlueSky @julianhyde.bsky.social or Twitter:
In @morel_lang you can browse the type system, and create data frames from the files in it, as if it were statically typed. What's the trick? Progressive types. https://t.co/bgENfckrng
— Julian Hyde (@julianhyde) March 12, 2025
This article has been updated.