Expressing package queries in Morel

Morel is more than a query language. Because it is a functional language with relational algebra built in, it can express problems that a query language can’t. It already supports constraint programming and logic programming, and package queries are another example.

Consider the following problem: given a set of recipes, find the combination of three gluten-free recipes that total between 2,000 and 2,500 calories and has less saturated fat than all other such combinations. That constraint is not about any single row; it is about a combination of rows, and no where clause can express it. It is an optimization problem in the shape of a query.

A 2016 paper, Scalable Package Queries in Relational Database Systems, introduced package queries, along with PaQL, an extension to SQL. Given a table, a package query finds a combination of its rows — perhaps with repetition — that together satisfy a set of constraints while maximizing or minimizing an objective. They generalize relational queries, and can express integer linear programming (ILP) problems such as bin-packing. The paper’s running example plans a day of meals:

SELECT PACKAGE(R) AS P
FROM Recipes R REPEAT 0
WHERE R.gluten = 'free'
SUCH THAT COUNT(P.*) = 3
  AND SUM(P.kcal) BETWEEN 2.0 AND 2.5
MINIMIZE SUM(P.saturated_fat)

The recipes must be gluten-free, there must be exactly three, and REPEAT 0 says that none may occur more than once. (The paper measures calories in thousands, so 2.0 means 2,000 kcal.)

Plain SQL has no concise, declarative way to write this. When a package has a fixed cardinality — as here, with COUNT(P.*) = 3 — you can fake it with a three-way self-join, and the general case can be encoded by using recursion to build a powerset table and testing each subset; but neither is declarative, optimizable, or efficient. (The paper makes both points.) In Morel it is straightforward:

from p in subBags (from r in recipes where r.gluten = "free")
  where Bag.length p = 3
    andalso sum (from r in p yield r.cal) >= 2000
    andalso sum (from r in p yield r.cal) <= 2500
  order sum (from r in p yield r.satFat)
  take 1;

You can make it more concise by computing the count, calories, and fat up front, and using the new range-list syntax for the predicate on calories. Over a small set of recipes:

val recipes = bag [
  {name = "Grilled Salmon Bowl", gluten = "free",    cal = 700, satFat = 4},
  {name = "Quinoa Veg Stir-fry", gluten = "free",    cal = 550, satFat = 2},
  {name = "Lentil Curry",        gluten = "free",    cal = 600, satFat = 3},
  {name = "Chicken Rice Plate",  gluten = "free",    cal = 850, satFat = 6},
  {name = "Steak & Potatoes",    gluten = "free",    cal = 950, satFat = 12},
  {name = "Avocado Tofu Salad",  gluten = "free",    cal = 500, satFat = 5},
  {name = "Cheese Omelette",     gluten = "free",    cal = 650, satFat = 10},
  {name = "Pasta Carbonara",     gluten = "present", cal = 800, satFat = 14},
  {name = "Pizza Margherita",    gluten = "present", cal = 900, satFat = 11}];

the more concise query returns:

from p in subBags (from r in recipes where r.gluten = "free"),
    {c, cal, fat} =
      (from r in p
        compute {c = count over (),
                 cal = sum over r.cal,
                 fat = sum over r.satFat})
  where c = 3 andalso cal elem [2000 .. 2500]
  order fat
  take 1;
c cal  fat p
           cal gluten name                satFat
- ---- --- --- ------ ------------------- ------
3 2000  11 550 free   Quinoa Veg Stir-fry      2
           600 free   Lentil Curry             3
           850 free   Chicken Rice Plate       6

val it : {c:int, cal:int, fat:int, p:{cal:int, gluten:string, name:string, satFat:int} bag} list

The optimizer does not simply pick the lowest-fat recipes. The two non-gluten-free recipes are filtered out before any packing. And the three lowest-fat recipes overall — Salmon, Quinoa, and Lentil, at 9 g of saturated fat — total only 1,850 calories, below the floor, so the constraint forces the optimizer to drop the salmon in favor of the higher-calorie Chicken Rice Plate, landing exactly on 2,000 calories at 11 g. Minimizing the objective naively would give the wrong answer; the constraint and the objective pull against each other.

That compute clause is worth a second look. The row it builds — the package p together with c , cal , and fat — is the feature vector of a candidate package. Each of c , cal , and fat is a linear function of which tuples we chose (and how many of each), which is exactly the form an integer linear program wants.

The one thing missing from Morel is the subBags function. That’s easy to write:

fun subLists list =
    List.foldr
      (fn (x, acc) => acc @ List.map (fn s => x :: s) acc)
      [[]]
      list;
val subLists = fn : 'a list -> 'a list list

fun subBags bag =
    Bag.map Bag.fromList (Bag.fromList (subLists (Bag.toList bag)));
val subBags = fn : 'a bag -> 'a bag bag

To be at par with PaQL, subBags needs an extra parameter, k . If k is positive, each package can contain up to k + 1 occurrences of each tuple in the source. Modifying subBags is left as an exercise for the reader (or your favorite agent).

Still, we don’t recommend actually executing subBags . N recipes generate 2^N packages (even more if k is positive), so the algorithm quickly becomes intractable. We need a smarter strategy: make subBags a built-in operator, and have rules recognize it in a query and send that part of the query to a specialized solver.

The feature vector is what gets sent to the solver. The translation rules turn the compute row into an integer linear program: one integer variable per source tuple (how many times it is chosen), the aggregates c , cal , and fat into linear functions of those variables, and the where and order clauses into constraints and an objective. Because every quantity in the query is linear in that choice vector, the translation is mechanical. The moment a constraint isn’t linear — a probabilistic “the loss stays below X with high probability” bound, or a requirement that the chosen tuples form a contiguous region — you are outside what this machinery can lower, and you need a different solver.

One such solver is presented in the paper: SketchRefine, an approximate divide-and-conquer technique for answering package queries efficiently on large datasets.

Morel expresses package queries cleanly, but expressing a problem is not the same as executing it. Package queries are NP-hard in general — the paper proves that PaQL is at least as expressive as integer linear programming, and knapsack, a special case, is NP-complete — so no query language is going to make them cheap. That is exactly why it helps to have them inside a general-purpose language. The combination of rows that ordinary SQL can’t even describe is, in Morel, just another query: the irreducibly hard part goes to a solver, and the ordinary relational work around it does not have to.

Constraint programming and logic programming both run on the same from and where constructs, made runnable by predicate inversion. Coloring a map so that no two neighboring states share a color is a from over color variables whose where is a conjunction of inequalities; a small optimization — say, how many banana and chocolate cakes to bake from a limited pantry, an example from Basic Modelling in MiniZinc — is the same, with an order on the objective. And because a predicate can recurse, Morel runs Datalog; an earlier article walked through transitive closure and the classic Joe’s bar rules. Package queries add combinatorial optimization over packages of rows.

That seam is the point. A single language, type system, and environment can express a hybrid problem end to end — pull records from a database, feed them into a solver for a graph algorithm or an arithmetic optimization, and consume the result — with no marshalling of data sets between half a dozen little languages. And because the relational structure stays visible to the compiler, Morel can push down filters and materialize intermediate results before handing the hard core to a specialized solver.

If you have comments, please reply on Bluesky @julianhyde.bsky.social or Twitter:

Package queries are an optimization problem - they encompass famous NP-complete 'knapsack problem' - but I wanted to see whether I could express them in @morel_lang's query-like syntax. https://t.co/6SReL5O3xZ pic.twitter.com/ZCIBVgNL67
— Julian Hyde (@julianhyde) June 3, 2026