Scripted plug-ins in LucidDB and Mondrian

I saw a demo last week of scripted user-defined functions in LucidDB, and was inspired this weekend to add them to Mondrian.

Kevin Secretan of DynamoBI has just contributed some extensions to LucidDB to allow you to call script code (such as JavaScript or Python) in any place where you can have a user-defined function, procedure, or transform. This feature builds on a JVM feature introduced in Java 1.6, scripting engines.

Scripted functions may be a little slower than Java user-defined functions, but what they lose in performance they more than make up in flexibility. Writing user-defined functions in Java has always been laborious: you need to write a Java class, compile it, put it in a jar, put the jar on the server’s class path, and restart the server. Each time you find a bug, you need to repeat that process, and that can easily take a number of minutes each cycle. Because scripted functions are compiled on the fly, you can cycle faster, and spend more of your valuable time working on the actual application.

I am speaking about LucidDB (and SQLstream) here, but the same problems exist for Mondrian plug-ins. Scripting is an opportunity to radically speed up development of application extensions, because everything can be done in the schema file. (Or via the workbench… but that part isn’t implemented yet.)

Mondrian has several plug-in types, all today implemented using a Java SPI. I chose to make scriptable those plug-ins that are defined in a mondrian schema file: user-defined function, member formatter, property formatter, and cell formatter. A small syntax change to the schema file allowed you to chose whether to implement these plug-ins by specifying the name of a Java class (as before) or an inline script.

As an example, here is the factorial function defined in JavaScript:

<UserDefinedFunction name="Factorial">
  <Script language="JavaScript">
    function getParameterTypes() {
      return new Array(new mondrian.olap.type.NumericType());
    }
    function getReturnType(parameterTypes) {
      return new mondrian.olap.type.NumericType();
    }
    function execute(evaluator, arguments) {
      var n = arguments[0].evaluateScalar(evaluator);
      return factorial(n);
    }
    function factorial(n) {
      return n <= 1 ? 1 : n * factorial(n - 1);
    }
  </Script>
</UserDefinedFunction>

A user-defined function ironically requires several functions in order to provide the metadata needed by the MDX type system. The member, property and cell formatters are simpler. They require just one function, so mondrian dispenses with the function header, and requires just the ‘return’ expression inside the Script element. For example, here is a member formatter:

<Level name="name" column="column">
  <MemberFormatter>
    <Script language="JavaScript">
      return member.getName().toUpperCase();
    </Script>
  </MemberFormatter>
</Level>

You can of course write multiple statements, if you wish. Since JavaScript is embedded in the JVM, your code can call back into Java methods, and use the full runtime Java library.

There are examples of cell formatters and property formatters in the latest schema guide.

If you are concerned about performance, you could always translate this code back to a Java UDF when it is fully debugged. However, you might be pleasantly surprised by the performance of JavaScript: I was able to invoke a script function about 20,000 times per second. And I hear that there is a Janino “scripting engine” that compiles Java code into bytecode on the fly. In principle, it should be as fast as a real Java UDF.

I’d love to hear about Janino, or in fact any other scripting engine, with the Mondrian or LucidDB scripted functions.

By the way, you can expect to see scripted functions in a release of SQLstream not too far in the future. The Eigenbase project makes it easy to propagate features between projects, and this feature is too good not to share.