It seems that whenever I have a cross-continent flight, Mondrian gets a new feature. This particular flight was from Florida back home to California, and this particular feature is a time-dimension generator.

I was on the way home from an all-hands at Pentaho’s Orlando, Florida headquarters, where new CEO Quentin Gallivan had outlined his strategy for the company. I also got to spend time with the many smart folks from all over the world who work for Pentaho, among them Roland Bouman, formerly an evangelist for MySQL, now with Pentaho, but still passionately advocating for open source databases, open source business intelligence, and above all, keeping it simple.

Roland and I got talking about how to map Mondrian onto operational schemas. Though not designed as star schemas, some operational schemas nevertheless have a structure that can support a cube, with a central fact table surrounded by star or snowflake dimension tables. Often the one thing missing is a time dimension table. Since these time dimension tables look very much the same, how easy would it be for Mondrian to generate them on the fly? Not that difficult, I thought, as the captain turned off the “fasten seatbelts” sign and I opened my laptop. Here’s what I came up with.

Here’s how you declare a regular time dimension table in Mondrian 4:

<PhysicalSchema>
  <Table name='time_by_day'>
  <!-- Other tables... -->
</PhysicalSchema>

Mondrian sees the table name ‘time_by_day’, checks that it exists, and finds the column definitions from the JDBC catalog. The table can then be used in various dimensions in the schema.

An auto-generated time dimension is similar:

<PhysicalSchema>
  <AutoGeneratedDateTable name='time_by_day_generated'
      startDate='2012-01-01' endDate='2014-01-31'>
  <!-- Other tables... -->
</PhysicalSchema>

The first time Mondrian reads the schema, it notices that the table is not present in the schema, and creates and populates it. Here is the DDL and data it produces.

CREATE TABLE time_by_day_generated (
  time_id INTEGER NOT NULL PRIMARY KEY,
  yymmdd INTEGER NOT NULL,
  yyyymmdd INTEGER NOT NULL,
  the_date DATE NOT NULL,
  the_day VARCHAR(20) NOT NULL,
  the_month VARCHAR(20) NOT NULL,
  the_year INTEGER NOT NULL,
  day_of_month VARCHAR(20) NOT NULL,
  week_of_year INTEGER NOT NULL,
  month_of_year INTEGER NOT NULL,
  quarter VARCHAR(20) NOT NULL)
JULIAN YYMMDD YYYYMMDD DATE DAY_OF_WEEK_NAME MONTH_NAME YEAR DAY_OF_MONTH WEEK_OF_YEAR MONTH QUARTER
2455928 120101 20120101 2012-01-01 Sunday January 2012 1 1 1 Q1
2455929 120102 20120102 2012-01-02 Monday January 2012 2 1 1 Q1
2455930 120103 20120103 2012-01-03 Tuesday January 2012 3 1 1 Q1

The columns present are all of the time-dimension domains:

Domain Default column name Default data type Example Description
JULIAN time_id Integer 2454115 Julian day number (0 = January 1, 4713 BC). Additional attribute ‘epoch’, if specified, changes the date at which the value is zero.
YYMMDD yymmdd Integer 120219 Decimal date with two-digit year
YYYYMMDD yyyymmdd Integer 20120219 Decimal date with four-digit year
DATE the_date Date 2012-12-31 Date literal
DAY_OF_WEEK_NAME the_day String Friday Name of day of week
MONTH_NAME the_month String December Name of month
YEAR the_year Integer 2012 Year
DAY_OF_MONTH day_of_month String 31 Day ordinal within month
WEEK_OF_YEAR week_of_year Integer 53 Week ordinal within year
MONTH month_of_year Integer 12 Month ordinal within year
QUARTER quarter String Q4 Name of quarter

Suppose you wish to choose specific column names, or have more control over how values are generated. You can do that by including a <ColumnDefs> element within the table, and <ColumnDef>elements within that – just like a regular <Table>element.

For example,

<PhysicalSchema>
  <AutoGeneratedDateTable name='time_by_day_generated'
      startDate='2008-01-01' endDate='2020-01-31'>
    <ColumnDefs>
      <ColumnDef name='time_id'/>
        <TimeDomain role='JULIAN' epoch='1996-01-01'/>
      </ColumnDef>
      <ColumnDef name='my_year'>
        <TimeDomain role='year'/>
      </ColumnDef>
      <ColumnDef name='my_month'>
        <TimeDomain role='MONTH'/>
      </ColumnDef>
      <ColumnDef name='quarter'/>
      <ColumnDef name='month_of_year'/>
      <ColumnDef name='week_of_year'/>
      <ColumnDef name='day_of_month'/>
      <ColumnDef name='the_month'/>
      <ColumnDef name='the_date'/>
    </ColumnDefs>
    <Key>
      <Column name='time_id/>
    </Key>
  </AutoGeneratedDateTable>
  <!-- Other tables... -->
</PhysicalSchema>`

The first three columns have nested <TimeDomain>elements that tell the generator how to populate them.

The other columns have the standard column name for a particular time domain, and therefore the <TimeDomain> element can be omitted. For instance,

<ColumnDef name='month_of_year'/>

is shorthand for

<ColumnDef name='month_of_year' type='int'>
  <TimeDomain role="month"/>
</ColumnDef>

The nested <Key> element makes that column valid as the target of a link (from a foreign key in the fact table, for instance), and also declares the column as a primary key in the CREATE TABLE statement. This has the pleasant side-effect, on all databases I know of, of creating an index. If you need other indexes on the generated table, create them manually.

The <TimeDomain> element could be extended further. For instance, we could add a locale attribute. This would allow different translations of month and weekday names, and also support locale-specific differences in how week-in-day and day-of-week numbers are calculated.

Note that this functionality is checked into the mondrian-lagunitas branch, so will only be available as part of Mondrian version 4. That release is still pre-alpha. We recently started to regularly build the branch using Jenkins, and you should see the number of failing tests dropping steadily over the next weeks and months. Already over 80% of tests pass, so it’s worth downloading the latest build to kick the tires on your application.