KoalaRainbow :: Koala Rainbow Query Language (KRQL)

Introduction

Overview

The Koala Rainbow Query Language (KRQL) is the means by which you tell KoalaRainbow what data you want to visualize.  It is also the means by which you perform any processing of that data.

KRQL is intended to be 'functional', which means that a given query should not have side-effects.  This relies on functions not having side-effects (currently true), and on you, the user, not using the naive edge traversal to call no-arg mutating functions on MT objects.

KRQL is inspired by XPATH just enough to confuse you, but not so much as to be useful.  This is less that I am a jerk and more that we are not dealing with a hierarchically structured dataset.  Although the MovableType database could be serialized into a variety of nice XML hierarchies, it's not.  At least in my configuration, it's backed by a relational database, and it would be silly to throw away the benefits imbued by such a storage format and the helpful MT object representations.

Quick Examples With Explanation

To get the grokkage flowing...

/entries
Returns the set of all entries in the system (across all blogs), both draft and published.

/entries[status = 2]
Returns the set of all published entries in the system.

/authors[name = 'sombrero']
Returns the set of all authors with the name 'sombrero'.  Note that the 'authors' set in MovableType includes both authors registered with the blog and typekey users that have commented.  Since I am an author on my blog, and I also have a sombrero typekey identity that I have posted with, this results in a set containing two nodes.

/authors[type = 1 and name = 'sombrero']
Returns the set of all authors with the name 'sombrero', that are type 1 (which means a registered author with the blog, as opposed to a typekey identity).   In my case, this turns out to be a set with a single author object (the blog poster one).

/entries[author[type = 1]/name = 'sombrero']
Returns the set of all entries (published and draft) in the system authored by sombrero (the blog author).

/entries[author[type = 1]/name = 'sombrero' and status = 2]
Returns the set of all published entries in the system authored by sombrero (the blog author).

Important Notes

KRQL supports comparison operators that use < and >.  Since KRQL is intended to be used in templates, there's an obvious problem there.  The current solution is that all queries magically have "{" and "&lt;" mapped to "<", and "}" and "&gt;" mapped to ">".  This occurs at such a foolishly high level that even the contents of string literals will be transformed.  My advice is to use { and } because it's much more readable, which is also why they are present.
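
To make the string-literal gotcha concrete, here is a rough Python model of the mapping described above.  It is a sketch of the documented behavior, not KoalaRainbow's actual code, and normalize_query is a made-up name.

```python
# Illustrative model of the template-friendly operator mapping.
# NOT KoalaRainbow's code; normalize_query is a hypothetical helper.
def normalize_query(query: str) -> str:
    """Map '{' and '&lt;' to '<', and '}' and '&gt;' to '>', everywhere."""
    for stand_in, real in (("&lt;", "<"), ("{", "<"), ("&gt;", ">"), ("}", ">")):
        query = query.replace(stand_in, real)
    return query

print(normalize_query("/entries[status } 1]"))           # /entries[status > 1]
print(normalize_query("/entries[status &lt;= 2]"))       # /entries[status <= 2]
# The replacement is blind to quoting, so a string literal is altered too:
print(normalize_query("/authors[name = '{sombrero}']"))  # /authors[name = '<sombrero>']
```

The last line is the case to watch out for: braces inside a string literal do not survive.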

Syntax

Contexts

There are basically two contexts in KRQL:
  1. The Set Context.  You are always operating on sets of data and returning sets of data.
  2. The Value Context.  You may operate on and return values.  You may also operate on sets and return sets, but the consumer of the returned data may opt to coerce the set into a scalar context, returning only the first item from the set.
The value context is a strict superset of the set context.  For the rest of this section, I will annotate each syntax description heading for the contexts in which it is valid.

Er, as a late addition, I need to actually indicate that there are really 3 contexts, the third being a boolean context.  The boolean context is the only one in which comparisons and logical operators are valid.  When I originally wrote this section of the documentation, I think I was assuming that the user could safely be ignorant of this distinction, but I believe that there are many places in the code that only support value contexts (not boolean contexts).  So, you are warned, and I know something I need to fix.

Numbers (Value)

Decimal numbers are expressed just by writing the digits.  They may be negative.  Floating point numbers work as long as you don't try to use any of that fancy scientific notation or the like.  They should not be quoted.  There is no hexadecimal support.  There is no octal support.  Just write the numbers.

Valid numbers include: 1, 2, -3, 1.0, 0.001, 42
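
As a rough illustration of the rule above, the following Python sketch checks whether some text "looks like a number".  The pattern is an assumption pieced together from the prose (optional minus sign, digits, optional fractional part); the real KRQL tokenizer may differ in the details.

```python
import re

# Assumed pattern: optional '-', digits, optional '.digits'.
# No exponent, no hex, no octal prefix, no quotes.
NUMBER = re.compile(r"-?\d+(\.\d+)?")

def looks_like_number(text: str) -> bool:
    """True if the whole text matches the assumed KRQL number shape."""
    return NUMBER.fullmatch(text) is not None

for candidate in ["1", "2", "-3", "1.0", "0.001", "42", "1e5", "0x1F", "'42'", ""]:
    print(repr(candidate), looks_like_number(candidate))
```

The six values listed as valid above are accepted; scientific notation, hexadecimal, quoted numbers and the empty string are rejected.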

String Literals (Value)

String literals are expressed by putting any text you want in single quotes.  There is currently no escape mechanism, so if you want to actually put a single quote in your text, there is no way to do so unless there's a function that returns one that you can concatenate with your string literal.

Examples of some string literals:
'foo'
'foo bar'
'Look, ma! "Quotes!", but not single quotes! No! None of those!'

Comparisons (Value)

The following comparisons are supported: =, !=, <, <=, >, >=.  Please note that since you are most likely going to be using KRQL in HTML templates, you probably should not actually be using "<" and ">".  Instead, replace your usages of "<" with "{" or "&lt;", and your usages of ">" with "}" or "&gt;".  This transform is currently implemented by an ugly hack that affects string literals.  Some great day, I will fix this; most likely after I have been bitten by this one too many times.

These comparison operators work for both numbers and strings, with the note that the KRQL engine will use string-comparison rules unless both values meet its standard of looking like numbers (see the definition of numbers above).  The edge case that is probably not intuitive is that an empty string is _not_ a number per these rules.  If you have a value that may be an empty string (or undefined), it is suggested that you use the 'int' function to force a cast to a number.  (ex: int('') returns 0.)
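
The numeric-versus-string rule can be modelled in a few lines of Python.  This is only a sketch of the behavior described above: the number pattern is assumed from the Numbers section, compare and krql_int are hypothetical names, and the only fact taken from the text about int is that int('') returns 0.

```python
import operator
import re

NUMBER = re.compile(r"-?\d+(\.\d+)?")  # assumed number shape
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}

def looks_like_number(value) -> bool:
    return NUMBER.fullmatch(str(value)) is not None

def krql_int(value) -> int:
    """Stand-in for KRQL's int(): empty or undefined becomes 0."""
    if value is None or value == "":
        return 0
    return int(float(value))

def compare(left, op: str, right) -> bool:
    """Compare numerically only if BOTH sides look like numbers."""
    if looks_like_number(left) and looks_like_number(right):
        return OPS[op](float(left), float(right))
    return OPS[op](str(left), str(right))

print(compare("10", "<", "9"))          # False: numeric, 10 is not below 9
print(compare("10", "<", "9 lives"))    # True: string rules, "1" sorts before "9"
print(compare("", "=", "0"))            # False: '' is not a number
print(compare(krql_int(""), "=", "0"))  # True: 0 = 0 numerically
```

The last two lines show why the int cast is worth the typing when a value may be empty.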

Boolean/Logical Operations (Value)

We support 'and' and 'or'.  Since this is a functional query language, we're not going to admit whether we short-circuit or not.  If I find anyone depending on this, I shall change the implementation to spite them.  Muahahahah.  Not really.  I mean, I am a jerk, but a lazy jerk.  So you're safe.

Like I learned in school, 'and' binds tighter than 'or'.  Since I'm not sure if I said that correctly, let's just write some expressions and show what their equivalent would be in thoroughly unambiguous (to one who doesn't know the precedence) parentheses form.

A and B or C => (A and B) or C
A and B or C and D => (A and B) or (C and D)
A or B and C or D => A or (B and C) or D [I'm not telling which way the tree grows.  It's a secret!]

This has the non-exclusive benefit of allowing you to express any expression, as long as you repeat stuff all crazy.  Since it turns out that most people don't like thinking in K-maps, we also support parentheses.
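
If you would rather check the precedence rule than trust it, Python's own 'and' and 'or' follow the same convention ('and' binds tighter), so a brute-force truth table confirms the three rewrites listed above.  This tests the rule itself, not the KRQL parser.

```python
from itertools import product

# Every combination of truth values agrees with the parenthesized forms.
for a, b, c, d in product([False, True], repeat=4):
    assert (a and b or c) == ((a and b) or c)
    assert (a and b or c and d) == ((a and b) or (c and d))
    assert (a or b and c or d) == (a or (b and c) or d)

# Grouping the other way is NOT equivalent; these inputs give it away.
differs = [(a, b, c) for a, b, c in product([False, True], repeat=3)
           if (a and b or c) != (a and (b or c))]
print(differs)  # [(False, False, True), (False, True, True)]
```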

Parentheses (Value)

Use these when you have some form of crazy boolean expression that is not easily represented by the standard boolean operator precedence.  (I mean, sure, technically you could, but that would be one ugly expression!)

Variables (Set, Value)

All variables consist of a dollar-sign followed by an identifier made up of letters and underscores.  No numbers in variables!  Variables retrieve values stored in the KoalaRainbow environment, and may contain either sets or values.  Unlike perl, you can't have a set variable and a value variable; there's only one storage spot.  A variable containing a value evaluated in a set context will be coerced into a set of that one value.  A variable containing a set evaluated in a value context will just return the first value in the set (if any).  Variables cannot be set within KRQL statements.

Some legal variables, all distinct: $foo, $bar, $fooBar, $BAR, $foo_bar.
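
The two coercions described above can be sketched in Python as follows.  Sets are modelled as lists (so "first value" has a meaning), None stands in for "no value", and every name here is made up for illustration; the variable pattern is assumed from "letters and underscores".

```python
import re

VARIABLE = re.compile(r"\$[A-Za-z_]+")  # assumed: '$' then letters/underscores

def in_set_context(stored):
    """A stored value becomes a one-item set; a stored set is used as-is."""
    return list(stored) if isinstance(stored, (list, tuple)) else [stored]

def in_value_context(stored):
    """A stored set yields its first item (None if empty); a value is used as-is."""
    if isinstance(stored, (list, tuple)):
        return stored[0] if stored else None
    return stored

# One storage spot per name, and names are case-sensitive.
env = {"foo": "hello", "fooBar": ["a", "b", "c"], "BAR": []}

print(in_set_context(env["foo"]))       # ['hello']
print(in_value_context(env["fooBar"]))  # a
print(in_value_context(env["BAR"]))     # None
print(bool(VARIABLE.fullmatch("$foo_bar")), bool(VARIABLE.fullmatch("$foo2")))  # True False
```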

Function Calls (Set, Value)

All function calls consist of an identifier (made up of letters and underscores) followed by parentheses (regardless of whether there are any arguments.)  Arguments should be separated by commas.  Technically, the parser knows when there should be a comma, so this just provides it an opportunity to yell at you if you forget the comma.

The decision whether to evaluate each argument as a value or a set is made by the function.  Although it might be a nice trick in the future for a function that expects a value, when invoked on a set, to be invoked once for each value in the set and return the resulting set, it hasn't happened yet.  In summary, functions that take their arguments as values will evaluate the arguments in a value context, which generally means taking only the first value from a set.  This behavior is unlikely to change, as it is important for proper operation of singleton sets, such as in contexts.  Functions can return both values and sets.

Each argument is evaluated using the same 'current set' that the function resides in.  This is relevant for evaluating edges (next section.)

Functions cannot be defined by KRQL queries or any other MT-visible mechanism.  The existing functions are documented here.  New functions can be registered by MT plugins, but it's not advised at this time, as future optimizations will require additional function annotations be used.

Examples of some function calls:
add(1, 2)
add(div(4, 4), mul(1, 2))
concat('Hello ', 'World')
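
For a feel of how value-context arguments behave, here is a loose Python model of the three calls above.  The function names come from the examples, but their semantics (plain arithmetic, true division, string joining) are assumptions, and the registry and wrapper are invented for the sketch.

```python
def first_value(arg):
    """Value-context coercion: a set (list) collapses to its first item."""
    if isinstance(arg, (list, tuple)):
        return arg[0] if arg else None
    return arg

def value_function(fn):
    """Wrap fn so every argument is coerced to a single value first."""
    def wrapper(*args):
        return fn(*(first_value(a) for a in args))
    return wrapper

# Hypothetical registry; whether KRQL's div is integer or float is not known here.
FUNCTIONS = {
    "add": value_function(lambda a, b: a + b),
    "mul": value_function(lambda a, b: a * b),
    "div": value_function(lambda a, b: a / b),
    "concat": value_function(lambda *parts: "".join(parts)),
}

def call(name, *args):
    return FUNCTIONS[name](*args)

print(call("add", 1, 2))                                  # 3
print(call("add", call("div", 4, 4), call("mul", 1, 2)))  # 3.0
print(call("concat", "Hello ", "World"))                  # Hello World
# A set passed where a value is expected contributes only its first item:
print(call("concat", ["Hello ", "Goodbye "], "World"))    # Hello World
```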

Edge (Set, Value)

Any identifier (letters and underscores) that isn't a variable or a function is an 'edge'.  It is important to understand that at every point in a KRQL query, there is a concept of the 'current set'.  When an 'edge' is found, the KRQL engine creates a new set that is the result of following the 'edge'.  This new set may then be used as the 'current set' if followed by a constraint or an edge-traversal (see below for both); note that this also holds true for variables and functions that contain/return sets.

In theory, an edge could be anything thanks to an abstraction that allows adaptors to express a node/edge (directed) graph representation of anything (well, a simplified one).  In practice, it just means that we call the accessor functions on MT objects for every object in the set and put them in a new set.  For example, the MT::Author object has a 'name' record that is exposed as an accessor function.  If we have a set of authors, then when we evaluate the edge "name", we will get back a new set of the names of those authors.
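
In Python terms, following an edge amounts to something like the sketch below.  The Author class is a stand-in for MT::Author with only the two fields used in the earlier examples, follow_edge is a made-up name, and the type value 2 for the TypeKey identity is an assumption (the text only says that 1 means a registered author).

```python
from dataclasses import dataclass

@dataclass
class Author:
    """Minimal stand-in for MT::Author."""
    name: str
    type: int

def follow_edge(current_set, edge):
    """Call the accessor named by the edge on every item; collect the results."""
    return [getattr(item, edge) for item in current_set]

authors = [Author("sombrero", 1), Author("sombrero", 2)]

print(follow_edge(authors, "name"))  # ['sombrero', 'sombrero']
print(follow_edge(authors, "type"))  # [1, 2]
```

The result is a new collection; the set of authors that the edge was evaluated on is left alone.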

Set Constraints (Set, Value)

Any time you have a set that you would like to filter using some set of criteria, you should use a constraint.  Set constraints take the form "[some-value-context-thing-to-parse]".  In other words, put brackets after your set, then put some value query inside it.  The value query is evaluated for every item in the set the constraint is attached to.  If the value query returns true (something other than 0, the empty string, or undef), then your item is kept and shows up in the resulting set.  If the value query returns false (0, the empty string, or undef), then the item does not end up in the resulting set.  Note that we're a functional query language, so the original set is still intact, and doesn't get modified.
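
Here is a small Python sketch of that filtering rule, modelling "/entries[status = 2]".  The helper names, the "id" field and the draft status value of 1 are assumptions made for the example; None plays the role of undef.

```python
def krql_true(value) -> bool:
    """False for 0, the empty string, or undefined (None); true otherwise."""
    return value is not None and value != 0 and value != ""

def constrain(current_set, value_query):
    """Return a NEW list of the items whose value query result is KRQL-true."""
    return [item for item in current_set if krql_true(value_query(item))]

# Made-up sample data.
entries = [{"id": 1, "status": 1}, {"id": 2, "status": 2}, {"id": 3, "status": 2}]

published = constrain(entries, lambda entry: entry["status"] == 2)
print([entry["id"] for entry in published])  # [2, 3]
print(len(entries))                          # 3, the original set is untouched
print([krql_true(v) for v in (0, "", None, 1, "a")])  # [False, False, False, True, True]
```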

As a useful tidbit, if you have a constraint where the items in the set you are constraining are already the things you care about, you can use the self() function to get at them.  For example, if you have a set ("foo", "bar", "baz"), and you want to get rid of foo, how would you do this?  There's no edge associated with the strings; they're an end-product.  And so, the solution is to use the self() function, which would return the item in question.  Ex: "$set_we_are_talking_about[self() != 'foo']" returns the set ("bar", "baz").  In all likelihood, a magical keyword will be introduced in the future as an alternative, but that would make us even more perl-like in our line-noise quality.

The result of the set constraint is a new set which you can then perform an edge traversal on.  You can't put multiple constraints in a row without an intervening edge traversal because that would be stupid.  Just use an 'and' to join whatever you would have said.

It's very important to note that when imposing a set constraint, the result of the set constraint becomes the value of evaluating the expression containing whatever came before the set constraint (and now including the set constraint).  This is just like accessing fields in a structure in C or something; if you type "a.b.c" in C, you get back the result of c which was in b which was in a, not a.  To be super clear, if I have a set (1, 2, 3) that I store in the variable 'foo', then the result of evaluating "$foo[self() = 2]" is the set (2).
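
Both self() examples from this section look like this in a Python sketch, where the lambda's parameter plays the role of self().  Tuples stand in for sets, and constrain is a hypothetical helper, not part of KRQL.

```python
def constrain(current_set, value_query):
    """Keep the items for which the value query is true; return a new tuple."""
    return tuple(item for item in current_set if value_query(item))

foo = (1, 2, 3)
words = ("foo", "bar", "baz")

# $foo[self() = 2]
print(constrain(foo, lambda item: item == 2))        # (2,)
# $set_we_are_talking_about[self() != 'foo']
print(constrain(words, lambda item: item != "foo"))  # ('bar', 'baz')
# The expression evaluates to the constrained set; the stored set is unchanged.
print(foo)                                           # (1, 2, 3)
```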

Edge Traversal (Set, Value)

Any time you have a set and you would like to make it become the new 'current set', you should use an edge traversal.  An edge traversal is just a front-slash ("/") that comes after a set.  This creates a new sub-expression where the set preceding the front-slash is the new 'current set'.  This means if we have a set stored in the variable foo, then "$foo/bar" would give us the result of evaluating edge 'bar' on the contents of the set $foo.  You can chain together as many edge traversals as you want.

It's very important to note that when using an edge traversal, the result of the edge traversal becomes the value of evaluating the expression containing whatever came before the edge traversal (and now including the edge traversal).  This is just like accessing fields in a structure in C or something; if you type "a.b.c" in C, you get back the result of c which was in b which was in a, not a.  For example "/authors/name" evaluates to the names of the authors in the system, not the authors.
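
Chained traversals can be pictured with this Python sketch, where each step replaces the current set with the result of following one edge.  Dicts stand in for MT objects, traverse is a made-up helper, and the sample data (including the author name 'fedora') is invented.

```python
def traverse(current_set, *edges):
    """Follow each edge in turn; the result of one step is the next current set."""
    for edge in edges:
        current_set = [item[edge] for item in current_set]
    return current_set

# Made-up sample data.
sombrero = {"name": "sombrero", "type": 1}
fedora = {"name": "fedora", "type": 1}
entries = [{"status": 2, "author": sombrero}, {"status": 1, "author": fedora}]

# Like "$foo/bar" with $foo holding two items:
print(traverse([{"bar": 1}, {"bar": 2}], "bar"))  # [1, 2]
# Like "/authors/name": you get the names back, not the authors.
print(traverse([sombrero, fedora], "name"))       # ['sombrero', 'fedora']
# Two traversals in a row, entries to authors to names:
print(traverse(entries, "author", "name"))        # ['sombrero', 'fedora']
```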

Global Sets (Set, Value)

A global set is defined by a front-slash that is not preceded by an edge/function/variable, and is followed by an identifier (letters and underscores only).  Examples include "/authors" and "/blogs".  A global set is used to retrieve a new set from a registry of global set providers.  Global sets can be used just like any other set provider, even though I think I forgot to explicitly state it in the cases above.  You can follow them with constraints (ex: /authors[name = 'sombrero']), edge traversals (ex: /entries/author), and use them in any place you'd use a set.  (Of course, if you follow them with a constraint or an edge traversal, it is the result of the constraint or edge traversal that is returned in that case.)
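
A registry lookup of this kind might be sketched in Python as below.  The registry contents are sample data only (not the real provider list), the pattern is assumed from "letters and underscores only", and raising an error for an unknown name is an assumption about behavior the text does not describe.

```python
import re

GLOBAL_SET = re.compile(r"/([A-Za-z_]+)")  # assumed: '/' then an identifier

# Hypothetical registry with made-up sample providers.
PROVIDERS = {
    "authors": lambda: [{"name": "sombrero", "type": 1}, {"name": "fedora", "type": 1}],
    "blogs": lambda: [{"name": "example blog"}],
}

def global_set(expression: str):
    """Resolve an expression like '/authors' to a fresh set from its provider."""
    match = GLOBAL_SET.fullmatch(expression)
    if match is None:
        raise ValueError(f"not a global set: {expression!r}")
    name = match.group(1)
    if name not in PROVIDERS:
        raise KeyError(f"no global set provider named {name!r}")
    return PROVIDERS[name]()

# /authors[name = 'sombrero'] : the constrained set is what comes back.
matches = [a for a in global_set("/authors") if a["name"] == "sombrero"]
print(matches)  # [{'name': 'sombrero', 'type': 1}]
# /authors/name : the names are what comes back.
print([a["name"] for a in global_set("/authors")])  # ['sombrero', 'fedora']
```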

Current global set providers are:

Efficiency

As of version 0.8, KRQL now supports both:
There is of course still more work to be done for efficiency (this is a first generation solution), but these are at least things I wanted to implement, and did implement.

Security

KRQL forbids accessing the 'password' or 'hint' edges on MT::Author objects because there's no good reason to support accessing that data, and I'm not keen on creating security holes.  NOTE however that if you are using the web query interface, the 'dump' routine used to display returned values is not so discriminating.  It will spit out the (crypted) passwords.  If it turns out there are other 'sensitive' fields on certain object types, let me know and I'll add them.
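
The edge restriction could be pictured with the following Python sketch.  The Author class is a stand-in for MT::Author, and raising an error is an assumption; the text only says access is forbidden, not how the refusal is reported.

```python
from dataclasses import dataclass

FORBIDDEN_AUTHOR_EDGES = frozenset({"password", "hint"})

@dataclass
class Author:
    """Minimal stand-in for MT::Author."""
    name: str
    password: str
    hint: str

def follow_edge(current_set, edge):
    """Follow an edge, refusing the sensitive ones on author objects."""
    results = []
    for item in current_set:
        if isinstance(item, Author) and edge in FORBIDDEN_AUTHOR_EDGES:
            raise PermissionError(f"edge {edge!r} is not accessible on authors")
        results.append(getattr(item, edge))
    return results

authors = [Author("sombrero", "crypted-value", "some hint")]

print(follow_edge(authors, "name"))  # ['sombrero']
try:
    follow_edge(authors, "password")
except PermissionError as error:
    print("refused:", error)
```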


Copyright 2004, Andrew Sutherland