KoalaRainbow :: Koala Rainbow Query Language (KRQL)
Introduction
Overview
The Koala Rainbow Query Language (KRQL) is the means by which you tell
KoalaRainbow what data you want to visualize. It is also the
means by which you perform any processing of that data.
KRQL is intended to be 'functional', which means that a given query
should not have side-effects. This relies on functions not having
side-effects (currently true), and that you as the user don't use the
naive edge traversal to call no-arg mutating functions on MT objects.
KRQL is inspired by XPATH just enough to confuse you, but not so much
as to be useful. This is less that I am a jerk and more that we
are not dealing with a hierarchically structured dataset.
Although the MovableType database could be serialized into a variety of
nice XML hierarchies, it's not. At least in my configuration,
it's backed by a relational database, and it would be silly to throw
away the benefits imbued by such a storage format and the helpful MT
object representations.
Quick Examples With Explanation
To get the grokkage flowing...
/entries
Returns the set of all entries in the system (across all blogs), both
draft and published.
/entries[status = 2]
Returns the set of all published entries in the system.
/authors[name = 'sombrero']
Returns the set of all authors with the name 'sombrero'. Note
that the 'authors' set in MovableType includes both authors registered
with the blog and typekey users that have commented. Since I am
an author on my blog, and I also have a sombrero typekey identity that
I have posted with, this results in a set containing two nodes.
/authors[type = 1 and name = 'sombrero']
Returns the set of all authors with the name 'sombrero', that are type
1 (which means a registered author with the blog, as opposed to a
typekey identity). In my case, this turns out to be a set
with a single author object (the blog poster one).
/entries[author[type = 1]/name = 'sombrero']
Returns the set of all entries (published and draft) in the system
authored by sombrero (the blog author).
/entries[author[type = 1]/name = 'sombrero' and status = 2]
Returns the set of all published entries in the system authored by
sombrero (the blog author).
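For readers who think better in a general-purpose language, here is a rough Python sketch of what the last query asks for. It is not how KoalaRainbow evaluates anything; the `Author` and `Entry` classes, the sample data, and the `first` helper are all hypothetical stand-ins, and the non-1 author type and non-2 status values are arbitrary placeholders.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for MT::Author and MT::Entry; only the fields the
# example queries touch are modelled.
@dataclass
class Author:
    name: str
    type: int  # 1 = registered blog author (per the text above)

@dataclass
class Entry:
    title: str
    status: int  # 2 = published (per the text above)
    author: Author

def first(items):
    """Value-context coercion: first item of a set, or None if it is empty."""
    return items[0] if items else None

blogger = Author("sombrero", 1)
commenter = Author("sombrero", 5)  # placeholder type: anything other than 1
entries = [
    Entry("draft post", 1, blogger),    # placeholder status: anything other than 2
    Entry("live post", 2, blogger),
    Entry("other post", 2, commenter),
]

# /entries[author[type = 1]/name = 'sombrero' and status = 2]
result = [
    e for e in entries
    if first([a.name for a in [e.author] if a.type == 1]) == "sombrero"
    and e.status == 2
]
print([e.title for e in result])  # ['live post']
```

The inner list comprehension plays the role of `author[type = 1]/name`: constrain the author set, follow the `name` edge, then let the comparison take the first value.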
Important Notes
KRQL supports comparison operators that use < and >. Since
KRQL is intended to be used in templates, there's an obvious problem
there. The current solution is that all queries magically have
"{" and "&lt;" mapped to "<", and "}" and "&gt;" mapped to
">". This occurs at such a foolishly high level that even the
contents of string literals will be transformed. My advice is to
use { and } because they are much more readable, which is also why they are
present.
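To make the string-literal gotcha concrete, here is a minimal Python sketch of a blind, whole-query substitution of this kind. The function name `normalize_query` is made up for illustration, and treating the entity spellings `&lt;` and `&gt;` as the second accepted form is an assumption; only the brace mapping is taken directly from the text.

```python
# Assumed mapping: the braces come from the text; the entity spellings
# "&lt;" and "&gt;" are an assumption about the second accepted form.
REPLACEMENTS = (("&lt;", "<"), ("&gt;", ">"), ("{", "<"), ("}", ">"))

def normalize_query(query: str) -> str:
    """Blindly rewrite the template-safe spellings into real operators."""
    for old, new in REPLACEMENTS:
        query = query.replace(old, new)
    return query

# The intended use: a comparison written with a brace.
print(normalize_query("/entries[status } 1]"))           # /entries[status > 1]

# The gotcha: braces inside a string literal are rewritten too.
print(normalize_query("/authors[name = '{sombrero}']"))  # /authors[name = '<sombrero>']
```

Because the substitution never looks at quoting, there is no way in this sketch to keep a literal brace inside a string.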
Syntax
Contexts
There are basically two contexts in KRQL:
- The Set Context. You are always operating on sets of data
and returning sets of data.
- The Value Context. You may operate on and return
values. You may also operate on sets and return sets, but the
consumer of the returned data may opt to coerce the set into a scalar
context, returning only the first item from the set.
The value context is a strict superset of the set context. For
the rest of this section, I will annotate each syntax description
heading for the contexts in which it is valid.
Er, as a late addition, I need to actually indicate that there are
really 3 contexts, the third being a boolean context. The boolean
context is the only one in which comparisons and logical operators are
valid. When I originally wrote this section of the documentation,
I think I was assuming that the user could safely be ignorant of this
distinction, but I believe that there are many places in the code that
only support value contexts (not boolean contexts). So, you are
warned, and I know something I need to fix.
Numbers (Value)
Decimal numbers are expressed just by writing the digits. They
may be negative. Floating point numbers work as long as you don't try
to use any of that fancy scientific notation or the like. They
should not be quoted. There is no hexadecimal support.
There is no octal support. Just write the numbers.
Valid numbers include: 1, 2, -3, 1.0, 0.001, 42
String Literals (Value)
String literals are expressed by putting any text you want in
single quotes. There is currently no escape mechanism, so
if you want to actually put a single quote in your text, there is no
way to do so unless there's a function that returns them that you can
use to concatenate together with your string literal.
Examples of some string literals:
'foo'
'foo bar'
'Look, ma! "Quotes!", but not single quotes! No! None of those!'
Comparisons (Value)
The following comparisons are supported: =, !=, <, <=, >,
>=. Please note that since you are most likely
going to be using KRQL in HTML templates, you probably should not
actually be using "<" and ">". Instead, replace your usages
of "<" with "{" or "&lt;", and your usages of ">" with "}" or
"&gt;". This transform is currently implemented by an ugly
hack that also affects string literals. Some great day, I will fix
this; most likely after I have been bitten by this one too many times.
These comparison operators work for both numbers and strings, with the
note that the KRQL engine will use string-comparison rules unless both
values meet its standard of looking like numbers (see the definition of
numbers above). The edge case that is probably not intuitive is
that an empty string is _not_ a number per these rules. If you
have a value that may be an empty string (or undefined), it is
suggested that you use the 'int' function to force a cast to a
number. (ex: int('') returns 0.)
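The numbers-versus-strings rule is easier to see with a small sketch. The Python below is an illustration under assumptions, not the engine's code: the regular expression is a guess at "looks like a number" based on the number rules above, and `less_than` and `to_int` are hypothetical names (`to_int` stands in for the int function, and its handling of anything other than empty or undefined input is assumed).

```python
import re

# Assumed pattern for "looks like a number": optional minus sign, digits,
# optional fractional part; no exponent, hex, or octal forms.
NUMBER_RE = re.compile(r"-?\d+(\.\d+)?")

def as_text(value) -> str:
    """Treat an undefined value (None here) as the empty string."""
    return "" if value is None else str(value)

def looks_like_number(value) -> bool:
    return NUMBER_RE.fullmatch(as_text(value)) is not None

def less_than(a, b) -> bool:
    """Numeric comparison only when both sides look numeric, else string comparison."""
    if looks_like_number(a) and looks_like_number(b):
        return float(a) < float(b)
    return as_text(a) < as_text(b)

def to_int(value) -> int:
    """Stand-in for int(): empty or undefined becomes 0 (other cases are assumed)."""
    if value is None or value == "":
        return 0
    return int(float(value))

print(less_than("10", "9"))          # False: both numeric, 10 < 9 is false
print(less_than("10", "9a"))         # True: string rules, "1" sorts before "9"
print(less_than("", "-1"))           # True: '' is not a number, so string rules apply
print(less_than(to_int(""), "-1"))   # False: 0 < -1 is false once the cast is forced
```

The last two lines are the unintuitive edge case from the text: the same comparison flips once the empty string is forced to a number.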
Boolean/Logical Operations (Value)
We support 'and' and 'or'. Since this is a functional query
language, we're not going to admit whether we short-circuit or
not. If I find anyone depending on this, I shall change the
implementation to spite them. Muahahahah. Not really.
I mean, I am a jerk, but a lazy jerk. So you're safe.
Like I learned in school, 'and' binds tighter than 'or'. Since
I'm not sure if I said that correctly, let's just write some
expressions and show what their equivalent would be in thoroughly
unambiguous (to one who doesn't know the precedence) parentheses form.
A and B or C => (A and B) or C
A and B or C and D => (A and B) or (C and D)
A or B and C or D => A or (B and C) or D [I'm not telling which way
the tree grows. It's a secret!]
This has the non-exclusive benefit of allowing you to express any
expression, as long as you repeat stuff all crazy. Since it turns
out that most people don't like thinking in K-maps, we also support
parentheses.
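If the precedence table above is still fuzzy, this small Python sketch parses flat 'and'/'or' expressions and prints the parenthesized equivalent. It is a toy recursive-descent parser written for this illustration (the name `parenthesize` is made up), it handles only space-separated operands with no parentheses of their own, and it says nothing about how the real KRQL parser is built.

```python
def parenthesize(expression: str) -> str:
    """Parse 'A and B or C' style input and return it with 'and' groups parenthesized."""
    tokens = expression.split()
    pos = 0

    def parse_operand() -> str:
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_and() -> str:
        # 'and' binds tighter, so it is parsed at the lower level.
        nonlocal pos
        terms = [parse_operand()]
        while pos < len(tokens) and tokens[pos] == "and":
            pos += 1
            terms.append(parse_operand())
        return terms[0] if len(terms) == 1 else "(" + " and ".join(terms) + ")"

    def parse_or() -> str:
        nonlocal pos
        terms = [parse_and()]
        while pos < len(tokens) and tokens[pos] == "or":
            pos += 1
            terms.append(parse_and())
        return " or ".join(terms)

    return parse_or()

print(parenthesize("A and B or C"))        # (A and B) or C
print(parenthesize("A and B or C and D"))  # (A and B) or (C and D)
print(parenthesize("A or B and C or D"))   # A or (B and C) or D
```

The sketch keeps chains of 'or' flat on purpose, so it does not give away which way the tree grows either.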
Parentheses (Value)
Use these when you have some form of crazy boolean expression that is
not easily represented by the standard boolean operator
precedence. (I mean, sure, technically you could, but that would
be one ugly expression!)
Variables (Set, Value)
All variables consist of a dollar-sign followed by an identifier made
up of letters and underscores. No numbers in variables!
Variables retrieve values stored in the KoalaRainbow environment, and
may contain either sets or values. Unlike perl, you can't have a
set variable and a value variable; there's only one storage spot.
A variable containing a value evaluated in a set context will be
coerced into a set of that one value. A variable containing a set
evaluated in a value context will just return the first value in the
set (if any). Variables cannot be set within KRQL statements.
Some legal variables, all distinct: $foo, $bar, $fooBar, $BAR, $foo_bar.
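The two coercion rules can be sketched in a few lines of Python. This is an illustration only: the `env` dictionary, its contents, and the helper names `to_set` and `to_value` are hypothetical, and Python lists stand in for KRQL sets.

```python
def to_set(stored):
    """Set context: a bare value becomes a one-item set; a set is used as-is."""
    return list(stored) if isinstance(stored, (list, tuple)) else [stored]

def to_value(stored):
    """Value context: a set yields its first item (None if empty); a value is used as-is."""
    if isinstance(stored, (list, tuple)):
        return stored[0] if stored else None
    return stored

# Hypothetical environment; one storage slot per name, as the text describes.
env = {"foo": "hello", "bar": ["x", "y", "z"], "empty": []}

print(to_set(env["foo"]))      # ['hello']
print(to_value(env["bar"]))    # x
print(to_value(env["empty"]))  # None
```

Note that the coercion happens at the point of use; the stored variable itself is never changed.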
Function Calls (Set, Value)
All function calls consist of an identifier (made up of letters and
underscores) followed by parentheses (regardless of whether there are
any arguments.) Arguments should be separated by commas.
Technically, the parser knows when there should be a comma, so this
just provides it an opportunity to yell at you if you forget the comma.
The decision whether to evaluate each argument as a value or a set is
made by the function. It might be a nice trick in the future for a
function that expects a value, when invoked on a set, to be invoked
once for each value in the set and return the resulting set, but that
hasn't happened yet. In summary, functions
that take their arguments as values will evaluate the arguments in a
value context, which generally means taking only the first value from a
set. This behavior is unlikely to change, as it is important for
proper operation of singleton sets, such as in contexts.
Functions can return both values and sets.
Each argument is evaluated using the same 'current set' that the
function resides in. This is relevant for evaluating edges (next
section.)
Functions cannot be defined by KRQL queries or any other MT-visible
mechanism. The existing functions are documented here. New functions can be registered
by MT plugins, but it's not advised at this time, as future
optimizations will require additional function annotations be used.
Examples of some function calls:
add(1, 2)
add(div(4, 4), mul(1, 2))
concat('Hello ', 'World')
Edge (Set, Value)
Any identifier (letters and underscores) that isn't a variable or a
function, is an 'edge'. It is important to understand that at
every point in a KRQL query, there is a concept of the 'current
set'. When an 'edge' is found, the KRQL engine creates a new set
that is the result of following the 'edge'. This new set may then
be used as the 'current set' if followed by a constraint or an
edge-traversal (see below for both); note that this also holds true for
variables and functions that contain/return sets.
In theory, an edge could be anything thanks to an abstraction that
allows adaptors to express a node/edge (directed) graph representation
of anything (well, a simplified one). In practice, it just means
that we call the accessor functions on MT objects for every object in
the set and put them in a new set. For example, the MT::Author
object has a 'name' record that is exposed as an accessor
function. If we have a set of authors, then when we evaluate the
edge "name", we will get back a new set of the names of those authors.
Set Constraints (Set, Value)
Any time you have a set that you would like to filter using some set of
criteria, you should use a constraint. Set constraints take the
form "[some-value-context-thing-to-parse]". In other words, put
brackets after your set, then put some value query inside it. The
value query is evaluated for every item in the set the constraint is
attached to. If the value query returns true (something other
than 0, the empty string, or undef), then your item is kept and shows
up in the resulting set. If the value query returns false (0, the
empty string, or undef), then the item does not end up in the resulting
set. Note that we're a functional query language, so the original
set is still intact, and doesn't get modified.
As a useful tidbit, if you have a constraint where the items in the set
you are constraining are already the things you care about, you can use
the self() function to get at them. For example, if you have a
set ("foo", "bar", "baz"), and you want to get rid of foo, how would
you do this? There's no edge associated with the strings; they're
an end-product. And so, the solution is to use the self()
function, which would return the item in question. Ex:
"$set_we_are_talking_about[self() != 'foo']" returns the set ("bar",
"baz"). In all likelihood, a magical keyword will be introduced
in the future as an alternative, but that would make us even more
perl-like in our line-noise quality.
The result of the set constraint is a new set which you can then
perform an edge traversal on. You can't put multiple constraints
in a row without an intervening edge traversal because that would be
stupid. Just use an 'and' to join whatever you would have said.
It's very important to note that when imposing a set constraint, the
result of the set constraint becomes the value of evaluating the
expression containing whatever came before the set constraint (and now
including the set constraint). This is just like accessing fields
in a structure in C or something; if you type "a.b.c" in C, you get
back the result of c which was in b which was in a, not a. To be
super clear, if I have a set (1, 2, 3) that I store in the variable
'foo', then the result of evaluating "$foo[self() = 2]" is the set (2).
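In Python terms, a set constraint behaves roughly like a filter that builds a new collection. The sketch below is an assumption-laden illustration, not the engine's implementation: `constrain` and `is_true` are made-up names, the lambda argument plays the role of self(), and None stands in for undef.

```python
def is_true(value) -> bool:
    """Truth test from the text: 0, the empty string, and undef (None here) are false."""
    return value is not None and value != 0 and value != ""

def constrain(items, predicate):
    """Return a new list of the items for which predicate(item) is true."""
    return [item for item in items if is_true(predicate(item))]

words = ("foo", "bar", "baz")

# Roughly $set_we_are_talking_about[self() != 'foo']
print(constrain(words, lambda item: item != "foo"))  # ['bar', 'baz']

# Roughly $foo[self() = 2] with $foo holding (1, 2, 3)
print(constrain((1, 2, 3), lambda item: item == 2))  # [2]

# The original set is untouched.
print(words)  # ('foo', 'bar', 'baz')
```

The result of the whole expression is the filtered collection, not the collection you started with, which is the point of the "a.b.c" comparison above.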
Edge Traversal (Set, Value)
Anytime you have a set and you would like to make it become the new
'current set', you should use an edge traversal. An edge
traversal is just a forward slash ("/") that comes after a set.
This creates a new sub-expression where the set preceding the
forward slash is the new 'current set'. This means if we have a
set stored in the variable foo, then "$foo/bar" would give us the
result of evaluating edge 'bar' on the contents of the set $foo.
You can chain together as many edge traversals as you want.
It's very important to note that when using an edge traversal, the
result of the edge traversal becomes the value of evaluating the
expression containing whatever came before the edge traversal (and now
including the edge traversal). This is just like accessing fields
in a structure in C or something; if you type "a.b.c" in C, you get
back the result of c which was in b which was in a, not a. For example,
"/authors/name" evaluates to the names of the authors in the system, not
the authors.
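Here is a minimal Python sketch of edges and chained edge traversals, assuming an edge is nothing more than an accessor called on every item of the current set. The `Author` and `Entry` classes, the sample names, and the helpers `follow_edge` and `traverse` are hypothetical; lists stand in for sets, and the sketch takes no position on whether the real engine removes duplicates.

```python
from dataclasses import dataclass

@dataclass
class Author:  # hypothetical stand-in for MT::Author
    name: str

@dataclass
class Entry:  # hypothetical stand-in for MT::Entry
    author: Author

def follow_edge(current_set, edge):
    """Call the accessor named by `edge` on every item and collect the results."""
    return [getattr(item, edge) for item in current_set]

def traverse(start_set, *edges):
    """Evaluate 'start/edge1/edge2/...': each result becomes the next current set."""
    current = start_set
    for edge in edges:
        current = follow_edge(current, edge)
    return current

authors = [Author("sombrero"), Author("koala")]
entries = [Entry(authors[0]), Entry(authors[1])]

print(traverse(authors, "name"))            # like /authors/name
print(traverse(entries, "author", "name"))  # like /entries/author/name
```

Both calls print `['sombrero', 'koala']`: the value of the whole expression is whatever the last edge produced, not the set the traversal started from.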
Global Sets (Set, Value)
A global set is defined by a forward slash that is not preceded by an
edge/function/variable, and is followed by an identifier (letters and
underscores only). Examples include "/authors" and
"/blogs". A global set is used to retrieve a new set from a
registry of global set providers. Global sets can be used just
like any other set provider, even though I think I forgot to explicitly
state it in the cases above. You can follow them with constraints
(ex: /authors[name = 'sombrero']), edge traversals (ex:
/entries/author), and use them in any place you'd use a set. (Of
course, if you follow them with a constraint or an edge traversal, it
is the constraint or result of the edge traversal that is returned in
that case.)
Current global set providers are:
- authors - Returns the set of MT::Author objects.
- bannedips - Returns the set of MT::IPBanList objects.
- blogs - Returns the set of MT::Blog objects.
- categories - Returns the set of MT::Category objects.
- comments - Returns the set of MT::Comment objects.
- entries - Returns the set of MT::Entry objects.
- notifications - Returns the set of MT::Notification objects.
- placements - Returns the set of MT::Placement objects.
- trackbacks - Returns the set of MT::TBPing objects.
Efficiency
As of version 0.8, KRQL now supports both:
- Partial evaluation of query strings in order to increase
performance. For our purposes, this occurs immediately prior to
the actual evaluation of the statement, which means we can treat all
variable references as constants. As it ends up, the only things
that aren't evaluated during partial evaluation are global set queries
and functions annotated to be 'nondeterministic'. (Where
nondeterministic in this case means that it depends on something that
isn't an argument, like self() or parent().)
- Propagation of constraints imposed on global sets into the actual
MT database interface query. This actually occurs as part of the
partial evaluation mechanism, but taking place after the relevant
evaluations. Three scenarios support propagation:
- Comparisons between edges and constants using one of the
following comparison ops: <, <=, =, >, and >=. (So,
basically, not !=).
- Comparisons between functions with special propagation
annotations. (currently: position, minutes_old, days_old, hours_old.)
- Functions that return a boolean value with special propagation
annotations. (currently: date_in_range.)
There is of course still more work to be done for efficiency (this is a
first generation solution), but these are at least things I wanted to
implement, and did implement.
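As a rough picture of the first propagation scenario, the Python sketch below sorts simple comparisons into those that could be handed to the database query and those that could not. Everything here is illustrative: the triple representation, the name `split_constraints`, and the idea that non-propagated constraints are simply checked afterwards in memory are assumptions, and the function-based scenarios are not modelled at all.

```python
# Operators the text lists as eligible for propagation (so, not "!=").
PUSHABLE_OPS = {"<", "<=", "=", ">", ">="}

def split_constraints(comparisons):
    """Split (edge, op, constant) triples into database-side and leftover groups."""
    pushed, residual = [], []
    for comparison in comparisons:
        _edge, op, _constant = comparison
        (pushed if op in PUSHABLE_OPS else residual).append(comparison)
    return pushed, residual

# Roughly the comparisons in a constraint such as [status = 2 and name != 'sombrero']
pushed, residual = split_constraints(
    [("status", "=", 2), ("name", "!=", "sombrero")]
)
print(pushed)    # [('status', '=', 2)]
print(residual)  # [('name', '!=', 'sombrero')]
```

The payoff of propagation is that the pushed group narrows what gets loaded from the database in the first place, instead of loading everything and filtering afterwards.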
Security
KRQL forbids accessing the 'password' or 'hint' edges on MT::Author
objects because there's no good reason to support accessing that data,
and I'm not keen on creating security holes. NOTE however that if
you are using the web query interface, the 'dump' routine used to
display returned values is not so discriminating. It will spit
out the (crypted) passwords. If it turns out
there are other 'sensitive' fields on certain object types, let me know
and I'll add them.
Copyright 2004, Andrew Sutherland