I have been working with StringTemplate for a few months now. I know that isn’t too long and others have been using it for much longer. But in that time I have thought a lot about it and formed some opinions. (I hope it is clear from what I have already written that I’m a big fan.) In this and the next few articles I’ll try to organize my thoughts, hopefully stir up some discussion and possibly impact future versions of the language.
StringTemplate the language
Presently StringTemplate is described in terms of both its syntax and implementation (API). I think it is important to describe the StringTemplate language independently from its implementation. I see the following benefits to doing this:
- It allows thinking more clearly about the language without implementation details affecting language design decisions.
- It promotes independent interoperable implementations either for additional host languages or alternate implementations for currently supported languages. Implementations could be interpreters or compilers.
A consequence of describing the StringTemplate language separately is that the language should be versioned independently of the implementation.
Implementation details include locating, loading/parsing, and invoking templates, providing data to the templates, output filters and rendering.
For the most part the language specification can state that an implementation will provide a way to do X without having to specify how. For example, it is up to the implementation to provide a method for invoking a template.
Some things will depend on the host language, for example the character set encodings supported for templates. Even though there is no syntax in StringTemplate for specifying the character set encoding, the language specification should state any underlying assumptions or dependencies.
One very important thing to describe independently of the implementation is the data model that StringTemplate operates on. StringTemplate should clearly define its data model and then, separately, the implementation would define how it maps native types onto that data model. Although a template knows nothing about the type of an attribute (with a minor exception for booleans in if expressions) it is very aware of the “shape” of the data. StringTemplate supports data with these shapes: scalars (single value), lists (ordered collection), and maps (unordered keyed values). It also has a special value called null that produces no output and evaluates as false in if expressions.
Lists can contain any mix of scalars, lists, maps and nulls. A map entry value can be a list, scalar, map or null. The keys of a map must be scalar strings.
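As a rough illustration of how an implementation might map native types onto these shapes, here is a minimal Python sketch. The function name shape_of and the choice of Python as host language are assumptions made for this example; nothing here is part of StringTemplate itself.

```python
from collections.abc import Mapping, Sequence

def shape_of(value):
    """Classify a host value as 'null', 'scalar', 'list' or 'map' (hypothetical helper)."""
    if value is None:
        return "null"
    if isinstance(value, Mapping):
        # The data model described above requires map keys to be strings.
        if not all(isinstance(key, str) for key in value):
            raise TypeError("map keys must be strings")
        return "map"
    # str and bytes are sequences in Python, so exclude them before the list check.
    if isinstance(value, Sequence) and not isinstance(value, (str, bytes)):
        return "list"
    return "scalar"

# Lists and maps may nest freely and may contain nulls.
matrix = [["a", "b"], ["c", None]]
print(shape_of(matrix))
print(shape_of({"key1": "value1", "rows": matrix}))
```

Note that a number is just another scalar here: the template never needs to know it is a number.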
The ability to represent the data model literally in StringTemplate is currently inconsistent.
There is a literal syntax for strings, which makes sense since everything gets turned into a string in the end. The grammar includes a definition for integers (INT in action.g) and this should be removed. There is no reason why StringTemplate should know anything about numbers of any kind. I think Terence agreed to this.
List literals can only be defined in the context where an expression is expected and are unnamed but maps can only be defined in group files at the file level and they are named. I don’t see the harm in allowing literal maps, lists and strings at both the group file level and in the context of an expression. One problem is that literal maps use square brackets as delimiters just like lists. I’m not sure what the best syntax for literal maps should be. Curly brackets are already used for anonymous templates, square brackets are used for lists, parentheses are used for template argument lists. Perhaps [: could introduce a literal map. At the group level you should be able to name a list, map or string like so:
    group example;
    map ::= [: "key1": "value1", default: "none"]
    list ::= [ "1", "2", "3" ]
    string ::= "something"
    ...
The capabilities for literal data should be similar to the JSON format even though the syntax will likely need to be different. The exception is that numbers and boolean types are not supported.
The program can define a list of lists (for example to represent a matrix) but the literal syntax doesn’t preserve the nesting. Even though $table(t=[ [ "a", "b" ], [ "c", "d" ] ])$ is allowed, it gets flattened. I’m not sure whether this is intentional or a bug, but I think the literal structure should be preserved. If there is a need to flatten a list then it should be made into a function such as $flatten(list) : ...$. Once there is a literal syntax for maps it should be possible to compose a literal map of lists and a list of maps.
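To make the distinction concrete, here is a small Python sketch of what an explicit flatten function could do, next to a use of the same data with its nesting preserved. The function is only an illustration of the proposal; StringTemplate has no such function as far as this article is concerned.

```python
def flatten(value):
    """Return a flat list of the non-list items nested anywhere inside value."""
    if not isinstance(value, (list, tuple)):
        return [value]
    flat = []
    for item in value:
        flat.extend(flatten(item))
    return flat

table = [["a", "b"], ["c", "d"]]

# Nesting preserved: the template can iterate rows, then cells within a row.
rows = [", ".join(row) for row in table]

# Flattening happens only when it is asked for.
cells = flatten(table)

print(rows)
print(cells)
```

With this split, a literal list of lists keeps its structure by default and the template author opts in to flattening.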
There should be a literal syntax for null so that you can create lists (or maps) with nulls in them and explicitly pass null to a template.
Map and object unification
Maps and objects should be unified. There are a few differences between how StringTemplate deals with objects and maps. Ideally, from the StringTemplate data model point of view, there should just be one thing: call it a map or call it an object, it should behave the same. When you write $A.B$ you have no idea if A is an object and B is a property or A is a map and B is a key. In fact the program should be able to change the implementation from an object to a map or the other way around without the template being affected.
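A unified lookup could be sketched like this in Python. The name lookup and the decision to raise KeyError for both kinds of value are assumptions for the example, not a description of how any StringTemplate implementation resolves $A.B$.

```python
from types import SimpleNamespace

_MISSING = object()  # sentinel so that a stored None is not mistaken for "absent"

def lookup(target, name):
    """Resolve target.name the same way whether target is a map or an object."""
    if isinstance(target, dict):
        value = target.get(name, _MISSING)
    else:
        value = getattr(target, name, _MISSING)
    if value is _MISSING:
        # Same failure for both kinds, so the template cannot tell them apart.
        raise KeyError(name)
    return value

as_map = {"name": "Ann", "city": "Oslo"}
as_object = SimpleNamespace(name="Ann", city="Oslo")

print(lookup(as_map, "name"), lookup(as_object, "name"))
```

The program can swap as_map for as_object without any visible change to the caller.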
The differences are: 1) Maps support the pseudo keys “keys” and “values” and objects do not. 2) Objects have an implied call to toString that maps don’t. 3) They have different behavior when a property/key doesn’t exist. These differences make the template aware of the underlying implementation type.
I never did like these pseudo keys because they pollute the namespace of map keys. Also there is no reason why you couldn’t ask for all the properties of an object. The “values” key is not really needed because $map$ has the same effect as $map.values$. To solve these problems I think values and keys should be functions: $keys(A)$ would return a list of all the keys of a map or the properties of an object. For a scalar it should probably return an empty list and for a null return null. The values function would be defined similarly except that it would return a list of the values of all the keys or all the properties.
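Here is a minimal Python sketch of the proposed keys and values functions. The names st_keys and st_values are made up for the example, and treating an object's public instance attributes as its properties is an assumption about how a Python host might map objects onto the data model.

```python
def st_keys(value):
    """Keys of a map, property names of an object, [] for a scalar, None for null."""
    if value is None:
        return None
    if isinstance(value, dict):
        return list(value)
    if hasattr(value, "__dict__"):
        # Assumption: public instance attributes count as properties.
        return [name for name in vars(value) if not name.startswith("_")]
    return []  # scalars have no keys

def st_values(value):
    """Values matching st_keys(value), in the same order."""
    names = st_keys(value)
    if names is None:
        return None
    if isinstance(value, dict):
        return [value[name] for name in names]
    return [getattr(value, name) for name in names]

class Person:
    def __init__(self, name, city):
        self.name = name
        self.city = city

print(st_keys({"name": "Ann", "city": "Oslo"}))
print(st_keys(Person("Ann", "Oslo")))
print(st_values(Person("Ann", "Oslo")))
```

Because these are functions rather than pseudo keys, a map is free to contain real entries named "keys" or "values".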
The remaining two inconsistencies (access as a scalar and access a key that isn’t present) are trickier. I’m not too sure what the best behaviors are.
There is good reason to support accessing an object as a scalar and providing access to its properties as well. Imagine a date object. When you access it as a scalar it should result in the text representation of its date value. But it may also have useful methods such as month and day. Example: $today$, $today.month$. For a map I’m not sure if $map$ should return null (empty string) or a list of all its values as it does now. Either way a fundamental difference remains between maps and objects. Hopefully this extra ability of objects isn’t too surprising.
It would be nice if you had control over the behavior when a property/key doesn’t exist in an object/map. I’m not sure if the program or template should get to control it and if it is the template how it would be specified. The literal map syntax already has a default keyword to specify the value when a key doesn’t exist. I think the default should be to give an error if an object property or map key doesn’t exist. For either a map or an object there should be a way to specify the default value for a non-existent key/property. The value of the default (as well as the values in the literal map syntax) should support template expressions. The implicit attribute key would be available to the template. This is more powerful than the key keyword currently supported in the literal map syntax.
One way to support adding a default specification for maps/objects would be composing maps from other maps. Suppose [: map1, map2... default: expr ] was a literal syntax for composing a new map from map1 and map2. The action $foo(arg=[: myMap, default: "none" ])$ would create a new map and add all the key value pairs from myMap to it and also specify that if a key is missing the string “none” is returned. This map is then passed as the value for argument arg to template foo. There may be a better way to specify the non-existent key behavior.
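The semantics of such a composed map can be sketched in Python. ComposedMap is a hypothetical class written for this example; it models the proposal (error by default, optional default value, default computed from the missing key) and does not correspond to any existing StringTemplate API.

```python
_NO_DEFAULT = object()  # distinguishes "no default given" from a default of None

class ComposedMap(dict):
    """A map built from other maps, with optional behavior for missing keys."""

    def __init__(self, *maps, default=_NO_DEFAULT):
        super().__init__()
        for source in maps:
            self.update(source)  # later maps win on duplicate keys
        self._default = default

    def __missing__(self, key):
        # dict.__getitem__ calls this hook when the key is absent.
        if self._default is _NO_DEFAULT:
            raise KeyError(key)           # proposed default behavior: an error
        if callable(self._default):
            return self._default(key)     # "template expression" that sees the key
        return self._default

my_map = {"a": "1"}
arg = ComposedMap(my_map, {"b": "2"}, default="none")
print(arg["a"], arg["b"], arg["zzz"])

templated = ComposedMap(my_map, default=lambda key: f"<no value for {key}>")
print(templated["zzz"])
```

The callable default stands in for a template expression that has access to the implicit key attribute.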
Another possibility might be an option like the null option. It might look like this:
Template argument shape
Clearly there is no point in specifying the type of arguments in template definitions when templates have no knowledge of the underlying attribute type. Example: myTemplate(int i, Date d) ::= ... makes no sense. But would it be useful for a template to declare the shape of the argument value (scalar, map, or list)?
This may be similar to what Terence wrote in this post:
I have been thinking for quite a while about adding *, +, and ? to the definition of template arguments so that you know whether you should get zero or more, one or more, or zero or one values in the attribute. That way ST can check the cardinality and existence automatically for you.
It seems that this would only apply to lists and would tell something about how many items are expected in the list.
If a template definition declared that its argument must be a scalar and a list or map is passed instead an error would be given. The distinction between a scalar and an object/map is very subtle. The declaration is more about how you intend to use it rather than the actual underlying type. In one case you expect it to have some specific properties and in the other case you don’t care if it has properties or not.
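A cardinality check along the lines of the *, + and ? markers might look like the following Python sketch. The function name, the use of an empty marker to mean "exactly one value", and counting null as zero values are all assumptions added for the example.

```python
def check_cardinality(name, value, marker=""):
    """Return value if it satisfies marker ('', '?', '*', '+'), else raise ValueError."""
    if value is None:
        count = 0                      # null counts as "no value"
    elif isinstance(value, (list, tuple)):
        count = len(value)
    else:
        count = 1                      # any scalar or map is a single value
    allowed = {
        "": count == 1,                # assumption: unmarked means exactly one
        "?": count <= 1,               # zero or one
        "*": True,                     # zero or more
        "+": count >= 1,               # one or more
    }
    if marker not in allowed:
        raise ValueError(f"unknown cardinality marker {marker!r}")
    if not allowed[marker]:
        raise ValueError(
            f"attribute {name!r}: {count} value(s) not allowed by marker {marker!r}"
        )
    return value

check_cardinality("names", ["Ann", "Bob"], "+")   # passes
check_cardinality("title", None, "?")             # passes
try:
    check_cardinality("names", [], "+")
except ValueError as error:
    print(error)
```

A check like this says nothing about whether a single value is a scalar or a map, which is the subtle distinction discussed above.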
The shape declaration could be optional. I’m not sure if this is useful or not. At a minimum it would help document what the template expects. Which leads to the next topic.
Agreeing on the data
Every now and then the question of “how can I find out what attributes a template uses” gets asked on the stringtemplate-interest mailing list. From the other vantage point a reasonable question is “what data does the program make available to the template”. The way you look at it may come down to who is driving the application design: the template writer or the program writer. If they are the same person it may depend on how they see the problem: either driven by the needs of the output or driven by the program and the data it has.
Either way the important point is that the program and the template have to agree on the data and need some way to communicate what the data looks like. Currently StringTemplate has no way to facilitate this communication or validate that the data meets the agreed upon structure. I’m not saying that it needs to. I’m just wondering if it would be useful and what it might look like.
In my own work so far I have not done anything to write down what the structure of the data is. It has not been a problem for me because the application is small and I am writing both the program and the templates, but I know this won’t scale. Simply looking at the template arguments is of little help because the data can be deeply nested. Even the above-mentioned suggestion to declare the shape of the template arguments doesn’t help much because of this deep structure. How do you solve this problem? (That’s not a rhetorical question – I’d like to know what you do.)
In the XML world a schema is used for this purpose. The producer and consumer of some structured data both agree on the schema. The schema describes the structure of the data as well as the types of individual values. Any time between when the data is produced and when it is consumed the data can be validated against the schema. With XML the data is usually serialized in a text document in XML format but it doesn’t need to be. The schema language may be Relax NG (unless you are unfortunate and have to use XML Schema).
This makes sense for XML but may be overkill for StringTemplate. A big difference between the two is that the coupling between the producer (the program) and the consumer (the template) is much tighter with StringTemplate. The template and the program are designed and built together and the data is unlikely to be serialized in between. Still I wonder if something like Relax NG could be useful as a way to communicate the data agreement. Validation of the data could be done as part of a unit test. Another potential benefit of a schema would be assisted editing. The editor would know which attributes and attribute properties are available.
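To give a feel for what validating the data in a unit test could look like, here is a deliberately tiny Python sketch. The schema notation (the string "scalar", a one-element list for lists, a dict for maps) is invented for this example and is far simpler than Relax NG; it only checks shape, not value types.

```python
def validate(data, schema, path="$"):
    """Return a list of error strings; an empty list means data matches schema."""
    errors = []
    if schema == "scalar":
        if data is None or isinstance(data, (dict, list)):
            errors.append(f"{path}: expected scalar")
    elif isinstance(schema, list):            # [item_schema] describes a list
        if not isinstance(data, list):
            errors.append(f"{path}: expected list")
        else:
            for index, item in enumerate(data):
                errors.extend(validate(item, schema[0], f"{path}[{index}]"))
    elif isinstance(schema, dict):            # {key: schema} describes a map
        if not isinstance(data, dict):
            errors.append(f"{path}: expected map")
        else:
            for key, sub_schema in schema.items():
                if key not in data:
                    errors.append(f"{path}.{key}: missing")
                else:
                    errors.extend(validate(data[key], sub_schema, f"{path}.{key}"))
    else:
        raise ValueError(f"unsupported schema node at {path}: {schema!r}")
    return errors

# The agreement between program and template, written down once.
report_schema = {"title": "scalar", "rows": [{"name": "scalar", "tags": ["scalar"]}]}

good = {"title": "Report", "rows": [{"name": "a", "tags": ["x", "y"]}]}
bad = {"title": "Report", "rows": [{"name": "a"}, {"name": ["oops"], "tags": []}]}

print(validate(good, report_schema))
print(validate(bad, report_schema))
```

A unit test would build the attributes exactly as the program does and assert that validate returns an empty list before any template is rendered.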
In the Java/OO world the class declarations describe the data. The data structure may be given as a UML diagram. This works for communication. Within the program code Java’s type checking can help ensure that the data is valid but there is nothing to enforce, at “build” time, that the template is using the data according to the Java types. Again, I’m not suggesting that StringTemplate needs strong or static type checking – just making a comparison.
Possible host languages
Thinking about StringTemplate as an independent language has me wondering what kinds of host languages it could be implemented in. Are there any languages or classes of languages where it wouldn’t work well, isn’t needed, or would have limitations?
One issue that comes to mind is that the rendering of attributes depends on the type of the attribute. The identification of rendering as a distinct aspect of template processing is a very powerful feature of StringTemplate. The template doesn’t need to know about data types or type-specific formatting because the renderer knows how to format the attribute according to its type.
What would happen in a dynamically typed language? Would the renderer know how to format the attribute in a meaningful way? If not, then the template would need to rely on the format option.
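As one data point, values in a dynamically typed language such as Python still carry a runtime type, so type-based renderers remain possible there. The sketch below is a hypothetical registry written for this example, not the renderer API of any StringTemplate implementation; hosts where every value is a plain string would not get this for free.

```python
import datetime

class RendererRegistry:
    """Maps runtime types to formatting functions (hypothetical helper)."""

    def __init__(self):
        self._renderers = []  # (type, function) pairs, checked in registration order

    def register(self, attr_type, render):
        self._renderers.append((attr_type, render))

    def render(self, value):
        if value is None:
            return ""                      # null produces no output
        for attr_type, render in self._renderers:
            if isinstance(value, attr_type):
                return render(value)
        return str(value)                  # fallback, like an implied toString

registry = RendererRegistry()
registry.register(datetime.date, lambda d: d.strftime("%Y-%m-%d"))
registry.register(float, lambda f: f"{f:.2f}")

print(registry.render(datetime.date(2009, 3, 14)))
print(registry.render(3.14159))
print(registry.render("plain text"))
```

The template stays unaware of types in this arrangement; only the program that registers renderers needs to know them.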
Well that’s more than enough thoughts for now. I hope to get the next batch written up soon.