Text Generation

Monday, October 10, 2005

Backgrounders

Ingredients

The text generation mechanism is provided by the CTextGenerator class in Gregor.Core. Starting from an arbitrary root object, CObjectGraph.CreateWithPathInfo creates an object graph. The generator then processes this graph, creating text files from literals, code expressions, and simple textual templates (which can contain code expressions, too). Literals, expressions, and template references are all defined in the path information XML document.

Templates support replacing expressions as well. Graph creation, code expression evaluation, and template processing all require an object implementing ICodeInterpreter2 (typically the .NET Console's interpreter).
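The following minimal Python sketch makes the interpreter's role concrete. It is not Gregor.Core code. The CodeInterpreter protocol and PythonEvalInterpreter class are hypothetical stand-ins for ICodeInterpreter2. The $expression% delimiters are assumed from the EchoIf template example under Using Conditions. Note that eval() is not a sandbox; it only serves the illustration.

```python
import re
from typing import Any, Mapping, Protocol


class CodeInterpreter(Protocol):
    """Hypothetical stand-in for ICodeInterpreter2."""

    def evaluate(self, expression: str, variables: Mapping[str, Any]) -> Any:
        ...


class PythonEvalInterpreter:
    """Illustration only: evaluates expressions with Python's eval()."""

    def evaluate(self, expression: str, variables: Mapping[str, Any]) -> Any:
        # Empty builtins: names resolve only to the variables passed in.
        return eval(expression, {"__builtins__": {}}, dict(variables))


# Assumed template syntax: $expression%, as in the EchoIf example.
EXPRESSION = re.compile(r"\$(.+?)%", re.DOTALL)


def expand_template(text: str, interpreter: CodeInterpreter,
                    variables: Mapping[str, Any]) -> str:
    """Replace each $expression% with the string value of the expression."""
    return EXPRESSION.sub(
        lambda match: str(interpreter.evaluate(match.group(1), variables)),
        text)


print(expand_template("<p>Hello, $name.upper()%!</p>",
                      PythonEvalInterpreter(), {"name": "template"}))
# prints <p>Hello, TEMPLATE!</p>
```

Because the interpreter is passed in, the expansion logic does not care which language the expressions are written in. This matches the way graph creation and template processing share one ICodeInterpreter2 object.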

A Most Basic Example

Here's the XML document with (not so much) path information (PathInfo.xml):

<Root outfilename="Result.txt"
      literal="Hello, Literal!"
      expression="System.DateTime.Now"
      template="Template.htm" />

Here's the template file (Template.htm):

<p>
Hello, Template!
</p>

Let's run it in WebEdit.NET's Console window, using the Code Interpreter:

code:
info = new CTextGenerationInfo(new object())
info.Interpreter = new Gregor.NetConsole.Engine.CCurlyInterpreter()
info.PathInfoDocument = Dev.ReadXmlFile(Vars.DocPath) // assuming PathInfo.xml is active
info.InputDirectory = Vars.DocFolderPath              // assuming any doc is active
info.OutputDirectory = Vars.DocFolderPath             // assuming any doc is active
generator = new CTextGenerator()
generator.Generate(info)

A few notes:

The literal, expression, and template attributes each come with *before, *between, and *after variants, which control more precisely where the respective output appears relative to the output of any child nodes.
Output file names are inherited by child nodes by default, and can be paths relative to the output directory set in the CTextGenerationInfo object. They may also be built dynamically from code expressions; more on this further below.
The paths of all output files produced are listed in CTextGenerationInfo.FilesProduced.
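The basic example can be modeled in a few lines of Python. This is a sketch, not the Gregor.Core implementation, and it rests on several assumptions. Python's eval() stands in for the ICodeInterpreter2 object. The hypothetical generate() function handles only the root node. The order of literal, expression, and template output within the file is also assumed.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List

# PathInfo.xml from the example, with the .NET expression System.DateTime.Now
# swapped for a Python expression because eval() stands in for the interpreter.
PATH_INFO = """<Root outfilename="Result.txt"
      literal="Hello, Literal!"
      expression="2 + 3"
      template="Template.htm" />"""


def generate(path_info: str, input_dir: Path, output_dir: Path) -> List[Path]:
    """Hypothetical generator for a single root node: writes the literal, the
    expression's value, and the template text to the output file and returns
    the produced paths (the role CTextGenerationInfo.FilesProduced plays)."""
    root = ET.fromstring(path_info)
    parts = []
    if "literal" in root.attrib:
        parts.append(root.attrib["literal"])
    if "expression" in root.attrib:
        parts.append(str(eval(root.attrib["expression"], {"__builtins__": {}})))
    if "template" in root.attrib:
        parts.append((input_dir / root.attrib["template"]).read_text())
    out_file = output_dir / root.attrib["outfilename"]
    out_file.write_text("\n".join(parts))
    return [out_file]


with tempfile.TemporaryDirectory() as folder:
    doc_folder = Path(folder)  # plays the role of Vars.DocFolderPath
    (doc_folder / "Template.htm").write_text("<p>\nHello, Template!\n</p>\n")
    produced = generate(PATH_INFO, doc_folder, doc_folder)
    print(produced[0].read_text())
```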

Using Conditions

In the path info document, which is used by the graphing API, conditions can be expressed with the filter, condition, and match attributes.

Templates, as used by the base CTextGenerator class, support conditions only indirectly, by calling appropriate routines. Such routines may be specific to a particular purpose; a more general approach is passing in a boolean flag, as with CGraphContext.EchoIf(bool, object). The graph context is accessible via the Context variable. For example:

$Context.EchoIf(Context.CurrentNode.IsFirst, "<table>")%
<tr><td>$Customer.Name%</td><td>$Customer.Phone%</td></tr>
$Context.EchoIf(Context.CurrentNode.IsLast, "</table>")%
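In plain Python, the same first/last logic looks roughly like the sketch below. echo_if() and render_table() are hypothetical illustrations of what EchoIf does with the IsFirst and IsLast flags. They are not part of the Gregor API.

```python
from typing import Any, Dict, List


def echo_if(condition: bool, value: Any) -> str:
    """Hypothetical analogue of CGraphContext.EchoIf(bool, object)."""
    return str(value) if condition else ""


def render_table(customers: List[Dict[str, str]]) -> str:
    lines = []
    for index, customer in enumerate(customers):
        is_first = index == 0                   # Context.CurrentNode.IsFirst
        is_last = index == len(customers) - 1   # Context.CurrentNode.IsLast
        lines.append(echo_if(is_first, "<table>"))
        lines.append(f"<tr><td>{customer['Name']}</td>"
                     f"<td>{customer['Phone']}</td></tr>")
        lines.append(echo_if(is_last, "</table>"))
    # Suppressed echoes yield empty strings; leave them out of the output.
    return "\n".join(line for line in lines if line)


print(render_table([{"Name": "Ann", "Phone": "555-0100"},
                    {"Name": "Bob", "Phone": "555-0199"}]))
```

A single customer produces both the opening and closing tag around one row, because that node is both first and last.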

Grouping Collections

I can think of four possible ways of grouping:

Using the value change management methods (ValueChanges/ExchangeValue) of the graph context. This works well for introductory content that appears when some property changes, such as inserting a heading when the city of a customer address changes.
```
<Root>
  <Type select="Assembly.GetTypes()[*]">
    <Namespace select="Type.Namespace"
               condition="Context.ExchangeValue(&#x22;Namespace&#x22;, Type.Namespace)" />
  </Type>
</Root>
```
The downside: you can't find out whether a value is about to change on the next graph node.

Note that you can also access the current graph node (with the expression Context.CurrentNode) and navigate from there (using NextSiblingNode, PreviousSiblingNode, and their respective Value properties), but this can be a bit awkward for special cases like the first and last node. See the next item.
Using the extended value change tracking facilities of the graph context, which allow peeking back and forth, like this:
```
<Root>
  <Type select="Assembly.GetTypes()[*]">
    <Namespace select="Type.Namespace"
               condition="Context.NodeValueHasChanged()" />
  </Type>
</Root>
```
You can compare the current node value with the previous or next one (see the NodeValueHasChanged() and NodeValueWillChange() methods, respectively). You can also check whether the value of an expression changes from the previous to the current node, or from the current to the next node (see the ExpressionValueHasChanged() and ExpressionValueWillChange() methods, respectively).

Using pre-grouped data structures: an Assembly has Namespaces which have Types which have Members, for example.

<Root>
  <Namespace select="Reflect.GetNamespaces(Assembly.InnerAssembly)[*]">
    <GroupedType select="Assembly.GetTypes(Namespace)[*]" />
  </Namespace>
</Root>

Using ad-hoc grouping with APIs like Gregor.Core.CReflectingGrouper, possibly via convenience routines like Walk.Groups().

<Root>
  <Types select="Assembly.GetTypes()">
    <GroupNode select="Walk.Groups(Types, 1, &#x22;Namespace&#x22;)[*]">
      <GroupedType select="GroupNode.Values[*]" />
    </GroupNode>
  </Types>
</Root>

The first two are condition-based: they simply produce conditional output without changing the data structure, which remains a flat list. The other two are hierarchical: the groups are represented by the tree structure.
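The difference between flat, condition-based output and hierarchical grouping can be sketched in Python. All names here are hypothetical. ValueTracker mimics ExchangeValue, and has_changed() and will_change() mimic NodeValueHasChanged() and NodeValueWillChange(). grouped() uses itertools.groupby in place of CReflectingGrouper or Walk.Groups(). How the real methods treat the first and last node is an assumption.

```python
from itertools import groupby
from typing import Dict, List, Tuple

# (namespace, type name) pairs, ordered by namespace; sample data only.
TYPES: List[Tuple[str, str]] = [
    ("App.Data", "Customer"), ("App.Data", "Order"), ("App.Web", "Page")]


class ValueTracker:
    """Hypothetical analogue of Context.ExchangeValue(key, value): store the
    new value and report whether it differs from the previous one (the very
    first value counts as a change here, which is an assumption)."""

    def __init__(self) -> None:
        self._values: Dict[str, object] = {}

    def exchange_value(self, key: str, value: object) -> bool:
        changed = key not in self._values or self._values[key] != value
        self._values[key] = value
        return changed


def has_changed(values: List[str], i: int) -> bool:
    """Like NodeValueHasChanged(): first node, or differs from the previous."""
    return i == 0 or values[i] != values[i - 1]


def will_change(values: List[str], i: int) -> bool:
    """Like NodeValueWillChange(): last node, or differs from the next."""
    return i == len(values) - 1 or values[i] != values[i + 1]


def flat_with_exchange(types: List[Tuple[str, str]]) -> List[str]:
    """Approach 1: headings only, since the next value is unknown."""
    tracker, out = ValueTracker(), []
    for namespace, name in types:
        if tracker.exchange_value("Namespace", namespace):
            out.append(f"[{namespace}]")
        out.append(f"  {name}")
    return out


def flat_with_peek(types: List[Tuple[str, str]]) -> List[str]:
    """Approach 2: peeking ahead also allows closing each group."""
    namespaces, out = [ns for ns, _ in types], []
    for i, (namespace, name) in enumerate(types):
        if has_changed(namespaces, i):
            out.append(f"[{namespace}]")
        out.append(f"  {name}")
        if will_change(namespaces, i):
            out.append(f"[/{namespace}]")
    return out


def grouped(types: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Approach 4 (ad-hoc grouping): build the namespace -> types tree first.
    With approach 3, the data would already arrive in this shape. groupby
    needs its input sorted by the grouping key."""
    ordered = sorted(types, key=lambda t: t[0])
    return {namespace: [name for _, name in members]
            for namespace, members in groupby(ordered, key=lambda t: t[0])}


print(flat_with_exchange(TYPES))
print(flat_with_peek(TYPES))
print(grouped(TYPES))
```

Only the peeking variant can close a group, which mirrors the downside noted for ExchangeValue. The two flat variants also rely on the input already being ordered by namespace.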

Managing Output Files

Let's change the path info file. The idea is to generate a couple of output files, relating to four path nodes each.

<Root>

    <Node select="new object()" outfilename="Result.txt">
        <ChildNode select="new object()" template="Template.txt" />
        <ChildNode select="new object()" template="Template.txt" />
    </Node>

    <Node select="new object()" outfilename="Result.txt">
        <ChildNode select="new object()" template="Template.txt" />
        <ChildNode select="new object()" template="Template.txt" />
    </Node>

    <Node select="new object()" outfilenameexpression="&#x22;DynResult.txt&#x22;">
        <ChildNode select="new object()" template="Template.txt" />
        <ChildNode select="new object()" template="Template.txt" />
    </Node>

    <Node select="new object()" outfilenameexpression="&#x22;DynResult.txt&#x22;">
        <ChildNode select="new object()" template="Template.txt" />
        <ChildNode select="new object()" template="Template.txt" />
    </Node>

</Root>

You can set the output file name with any combination of the outfilenameprefix, outfilename, outfilenameexpression, and outfilenamesuffix attributes. Their values are simply concatenated in the order listed here.
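Here is a short Python sketch of that concatenation rule. output_file_name() is a hypothetical helper, and eval() stands in for the code interpreter when evaluating outfilenameexpression.

```python
from typing import Any, Callable, Mapping, Optional

# Attribute order described above: prefix, name, expression, suffix.
NAME_ATTRIBUTES = ("outfilenameprefix", "outfilename",
                   "outfilenameexpression", "outfilenamesuffix")


def output_file_name(attributes: Mapping[str, str],
                     evaluate: Callable[[str], Any]) -> Optional[str]:
    """Concatenate whichever file name attributes are present, evaluating
    outfilenameexpression; None means the node sets no name of its own."""
    parts = []
    for attribute in NAME_ATTRIBUTES:
        if attribute in attributes:
            value = attributes[attribute]
            if attribute == "outfilenameexpression":
                value = str(evaluate(value))
            parts.append(value)
    return "".join(parts) if parts else None


def evaluate(expression: str) -> Any:
    return eval(expression, {"__builtins__": {}})


print(output_file_name({"outfilenameprefix": "out/",
                        "outfilenameexpression": '"Dyn" + "Result"',
                        "outfilenamesuffix": ".txt"}, evaluate))
# prints out/DynResult.txt
```

In this sketch, returning None for a node without any of these attributes corresponds to inheriting the parent's file name, as described in the notes on the basic example.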

The CTextGenerator class has smart file stream management built in, as this example demonstrates. If a file name is built dynamically, that is, if an outfilenameexpression attribute is present, the stream is closed when that node (not its children!) is done.
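The following Python model illustrates this behavior for the four Node elements above. StreamManager is hypothetical and writes to in-memory StringIO objects. The text only states the rule for dynamic names, so keeping streams for static names open until the end is an assumption. Whether a reopened DynResult.txt is overwritten or appended to is not specified either, so the model simply records each closed stream's contents separately.

```python
import io
from typing import Dict, List, Tuple


class StreamManager:
    """Hypothetical model: streams for static names stay open until the end
    (assumption); a stream for a dynamically built name is closed as soon
    as the node that uses it is done."""

    def __init__(self) -> None:
        self._open: Dict[str, io.StringIO] = {}
        # (file name, contents) for each stream, in the order it was closed.
        self.closed: List[Tuple[str, str]] = []

    def stream_for(self, file_name: str) -> io.StringIO:
        if file_name not in self._open:
            self._open[file_name] = io.StringIO()
        return self._open[file_name]

    def node_done(self, file_name: str, dynamic: bool) -> None:
        if dynamic and file_name in self._open:
            self._close(file_name)

    def finish(self) -> None:
        for file_name in list(self._open):
            self._close(file_name)

    def _close(self, file_name: str) -> None:
        stream = self._open.pop(file_name)
        self.closed.append((file_name, stream.getvalue()))
        stream.close()


# The four Node elements of the example: (file name, built dynamically?).
NODES = [("Result.txt", False), ("Result.txt", False),
         ("DynResult.txt", True), ("DynResult.txt", True)]

manager = StreamManager()
for number, (file_name, dynamic) in enumerate(NODES, start=1):
    stream = manager.stream_for(file_name)
    for child in (1, 2):  # each Node has two ChildNode elements
        stream.write(f"node {number}, child {child}\n")
    manager.node_done(file_name, dynamic)
manager.finish()

for file_name, contents in manager.closed:
    print(file_name, contents.splitlines())
```

In this model, both static Node elements share one Result.txt stream. Each dynamic Node element gets a stream of its own.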

TextGen Command-Line Tool

The TextGen utility is part of the FileTools collection (available from the Downloads page). Given an XML path info file, it generates files based on any input.

Tip: within select expressions, you can load additional libraries that provide the object creation or data retrieval services you need:

<Root>
  <Library select="Gregor.Core.Reflect.LoadAssembly(&#x22;System.Data.dll&#x22;)" />
  <Table select="new System.Data.DataTable()" >
    <!-- ... -->
  </Table>
</Root>

See below for notes on the order of evaluation.

Another tip: for reusing code, you can also define user functions in select expressions:

<Root>
  <Foo select="foo(s){Gregor.Core.Parse.Tighten(s, ' ');}" />
  <Bar select="&#x22;   abc   xyz   &#x22;">
    <Bazz select="foo(Bar)">
       <!-- ... -->
    </Bazz>
  </Bar>
</Root>
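Conceptually, the user function and the node values share one evaluation scope. This Python sketch imitates that with a dictionary passed to eval(). Both the scope model and the tighten() function are assumptions. tighten() presumes that Tighten trims and collapses runs of the given character, which may not match the actual behavior of Gregor.Core.Parse.Tighten.

```python
def tighten(s: str, c: str = " ") -> str:
    """Assumed behavior of Gregor.Core.Parse.Tighten: drop leading and
    trailing c and collapse runs of c into a single c."""
    return c.join(part for part in s.split(c) if part)


# One scope shared by all select expressions (assumption): user functions
# (foo) and node values (Bar, Bazz) are visible to later expressions.
scope = {"__builtins__": {}}
scope["foo"] = tighten                            # <Foo select="foo(s){...}" />
scope["Bar"] = eval('"   abc   xyz   "', scope)   # <Bar select="...">
scope["Bazz"] = eval("foo(Bar)", scope)           # <Bazz select="foo(Bar)">
print(repr(scope["Bazz"]))  # prints 'abc xyz'
```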

WebEdit.NET

In WebEdit.NET, choose File/Template/Open Complex Template ..., then select a path information document (currently preset to a template stored on my web site, which creates a new class along with a type-safe collection) and an output directory.

Have a look at the Resources page for interesting path information documents.

The simplest path info document looks like this:

<Root literal="Hello, Complex Template!" outfilename="Hello.txt" />

Final Notes

While CTextGenerator processes the path information file depth-first, in pre-order, the order in which the underlying object graph is built is undefined (currently breadth-first; see CGraphNode). At present, when a node's select attribute is evaluated, the select attributes of its parent and preceding sibling nodes have already been evaluated. The select attributes of any sibling's child nodes have not, not even those of preceding siblings' children.
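The two orders can be compared on a small tree in Python. Node, breadth_first(), and depth_first_preorder() are hypothetical illustrations, not CGraphNode code.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)


def breadth_first(root: Node) -> List[str]:
    """Order in which select attributes are evaluated (currently breadth-first)."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node.name)
        queue.extend(node.children)
    return order


def depth_first_preorder(root: Node) -> List[str]:
    """Order in which output is generated: each node before its children."""
    order = [root.name]
    for child in root.children:
        order.extend(depth_first_preorder(child))
    return order


tree = Node("Root", [Node("A", [Node("A1"), Node("A2")]),
                     Node("B", [Node("B1")])])
print(breadth_first(tree))         # ['Root', 'A', 'B', 'A1', 'A2', 'B1']
print(depth_first_preorder(tree))  # ['Root', 'A', 'A1', 'A2', 'B', 'B1']
```

When B's select is evaluated during breadth-first construction, Root and A are already done, but A1 and A2 are not. This is why a select expression should not rely on the children of preceding siblings.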