Flexibly Navigated Object Graphs

Sunday, September 25, 2005

Remember that I introduced a general API for navigating among objects using reflection this spring? This is implemented at two levels:

The GetChildObjects() methods in the Gregor.Core.Reflect module provide a simple, procedural query approach for immediate child objects.
The CObjectGraph class and related types (in Gregor.Core.Collections) wrap a graph of objects into a lazily-built tree structure. The nodes in the tree (CGraphNode) allow adding arbitrary properties. Navigation is based on the general Reflect.GetChildObjects() API.

Building on the second layer, I've found another way to use the .NET Console code interpreter: object relationships need not be deduced blindly from references and collections stored in fields or returned from properties, but can be obtained more flexibly from expressions that a code interpreter (implementing Gregor.Core.ICodeInterpreter2) is capable of evaluating.

Using the code interpreter opens up a world of opportunities, such as filtering, sorting, or grouping objects using standard APIs, formatting strings, and so on. The most useful one, though, is establishing a relationship between objects that are otherwise unrelated (that is, not connected by object references):

class Customer {
    // no link to orders available!
}

static class OrderUtil {
    public static List<Order> GetOrders(Customer cust);
}

The code interpreter can simply call the latter function, passing the current customer object, and thereby establish an association.
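To make the idea concrete, here is a minimal sketch of what such a call amounts to, written with plain reflection rather than the code interpreter. It is not how the interpreter is implemented. The Name and Item fields, the body of GetOrders, and the AssociationSketch.CallStatic helper are all assumptions made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

class Customer {
    public string Name;                       // hypothetical field
    public Customer(string name) { Name = name; }
    // still no link to orders
}

class Order {
    public string Item;                       // hypothetical field
    public Order(string item) { Item = item; }
}

static class OrderUtil {
    // Hypothetical stand-in body; a real version would query some store.
    public static List<Order> GetOrders(Customer cust) {
        List<Order> orders = new List<Order>();
        orders.Add(new Order(cust.Name + "-order-1"));
        orders.Add(new Order(cust.Name + "-order-2"));
        return orders;
    }
}

static class AssociationSketch {
    // Looks up a public static method by name and calls it with the
    // current object as its only argument.
    public static object CallStatic(Type type, string methodName, object current) {
        MethodInfo method = type.GetMethod(methodName, BindingFlags.Public | BindingFlags.Static);
        if (method == null)
            throw new MissingMethodException(type.FullName, methodName);
        return method.Invoke(null, new object[] { current });
    }

    static void Main() {
        Customer cust = new Customer("acme");
        // The association exists only through this call, not through a reference.
        List<Order> orders = (List<Order>) CallStatic(typeof(OrderUtil), "GetOrders", cust);
        foreach (Order order in orders)
            Console.WriteLine(order.Item);
    }
}
```

The customer object never learns about its orders; the link lives entirely in the expression that is evaluated.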

Code Time

So let's look at an example. Our root object is the following:

object root = new object();

Our task shall be to create a tree that contains the object's type, its members, and a few gimmicks later on. How will it work?

A CObjectGraph can be built with path information contained in an XML document, known as a path information document. A path info document describes a multifold path through a data structure. The XML document may (but doesn't have to) contain expressions for the code interpreter. Here's the template:

<Root>
    <Type select="Root.GetType()">
        <FullName />
        <Members select="Type.GetMembers()">
            <Member select="Members[*]" />
        </Members>
    </Type>
</Root>

A few observations:

The select attributes contain expressions for the code interpreter.
The element names will become variables holding all the objects along the path from the current node to the root node. Inner objects will hide outer objects of the same name.
The element names serve another purpose: if no select attribute is present, a member access expression (field or property) will be supplied for the code interpreter. This usually means that a property will be evaluated, which is just fine for many cases.
Collections become nodes in the graph themselves.
Collection items may be regularly accessed with indexer expressions.
To process all items in a collection, as you'd typically want, end the expression with [*]. This is a special case: the expression without the trailing brackets (Members, in this case) is evaluated, and the result, which must implement IEnumerable or IEnumerator, is then enumerated. The enumerated items may still be filtered; more on that later.
The graph will be created lazily as the user reads CObjectGraph.RootNode and the relevant CGraphNode properties, such as ChildNodes.
The order in which the object graph is built (and select attributes are evaluated) is undefined. Right now, it's breadth-first, i.e., parent and preceding sibling nodes' select attributes will have been evaluated at a given point, but not the select attributes of child nodes of sibling nodes, even preceding siblings. Keep this in mind with respect to side effects.
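The [*] rule from the list above can be pictured roughly as follows. This is only a sketch of the described behavior, not the library's code: StarSketch and ExpandAll are hypothetical names, and the sketch collects the items eagerly, whereas the real graph is built lazily.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class StarSketch {
    // Returns the items a trailing [*] would select from the value of the
    // expression without the brackets (hypothetical helper).
    public static List<object> ExpandAll(object value) {
        List<object> items = new List<object>();

        // Case 1: the value is enumerable, so enumerate it.
        IEnumerable enumerable = value as IEnumerable;
        if (enumerable != null) {
            foreach (object item in enumerable)
                items.Add(item);
            return items;
        }

        // Case 2: the value already is an enumerator, so run it to the end.
        IEnumerator enumerator = value as IEnumerator;
        if (enumerator != null) {
            while (enumerator.MoveNext())
                items.Add(enumerator.Current);
            return items;
        }

        throw new ArgumentException("value must implement IEnumerable or IEnumerator");
    }

    static void Main() {
        // Corresponds to Members[*] where Members is Type.GetMembers().
        object members = typeof(object).GetMembers();
        foreach (object member in ExpandAll(members))
            Console.WriteLine(member);

        // An enumerator works as well.
        object enumerator = new string[] { "a", "b" }.GetEnumerator();
        Console.WriteLine(ExpandAll(enumerator).Count);
    }
}
```

Note that a string implements IEnumerable too, so in this sketch a string value would be expanded into its characters.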

Let's run this example:

private static void TestPathInfoGraph(){

    ICodeInterpreter2 preter = new Gregor.NetConsole.Engine.CCurlyInterpreter();
    XmlDocument xd = Dev.ReadXmlFile("ObjectPathSimple.xml");
    
    CObjectGraph graph = CObjectGraph.CreateWithPathInfo(new object(), xd, preter);
    TraceGraph(graph);
}

private static void TraceGraph(CObjectGraph graph){
    ITreeWalkHelper helper = new CDefaultTreeWalkHelper(graph);
    CTreeWalker walker = new CTreeWalker(helper);
    NStringIndents indents = new NStringIndents("  ");
    while(walker.MoveNext()){
        CGraphNode graphNode = (CGraphNode) walker.Current;
        Dev.Trace(indents[walker.Level] + graphNode.Value);
    }
}

The following output will be produced:

System.Object
  System.Object
    System.Object
    System.Reflection.MemberInfo[]
      Int32 GetHashCode()
      Boolean Equals(System.Object)
      System.String ToString()
      Boolean Equals(System.Object, System.Object)
      Boolean ReferenceEquals(System.Object, System.Object)
      System.Type GetType()
      Void .ctor()
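To see where each line of that output comes from, it may help to compare it with the same navigation written by hand. The following sketch uses only standard reflection calls; the PlainReflectionSketch class and its Lines method are hypothetical names, and the indentation simply mimics the trace above. The members that GetMembers() returns, and their order, depend on the runtime, so the list may differ from the one shown.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

static class PlainReflectionSketch {
    // Builds the trace lines by hand, one step per template element.
    public static List<string> Lines() {
        List<string> lines = new List<string>();

        object root = new object();                 // Root
        Type type = root.GetType();                 // Type: Root.GetType()
        string fullName = type.FullName;            // FullName: property access by element name
        MemberInfo[] members = type.GetMembers();   // Members: Type.GetMembers()

        lines.Add(root.ToString());
        lines.Add("  " + type);
        lines.Add("    " + fullName);
        lines.Add("    " + members);                // the collection is a node of its own
        foreach (MemberInfo member in members)      // Member: Members[*]
            lines.Add("      " + member);
        return lines;
    }

    static void Main() {
        foreach (string line in Lines())
            Console.WriteLine(line);
    }
}
```

The first three lines all read System.Object because the root object, its type, and the type's full name happen to have the same string representation.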

Working with the Graph

The graph's nodes (CGraphNode) store their corresponding XML elements, which may be used by an application for assistance in further processing. Here's how you can obtain an element, and read its attributes:

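foreach(CGraphNode graphNode in Walk.Tree(graph)){
    // the path info element this node was created from
    XmlElement xe = (XmlElement) graphNode.Properties[CGraphNode.PROPERTY_PATHINFOELEMENT];
    // the indexer returns null if the element has no such attribute
    XmlAttribute xa = xe.Attributes["MyAttribute"];
    // ... use xa
}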

That said, there are a number of attribute names reserved for use by the graphing implementation:

select: These contain expressions for selecting objects into the tree, that is, navigation. All parent elements may be accessed by element name. It's also possible to invoke static members, such as module functions or properties like System.DateTime.Now.
filter: These contain expressions much like those in select attributes. The current as well as all parent elements may be accessed by element name. The result of a filter expression is compared against the current object, which allows simplified equality tests: the object is selected only if it equals that result.
match: These contain patterns for matching strings instead of code expressions, which makes for simplified pattern matching. The current object is converted to a string by calling the ObjectToString() method of Gregor.Core.Conv. Null references are converted to empty strings. The actual match is performed using Gregor.Core.Parse.StringMatches.
condition: These, in contrast to filter attributes, contain expressions that must yield a boolean value. The current object is selected if the result is true. Here, more complex tests may be performed, usually by calling specific equality functions. The current as well as all parent elements may be accessed by element name.
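The three filtering attributes differ only in how the decision to select an object is made. The sketch below restates those rules as plain functions. It is an illustration under assumptions, not the graphing implementation: SelectionSketch and its methods are hypothetical, ToString() stands in for Conv.ObjectToString(), and the wildcard handling (where * matches any run of characters, case-sensitively) is a guess at what Parse.StringMatches does.

```csharp
using System;
using System.Reflection;
using System.Text.RegularExpressions;

static class SelectionSketch {
    // filter: select the object only if it equals the expression's result.
    public static bool PassesFilter(object current, object filterResult) {
        return object.Equals(current, filterResult);
    }

    // condition: the expression's result must be a boolean and must be true.
    public static bool PassesCondition(object conditionResult) {
        if (!(conditionResult is bool))
            throw new InvalidCastException("condition must yield a boolean");
        return (bool) conditionResult;
    }

    // match: convert the object to a string (null becomes "") and compare it
    // against a pattern. Assumed syntax: * matches any run of characters.
    public static bool PassesMatch(object current, string pattern) {
        string text = current == null ? "" : current.ToString();
        string regex = "^" + Regex.Escape(pattern).Replace("\\*", ".*") + "$";
        return Regex.IsMatch(text, regex, RegexOptions.Singleline);
    }

    static void Main() {
        foreach (MemberInfo member in typeof(object).GetMembers()) {
            bool isConstructor = PassesFilter(member.MemberType, MemberTypes.Constructor);
            bool containsQ = PassesMatch(member, "*q*");
            bool startsWithGet = PassesCondition(member.Name.StartsWith("Get"));
            Console.WriteLine(member + ": " + isConstructor + " " + containsQ + " " + startsWithGet);
        }
    }
}
```

The filter form is the shortest to write but can only express equality; anything else needs a condition.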

Here are some examples of using the filtering attributes:

<!-- select member type only for constructors -->
<Member select="Members[*]">
    <MemberType filter="System.Reflection.MemberTypes.Constructor" />
</Member>

<!-- select members containing "q" -->
<Member select="Members[*]" match="*q*" />

<!-- select members whose name starts with "Get" -->
<Member select="Members[*]" condition="Member.Name.StartsWith(&#x22;Get&#x22;)" />

<!-- select declared members only
     ("Type" is the element name of an ancestor node, see above) -->
<Member select="Members[*]" condition="Member.DeclaringType.Equals(Type)" />

See the TestCore3 project in the Gregor.Core source download for a comprehensive example using these attributes.

Outlook

Object graphs built with path information can be used for a variety of purposes, such as:

Querying an object graph for persistence. It's like serialization, but the difference is that it can be controlled not by custom attributes on the objects' types, but rather by the path information supplied. While in general it seems the right thing to do (aspect-wise) to store such information right with the data definition (i.e., the class) itself, separating it out into XML files provides more flexibility.
Enabling code or document generation right on objects themselves, like in object query languages (another approach would be serialization to XML, and then using XSLT). I'm going to invest in that area - querying objects with XML path information and the code interpreter, and then using a template-based approach on a node-by-node basis, again using the code interpreter. We'll see how that compares against XSLT. In the meantime, have a look at Parse.ProcessTemplate(), and the CTextGenerator class.
The object graph may also be used on an XML document itself, as an alternative or an extension to XPath (the code interpreter may still evaluate calls to XmlNode.SelectNodes etc.). For example:
```
<Root>
    <DocumentElement>
        <Children select="DocumentElement.SelectNodes(&#x22;child[@type='foo']&#x22;)">
            <Child select="Children[*]" />
        </Children>
    </DocumentElement>
</Root>
```
Likewise, any related processing, such as templated document generation, may be an alternative to XSLT.
Providing a user interface for an object model, as is done in property grids. See also the CDynamicGridSetting class in Gregor.AppCore.Settings (it needs refactoring).
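For the XML document case in the list above, the select expressions boil down to ordinary System.Xml calls. Here is a small sketch of the equivalent hand-written code; the sample document, the id attributes, and the XmlPathSketch class are made up for illustration, while the XPath expression is the one from the path information example.

```csharp
using System;
using System.Xml;

static class XmlPathSketch {
    // Same call as in the select attribute of the Children element.
    public static XmlNodeList SelectFooChildren(XmlDocument doc) {
        return doc.DocumentElement.SelectNodes("child[@type='foo']");
    }

    static void Main() {
        XmlDocument doc = new XmlDocument();
        // Hypothetical input document.
        doc.LoadXml(
            "<items>" +
            "<child type='foo' id='1'/>" +
            "<child type='bar' id='2'/>" +
            "<child type='foo' id='3'/>" +
            "</items>");

        // Corresponds to Children[*]: one node per selected element.
        foreach (XmlNode child in SelectFooChildren(doc))
            Console.WriteLine(child.OuterXml);
    }
}
```

What the path information adds over this is that each selected node becomes a graph node that further select, filter, match, or condition attributes can build on.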

The idea is that a data model is defined once, and ideally only once, and everything else - GUI, persistence, querying - builds on that model. Things should work automatically, or at least with little effort, as with templated code generation. The data model, while it should be maintained as class definitions ("only once"; aspect orientation), can be modified somewhat if object graphs can be created more flexibly. This is about "smart data structures" versus "smart tools"; I'm leaning toward the latter, lately.