Sunday, October 30, 2016

Monadic Parser Combinators in C#

I have only recently come across the now classic paper by Graham Hutton and Erik Meijer on Monadic Parser Combinators. This beautifully written paper walks the reader, in surprisingly accessible terms, through a technique for developing recursive descent parsers using functional programming.

In the paper, Hutton and Meijer use the functional programming language Gofer for all the examples, but in fact C# is also a perfectly valid language in which to develop these techniques. Doing so offered me a number of insights about C# type inference and generics, LINQ, and laziness, so in this series of blog posts I would like to offer my own translation of the examples in the paper into C#, section by section.

The type of parsers

In the first section of the paper, Hutton and Meijer introduce the parser monad as a function that takes a string as input and returns a collection of results, where each result can be thought of as containing both the abstract syntax tree (AST) corresponding to the parsed input (i.e. the Value), as well as any unconsumed text (i.e. the Tail) that remains to be parsed:

type Parser a = String -> [(a,String)]

Their definition can be translated directly into the following C# delegate type:

public delegate IEnumerable<IResult<TValue>> Parser<out TValue>(string input);

Each individual result from the collection of results can be defined using the following types:

public interface IResult<out TValue>
{
    TValue Value { get; }

    string Tail { get; }
}

public class Result<TValue> : IResult<TValue>
{
    public Result(TValue value, string tail)
    {
        Value = value;
        Tail = tail;
    }

    public TValue Value { get; private set; }

    public string Tail { get; private set; }
}
Notice how we have made the TValue type parameter covariant through the use of the out keyword in the interface and delegate definitions. In this way, we get for free the ability to build parsers for more abstract syntax trees out of parsers for more concrete types.
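As a quick, self-contained sketch of what this covariance buys us (the definitions are repeated here so the snippet compiles on its own; the parser names are mine), a Parser<string> can be used wherever a Parser<object> is expected:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public delegate IEnumerable<IResult<TValue>> Parser<out TValue>(string input);

public interface IResult<out TValue>
{
    TValue Value { get; }
    string Tail { get; }
}

public class Result<TValue> : IResult<TValue>
{
    public Result(TValue value, string tail) { Value = value; Tail = tail; }
    public TValue Value { get; private set; }
    public string Tail { get; private set; }
}

public static class CovarianceDemo
{
    public static void Main()
    {
        // A parser that consumes the whole input as a string value.
        Parser<string> text = input => new[] { new Result<string>(input, "") };

        // Thanks to the out annotations, both the delegate and the result
        // interface are covariant, so no wrapping or casting is needed here.
        Parser<object> node = text;

        IResult<object> result = node("hello").First();
        Console.WriteLine(result.Value); // hello
    }
}
```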

Primitive parsers

Monadic parsers exploit function composition to build more complex parsers out of combinations of more primitive parsers. In the paper, three primitive parsers are presented as building blocks out of which all other parsers are built: result, zero and item.

In order to keep our C# parser monad more idiomatic relative to other existing monads such as IEnumerable<T> or IObservable<T>, I decided to rename these primitives as Return, Empty and Char, respectively. They are defined below as functions of a static class Parser, much like the enumerable extension methods are defined as part of static class Enumerable:


public static partial class Parser
{
    public static Parser<TValue> Return<TValue>(TValue value)
    {
        return input => EnumerableEx.Return(new Result<TValue>(value, input));
    }

    public static Parser<TValue> Empty<TValue>()
    {
        return input => Enumerable.Empty<IResult<TValue>>();
    }

    public static Parser<char> Char()
    {
        return input => input.Take(1)
                             .Select(x => new Result<char>(x, input.Substring(1)));
    }
}
The Return method creates a parser that succeeds by returning the single result value without consuming any of the input string. In order to create a collection with a single result, I have used the Return method over IEnumerable<T> defined in the Interactive Extensions for the .NET framework (Ix). In case you don't want to include such a large library just for this method definition, its implementation is quite trivial:

static class EnumerableEx
{
    internal static IEnumerable<TValue> Return<TValue>(TValue value)
    {
        yield return value;
    }
}

The Empty method creates a parser that always fails (i.e. produces an empty list of results), in a very close parallel to Enumerable.Empty.

Finally, the Char method returns a parser that consumes a single character from the input string, and returns that character as the result value. Note that this implementation is potentially a very inefficient way to consume single characters from large input strings, but we are focusing for now on clarity rather than performance. Indeed, one advantage of monadic parsers is that if we later come back and reimplement Char using a more efficient technique, the entire library of parsers will automatically benefit from the added performance.
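To make the behaviour of the three primitives concrete, here is a small self-contained sketch (repeating the earlier definitions so it compiles on its own, and inlining EnumerableEx.Return as an array for brevity) showing what each one produces:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public delegate IEnumerable<IResult<TValue>> Parser<out TValue>(string input);

public interface IResult<out TValue>
{
    TValue Value { get; }
    string Tail { get; }
}

public class Result<TValue> : IResult<TValue>
{
    public Result(TValue value, string tail) { Value = value; Tail = tail; }
    public TValue Value { get; private set; }
    public string Tail { get; private set; }
}

public static partial class Parser
{
    // Succeeds once with the given value, consuming nothing.
    public static Parser<TValue> Return<TValue>(TValue value)
    {
        return input => new[] { new Result<TValue>(value, input) };
    }

    // Always fails: an empty collection of results.
    public static Parser<TValue> Empty<TValue>()
    {
        return input => Enumerable.Empty<IResult<TValue>>();
    }

    // Consumes exactly one character, if there is one.
    public static Parser<char> Char()
    {
        return input => input.Take(1)
                             .Select(x => new Result<char>(x, input.Substring(1)));
    }
}

public static class PrimitivesDemo
{
    public static void Main()
    {
        var r1 = Parser.Return(42)("abc").Single();
        Console.WriteLine("{0}, \"{1}\"", r1.Value, r1.Tail); // 42, "abc"

        Console.WriteLine(Parser.Empty<int>()("abc").Any()); // False

        var r2 = Parser.Char()("abc").Single();
        Console.WriteLine("{0}, \"{1}\"", r2.Value, r2.Tail); // a, "bc"

        Console.WriteLine(Parser.Char()("").Any()); // False
    }
}
```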

Parser combinators

The parsers defined above will not be very useful until we have provided a solid foundation for combining them into more complex parsers. In monadic parsers, Hutton and Meijer introduce the bind function as a way to integrate sequencing of parsers with processing of their result values:

bind :: Parser a -> (a -> Parser b) -> Parser b
p `bind` f = \inp -> concat [f v inp' | (v,inp') <- p inp]


In the C# world, bind is more commonly known as SelectMany, which is also one of the cornerstones of the IEnumerable<T> monad. The above signature (and implementation) can be translated into idiomatic C# by the following:

public static Parser<TResult> SelectMany<TValue, TResult>(
    this Parser<TValue> parser,
    Func<TValue, Parser<TResult>> parserSelector)
{
    return input => parser(input).SelectMany(
           result => parserSelector(result.Value)(result.Tail));
}
Following the monadic parsers paper, the above definition can be interpreted as follows. First, we apply parser to the input and obtain a collection of results (TValue, string). Notice that we also have a function parserSelector that takes a value and returns a parser. This suggests that what we want to do next is simply to apply that function to each result value (and unconsumed input string) in turn. In this way, for each result we get a collection of new results, so we end up with a collection of collections.

In order to keep the operator inside the monad (i.e. without creating any additional nested structures) we need to flatten this collection of collections. That way, we will have as a result a Parser<T> instead of a Parser<IEnumerable<IResult<T>>>. This is important because we can treat Parser<T> just as any other parser, without thinking of how to deal with the details of some complicated nested structure (i.e. Parser<T> is our monad). Another way to think about this monadic operator is that it allows each of the results of the first parser to be made directly available for the second parser to process.

The easiest way to achieve all of this is simply to apply the existing Enumerable.SelectMany to the collection of results from the parser. This method already expects us to define a function that maps every element in the collection to a new collection (i.e. from one element, select many elements) and it also takes care of concatenating all of the output collections into one final grand collection.
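To see bind in action, here is a self-contained sketch (repeating the earlier definitions so it compiles on its own; the twoChars parser is my own example name) that sequences two Char parsers, the second picking up exactly where the first left off:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public delegate IEnumerable<IResult<TValue>> Parser<out TValue>(string input);

public interface IResult<out TValue>
{
    TValue Value { get; }
    string Tail { get; }
}

public class Result<TValue> : IResult<TValue>
{
    public Result(TValue value, string tail) { Value = value; Tail = tail; }
    public TValue Value { get; private set; }
    public string Tail { get; private set; }
}

public static partial class Parser
{
    public static Parser<TValue> Return<TValue>(TValue value)
    {
        return input => new[] { new Result<TValue>(value, input) };
    }

    public static Parser<char> Char()
    {
        return input => input.Take(1)
                             .Select(x => new Result<char>(x, input.Substring(1)));
    }

    public static Parser<TResult> SelectMany<TValue, TResult>(
        this Parser<TValue> parser,
        Func<TValue, Parser<TResult>> parserSelector)
    {
        return input => parser(input).SelectMany(
               result => parserSelector(result.Value)(result.Tail));
    }
}

public static class BindDemo
{
    public static void Main()
    {
        // Sequence two Char parsers: the second one is applied to the
        // unconsumed Tail of the first one's result.
        Parser<string> twoChars =
            Parser.Char().SelectMany(a =>
            Parser.Char().SelectMany(b =>
            Parser.Return(new string(new[] { a, b }))));

        var result = twoChars("abc").Single();
        Console.WriteLine("{0}, \"{1}\"", result.Value, result.Tail); // ab, "c"

        // With fewer than two characters available, the sequence fails.
        Console.WriteLine(twoChars("a").Any()); // False
    }
}
```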

Query comprehension syntax

Following the implementation of the monadic bind (or SelectMany), Hutton and Meijer proceed to define the sat function in terms of bind, which allows parsing characters that satisfy a given predicate. At this point, however, it seemed to me more idiomatic to implement the generic Where combinator, which tests values from any parser against a defined predicate function:

public static Parser<TValue> Where<TValue>(
    this Parser<TValue> parser,
    Func<TValue, bool> predicate)
{
    return input => parser(input).Where(result => predicate(result.Value));
}

Again, we can simply make use of the matching Enumerable combinator applied to the collection of results from the parser in order to make the implementation trivial.

At this point, we might as well also implement the Select method, which will allow us to project parsing results directly into other types:

public static Parser<TResult> Select<TValue, TResult>(
    this Parser<TValue> parser,
    Func<TValue, TResult> selector)
{
    return input => parser(input).Select(
           result => new Result<TResult>(selector(result.Value), result.Tail));
}

Indeed, we can extend the previous implementation of SelectMany to also support projection of the results from the monadic combinator by quite simply applying Select to the intermediate list of results:

public static Parser<TResult> SelectMany<TValue, TIntermediate, TResult>(
    this Parser<TValue> parser,
    Func<TValue, Parser<TIntermediate>> parserSelector,
    Func<TValue, TIntermediate, TResult> resultSelector)
{
    return input => parser(input).SelectMany(
           result => parserSelector(result.Value)(result.Tail).Select(
           iresult => new Result<TResult>(
               resultSelector(result.Value, iresult.Value),
               iresult.Tail)));
}
Now, the amazing thing to realize is that by implementing this set of monadic parser combinators, we have precisely implemented all of the methods that are required to enable LINQ query comprehension syntax over our Parser<T> monad.

Basically, this means we can now write more complex parsers by writing expressions like so:

var digit = from c in Parser.Char()
            where '0' <= c && c <= '9'
            select c;

While digit is not exactly the most complicated of parser combinators, in the next post we will make use of query comprehension syntax freely to enable chaining and sequencing of multiple parser results in order to allow for readable implementations of more advanced (and useful) parsers.
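As a small foretaste of that sequencing, the ternary SelectMany already lets multiple from clauses chain parsers today. Here is a self-contained sketch (repeating the definitions from this post so it compiles on its own; twoDigits is my own example name):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public delegate IEnumerable<IResult<TValue>> Parser<out TValue>(string input);

public interface IResult<out TValue>
{
    TValue Value { get; }
    string Tail { get; }
}

public class Result<TValue> : IResult<TValue>
{
    public Result(TValue value, string tail) { Value = value; Tail = tail; }
    public TValue Value { get; private set; }
    public string Tail { get; private set; }
}

public static partial class Parser
{
    public static Parser<char> Char()
    {
        return input => input.Take(1)
                             .Select(x => new Result<char>(x, input.Substring(1)));
    }

    public static Parser<TValue> Where<TValue>(
        this Parser<TValue> parser, Func<TValue, bool> predicate)
    {
        return input => parser(input).Where(result => predicate(result.Value));
    }

    public static Parser<TResult> Select<TValue, TResult>(
        this Parser<TValue> parser, Func<TValue, TResult> selector)
    {
        return input => parser(input).Select(
               result => new Result<TResult>(selector(result.Value), result.Tail));
    }

    public static Parser<TResult> SelectMany<TValue, TIntermediate, TResult>(
        this Parser<TValue> parser,
        Func<TValue, Parser<TIntermediate>> parserSelector,
        Func<TValue, TIntermediate, TResult> resultSelector)
    {
        return input => parser(input).SelectMany(
               result => parserSelector(result.Value)(result.Tail).Select(
               iresult => new Result<TResult>(
                   resultSelector(result.Value, iresult.Value),
                   iresult.Tail)));
    }
}

public static class QueryDemo
{
    public static void Main()
    {
        var digit = from c in Parser.Char()
                    where '0' <= c && c <= '9'
                    select (int)(c - '0');

        // Two from clauses compile down to the ternary SelectMany above.
        var twoDigits = from d1 in digit
                        from d2 in digit
                        select d1 * 10 + d2;

        var result = twoDigits("42!").Single();
        Console.WriteLine("{0}, \"{1}\"", result.Value, result.Tail); // 42, "!"
    }
}
```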

Thursday, March 31, 2011

SVG XmlSerializer Classes

I've been struggling lately to get a parser for the SVG file format working in .NET C#. Scavenging the web in search of one, it turns out that the few that do exist are often coupled to vector graphics rendering libraries and/or are incomplete. Some have resorted to writing their own simple SVG parsers, usually with support for paths only, which was not enough for my purposes.

Thinking that SVG should be a simple enough file format to handle, being XML-based, I tried to automatically generate classes for it using the XSD tool. It didn't work right off the bat because of some ill-defined attributes in the SVG.xsd file, but I finally managed to work around them and generate class files that XmlSerializer can read and write, and I even tested them successfully with an Inkscape file.

Without further ado, you can find the source here. Feel free to use it for whatever you feel like, and to comment if any questions pop up.

Wednesday, November 11, 2009

Ramblings on the Command design pattern

I was recently in need of implementing an undo/redo mechanism for a GUI and was led to reminisce on the good old Command design pattern. Basically, I set out to find the cleanest and most minimalist way of implementing the pattern in the .NET Framework, specifically taking advantage of the C# 3.0 approach to functional programming.

It turns out implementing a generic Command pattern in C# 3.0 is so easy and powerful, via a clever use of delegates, that even the need for some of the classes specified in the original pattern is virtually eliminated.

Here's the implementation of a CommandExecutor class, which single-handedly deals with all the reusable aspects of the pattern:

public class CommandExecutor
{
  private int currentCommand = -1;
  private readonly List<Command> history = new List<Command>();

  public event EventHandler StatusChanged;

  public bool CanUndo
  {
    get { return currentCommand >= 0; }
  }

  public bool CanRedo
  {
    get { return currentCommand < history.Count - 1; }
  }

  public void Execute(Action command, Action undo)
  {
    if (command == null)
    {
      throw new ArgumentNullException("command");
    }

    command();
    if (undo != null)
    {
      history.RemoveRange(
        ++currentCommand,
        history.Count - currentCommand
      );
      history.Add(new Command(command, undo));
    }
    else
    {
      history.Clear();
      currentCommand = -1;
    }

    OnStatusChanged(EventArgs.Empty);
  }

  public void Undo()
  {
    if (CanUndo)
    {
      history[currentCommand--].Undo();
      OnStatusChanged(EventArgs.Empty);
    }
  }

  public void Redo()
  {
    if (CanRedo)
    {
      history[++currentCommand].Execute();
      OnStatusChanged(EventArgs.Empty);
    }
  }

  protected virtual void OnStatusChanged(EventArgs e)
  {
    var handler = StatusChanged;
    if (handler != null)
    {
      handler(this, e);
    }
  }

  private class Command
  {
    private readonly Action execute;
    private readonly Action undo;

    public Command(Action execute, Action undo)
    {
      this.execute = execute;
      this.undo = undo;
    }

    public Action Execute
    {
      get { return this.execute; }
    }

    public Action Undo
    {
      get { return this.undo; }
    }
  }
}

With this functor-based implementation, commands can easily be passed into the executor along with their undo methods, either as references to existing methods or as lambda expressions.

This delegate-based indirection conveniently eliminates the need for a common Command interface, which makes the pattern that much more reusable: you can easily pass in methods of classes you did not implement, or concisely turn specific method calls into commands without implementing yet another class.

Also, using the scoping semantics of anonymous delegates, you can easily use closures as data members for sharing immutable data between the execution of the command and the undo operation. For instance, consider the following example:

public class Test
{
  private readonly CommandExecutor executor = new CommandExecutor();

  public void TestScoping()
  {
    int sum = 0;

    for (int i = 0; i < 10; ++i)
    {
      var scopedData = i;
      executor.Execute(
        () => sum += scopedData,
        () => sum -= scopedData
      );
    }

    for (int i = 0; i < 10; ++i)
    {
      executor.Undo();
    }
  }
}

In this example, scopedData is used in the closure of the command delegates. Each iteration of the for-loop produces a different value, which is implicitly preserved in each command's closure. Undoing a command then subtracts exactly the value it originally added, restoring the sum to its previous state.

If the for-loop counter variable were captured directly instead, the undo operations would not work correctly, as that single variable would be shared between the closures of all commands. In this way, you can precisely choose the level of sharing between command executions and undo operations, and effectively use the principles of functional programming to insulate each command execution from side effects, while at the same time writing clean, concise, and efficient code.
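A minimal sketch of the pitfall, standalone so it does not depend on CommandExecutor: when the loop variable itself is captured, all ten closures share one variable, which holds 10 by the time the undos run.

```csharp
using System;
using System.Collections.Generic;

public static class CaptureDemo
{
    public static void Main()
    {
        var undos = new List<Action>();
        int sum = 0;

        for (int i = 0; i < 10; ++i)
        {
            sum += i;
            // BUG: all ten closures capture the single loop variable 'i',
            // which holds 10 once the loop has finished.
            undos.Add(() => sum -= i);
        }

        foreach (var undo in undos)
        {
            undo();
        }

        // Each undo subtracted 10 instead of the value originally added:
        // 45 - 10 * 10 = -55, rather than the expected 0.
        Console.WriteLine(sum); // -55
    }
}
```

Copying the counter into a fresh local per iteration, as scopedData does above, gives each closure its own preserved value.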

Monday, May 19, 2008

Lightweight Fiber/Coroutines implementation on C#

I was recently experimenting with exploiting the C# 2.0 iterators to simulate continuations, as suggested in Wesner Moise's blog and came up with what seems to be a general and reasonable way to implement fibers in .NET.

To implement a logical execution thread, we basically need a stack to keep track of "subroutines" and a "program counter" to keep track of the current execution point. We take the IEnumerator interface as the basic type for program counters.

Simulating an execution is simply a matter of stepping through the current IEnumerator routine. Every time this enumerator yields a subroutine, the current routine is pushed onto the stack and the "program counter" switches to the new routine. Every time a routine terminates, the stack is popped to resume the parent routine, if any.

Here's the lightweight fiber implementation and a small sample of how to use it:
public class Fiber
{
    private readonly Stack<IEnumerator> stackFrame = new Stack<IEnumerator>();
    private IEnumerator currentRoutine;

    public Fiber(IEnumerator entryPoint)
    {
        this.currentRoutine = entryPoint;
    }

    public bool Step()
    {
        if (currentRoutine.MoveNext())
        {
            var subRoutine = currentRoutine.Current as IEnumerator;
            if (subRoutine != null)
            {
                stackFrame.Push(currentRoutine);
                currentRoutine = subRoutine;
            }
        }
        else if (stackFrame.Count > 0)
        {
            currentRoutine = stackFrame.Pop();
        }
        else
        {
            OnFiberTerminated(
                new FiberTerminatedEventArgs(currentRoutine.Current));
            return false;
        }

        return true;
    }

    public event EventHandler<FiberTerminatedEventArgs> FiberTerminated;

    private void OnFiberTerminated(FiberTerminatedEventArgs e)
    {
        var handler = FiberTerminated;
        if (handler != null)
        {
            handler(this, e);
        }
    }
}

public class FiberTerminatedEventArgs : EventArgs
{
    private readonly object result;

    public FiberTerminatedEventArgs(object result)
    {
        this.result = result;
    }

    public object Result
    {
        get { return this.result; }
    }
}
Now to implement a small recursive test program:
class FiberTest
{
    private static IEnumerator Recurse(int n)
    {
        Console.WriteLine(n);
        yield return n;
        if (n > 0)
        {
            yield return Recurse(n - 1);
        }
    }

    static void Main(string[] args)
    {
        var fiber = new Fiber(Recurse(5));
        while (fiber.Step()) ;
    }
}
This will print all the numbers from 5 down to 0 using fiber recursion. By yielding an IEnumerator, the recursive function pushes the current fiber execution down into the recursive call. Run it yourself in the debugger and step through the execution to understand the basic mechanism.

With such a fiber/coroutine methodology, it's possible to define logical execution threads controlled by your application, in which all involved iterator routines can be arbitrarily interrupted and resumed while preserving context.

Cool stuff once it clicks. Nice going for C# 2.0 iterators.

Friday, May 9, 2008

Immutability and the good old 'const' keyword

With all the recent interest in functional programming in C# and the language integration of related concepts, one of the more interesting ideas is that of expressing the immutability of an object.

An object is immutable if no method besides the constructor can modify its internal state. Automatic verification of immutability is far from trivial, so many have been asking how one could specify at compile time that an object is immutable.
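In C# terms, the closest approximation today is a hand-rolled immutable class; a minimal sketch (Point and WithX are my own illustrative names):

```csharp
using System;

public sealed class Point
{
    private readonly int x;
    private readonly int y;

    public Point(int x, int y)
    {
        this.x = x;
        this.y = y;
    }

    public int X { get { return this.x; } }
    public int Y { get { return this.y; } }

    // "Mutation" returns a new instance instead of changing this one.
    public Point WithX(int newX)
    {
        return new Point(newX, this.y);
    }
}

public static class ImmutabilityDemo
{
    public static void Main()
    {
        var p = new Point(1, 2);
        var q = p.WithX(5);
        Console.WriteLine("{0} {1} {2}", p.X, q.X, q.Y); // 1 5 2
    }
}
```

Note that readonly only guards the fields themselves, not any objects they reference, and the compiler never verifies that the class as a whole is immutable; that gap is exactly what this post is about.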

This would be interesting for implementing multithreaded functional algorithms, since we could potentially disregard parallelism concerns while composing functions: all the data shared between functions would be protected simply by the fact that the data itself is immutable.

A couple of days ago I led myself to think: "Wait, isn't this what the const keyword in C++ was all about?". In fact, it's interesting how easily one could verify in C++ that an object was indeed immutable: simply declare a const reference to an object, and you only have access to its const methods, which are methods the programmer guarantees will not change the object's internal state.

C# effectively abolished lots of extraneous keywords in a quest to end unnecessary verbosity, but could it be that we're on the verge of witnessing a comeback of so-called 'const programming'? No doubt it is an interesting way of solving the problem AND making sure that everything is verified at compile time, but it definitely increases the complexity of programming.

Friday, April 18, 2008

NSvn Handling of Repository Externals Property Text

This first post does not entirely relate to the .NET framework itself, but rather to a useful assembly implemented by the Ankh SVN project: NSvn.

NSvn is a managed assembly to programmatically call Subversion commands like Update, Commit and others. It's a rather straightforward, and usable, library. However, there are some strange behaviors regarding some of its innards which can lead to some rather obscure bugs.

One of the latest I encountered was the handling of the svn:externals property in code. Here's a small annotated snippet of the endeavour:

Client client = new Client();
client.Checkout(sourceSVN, projectName, Revision.Head, NSvn.Common.Recurse.Full);

Basically, I instantiated an NSvn.Client instance and performed a checkout of a repository located at sourceSVN into the directory projectName.

PropertyDictionary dictionary = client.PropGet("svn:externals", projectExternalsDir, Revision.Head, Recurse.None);
Property externalsProperty = dictionary[Path.Combine(sourceSVN, "Externals")];

One can then access the svn:externals property via the PropGet method, which returns a PropertyDictionary. The externals property itself has to be retrieved from the dictionary by specifying the full repository path.

The problem began when we reached the point of actually modifying the value of the property, in particular, when adding a new external repository line:

string values = Encoding.UTF8.GetString(externalsProperty.Data);
values += newExternal + '\n';
byte[] vals = Encoding.UTF8.GetBytes(values);
client.PropSet(new Property(externalsProperty.Name, vals), projectExternals, Recurse.None);

By doing this, NSvn threw a SvnClientException indicating that there was an error parsing the property text.

It turns out that each Externals line has to be separated by a '\n' (not a '\r\n'!), but it's absolutely vital that the last line does not have ANY separator whatsoever. Adding a TrimEnd call to the end of values does the trick:

byte[] vals = Encoding.UTF8.GetBytes(values.TrimEnd('\n'));
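A less fragile way to build the property text is to keep the externals as individual lines and join them, since string.Join never emits a trailing separator. A small standalone sketch (the example external entries and variable names are hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ExternalsDemo
{
    public static void Main()
    {
        var lines = new List<string>
        {
            "Externals/LibA http://example/svn/LibA",
            "Externals/LibB http://example/svn/LibB"
        };

        // string.Join separates entries with '\n' but never appends a
        // trailing one, which is exactly what the svn:externals parser wants.
        string propertyText = string.Join("\n", lines.ToArray());
        byte[] data = Encoding.UTF8.GetBytes(propertyText);

        Console.WriteLine(propertyText.EndsWith("\n")); // False
    }
}
```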

Strange thing...

Wednesday, April 9, 2008

FxCritic launches

"The word critic comes from the Greek κριτικός, kritikós - one who discerns, which itself arises from the Ancient Greek word κριτής, krités, meaning a person who offers reasoned judgment or analysis, value judgment, interpretation, or observation. The term can be used to describe an adherent of a position disagreeing with or opposing the object of criticism." - Wikipedia


FxCritic is a blog dedicated to giving reasonable criticism of Microsoft's .NET framework and associated libraries and utilities. Expect commentaries about new functionalities, framework inconsistencies, bugs, weird tweakings, workarounds, interesting ways to achieve desired effects, etc.