A site for my programming pet projects

20 Mar 09 Coco/R plugin for Visual Studio

I’ve blogged before about the excellent Coco/R parser generator. I’m using it a lot in my master’s project and I’m happy with it, but there were a few things I wished worked differently. The main thing was that I wanted better Visual Studio integration. I had set up a pre-build event that generated the parser and scanner before every build. However, there is no need to regenerate the files unless the grammar file has changed. Generating on every build also meant that Visual Studio kept prompting me to reload changed files, and that I had to build just to see whether there were any errors in my grammar. So I decided to create a Visual Studio plugin for Coco/R myself.

Coco/R is open source and written in C# (at least one version of it), so it was easy to get the source. I then looked at a couple of tutorials on Visual Studio plugins and managed to hack together a plugin that works well enough for my needs. I also made a small change to the way Coco/R generates its parsers and scanners from frame files. Here are the features that are unique to the plugin:

  • Works with Visual Studio 2005 and 2008
  • When you click ‘Add a new item’ in a C# project, you’ll find a ‘Coco/R Attributed Grammar’ option under My Templates at the bottom of the screen. The .atg file you get has a simple example grammar that just reads numbers or identifiers. I wanted this because every time I create a new .atg file I start by finding an old one and copying the basics from it.
  • Every time an .atg file is saved, the parser and scanner are re-generated. If the generated files are open then the AddIn closes them before re-generating, to avoid the dreaded “Files were reloaded” prompt.
  • Errors and warnings from Coco show up in Visual Studio’s error list window just like build errors as soon as you save the .atg file.
  • Instead of using frame files, the plugin uses partial classes for the parser and scanner. There are four files: Parser.cs, Parser.generated.cs, Scanner.cs and Scanner.generated.cs. This lets you add your own code to the parser and scanner in an actual .cs file, so you get the benefit of the Visual Studio editor instead of having to write it in the .frame file or the .atg file.
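
To illustrate the partial class split, here is a minimal sketch. It is not the plugin's actual output: the two halves of `Parser` stand in for Parser.generated.cs and Parser.cs, and the `ReportError` helper and `ErrorCount` property are hypothetical names made up for this example.

```csharp
using System;

// Stand-in for Parser.generated.cs: in the plugin's scheme this half
// would be rewritten whenever the .atg file is saved.
public partial class Parser
{
    public int ErrorCount { get; private set; }

    public void Parse()
    {
        // Generated parsing code would live here. This stub only reports
        // one error through the hand-written helper defined below.
        ReportError("unexpected token");
    }
}

// Stand-in for Parser.cs: the hand-written half, edited in the normal
// Visual Studio editor and left alone by regeneration.
public partial class Parser
{
    // Hypothetical helper, not part of Coco/R's real Parser class.
    private void ReportError(string message)
    {
        ErrorCount++;
        Console.WriteLine("error: " + message);
    }
}

public static class Program
{
    public static void Main()
    {
        var parser = new Parser();
        parser.Parse();
        Console.WriteLine(parser.ErrorCount);
    }
}
```

The compiler merges both halves into a single class, so the generated half can call the private helper as if it were written in the same file.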

So, that’s it. You can download an MSI installer for it and you can also view or check out the source at http://einaregilsson.googlecode.com/svn/dotnet/CocoPlugin/. Enjoy.

Reader's Comments

  1.

    Hi,
    thanks a lot. Great tool.
    Just a question: I am about to use Coco/R with Silverlight, which doesn’t support Hashtable. I found that I should use System.Collections.Generic.Dictionary instead. I changed the definition of start in Scanner.cs to static readonly Dictionary start. But when Scanner.generated.cs is regenerated, Hashtable is still there.
    Is there any flexible way of setting the underlying frame file and specifically this datatype?
    Thanks a lot.
    Pavel K.

  2.

    Hi

    Glad you could use the tool. Unfortunately you can’t really change the frame file without re-compiling the plugin since the frame files are embedded in the CocoPlugin.dll file. Maybe not the best decision I made there…

    But if you want, you can check out the source from Subversion at http://einaregilsson.googlecode.com/svn/dotnet/CocoPlugin/

    Then change the Scanner.frame file in the Frames folder and rebuild the .dll. You don’t need to rebuild the installer or anything, just build the dll and dump it in \Visual Studio 200X\AddIns or something like that, where the old CocoPlugin.dll is.

    If that doesn’t work for you for some reason then just let me know exactly how you want the line to read (” = new Dictionary“) and I’ll do a custom build for you.

  3.

    Hi,
    thanks. I changed it, recompiled it and have been using it for a few days.
    Great tool!! What a relief. Everything updates just by hitting the save button.

    Thanks again.
    Pavel K.

  4.

    Good work!

    Is it only me? The partial classes approach is great, and it is how it should be done!
    But… would it not be better to also declare partial “hook” functions in the generated code files and call them at the appropriate spots during parsing, so that a (third?) partial file could contain the implementations and regenerating would not affect the hand-written code?

  5.

    Hi Jochen

    Do you have particular hooks in mind that you think would be useful here?

    There is also now an “official” Coco/R plugin for VS (at least made at the same institute that makes Coco/R), however I don’t know exactly what approach they take as I’ve never tried it.

  6.

    Yes, the hooks could be created automatically, so the .atg file only contains the grammar, but no C# code at all.
    The idea is that partial functions that are not implemented are automatically removed by the compiler, along with the calls to them. Thus there is no performance hit from having “too many” of them.
    The usual procedure would be to manually declare a set of partial functions in the upper part of the .atg file and then to add (. MyPartialFunction(…); .) at the corresponding spots.
    This is still too much work for a lazy person such as me ;)
    Instead, the idea of the boost/spirit parser could be applied, such that semantic actions are generated as partial functions automatically, maybe along the scheme:

    SomeProduction
    = (. SomeProductionBegin(); .)
    // more “hooks” at each position where a match happens, (so data can be extracted).
    (. SomeProductionEnd(); .)
    .

    Users would then simply implement the partial functions they are interested in, in a third partial part of the class in its own .cs file.
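
A minimal sketch of the hook idea described in this comment, using C# partial methods. This is an illustration only and not something the plugin generates: the hook names follow the `SomeProductionBegin`/`SomeProductionEnd` scheme from the comment, and the `Log` field is a hypothetical addition used to observe which hooks actually ran.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for a generated file: declares the hooks and calls them.
public partial class Parser
{
    public readonly List<string> Log = new List<string>();

    // Partial method declarations. If no other part of the class supplies
    // a body, the compiler drops the declaration and every call to it.
    partial void SomeProductionBegin();
    partial void SomeProductionEnd();

    public void SomeProduction()
    {
        SomeProductionBegin();
        // Matching of the production's body would happen here.
        SomeProductionEnd();
    }
}

// Stand-in for the hand-written third part: it implements only the hook
// it cares about. SomeProductionEnd is deliberately left unimplemented.
public partial class Parser
{
    partial void SomeProductionBegin()
    {
        Log.Add("begin SomeProduction");
    }
}

public static class Program
{
    public static void Main()
    {
        var parser = new Parser();
        parser.SomeProduction();
        Console.WriteLine(parser.Log.Count);
    }
}
```

Partial methods of this form must return void and are implicitly private, so any data a hook extracts has to be stored in fields of the class rather than returned.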

  7.

    Well, it sounds interesting. I don’t really have time to work on this now, but feel free to download the code and try modifying it. If you do, let me know how it works out :)

  8.

    A bit late…but…

    I used Coco and the plugin a bit more, but I am now close to giving up on it. The plugin is fine, but what I have been sceptical of all along is that, ever since lex-like tool chains became established, people write their application code into the grammar files. Studying the .ATG samples for the C# parser a bit more revealed to me that it also contains a hand-written scanner.
    And this is exactly what ails me in my project: I have /several/ applications for the same grammar, not just one, so mixing application and grammar is a bit of a no no for me.
    Also, the standard scanner is a single state automaton, which makes it harder to get useful semantic actions attached.
    Theory says, not without reason, that a scanner is an FSM (finite state machine), as it needs to be sensitive to the state of the parser.
    Example:
    You have dotted identifiers (e.g. namespaces) and you have strings, which might contain dots. The naive approach would be cumbersome, as the scanner would split the string within the “” delimiters, since it also recognizes the dot token.
    I tend more and more to return to my traditional way of tackling my task, using the parser composition approach (see boost/spirit for the idea, while I do it in a simpler style in C#). With that, there is no factored-out scanner, and the state of the parser is naturally coupled to the way the scanner operates.
    I know this is probably not the right forum, but I would be interested in a discussion about whether a separate scanner is truly a good idea.

  9.

    Hi

    Well, first of all Coco itself is not really my work, so I don’t know all the thinking behind it. But as for my opinion, I don’t really understand the problem you’re having:

    1. Several applications for one grammar. What are these applications doing? Why don’t you parse your input into an abstract syntax tree (or some other data structure) and then let your applications work on that data structure? What kind of application code are you talking about?

    2. Dotted identifiers and dots inside strings. Are you talking standard quoted strings here, e.g. “hello. World”? If so, that string is one token, so the fact that there is a token inside it doesn’t matter.
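
To make the point about quoted strings concrete, here is a small hand-written tokenizer sketch. It is not Coco/R's generated scanner, and the `TinyScanner` class and its token labels are hypothetical. It only shows that once an opening quote is seen, everything up to the closing quote becomes one string token, so a dot inside it never produces a separate dot token.

```csharp
using System;
using System.Collections.Generic;

public static class TinyScanner
{
    // Splits the input into identifier, dot and string tokens.
    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            if (char.IsWhiteSpace(c))
            {
                i++;
            }
            else if (c == '.')
            {
                tokens.Add("dot");
                i++;
            }
            else if (c == '"')
            {
                // Consume everything up to the closing quote as one token.
                int end = input.IndexOf('"', i + 1);
                if (end < 0) throw new FormatException("unterminated string");
                tokens.Add("string:" + input.Substring(i + 1, end - i - 1));
                i = end + 1;
            }
            else if (char.IsLetter(c))
            {
                int start = i;
                while (i < input.Length && char.IsLetterOrDigit(input[i])) i++;
                tokens.Add("ident:" + input.Substring(start, i - start));
            }
            else
            {
                throw new FormatException("unexpected character '" + c + "'");
            }
        }
        return tokens;
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (string token in TinyScanner.Tokenize("System.IO \"hello. World\""))
        {
            Console.WriteLine(token);
        }
    }
}
```

For the input in `Main`, the dot between `System` and `IO` is reported as its own token, while the dot inside the quotes stays part of the string token.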

    But in general, yes, I could see where state could become an issue, for example when parsing a PHP file, where on one hand you have everything inside <? ?>, which has its own token rules, and on the other hand you’re parsing the HTML outside those tags.
