Debugging data-driven development - Doable?

In a discussion with some architects yesterday, I came upon the point that a new form of debugging is needed. So much development these days is data driven, and when something goes wrong you're back to trial-and-error poking at the problem. I'm sure this isn't a completely original thought, but I haven't seen much written about it.

Consider the following three scenarios:

1) You're writing an XSLT transform for some XML data. If you don't get the syntax just right, the cryptic error message often doesn't give much help.

2) You're using a highly templated class library such as STL. If you don't get the syntax just right, the cryptic error message often doesn't give much help.

3) You're writing a hairy SQL "select" statement. If you don't get the syntax just right, the cryptic error message often doesn't give much help.

It's easy to kill a lot of time "debugging" these problems.

The common denominator in these scenarios is that your "code" is really just a data declaration. Some "engine" takes that data and performs "magic" on it, and you see the results, be it new XML, machine code, or a dataset. The problem is that there's little visibility into these engines. Wouldn't it be cool if you could "step" through the engine as all the conceptual processing takes place?

As I see it, there are two ways this problem could be addressed. Feel free to pop in with others.

1) The engine (be it XSLT, the compiler or the database) could provide a debugger-like view of its processing, letting you inspect the intermediate data and control each step of the processing. Obviously, writing something like this would be a lot of work. I have seen tools such as XSLT debuggers, but they don't appear to be totally in the mainstream. More importantly, it would be great if other engines could do something similar.

2) The engine could be "instrumented" so that it emits intermediate results in some usable format. For this to really work well, the engines should standardize on some sort of output format. For instance, database engines (be they SQL Server, Oracle, or MySQL) should generate the same basic format so that you wouldn't need to wrap your head around a completely different format when going from one DB to another. Ditto for compilers.
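
To make the second idea a bit more concrete, here is a minimal sketch in Python of what an "instrumented" engine might look like. Everything in it is hypothetical: `run_pipeline`, the `trace` callback, and the event fields (`step`, `stage`, `rows_out`, `sample`) are made up for illustration and don't correspond to any real database or XSLT engine. The toy "engine" just applies a list of named stages to some rows and reports each intermediate result as a JSON-friendly record.

```python
import json

def run_pipeline(rows, stages, trace=None):
    """Apply each (name, function) stage in order.

    If a trace callback is given, hand it one record per stage
    describing the intermediate result (hypothetical format).
    """
    for step, (name, fn) in enumerate(stages, start=1):
        rows = fn(rows)
        if trace is not None:
            trace({
                "step": step,
                "stage": name,
                "rows_out": len(rows),
                "sample": rows[:3],  # a few rows, enough to eyeball
            })
    return rows

# Made-up input data standing in for a table.
rows = [
    {"name": "ann", "dept": "eng", "salary": 120},
    {"name": "bob", "dept": "ops", "salary": 90},
    {"name": "cy", "dept": "eng", "salary": 100},
]

# The "declaration": what we want, not how to compute it.
stages = [
    ("where dept = 'eng'", lambda rs: [r for r in rs if r["dept"] == "eng"]),
    ("order by salary", lambda rs: sorted(rs, key=lambda r: r["salary"])),
    ("select name", lambda rs: [{"name": r["name"]} for r in rs]),
]

events = []
result = run_pipeline(rows, stages, trace=events.append)

# One line of JSON per stage; any tool that reads JSON could consume this.
for event in events:
    print(json.dumps(event))
```

If every engine emitted records in one agreed-upon shape like this, a single viewer could show the intermediate data regardless of which engine produced it. Whether real vendors could agree on such a shape is a separate question.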

I know I haven't proposed any actual workable solutions here, but it's a problem space where I often find my mind wandering.

Comments

  • Anonymous
    January 06, 2005
    It is very much needed... writing SQL statements is a real PITA, and I can see how messing up XSLT could be bad too (it can vary though; some rendering engines are better at pointing out errors).
  • Anonymous
    January 06, 2005
    An "instrumented" compiler would be extremely useful for C++ template metaprogramming.
  • Anonymous
    January 07, 2005
    I have wanted one of these for a long time with C++ templates. Sadly I end up just breaking down the template by hand and compiling until I have built up the original template.

    For Oracle and SQL in general, it would be nice to have callbacks to the parse phase so you could get a better idea of what was happening. I think that you would end up with problems with the common format though, since most vendors parse differently and probably would not be able to create a common framework or set of callbacks.
  • Anonymous
    January 07, 2005
    When debugging these kinds of problems manually, one technique is to temporarily simplify the code/query/declaration. Go back to a canonical form that does work, then in small, incremental steps, add the complexity back in until something goes wrong. (This technique also works for regular code debugging as well as data debugging.)

    Perhaps there's some way to automate this process rather than trying to make the interpreter's process more transparent.

    Steve Maguire has some good examples of data debugging in Writing Solid Code. He has a disassembler that's almost completely table driven, so he added sanity checks to search the table for ambiguities, duplication, holes, etc. This gives more specific diagnostics than you could get from a boolean validator.

    Parsers that are auto-generated from grammar rules are usually the worst at giving useful error messages. Hand-crafted parsers can be much better here.

    It's probably impractical to build a C++ parser manually, but perhaps we could make a handcrafted error message parser that can trim away the noise in an STL syntax diagnostic. Scott Meyers outlines a process for this in Effective STL.
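
The "start from something that works and add the complexity back in" technique suggested in the comments could be automated along these lines. This is only a rough sketch: `first_breaking_piece` and `compiles` are hypothetical helper names, the `emp` table and the query are made up, and the check uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in for whatever engine you're fighting with. It also assumes the base query itself is known to work.

```python
import sqlite3

def first_breaking_piece(base, pieces, works):
    """Add pieces to `base` one at a time.

    Return (index, candidate) for the first piece that makes the
    check fail, or None if everything still works at the end.
    """
    current = base
    for i, piece in enumerate(pieces):
        candidate = current + piece
        if not works(candidate):
            return i, candidate
        current = candidate
    return None

# Throwaway in-memory database with a made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")

def compiles(sql):
    """True if the engine accepts the statement, False if it raises."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.Error:
        return False

base = "SELECT name FROM emp"  # canonical form, assumed to work
pieces = [
    " WHERE dept = 'eng'",
    " GROUP BY name",
    " ORDR BY salary",  # deliberate typo
]

culprit = first_breaking_piece(base, pieces, compiles)
print(culprit)
```

This doesn't explain why the engine rejects the statement, but it does narrow the cryptic error down to the one piece that introduced it, which is most of what the manual version of the technique buys you.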