Saturday, July 21, 2007

UML vs. Domain-Specific Languages - a false dichotomy?

I have just listened to the panel discussion at Code Generation 2007 entitled "UML vs. Domain-Specific Languages - a false dichotomy?"

It's very interesting. The panel includes some very well-known luminaries from the modelling world (e.g. Steve Cook) and some less well-known but equally capable people (such as Tony Clark of Xactium).

Much of the discussion, and perhaps its heart, is given over to modelling languages versus programming languages. It's a topic I've come across in various contexts, where the following views are not uncommon:
  • "Modelling languages are for drawing pictures, you build software with programming languages"
  • "If it's graphical, it's a modelling language; programming languages are textual"
Both views are as understandable as they are wrong. Promulgating them is perhaps the most heinous crime of the UML juggernaut. Even today Grady Booch, one of UML's originators, maintains it should be used for sketching things out: clarifying your thoughts before writing code. Plenty of others share that view (e.g. Scott Ambler).

As a result, the majority of opinion equates UML - and therefore graphical notation - with sketches: something that precedes the real work of coding.

That's a great shame - and baseless. In other industries (mechanical engineering and construction, for example) graphical notations are standard. Not for sketching, but as the normative specification. The code, if you like.

Graphical notations can be used in software too. Some companies have formalised a subset of UML into a language that is precise, executable and graphical. Conversely, this post is textual - but definitely not executable.

Abstract Syntax, Concrete Syntax and Semantics

UML 2.0 wasn't all bad. One of the good things it popularised was a way to look at languages as consisting of three parts: abstract syntax, concrete syntax and semantics. (I prefer to name them concepts, representations and meanings, but that's personal preference):
  • Concepts (or Abstract Syntax) are the things we want to talk about: 'Cars', 'Accounts', 'Dogs', 'Equations', whatever.
  • Representations (or Concrete Syntax) are how we depict those concepts; textual, graphical, aural, olfactory...
  • Meanings (Semantics) are, well, what we mean when we discuss concepts.
It's often difficult to separate representation from concept, since we can only communicate concepts by giving them a representation. Using different representations for the same concept is so intrinsic to our way of life that we don't even notice it. (Imagine the confusion if we didn't know the spoken word 'car' referred to the same concept as the written word 'car'.)

Nevertheless, analysing languages in terms of their concepts, representations and meanings provides a useful framework for classifying them.
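To make the three-way split concrete, here is a minimal Python sketch using a toy arithmetic language rather than anything from UML or a particular DSL tool (the language and all names in it are invented for illustration). One set of concepts is given two different textual representations, and a separate function supplies their meaning.

```python
from dataclasses import dataclass
from typing import Union

# Concepts (abstract syntax): what an arithmetic expression *is*,
# independent of how it is written down.
@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"

Expr = Union[Num, Add, Mul]

# Representation 1 (concrete syntax): conventional infix text.
def to_infix(e: Expr) -> str:
    if isinstance(e, Num):
        return str(e.value)
    op = "+" if isinstance(e, Add) else "*"
    return f"({to_infix(e.left)} {op} {to_infix(e.right)})"

# Representation 2: prefix (Lisp-style) text for the *same* concepts.
def to_prefix(e: Expr) -> str:
    if isinstance(e, Num):
        return str(e.value)
    op = "+" if isinstance(e, Add) else "*"
    return f"({op} {to_prefix(e.left)} {to_prefix(e.right)})"

# Meaning (semantics): what the concepts denote, here an integer value.
def evaluate(e: Expr) -> int:
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Add):
        return evaluate(e.left) + evaluate(e.right)
    return evaluate(e.left) * evaluate(e.right)

expr = Add(Num(2), Mul(Num(3), Num(4)))
print(to_infix(expr))   # (2 + (3 * 4))
print(to_prefix(expr))  # (+ 2 (* 3 4))
print(evaluate(expr))   # 14
```

Adding a graphical rendering would just mean writing a third representation function; neither the concepts nor the evaluator would need to change.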

So is UML vs DSLs a false dichotomy?

Let's consider it from each of the three perspectives.
  • Concepts. At its (most useful) core, UML is intended for describing concepts, their relationships and behaviour. So its concepts - classes, relationships, states, events, etc. - are those useful for describing other concepts. The concepts in a DSL are, by definition, specific to the domain in question: perhaps Dogs, Breeds and Owners in the domain of pedigree competitions. Is that a dichotomy? No. Just a different focus. In fact the raison d'être of UML's core is to enable description of other domains. UML is, in effect, a DSL for describing other domains.
  • Representations. UML ascribes a graphical notation to many of its concepts - perhaps most obviously the boxes and lines in class diagrams or state charts. Many would argue - and with good cause - that this is in fact the most useful thing about UML. DSLs usually provide a representation appropriate to the domain. It might be graphical (perhaps an image for each dog appropriate to its breed), symbolic (the extended alphabet used in mathematical equations) - whatever suits the problem. Is that a dichotomy? Again, no. However, it does expose a limitation in UML: it doesn't explicitly allow representations to be ascribed to the concepts described in a UML model.
  • Meaning. The semantics of UML is a subject for a lifetime's study, not a paragraph in a blog post. Despite its supposed formalisation, UML does not have a precise semantics in any useful sense of the term. That betrays its origins as a sketching tool. While some have divined a precise subset, they have done so despite the language rather than because of it. DSLs, by comparison, assume that the results of modelling will be machine processed: translated into some other executable form or interpreted directly. Is that a dichotomy? Fundamentally, yes. Those are two very different philosophies.
So UML and DSLs do share significant properties. DSLs extend UML in their support for domain-specific representation, but it's an extension rather than a conflict. Were UML to allow domain-specific notation to be assigned to the concepts in its models, this difference would disappear.
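As a rough illustration of ascribing domain-specific notation to modelled concepts, the following hypothetical Python sketch models the pedigree-competition concepts (Dog, Breed, Owner) and renders the same instance twice: once generically, the way a class-diagram box treats every instance alike, and once with a breed-specific symbol. The symbols and function names are assumptions made up for this example, and plain text stands in for real graphics.

```python
from dataclasses import dataclass

# Concepts in a hypothetical pedigree-competition domain.
@dataclass(frozen=True)
class Breed:
    name: str

@dataclass(frozen=True)
class Owner:
    name: str

@dataclass(frozen=True)
class Dog:
    name: str
    breed: Breed
    owner: Owner

# Hypothetical domain-specific notation: each breed gets its own symbol.
# Breeds without an entry fall back to "[?]".
BREED_SYMBOLS = {"Beagle": "[B]", "Collie": "[C]"}

def render_generic(dog: Dog) -> str:
    # Generic rendering: the same shape for every instance.
    return f"Dog: {dog.name}"

def render_domain(dog: Dog) -> str:
    # Domain-specific rendering: the symbol depends on the dog's breed.
    symbol = BREED_SYMBOLS.get(dog.breed.name, "[?]")
    return f"{symbol} {dog.name} (owner: {dog.owner.name})"

alice = Owner("Alice")
rex = Dog("Rex", Breed("Beagle"), alice)
print(render_generic(rex))  # Dog: Rex
print(render_domain(rex))   # [B] Rex (owner: Alice)
```

Both functions read the same Dog object; only the notation differs. Keeping that mapping alongside, but separate from, the concepts is what UML does not explicitly provide for concepts described in its models.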

Where the two fundamentally differ is on semantics. As long as the majority influence on UML supports the standpoint of a sketching toolkit, the dichotomy will remain. There will continue to be those who provide a formal semantics for some subset of UML, and in such cases there is no real dichotomy; however, those cases are the exception and not the rule.