Doubt and uncertainty!

The C and C++ standards documents can be a bit of a beast to trawl through, and quite often you’ll find yourself reading the same sentence several times trying to fathom what it is actually saying. It’s just like reading the EULA for a software product: lots of big words and long sentences that don’t seem to make a lot of sense.

Of course, they do make sense, but the language used is necessarily long-winded because it has to cover all bases. After all, the standards documents are the final word on how the language is used, and they define what compiler vendors need to do to make sure their product behaves correctly and as expected.

Unfortunately, even with all the long sentences and big words, the standards documents cannot hope to capture every case. There are an infinite number of ways the C and C++ programming languages can be used to develop programs, and there is a plethora of operating systems and hardware platforms that need to be considered.

To this end the standards have a get-out-of-jail-free card. Put simply, anything that the documents do not explicitly define as having a particular behaviour is, by definition, undefined. Basically, if the standard doesn’t guarantee something, you cannot safely write code that depends on it, regardless of what your favourite compiler might do.

Actually, the standards documents go one step further than this: they quite often state what the expectations should be for something they don’t explicitly define. They use two different phrases to set the level of expectation, and each has a very precise meaning as far as the standards are concerned:

  • undefined behaviour
  • unspecified behaviour

At face value these two phrases look pretty much the same. They both make it clear that the standards don’t provide any guarantees about the behaviour. They are, however, very different in terms of the semantics the standards assign to them. Let’s look at the formal definitions of both, as given in the C11 and C++11 standards.

Undefined behaviour

The C++11 standard

Behaviour for which this International Standard imposes no requirements. Undefined behaviour may be expected when this International Standard omits any explicit definition of behaviour or when a program uses an erroneous construct or erroneous data. Permissible undefined behaviour ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

The C11 standard

Behaviour, upon use of a non-portable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements. Possible undefined behaviour ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). An example of undefined behaviour is the behaviour on integer overflow.

Unspecified behaviour

The C++11 standard

Behaviour, for a well-formed program construct and correct data, that depends on the implementation. The implementation is not required to document which behaviour occurs. The range of possible behaviours is usually delineated by this International Standard.

The C11 standard

Use of an unspecified value, or other behaviour where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance. An example of unspecified behaviour is the order in which the arguments to a function are evaluated.

As you can see, in both cases the meanings are very precisely defined, but to make sure they are understood let’s try to put them into more everyday language.

Undefined behaviour has no sensible outcome as far as the standards documents are concerned. Things might work as you’d expect (hope?), but then again they might not. If things do work, they are working more by luck than by design.

If you write code containing constructs that result in undefined behaviour, your code should be considered defective. That’s not to say it won’t work. It actually might, but just because it works today doesn’t mean it will work tomorrow. Something that works by luck rather than by design cannot really be considered working, can it?
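The C11 wording above gives integer overflow as its example of undefined behaviour. Strictly, that means *signed* overflow; unsigned arithmetic is defined to wrap. Here is a minimal sketch, assuming C++11, of adding two `int`s without ever evaluating an expression that overflows. The name `checked_add` is hypothetical, not taken from any library.

```cpp
#include <climits>

// Hypothetical helper: stores a + b in `result` and returns true, or
// returns false (leaving `result` untouched) if the mathematical sum
// would not fit in an int. The range checks only use subtractions that
// cannot themselves overflow, so no overflowing expression is evaluated.
bool checked_add(int a, int b, int& result)
{
    if (b > 0 && a > INT_MAX - b) {
        return false;   // a + b would exceed INT_MAX
    }
    if (b < 0 && a < INT_MIN - b) {
        return false;   // a + b would fall below INT_MIN
    }
    result = a + b;     // safe: the sum is representable
    return true;
}
```

GCC and Clang also offer a non-portable `__builtin_add_overflow` extension for the same check. The version above sticks to standard C++ so it doesn’t trade one platform dependency for another.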

To put this into a little more perspective, consider code that contains a buffer overrun that trashes a function’s stack frame. It might be that this has no noticeable ill effects. Then, at some point, you add a new local variable to the function. The variable is held on the stack, so it may change the layout and offsets of that stack frame.

Bingo! The result is that this code may very well now behave in a completely different and random way, the result of which is… undefined!
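The fix for this kind of overrun is to make sure no write can ever go past the end of the buffer, however long the input is. Here is a minimal sketch, assuming C++11. `copy_bounded` is a hypothetical helper that truncates instead of overrunning, and it reports whether the whole string fitted.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies `src` into `dst` (capacity `dst_size` bytes),
// truncating if necessary so nothing is written past the end of `dst`.
// A plain strcpy() into a too-small stack buffer would instead write beyond
// it, which is undefined behaviour.
// Returns true if the whole string fitted, false if it was truncated.
bool copy_bounded(char* dst, std::size_t dst_size, const char* src)
{
    if (dst_size == 0) {
        return false;                       // no room even for the terminator
    }
    std::size_t len = std::strlen(src);
    std::size_t n = (len < dst_size) ? len : dst_size - 1;  // leave room for '\0'
    std::memcpy(dst, src, n);
    dst[n] = '\0';                          // always null-terminate
    return n == len;
}
```

In idiomatic C++, holding the text in a `std::string` avoids the fixed-size buffer altogether. The sketch keeps a raw `char` array to mirror the stack-buffer scenario above.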

Unlike undefined behaviour, code that relies on unspecified behaviour isn’t considered erroneous. On the contrary, it is well formed (assuming no other issues). In this case, however, the behaviour is determined by the compiler, OS and/or hardware. In practice, as long as none of these change, the behaviour tends to be consistent. Strictly speaking, the standard doesn’t even promise that: the C11 wording above imposes no requirements on which possibility is chosen “in any instance”. What certainly isn’t guaranteed is the behaviour when one or more of these is changed. In other words, the code is non-portable, or otherwise platform specific.
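The C11 definition cites the order in which function arguments are evaluated as its example of unspecified behaviour. The following minimal C++11 sketch (all names hypothetical) shows a call whose result depends on that order, and then a rewrite that pins the order down.

Each `next_id` call runs to completion before the other starts, so the first version is unspecified rather than undefined. Writing `f(++c, ++c)` directly would be worse: modifying `c` twice without sequencing is undefined in C11 and C++11.

```cpp
#include <string>

// Hypothetical helper: returns the next value from a counter.
int next_id(int& counter) { return ++counter; }

// Formats two ids as "first,second".
std::string format_pair(int first, int second)
{
    return std::to_string(first) + "," + std::to_string(second);
}

// Relies on unspecified behaviour: the two next_id() calls may run in
// either order, so this can return "1,2" with one compiler and "2,1"
// with another.
std::string ids_unspecified(int& counter)
{
    return format_pair(next_id(counter), next_id(counter));
}

// Portable version: separate statements are sequenced, so the order is
// fixed by the code rather than chosen by the compiler.
std::string ids_portable(int& counter)
{
    int first = next_id(counter);
    int second = next_id(counter);
    return format_pair(first, second);
}
```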

Put another way, how the code behaves depends on the environment it is built and run in. It will typically behave the same as long as that environment doesn’t change, but the standards documents take no view on what the behaviour will be. You may need to refer to compiler, platform and/or hardware documentation to find out. Even then you may not get an answer, because, as the C++11 wording notes, the implementation is not required to document which behaviour occurs.

Writing code that has unspecified behaviour is perfectly reasonable if you are targeting a particular platform, using a particular compiler, and the code is designed to run on a particular operating system. The problem is that the standards make no promises about how the code will behave. If any of these environmental conditions change, the code could break and no longer function as you’ve come to expect.
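If you do decide to depend on a particular platform, you can stop an environment change from silently altering your program by stating your assumptions where the compiler can check them. The sketch below uses C++11 `static_assert`, and the specific assumptions are only illustrative. A broken assumption then becomes a build failure. Strictly, the number of bits in a byte and the size of `int` are implementation-defined (the implementation must document them) rather than unspecified, but the defensive idea is the same.

```cpp
#include <climits>

// Illustrative assumptions for a hypothetical code base that targets one
// platform. If a new compiler or target breaks any of them, compilation
// fails here instead of the program quietly changing behaviour.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
static_assert(sizeof(int) * CHAR_BIT == 32, "this code assumes a 32-bit int");
```

The compiler cannot check assumptions about OS versions or hardware features this way. Those still need testing whenever the environment changes.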

This means you will need to test thoroughly each time you upgrade your compiler, install a service pack on the target platform or upgrade hardware, for example. By contrast, code that is not platform dependent is guaranteed to always behave the same way, as described by the standards documents (assuming the compiler is standards compliant and itself contains no defects).

Writing code that has “undefined behaviour” is a recipe for disaster. It is almost certainly going to end in tears. If your code relies on behaviour that is not explicitly defined by the relevant standards document, you should assume it is defective and fix it. Likewise, if the standard explicitly states that the behaviour is undefined, then the code is defective, and not fixing it is really the coding equivalent of playing Russian Roulette.

If your code contains constructs that are defined as having “unspecified behaviour”, at least you can take comfort in the fact that it is not defective. Don’t be too complacent, however. It may work today and behave as you expect, but you need to keep your wits about you. If your environment changes, so too might the behaviour of your code.

Avoid relying on unspecified behaviour. Avoid it at all costs. If you need to do something that is platform specific, consider using a quality library, such as Boost, that provides an abstraction between your code and the task you are trying to achieve. That way your code remains robust, and the job of making sure the unspecified behaviour works as expected falls to the library’s maintainers.
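Boost does provide libraries for this kind of job; Boost.Endian, for example, deals with byte order. Rather than presume any particular library API, here is a minimal hand-rolled sketch of the same idea: hide a platform detail behind a small function. Here the detail is byte order. Copying four bytes straight into a `std::uint32_t` gives a value that depends on the host, because the standards leave the representation of types to the implementation. Assembling the value with shifts gives the same result on any implementation that provides `std::uint32_t`. The name `load_u32_le` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: decodes a 32-bit unsigned value stored as four bytes
// in little-endian order (least significant byte first), e.g. read from a
// file or network buffer. Building the value with shifts, rather than
// memcpy()-ing the bytes into a std::uint32_t, makes the result independent
// of the host's byte order.
std::uint32_t load_u32_le(const unsigned char* p)
{
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}
```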
