2014 Jun 09

Software Development: Plan for Things to Go Wrong

By Duane

This is not a software blog.  Given that the number of them already on the Internet (like StackOverflow and Developer.com) may best be expressed in powers of 10, it is unlikely that I need to add another one.  However, if you have read Neuromancer, The Adolescence of P1, or even "The Nine Billion Names of God," you know already that without software, today's science fiction would be a vapid shadow of what it actually is.  So it won't hurt for me to advertise that, being a scientist and an engineer in addition to a writer, when I write about science and engineering, I have at least some idea what I'm talking about.

I haven't thought much about the generalities of software development for the last few years, as I have been in an academic world, immersed in MATLAB, which only marginally suits the discussion of software in generalities.  Python more so, but that's another story.  But given that I wrote my first code at the age of 12, in assembly language, for a hypothetical processor that I designed myself, it is a good bet that over the years, I've thought about it a great deal.

Software development monographs

I've written a few monographs on software development, which, unfortunately, have been lost.  That is not a horrible deprivation to the development world, because other people have written much the same thing, and their works are available at Amazon.

One monograph was about how imposing methods to evaluate software development productivity always impairs productivity.  It encourages worse code.  When such measurements are in place, the programmers start coding to the metric, and not to the objective.  Or rather, the metric becomes the objective (much as many school teachers now teach to the test), and I have seen no metric yet that can accurately measure software productivity.  The latter part of the monograph showed that such a metric is possible, if you don't mind inverting 100x100 matrices all day long, but that it is virtually impossible to get it right, and even if it is right, the effort invested by management far outweighs any benefit from measuring productivity.  Even if it worked, the psychology of Big Brother is going to destroy enthusiasm.

Another monograph dealt with writing good software, not just software that works but software that is understandable and therefore maintainable.  I remember I used a number of terms in a nonstandard way, like closure, which already had another meaning.  Why someone had chosen that particular term for that particular other usage was a mystery for a long time, until it dawned on me that it occurred during a decade when hallucinogens were popular.

But what I want to focus on here is not closure, but sequential congruity.

Software development and sequential congruity

Sequential congruity is the premise that, allowing for function calls, the sequence of instructions executed by the computer matches the sequence of statements in the program.  This might seem obvious, but some of the most severe and puzzling bugs I've encountered have come about when the system violated sequential congruity.

How can this happen?  Many ways.  One of them is at the behest of optimizing compilers.  You write code one way, and the compiler decides to move things around so that it's more efficient (it isn't always) and so that it's harder to debug.  You sit looking at the machine language disassembly and at the code you wrote, and wonder when universes got switched.  I'm a programmer; I can optimize my own code.  I prefer to turn off optimization so I don't find myself in Wonderland sometime later.

The event handler trap

But the worst bugs have come not from optimizing compilers.  It's happened to you.  The application is blowing up and you're trying to figure out why.  Let's see ... step through the code ... we update a database record ... and it starts making random files.  The profanities from your lips paint the walls odd shades.  Well, what happens is a violation of sequential congruity.  Someone has attached a weird piece of code to an event handler, and that code does something that triggers a third person's event handler somewhere else, and one by one all the building blocks fall down.

There is no way by looking at your source code that you can figure out what is happening!

That's why we need sequential congruity.  Unfortunately, unlike compiler optimization, we can't just turn off event handlers.  They're far too powerful, and make for cleaner code than we would have without them.  Windows is pretty much ALL event handlers.  There has to be another way.
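To make the trap concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `EventBus` class, the event names, and the three functions are invented for illustration, and a list stands in for files appearing on disk. It only shows how a call that looks like a plain record update can end up creating files two handlers away.

```python
from collections import defaultdict


class EventBus:
    """Tiny publish/subscribe hub (hypothetical, for illustration only)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event, handler):
        self._handlers[event].append(handler)

    def publish(self, event, payload):
        # Every handler runs synchronously, in registration order.
        for handler in list(self._handlers[event]):
            handler(payload)


bus = EventBus()
created_files = []  # stands in for files appearing on disk


def update_record(record_id):
    # The only code you see while stepping through your own module.
    bus.publish("record_updated", record_id)


def audit_on_update(record_id):
    # Attached by a second developer, in another module.
    bus.publish("audit_requested", record_id)


def export_on_audit(record_id):
    # Attached by a third developer, somewhere else again.
    created_files.append(f"audit_{record_id}.tmp")


bus.subscribe("record_updated", audit_on_update)
bus.subscribe("audit_requested", export_on_audit)

update_record(42)
print(created_files)  # ['audit_42.tmp']
```

Nothing in `update_record` mentions files. The connection exists only in the two `subscribe` calls, which in a real code base could live anywhere.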

Software development and visibility

This code is visible: you can see it.

Visibility is another term that I used.  It means that all of the source code that makes an application run is visible, i.e., not hidden.  Unfortunately, most of the actual code that runs a Windows application is inside Windows itself, and you're not going to get that code before hell freezes over, which is why I say that Windows will never be a real operating system, because a real operating system has visible source code.  The first time I ran BoundsChecker (an apparently deceased memory allocation validator), more than 80% of the memory allocation problems it turned up were within Windows.  I could, at best, eliminate 20% of them.

I've had trouble in Delphi, too.  The internal code is not available, so you spend an hour stepping through it one instruction at a time until it gets back to some code that you or a coworker wrote that has debug information for it.  Then: ahah!  Now we can find the bug.  Unless there's another handler....

Unfortunately, the lack of visibility is not going to go away because proprietary secrets are a religious experience for most vendors, even when it renders their product a pain in the butt.

The flaw in software development psychology

These are contributing factors to the problem, and it takes us back to that first monograph that left you wondering why I even mentioned it.  Productivity.  Productivity isn't just getting code out the door.  It's getting out code that you can fix when it comes back.  That you can modify without breaking things.  That doesn't cost you two months' debugging time for something that took three days to write.

This is a problem I have with Agile and similar "rapid development" methodologies.  I suspect that they are most beneficial to their creators who are sitting back collecting royalties while companies are stumbling over each other to grab onto the latest fad.  They focus on quick results, not sound results.

What makes actual software productivity so difficult to measure is that every little piece impacts every other little piece.  How you write code today will impact how someone else adds a feature or tracks down a bug two years down the road, and that is something that counting lines of code per day is never going to capture.

And here is the flaw in our psychology: we focus so much on how the software is supposed to work that we neglect how it is supposed to break.

There is a solution

I recall a problem that I ran into at Heidelberg Web Systems.  They had an application in C++ that would crash every now and then, and as the story went, three developers before me had tried to track it down and failed.  I don't remember the whole story now, so please be tolerant.  What I do remember is that I wrote a stack trace class, similar to Java's, that when provoked by the top-level exception handler would tell me exactly where the exception was thrown and how it got there.

Problem fixed that day.
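The original class was written in C++ and is not reproduced here. As a rough analogy only, the same idea can be sketched in Python, where the standard `traceback` module already records the call chain. The function names (`describe_exception`, `parse_job`, `run_job`, `main`) are made up for the example.

```python
import traceback


def describe_exception(exc):
    """Return one 'function (file:line)' entry per frame, outermost call first."""
    frames = traceback.extract_tb(exc.__traceback__)
    return [f"{frame.name} ({frame.filename}:{frame.lineno})" for frame in frames]


def parse_job(raw):
    return int(raw)  # raises ValueError on bad input


def run_job(raw):
    return parse_job(raw) * 2


def main(raw):
    # Top-level handler: the one place that catches everything.
    try:
        return run_job(raw)
    except Exception as exc:
        for entry in describe_exception(exc):
            print("  at", entry)
        print(f"{type(exc).__name__}: {exc}")
        return None


main("not a number")
```

The last entry printed is the function where the exception was thrown, and the entries before it show how execution got there, which is the information the paragraph above describes.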

Event handlers are a similar case.  We need to know not only what happens in the event handler, but exactly how it got there, even if it has to go through the operating system in the process.  I imagine there is a way to do that in C++; I haven't thought much about it yet.  But that's not the point.

The point is that we need a different mindset.  When we start software development, from the ground up, we need the hooks built in to let us track down elusive bugs like these.  We need that information, because we can't debug without it.  Sure, they can be in compiler switches or something and so excised for the production build, but they have to be there!
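One way such a hook might look, sketched in Python rather than C++, is below. All of it is an assumption made for illustration: the `TracingBus` class, the `APP_TRACE` environment variable (standing in for a compiler switch), and the handler names. The hub keeps a record of which events are in flight, so any handler can ask how it was reached, and the bookkeeping is skipped entirely when the switch is off.

```python
import os
from collections import defaultdict

# Hypothetical switch standing in for a compiler flag: APP_TRACE=1 keeps the hooks.
TRACE_DEFAULT = os.environ.get("APP_TRACE") == "1"


class TracingBus:
    """Hypothetical event hub that can remember which events are in flight."""

    def __init__(self, trace=TRACE_DEFAULT):
        self._handlers = defaultdict(list)
        self._trace = trace
        self._chain = []  # "event:handler" entries currently being dispatched

    def subscribe(self, event, handler):
        self._handlers[event].append(handler)

    def publish(self, event, payload=None):
        for handler in list(self._handlers[event]):
            if not self._trace:
                handler(payload)  # production path: no bookkeeping at all
                continue
            self._chain.append(f"{event}:{handler.__name__}")
            try:
                handler(payload)
            finally:
                self._chain.pop()

    def how_did_i_get_here(self):
        return " -> ".join(self._chain)


bus = TracingBus(trace=True)
seen = []


def on_save(_payload):
    bus.publish("audit")


def on_audit(_payload):
    seen.append(bus.how_did_i_get_here())


bus.subscribe("save", on_save)
bus.subscribe("audit", on_audit)
bus.publish("save")
print(seen[0])  # save:on_save -> audit:on_audit
```

This only traces events that pass through the hub itself. It does not see anything routed through the operating system, so it is a partial answer at best.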

Failing in this regard is a big ding in overall programmer productivity.  What management has to do is not measure it, but encourage it.
