Saturday, June 07, 2008

Hell is Other People's Code

My current assignment has me analyzing code for an embedded system that has been under development for 15 years.

There is a lot of money invested in the development of this software.

After 15 years of blood, sweat, and tears, the embedded system now needs modern hardware. But what to do about the very expensive software? Start over or devise some way to migrate the code base so that it may execute on the new hardware?

The idea of throwing away 15 years of software development is hard to swallow - so my task is to come up with a reuse/migration strategy.

But wait, there is more.
1. The software is very brittle and cannot accommodate modification.
2. Even basic maintenance activities are becoming a risky proposition.
3. The software is monolithic with no obvious means of splitting functionality into subsystems.

A secondary goal of the system modernization is to partition the software to increase robustness, maintainability and modifiability.

So I began to look at the code. And to be honest, I was humbled by its cleanliness. The code was well commented, and consistent naming conventions and styles were applied throughout. It was the code of experienced software developers.

So let us take a step back. The tool I am employing in my analysis is to judge the competence of the engineers who created this troubled software. I think it is fair to carefully judge software developer competence as an analysis tool - but often there is more to the story than meets the analyst's eye. For example, most software systems are developed under the duress of schedule constraints - "trying to pack ten pounds of crap into a five pound bag". But I digress - back to my story.

The "coding in the small" looks good but I had already suspected that the problem existed in the design, or more specifically, how the software modules interact with one another. Let's put the software designer under the microscope.

Now the problem became very evident: many software modules were highly dependent on many other software modules. To be specific, it was easy to find software modules that were dependent on 50 other modules. It was impossible to create an informative diagram showing the dependencies in any meaningful way. I also searched for a more specific way to classify or present this problem to management, but this too escaped me. This software was simply a rat's nest of overly coupled modules that would continue to challenge our efforts to meet our goals.

So now I have indicted the software designer; hopefully he has made a clean getaway after 15 years. But now I need a motive - how could a software engineer let this happen? Let's travel back in time 15 years and see what was going on. I remember those days, and I remember that object-oriented techniques were still a little esoteric to the average developer. Many developers failed to make the leap to OO, and many projects suffered through initial forays into OO. The main problem with OO in those days was in the development of frameworks and architectures. OO gave us the ability to define code modules that mapped to objects in our domain, but the techniques for creating the executive sections of our software were often not understood and under-designed.

Nonetheless, our indicted software designer still should have used the most basic and powerful software design technique: layering. But no, under analysis, the software has no structure.

The trial of the software designer continues. In defense of the designer, the software has been segmented into a multitude of modules that model the domain - but somehow this act of decomposition has created a mess. Let me introduce exhibit A, the programming language.

The system was programmed in the original version of Ada, the programming language developed for the U.S. Department of Defense in the late 1970s. Ada has been upgraded several times, with the most notable upgrade coming in 1995 and termed Ada95; the original Ada is usually called Ada83.

Ada was specifically created to solve the so-called software crisis that plagued the development of large software systems. Yet here was a large software system in crisis, developed using Ada.

So now I will turn the glare of the spotlight away from the software designer and place it on this unpopular programming language. Now I must confess that I am not an "Ada man". I have worked on Ada projects for short stints, but I have never become enamored with the language, although I know a few who have.

I looked up a few of my (with all due respect) "grey haired" colleagues to borrow some manuals on Ada83, not Ada95, because I wanted to know what Ada developers were thinking back during the early days of the system's development. My Ada-savvy co-workers were eager to supply me with the manuals and to share a few thoughts with me.

Now Ada83 is not object-oriented but rather object-based. Actually, Ada83 is a procedural programming language similar to C. One main difference is that Ada includes the concept of a "package". The Ada package abstracts the "file module" and has some very powerful and useful capabilities. It is important to note that the Ada package is not a type in the way a Java or C++ class is a type. A package is used to define types, functions, and variables - much like a class, but with the distinction that only one instance of a package exists in the program. A variable declared within a package is equivalent to a class-wide variable in Java or C++ - i.e., a static variable.
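
To make the analogy concrete, here is a rough sketch of my own in Python (a hypothetical module, not the project's code): a Python module, like an Ada package, exists exactly once per program, and its top-level variables behave like static, globally accessible state.

# tracker.py - a hypothetical module standing in for an Ada83 package.
# Like a package, a module has exactly one instance per program;
# these top-level names are effectively static (class-wide) variables.

count = 0  # a "package variable": one copy, visible to every importer

def increment():
    # Functions in the module operate on the shared state, much as
    # package subprograms operate on package variables.
    global count
    count += 1

def value():
    return count

Every client that imports tracker sees the same count - which is roughly why hundreds of such interconnected modules amount to a pile of global state.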

My object-oriented mind now saw the embedded software system as ~500 singleton software classes - with public variables - mostly interconnected to one another. Could it be that the system's engineers did not really understand the object-based, procedural nature of the Ada language and used the Ada package as some type of brain-dead object-oriented class instead of a very powerful file module abstraction?

So let's be clear: in the object-based Ada language, the object is the data defined, instantiated, and processed by the functions in the package - not the package itself.

Interestingly, it all goes back to some of my previous musings on how State Stinks! Certainly the stinkiest form of state is static, globally accessible state.

I could go on, but I have said all that I need to say about this matter. And I release the incarcerated designer from his cell; we all have skeletons in our closets, and besides, my job is not to find out what and why, but how. How to fix this software and move it into the future. So far I am the one who is falling short of his goals.

But such is life in the hell of other people's code.

Friday, February 22, 2008

Bottom-Up Software Design is for Sheep

Considering the divisive issues of our day - Vim vs Emacs, Ruby vs Python, .NET vs Java, Mac vs PC, and REST vs SOAP - there are plenty of opportunities to spark a holy war. Let's consider another polarizing issue:

Is it better to design and develop software from the top-down or from the bottom-up?

I am fortunate to have the opportunity to work across a wide variety of projects, and I am always curious to discover how my newly met colleagues are inclined regarding design. This evening I came across a compelling blog post, by Gustavo Duarte, that took a stance opposed to my own. Here is my take on the issue.

Design by decomposition is the quintessential technique of the engineer

One of Gustavo's main points is that Feynman believed that software engineering has much in common with the other engineering disciplines. Considering this point, the most important concept in engineering is modularization. Most products and devices are made up of replaceable parts. Modularization is achieved by decomposing a system into sub-systems. System decomposition is top-down design.

Certainly we can agree that some of the first decisions we might make include the selection of a platform (Linux, Windows), technology stack (LAMP, Java, .NET), or language/framework (Ruby on Rails, Django).

Our choices will determine the composition of our system, how functionality is divided among system components, and the usage of key patterns (e.g., Model-View-Controller).

Do we even realize that we are doing top-down design?

Perhaps the availability of powerful platforms and frameworks is removing the importance of the top-down design perspective from our collective consciousness.

Nonetheless, software system design by decomposition into sub-systems implies a top-down approach to design.

The Space Shuttle Challenger was not a victim of top-down design.

Gustavo's blog post walks us through some evidence suggesting that Dr. Feynman believed top-down design doomed Challenger. The Challenger disaster illustrates that a likely point of system failure is at sub-system interfaces. I don't see this as an indictment of top-down design.

For an analysis of the multitude of events that led to the Challenger disaster, I recommend the book NO DOWNLINK. Essentially, the Challenger and her crew were the victims of pork-barrel politics, post-space race budget cuts, and the Challenger Syndrome.

Poor design choices, such as the use of Solid Rocket Boosters (SRBs) and then the segmenting of the SRBs to aid in transportability, were driven more by budgetary and political concerns than by faulty engineering practices.

Defending up-front software design

Another point made by Gustavo is that "Big up-front design is foolish". The word "Big" implies a non-incremental approach. Up-front design is not a foolish idea. However, developing software in a non-incremental fashion (i.e., the waterfall method) is a bad idea.

Top-down software design leads to effectively modularized code that fosters many good things, such as reuse, testability, and maintainability.

Gustavo drags UML into this discussion, and I agree with some of his points. The idea of designers churning out UML blueprints to throw "over the wall" to "implementors" has no appeal to me and, in fact, simply does not work.

UML is not inherently evil, and the use of UML does not have to imply some type of bureaucratic software factory where coder slaves toil away, mindlessly "implementing" designs.

UML is just a tool that lets you work on your system at a high level. UML is just a white board.

Using UML to create diagrams (which are essentially pictures) to foster communication is a beautiful thing.

To summarize, UML should be used to open a new communication channel, never to replace face-to-face discussion.

Risk

The main problem with up-front design from a developer's perspective is that it doesn't address many of the underlying risks to the success of the project. The term "risk" is a nice way to articulate that feeling you get in your stomach when you know you are doomed.

The best way to handle "risk" is to start working on the risky parts of the system as soon as possible, so that you have plenty of time to handle all the foo bar.

Risk is what leads developers to favor the bottom-up approach. This is because the risky sections of a system exist deep within the system and at sub-system interfaces such as APIs.

Up-front design may add risk because some design decisions are based on assumptions that can only be validated with working code.

Managing Risk

All forms of engineering require lab testing, research and development, and prototyping to determine how elements of a system will actually perform - and software is no different.

Risk-reduction prototyping is a technique used to mitigate risks, validate design assumptions, and develop techniques that will be used during the production of a software system.


Conclusion

As software developers, we are also software users. We use frameworks and tool-sets that allow us to focus on the specific application we are creating. If we don't maintain the ability to design software from the top down, we function more as users and less as developers. We become, in effect, sheep walking the well-worn path instead of the pioneering trail-blazers we had hoped to be.

I believe that a pragmatic combination of both top-down (up-front design) and bottom-up (risk-reduction prototyping) techniques will greatly increase a software project's chances for success.

Dr. Feynman believed that software engineering has much in common with the other engineering disciplines. This is certainly true, but 'how much' is less certain. Thinking of software engineering as just another engineering discipline comes with its own pitfalls.

What is your take on this topic?


Thursday, January 10, 2008

Erlang and Tail Recursion Part II - Still Living, Still Dreaming

In a previous post, I described how I experienced tail recursion and tail call optimization first-hand. I have learned to accept that the Erlang compiler can detect when recursion can be replaced with iteration, preventing stack growth and eventual overflow (which means "hard crash"). The idea is that developers can stay within the language's functional "stateless" paradigm and let the compiler do the messy work.
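
To illustrate what that rewrite amounts to, here is a minimal sketch of my own in Python (which, unlike Erlang, performs no such optimization): a tail-recursive function next to the loop the compiler effectively turns it into.

def count_down(n):
    # Tail-recursive: the recursive call is the very last thing
    # evaluated, so no work is left pending on the stack.
    if n == 0:
        return 'done'
    return count_down(n - 1)

def count_down_as_loop(n):
    # What tail call optimization effectively produces: the call is
    # replaced with a jump back to the top, so the stack never grows.
    # (Python itself does not perform this rewrite.)
    while n != 0:
        n = n - 1
    return 'done'

In Erlang the first form is safe at any depth; in Python, count_down(100000) raises a RecursionError while count_down_as_loop(100000) runs fine.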

I spent some time dissecting a very interesting Erlang example that I found on the web at builder.au. There are a lot of interesting nuggets in this example, requiring a little time with the Erlang reference manuals.

However, there was one seemingly innocuous passage of code that made me realize I had a few more lessons to learn regarding Tail (or more commonly, Last) Call Optimization (LCO).

Note that while the code is from builder.au, the comments are mine.



Function searchNode

searchNode(Name, LinkDict) ->
    receive
        {link, ToName, ToPid} ->
            % Tail recursion that will be Last Call Optimized
            searchNode(Name, dict:store(ToName, ToPid, LinkDict));
        {search, Path} ->
            Free = notIn(Name, Path),
            if
                Free ->
                    lists:foreach(
                        fun(X) ->
                            {_, Pid} = X,
                            Pid ! {search, Path ++ [Name]}
                        end,
                        dict:to_list(LinkDict));
                true -> true
            end,
            % Tail recursion that will be Last Call Optimized
            searchNode(Name, LinkDict);
        {endPoint} ->
            % The {endPoint} message identifies that the node
            % 'embodied' by this function is the 'target' of
            % the search, requiring different message handling,
            % and therefore a separate function 'last'.
            % But this is not tail recursion?
            last(Name, LinkDict)
    end.


Function searchNode both represents a node in a graph and embodies an Erlang process. Once this function is initially invoked, it awaits a message from another process. Upon receiving and handling a message, it invokes itself 'tail recursively' to continue as part of the program and graph.

Except when it receives the endPoint message. In this case another function is invoked: the last function. Message endPoint is sent during the initial phase of a search to the node that is the target of the search. In this case the node does not propagate the search but instead identifies when the search has completed. Essentially, the "endPoint" node/process requires different behavior, so control is transferred to a different function.

So what happens when the search is complete? Does last return, and if so, wouldn't searchNode return - ending the process/node it represents?

Note that while the code is from builder.au, the comments are mine.



Function last

last(Name, LinkDict) ->
    receive
        {link, ToName, ToPid} ->
            % Tail recursion that will be LCO
            last(Name, dict:store(ToName, ToPid, LinkDict));
        {search, Path} ->
            % This is the message of interest for this
            % function. It indicates that this node was
            % found by the search, displays the results and -
            % returns control back to searchNode!
            io:format("Found path: ~w~n", [Path ++ [Name]]),
            searchNode(Name, LinkDict);
        {endPoint} ->
            % Probably just handles the off-chance corner case
            % in which a node is identified as the target of a
            % search while it is currently being searched.
            % This may not work correctly.
            searchNode(Name, LinkDict)
    end.

Interestingly, when last has achieved its purpose, it invokes function searchNode, returning control back to the process/node's primary function.

To my imperative mind this is just wrong

Think about the crazy call stack we are creating that will never unwind, eventually crashing the program. Perhaps this is handled by Last Call Optimization?

Actually, the answer was right at my fingertips, in Concurrent Programming in Erlang Part I (search on LCO).

Essentially LCO handles functions that are mutually recursive as well as the classic 'tail recursion' case.
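
For contrast, here is a sketch of my own showing the same mutually recursive shape in Python, which performs no LCO. Here each hand-off really does consume a stack frame, and the "crazy call stack" really does crash the program:

def search_node(n):
    # Hands control to last() in tail position, mirroring the
    # searchNode -> last hand-off in the Erlang example.
    if n == 0:
        return 'done'
    return last(n - 1)

def last(n):
    # Hands control straight back to search_node(), again in
    # tail position.
    return search_node(n)

# search_node(100000) raises RecursionError in Python: without LCO,
# every mutual hand-off leaves a frame on the stack. Erlang's LCO
# turns each of these calls into a jump, so the stack never grows.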

Some final thoughts.

This example is interesting to me because it shows a situation that requires that a node (or some type of entity) be assigned a special status. In imperative languages, for example, we may choose to set a flag. In Erlang we wish to remain stateless; it is improper and probably very difficult to set a flag. So instead, we request a different behavior, which is exactly what we wanted in the first place.
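
A sketch of the contrast, using hypothetical handlers of my own in Python:

# Flag-based handling (the imperative habit): one handler, one flag.
def handle(msg, is_endpoint):
    if msg == 'search':
        return 'found' if is_endpoint else 'propagate'

# Behavior-based handling (the Erlang style): two handlers, no flag.
# Being the endpoint is not a value we test; it is whichever
# behavior is currently in control.
def ordinary_node(msg):
    if msg == 'search':
        return 'propagate'

def endpoint_node(msg):
    if msg == 'search':
        return 'found'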

Tuesday, January 08, 2008

Thinking in Erlang - Coding in Everything Else

With apologies to Bruce Eckel.

Concurrent Programming in Erlang Part I uses a Binary Tree implementation to demonstrate Erlang's tuple data type.



Erlang

insert(Key, Value, nil) ->
    {Key, Value, nil, nil};
insert(Key, Value, {Key, _, Smaller, Bigger}) ->
    {Key, Value, Smaller, Bigger};
insert(Key, Value, {Key1, V, Smaller, Bigger}) when Key < Key1 ->
    {Key1, V, insert(Key, Value, Smaller), Bigger};
insert(Key, Value, {Key1, V, Smaller, Bigger}) when Key > Key1 ->
    {Key1, V, Smaller, insert(Key, Value, Bigger)}.

I was intrigued by this elegant approach to the Binary Tree.

Erlang's tuple data type collects the elements that make up a node, including the node's key, value, and references to its child nodes.

The first clause of the insert function is the "base case" that actually creates a new node. The second clause handles the case when the key is found and the value is simply replaced. Clauses three and four handle the cases that require traversal down through the tree.

Notice how wonderfully the recursion is encoded. This is much cleaner and more intuitive than the imperative solutions I have worked with.



Python

def insert(key, value, tree):
    if tree is None:
        return (key, value, None, None)
    (key1, v1, smaller, bigger) = tree
    if key == key1:
        return (key1, value, smaller, bigger)
    if key1 > key:
        return (key1, v1, insert(key, value, smaller), bigger)
    return (key1, v1, smaller, insert(key, value, bigger))

I would guess that this elegant Binary Tree implementation is common to functional programming languages in general. Python has some functional language capability and has a tuple data type. I was able to implement the "elegant" Binary Tree using Python.
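
A quick usage sketch of my own for the Python version - each call returns a brand-new tuple-based tree, leaving the old one untouched:

# Build a small tree; insert never mutates, it returns a fresh tree.
tree = None
for k, v in [(5, 'five'), (2, 'two'), (8, 'eight')]:
    tree = insert(k, v, tree)

print(tree)
# (5, 'five', (2, 'two', None, None), (8, 'eight', None, None))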



JavaScript

function insert(key, value, tree) {
    if (tree == null)
        return {key: key, value: value, smaller: null, bigger: null};
    if (tree.key == key) {
        tree.value = value;
        return tree;
    }
    else if (key < tree.key)
        return {key: tree.key, value: tree.value,
                smaller: insert(key, value, tree.smaller),
                bigger: tree.bigger};
    else
        return {key: tree.key, value: tree.value,
                smaller: tree.smaller,
                bigger: insert(key, value, tree.bigger)};
}

I have read that JavaScript is classified as a functional programming language and its object data type looks syntactically similar to Erlang's tuple. I was able to implement the "elegant" Binary Tree using JavaScript. For more fun, I created an interactive Binary Tree web page.