Wednesday, January 28, 2009

Friqing Out with Python Closures

You have to love snow days. After spending an hour shoveling snow this morning, I fired up my lap-top for a guilt-free afternoon of tinkering around with some code. I opened up some Python code that I had coded last year when I was discovering functional programming with Erlang and relating it to languages that I was familiar with.

Python is a great language for exploration. Back in the old days I did a lot of programming using C++ and Microsoft's Component Object Model (COM). C++ COM is a fairly complex technology so I found it preferable to prototype object-oriented designs using Java. The Java language's interface mechanism was a good analog for COM's interface-based programming paradigm. Java was a sexy thing compared to COM and Java was the lingua-franca of object oriented design during this period.

Then I discovered Python and found that it featured a familar approach to object orientation and it looked like it would support rapid prototyping and concept exploration.

Python proved to be a excellent language, but it did not serve well as a prototypical language for C++ applications. In the C++ language, data types are everything. Most of the advanced features of the language, such as templates and classes, provide ways to manage functionality across data types. The C++ compiler is the ultimate master and the programmer is subjected to its rule of type.

Python cannot be used to prototype solutions to C++ problems because the problems do not appear in Python.

Most circa 2000 C++ programmer were ill-equipped to absorb Python. To absorb Python meant to be influenced and changed by Python - to refute idioms previously held sacrosanct.

To close the loop, a next logical step would be to gain perspective by relating Python to functional programming languages and not just to Java and C++.

My Python Data Structures Project

I go through a cycle every now and again that involves Python and data structures. A good data structure to code is the Priority Queue. The binary heap-based Priority Queue features a tree data structure, implemented using an array and clever array index manipulations. A binary min heap is shown below. Essential a binary heap must always be correctly ordered, children must always be greater than its parents (in the case of the binary min heap).

Add Image

The binary heap is stored in a simple array. Algorithms maintain the relations between array elements as shown below. One exception is that it is easier to implement the array-based binary heap if the first element is stored at index 1 in the array.





One reason to code up a Priority Queue is that it is a building-block data structure with multiple uses. The Priority Queue conceptually fits well into an abstract data type (ADT) such as a C++/Java/Python class. In fact there are Priority Queue ADTs in all three languages. My recent investigations into functional programming led me to create a Priority Queue using lexical closures. My original class-based Priority Queue was named priq. Thus the functional-closure based version became friq. And indeed it is a friq.

The friq



The screen shot above shows the definition of friq which is function with a single parameter lt. Parameter lt is a comparison "less-than" function which is easier demonstrated than explained. It will be demonstrated in the next section. Note that friq is a function definition that encloses other function definitions, and a rather strange nested list heef. The functionality exported by friq is done with its return statement (see below).



The functions enqueue and dequeue shown above are the two functions, enclosed by friq, that are exported to the outside world using friq's return statement.




Functions down_heap and up_heap implement the heavy lifting algorithms for the priority queue and are not exported to the outside world.


Finally we have the enclosing function friq returning a tuple containing the functions we wish to export.

Demonstrating the friq

The following code demonstrates usage of the friq. A list d is loaded with some random integers. The friq is invoked returning a tuple containing a function to enqueue values and another for dequeueing. List d is iterated and its values are enqueued. Then the friq is drained by dequeueing.




The usage of friq appears reasonably simple and straighforward. Note friq's lt parameter being passed a Python lambda function. The results are seen below.





Next we have a slightly more complex demonstration in which a tuple containing a number and a string is enqueued. Note that the code that handles dequeueing has to do extra work to expand the tuple. Also note that the lambda function is slightly more complex in order to handle the tuple. The index 0 signifies that the sort will be on the tuple's number.



The next demonstration is identifical to the previous with the exception that the lamda function uses an index of 1 which signifies that the sort will be on the tuple's string.



And finally, a display of the results of both demonstrations that involve the (number, str) tuple.




Here is the complete listing of friq.py


friq.py

#! /usr/local/bin/python

def friq(lt):
"""
function friq that returns lexical closure function objects.
param lt - a 'less than' function or lambda
"""
heef = [[None]]

def enqueue(x):
"""
enqueue - visible function
Add to the priority queue
"""
heap = heef[0]
heap.append(x)
up_heap(heap)

def dequeue():
"""
dequeue - visible function
Get the highest priority elemement
"""
value = None
heap = heef[0]
if lenf() > 0:
# The dequeue value is taken from the 2nd element in the array.
# The first element is not used to ease index arithmetic.
value = heap[1]
if lenf() > 1:
# The last element value is copied to the first position.
heap[1] = heap[-1]
# The last value is deleted to reduce the size of the pq by one.
del heap[-1]
if lenf() > 1:
# The val at heap[1] is moved down heap to maintain order.
down_heap(heap)
return value

def heap():
"""
heap - non-visible function
Returns the underlying array-based binary heap.
Can be made visible for debug.
"""
return heef[0]

def lenf():
"""
lenf - non-visible function
Returns the size of the underlying array-based binary heap.
Can be made visible for debug.
"""
return len(heef[0]) - 1

def down_heap(heap):
"""
down-heap - non-visible function
The first element is moved down heap as required
to satisfy heap order.
"""
# px is an index to a bi heap parent
# cx is an index to a bi heap child
px = 1
v = heap[px]
while px <= (len(heap)-1)//2:
# calc the index of the child
cx = px + px
# find the index of the min of the two children
# (if there are two).
if cx < len(heap) - 1 and \
lt(heap[cx + 1], heap[cx]):
cx = cx + 1
# make the comparison - if v is higher pri - we are done.
if lt(v, heap[cx]):
break;
else:
# move the childs value to the parent
heap[px] = heap[cx]
# make the parent index that of the child.
px = cx
# finally the v is set to the current parent
heap[px] = v

def up_heap(heap):
"""
up-heap - non-visible function
The last element is moved up heap to satisfy the heap order.
"""
# cx//2 idenifies cx's parent in the bi heap.
cx = len(heap) - 1
v = heap[cx]
while cx//2 > 0 and lt(v, heap[cx//2]):
heap[cx] = heap[cx//2]
cx = cx//2
heap[cx] = v

# the enclosing function returning functions that are to be visible
return (enqueue, dequeue)

Saturday, June 07, 2008

Hell is Other People's Code

My current assignment has me analyzing code for an embedded system that is 15 years old. Not only is the system and software 15 years old, but the software has been continously under development for 15 years.

There is a lot of money invested in the development of this software.

After 15 years of blood, sweat, and tears, the embedded system now needs modern hardware. But what to do about the very expensive software? Start over or devise some way to migrate the code base so that it may execute on the new hardware?

The idea of throwing away 15 years of software development is hard to swallow - so enters myself - Pete who likes to Think [Way Too Much].

But wait, there is more.
1. The software is very brittle and cannot accomidate modification.
2. Even basic maintenance activites are becoming a risky proposition.
3. The software is monolithic with no obvious means of splitting functionality into subsystems.

A secondary goal of the system modernization is to partition the software to increase robustness, maintainability and modifiability.

So I begin to look at the code. And to be honest, I was humbled by its cleanliness. The code was well commented, and consistent naming conventions and styles were applied throughout. It was the code of experienced software developers.

So let us take a step back. The tool I am employing in my analysis is to judge the competence of the engineers that created this troubled software. I think it is fair to carefully judge software developer competence as an analysis tool - but often there is more to the story than what meets the analyst's eye. For example most software systems are developed under the duress of schedule constraints - "trying to pack ten pounds of crap into a five pound bag". But I digress - back to my story.

The "coding in the small" looks good but I had already suspected that the problem existed in the design, or more specifically, how the software modules interact with one another. Let's put the software designer under the microscope.

Now the problem became very evident, many software modules where highly dependent on many other software modules. To be specific, it was easy to find software modules that were dependent on 50 other modules. It was impossible to create a informative diagram showing the dependencies in any type of meaningfull way. I also searched for a more specific way to classify or present this problem to management but this also escaped me. This software simply was a rats nest of overly coupled modules that would continue to challenge our efforts to meet our goals.

So now I have indicted the software designer, hopefully he has made a clean get a way after 15 years. But now I need a motive - how could a software engineer let this happen? Lets travel back in time to 15 years ago and see what was going on. I remember those days and I remember that object-oriented techniques where still a little esoteric to the average developer. Many developers failed to make the leap to OO and many projects suffered through initial forays into OO. The main problem with OO in those days was in the development of frameworks and architectures. OO gave us the ability to define code modules that mapped to objects in our domain but the ability to create the executive sections of our software were often not understood and under-designed.

None the less, our indicted software designer still should have used the most basic and powerful software design technique, the idea of layering. But no, under analysis, the software has no structure.

The trial of the software designer continues. In defense of the designer, the software has been segmented into a multitude of modules that model the domain - but somehow this act of decomposition has created a mess. Let me introduce exhibit A, the programming language.

The system was programmed in the original version of Ada, the programming language developed for the U.S. Department of Defense in the late 1970's. Ada has been upgraded several times with the most notable upgrade coming in 1995 and termed Ada95, the original Ada is usually called Ada83.

Ada was specifically created to solve the so-called software crisis that plagued the development of large software systems. Yet here was a large software system in crisis, developed using Ada.

So now, I will let the software designer cool his heels in the clink while I turn my judgement to this rather unpopular language. Now I must confess that I am not an "Ada man". I have worked on Ada projects for short terms but I have never become enamored with the language, although I know a few who have.

I looked up a few of these (with all due respect) "grey haired" colleagues to borrow some manuals on Ada83, not Ada95 because I wanted to know what Ada developers were thinking back during the early days of the system's development. My Ada-savvy co-workers were eager to supply me with the manuals and to share a few thoughts with me.

Now Ada83 is not object-oriented but instead is object-based. Actually Ada83 is a procedural programming language similar to C. One main difference is that Ada includes the concept of a "package". The Ada package abstracts the "file module" and has some very powerful and useful capabilities. It is important to note that the Ada package is not a type like a Java or C++ class is a type. A package is used to define types, functions and variables - much like a class, but with the distinction that only one instance of a package exists in the program. A variable declared within a package is equivalent to a class-wide variable in Java or C++ - i.e a static variable.

My object-oriented mind now saw the embedded software system as ~500 singleton software classes - with public variables - mostly interconnected to one another. Could it be that the system's engineers did not really understand the object-based, procedural nature of the Ada language and used the Ada package as some type of brain-dead object-oriented class instead of a very powerful file module abstraction?

So let's be clear, in the object-based Ada language, the object is data defined, instantiated and processed by functions in the package, and not the package itself.

Interestingly, it all goes back to some of my previous musing of how State Stinks! Certainly the most stinkiest form of state would be static, globally accessible state. I have to hold my nose!

I could go on but I have said all that I need to say about this matter. And I release the incarcerated designer from his cell, we all have skeletons in our closet and besides my job is not to find out what and why but how. How to fix this software and move it into the future. So far I am the one who is falling short of his goals.

But such is life in the hell of other people's code.

Friday, February 22, 2008

Bottom-Up Software Design is for Sheep

Considering the devisive issues of our day: Vim vs Emacs, Ruby vs Python, .Net vs Java, Mac vs PC, and REST vs SOAP, there are plenty of opportunities to spark a holy war. Lets consider another polarizing issue:

Is it better to design and develop software from the top-down or from the bottom-up?
I am fortunate to have the opportunity to work across a wide variety projects and I am always curious to discover how my newly met colleagues are inclined regarding design. This evening I came across a compelling blog, by Gustavo Duarte, that took a stance opposed to my own. Here is my take on the issue.

Design by decomposition is the quintessential technique of the engineer

One of Gustavo's main points is that Feynman believed that software engineering has much in common with the other engineering disciplines. Considering this point, the most important concept in engineering is modularization. Most products and devices are made up of replaceable parts. Modularization is achieved by decomposing a system into sub-systems. System decomposition is top-down design.

Certainly we can agree that some of the first decisions we might make include the selection of a platform (Linux, Windows), technology stack (LAMP, Java, .NET), or language/framework (Ruby on Rails, Django).

Our choices will determine the composition of our system, how functionality is divided among system components, and usage of key patterns (e.g Model-View-Controller).

Do we even realize that we are doing top-down design?

Perhaps the availability of powerful platforms and frameworks is removing the importance of the top-down design perspective from our collective conscience.
None-the-less, software system design by decomposition into sub-systems implies a top-down approach to design.

The Space Shuttle Challenger was not a victim of top-down design.

Gustavo's blog walks us through some evidence that suggests that Dr. Feynman believed that top-down design doomed Challenger. The Challenger Disaster illustrates that a likely point of system failure is at sub-system interfaces. I don't see this as an indictment against top-down design.

For a analysis of the mulitude of events that led to the Challenger disaster, I recommend the book NO DOWNLINK. Essentially, the Challenger and her crew where the victims of pork-barrel politics, post-space race budget cuts, and the Challenger Syndrome.

Poor design choices such as the use of Solid Rocket Boosters (SRBs) and then the segmenting of the SRBs to aid in transportability where driven more by budgetary and political concerns then by faulty engineering practices.

Defending up-front software design

Another point made by Gustavo is that "Big up-front design is foolish". The word "Big" implies a non-incremental approach. Up-front design is not a foolish idea. However developing software in a non-incremental fashion (i.e. the waterfall method) is a bad idea.

Top-down software design leads to effectivly modularized code that fosters many good things such as reuse, testability, and maintainability.
Gustavo drags UML into this discussion and I agree with some of his points. The idea of designers churning out UML blue-prints to throw "over the wall" to "implementors" has no appeal to me and in fact, simple does not work.

UML is not implicitly evil and the use of UML does not have to imply some type of bureaucratic software factory where coder slaves toil away, mindlessly "implementing" designs.

UML is just a tool that lets you work on your system at a high level. UML is just a white board.

Using UML to create diagrams (which are essentially pictures) to foster communication is a beautiful thing.
To summarize, UML should be used to open a new communication channel, never to replace face-to-face discussion.

Risk

The main problem with up-front design from a developer's perspective is that it doesn't address many of the underlying risks to the success of the project. The term "risk" is a nice way to articulate that feeling you get in your stomach when you know you are doomed.

The best way to handle "risk" is to start working on the risky parts of the system as soon as possible, so that you have plenty of time to handle all the foo bar.

Risk is what leads developers to favor the bottom-up approach. This is because the risky sections of a system exist deep within the system and at sub-system interfaces such as APIs.
Up-front design may add risk because some design decisions are based on assumptions that can only be validated with working code.

Managing Risk

All forms of engineering require lab testing, research and development and prototyping to determine how elements of a system will actually perform and software is no different.

Risk-reduction prototyping is a technique used to mitigate risks, validate design assumptions, and to develop techniques that will be used during the production of a software system.


Conclusion

As software developers, we are also software users. We use frameworks and tool-sets that allow us to focus on the specific application that we are creating. If we don't maintain the ability to design software from the top down, we function more as users and less as developers. We become in effect, sheep walking the well-worn path instead of being being the pioneering trail-blazers we had hoped to be.

I believe that a pragmatic combination of both top-down (up-front design) and bottom-up (risk-reduction prototyping) techniques will greatly increase a software project's chances for success.

Dr. Feynman believed that software engineer has much in common with the other engineering disciplines. This is certaintly true, but 'how much' is less certain. Thinking of software engineering as just another engineering discipline comes with its own pitfalls.

What is your take on this topic?


Thursday, January 10, 2008

Erlang and Tail Recursion Part II - Still Living, Still Dreaming

In a previous post, I described how I experienced tail recursion and tail call optimization first hand. I have learned to accept that the Erlang compiler can detect when recursion can be replaced with iteration, preventing stack growth and eventual overflow (which means "hard crash"). The idea is that developers can stay within the language's functional "state-less" paradigm and let the compiler do the messy work.

I spent some time dissecting a very interesting Erlang example that I found on the web at builder.au. There are a lot of interesting nuggets in this example, requiring a little time with the Erlang reference manuals.

However there was one, seemingly innocuous passage of code, that made me realize that I had a few more lessons to learn regarding Tail (or more commonly, Last) Call Optimization (LCO).

Note that while the code is from builder.au, the comments are mine.



Function searchNode

searchNode(Name, LinkDict) ->
receive
{link, ToName, ToPid} ->
% Tail recursion that will be Last Call Optimized
searchNode(Name, dict:store(ToName, ToPid, LinkDict));
{search, Path} ->
Free = notIn(Name, Path),
if
Free ->
lists:foreach(
fun(X)->
{_, Pid} = X,
Pid ! {search, Path++[Name]}
end,
dict:to_list(LinkDict));
true -> true
end,
% Tail recursion that will be Last Call Optimized
searchNode(Name, LinkDict);
{endPoint} ->
% The {endPoint} message identifies that the node
% 'embodied' by thisfunction is the 'target' of
% the search, requiring different message handling,
% and therefore a separate function 'last'.
% But this is not tail recursion?
last(Name, LinkDict)
end.


Function searchNode is the function that both represents a node in a graph and an Erlang process. Once this function is initially invoked, it awaits a message from another process. Upon receiving and handling a message, it invokes itself 'tail recursively' to continue as part of the program and graph.

Except when it receives the endPoint message. In this case another function is invoked, the last function. Message endPoint is sent during the initial phase of a search, to the node that is the target of the search. In this case the node does not propagate the search but identifies when the search has completed. Essentially the "endPoint" node/process requires different behavior, therefore control is transferred to a different function.

So what happens when the search is complete? Does last return and if so, wouldn't searchNode return - ending the process/node it represents?

Note that while the code is from builder.au, the comments are mine.



Function last

last(Name, LinkDict) ->
receive
{link, ToName, ToPid} ->
% Tail recursion that will be LCO
last(Name, dict:store(ToName, ToPid, LinkDict));
{search, Path} ->
% This is the message of interest for this
% function. This indicates that this node was
% found by the search, displays the results and -
% returns control back to searchNode!
io:format("Found path: ~w~n", [Path++[Name]]),
searchNode(Name, LinkDict);
{endPoint} ->
% Probably just handles the off-chance corner-case
% in which a node is identified as the target for a
% search while in the process of being currently
% searched. This may not work correctly.
searchNode(Name, LinkDict)
end.

Interestingly, when last has achieved its purpose it invokes function searchNode, returning control back to the process/node's primary function.

To my imperative mind this is just wrong

Think about the crazy call stack we are creating that will never unwind, eventually crashing the program. Perhaps this is handled by Last Call Optimization?

Actually the answer was right at my finger-tips, in Concurrent Programming in Erlang Part I (search on LCO).

Essentially LCO handles functions that are mutually recursive as well as the classic 'tail recursion' case.

Some final thoughts.

This example is interesting to me because it shows a situation that requires a node (or some type of entity) be assigned a special status. For example, in imperative languages we may choose to set a flag. In Erlang we wish to remain state-less, it is improper and probably very difficult to set a flag. So instead, we request a different behaviour, which is exactly what we wanted in the first place.

Tuesday, January 08, 2008

Thinking in Erlang - Coding in Everything Else

With apologies to Bruce Eckel.

Concurrent Programming in Erlang Part I uses a Binary Tree implementation to demonstrate Erlang's tuple data type.



Erlang

insert(Key, Value, nil) ->
{Key, Value, nil, nil};
insert(Key, Value, {Key,_,Smaller, Bigger}) ->
{Key, Value, Smaller, Bigger};
insert(Key, Value, {Key1,V, Smaller, Bigger}) when Key < Key1 ->
{Key1, V, insert(Key, Value, Smaller), Bigger};
insert(Key, Value, {Key1, V, Smaller, Bigger}) when Key > Key1 ->
{Key1, V, Smaller, insert(Key, Value, Bigger)}.

I was intrigued by this elegant approach to the Binary Tree.

Erlang's tuple data type collects the elements that make up a node including the node's key, value, and the references to its children nodes.

The first clause of the insert function is the "base case" that actually creates a new node. The second clause handles the case when the key is found and the value is simply replaced. Clauses three and four handle the cases that require traversal down through the tree.

Notice how the recursion is so wonderfully encoded. This is much cleaner and intuitive than the imperative solutions that I have worked with.



Python

def insert(key, value, tree):
if tree == None:
return (key, value, None, None)
(key1, v1, smaller, bigger) = tree
if key == key1:
return (key1, value, smaller, bigger)
if key1 > key:
return (key1, v1, insert(key, value, smaller), bigger)
return (key1, v1, smaller, insert(key, value, bigger))

I would guess that this elegant Binary Tree implementation is common to functional programming languages in general. Python has some functional language capability and has a tuple data type. I was able to implement the "elegant" Binary Tree using Python.



JavaScript

function insert(key, value, tree) {
if(tree == null)
return {key:key, value:value, smaller:null, bigger:null};
if(tree.key == key){
tree.value = value;
return tree;
}
else if(key < tree.key)
return {key:tree.key, value:tree.value,
smaller:insert(key, value, tree.smaller),
bigger:tree.bigger};
else
return {key:tree.key, value:tree.value,
smaller:tree.smaller,
bigger:insert(key, value, tree.bigger)};
}

I have read that JavaScript is classified as a functional programming language and its object data type looks syntactically similar to Erlang's tuple. I was able to implement the "elegant" Binary Tree using JavaScript. For more fun, I created an interactive Binary Tree web page.

Wednesday, December 26, 2007

Erlang and the Towers of Hanoi

My brilliant, 15 year-old son is teaching himself how to program Ruby using Chris Pine's wonderful book, Learn to Program. The book devotes an entire chapter to recursion and I spent an enjoyable evening with my son as we dissected an example program. Pine's example is reminiscent of the Towers of Hanoi, a puzzle that my son has been able to solve since he was 7. So I decided to code up this classic, first in Python and then in Erlang.

Towers of Hanoi - Erlang Style



My son could not believe that this function would solve the puzzle! I used a pencil and graph paper to illustrate an example with 4 plates and was once again amazed by the sheer beauty of this algorithm. One can only marvel as too how someone was able to discover this elegant solution.
It was my son who pointed out the "depth first" nature of the algorithm. The program recurses to a depth determined by the number of plates and then retreats back to the top of the stack of function calls, moving plates as it goes. Each retreat up the stack results in another incursion down to the depth determined by the number of plates. This continues until the original call returns and the program completes.

Essentially the process of recursion lays out a data structure of sorts that establishes a sequence of plate moves.

Notice the line in the code above: managerPid ! {self(), A,C}. This line of code is used to move a plate. The movement of the plate is handled by a separate process that represents a device or maybe even a person responsible for moving the plates.


Function manage_towers, in the code illustration above, executes within the second process and writes the tower movements to the console.

I had some philosophical concerns with the multi-process approach to the Towers of Hanoi. Recursion and distributed computing are two distinct and orthogonal solutions that are based on the same idea, the divide and conquer algorithm. However the solution does have some pragmatic appeal to me and it all begins with the immutability of Erlang's variables.
The thought is that the Towers of Hanoi program cannot actually move the plates, the plates would have to be moved by some entity apart from the program. This allows philosophical head room for both immutable variables and asynchronous calls to a process that records the plate movements.

As I watched the console display the plate movements, my mind reeled as I thought of the program descending and ascending through the call stack, briefly pausing to issue asynchronous messages of instruction to its sister process.



The Towers of Hanoi program is initiated by the code illustration shown above.

Notice that when the call to process_towers returns, the tower manager process (managerPid) is sent a message that indicates that the program has finished.

managerPid ! {self(), finished}

The main process (thread) of the program then blocks at the receive statement, awaiting acknowledgment from the tower manager process. Essentially, the main process is deferring termination until the tower manager process completes.

As you can imagine, the process_towers function computes the solution far quicker than the console can update its display. At some point, the main process waits until the tower manager handles its backlog of messages, in the order in which they arrived.

This is accomplished using Erlang's Mail Box, which probably is implemented as a thread-safe queue.

Many of the popular languages provide thread-safe queue classes but Erlang's Mail Box is built-in, readily available and seamless.

But does it work robustly? I can say yes because of my carelessness.

I tested the program with 10 plates and then decided to really test it with 20 plates. I sat there for a second before I realized what I had done. I believe Towers has an exponential growth rate, therefore 10 plates required 1000 plate movements and 20 plates requires a million plate movements.

I let the program run, all day and into the evening. My son saw the program running and asked if I had mistakenly created an infinite loop.

I ran the concept by him, "If we have 4 plates, it takes 16 moves, 5 plates takes 32 moves and 20 plates takes ...". "One million moves", he immediately replied. "But a million is not a big number for a computer", he said.

We could roughly count the number of plate moves being written to the console and determined that approximately 7 moves where displayed per second.
Therefore it would take about two days for the program to run to completion.

I went to sleep that night, thinking about the many thousands of messages queuing-up in the Tower Manager's Mail Box.

I awoke the next morning to find the program dutifully displaying plate movements. I shut the program down - satisfied that Erlang's messaging system met its challenge.

Saturday, December 15, 2007

Erlang, Scalability - The Next Wave


I had a great time watching the TV series Surface on the SciFi Channel, with my son who is 10. In the final episode, Marine Biologist Dr. Laura Daugherty warns us that Tsunamis waves come in sets.

The first wave of the Erlang Tsunami was realizing Erlang's ability to harness the potential of multi-core CPU's to perform computations, by using concurrently executing threads and processes.

I had not recovered from the first wave when I was hit by the next wave in the set. The second wave is Erlang's ability to scale. Usually, scalability is used as characteristic of an architecture, system or technology. I don't recall if I have seen scalability used to characterize a programming language. Certainly, most programming languages provide features that can be used to develop scalable systems - but what would it take to make the language inherently scalable?

loop(State) ->
receive
{call, From, Request} ->
{Result, State2} = handle_call(Request, State),
From ! Result,
loop(State2);
end.

Function loop(State) waits at the receive statement for a message that matches tuple {call, From, Request}. When a matching message is received. it is handled by the statement that calls the aptly named function, handle_call. From is messaged with the Result returned by function handle_call.

So, the received message includes information that identifies the operation to perform, the return address, and perhaps some parameters. This operation is "stateless" because the state required to process the request is provided with the request and is not maintained by loop. This is how REST works.

A key-word used in the last couple of paragraphs is "message". Note that the only coupling between sender and receiver in this message passing approach is the "shape" of the data passed (e.g. the tuple {call, From, Request}). This results in a very dynamic, and decoupled approach.

Erlang is highly scalable because its stateless, message-passing approach is suitable for all levels of distributed computing. It matters not if the distribution is among threads in one process or among hosts arrayed around the world. Erlang programmers use the same syntax, patterns and idioms, regardless of the level of distribution.

This is true scalability.

The Tsunami metaphor might have seemed like an over-statement, hopefully I have proved otherwise, but let us consider one final point. The Erlang programmer's skill scales. If a programmer can develop programs using concurrency and Erlang, it matters little if the program runs stand-alone on one box, or is widely distributed across the internet.
Do you feel the wave?