Last modified on 06/25/15 16:17:17


A system with well-defined interfaces and objectives is fairly reliable on a single host. Distribute that same system over multiple hosts and it becomes inherently unreliable. Why? You do not have control over the intermediary "thingies" between those systems. You may think you do, but you don't. At any given time, something will fail. This is a given: accept it and plan for it.

With that being said, this environment tries to be reliable.


Reliability starts from the ground up. You should have a consistent call interface within your modules. You should have well-defined interfaces between your modules. When your modules are combined into procedures, they should have consistent exception handling. Procedures should accept consistent command line options and return known exit codes when they finish running. When procedures communicate with each other, they should all use the same standard protocols. There should be no surprises. When you start doing this, you start to have a reliable system.
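As a minimal sketch of what "consistent command line options" and "known exit codes" could look like, here is a small Python wrapper that every procedure might share. The option names (--debug, --cfgfile), the exit code values, and the run_procedure helper are all assumptions made for illustration. They are not taken from this environment.

```python
import argparse
import sys

# Hypothetical exit codes that every procedure would agree on.
EXIT_OK = 0
EXIT_FAILURE = 1
EXIT_USAGE = 2


def build_parser(prog):
    """Return a parser carrying the options every procedure accepts."""
    parser = argparse.ArgumentParser(prog=prog)
    parser.add_argument("--debug", action="store_true", help="verbose output")
    parser.add_argument("--cfgfile", default=None, help="configuration file")
    return parser


def run_procedure(work, argv, prog="procedure"):
    """Parse the common options, run work(options), return a known exit code."""
    parser = build_parser(prog)
    try:
        options = parser.parse_args(argv)
    except SystemExit as exc:
        # argparse exits with 0 after --help and with 2 on a usage error.
        return EXIT_OK if exc.code == 0 else EXIT_USAGE

    try:
        work(options)
    except Exception as exc:
        # One place where every unexpected error is reported the same way.
        print(f"{prog}: {exc}", file=sys.stderr)
        return EXIT_FAILURE
    return EXIT_OK
```

A procedure would then end with something like sys.exit(run_procedure(main, sys.argv[1:])), so that callers only ever see one of the three agreed codes.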

And you can do this in any programming language. There really is nothing in any particular language that makes it more "reliable". Sure, some have built-in capabilities for this, but it all comes down to a disciplined programmer. A disciplined programmer can write good, reliable software in any language; an undisciplined one can't. It is as simple as that.


This is a loosely coupled environment. There is no direct one-to-one communication between procedures. A message queue is used as an intermediary between them. This is done for a reason: it makes the endpoints simpler. They don't have to maintain an internal queue of messages with all the management overhead that entails. That work can be pushed off to a dedicated process, and that process can exist anywhere within the environment. Here is a diagram of how this environment works.

                        (message queue server)
                             /         \
                            /           \
         +----+            /             \               +----+
         |    |           /               \              |    |
         |    |-->[spooler]                [collector]-->|    |
         |    |                                          |    |
         +----+                                          +----+
   spool directories                                    datastore

The spooler is standalone. It knows its local environment and how to communicate with the message queue. When it sends a message, all it knows is that the message reached its destination. It is responsible for maintaining the local datastore.

The collector is also standalone. It knows its local environment and how to communicate with the message queue. When it receives a message, it does something with it.

This is known as "store and forward" messaging. It is a reliable, tried-and-true way to send something across a network.
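The flow in the diagram can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the message queue server is replaced by an in-memory object, the spool directory and the datastore are plain lists, and the class and method names are invented here. A real deployment would talk to an actual message queue server over the network.

```python
from collections import deque


class MessageQueue:
    """In-memory stand-in for the message queue server."""

    def __init__(self):
        self._pending = deque()

    def send(self, message):
        self._pending.append(message)
        return True  # acknowledgement: the queue accepted the message

    def receive(self):
        return self._pending.popleft() if self._pending else None


class Spooler:
    """Forwards spooled items to the queue; knows nothing about the collector."""

    def __init__(self, queue, spool):
        self.queue = queue
        self.spool = spool  # a list standing in for a spool directory

    def run_once(self):
        sent = 0
        while self.spool:
            if not self.queue.send(self.spool[0]):
                break  # not acknowledged, keep the item spooled
            self.spool.pop(0)  # remove only after the queue accepted it
            sent += 1
        return sent


class Collector:
    """Takes messages off the queue and stores them; knows nothing about the spooler."""

    def __init__(self, queue, datastore):
        self.queue = queue
        self.datastore = datastore  # a list standing in for the datastore

    def run_once(self):
        stored = 0
        while (message := self.queue.receive()) is not None:
            self.datastore.append(message)
            stored += 1
        return stored
```

Neither endpoint holds a reference to the other. Each one only knows its local storage and the queue, which is the loose coupling described above.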

Self Healing

In this environment, if a "thingie" falls off the network, the messages queue up, ready to be delivered when the "thingie" comes back online. When the message queue server is configured to store messages in a backing store, you will not lose messages.

This is what makes the environment reliable.
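The self-healing behavior described above can be sketched with a queue that keeps its messages in a backing store. The example below is a hedged illustration, not how any particular message queue server works: it uses SQLite as the backing store, and the DurableQueue class, its method names, and the use of ConnectionError to signal an offline consumer are all assumptions.

```python
import sqlite3


class DurableQueue:
    """Queue whose pending messages live in a SQLite backing store."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
        )
        self.db.commit()

    def send(self, body):
        with self.db:  # commits on success, rolls back on error
            self.db.execute("INSERT INTO messages (body) VALUES (?)", (body,))

    def deliver(self, handler):
        """Hand pending messages to handler in order.

        A message is deleted only after handler returns. If handler raises
        ConnectionError (consumer offline), delivery stops and the
        remaining messages stay queued.
        """
        delivered = 0
        rows = self.db.execute(
            "SELECT id, body FROM messages ORDER BY id"
        ).fetchall()
        for msg_id, body in rows:
            try:
                handler(body)
            except ConnectionError:
                break
            with self.db:
                self.db.execute("DELETE FROM messages WHERE id = ?", (msg_id,))
            delivered += 1
        return delivered

    def pending(self):
        return self.db.execute("SELECT COUNT(*) FROM messages").fetchone()[0]

    def close(self):
        self.db.close()
```

When the path points at a file on disk, messages that were queued while the consumer was offline are still there after the queue process restarts, and the next call to deliver picks up where it left off.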