I received many responses to my blog post about the ten basic rules of enterprise systems design. I have been asked repeatedly: “How can you separate systems into slices?”, “How do you make a good pull system?”, and most importantly, “How do you take the load off IIS and distribute it across multiple servers without the need of a web farm and load balancing?”
I cannot disclose all of my secrets in this blog, and in the real world each system is different. There is no universal solution to the questions outlined above; however, I have decided to release one of the frameworks I use to help me distribute workload across enterprise systems. This is the fifth edition of this framework. It was originally written to help me calculate taxes in a payroll system and evolved into a distributed processing engine. A container you simply “turn on” and will be there for you day in and day out processing and running your business rules. From media encoding servers to report preparations, mailing, printing, archiving and file moves this container is able to do it all. It provides an in-process workflow engine and contains some behavioral attributes as well as a scalable model allowing you to simply turn more VMs on when your system becomes busy, and power them down when workload is light. The framework’s scalability will allow it to continue working as long as there is an OS running somewhere in your data center. As a result this system might very well be the most reliable part of your infrastructure. It currently powers some of the biggest online businesses including some brand names you know and is truly battle tested.
The heart of the framework is a dependency injection based workflow engine. The engine uses a workflow XML configuration file in order to decide what set of processing steps to execute on a request. The system is logically separated into producers and consumers. The producers produce a work request. They signal the framework that they need something done sometime in the future. They request work to be done and they don’t sit around waiting for it. They simply continue executing code. For example: a producer can request a sales report to be calculated and emailed to the sales manager. The report might be generated on another process, another computer or even in another data center. The producer is unaware of when and where the report is generated, it simply knows a report will be emailed and continues on with its tasks. A consumer constantly listens to work requests. The request can come in from different channels. If parts of the systems are down, the consumer simply stops listening to work requests from it until it is up again. When a work request comes in, a consumer inspects the type of work to be done. It then reads the workflow for this type of request and executes the work steps sequentially or in parallel. It can notify the system when it is done or when an error occurs and a producer can choose to wait for a result and act on it. The open release does not include the real time work report or the producer wait system and if you need them you will have to write the code in your own app. It’s easy and takes less than ten lines of code.
How does the Inverted Software Workflow Engine work?
The Framework uses Microsoft Message Queue Server. The server is included in the Windows operating system and can be installed using the “Turn windows features on or off” dialog.
Microsoft Message Queue is an easy and reliable way to send messages cross processes and cross-forest. Messages are simply kept as files on the disk until they are removed by the server. This ensures no data ever gets lost in memory and if there is a server failure, everything can be recovered.
The entire process can run on three or less logical server groups. All parts can even work on the same server if you are willing do away with the scalability aspects of the system.
The groups are:
The producer group: A group of computers that belongs to the end users. They are used to send work requests to the queue servers.
The queue servers group: This group receives work requests and holds them until a consumer picks a request.
The consumers group: Run the workflow container and pick up work requests from the queue servers. Since the container is constantly “on” it is best to use a windows service to host the framework.
We can also use a single queue to handle work requests; however this creates a single point of failure in your system. If the queue server is down the entire system is down. This does not matter if the queue resides on the same physical box as the service, but, for a truly distributed system you need to use separate machines.
To insure a timely processing of messages, the order of message delivery is reverse to the order of pickup.
Installation and Configuration
The framework uses an XML workflow file in order to specify the steps to run when a work request is received. Each work request contains data and the name of the job requested on the message. A typical workflow looks like this:
<Job Name="ExampleJob" MessageClass="ExampleMessage" NotifyComplete="true" MaxRunTimeMilliseconds="36000000">
<Queue MessageQueue=".\Private$\WorkflowEngine.ExampleJob" ErrorQueue=".\Private$\WorkflowEngine.ExampleJobError" PoisonQueue=".\Private$\WorkflowEngine.ExampleJobPoison" CompletedQueue=".\Private$\WorkflowEngine.ExampleJobComplete" MessageQueueType="Transactional"></Queue>
<Queue MessageQueue=".\Private$\WorkflowEngine.ExampleJobBackup" ErrorQueue=".\Private$\WorkflowEngine.ExampleJobErrorBackup" PoisonQueue=".\Private$\WorkflowEngine.ExampleJobPoisonBackup" CompletedQueue=".\Private$\WorkflowEngine.ExampleJobCompleteBackup" MessageQueueType="Transactional"></Queue>
<Step Name="CopyFiles" Group="group1" InvokeClass="InvertedSoftware.WorkflowEngine.Steps.CopyFiles" OnError="Exit" RetryTimes="3" WaitBetweenRetriesMilliseconds="5000" RunMode="STA" DependsOn="" DependsOnGroup="" WaitForDependsOnMilliseconds="" RunAsDomain="" RunAsUser="" RunAsPassword=""></Step>
<Step Name="RenameFiles" Group="group1" InvokeClass="InvertedSoftware.WorkflowEngine.Steps.RenameFiles" OnError="Exit" RetryTimes="3" WaitBetweenRetriesMilliseconds="5000" RunMode="STA" DependsOn="CopyFiles" DependsOnGroup="" WaitForDependsOnMilliseconds="" RunAsDomain="" RunAsUser="" RunAsPassword=""></Step>
Workflow file specifications:
Workflow: Contains a collection of jobs. Each job can be executed by an instance of the framework.
Job: A defined set of ordered operations to be executed with a defined set of steps and behavioral attributes.
Name – The friendly name of the Job. This is used to start the framework thread.
MessageClass – The class name to serialize the request message. The framework will send and pick up messages from the queue using the specified class name.
NotifyComplete – If true then it will send a message to the CompletedQueue on successful completion of a job.
MaxRunTimeMilliseconds – The maximum time the framework will allow a job to run. If a job has reached the maximum time the framework will call Thread.Abort() on the job thread.
Optionals: You can bypass the collection of Queue tags and specify only one set of MessageQueue, ErrorQueue, PoisonQueue and CompletedQueue. This will improve performance between job startups but will compromise scalability since there will only be one set of Queues.
Queues: A tag containing a collection of Queue tags to work with. The order of the Queue tags is important. When sending messages the framework will select the first online queue. When picking up messages, the framework will start at the last online queue and try to pick up any messages from it. If it is empty the framework will continue up the list of queues until it can find a message to process. If all the queues are empty the framework will start listening for incoming messages in the first queue.
Queue: Contains MessageQueue, ErrorQueue, PoisonQueue , CompletedQueue and MessageQueueType. MessageQueueType indicates if the MessageQueue is Transactional or NonTransactional. The rest of the queues will always be Transactional.
Steps: Contains a collection of Step tags to execute each time a message is being processed.
Name – The friendly name of the step, usually the class name.
Group – The logical group which this step belongs.
InvokeClass – The class to invoke by the framework.
OnError – The expected behavior in case of an error of RetryJob, Skip, RetryStep or Exit
RetryTimes – The number of times to retry the step or job in case of an error.
WaitBetweenRetriesMilliseconds – The milliseconds to wait in case the framework retries the step or job.
RunMode – The run mode of the step, either: STA or MTA. MTA mode allows for parallel execution of steps, however if your last step is an MTA, the framework might indicate complete before the last step finished running as the job itself was completed.
DependsOn – Steps that are required to complete before execution of this step. The steps to wait for are listed as a comma separated list of step names. If the step(s) this step depends on did not successfully finish, the current step will not run.
DependsOnGroup – You can specify the group or groups this step depends on. The framework will only invoke the step if all groups completed successfully.
WaitForDependsOnMilliseconds – The time to wait for DependsOn or DependsOnGroup before ignoring them.
RunAsDomain – Impersonation domain.
RunAsUser – Impersonation user.
RunAsPassword – Impersonation password.
Additional configuration values:
FrameworkMaxThreads – The maximum number of jobs an instance of the framework will process in parallel.
FrameworkConfigLocation – The location of the workflow file.
UsePipelinedOnMulticore – Indicate whether to use pipelined execution on multi core servers (currently in Beta).
Installing the queues:
ErrorQueue, PoisonQueue and CompletedQueue must be defined as transactional queues. MessageQueue can be NonTransactional. Please make sure the queues are defined private and that they have the correct security permissions to allow both producers and consumers access rights.
Calling the framework:
In order for a producer to request a job execution it needs to call the FrameworkManager in code. A call looks like this:
ExampleMessage message = new ExampleMessage()
CopyFilesFrom = @"C:\FrameworkTest\Source\",
CopyFilesTo = @"C:\FrameworkTest\Destination\"
A framework consumer must be started with a specific job name. Since the framework thread will block as long as the framework is running it is a good idea to start the framework in its own thread. The consumer’s code looks like this:
catch (Exception ex)
MessageBox.Show("Framework Error." + ex.Message);
You can add as many steps as you like. Please see the example steps I have added to the steps folder. I always try and keep all the steps in the step folder as good practice. You can think about the steps as residing at the top of your business layer. They have access to the rest of the business layer and to the data layer (if you are doing layer cross cutting). Each step needs to implement the IStep interface and define a RunStep method accepting an IWorkflowMessage. RunStep will be the first method to be invoked in the step by the framework.
internal class CopyFiles : IStep
/// The method executed by the framework
/// <param name="message"></param>
public void RunStep(IWorkflowMessage message)
ExampleMessage myMessage = message as ExampleMessage;
if (myMessage == null)
throw new WorkflowStepException("IWorkflowMessage is of the wrong type");
Parallel.ForEach<string>(Directory.EnumerateFiles(myMessage.CopyFilesFrom, "*"), f =>
File.Copy(f, myMessage.CopyFilesTo + @"\" + Path.GetFileName(f), true);
catch (Exception e)
throw new WorkflowStepException(e.Message, e);
public void Dispose()
Theoretically the framework can invoke any method in any class; however, for the benefits of maintainability I have restricted the workflow to the steps folder/RunStep method.
In order to use the framework you need to include three dlls in your project: InvertedSoftware.WorkflowEngine.Common.dll, InvertedSoftware.WorkflowEngine.DataObjects.dll and InvertedSoftware.WorkflowEngine.dll
How to get the framework:
I have attached the complete version of the framework to this blog post and included an example of a producer and consumer. In order to run it you need to install the Queues specified in the example workflow or define your own.
Unfortunately I am unable to offer free support on the framework at this time; however, I will be happy to set up a CodePlex project depending on the adoption rate.
9 Jan 2011 6:05 PM