Friday, 18 December 2015

parameterised execution - hara.concurrent.procedure

by Chris Zheng

The release of hara 2.2.13 and the finalization of hara.concurrent.procedure mark another milestone in my understanding of computation. The concept of execution, like the concept of a continuation, took me almost two years to grok, and then another six months to expand upon what I think it implies.

There is a very simple idea used all the time in programming - that a function can be replaced by a lookup. When I first heard it mentioned, the concept completely blew my mind. It's a really simple idea, but it has enormous consequences for how we build and use functions, and it holds true for any trade-off between the time taken for computation and the space required for caching. We practitioners should be aware of this fact - especially when building for performance.
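
This trade-off is easy to see with Clojure's built-in memoize - a minimal sketch, with purely illustrative function names:

(defn slow-square [n]
  (Thread/sleep 1000)   ;; simulate an expensive computation
  (* n n))

(def fast-square (memoize slow-square))

;; the first call pays for the computation; later calls with the
;; same argument are a simple cache lookup
(fast-square 3)   ;; takes about a second
(fast-square 3)   ;; returns immediately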

In today's world of concurrent operations, sharing data is hugely important, and caching becomes just one aspect of how processes deal with their inputs and outputs, as well as with each other.

I've always liked the idea of a cell - something that is self-contained and has enough information about itself and its surroundings. As such, I've always felt that concurrency constructs such as threads, futures and promises are not self-contained; they rely on many external signals for making decisions. This is fine for simple processes, but for more interesting/intelligent tasks, it may not be enough.

A Smarter Future

The concurrent.procedure construct is like a future, but it contains more information and so is more self-aware. This makes it a very useful construct for workflow modelling and concurrent applications, as the library provides rich information about the execution of a particular running instance (sketched after the list below):

  • the function that originated the process instance
  • the thread or future on which the instance is executing
  • the result of the execution (possibly cached), if it has returned
  • the time of execution
  • the id of the process (used for identification)
  • other running instances of the process
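
To make this concrete, here is a rough sketch of the kind of map such an instance might carry - the key names are hypothetical and not hara's actual schema:

(defn my-function [x] (* x x))

(def instance
  {:procedure my-function              ;; the function that originated the instance
   :thread    (Thread/currentThread)   ;; the thread on which it is executing
   :result    (promise)                ;; delivered (and possibly cached) when it returns
   :started   (java.util.Date.)        ;; the time of execution
   :id        :my-function-1           ;; the id used for identification
   :instances #{}})                    ;; other running instances of the procedure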

What the library resolves is the dissociation between three concepts critical to concurrent execution: the function (governed by its definition), the execution (governed by time) and the result (governed by input). By being able to keep them all together, execution becomes much easier to control and reason about.

Synchronous Execution

Execution is not an issue in a synchronous world; it occurs with the ticking of the system clock, and there are no other processes able to affect the world. We can establish a link between the past and the future because there is no present as such - no notion of what is happening right now. It just doesn't exist, or rather, we don't need to account for it in order to build our programs. In the synchronous world, there is no difference between a function and a lookup table of inputs and outputs.

Concurrent Execution

In the concurrent world, things happen very differently; or rather, things have to be accounted for very differently in order for a program to succeed. Time the conqueror is the master behind all calculation. To neglect time is idealistic and will ultimately result in failure. Execution is now as important as the function and the results of the function.

[diagram: concurrent execution]

How a function deals with the other functions around it - which potentially have all the answers it needs - is really important.

Coordinated Execution

An example of how this could be useful is shown below.

Computation C is estimated to take 10 hours to complete, and process A and process B both require the same result. Now suppose process A has been running the computation for nine and a half hours but has not finished, and process B is just starting. In this case, instead of waiting another 10 hours for the computation, if process B is aware that process A is already doing it, all it needs to do is wait on the result of A instead of starting from scratch.

Instead of taking 10 hours, process B will just take 0.5 hours.
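
Below is a plain Clojure sketch of this coordination idea (not hara's actual implementation, and expensive-computation is just a stand-in name): in-flight computations are registered under a key, so a later caller waits on the existing one instead of starting again.

(def in-flight (atom {}))   ;; key -> delayed computation shared by all callers

(defn compute-once
  "Runs (f) for key k, unless a computation for k is already
   registered, in which case the caller just waits on that one."
  [k f]
  (-> (swap! in-flight update k #(or % (delay (f))))
      (get k)
      deref))

;; process A kicks off the 10-hour computation:
;;   (compute-once :c expensive-computation)
;; process B, arriving 9.5 hours later, derefs the same delayed
;; result and only waits the remaining half hour:
;;   (compute-once :c expensive-computation)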

Parameterised Execution

This is a new concept that I've been playing with. At the top level, instead of writing code that looks like this:

(restart
  (sync
    (my-function)))

we write it like this:

(my-function {:retry ....
              :mode  :sync})

The advantage is that the function can be passed a data structure that can be manipulated on the fly. This is quite a big paradigm shift for me, and I believe it provides the building blocks for smarter and more self-sufficient processes.
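
As a rough sketch of how such an options map might be interpreted under the hood - the run helper and the option semantics here are illustrative, not hara's actual API:

(defn run [f {:keys [mode retry] :or {mode :sync retry 0}}]
  (let [attempt (fn []
                  (loop [n retry]
                    (let [res (try (f) (catch Exception e ::failed))]
                      (if (and (= res ::failed) (pos? n))
                        (recur (dec n))   ;; retry the computation
                        res))))]
    (case mode
      :sync  (attempt)               ;; run on the calling thread
      :async (future (attempt)))))   ;; run on another thread, return a future

;; (run my-function {:mode :sync  :retry 3})
;; (run my-function {:mode :async :retry 3})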

There is a lot more in the docs. Please have a play!