Architecture
============

The architecture of **PaPy** is remarkably simple and intuitive, yet flexible. It
consists of only four core components (classes) used to construct a data-processing
pipeline. Each component provides an isolated subset of the functionality: defining
the processing nodes, the connectivity and the computational resources of a
workflow, and further enabling deployment and run-time interaction (e.g.
monitoring). **PaPy** is very modular: functions can be used in several places
within a pipeline or re-used in other pipelines, and computational resources can be
shared among workflows and processing nodes. In this chapter we first introduce
object-oriented programming in the context of **PaPy** and briefly explain the core
components (building blocks). In later sections we revisit each component and
explain the how and why.

Understanding the object-oriented model
---------------------------------------

**PaPy** is written in an object-oriented (OO) way. The main components: Plumber,
Dagger, Pipers and Workers are in fact class objects. For the end-user it is
important to distinguish between classes and class instances. In Python both
classes and class instances are objects. When you import the module in your
script::

    import papy

a new object (a module) will be available, i.e. you will be able to access the
classes and functions provided by **PaPy**, e.g.::

    papy.SomeClass

The name of the imported ``object`` will be ``papy``. This object has several
attributes which correspond to the components and interface of ``papy``, e.g.::

    papy.Plumber
    papy.Dagger
    papy.Piper
    papy.Worker

Attributes are accessed in Python using the ``object.attribute`` notation. These
components are classes, not class instances. They are used to construct class
instances, which correspond to the run-time of the program. A single class can in
general have multiple instances. A class instance is constructed by "calling" (in
fact initializing) the class, i.e.::

    class_instance = Class(parameters)

The important part is that using ``papy`` involves constructing class instances::

    worker_instance = Worker(custom_function(s), argument(s))
    piper_instance = Piper(worker_instance, options)
    your_interface = Plumber(options)

Core components
---------------

The core components form the end-user interface, i.e. the classes which the user is
expected to use directly.

* NuMap - an implementation of an iterated map function which can process multiple
  tasks (function-sequence tuples) in parallel using either threads or processes on
  the local machine or on remote **RPyC** servers. ``NuMap`` instances represent
  computational resources.
* Pipers (Workers) - together define the processing nodes by wrapping user-defined
  functions and handling exceptions.
* Dagger - defines the connectivity of the pipeline in the form of a directed
  acyclic graph, i.e. the connectivity of the flow (pipes).
* Plumber - provides the interface to set up, run and monitor a workflow at
  run-time.

The ``NuMap`` class
-------------------

The ``NuMap`` class is provided by the separate module ``numap`` and is described
further in the section about parallel and distributed workflows. Here it suffices to
say that it is an ``object`` which models a pool of computational resources and
allows **multiple** functions to be executed using a shared pool of local or remote
workers. A ``NuMap`` can be used in any Python code as an alternative to
``multiprocessing.Pool`` or ``itertools.imap``. For details please refer to the
documentation and API of ``numap``.
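As a first taste, the following is a minimal sketch of ``NuMap`` used as a drop-in
replacement for ``itertools.imap``. The constructor signature
``NuMap(function, iterable)`` and lazy iteration are assumed from the description
above; depending on the ``numap`` version an explicit ``start()`` call may be
required before iterating, so consult the ``numap`` documentation::

    # a minimal sketch, assuming the imap-like call pattern described above;
    # the exact start-up protocol may differ between numap versions
    import itertools
    from numap import NuMap

    def square(element):
        # standalone NuMap use: elements are passed directly (the tuple-boxing
        # convention described later applies to Worker functions, not here)
        return element * element

    serial = list(itertools.imap(square, range(10)))   # lazy, single process
    parallel = NuMap(square, range(10))                # lazy, evaluated by a worker pool
    # parallel.start()   # uncomment if your numap version requires an explicit start
    parallel = list(parallel)                          # both yield [0, 1, 4, 9, ...]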
A ``NuMap`` object provides a method to evaluate a function on a sequence of
changing arguments, together with optional positional and keyword arguments which
modify the behaviour of the function, just like ``multiprocessing.Pool.imap`` or
``itertools.imap``. The key differences are that, unlike ``itertools.imap``, it
evaluates results in parallel and that, compared to ``multiprocessing.Pool.imap``,
it supports multiple functions (called tasks), which are evaluated not one after
another but in an alternating fashion. ``NuMap`` is completely independent from
**PaPy** and can be used separately (it is a standalone package). In **PaPy** the
lazy ``imap`` function is replaced with the pool implementation ``NuMap``, which
allows for a parallelism vs. memory-requirements trade-off.

The ``Worker`` class
--------------------

The ``Worker`` is a class which is created with a function or multiple functions
(and the functions' arguments) as arguments to the constructor. It is therefore a
function wrapper. If multiple functions are supplied they are assumed to be nested,
with the last function being the outermost, i.e.::

    (f, g, h) is h(g(f()))

If a ``Worker`` instance is called, this composite function is evaluated on the
supplied argument::

    from papy import Worker
    from math import radians, degrees

    def papy_radians(input):
        return radians(input[0])

    def papy_degrees(input):
        return degrees(input[0])

    worker_instance = Worker((papy_radians, papy_degrees))
    worker_instance([90.])   # returns 90.0

In this example we have created a composite ``Worker`` from two functions,
``papy_radians`` and ``papy_degrees``. The first function converts degrees to
radians, the second converts radians to degrees. Obviously, if those two functions
are nested, their result is identical to their input. ``papy_radians`` is evaluated
first and ``papy_degrees`` second, so the result is in degrees.

The ``Worker`` performs several functions:

* standardizes the inputs and outputs of nodes.
* allows multiple functions to be reused and combined into a single node.
* catches and wraps exceptions raised within functions.
* allows functions to be evaluated on remote hosts.

A ``Worker`` expects that the wrapped function has a defined input and output
signature. The input is expected to be boxed in a tuple relative to the output,
which should not be boxed. For example, the ``Worker`` instance expects ``[item]``
but returns just ``item``. Any function which conforms to this is a valid Worker
function. Most built-in functions need to be wrapped. Please refer to the API
documentation and examples on how to write Worker functions. If an exception is
raised within any of the user-written functions it is caught by the ``Worker``, but
it is **not** raised; instead it is wrapped as a ``WorkerError`` exception and
returned.

The functionality of a ``Worker`` instance is defined by the functions it is
composed of and their arguments. Two ``Workers`` which are composed of the same
functions **and** are called with the same arguments are functionally identical, and
a single ``Worker`` instance could replace them, i.e. be used in multiple places of
a pipeline, or in other words in multiple ``Piper`` instances. The functions within
a ``Worker`` instance might not be evaluated by the same process as the process that
created (and calls) the ``Worker`` instance. This is accomplished by the **RPyC**
package and the ``multiprocessing`` module.
A ``Worker`` knows how to inject its functions into an **RPyC** ``connection``
instance; after this the worker method is still called in the local process, but the
wrapped functions are evaluated on the remote host::

    import rpyc                   # import the RPyC module
    from papy import Worker

    power = Worker(pow, (2,))     # power of two
    power([2])                    # evaluated locally, returns 4

    conn = rpyc.classic.connect("some_host")
    power._inject(conn)           # replace pow with the remote pow
    power([3])                    # evaluated remotely, returns 9

A function can run on a remote host, i.e. a remote Python process/thread, only if
the modules on which this function depends are available on that host and have been
imported. ``NuMap`` provides means to attach import statements to function
definitions using the ``imports`` decorator. In this way code sent to the remote
host will work if the imported module is available remotely::

    @imports(['re'])
    def match_string(input, string):
        unboxed = input[0]
        return re.match(string, unboxed)

The above example shows a valid worker function with the equivalent of the following
import statement attached::

    import re

The ``re`` module will be available remotely in the namespace of this function only,
i.e. other injected functions might not have access to ``re``. For more information
see the ``NuMap`` documentation.

Built-in worker functions
-------------------------

Several classes of ``Worker`` functions are already part of **PaPy**. This
collection is expected to grow; currently the following types of workers are
included:

* core - basic data-flow
* io - serialization, printing and file operations

These are available in the ``papy.util.func`` module. They include the family of
passer functions, which do not alter the incoming data but are used to pass on only
the streams from certain input pipes. For example, a ``Piper`` connected to ``3``
other Pipers might propagate input from only one of them.

* ``ipasser`` - propagates the "i"-th input pipe
* ``npasser`` - propagates the first "n" input pipes
* ``spasser`` - propagates the pipes with numbers in "s"

For example::

    from papy.util.func import *
    worker = Worker(ipasser, (0,))        # passes only the first pipe
    worker = Worker(ipasser, (1,))        # passes only the second pipe
    worker = Worker(npasser, (2,))        # passes the first two pipes
    worker = Worker(spasser, ((0, 1),))   # passes pipes 0 and 1
    worker = Worker(spasser, ((1, 0),))   # passes pipes 1 and 0

The output of the passers is a *single* tuple of the passed pipes::

    input0 = [0, 1, 2, 3, 4, 5]
    input1 = [6, 7, 8, 9, 10, 11]
    worker = Worker(spasser, ((1, 0),))   # will produce output [(6, 0), (7, 1), ...]

Functions dealing with input/output relations, i.e. data storage and serialization,
currently allow serialization using the pickle and JSON protocols and file-based
data storage. Data serialization is a way to convert objects (and in Python almost
everything is an object) into a sequence, which can be stored or transmitted.
**PaPy** uses the ``pickle`` serialization format to transmit data between local
processes and ``brine`` (an internal serialization protocol from **RPyC**) to
transmit data between hosts. The user might, however, want to save and load data in
a different format.

Writing functions for Workers
-----------------------------

A worker is an instance of the class Worker. Worker instances are created by calling
the Worker class with a function or several functions as the argument.
Optionally, an argument set (for the function) or argument sets (for multiple
functions) can be supplied, i.e.::

    worker_instance = Worker(function, argument)

or::

    worker_instance = Worker(list_of_functions, list_of_arguments)

A worker instance is therefore defined by two elements: the function or list of
functions and the argument or list of arguments. This means that two different
instances which have been initialized using the same functions *and* respective
arguments are functionally equal. You should think of worker instances as nested
curried functions (search for "partial application").

Writing functions suitable for workers is very easy, and adapting existing functions
should be just as easy. The idea is that any function is valid if it conforms to a
defined input/output scheme. There are only a few rules which need to be followed:

#. The first input argument: each function in a worker will be given an n-tuple of
   objects, where n is the number of input iterators to the Worker. For example, a
   function which sums two numbers should expect a tuple of length 2. Remember that
   Python uses 0-based indexing. If the Worker has only one input stream the input
   to the function will still be a tuple, i.e. a 1-tuple.
#. The additional (and optional) input arguments: a function can be given additional
   arguments.
#. The output: a function should return a single object *not* enclosed in a wrapping
   1-tuple. If a Python function has no explicit return value it implicitly returns
   ``None``.

Examples: single input, single output::

    def water_to_water(inp):
        result = inp[0]
        return result

single input, no explicit output::

    def water_to_null(inp):
        null = inp[0]

multiple inputs, single output::

    def water_and_wine(inp):
        juice = inp[0] + inp[1]
        return juice

multiple inputs, single output, parameters::

    def water_and_wine_dilute(inp, dilute=1):
        juice = inp[0] * dilute + inp[1]
        return juice

Note that in the last examples ``inp`` is a 2-tuple, i.e. the Piper based on such a
worker/function will expect two input streams, or in other words will have two
incoming pipes. If, on the other hand, we would like to combine elements of the
input object from a single pipe, we have to define a function like the following::

    def sum2elements(inp):
        unwrapped_inp = inp[0]
        result = unwrapped_inp[0] + unwrapped_inp[1]
        return result

In other words, the function receives a wrapped object but returns an unwrapped one.
All Python objects can be used as results except Exceptions. This is because
Exceptions are not evaluated down-stream but are passively propagated.

Writing functions for output workers
------------------------------------

An output worker is a worker which is used in a piper instance at the end of a
pipeline, i.e. in the last piper. Any valid worker function is also a valid output
worker function, but it is recommended that the last piper persistently save the
output of the pipeline. The output worker function should therefore store its input
in a file or database, or possibly print it on screen. The function should not
return data. The reasons for this recommendation are related to the implementation
details of the ``NuMap`` and ``Plumber`` objects.

#. The ``Plumber`` instance runs a pipeline by retrieving results from output pipers
   *without* saving or returning those results.
#. The ``NuMap`` instance will retrieve results from the output pipers *without*
   saving them whenever it is told to stop *before* it has consumed all input.

The latter point requires some explanation.
When the stop method of a running ``NuMap`` instance is called, the ``NuMap`` does
not stop immediately, but is scheduled to stop after the current stride is finished
for all tasks. To do this the output of the pipeline has to be "cleared", which
means that results from output pipers are retrieved but not stored. Therefore the
"storage" should be a built-in function of the last piper. An output worker function
might therefore require an argument which is a connection to some persistent
storage, e.g. a file handle.

The ``Piper`` class
-------------------

A ``Piper`` instance represents a node in the directed graph of the workflow. It
defines what function(s) should be evaluated at this node (via the supplied
``Worker`` instance) and how they should be evaluated (via the optional ``NuMap``
instance, which defines the computational resources to be used). Besides that it
performs additional functions, which include:

* logging and reporting
* exception handling
* timeouts
* produce/spawn/consume schemes

To use a ``Piper`` outside a workflow three steps are required:

* creation - requires a ``Worker`` instance and optional arguments, e.g. a ``NuMap``
  instance (``__init__`` method).
* connection - connects the ``Piper`` to the input (``connect`` method).
* start - allows the ``Piper`` to return results, starts the evaluation in ``NuMap``
  (``start`` method).

In the first step we define the ``Worker`` which will be evaluated by the ``Piper``
and the ``NuMap`` resource to do this computation. Computational resources are
represented by ``NuMap`` instances. A ``NuMap`` instance can utilize local or remote
threads or processes. If no ``NuMap`` instance is given to the constructor, the
``itertools.imap`` function will be used instead. This function will be called by
the Python process used to construct and start the **PaPy** pipeline.

**PaPy** has been designed to monitor the execution of a workflow by logging at
multiple levels and with a specifiable level of detail. It uses the built-in Python
logging facility (the ``logging`` module). The ``NuMap`` class, which should at this
stage be bug-free, logs only DEBUG statements. Exceptions within ``Worker``
functions are wrapped as ``WorkerError`` exceptions; these errors are logged by the
``Piper`` instance which wraps this ``Worker`` (a single ``Worker`` instance can be
used by multiple ``Pipers``). By default the pipeline is robust to ``WorkerErrors``:
these exceptions are logged, but they do not stop the flow. In this mode, if the
called ``Worker`` instance returns a ``WorkerError``, the calling ``Piper`` instance
wraps this error as a ``PiperError`` and **returns** (not raises) it downstream into
the pipeline. On the other hand, if a ``Worker`` receives a ``PiperError`` as input
it just propagates it further downstream, i.e. it does not attempt meaningless
calculations on exceptions. In this way errors in the pipeline propagate downstream
as placeholder ``PiperErrors``.

A ``Piper`` instance evaluates the ``Worker`` either via the supplied ``NuMap``
instance (described elsewhere) or via the built-in ``itertools.imap`` function (the
default). In reality, after a ``Piper`` is connected to the input it creates a task,
i.e. a (function, data, arguments) ``tuple``, which is added to the ``NuMap``
instance. ``NuMap`` instances support timeouts via the optional timeout argument
supplied to the ``next`` method. If the ``NuMap`` is not able to return a result
within the specified time it raises a ``TimeoutError``.
This exception is caught by the ``Piper`` instance which expects the result, wrapped
into a ``PiperError`` exception, and propagated down-stream exactly like a
``WorkerError``. If the ``Piper`` is used within a pipeline and a timeout argument
is given, the skipping argument should be set to ``True``; otherwise the number of
results from a ``Piper`` will be greater than the number of tasklets, which will
hang the pipeline::

    # valid with or without timeouts
    universal_piper = Piper(worker_instance, parallel=numap_instance, skipping=True)
    # valid only without timeouts
    nontimeout_piper = Piper(worker_instance, parallel=numap_instance, skipping=False)

Note that the timeouts specified here are "computation time" timeouts. If, for
example, a worker function waits for a server response and the server response does
not arrive within some timeout (which can be an argument for the Worker), then if
this exception is raised within the function it will be wrapped into a
``WorkerError`` and returned, not raised as a ``TimeoutError``.

A single ``Piper`` instance can only be used once within a pipeline (this is unlike
``Worker`` instances). ``Pipers`` are created first and connected to the input data
later. The latter is accomplished by their ``connect`` method::

    piper_instance.connect(input_data)

If the ``Piper`` is used within a **PaPy** pipeline, i.e. a ``Dagger`` or
``Plumber`` instance, the user does not have to care about connecting individual
``Pipers``. A ``Piper`` can only be started or disconnected if it has been connected
before::

    piper_instance.connect(input_data)
    piper_instance.disconnect()
    # or
    piper_instance.start()

After starting a ``Piper`` the tasks are submitted to the thread/process workers in
the ``NuMap`` instance and they are evaluated. This process continues until either
the memory "buffer" is filled or the input is consumed. Therefore a ``Piper`` cannot
simply be disconnected while it is "running". A special method is needed to tell the
``NuMap`` instance to stop input consumption. Because ``NuMap`` instances are shared
among ``Pipers``, such a stop can only occur at "stride" boundaries; strides are the
batches of data traversing the workflow. The ``Piper`` stop method will eventually
stop the ``NuMap`` instance and put the ``Piper`` into a stopped state that allows
the ``Piper`` to be disconnected::

    piper_instance.start()
    piper_instance.stop()
    piper_instance.disconnect()   # can be connected and started again

Because the stop happens at a "stride" boundary, data is not lost during a stop.
This can be illustrated as follows::

    #           plus2            plus1
    # [1,2,3,4] -----> [3,4,5,6] -----> [4,5,6,7]
    #
    # which is equivalent to the following:
    # plus1(plus2([1,2,3,4]))

If the ``Pipers`` ``plus2`` and ``plus1`` share a single ``NuMap`` and the "stride"
is ``2``, then the order of evaluation can be (if the results are retrieved)::

    temp1 = plus2(1)
    temp2 = plus2(2)
    plus1(temp1)
    plus1(temp2)
    <>
    <>
    temp1 = plus2(3)
    temp2 = plus2(4)
    plus1(temp1)
    plus1(temp2)
    <>
    <>

Now let's assume that the stop method has been called just after ``plus2(1)``. We do
not want to lose the ``temp1`` result (as ``1`` has already been consumed from the
input iterator and iterators cannot rewind), but we can achieve this only if
``plus1(temp1)`` is evaluated; this in turn (due to the order of evaluation) can
happen only after ``plus2(2)`` has been evaluated (i.e. ``2`` consumed from the
input iterator).
To not lose ``temp2``, ``plus1(temp2)`` has to be evaluated, and only then can the
evaluation stop::

    temp1 = plus2(1)
    temp2 = plus2(2)
    plus1(temp1)
    plus1(temp2)
    (stopped)

After the stop method returns, all worker processes/threads and helper threads
return (join) and the user can close the Python interpreter. It is **very**
important to realize what happens with the two calculated results. As has already
been mentioned, a proper **PaPy** pipeline should have an output ``Piper``, i.e. one
that persistently stores the results.

The ``Dagger``
--------------

The ``Dagger`` is an object to connect ``Piper`` instances into a directed acyclic
graph (DAG). It inherits most methods of the ``DictGraph`` object, which is a
concise implementation of a graph data-structure. A ``DictGraph`` instance is a
dictionary of arbitrary hashable objects, i.e. the "object nodes", e.g. a ``Piper``.
The values for these objects are instances of the ``Node`` class, i.e. "topological
nodes". A "topological node" instance is also a dictionary of "object nodes" and
their corresponding "topological nodes". An "object node" (A) of the ``DictGraph``
is contained in the "topological node" corresponding to another "object node" (B) if
there exists an edge from (A) to (B). A and B might even be the same "object node"
(a self-loop). A "topological node" is therefore a sub-graph of the ``DictGraph``
instance centered around an "object node", and the whole ``DictGraph`` is a
recursively nested dictionary. The ``Dagger`` is designed to store ``Piper``
instances as "object nodes" and provides additional methods, whereas the
``DictGraph`` makes no assumptions about the ``object`` type.

Edges vs. pipes
_______________

A ``Piper`` instance is created by specifying a ``Worker`` (and optionally a
``NuMap`` instance) and needs to be connected to an input. The input might be
another ``Piper`` or any Python iterator. The output of a ``Piper`` (upstream) can
be consumed by several ``Pipers`` (downstream), while a ``Piper`` (downstream) might
consume the results of multiple ``Pipers`` (upstream). This allows ``Pipers`` to be
used as arbitrary nodes in a directed acyclic graph, the ``Dagger``. To be precise,
the direction of the edges is opposite to the direction of the data stream (pipes):
upstream ``Pipers`` have incoming edges from downstream ``Pipers``, and this is
represented as a pipe with the opposite orientation, i.e. upstream -> downstream. As
a result it is much more natural to think of connections between ``Pipers`` in terms
of data-flow, upstream --> downstream (data flows from upstream to downstream), than
in terms of dependency, downstream --> upstream (downstream depends on upstream).
The ``DictGraph`` represents dependency information as directed edges
(downstream --> upstream), while the ``Dagger`` class introduces the concept of
pipes to ease the understanding of **PaPy** and make mistakes less common. A pipe is
nothing other than a reversed edge. To make this explicit::

    input -> piper0 -> piper1 -> output    # -> represents a pipe (data-flow)
    input <- piper0 <- piper1 <- output    # <- represents an edge (dependency)

The data is stored internally as edges, but the interface uses pipes. Method names
are explicit::

    dagger_instance.add_edge()   # inherited, expects an edge as input
    dagger_instance.add_pipe()   # expects a pipe as input

.. note::

    Although all ``DictGraph`` methods are available from the ``Dagger``, the
    end-user should use the ``Dagger``-specific methods.
    For example, the ``DictGraph`` method ``add_edge`` will allow any edge to be
    added to the instance, whereas the ``add_pipe`` method will not allow cycles to
    be introduced.

Working with the ``Dagger``
___________________________

Creation of a ``Dagger`` instance is very easy. An empty ``Dagger`` instance is
created without any arguments to the constructor::

    dagger_instance = Dagger()

Optionally, a set of ``Pipers`` and/or pipes can be given::

    dagger_instance = Dagger(sequence_of_pipers, sequence_of_pipes)
    # which is equivalent to:
    dagger_instance.add_pipers(sequence_of_pipers)
    dagger_instance.add_pipes(sequence_of_pipes)

    # a sequence of pipers makes it easy to add branches
    dagger_instance.add_pipers([p1, p2a, p3a, p4])
    dagger_instance.add_pipers([p1, p2b, p3b, p4])
    # in this example the Dagger will have 6 pipers (p1, p2a, p2b, p3a, p3b, p4),
    # one branch point p1, one merge point p4, and two branches (p2a, p3a) and
    # (p2b, p3b).

The ``Dagger`` allows ``Pipers`` and pipes to be added and deleted::

    dagger_instance.add_piper(piper)
    dagger_instance.del_piper(piper_or_id)
    dagger_instance.add_pipers(pipers)
    dagger_instance.del_pipers(pipers_or_ids)

The id of a ``Piper`` is a run-time specific number associated with a given
``Piper`` instance. This number can be obtained by calling the built-in function
``id``::

    id(piper_instance)

This number is also shown when a ``Piper`` instance is printed::

    print piper_instance

or represented::

    repr(piper_instance)

The representation of a ``Dagger`` instance also shows the ids of the ``Pipers``
which are contained in the workflow::

    print dagger_instance

The id of a ``Piper`` instance is defined at run-time (it corresponds to the memory
address of the object), therefore it should not be used in scripts or saved in any
way. Note that the length of this number is platform-specific and that no guarantee
is made that two ``Pipers`` with non-overlapping lifetimes will not have the same
id.

The resolve method::

    dagger_instance.resolve(piper_or_id)

returns a ``Piper`` instance if the supplied ``Piper``, or a ``Piper`` with the
supplied id, is contained in the dagger_instance. By default this method raises a
``DaggerError`` if the ``Piper`` is not found. If the argument forgive is ``True``
the method returns ``None`` instead::

    dagger_instance.resolve(missing_piper)                # raises DaggerError
    dagger_instance.resolve(missing_piper, forgive=True)  # returns None

The ``Dagger`` run-time
_______________________

The run-time of a ``Dagger`` instance begins when its start method is called. A
``Dagger`` can only be started if it is connected. Connecting a ``Dagger`` means
connecting all the ``Pipers`` it contains, as defined by the pipes in the
``Dagger``. After the ``Dagger`` is connected it can be started; starting a
``Dagger`` means starting all of its ``Pipers``. ``Pipers`` have to be started in
the order of the data-flow, i.e. a ``Piper`` can only be started after all of its
up-stream ``Pipers`` have been started. An ordering of the nodes / ``Pipers`` of a
graph / ``Dagger`` which has this property is called a postorder. There is possibly
more than one postorder per graph / ``Dagger``. The exact postorder used to connect
the ``Pipers`` has some additional properties:

- all down-stream ``Pipers`` for a ``Piper`` (A) come before the next ``Piper`` (B)
  for which no such relationship can be established. This can be thought of as
  maintaining branch contiguity.
- such branches can additionally be sorted according to the branch argument passed
  to the ``Piper`` constructor.
Another aspect of the ordering of a ``Dagger`` is the sequence in which a
down-stream ``Piper`` connects to multiple up-stream ``Pipers``. The inputs cannot
be sorted based solely on their postorder, because the down-stream ``Piper`` might
be connected directly to a ``Piper`` to which one of its other inputs has been
connected before. The inputs of a ``Piper`` are therefore additionally sorted so
that all down-stream ``Pipers`` come before up-stream ``Pipers``, while ``Pipers``
for which no such relation can be established are still sorted according to their
index in the postorder. This can be thought of as sorting branches by their
"generation". You could think of a workflow as an ``imap`` function composed of
nested ``imap`` functions, i.e.::

    # nested imaps as pipelines
    pipeline = imap(h, izip(imap(f, input_for_f), imap(g, input_for_g)))

This is a pipeline of ``3`` functions ``f``, ``g``, ``h``. Functions ``f`` and ``g``
are upstream relative to ``h``. Because of the ``izip`` function, input_for_f and
input_for_g have to be of the same length.

A started ``Dagger`` is able to process input data. The simplest way to process all
inputs is to zip its output ``Pipers``::

    output_pipers = dagger_instance.get_outputs()
    final_results = zip(*output_pipers)

If any of the ``Pipers`` used within a ``Dagger`` uses a ``NuMap`` instance and the
``Dagger`` has been started, the Python process can only be exited cleanly if the
``Dagger`` instance is stopped by calling its ``stop`` method.

The ``Plumber``
---------------

The ``Plumber`` is an easy-to-use interface to **PaPy**. It inherits from the
``Dagger`` object and can be used like a ``Dagger``, but the ``Plumber`` class adds
methods related to the "run time" of a pipeline. A ``Plumber`` can
start/run/pause/stop a pipeline and additionally load and save a workflow (not
implemented). A **PaPy** workflow is loaded and saved as executable Python code,
which has the same privileges as the Python process. Please keep this in mind when
starting workflows from untrusted sources! A sketch of the core components used
together is given at the end of this chapter.

The additional components
-------------------------

These classes and functions are used by the core components, but they are general
and might find application in your own code.

* ``DictGraph`` (``Node``) - two classes which implement a graph data-structure
  using a recursively nested dictionary. This allows for simplicity of
  algorithms/methods, i.e. there are no edge objects, because edges are the keys of
  the ``Node`` dictionary, which in turn is the value in the dictionary for the
  arbitrary object in the ``DictGraph`` instance, i.e.::

      from papy import Graph
      graph = Graph()
      object1 = '1'
      object2 = '2'
      graph.add_edge((object1, object2))
      node_for_object1 = graph[object1]
      node_for_object2 = graph[object2]

  The ``Dagger`` is a ``DictGraph`` object with directed edges only and no cycles.

* imports - a function wrapper which allows import statements to be injected into a
  function's local namespace at creation (code execution) time, e.g. on a remote
  Python process.
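To tie together the components described in this chapter, here is a minimal,
illustrative sketch of a two-node workflow: one ``Piper`` doubles numbers, the
second is an output ``Piper`` that persistently saves results to a file (as
recommended earlier). The class and method names (``Worker``, ``Piper``,
``Plumber``, ``add_pipers``, ``add_pipe``, ``start``, ``run``, ``stop``) follow this
chapter, but several details are assumptions rather than documented facts: that a
pipe is written as an (upstream, downstream) tuple, that input data is handed to
``start``, and that ``run`` drives the retrieval of results. Treat it as a sketch
and consult the **PaPy** API documentation and examples for the exact signatures::

    # an illustrative sketch only -- see the lead-in for which details are assumed
    from papy import Plumber, Piper, Worker

    def double(inp):
        # worker function: receives a 1-tuple (one element per incoming pipe)
        return inp[0] * 2

    def dump(inp, handle):
        # output worker function: persist the result, return nothing
        handle.write("%s\n" % inp[0])

    handle = open("results.txt", "w")

    doubler = Piper(Worker(double))            # no NuMap given -> itertools.imap is used
    output = Piper(Worker(dump, (handle,)))    # storage handle passed as a Worker argument

    plumber = Plumber()
    plumber.add_pipers([doubler, output])
    plumber.add_pipe((doubler, output))        # assumed pipe format: (upstream, downstream)

    plumber.start([range(10)])                 # assumed: input data supplied at start
    plumber.run()                              # retrieve results from the output piper(s);
                                               # you may need to wait before stopping
    plumber.stop()
    handle.close()

Compare this with the per-``Piper`` ``connect``/``start``/``stop`` calls shown
earlier; within a ``Dagger`` or ``Plumber`` the individual ``Pipers`` are connected
and started for you, in an order consistent with the data-flow.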