Optimization
############

The throughput of a pipeline is limited most strongly by its slowest ``Piper``. A processing node might be slow because it performs a CPU-intensive or IO-intensive task, because it waits for data, or because it synchronizes with other nodes and waits for them.

Identifying bottlenecks
-----------------------

As a general rule you should optimize only the bottleneck(s): most of your nodes will not limit the throughput of the workflow, while parallelization is quite expensive. It is therefore critical to understand where and what the bottleneck is. If your pipeline has no obvious bottleneck, it is probably fast enough. If it is not, you might be able to use a shared pool.

Understanding bottlenecks
-------------------------

To Be Written.

Addressing synchronization
==========================

Unordered Pipers
----------------

Unordered pipers return results in an arbitrary order, e.g. for the input sequence ``[3,2,1]`` a parallel unordered ``Piper`` instance with a function that doubles its input might return ``[6,2,4]`` or any other permutation of the doubled numbers. Unordered nodes do not compute faster; they only make results available sooner. A down-stream computation that uses the same computational resource can therefore start earlier and potentially utilize that resource more fully. You should consider unordered ``Pipers`` if the computation time varies significantly between data items.

Addressing serialization
========================

To Be Written.

Distributing Computational resources
====================================

As a general rule you most likely should not share a single ``NuMap`` instance among all ``Pipers`` within a workflow. If the throughput of your pipeline is limited by a CPU-intensive task, you should parallelize that node. **PaPy** allows you to parallelize CPU-bound ``Pipers``. The amount of CPU power assigned to a processing task should be proportional to its computational requirements.
The recommended number of ``NuMap`` pool worker processes is equal to or slightly larger than the number of physical CPU cores on each local or remote computer.
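
The two sizing guidelines above can be sketched in plain Python. This is only an illustration under stated assumptions, not part of the **PaPy** or ``NuMap`` API: the helper names ``recommended_worker_num`` and ``split_workers``, the ``extra`` parameter, and the node names are all hypothetical, and the per-node costs are assumed to have been measured separately (for example as average seconds per item). Note also that ``os.cpu_count()`` reports logical CPUs, which can exceed the number of physical cores.

```python
import os


def recommended_worker_num(physical_cores=None, extra=1):
    """Return a pool size equal to or slightly above the core count."""
    if physical_cores is None:
        # Assumption: os.cpu_count() counts logical CPUs, so on machines with
        # simultaneous multithreading it may overstate the physical cores.
        physical_cores = os.cpu_count() or 1
    if physical_cores < 1 or extra < 0:
        raise ValueError("need at least one core and a non-negative extra")
    return physical_cores + extra


def split_workers(costs, total_workers):
    """Split total_workers among nodes in proportion to their cost.

    costs maps a (hypothetical) node name to its measured cost per item.
    Every node gets at least one worker; the remaining workers are shared
    proportionally, with leftovers going to the largest remainders.
    """
    if total_workers < len(costs):
        raise ValueError("need at least one worker per node")
    total_cost = sum(costs.values())
    if total_cost <= 0:
        raise ValueError("costs must sum to a positive number")
    spare = total_workers - len(costs)  # workers left after one per node
    exact = {name: spare * cost / total_cost for name, cost in costs.items()}
    shares = {name: 1 + int(value) for name, value in exact.items()}
    leftover = total_workers - sum(shares.values())
    # Hand out the workers lost to rounding, largest fractional part first.
    by_remainder = sorted(
        exact, key=lambda name: exact[name] - int(exact[name]), reverse=True
    )
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


if __name__ == "__main__":
    workers = recommended_worker_num()
    measured = {"read": 1.0, "compute": 8.0, "write": 1.0}  # made-up costs
    print(workers, split_workers(measured, max(workers, len(measured))))
```

Each resulting share would then be used as the worker count of the separate pool that serves the corresponding node, so that the CPU-bound node receives most of the processes.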