PaPy - Parallel Pipelines in Python

A parallel pipeline is a workflow that consists of a series of connected processing steps. It models a computational process and automates its execution in parallel, either on a single multi-core computer or on an ad-hoc grid.
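
As a rough illustration of the idea, here is a minimal sketch that uses only the standard library's `multiprocessing.pool.ThreadPool`. It is not PaPy's API. The three steps `parse`, `square` and `label` and the helper `run_pipeline` are hypothetical. Each step's `imap` consumes the previous step's output, and results come back in input order. A process-based `multiprocessing.Pool` offers the same `imap` method if the steps are CPU-bound and picklable.

```python
from multiprocessing.pool import ThreadPool


# Hypothetical processing steps; in a real workflow these might wrap
# Python libraries or external tools.
def parse(line):
    # Step 1: turn a raw text record into an integer
    return int(line.strip())


def square(x):
    # Step 2: transform the value
    return x * x


def label(x):
    # Step 3: format the result for output
    return "value=%d" % x


def run_pipeline(records, steps, workers=4):
    """Connect steps so each one consumes the previous step's output stream."""
    pool = ThreadPool(workers)
    try:
        stream = iter(records)
        for step in steps:
            # imap yields results in input order as they become available
            stream = pool.imap(step, stream)
        # Drain the final stage while the pool is still open
        return list(stream)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    print(run_pipeline(["1\n", "2\n", "3\n"], [parse, square, label]))
```

A pool dispatches queued `imap` calls one after another, so this naive chaining on a single pool expects a finite input.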

You will find PaPy useful if you need to design and deploy a scalable data processing workflow that depends on Python libraries or external tools. PaPy makes it reasonably easy to convert existing code bases into proper workflows.

This documentation covers the design, implementation and usage of PaPy. It consists of a hand-written manual and an API reference. Please also refer to the extensive comments in the source code and to the examples, workflows and test cases, all of which are included in the source-code distribution.

NuMap is a parallel (thread- or process-based, local or remote), buffered, multi-task replacement for itertools.imap or multiprocessing.Pool.imap. Like imap, it evaluates a function on the elements of a sequence or iterable, and it does so lazily. The degree of laziness can be adjusted via the “stride” and “buffer” arguments. Unlike imap, NuMap supports multiple tasks, each a pair of a function and an iterable. The tasks are not queued; rather, they are interwoven and share a pool of worker processes or threads and a memory buffer.
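
To make "interwoven tasks" concrete, here is a minimal sketch using only the standard library. It is not NuMap's implementation or API. The generator `interleaved_imap` is hypothetical, and its `stride` parameter only loosely mirrors NuMap's argument of the same name. Several (function, iterable) tasks share one `ThreadPool`. In each round, up to `stride` items are taken from each task in turn and submitted, so at most one round of results (roughly `len(tasks) * stride`) is pending at any time. This plays the role of a small buffer.

```python
from itertools import islice
from multiprocessing.pool import ThreadPool


def interleaved_imap(tasks, workers=2, stride=2):
    """Hypothetical sketch: evaluate several (func, iterable) tasks on one pool.

    Yields (task_index, result) pairs in submission order: `stride` results
    from task 0, then `stride` from task 1, and so on, round after round.
    """
    pool = ThreadPool(workers)
    iterators = [(func, iter(iterable)) for func, iterable in tasks]
    try:
        active = list(range(len(iterators)))
        while active:
            pending = []
            still_active = []
            for i in active:
                func, it = iterators[i]
                # Pull at most `stride` input items from this task (lazily)
                chunk = list(islice(it, stride))
                if chunk:
                    still_active.append(i)
                for item in chunk:
                    pending.append((i, pool.apply_async(func, (item,))))
            # A task is dropped once it yields an empty chunk
            active = still_active
            # Collect this round's results before reading more input
            for i, async_result in pending:
                yield i, async_result.get()
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    tasks = [(lambda x: x + 1, [1, 2, 3]), (lambda x: x * 10, [1, 2, 3])]
    print(list(interleaved_imap(tasks, workers=2, stride=2)))
```

With `stride=2`, the usage above yields `(0, 2), (0, 3), (1, 10), (1, 20), (0, 4), (1, 30)`. Results from the two tasks alternate in blocks of at most two.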

The package is tested on Python 2.7+.


