PaPy - Parallel Pipelines in Python

A parallel pipeline is a workflow made up of a series of connected processing steps. It models a computational process and automates its execution in parallel on a single multi-core computer or an ad-hoc grid.

You will find PaPy useful if you need to design and deploy a scalable data processing workflow that depends on Python libraries or external tools. PaPy makes it reasonably easy to convert existing code bases into proper workflows.

Repository: github.com/mcieslik-mctp/papy
Author: mcieslik@med.umich.edu

This documentation covers the design, implementation and usage of PaPy. It consists of a hand-written manual and an API reference. Please refer also to the rich comments in the source code, examples, workflows and test cases (all included in the source-code distribution).

NuMap

NuMap is a parallel (thread- or process-based, local or remote), buffered, multi-task replacement for map and multiprocessing.Pool.map. Like map, it evaluates a function on the elements of a sequence or iterable, and it does so lazily. The degree of laziness can be adjusted via the "stride" and "buffer" arguments. Unlike map, NuMap supports multiple tasks, each of which is a pair of a function and an iterable. The tasks are not queued one after another; instead they are interwoven, and they share a pool of worker processes or threads and a memory buffer.
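
The idea of interwoven tasks sharing one worker pool and one bounded buffer can be illustrated with the standard library alone. The sketch below is not NuMap's implementation and does not use its API: the helper name interleaved_map, its parameters, and the exact meaning given to "stride" (inputs taken from one task before moving to the next) and "buffer" (maximum number of submitted but unconsumed results) are assumptions made for illustration only.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def interleaved_map(tasks, workers=2, stride=2, buffer=4):
    """Lazily evaluate several (function, iterable) tasks on one shared pool.

    Hypothetical helper, not part of NuMap. Yields (task_index, result)
    pairs in the order in which the inputs were submitted.
    """
    def schedule():
        # Round-robin over the tasks, taking up to `stride` inputs per turn.
        active = deque(
            (index, func, iter(iterable))
            for index, (func, iterable) in enumerate(tasks)
        )
        while active:
            index, func, iterator = active.popleft()
            chunk = list(islice(iterator, stride))
            for item in chunk:
                yield index, func, item
            if len(chunk) == stride:
                # A full chunk means the task may have more input left.
                active.append((index, func, iterator))

    pending = deque()  # (task_index, future) pairs, oldest first
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for index, func, item in schedule():
            if len(pending) >= buffer:
                # The buffer is full: hand out the oldest result first.
                done_index, future = pending.popleft()
                yield done_index, future.result()
            pending.append((index, pool.submit(func, item)))
        while pending:
            done_index, future = pending.popleft()
            yield done_index, future.result()


def square(x):
    return x * x


def shout(s):
    return s.upper()


# Two tasks share the same two worker threads and the same buffer.
results = list(interleaved_map([(square, range(5)), (shout, "abc")]))
```

Because the helper is a generator, inputs are pulled from the iterables only as results are consumed, so an unbounded input such as itertools.count() can be processed incrementally. Threads are used here for brevity; the sketch says nothing about how NuMap handles processes or remote workers.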

Requires Python 3.12+.