Dictionary of Terms

A dictionary of terms used within the documentation.

map

A higher-order function that evaluates another function on every element of an input collection.
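Python's built-in map is one such higher-order function. The minimal example below applies a hypothetical square function to every element of a list.

```python
def square(x):
    """Hypothetical function to be mapped over a collection."""
    return x * x


# map() is lazy: it returns an iterator; list() forces evaluation of every element.
squares = list(map(square, [1, 2, 3, 4]))
print(squares)  # [1, 4, 9, 16]
```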

NuMap

A parallel, multi-task implementation of the map function, used within PaPy. It uses a pool of worker threads or worker processes to evaluate functions in parallel, either locally or remotely.
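NuMap's own interface is not shown here. As a rough standard-library analogy only, concurrent.futures can evaluate one function over an input using a pool of worker threads. Unlike NuMap, this sketch handles a single task and runs only locally; word_length is a hypothetical function.

```python
from concurrent.futures import ThreadPoolExecutor


def word_length(word):
    """Hypothetical function evaluated in parallel by the pool."""
    return len(word)


words = ['alpha', 'beta', 'gamma', 'pi']

# Executor.map distributes the calls over the worker threads but
# yields results in input order, like the built-in map.
with ThreadPoolExecutor(max_workers=2) as pool:
    lengths = list(pool.map(word_length, words))

print(lengths)  # [5, 4, 5, 2]
```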

Worker function

A function with a standardized input signature, written to be used by a Worker instance. All processing in a PaPy pipeline must be coded as worker functions.

worker process / thread

A thread or process inside a NuMap instance that evaluates tasklets, either locally or remotely.

Worker

An object-oriented wrapper for worker functions; it is roughly equivalent to a "function with partially applied arguments".
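The "partially applied arguments" analogy can be made concrete with functools.partial. This is not PaPy's Worker class: multiply is a hypothetical worker function, and passing the inbox as a tuple is an assumption of the sketch.

```python
from functools import partial


def multiply(inbox, factor):
    """Hypothetical worker function: the inbox comes first, other arguments are fixed."""
    return inbox[0] * factor


# Fixing `factor` in advance leaves a callable that only needs the inbox,
# which is roughly what wrapping a worker function with its arguments achieves.
triple = partial(multiply, factor=3)
print(triple((7,)))  # 21
```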

Piper

An object-oriented wrapper for Worker instances; it corresponds to a "worker with a defined mode of evaluation".

Dagger

A directed acyclic graph (DAG) to store and connect Piper instances.

Plumber

A wrapper around a Dagger, designed to run a pipeline and to interact with it while it is running.

input stream

The input stream is the data that enters a PaPy pipeline. The data is assumed to be a collection of items expressed as a Python iterator (or any object which has the __next__ method).

Any sequence (e.g. a list or a tuple) can be turned into an iterator using the built-in iter function, e.g.:

sample_sequence = ['data_point1', 'data_point2', 'data_point3']
sample_iterator = iter(sample_sequence)
next(sample_iterator)  # returns 'data_point1'

Files are line iterators by default, e.g.:

with open('sample_file.txt') as sample_file:
    next(sample_file)  # returns the first line
    next(sample_file)  # returns the second line

output stream

The results saved (e.g. to disk) by an output Piper, one per input item. By default, an output Piper should return None for every input item but save the result persistently (e.g. to a file or a database).
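A minimal sketch of an output worker function that follows this convention: it persists each result to a text file and returns None. The function name, the file format, and the tuple inbox are hypothetical.

```python
import os
import tempfile


def save_line(inbox, path):
    """Hypothetical output worker: persist the first input, pass nothing downstream."""
    with open(path, 'a', encoding='utf-8') as handle:
        handle.write(f"{inbox[0]}\n")
    return None  # the result lives in the file, not in the returned value


# Usage: feed two items through the worker, then read back what was saved.
path = os.path.join(tempfile.mkdtemp(), 'results.txt')
returned = [save_line((item,), path) for item in ['first', 'second']]

with open(path, encoding='utf-8') as handle:
    saved = handle.read().splitlines()

print(returned)  # [None, None]
print(saved)     # ['first', 'second']
```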

item

A single element of the data stream.

input Piper

A Piper that is connected to an input stream (or to multiple input streams) is an input Piper. It corresponds to a node in the graph that has no upstream nodes within the PaPy workflow; in other words, it has no outgoing edges in the directed acyclic graph.

output Piper

A Piper that generates the output stream is an output Piper. A PaPy workflow may have multiple output Pipers in different places of the pipeline. An output Piper corresponds to a node in the graph that has no downstream nodes within the pipeline; in other words, it has no incoming edges in the directed acyclic graph.
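The two definitions can be checked on a small graph stored as a plain dict, with edges pointing from each Piper to the upstream Pipers it depends on (the dependency direction used for graph edges in this glossary). The Piper names are hypothetical, and this is not the Dagger API.

```python
# Dependency edges: each Piper lists the upstream Pipers it depends on.
# Piper names are hypothetical.
edges = {
    'load': [],
    'clean': ['load'],
    'stats': ['clean'],
    'plot': ['clean'],
}

# Input Pipers have no outgoing (dependency) edges.
input_pipers = sorted(name for name, upstream in edges.items() if not upstream)

# Output Pipers have no incoming edges: no other Piper depends on them.
depended_on = {up for upstream in edges.values() for up in upstream}
output_pipers = sorted(name for name in edges if name not in depended_on)

print(input_pipers)   # ['load']
print(output_pipers)  # ['plot', 'stats']
```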

lazy evaluation

The technique of delaying a computation until the result is required.
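In Python, generators are a common way to get lazy evaluation, and a generator is also a valid input stream because it is an iterator. In the sketch below, read_items is a hypothetical generator that records which items have actually been computed.

```python
produced = []  # records which items have actually been computed


def read_items(n):
    """Generator: each item is computed only when it is requested."""
    for i in range(n):
        produced.append(i)
        yield i * 10


stream = read_items(3)
print(produced)         # [] -> creating the generator computes nothing
first = next(stream)
print(first, produced)  # 0 [0] -> only the first item was computed
```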

task

A task is a tuple of objects added to a NuMap instance. It consists of, in order:

  • a function, which will be evaluated on the input element-wise
  • an input (a list, a tuple, or any other iterable, e.g. an array)
  • a tuple of positional arguments, e.g. (arg1, arg2, arg3)
  • a dict of keyword arguments, e.g. {'arg1': value_1, 'arg2': value_2}

The optional positional and keyword arguments have to match the signature of the function. The task is split into individual calls, one per input element, each evaluated as:

result = func(element_from_iterable, *arguments, **keyworded_arguments)
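A sequential sketch of how one task expands into these calls. NuMap distributes the calls across its workers, which this sketch does not attempt; evaluate_task and scale are hypothetical names.

```python
def evaluate_task(task):
    """Hypothetical helper: unpack a task tuple and evaluate it element-wise."""
    func, iterable, args, kwargs = task
    for element in iterable:
        yield func(element, *args, **kwargs)


def scale(x, factor, offset=0):
    """Hypothetical function whose signature matches the task's arguments."""
    return x * factor + offset


task = (scale, [1, 2, 3], (10,), {'offset': 1})
print(list(evaluate_task(task)))  # [11, 21, 31]
```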

inbox

The first argument of any worker function. The elements of the inbox correspond to the outputs of the upstream function in the Worker instance or to the outputs of other (upstream) Pipers, as defined by the pipeline topology. The contents of the inbox depend on the specific input item to the pipeline, whereas all other arguments of a worker function are predetermined.
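A hypothetical worker function for a Piper with two upstream Pipers. It assumes the inbox is a tuple with one element per upstream output; the name add_and_scale and the scale argument are illustrative only.

```python
def add_and_scale(inbox, scale=1):
    """Hypothetical worker function with two upstream Pipers.

    `inbox` changes with each pipeline input item; `scale` is fixed
    when the worker is defined.
    """
    left, right = inbox  # one element per upstream output, in topology order
    return (left + right) * scale


print(add_and_scale((2, 3)))            # 5
print(add_and_scale((2, 3), scale=10))  # 50
```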

stride

The number of tasklets from one task submitted to the NuMap worker pool before switching to the next task. It controls the granularity of task interleaving and should be at least equal to worker_num to keep all workers busy.
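The effect of stride on submission order can be sketched with itertools.islice. This hypothetical interleave function models only the order in which tasklets would be handed to the pool, not NuMap's actual scheduler.

```python
from itertools import islice


def interleave(tasks, stride):
    """Yield tasklets in submission order, taking up to `stride` from each task in turn."""
    iterators = [iter(task) for task in tasks]
    while iterators:
        still_running = []
        for iterator in iterators:
            chunk = list(islice(iterator, stride))
            yield from chunk
            if len(chunk) == stride:  # a full chunk means the task may have more items
                still_running.append(iterator)
        iterators = still_running


task_a = ['a1', 'a2', 'a3']
task_b = ['b1', 'b2', 'b3']
print(list(interleave([task_a, task_b], stride=2)))
# ['a1', 'a2', 'b1', 'b2', 'a3', 'b3']
```

A larger stride means fewer switches between tasks; a stride of 1 alternates tasklet by tasklet.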

buffer

The maximum total number of pending results (across all tasks) in a NuMap instance. It limits memory consumption and must be at least equal to stride.
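A sequential simulation of how a buffer caps memory use: a new item is submitted only after the oldest pending result has been retrieved once the cap is reached. The function bounded_submission and its event log are hypothetical and do not reflect NuMap internals.

```python
from collections import deque


def bounded_submission(items, buffer):
    """Simulate submitting items while keeping at most `buffer` results pending."""
    pending = deque()
    events = []
    peak = 0
    for item in items:
        if len(pending) == buffer:  # cap reached: retrieve the oldest result first
            events.append(('retrieve', pending.popleft()))
        pending.append(item)
        events.append(('submit', item))
        peak = max(peak, len(pending))
    while pending:  # drain whatever is still pending
        events.append(('retrieve', pending.popleft()))
    return events, peak


events, peak = bounded_submission(range(5), buffer=2)
print(peak)        # 2
print(events[:3])  # [('submit', 0), ('submit', 1), ('retrieve', 0)]
```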

consume

A Piper parameter that specifies how many consecutive input items from each upstream Piper are batched together into a single evaluation. Default is 1. See Produce / Spawn / Consume.
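Batching consecutive items can be sketched with itertools.islice. The batch function is hypothetical, and how PaPy treats a final incomplete batch is not stated here; this sketch simply yields a shorter tuple.

```python
from itertools import islice


def batch(stream, consume):
    """Group consecutive items of `stream` into tuples of length `consume`."""
    iterator = iter(stream)
    while True:
        chunk = tuple(islice(iterator, consume))
        if not chunk:
            return
        yield chunk  # the final chunk may be shorter (an assumption of this sketch)


print(list(batch(range(5), consume=2)))  # [(0, 1), (2, 3), (4,)]
```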

produce

A Piper parameter that specifies how many output items are generated from each single evaluation. The worker function must return a sequence of that length. Default is 1. See Produce / Spawn / Consume.
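A sketch of the produce contract: each evaluation returns a sequence of exactly produce items, and those items become separate elements of the output stream. Both split_pair and flatten_produced are hypothetical.

```python
def split_pair(inbox):
    """Hypothetical worker for produce=2: returns exactly two output items."""
    text = inbox[0]
    middle = len(text) // 2
    return (text[:middle], text[middle:])


def flatten_produced(results, produce):
    """Turn each evaluation's sequence of `produce` items into separate stream items."""
    for result in results:
        if len(result) != produce:
            raise ValueError(f"expected {produce} items, got {len(result)}")
        yield from result


results = [split_pair((word,)) for word in ['abcd', 'xy']]
print(list(flatten_produced(results, produce=2)))  # ['ab', 'cd', 'x', 'y']
```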

spawn

A Piper parameter that creates multiple implicit copies of the same Piper in the pipeline. Each copy processes a different slice of the upstream output. Default is 1. See Produce / Spawn / Consume.
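How the upstream output is divided among the copies is not specified here. The sketch below assumes a round-robin split purely for illustration; round_robin_slices is a hypothetical helper.

```python
def round_robin_slices(items, spawn):
    """Split upstream output among `spawn` copies, dealing items out in turn."""
    return [items[copy::spawn] for copy in range(spawn)]


upstream_output = list(range(7))
print(round_robin_slices(upstream_output, spawn=3))
# [[0, 3, 6], [1, 4], [2, 5]]
```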

repeat

A Piper parameter. When True and produce > 1, the single return value is repeated produce times instead of being iterated as a sequence.
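The difference between iterating and repeating a return value can be sketched as follows; expand_result is a hypothetical helper, not part of PaPy.

```python
def expand_result(result, produce, repeat=False):
    """Hypothetical sketch of turning one return value into `produce` stream items."""
    if repeat:
        return [result] * produce  # the same object, `produce` times
    items = list(result)  # otherwise the result is iterated as a sequence
    if len(items) != produce:
        raise ValueError(f"expected {produce} items, got {len(items)}")
    return items


print(expand_result('ab', produce=2))               # ['a', 'b']
print(expand_result('ab', produce=2, repeat=True))  # ['ab', 'ab']
```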

track

A Piper parameter. When True, the NuMap stores all results for this Piper in memory. After the pipeline finishes, tracked results are available in Plumber.stats['pipers_tracked'].

timeout

The number of seconds to wait for a result before returning a PiperError(TimeoutError). Specified per-Piper. Should not be used on chained tasks within a shared NuMap.
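As a standard-library analogy only (PaPy wraps the timeout in a PiperError instead), concurrent.futures raises a TimeoutError when a result does not arrive in time. The slow_square function and the chosen delays are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeoutError


def slow_square(x):
    """Hypothetical slow function standing in for a worker function."""
    time.sleep(0.5)
    return x * x


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_square, 3)
    try:
        result = future.result(timeout=0.1)  # wait at most 0.1 s
    except FuturesTimeoutError:
        result = 'timed out'
# Leaving the with-block still waits for the running call to finish.

print(result)  # timed out
```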

branch

An attribute of a Piper used to sort topologically equivalent branches in the Dagger postorder. Downstream Pipers inherit the branch of their upstream Piper.

debug

A Piper parameter. When True, exceptions are raised immediately instead of being wrapped as PiperError. Useful during development, but will hang the interpreter after an error occurs.

sub-interpreter

A Python execution environment within the same OS process but with its own GIL (PEP 684). Available via worker_type='interpreter' on Python 3.14+. Provides true CPU parallelism without the forking overhead of separate processes.

@imports

A decorator for worker functions that attaches import statements. Required for functions sent to remote RPyC workers. Example: @imports(['numpy', 'scipy.stats']).
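The idea of attaching imports to a function can be sketched locally. This hypothetical decorator is not PaPy's implementation and does not show the remote RPyC case: it records the module names and binds each module in the function's global namespace just before the call, the way an import statement such as "import os.path" binds os.

```python
import functools
import importlib


def imports(module_names):
    """Hypothetical sketch of an @imports-style decorator (not PaPy's implementation)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for name in module_names:
                importlib.import_module(name)  # loads submodules such as os.path
                top_level = name.split('.')[0]
                func.__globals__[top_level] = importlib.import_module(top_level)
            return func(*args, **kwargs)

        wrapper.required_imports = list(module_names)  # hypothetical attribute
        return wrapper

    return decorator


@imports(['json', 'os.path'])
def describe(inbox):
    """Hypothetical worker that uses json and os.path without top-level imports."""
    return json.dumps({'name': os.path.basename(inbox[0])})


print(describe(('/data/sample.txt',)))  # {"name": "sample.txt"}
print(describe.required_imports)        # ['json', 'os.path']
```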

pipe

A directed connection between two Pipers in a Dagger. Data flows from upstream to downstream along pipes. A pipe is the reverse of a graph edge (edges represent dependency, pipes represent data flow).
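The reversal between pipes and edges amounts to swapping each pair. The Piper names below are hypothetical.

```python
# Data flow (pipes): (upstream, downstream) pairs. Piper names are hypothetical.
pipes = [('load', 'clean'), ('clean', 'stats'), ('clean', 'plot')]

# Dependency edges point the other way: (downstream, upstream).
edges = [(downstream, upstream) for upstream, downstream in pipes]

print(edges)  # [('clean', 'load'), ('stats', 'clean'), ('plot', 'clean')]
```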

postorder

A topological ordering of Pipers in a Dagger where all upstream Pipers appear before their downstream dependents. Used to determine the order of connection and startup.
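Python's graphlib module (3.9+) can compute such an ordering from a mapping of each node to its dependencies. This sketch uses hypothetical Piper names; note that graphlib does not order topologically equivalent branches (here stats and plot) deterministically, which is the role the branch attribute plays in a Dagger.

```python
from graphlib import TopologicalSorter

# Each Piper maps to the upstream Pipers it depends on (names are hypothetical).
dependencies = {
    'load': [],
    'clean': ['load'],
    'stats': ['clean'],
    'plot': ['clean'],
}

# static_order() lists every node after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```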