Dictionary of Terms¶
A dictionary of terms used within the documentation.
map¶
Higher-order map function. A function which evaluates another function on all elements of the input collection.
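As a minimal illustration of the definition above, the Python built-in map applies a function to every element of a collection. The function name square and the sample data are made up for this sketch.

```python
def square(x):
    """Function that will be applied to each element."""
    return x * x

numbers = [1, 2, 3, 4]

# The built-in map returns a lazy iterator; list() forces the evaluation.
squares = list(map(square, numbers))
```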
NuMap¶
A parallel implementation of a multi-task map function, which is used within PaPy. It uses a pool of worker-threads or worker-processes and evaluates functions in parallel either locally or remotely.
Worker function¶
A function with a standardized input written to be used by a Worker class
instance. All processing of a PaPy pipeline has to be coded as worker
functions.
worker process / thread¶
A thread or process inside a NuMap instance evaluating a tasklet remotely
or locally.
Worker¶
An object-oriented wrapper for worker functions. It is roughly equivalent to a "function with partially applied arguments".
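The idea of a "function with partially applied arguments" can be sketched with functools.partial from the standard library. This is only an analogy and not the Worker class itself; the function scale and its factor argument are hypothetical.

```python
from functools import partial

def scale(inbox, factor):
    """Hypothetical worker function: the first argument is the inbox."""
    return inbox[0] * factor

# Fix the extra argument ahead of time; only the inbox remains open.
double = partial(scale, factor=2)

result = double([21])
```

After the partial application, the only argument that varies from call to call is the inbox.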
Piper¶
An object-oriented wrapper for Worker instances. It corresponds to a "worker
with a defined mode of evaluation".
Dagger¶
A directed acyclic graph (DAG) to store and connect Piper instances.
Plumber¶
A wrapper for the Dagger designed to run a pipeline and to interact with it
while it is running.
input stream¶
The input stream is the data that enters a PaPy pipeline. The data is
assumed to be a collection of items expressed as a Python iterator (or any
object which has the __next__ method).
Any sequence (e.g. a list or a tuple) can be made into an iterator using
the Python built-in iter function, e.g. iter([1, 2, 3]).
Files are line iterators by default, i.e.:
sample_file = open('sample_file.txt')
next(sample_file) # returns the first line
next(sample_file) # returns the second line
output stream¶
The items saved persistently (e.g. to disk) by an output Piper. By default
the output Piper should return None for every input item and save the
result persistently; where and how it is stored is left to the Piper.
item¶
A single element of the data stream.
input Piper¶
A Piper that is connected to an input stream (or to multiple input streams).
Such a Piper corresponds to a node in the graph which has no upstream nodes
within the PaPy workflow, or in other words has no outgoing edges in the
directed acyclic graph (edges point from a Piper to the Pipers it depends on).
output Piper¶
A Piper that generates the output stream. A PaPy workflow might have
multiple output Pipers in different places of the pipeline. An output
Piper corresponds to a node in the graph which has no downstream nodes
within the pipeline, or in other words has no incoming edges in the directed
acyclic graph (no other Piper depends on it).
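The edge convention used in the two definitions above (edges represent dependency, so they point from a downstream node to its upstream nodes) can be sketched with a plain dictionary. The node names are hypothetical and the dictionary is not the Dagger data structure.

```python
# Dependency edges: each node maps to the set of nodes it depends on,
# i.e. its upstream nodes. An edge points from the key to each set member.
edges = {
    'reader': set(),
    'parser': {'reader'},
    'filter': {'parser'},
    'writer': {'filter'},
    'logger': {'parser'},
}

# Nodes that at least one other node depends on (they have incoming edges).
has_incoming = set().union(*edges.values())

# Input nodes have no outgoing edges; output nodes have no incoming edges.
input_nodes = sorted(n for n, upstream in edges.items() if not upstream)
output_nodes = sorted(n for n in edges if n not in has_incoming)
```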
lazy evaluation¶
The technique of delaying a computation until the result is required.
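A small sketch of lazy evaluation using a Python generator expression. The helper slow_square and the evaluated list are illustrative names; the list only records when a computation actually happens.

```python
evaluated = []

def slow_square(x):
    evaluated.append(x)  # record that the computation was really performed
    return x * x

# Building the generator computes nothing yet.
lazy_results = (slow_square(x) for x in range(5))
count_before = len(evaluated)

# Requesting one result triggers exactly one computation.
first = next(lazy_results)
count_after = len(evaluated)
```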
task¶
A task is an ordered tuple of objects added to the NuMap instance. It
consists of:
- a function, which will be evaluated on the input element-wise
- an input (a list, tuple or any iterable object such as an array)
- a tuple of arguments e.g. (arg1, arg2, arg3)
- a dict of keyword arguments i.e. {'arg1': value_1, 'arg2': value_2}
The optional arguments and keyword arguments have to match the signature of the function. The task is iteratively split into evaluated calls, one per input element: each element is passed to the function as its first argument, followed by the arguments and keyword arguments.
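The element-wise evaluation of a task can be sketched in plain Python. This is an assumption-laden illustration: the function power is hypothetical, the tuple order simply follows the list above, and the calls are evaluated eagerly in a list comprehension rather than by a NuMap worker pool.

```python
def power(x, exponent, offset=0):
    """Hypothetical function evaluated on each input element."""
    return x ** exponent + offset

# (function, input, arguments, keyword arguments)
task = (power, [1, 2, 3], (2,), {'offset': 1})

function, inputs, args, kwargs = task

# One evaluated call per input element.
results = [function(element, *args, **kwargs) for element in inputs]
```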
inbox¶
The first argument of any worker function. The elements of the inbox
correspond to the outputs of the upstream function in the Worker instance or
to outputs of other Pipers. These outputs are defined by the pipeline
topology. The contents of the inbox depend on a specific input item to the
pipeline. All other arguments of a worker function are predetermined.
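A sketch of a worker function, assuming the inbox arrives as a sequence with one element per upstream output. The name join_upstream and the separator argument are hypothetical.

```python
def join_upstream(inbox, separator):
    """Hypothetical worker function combining two upstream outputs.

    inbox     -- varies with each input item of the pipeline
    separator -- predetermined when the worker is set up
    """
    left, right = inbox[0], inbox[1]
    return separator.join([left, right])

joined = join_upstream(['chr1', '1024'], separator=':')
```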
stride¶
The number of tasklets from one task submitted to the NuMap worker pool
before switching to the next task. Controls the granularity of task
interleaving. Should be at least equal to worker_num to keep all workers
busy.
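The interleaving controlled by stride can be illustrated with a simplified model. This is not the NuMap scheduler; submission_order is a hypothetical helper that only shows the order in which tasklets of several tasks would be submitted under the definition above.

```python
def submission_order(task_lengths, stride):
    """Yield (task_index, tasklet_index) pairs in stride-interleaved order."""
    positions = [0] * len(task_lengths)
    while any(pos < n for pos, n in zip(positions, task_lengths)):
        for task_index, n in enumerate(task_lengths):
            start = positions[task_index]
            stop = min(start + stride, n)
            # Submit up to `stride` tasklets, then switch to the next task.
            for tasklet_index in range(start, stop):
                yield (task_index, tasklet_index)
            positions[task_index] = stop

# Two tasks with four tasklets each, stride of 2.
order = list(submission_order([4, 4], stride=2))
```

With a stride of 2, two tasklets of the first task are submitted, then two of the second, and so on until both tasks are exhausted.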
buffer¶
The maximum total number of pending results (across all tasks) in a NuMap
instance. Limits memory consumption. Must be at least equal to stride.
consume¶
A Piper parameter that specifies how many consecutive input items from
each upstream Piper are batched together into a single evaluation. Default
is 1. See Produce / Spawn / Consume.
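The batching described for consume can be sketched with itertools.islice. The helper batch is hypothetical and not part of PaPy; how an incomplete final batch is handled is not specified here, so the sketch simply yields whatever items remain.

```python
from itertools import islice

def batch(stream, consume):
    """Group consecutive items of a stream into tuples of length `consume`."""
    iterator = iter(stream)
    while True:
        chunk = tuple(islice(iterator, consume))
        if not chunk:
            return  # the stream is exhausted
        yield chunk

batches = list(batch(range(6), consume=2))
```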
produce¶
A Piper parameter that specifies how many output items are generated from
each single evaluation. The worker function must return a sequence of that
length. Default is 1. See Produce / Spawn / Consume.
spawn¶
A Piper parameter that creates multiple implicit copies of the same
Piper in the pipeline. Each copy processes a different slice of the
upstream output. Default is 1. See Produce / Spawn / Consume.
repeat¶
A Piper parameter. When True and produce > 1, the single return value
is repeated produce times instead of being iterated as a sequence.
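The difference between produce with and without repeat can be sketched as follows. The helper expand is hypothetical, and raising ValueError on a length mismatch is an assumption of this sketch rather than documented PaPy behaviour.

```python
def expand(result, produce, repeat=False):
    """Turn the result of one evaluation into `produce` output items."""
    if produce == 1:
        return [result]
    if repeat:
        # The single return value is repeated `produce` times.
        return [result] * produce
    # Otherwise the return value is iterated as a sequence of that length.
    items = list(result)
    if len(items) != produce:
        raise ValueError("expected %d items, got %d" % (produce, len(items)))
    return items

split_items = expand(('a', 'b'), produce=2)
repeated_items = expand('ab', produce=2, repeat=True)
```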
track¶
A Piper parameter. When True, the NuMap stores all results for this
Piper in memory. After the pipeline finishes, tracked results are available
in Plumber.stats['pipers_tracked'].
timeout¶
The number of seconds to wait for a result before returning a
PiperError(TimeoutError). Specified per-Piper. Should not be used on
chained tasks within a shared NuMap.
branch¶
An attribute of a Piper used to sort topologically equivalent branches in
the Dagger postorder. Downstream Pipers inherit the branch of their
upstream Piper.
debug¶
A Piper parameter. When True, exceptions are raised immediately instead
of being wrapped as PiperError. Useful during development, but will hang
the interpreter after an error occurs.
sub-interpreter¶
A Python execution environment within the same OS process but with its own
GIL (PEP 684). Available via worker_type='interpreter' on Python 3.14+.
Provides true CPU parallelism without the forking overhead of separate
processes.
@imports¶
A decorator for worker functions that attaches import statements. Required
for functions sent to remote RPyC workers. Example:
@imports(['numpy', 'scipy.stats']).
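To show the general idea of a decorator that attaches imports to a function, here is a minimal sketch. It is not the PaPy implementation: imports_sketch is a hypothetical stand-in, and injecting the top-level module into the function's globals is an assumption about one way such a decorator could work. Standard-library modules are used so the sketch runs without extra packages.

```python
import importlib

def imports_sketch(module_names):
    """Hypothetical stand-in for @imports, for illustration only."""
    def decorator(func):
        for name in module_names:
            importlib.import_module(name)  # load the (sub)module itself
            top = name.split('.')[0]
            # Make the top-level package name visible to the function.
            func.__globals__[top] = importlib.import_module(top)
        return func
    return decorator

@imports_sketch(['math', 'os.path'])
def hypotenuse(inbox):
    return math.hypot(inbox[0], inbox[1])

length = hypotenuse([3, 4])
```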
pipe¶
A directed connection between two Pipers in a Dagger. Data flows from
upstream to downstream along pipes. A pipe is the reverse of a graph edge
(edges represent dependency, pipes represent data flow).
postorder¶
A topological ordering of Pipers in a Dagger where all upstream
Pipers appear before their downstream dependents. Used to determine the
order of connection and startup.
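An ordering in which every upstream node precedes its dependents can be sketched with graphlib.TopologicalSorter from the Python standard library (Python 3.9+). This is not the Dagger's own method, the node names are hypothetical, and the branch-based sorting of equivalent branches is not modelled.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes it depends on (its upstream nodes).
dependencies = {
    'parser': {'reader'},
    'filter': {'parser'},
    'writer': {'filter'},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
```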