Known Issues¶
PaPy in the Interactive Interpreter¶
Parallel features of PaPy do not work in the interactive interpreter.
This is a limitation of the Python multiprocessing module.
This means that PaPy workflows can be freely created, manipulated, tested and run from within the interactive interpreter, as long as they do not use parallel or remote evaluation.
Code snippets, examples and use cases are not meant to be typed into the interactive interpreter console; they should be run from the command line.
The reason for this is that functions defined in the interactive interpreter
will not work (and will hang Python) if passed into NuMap instances! A
Python function can only be communicated to multiple processes if it can be
serialized, i.e. pickled. This is not possible for a function defined in the
interactive interpreter's namespace, because a child process (created using
the multiprocessing library) cannot import it from there.
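The underlying constraint can be seen with the stdlib alone: pickle serializes a function as a reference (module plus qualified name), not as code, so the receiving process must be able to import it. A minimal sketch, using a stdlib function purely as an example:

```python
import pickle
import math

# pickle records only where the function lives (module + qualified name),
# not its bytecode; unpickling re-imports it on the receiving side
payload = pickle.dumps(math.sqrt)
assert b'math' in payload and b'sqrt' in payload

f = pickle.loads(payload)
assert f(9.0) == 3.0
```

A function typed into the interactive interpreter lives only in that session's __main__, which a child process cannot re-import, so the lookup fails.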
Tip
Always wrap parallel code in if __name__ == '__main__': to avoid
issues with multiprocessing on Windows and macOS (spawn start method).
Object Picklability¶
Objects are submitted to worker-threads by means of queues, to
worker-processes by pipes and to remote processes using sockets. This requires
serialization, which is done internally by the pickle module. Additionally,
RPyC uses its own 'brine' format for serialization. The submitted objects include
functions, data, arguments and keyword arguments, all of which need to be
picklable!
Worker-methods are impossible¶
Class instances (i.e. Worker instances) are picklable only if all of their attributes are. Instance methods, on the other hand, are not picklable because they are not top-level module functions. Therefore methods, whether static, class, bound or unbound, will not work for a parallel piper.
Lambdas and closures¶
Lambda functions and closures (functions defined inside other functions)
cannot be pickled. They will work with worker_type='thread' but will fail
with worker_type='process'. Use top-level module functions instead:
# This will NOT work with process workers:
w = Worker(lambda inbox: inbox[0] * 2)

# This WILL work:
def double(inbox):
    return inbox[0] * 2

w = Worker(double)
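The restriction can be verified directly with the stdlib pickle module, independently of PaPy:

```python
import pickle

def make_adder(n):
    def add(x):  # a closure: defined inside another function
        return x + n
    return add

# neither a lambda nor a closure has an importable top-level name,
# so pickling either of them fails
for func in (lambda x: x * 2, make_adder(1)):
    try:
        pickle.dumps(func)
        picklable = True
    except (pickle.PicklingError, AttributeError):
        picklable = False
    assert not picklable
```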
File-handles make remotely no sense¶
Function arguments must be picklable, but file handles are not. It is
recommended that output pipers store data persistently; such output workers
should therefore be run locally and not use a parallel NuMap, which
circumvents the picklability requirement.
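This, too, is easy to confirm with the stdlib alone:

```python
import pickle
import tempfile

with tempfile.TemporaryFile() as fh:
    try:
        pickle.dumps(fh)   # file objects wrap OS-level state
        picklable = True
    except TypeError:      # e.g. "cannot pickle '_io.BufferedRandom' object"
        picklable = False

assert not picklable
```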
The @imports decorator¶
When using remote workers via RPyC, functions are injected into the remote
Python process. The remote process has its own namespace and does not share
the local process's imports. Use the @imports decorator to attach required
import statements:
from numap import imports

@imports(['numpy', 'os.path'])
def process(inbox):
    arr = numpy.array(inbox[0])
    return os.path.join('/output', str(arr.mean()))
Without @imports, remote workers will raise NameError for any module not
available in the remote namespace.
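The idea behind the decorator can be sketched in a few lines. The toy_imports function below is a made-up illustration of the mechanism (binding imported modules into the function's global namespace before it runs), not numap's actual implementation:

```python
import importlib

def toy_imports(names):
    # toy stand-in for @imports: import each listed module and bind its
    # top-level name into the decorated function's globals
    def decorate(func):
        for name in names:
            importlib.import_module(name)            # e.g. 'os.path'
            top = name.split('.')[0]                 # e.g. 'os'
            func.__globals__[top] = importlib.import_module(top)
        return func
    return decorate

@toy_imports(['os.path'])
def output_path(parts):
    # 'os' resolves because the decorator injected it into globals
    return os.path.join(*parts)
```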
Fallback imports¶
If a module might not be available, use comma-separated alternatives:
@imports(['ujson,json'])  # try ujson, fall back to stdlib json
def parse(inbox):
    return ujson.loads(inbox[0])
The forgive parameter¶
Set forgive=True to emit a warning instead of raising ImportError when
a module is not available:
@imports(['numpy'], forgive=True)  # a missing numpy warns instead of raising
def process(inbox):
    ...
Buffer deadlocks¶
A pipeline can deadlock if the NuMap buffer fills up and no results are
being consumed. Common scenarios:
Multiple output pipers, only one consumed¶
If your pipeline has two output Pipers and you only call next() on one,
the other's result queue fills the buffer and blocks all further task
submission.
Fix: always consume all output Pipers, or use Plumber.run() which
automatically consumes all outputs.
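The failure mode can be illustrated with a plain bounded queue standing in for the NuMap buffer (a self-contained sketch, not PaPy code):

```python
import queue

buf = queue.Queue(maxsize=2)   # stand-in for a NuMap buffer of size 2

# results for the unconsumed output piper pile up in the buffer
buf.put('result-1')
buf.put('result-2')

# the buffer is now full; without a consumer the next submission
# would block forever, which is the deadlock
try:
    buf.put('result-3', timeout=0.1)
    deadlocked = False
except queue.Full:
    deadlocked = True

assert deadlocked
```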
Buffer smaller than stride¶
If buffer < stride, the NuMap cannot submit a full stride of tasklets
before running out of buffer space. This causes a deadlock because the
stride must complete before results can be returned.
Fix: use the default buffer (auto-calculated) or ensure buffer >= stride.
Timeouts on chained tasks¶
If task B depends on the output of task A, and task A has a timeout, a timed-out result from A leaves task B waiting forever for input that will never arrive.
Fix: do not specify timeouts on upstream tasks within a shared NuMap.
Sub-interpreter limitations (Python 3.14+)¶
Functions must be importable¶
interp.call() requires that functions can be resolved by module and name.
Functions defined dynamically, in __main__, or as methods on class instances
may trigger NotShareableError. PaPy works around this for Worker instances
by unwrapping the task chain, but arbitrary callables may fall back to
main-interpreter execution (with a RuntimeWarning).
No remote execution¶
worker_type='interpreter' is incompatible with worker_remote. Remote
execution requires RPyC, which operates at the process level.
API maturity¶
The concurrent.interpreters module is new in Python 3.14. Its API may
evolve in future Python versions. PaPy guards interpreter support behind
HASINTERP and ImportError to ensure graceful degradation.
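The guard described above amounts to a standard feature-detection pattern. A sketch of the idea (not PaPy's exact source):

```python
# feature-detect concurrent.interpreters (Python 3.14+) at import time
# and degrade gracefully instead of failing on older versions
try:
    from concurrent import interpreters  # noqa: F401
    HASINTERP = True
except ImportError:
    interpreters = None
    HASINTERP = False

# callers branch on HASINTERP rather than importing unconditionally;
# the worker_type values here mirror those named in this document
worker_types = ['thread', 'process'] + (['interpreter'] if HASINTERP else [])
```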