Known Issues¶
PaPy in the Interactive Interpreter¶
Parallel features of PaPy do not work in the interactive interpreter.
This is a limitation of the Python multiprocessing module.
This means that PaPy workflows can be freely created, manipulated, tested and run from within the interactive interpreter, as long as they do not use parallel or remote evaluation.
Code snippets, examples and use cases are not meant to be typed into the interactive interpreter console; they should be run from the command line.
The reason for this is that functions defined in the interactive interpreter
will not work (and will hang Python) if passed into NuMap instances! A
Python function can only be communicated to multiple processes if it can be
serialized, i.e. pickled. This is not possible for a function defined in the
interactive interpreter's namespace, because a child process (created using
the multiprocessing library) cannot import it from there.
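The underlying constraint can be seen with the stdlib alone: pickle serializes a function as a reference (module plus qualified name), not as code, so the receiving process must be able to import it. A minimal sketch, using a stdlib function purely as an example:

```python
import pickle
import math

# pickle records only where the function lives (module + qualified name),
# not its bytecode; unpickling re-imports it on the receiving side
payload = pickle.dumps(math.sqrt)
assert b'math' in payload and b'sqrt' in payload

f = pickle.loads(payload)
assert f(9.0) == 3.0
```

A function typed into the interactive interpreter lives only in that session's __main__, which a child process cannot re-import, so the lookup fails.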
Tip
Always wrap parallel code in if __name__ == '__main__': to avoid
issues with multiprocessing on Windows and macOS (spawn start method).
Object Picklability¶
Objects are submitted to worker-threads by means of queues, to
worker-processes by pipes and to remote processes using sockets. This requires
serialization, which is done internally by the pickle module. Additionally,
RPyC uses its own 'brine' format for serialization. The submitted objects include
functions, data, arguments and keyword arguments, all of which need to be
picklable!
Worker-methods are impossible¶
Class instances (i.e. Worker instances) are picklable only if all of their attributes are. Instance methods, on the other hand, are not picklable because they are not top-level module functions. Therefore methods, whether static, class, bound or unbound, will not work for a parallel piper.
Lambdas and closures¶
Lambda functions and closures (functions defined inside other functions)
cannot be pickled. They will work with worker_type='thread' but will fail
with worker_type='process'. Use top-level module functions instead:
# This will NOT work with process workers:
w = Worker(lambda inbox: inbox[0] * 2)

# This WILL work:
def double(inbox):
    return inbox[0] * 2

w = Worker(double)
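The restriction can be verified directly with the stdlib pickle module, independently of PaPy:

```python
import pickle

def make_adder(n):
    def add(x):  # a closure: defined inside another function
        return x + n
    return add

# neither a lambda nor a closure has an importable top-level name,
# so pickling either of them fails
for func in (lambda x: x * 2, make_adder(1)):
    try:
        pickle.dumps(func)
        picklable = True
    except (pickle.PicklingError, AttributeError):
        picklable = False
    assert not picklable
```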
File-handles make remotely no sense¶
Function arguments must be picklable, but file handles are not. It is
recommended that output pipers store data persistently; such output workers
should therefore be run locally and not use a parallel NuMap, which
circumvents the picklability requirement.
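This, too, is easy to confirm with the stdlib alone:

```python
import pickle
import tempfile

with tempfile.TemporaryFile() as fh:
    try:
        pickle.dumps(fh)   # file objects wrap OS-level state
        picklable = True
    except TypeError:      # e.g. "cannot pickle '_io.BufferedRandom' object"
        picklable = False

assert not picklable
```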
The @imports decorator¶
When using remote workers via RPyC, functions are injected into the remote
Python process. The remote process has its own namespace and does not share
the local process's imports. Use the @imports decorator to attach required
import statements:
from numap import imports

@imports(['numpy', 'os.path'])
def process(inbox):
    arr = numpy.array(inbox[0])
    return os.path.join('/output', str(arr.mean()))
Without @imports, remote workers will raise NameError for any module not
available in the remote namespace.
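The idea behind the decorator can be sketched in a few lines. The toy_imports function below is a made-up illustration of the mechanism (binding imported modules into the function's global namespace before it runs), not numap's actual implementation:

```python
import importlib

def toy_imports(names):
    # toy stand-in for @imports: import each listed module and bind its
    # top-level name into the decorated function's globals
    def decorate(func):
        for name in names:
            importlib.import_module(name)            # e.g. 'os.path'
            top = name.split('.')[0]                 # e.g. 'os'
            func.__globals__[top] = importlib.import_module(top)
        return func
    return decorate

@toy_imports(['os.path'])
def output_path(parts):
    # 'os' resolves because the decorator injected it into globals
    return os.path.join(*parts)
```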
Fallback imports¶
If a module might not be available, use comma-separated alternatives:
@imports(['ujson,json'])  # try ujson, fall back to stdlib json
def parse(inbox):
    return ujson.loads(inbox[0])
The forgive parameter¶
Set forgive=True to emit a warning instead of raising ImportError when
a module is not available:
@imports(['numpy'], forgive=True)  # a missing numpy warns instead of raising
def process(inbox):
    ...
Buffer deadlocks¶
A pipeline can deadlock if the NuMap buffer fills up and no results are
being consumed. Common scenarios:
Multiple output pipers, only one consumed¶
If your pipeline has two output Pipers and you only call next() on one,
the other's result queue fills the buffer and blocks all further task
submission.
Fix: always consume all output Pipers, or use Plumber.run() which
automatically consumes all outputs.
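The failure mode can be illustrated with a plain bounded queue standing in for the NuMap buffer (a self-contained sketch, not PaPy code):

```python
import queue

buf = queue.Queue(maxsize=2)   # stand-in for a NuMap buffer of size 2

# results for the unconsumed output piper pile up in the buffer
buf.put('result-1')
buf.put('result-2')

# the buffer is now full; without a consumer the next submission
# would block forever, which is the deadlock
try:
    buf.put('result-3', timeout=0.1)
    deadlocked = False
except queue.Full:
    deadlocked = True

assert deadlocked
```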
Buffer smaller than stride¶
If buffer < stride, the NuMap cannot submit a full stride of tasklets
before running out of buffer space. This causes a deadlock because the
stride must complete before results can be returned.
Fix: use the default buffer (auto-calculated) or ensure buffer >= stride.
Timeouts on chained tasks¶
If task B depends on the output of task A, and task A has a timeout, a timed-out result from A leaves task B waiting forever for input that will never arrive.
Fix: do not specify timeouts on upstream tasks within a shared NuMap.
Sub-interpreter limitations (Python 3.14+)¶
Functions must be importable¶
interp.call() requires that functions can be resolved by module and name.
Functions defined dynamically, in __main__, or as methods on class instances
may trigger NotShareableError. PaPy works around this for Worker instances
by unwrapping the task chain, but arbitrary callables may fall back to
main-interpreter execution (with a RuntimeWarning).
No remote execution¶
worker_type='interpreter' is incompatible with worker_remote. Remote
execution requires RPyC, which operates at the process level.
API maturity¶
The concurrent.interpreters module is new in Python 3.14. Its API may
evolve in future Python versions. PaPy guards interpreter support behind
HASINTERP and ImportError to ensure graceful degradation.
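The guard described above amounts to a standard feature-detection pattern. A sketch of the idea (not PaPy's exact source):

```python
# feature-detect concurrent.interpreters (Python 3.14+) at import time
# and degrade gracefully instead of failing on older versions
try:
    from concurrent import interpreters  # noqa: F401
    HASINTERP = True
except ImportError:
    interpreters = None
    HASINTERP = False

# callers branch on HASINTERP rather than importing unconditionally;
# the worker_type values here mirror those named in this document
worker_types = ['thread', 'process'] + (['interpreter'] if HASINTERP else [])
```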