pycoral.pipeline

pycoral.pipeline.pipelined_model_runner

The pipeline API allows you to run a segmented model across multiple Edge TPUs.

For more information, see Pipeline a model with multiple Edge TPUs.

class pycoral.pipeline.pipelined_model_runner.PipelinedModelRunner(interpreters)[source]

Manages the model pipeline.

To create an instance:

import tflite_runtime.interpreter as tflite
from pycoral.pipeline.pipelined_model_runner import PipelinedModelRunner

# Create one interpreter per model segment, each with its own Edge TPU
# delegate, then hand the ordered list to the pipeline runner.
interpreter_a = tflite.Interpreter(model_path=model_segment_a,
                                   experimental_delegates=[delegate_a])
interpreter_a.allocate_tensors()
interpreter_b = tflite.Interpreter(model_path=model_segment_b,
                                   experimental_delegates=[delegate_b])
interpreter_b.allocate_tensors()
interpreters = [interpreter_a, interpreter_b]
runner = PipelinedModelRunner(interpreters)

Be sure you first call allocate_tensors() on each interpreter.
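The delegate_a and delegate_b values above are Edge TPU delegates, one per device. As a minimal sketch, assuming a Linux host (where the Edge TPU runtime library is libedgetpu.so.1) and two attached Edge TPUs, they can be created with load_delegate() from the tflite module imported above:

# Assumed setup: bind each segment's delegate to a distinct Edge TPU device.
delegate_a = tflite.load_delegate('libedgetpu.so.1', options={'device': ':0'})
delegate_b = tflite.load_delegate('libedgetpu.so.1', options={'device': ':1'})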

Parameters

interpreters – A list of tf.lite.Interpreter objects, one for each segment in the pipeline, in execution order.

interpreters()[source]

Returns the list of interpreters used to construct this PipelinedModelRunner.

pop()[source]

Returns a single inference result.

This function blocks the calling thread until a result is returned.

Returns

A dictionary mapping each output tensor name (str) to its value (a numpy.ndarray), representing the model’s output tensors. Returns None once an empty dict has been pushed via push() and all remaining results have been consumed, indicating that no more output tensors are available.

Raises

RuntimeError – If an error occurs while retrieving the pipelined model’s inference results.
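For example, a dedicated consumer thread can drain results until pop() returns None (a minimal sketch, assuming runner is the PipelinedModelRunner created above):

import threading

def consume():
    while True:
        result = runner.pop()  # Blocks until a result is available.
        if result is None:     # Empty dict was pushed and the queue is drained.
            break
        for name, tensor in result.items():
            print(name, tensor.shape)

consumer = threading.Thread(target=consume)
consumer.start()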

push(input_tensors)[source]

Pushes input tensors to trigger inference.

Pushing an empty dict is allowed; it signals that no more inputs will be added, and any push() made after this special push fails. This special push allows the pop() consumer to properly drain unconsumed output tensors.

The caller is blocked if the current input queue size exceeds the maximum set with set_input_queue_size(). By default, the input queue size is unlimited, in which case push() never blocks.

Parameters

input_tensors – A dictionary mapping each input tensor name (str) to its value (a numpy.ndarray), representing the model’s input tensors.

Raises

RuntimeError – If an error occurs while pushing the pipelined model inference request.
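For example, a producer can key each input dict by the first segment’s input tensor name and finish with the special empty push (a sketch, assuming frames is an iterable of preprocessed numpy arrays and consumer is the consumer thread from the pop() example above):

# The input dict is keyed by the first segment's input tensor name.
name = runner.interpreters()[0].get_input_details()[0]['name']

for frame in frames:
    runner.push({name: frame})

runner.push({})    # Signal that no more inputs will be added.
consumer.join()    # Let the consumer thread finish draining results.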

set_input_queue_size(size)[source]

Sets the maximum number of inputs that may be queued for inference.

By default, input queue size is unlimited.

Note: You may change the maximum queue size while the PipelinedModelRunner is active. If the new maximum is smaller than the current queue size, calls to push() block until the queue size drops below the new maximum.

Parameters

size (int) – The maximum input queue size.

set_output_queue_size(size)[source]

Sets the maximum number of outputs that may be unconsumed.

By default, output queue size is unlimited.

Note: You may change the maximum queue size while the PipelinedModelRunner is active. If the new maximum is smaller than the current queue size, the pipeline blocks before adding new results until the queue size drops below the new maximum.

Parameters

size (int) – The maximum output queue size.
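For example, bounding both queues caps the pipeline’s memory footprint by applying backpressure (a sketch; the size of 4 is arbitrary):

runner.set_input_queue_size(4)   # push() blocks once 4 inputs are queued.
runner.set_output_queue_size(4)  # Pipeline stalls once 4 results are unconsumed.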

API version 2.0