Run multiple models with multiple Edge TPUs

The Edge TPU includes a small amount of RAM that's used to store the model's parameter data locally, enabling faster inference speed compared to fetching the data from external memory. Typically, this means performance is best when running just one model per Edge TPU, because running a second model requires swapping the model's parameter data in RAM, which slows down the entire pipeline. One solution is to simply run each model on a different Edge TPU, as described on this page.

Alternatively, you might reduce the overhead cost of swapping parameter data by co-compiling your models. Co-compiling allows the Edge TPU to store the parameter data for multiple models in RAM together, which means it typically works well only for small models. To learn more about this option, read about parameter data caching and co-compiling. Otherwise, keep reading here if you want to distribute multiple models across multiple Edge TPUs.

Performance considerations

Before you add more Edge TPUs in your system, consider the following possible performance issues:

  • Python threads cannot run CPU-bound operations in parallel (read about the Python global interpreter lock (GIL)). However, we have optimized the Edge TPU Python API (but not the TensorFlow Lite Python API) to work within Python's multi-threading environment for all Edge TPU operations: because they are I/O-bound, running them from multiple threads can improve performance (see the threading sketch after this list). But beware that CPU-bound operations such as image downscaling will likely slow down the pipeline when you run multiple models, because those operations cannot run in parallel across Python threads.

  • When using multiple USB Accelerators, your inference speed will eventually be bottlenecked by the host USB bus’s speed, especially when running large models.

  • If you connect multiple USB Accelerators through a USB hub, be sure that each USB port can provide at least 500mA when using the default operating frequency or 900mA when using the maximum frequency (refer to the USB Accelerator performance settings). Otherwise, the device might not be able to draw enough power to function properly.

  • If you use an external USB hub, connect the USB Accelerators to the primary ports only. Some USB hubs include sub-hubs with secondary ports that are not compatible—our API cannot establish an Edge TPU context on these ports. For example, if you type lsusb -t, you should see ports printed as shown below. The first two usbfs ports shown will work fine, but the last one will not.

    /:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/7p, 5000M
        |__ Port 3: Dev 36, If 0, Class=Hub, Driver=hub/4p, 5000M
            |__ Port 1: Dev 51, If 0, Class=Vendor Specific Class, Driver=usbfs, 5000M  # WORKS
            |__ Port 2: Dev 40, If 0, Class=Hub, Driver=hub/4p, 5000M
                |__ Port 1: Dev 41, If 0, Class=Vendor Specific Class, Driver=usbfs, 5000M  # WORKS
                |__ Port 2: Dev 39, If 0, Class=Vendor Specific Class, Driver=usbfs, 5000M  # DOESN'T WORK
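
For example, here's a minimal sketch of that threading pattern, using ClassificationEngine from the Edge TPU Python API described later on this page (the model and image file names are hypothetical). The two inference loops can overlap because the Edge TPU operations are I/O-bound, while the image decoding and resizing is CPU-bound and still runs one thread at a time:

import threading

from edgetpu.classification.engine import ClassificationEngine
from PIL import Image

def classify_loop(engine, image_path, results, key):
    # CPU-bound work (decoding/resizing the image) still serializes on the GIL
    image = Image.open(image_path)
    for _ in range(100):
        results[key] = engine.classify_with_image(image, top_k=1)

# Each engine is automatically assigned to a different Edge TPU
engine_a = ClassificationEngine('model_a_edgetpu.tflite')
engine_b = ClassificationEngine('model_b_edgetpu.tflite')

results = {}
threads = [
    threading.Thread(target=classify_loop, args=(engine_a, 'image_a.jpg', results, 'a')),
    threading.Thread(target=classify_loop, args=(engine_b, 'image_b.jpg', results, 'b')),
]
for t in threads:
    t.start()
for t in threads:
    t.join()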

Using the TensorFlow Lite Python API

If you're using the TensorFlow Lite Python API to run inference and you have multiple Edge TPUs, you can specify which Edge TPU each Interpreter should use via the load_delegate() function.

Simply pass load_delegate() a dictionary with one entry, "device", specifying the Edge TPU device you want to use. It accepts one of the following values:

  • "usb": Use the default USB-connected Edge TPU.
  • "usb:<index>": Use the USB-connected Edge TPU indicated by the enumerated device index.
  • "pci": Use the default PCIe-connected Edge TPU.
  • "pci:<index>": Use the PCIe-connected Edge TPU indicated by the enumerated device index.

For example, if you have two USB Accelerators attached, you can create two Interpreter objects, each assigned to a different USB Accelerator, as follows:

from tflite_runtime.interpreter import Interpreter, load_delegate

# Assign each model to its own USB-connected Edge TPU by device index
interpreter_1 = Interpreter(model_1_path,
    experimental_delegates=[load_delegate('libedgetpu.so.1', {"device": "usb:0"})])

interpreter_2 = Interpreter(model_2_path,
    experimental_delegates=[load_delegate('libedgetpu.so.1', {"device": "usb:1"})])

If you don't specify separate Edge TPUs this way, then both models execute on the same Edge TPU, which is slower, as described in the introduction above.
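
Once created, each interpreter runs its model on its assigned Edge TPU through the standard TensorFlow Lite workflow. Here's a minimal sketch that runs the first interpreter with a dummy input (in practice you'd feed real, preprocessed data):

import numpy as np

# Allocate tensors for both interpreters (each is bound to its own Edge TPU)
for interpreter in (interpreter_1, interpreter_2):
    interpreter.allocate_tensors()

# Build a dummy input that matches model 1's input shape and dtype
input_details = interpreter_1.get_input_details()[0]
dummy_input = np.zeros(input_details['shape'], dtype=input_details['dtype'])

interpreter_1.set_tensor(input_details['index'], dummy_input)
interpreter_1.invoke()
output = interpreter_1.get_tensor(interpreter_1.get_output_details()[0]['index'])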

Using the TensorFlow Lite C++ API

If you're using the TensorFlow Lite C++ API to run inference and you have multiple Edge TPUs, you can specify which Edge TPU each Interpreter should use when you create the EdgeTpuContext via EdgeTpuManager::OpenDevice().

The OpenDevice() method includes a parameter for device_type, which accepts one of two values:

  • DeviceType::kApexUsb: Use the default USB-connected Edge TPU.
  • DeviceType::kApexPci: Use the default PCIe-connected Edge TPU.

If you have multiple Edge TPUs of the same type, then you must also specify the second parameter, device_path. To get the specific device path for each available Edge TPU, call EdgeTpuManager::EnumerateEdgeTpu().

If you don't specify separate Edge TPUs this way, then both models execute on the same Edge TPU, which is slower, as described in the introduction above.

For an example, see two_models_two_tpus_threaded.cc.

Also see the API details in edgetpu.h.

Using the Edge TPU Python API

If you're using the Edge TPU Python API to run inference and you have multiple Edge TPUs, the Edge TPU API automatically assigns each inference engine (such as ClassificationEngine and DetectionEngine) to a different Edge TPU. So you don't need to write any extra code if you have an equal number of inference engines and Edge TPUs—unlike the TensorFlow Lite API above.

For example, if you have two Edge TPUs and two models, you can run each model on separate Edge TPUs by simply creating the inference engines as usual:

from edgetpu.classification.engine import ClassificationEngine
from edgetpu.detection.engine import DetectionEngine

# Each engine is automatically assigned to a different Edge TPU
engine_a = ClassificationEngine(classification_model)
engine_b = DetectionEngine(detection_model)

Then they'll automatically run on separate Edge TPUs.

If you have just one Edge TPU, then this code still works and they both use the same Edge TPU.

However, if you have multiple Edge TPUs (N) and you have N + 1 (or more) models, then you must specify which Edge TPU to use for each additional inference engine. Otherwise, you'll receive an error that says your engine does not map to an Edge TPU device.

For example, if you have two Edge TPUs and three models, you must set the third engine to run on the same Edge TPU as one of the others (you decide which). The following code shows how you can do this for engine_c by specifying the device_path argument to be the same device used by engine_b:

# The third engine is purposely assigned to the same Edge TPU as the second
engine_a = ClassificationEngine(classification_model)
engine_b = DetectionEngine(detection_model)
engine_c = DetectionEngine(other_detection_model, engine_b.device_path())

You can also get a list of available Edge TPU device paths from ListEdgeTpuPaths().
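
For example, here's a minimal sketch that lists the unassigned devices and pins an engine to a specific one (the module and constant names assume the Edge TPU Python library's edgetpu.basic.edgetpu_utils module; the model path is hypothetical):

from edgetpu.basic import edgetpu_utils
from edgetpu.classification.engine import ClassificationEngine

# All Edge TPUs that have not yet been assigned to an inference engine
device_paths = edgetpu_utils.ListEdgeTpuPaths(edgetpu_utils.EDGE_TPU_STATE_UNASSIGNED)
print(device_paths)

# Explicitly bind an engine to the first available device
engine = ClassificationEngine('model_edgetpu.tflite', device_paths[0])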

For example code, see two_models_inference.py.

Note: All Edge TPUs connected over USB are treated equally; there's no prioritization when distributing the models. But if you attach a USB Accelerator to a Dev Board, the system always prefers the on-board (PCIe) Edge TPU before using the USB devices.