Run multiple models with multiple Edge TPUs
The Edge TPU includes a small amount of RAM that's used to store the model's parameter data locally, enabling faster inference speed compared to fetching the data from external memory. Typically, this means performance is best when running just one model per Edge TPU, because running a second model requires swapping the model's parameter data in RAM, which slows down the entire pipeline. One solution is to simply run each model on a different Edge TPU, as described on this page.
Alternatively, you might reduce the overhead cost of swapping parameter data by co-compiling your models. Co-compiling allows the Edge TPU to store the parameter data for multiple models in RAM together, which means it typically works well only for small models. To learn more about this option, read about parameter data caching and co-compiling. Otherwise, keep reading here if you want to distribute multiple models across multiple Edge TPUs.
Performance considerations
Before you add more Edge TPUs to your system, consider the following possible performance issues:
- Python does not support real multi-threading for CPU-bound operations (read about the Python global interpreter lock (GIL)). However, we have optimized the Edge TPU Python API (but not the TensorFlow Lite Python API) to work within Python's multi-threading environment for all Edge TPU operations, which are I/O-bound, so multi-threading can still improve performance. But beware that CPU-bound operations such as image downscaling will probably incur a performance impact when you run multiple models, because these operations cannot be multi-threaded in Python (see the threaded sketch after this list).
- When using multiple USB Accelerators, your inference speed will eventually be bottlenecked by the host USB bus's speed, especially when running large models.
- If you connect multiple USB Accelerators through a USB hub, be sure that each USB port can provide at least 500 mA when using the reduced operating frequency or 900 mA when using the maximum frequency (refer to the USB Accelerator performance settings). Otherwise, the device might not be able to draw enough power to function properly.
- If you use an external USB hub, connect the USB Accelerator to the primary ports only. Some USB hubs include sub-hubs with secondary ports that are not compatible; our API cannot establish an Edge TPU context on these ports. For example, if you type lsusb -t, you should see ports printed as shown below. The first two USB ports (usbfs) will work fine, but the last one will not:

/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/7p, 5000M
    |__ Port 3: Dev 36, If 0, Class=Hub, Driver=hub/4p, 5000M
        |__ Port 1: Dev 51, If 0, Class=Vendor Specific Class, Driver=usbfs, 5000M   # WORKS
        |__ Port 2: Dev 40, If 0, Class=Hub, Driver=hub/4p, 5000M
            |__ Port 1: Dev 41, If 0, Class=Vendor Specific Class, Driver=usbfs, 5000M   # WORKS
            |__ Port 2: Dev 39, If 0, Class=Vendor Specific Class, Driver=usbfs, 5000M   # DOESN'T WORK
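For example, the following is a minimal sketch of running two models on two Edge TPUs from separate Python threads, using the PyCoral API described below. The model paths, device indices, and dummy uint8 image input are placeholders for illustration only; adapt them to your own models and preprocessing:

import threading

import numpy as np
from pycoral.adapters import common
from pycoral.utils.edgetpu import make_interpreter

def run_inference(model_path, device, num_inferences=100):
    # Each thread gets its own interpreter, bound to its own Edge TPU.
    interpreter = make_interpreter(model_path, device=device)
    interpreter.allocate_tensors()
    # Feed dummy data sized to the model's input; real code would do image
    # preprocessing here (which is CPU-bound and thus limited by the GIL).
    width, height = common.input_size(interpreter)
    common.set_input(interpreter, np.zeros((height, width, 3), dtype=np.uint8))
    for _ in range(num_inferences):
        interpreter.invoke()

# 'model_a.tflite' and 'model_b.tflite' are placeholder paths.
threads = [
    threading.Thread(target=run_inference, args=('model_a.tflite', ':0')),
    threading.Thread(target=run_inference, args=('model_b.tflite', ':1')),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

Because the invoke() calls are I/O-bound on the Edge TPU, the two threads can overlap; any CPU-bound work you add inside the loop will not.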
Using the PyCoral API
If you have multiple Edge TPUs and you want to run a specific model on each one, you must specify which device to use with each model. When using the PyCoral Python API, you just need to specify the device argument in make_interpreter().

The device argument takes a string to indicate the device index position or the device type (USB or PCIe), or a combination of both. For example, this is how you can ensure each Interpreter is using a different Edge TPU (regardless of type):
# Use the first enumerated Edge TPU
interpreter_1 = make_interpreter(model_1_path, device=':0')
# Use the second enumerated Edge TPU
interpreter_2 = make_interpreter(model_2_path, device=':1')
Or if you want to specify USB vs PCIe device types, you can do the following:
# Use the first USB-based Edge TPU
interpreter_usb1 = make_interpreter(model_1_path, device='usb:0')
# Use the first PCIe-based Edge TPU
interpreter_pcie1 = make_interpreter(model_2_path, device='pci:0')
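If you're not sure how many Edge TPUs are attached or which index refers to which device, you can enumerate them first. Here's a brief sketch using list_edge_tpus() from pycoral.utils.edgetpu (the model paths are placeholders):

from pycoral.utils.edgetpu import list_edge_tpus, make_interpreter

# Print a record for each Edge TPU the runtime can see (device type and path).
for index, tpu in enumerate(list_edge_tpus()):
    print('Edge TPU %d: %s' % (index, tpu))

# Then bind one model per enumerated device.
interpreter_1 = make_interpreter('model_a.tflite', device=':0')
interpreter_2 = make_interpreter('model_b.tflite', device=':1')
interpreter_1.allocate_tensors()
interpreter_2.allocate_tensors()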
For more details, see the make_interpreter() documentation. Also check out the two_models_inference.py example.
Using the TensorFlow Lite Python API
If you have multiple Edge TPUs and you want to run a specific model on each one, you must specify which device to use with each model. When using the TensorFlow Lite Python API, you can do so with the options argument in load_delegate().
The options argument takes a dictionary and you need just one entry, "device", to specify the Edge TPU you want to use. Accepted values are the following:

- ":<N>": Use the N-th Edge TPU
- "usb": Use any USB Edge TPU
- "usb:<N>": Use the N-th USB Edge TPU
- "pci": Use any PCIe Edge TPU
- "pci:<N>": Use the N-th PCIe Edge TPU
For example, this is how you can ensure each Interpreter is using a different Edge TPU (regardless of type):
# Use the first enumerated Edge TPU
interpreter_1 = Interpreter(model_1_path,
    experimental_delegates=[load_delegate('libedgetpu.so.1', options={"device": ":0"})])
# Use the second enumerated Edge TPU
interpreter_2 = Interpreter(model_2_path,
    experimental_delegates=[load_delegate('libedgetpu.so.1', options={"device": ":1"})])
Or if you want to specify USB vs PCIe device types, you can do the following:
# Use the first USB-based Edge TPU
interpreter_usb1 = Interpreter(model_1_path,
    experimental_delegates=[load_delegate('libedgetpu.so.1', options={"device": "usb:0"})])
# Use the first PCIe-based Edge TPU
interpreter_pci1 = Interpreter(model_2_path,
    experimental_delegates=[load_delegate('libedgetpu.so.1', options={"device": "pci:0"})])
Note: If you're not running Linux, your delegate filename (libedgetpu.so.1) will be different (see how to add the delegate).
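For example, here's one way to pick the delegate filename based on the host platform. This is only a sketch that assumes the tflite_runtime package and the delegate library names listed in that guide; if you use the full TensorFlow package instead, create the interpreter with tf.lite.Interpreter and tf.lite.experimental.load_delegate:

import platform

from tflite_runtime.interpreter import Interpreter, load_delegate

# Edge TPU delegate library name for each platform (per the delegate guide).
EDGETPU_SHARED_LIB = {
    'Linux': 'libedgetpu.so.1',
    'Darwin': 'libedgetpu.1.dylib',
    'Windows': 'edgetpu.dll',
}[platform.system()]

# model_1_path is the same placeholder used in the examples above.
interpreter_1 = Interpreter(model_1_path,
    experimental_delegates=[load_delegate(EDGETPU_SHARED_LIB, options={"device": ":0"})])
interpreter_1.allocate_tensors()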
Using the TensorFlow Lite C++ API
If you're using the TensorFlow Lite C++ API to run inference and you have multiple Edge TPUs, you can specify which Edge TPU each Interpreter should use when you create the EdgeTpuContext via EdgeTpuManager::OpenDevice().
The OpenDevice() method includes a parameter for device_type, which accepts one of two values:

- DeviceType.kApexUsb: Use the default USB-connected Edge TPU.
- DeviceType.kApexPci: Use the default PCIe-connected Edge TPU.
If you have multiple Edge TPUs of the same type, then you must specify the second parameter, device_path. To get the specific device path for each available Edge TPU, call EdgeTpuManager::EnumerateEdgeTpu().
For an example, see two_models_two_tpus_threaded.cc.
Using the Edge TPU Python API (deprecated)
If you're using the Edge TPU Python API to run inference and you have multiple Edge TPUs, the Edge TPU API automatically assigns each inference engine (such as ClassificationEngine and DetectionEngine) to a different Edge TPU. So you don't need to write any extra code if you have an equal number of inference engines and Edge TPUs, unlike the TensorFlow Lite API above.
For example, if you have two Edge TPUs and two models, you can run each model on separate Edge TPUs by simply creating the inference engines as usual:
# Each engine is automatically assigned to a different Edge TPU
engine_a = ClassificationEngine(classification_model)
engine_b = DetectionEngine(detection_model)
Then they'll automatically run on separate Edge TPUs.
If you have just one Edge TPU, then this code still works and they both use the same Edge TPU.
However, if you have multiple Edge TPUs (N) and you have N + 1 (or more) models, then you must specify which Edge TPU to use for each additional inference engine. Otherwise, you'll receive an error that says your engine does not map to an Edge TPU device.
For example, if you have two Edge TPUs and three models, you must set the third engine to run on the same Edge TPU as one of the others (you decide which). The following code shows how you can do this for engine_c by specifying the device_path argument to be the same device used by engine_b:
# engine_c is purposely assigned to the same Edge TPU used by engine_b
engine_a = ClassificationEngine(classification_model)
engine_b = DetectionEngine(detection_model)
engine_c = DetectionEngine(other_detection_model, engine_b.device_path())
For example code, see two_models_inference.py.