Edge TPU inferencing overview

If you already know how to run an inference with TensorFlow Lite, then running your model on the Edge TPU requires only a few lines of new code in Python or C/C++.

You can run your model on the Edge TPU directly from the TensorFlow Lite APIs. Once you've made a few code changes, running an inference with a model that's compiled for the Edge TPU causes TensorFlow Lite to delegate the compiled operations to the Edge TPU instead of running them on the host CPU.

Note: The Edge TPU is compatible with TensorFlow Lite models only. For details about how to create a model for the Edge TPU, read TensorFlow models on the Edge TPU.

With a compatible model in hand, you can perform inferencing on the Edge TPU using one of three API options. Each API and the software it requires are described below and illustrated in figure 1.

Figure 1. The three options for inferencing and the corresponding software dependencies

  1. Use the TensorFlow Lite Python API:

    This is the standard Python API for running TensorFlow Lite models. With just a few lines of code, you can make existing TensorFlow Lite code run on the Edge TPU.

    All you need is the TensorFlow Lite Python API and the Edge TPU runtime.

    For details, read Run inference with TensorFlow Lite in Python, or see the first sketch following this list.

  2. Use the Edge TPU Python API:

    This is a Python library we built on top of the TensorFlow Lite C++ API. It's helpful if you don't have experience with the TensorFlow Lite API and just want to perform image classification or object detection, because it abstracts away the code required to prepare input tensors and parse the results. It also provides unique APIs to perform accelerated transfer learning for classification models on the Edge TPU.

    All you need is the Edge TPU Python API and the Edge TPU runtime.

    For details, read the Edge TPU Python API overview, or see the second sketch following this list.

  3. Use C or C++ APIs:

    This option uses the standard C/C++ APIs for running TensorFlow Lite models. With just a few lines of code, you can make existing TensorFlow Lite code run on the Edge TPU.

    For this, you need the Edge TPU runtime API (edgetpu_c.h or edgetpu.h), the Edge TPU runtime, and the compiled TensorFlow Lite C++ library.

    For details, read Run inference with TensorFlow Lite in C++, or see the C++ sketch following this list.

    Optionally, you can also use our Coral C++ API, which adds features such as pipelining a model with multiple Edge TPUs.
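
The following is a minimal sketch of option 1, using the TensorFlow Lite Python API with the Edge TPU delegate. It assumes a quantized (uint8) image-classification model; the model and image filenames are placeholders, and libedgetpu.so.1 is the delegate's filename on Linux (it differs on macOS and Windows).

```python
import numpy as np
from PIL import Image
import tflite_runtime.interpreter as tflite

# Load the model, delegating the Edge TPU-compiled ops to the Edge TPU
# runtime instead of the host CPU.
interpreter = tflite.Interpreter(
    model_path='model_edgetpu.tflite',  # placeholder path
    experimental_delegates=[tflite.load_delegate('libedgetpu.so.1')])
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Resize the image to the model's expected input shape (NHWC, uint8).
_, height, width, _ = input_details[0]['shape']
image = Image.open('image.jpg').resize((width, height))  # placeholder image
interpreter.set_tensor(input_details[0]['index'],
                       np.expand_dims(np.asarray(image, dtype=np.uint8), 0))

# Run the inference and read back the class scores.
interpreter.invoke()
scores = interpreter.get_tensor(output_details[0]['index'])[0]
print('Top class:', int(np.argmax(scores)))
```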
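
Here's a comparable sketch of option 2, using the Edge TPU Python API's ClassificationEngine. Notice there's no tensor handling: the engine prepares the input and parses the results for you. The file paths are again placeholders.

```python
from PIL import Image
from edgetpu.classification.engine import ClassificationEngine

# The engine wraps the TensorFlow Lite interpreter and manages the
# input/output tensors internally.
engine = ClassificationEngine('model_edgetpu.tflite')  # placeholder path
image = Image.open('image.jpg')  # placeholder image

# Each result is a [label_id, confidence] pair for a top-scoring class.
for label_id, score in engine.classify_with_image(image, top_k=3):
    print(label_id, score)
```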
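
Finally, a sketch of option 3, using edgetpu.h with the TensorFlow Lite C++ API. It follows the pattern of registering the Edge TPU custom op and attaching an Edge TPU device context to the interpreter; error handling and input population are omitted, and the model path is a placeholder.

```cpp
#include <memory>

#include "edgetpu.h"
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main() {
  // Load the model compiled for the Edge TPU (placeholder path).
  auto model =
      tflite::FlatBufferModel::BuildFromFile("model_edgetpu.tflite");

  // Register the Edge TPU custom op so the interpreter can resolve it.
  tflite::ops::builtin::BuiltinOpResolver resolver;
  resolver.AddCustom(edgetpu::kCustomOp, edgetpu::RegisterCustomOp());

  // Open an Edge TPU device and bind its context to the interpreter.
  std::shared_ptr<edgetpu::EdgeTpuContext> context =
      edgetpu::EdgeTpuManager::GetSingleton()->OpenDevice();
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);
  interpreter->SetExternalContext(kTfLiteEdgeTpuContext, context.get());
  interpreter->AllocateTensors();

  // Fill the input tensor here (this depends on your model), then invoke;
  // the Edge TPU-compiled ops run on the device.
  interpreter->Invoke();
  return 0;
}
```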