Run inference with TensorFlow Lite in Python

To simplify development, we've made the Edge TPU compatible with the standard TensorFlow Lite API for inferencing—no additional APIs are required. However, TensorFlow Lite's default behavior is to execute each model on the CPU, so this page shows you how to make it run your model on the Edge TPU, using Python.

Note: If you want to use C++, instead read Run inference with TensorFlow Lite in C++.

Overview

To execute a TensorFlow Lite model, you must run it through an "interpreter." In the Python API, that's available with the Interpreter class.

By default, TensorFlow Lite executes each model on the CPU. Of course, this fails if your model is compiled for the Edge TPU, so you must instruct the interpreter to run your model using the Edge TPU. You can do that by specifying a TensorFlow Lite delegate for the Edge TPU. Then, whenever the interpreter encounters a graph node that's compiled for the Edge TPU, it sends that operation to the Edge TPU.

Essentially, using the TensorFlow Lite API with the Edge TPU requires that you change just one line of code: When you instantiate the Interpreter, you need to specify the Edge TPU runtime library, libedgetpu.so.1, as a delegate.

Load TensorFlow Lite and run an inference

To use TensorFlow Lite with the Edge TPU delegate, follow these steps:

  1. First, be sure you've set up your device with the latest software.

  2. Install the latest version of the TensorFlow Lite API.

    Although you can access the TensorFlow Lite API from the full tensorflow Python package, we recommend you instead use the tflite_runtime package. This package includes only the Interpreter class and load_delegate() function, which is all that's required to run inference, saving you a lot of disk space.

    To install the tflite_runtime package, follow the TensorFlow Lite Python quickstart.

  3. Load the tflite_runtime package.

    Open the Python file where you'll run inference with the Interpreter API. (For an example, see label_image.py).

    Instead of using import tensorflow as tf, load the tflite_runtime package like this:

    import tflite_runtime.interpreter as tflite
    
  4. Add the delegate when constructing the Interpreter.

    For example, your TensorFlow Lite code will ordinarily have a line like this:

    interpreter = tflite.Interpreter(model_path)
    

    So change it to this:

    interpreter = tflite.Interpreter(model_path,
      experimental_delegates=[tflite.load_delegate('libedgetpu.so.1')])
    

    Note: The libedgetpu.so.1 file was installed during device setup. (A combined sketch of the import and delegate steps follows this list.)
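If your code needs to run on more than one operating system, note that the Edge TPU runtime library has a different file name on each platform. The following minimal sketch combines steps 3 and 4 and picks the library name at runtime; the macOS and Windows file names (libedgetpu.1.dylib and edgetpu.dll) and the model path are assumptions to verify against your own installation:

    import platform

    import tflite_runtime.interpreter as tflite

    # Edge TPU runtime library name per OS (verify against your installation).
    EDGETPU_SHARED_LIB = {
        'Linux': 'libedgetpu.so.1',
        'Darwin': 'libedgetpu.1.dylib',
        'Windows': 'edgetpu.dll',
    }[platform.system()]

    # 'model_edgetpu.tflite' is a placeholder for your Edge TPU-compiled model.
    interpreter = tflite.Interpreter(
        model_path='model_edgetpu.tflite',
        experimental_delegates=[tflite.load_delegate(EDGETPU_SHARED_LIB)])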

That's it.

Now when you run a model that's compiled for the Edge TPU, TensorFlow Lite delegates the compiled portions of the graph to the Edge TPU.
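To see where the delegate fits in a complete program, here's a minimal end-to-end sketch of one inference pass. It assumes a quantized (uint8) image-classification model compiled for the Edge TPU and uses Pillow to load the image; the model and image file names are placeholders, not files from this guide:

    import numpy as np
    from PIL import Image
    import tflite_runtime.interpreter as tflite

    MODEL_PATH = 'mobilenet_v2_edgetpu.tflite'  # placeholder model path
    IMAGE_PATH = 'parrot.jpg'                   # placeholder image path

    # Create the interpreter with the Edge TPU delegate and allocate tensors.
    interpreter = tflite.Interpreter(
        model_path=MODEL_PATH,
        experimental_delegates=[tflite.load_delegate('libedgetpu.so.1')])
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Resize the image to the model's expected input shape (N, H, W, C).
    _, height, width, _ = input_details[0]['shape']
    image = Image.open(IMAGE_PATH).convert('RGB').resize((width, height))

    # Most Edge TPU models take quantized uint8 input, so no normalization here.
    input_data = np.expand_dims(np.asarray(image, dtype=np.uint8), axis=0)
    interpreter.set_tensor(input_details[0]['index'], input_data)

    # Run inference and read the top-scoring class.
    interpreter.invoke()
    scores = np.squeeze(interpreter.get_tensor(output_details[0]['index']))
    print('Top class index:', int(np.argmax(scores)), 'score:', int(scores.max()))

Aside from the delegate passed to the constructor, this is the same code you would use to run the CPU version of the model.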

If you started with the label_image.py example, try passing it this model that's compiled for the Edge TPU.

Next steps

Check out our other code examples using the TensorFlow Lite API.

If you have multiple Edge TPUs, read how to run multiple models with multiple Edge TPUs.
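If you do go that route, the delegate accepts an options dictionary that can name a specific Edge TPU. The sketch below assumes the "device" option described in that guide and uses hypothetical model paths; check the guide for the exact option values supported by your runtime version:

    import tflite_runtime.interpreter as tflite

    # Hypothetical model paths; each interpreter is pinned to its own Edge TPU
    # by passing a "device" option (':0', ':1', ...) to the delegate.
    interpreter_a = tflite.Interpreter(
        model_path='model_a_edgetpu.tflite',
        experimental_delegates=[
            tflite.load_delegate('libedgetpu.so.1', options={'device': ':0'})])
    interpreter_b = tflite.Interpreter(
        model_path='model_b_edgetpu.tflite',
        experimental_delegates=[
            tflite.load_delegate('libedgetpu.so.1', options={'device': ':1'})])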

Or learn more about how to create models compatible with the Edge TPU.