Run inference with TensorFlow Lite in Python

To simplify development, we've made the Edge TPU compatible with the standard TensorFlow Lite API for inferencing—no additional APIs are required. However, TensorFlow Lite's default behavior is to execute each model on the CPU, so this page shows you how to make it run your model on the Edge TPU, using Python.

Note: If you want to use C++, instead read Run inference with TensorFlow Lite in C++.


To execute a TensorFlow Lite model, you must run it through an "interpreter." In the Python API, that's available with the Interpreter class.

By default, TensorFlow Lite executes each model on the CPU. Of course, this fails if your model is compiled for the Edge TPU, so you must instruct the interpreter to run your model using the Edge TPU. You can do that by specifying a TensorFlow Lite delegate for the Edge TPU. Then, whenever the interpreter encounters a graph node that's compiled for the Edge TPU, it sends that operation to the Edge TPU.

Essentially, using the TensorFlow Lite API with the Edge TPU requires that you change just one line of code: When you instantiate the Interpreter, you need to specify the Edge TPU runtime library as a delegate.

Load TensorFlow Lite and run an inference

To use TensorFlow Lite with the Edge TPU delegate, follow these steps:

  1. First, be sure you've set up your device with the latest software.

  2. Install the latest version of the TensorFlow Lite API.

    Although you can access the TensorFlow Lite API from the full tensorflow Python package, we recommend you instead use the tflite_runtime package. This package includes only the Interpreter class and load_delegate() function, which is all that's required to run inference, saving you a lot of disk space.

    To install the tflite_runtime package, follow the TensorFlow Lite Python quickstart.

  3. Load the tflite_runtime package.

    Open the Python file where you'll run inference with the Interpreter API. (For an example, see the TensorFlow Lite label_image.py sample.)

    Instead of using import tensorflow as tf, load the tflite_runtime package like this:

    import tflite_runtime.interpreter as tflite
  4. Add the delegate when constructing the Interpreter.

    For example, your TensorFlow Lite code will ordinarily have a line like this:

    interpreter = tflite.Interpreter(model_path)

    So change it to this:

    interpreter = tflite.Interpreter(model_path,
      experimental_delegates=[tflite.load_delegate('libedgetpu.so.1')])

    The file passed to load_delegate() is the Edge TPU runtime library, which you installed when you first set up your device. The filename you must use here depends on your host operating system, as follows:

    • Linux: libedgetpu.so.1
    • macOS: libedgetpu.1.dylib
    • Windows: edgetpu.dll
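If your script needs to run on more than one operating system, you can pick the right library name at runtime. This is a small sketch using Python's standard platform module; the mapping simply encodes the three filenames listed above:

```python
import platform

# Edge TPU runtime library filename for each host OS, as expected
# by load_delegate() (names from the list above).
EDGETPU_SHARED_LIB = {
    "Linux": "libedgetpu.so.1",
    "Darwin": "libedgetpu.1.dylib",
    "Windows": "edgetpu.dll",
}

def edgetpu_lib_name():
    """Return the Edge TPU runtime library name for the current OS."""
    return EDGETPU_SHARED_LIB[platform.system()]
```

You can then pass edgetpu_lib_name() to load_delegate() instead of hard-coding one filename.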

That's it.

Now when you run a model that's compiled for the Edge TPU, TensorFlow Lite delegates the compiled portions of the graph to the Edge TPU.
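Putting the steps together, a minimal end-to-end sketch looks like the following. It assumes tflite_runtime is installed, an Edge TPU is attached to a Linux host, and you have a model compiled for the Edge TPU; the model filename is a placeholder:

```python
import numpy as np
import tflite_runtime.interpreter as tflite

# Load the model and attach the Edge TPU delegate
# ("model_edgetpu.tflite" is a placeholder filename).
interpreter = tflite.Interpreter(
    model_path="model_edgetpu.tflite",
    experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")])
interpreter.allocate_tensors()

# Fill the input tensor with your image data; the shape and dtype
# must match what the model expects (zeros here as a stand-in).
input_details = interpreter.get_input_details()[0]
image = np.zeros(input_details["shape"], dtype=input_details["dtype"])
interpreter.set_tensor(input_details["index"], image)

# Run inference; compiled portions of the graph execute on the Edge TPU.
interpreter.invoke()

# Read the output tensor; for a classifier, one score per class.
output_details = interpreter.get_output_details()[0]
scores = interpreter.get_tensor(output_details["index"])
```

Aside from the experimental_delegates argument, this is the same Interpreter workflow you would use for CPU-only inference.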

If you started with the label_image example, try passing it a version of the model that's compiled for the Edge TPU. You can run that model using the same labels file and test image from the label_image README.

Next steps

Check out our other code examples using the TensorFlow Lite API.

If you have multiple Edge TPUs, read how to run multiple models with multiple Edge TPUs.

Or learn more about how to create models compatible with the Edge TPU.