Coral is a hardware and software platform for building intelligent devices with fast neural network inferencing.
At the heart of our devices is the Edge TPU coprocessor. This is a small ASIC built by Google that's specially designed to execute state-of-the-art neural networks at high speed and low power cost.
The Edge TPU is capable of performing 4 trillion operations (tera-operations) per second (TOPS), using 0.5 watts for each TOPS (2 TOPS per watt).
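The efficiency figure follows directly from the two numbers above. As an idealization, assume the 0.5 W-per-TOPS rate holds at the full 4 TOPS. (Actual power draw depends on the workload.) Under that assumption, the power budget \(P\) and efficiency \(\eta\) work out as:

```latex
\begin{aligned}
P    &= 4\ \text{TOPS} \times 0.5\ \tfrac{\text{W}}{\text{TOPS}} = 2\ \text{W} \\
\eta &= \frac{4\ \text{TOPS}}{2\ \text{W}} = 2\ \tfrac{\text{TOPS}}{\text{W}}
\end{aligned}
```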
The following chart compares the inference time for several popular vision models in TensorFlow Lite format, when executed either on a modern embedded CPU or on the Coral Dev Board (lower is better).
For more numbers like the ones above, see our benchmarks page.
As a part of Google Research, our team is working with other machine learning teams to help build the next generation of neural networks for low-power devices. We're continually optimizing models for embedded devices and designing new neural network architectures that provide fast inference in a small package.
For example, the new EfficientNet-EdgeTPU model is designed to balance low latency with high accuracy on the Edge TPU. It comes in three sizes (small, medium, and large); the larger sizes offer higher accuracy at the cost of longer inference latency.
Flexibility and scalability
We offer the Edge TPU in multiple form factors to suit various prototyping and production environments, from embedded systems deployed in the field to network systems operating on-premises.
For example, our USB Accelerator simply plugs into a desktop, laptop, or embedded system such as a Raspberry Pi so you can quickly prototype your application. From there, you can scale to production systems by adding our Mini PCIe or M.2 Accelerator to your hardware system.
If you're looking for a fully integrated system, you can get started with our Dev Board, a single-board computer based on NXP's i.MX 8M system-on-chip. Then you can scale to production by connecting our System-on-Module (included on the Dev Board) to your own baseboard.
The Edge TPU supports a variety of model architectures built with TensorFlow, including models built with Keras.
Our workflow to create models for Coral is based on the TensorFlow framework. No additional APIs are required to build or run your model. You only need a small runtime package, which delegates the execution of your model to the Edge TPU.
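The delegation step can be sketched with the TensorFlow Lite Python runtime. This is a minimal sketch, assuming the `tflite_runtime` package and the Edge TPU runtime are installed and that you already have a model compiled for the Edge TPU. The per-OS library names are the ones used in Coral's TensorFlow Lite examples; treat them as assumptions for your runtime version. The helper names `delegate_library_name` and `make_edgetpu_interpreter` are hypothetical.

```python
import platform

# Edge TPU runtime shared-library names per OS (as used in Coral's
# TensorFlow Lite examples; an assumption for your installed version).
EDGETPU_SHARED_LIB = {
    "Linux": "libedgetpu.so.1",
    "Darwin": "libedgetpu.1.dylib",
    "Windows": "edgetpu.dll",
}


def delegate_library_name(system=None):
    """Return the Edge TPU runtime library name for an OS (default: this host)."""
    system = system or platform.system()
    try:
        return EDGETPU_SHARED_LIB[system]
    except KeyError:
        raise RuntimeError(f"No Edge TPU runtime library known for {system!r}") from None


def make_edgetpu_interpreter(model_path, device=None):
    """Build a TFLite interpreter that delegates supported ops to the Edge TPU.

    The import happens here so this module still loads on machines
    without tflite_runtime installed.
    """
    from tflite_runtime.interpreter import Interpreter, load_delegate

    # An optional "device" option selects a specific Edge TPU, e.g. ":0".
    options = {"device": device} if device else {}
    delegate = load_delegate(delegate_library_name(), options)
    return Interpreter(model_path=model_path, experimental_delegates=[delegate])
```

From there, inference follows the standard TensorFlow Lite pattern: call `allocate_tensors()`, fill the input with `set_tensor()`, call `invoke()`, and read results with `get_tensor()`. Ops the compiler could not map to the Edge TPU still run on the CPU through the same interpreter.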
To build a compatible model, you need to convert a trained model into the TensorFlow Lite format and quantize all parameter data (you can use either quantization-aware training or full integer post-training quantization). Then pass the model to our Edge TPU Compiler and it's ready to execute using the TensorFlow Lite API.
Read more about how to create a model for the Edge TPU.
We have verified many popular model architectures for image classification, object detection, semantic segmentation, pose estimation, and keyphrase detection, with more to come.
If you want to try your application with one of these models, you can download pre-trained versions of them.
Mendel is our Debian-derived Linux distribution for the Dev Board, and we've optimized it for embedded systems by making it very lightweight. Although you can connect a keyboard and monitor to get a shell interface, you won't find any desktop apps. Instead, you'll find a familiar Linux interface and the Debian packaging system, which provides access to the extensive Debian software archives and a huge range of customizations.
Mendel also comes bundled with the tools you need to build your headless ML applications, including standard Python and C++ libraries, the Edge TPU API, and the Edge TPU runtime. Additionally, we include a tool called MDT (Mendel Development Tool) that makes it easy to connect securely (using SSH/mDNS), transfer files, and run other commands from a remote computer.
The following illustration provides a basic overview of the Mendel system and software stack.
For applications that run multiple models, you can execute your models concurrently on a single Edge TPU by co-compiling them so they share the Edge TPU scratchpad memory. Or, if you have multiple Edge TPUs in your system, you can increase performance by assigning each model to a specific Edge TPU and running them in parallel.
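The second option, one model per Edge TPU run in parallel, can be sketched as follows. This is a minimal sketch that assumes a hypothetical `infer(model_path, device)` callable that you supply, for example one that builds an interpreter for that device and runs it. The device strings `:0`, `:1`, and so on follow the Edge TPU runtime's convention for selecting the N-th Edge TPU. The helper names are hypothetical. If there are more models than Edge TPUs, the round-robin below places several models on one device; those models should be co-compiled, for example with `edgetpu_compiler model_a.tflite model_b.tflite`.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def assign_models_to_tpus(model_paths, num_tpus):
    """Pair each model with an Edge TPU device string, round-robin."""
    if num_tpus < 1:
        raise ValueError("need at least one Edge TPU")
    devices = [f":{i}" for i in range(num_tpus)]
    # zip stops at the last model; cycle reuses devices when models outnumber TPUs.
    return list(zip(model_paths, cycle(devices)))


def run_in_parallel(assignments, infer):
    """Call infer(model_path, device) for every assignment, one thread each.

    Results are returned in the same order as the assignments.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(assignments))) as pool:
        futures = [pool.submit(infer, path, dev) for path, dev in assignments]
        return [f.result() for f in futures]


plan = assign_models_to_tpus(["detect.tflite", "classify.tflite"], num_tpus=2)
# plan == [("detect.tflite", ":0"), ("classify.tflite", ":1")]
```

Threads are a reasonable fit here because the work happens on the accelerators, not in Python, so the interpreter lock is not the bottleneck.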
Learn more about running multiple models.
For applications that require very high throughput or that use large models, pipelining lets you execute different segments of the same model on different Edge TPUs. This can improve throughput for high-speed applications and can reduce total latency for large models that otherwise cannot fit into the cache of a single Edge TPU.
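The throughput gain comes from overlap. While segment 2 processes input *i*, segment 1 can already work on input *i + 1*. In practice, the model is split with the Edge TPU Compiler's `--num_segments` option and run with PyCoral's `PipelinedModelRunner`. The stdlib sketch below does not use either of those tools. It models only the scheduling idea: one thread per segment, connected by queues. Plain Python functions stand in for the hypothetical per-TPU segments, and error handling is omitted.

```python
import queue
import threading

_DONE = object()  # sentinel telling each stage to forward it and stop


def run_pipeline(segments, inputs):
    """Feed inputs through a chain of stages, one thread per stage.

    Each stage stands in for one model segment on its own Edge TPU.
    Intermediate queues are bounded (like limited buffers between devices);
    the final queue is unbounded so no stage can deadlock on a full output.
    """
    queues = [queue.Queue(maxsize=4) for _ in segments] + [queue.Queue()]

    def stage(fn, q_in, q_out):
        while True:
            item = q_in.get()
            if item is _DONE:
                q_out.put(_DONE)
                return
            q_out.put(fn(item))

    threads = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(segments)
    ]
    for t in threads:
        t.start()

    for x in inputs:
        queues[0].put(x)  # blocks when stage 0 falls behind (backpressure)
    queues[0].put(_DONE)

    results = []
    while (item := queues[-1].get()) is not _DONE:
        results.append(item)
    for t in threads:
        t.join()
    return results  # same order as inputs: each stage is a single FIFO worker
```

For example, `run_pipeline([lambda x: x + 1, lambda x: x * 2], range(3))` returns `[2, 4, 6]`. With N segments of roughly equal cost, steady-state throughput approaches N times that of a single device running the whole model. Latency per input does not shrink, except when a large model no longer has to reload parameters it couldn't cache on one Edge TPU.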
Learn more about pipelining a model with multiple Edge TPUs.
Although the Edge TPU is primarily intended for inference, you can also use it to accelerate transfer learning with a pre-trained model. To simplify this process, we've created a Python API that executes the backbone of your model on the Edge TPU during training and then calculates and saves new weight parameters for the final layer.
Learn more about on-device retraining.