Audio classification

An audio classification model can recognize sounds or spoken words and phrases.

Currently, all audio classification models that are compatible with the Edge TPU are feed-forward CNN architectures, so they're effective for short audio samples only.

Trained models

These models are trained and compiled for the Edge TPU.

Notice: These are not production-quality models; they are for demonstration purposes only.

Model name	Detections/Dataset	Input size	Micro ¹	Model size	Downloads
YamNet	520+ sounds	15600x1 (WAV)	No	4.2 MB	Edge TPU model, Labels file
YamNet without frontend	520+ sounds	96x64x1 (spectrogram)	Yes	4.1 MB	Edge TPU model, Labels file
Keyword Spotter v0.7	140+ speech phrases	198x32x1 (spectrogram)	Yes	578 KB	Edge TPU model, Labels file
Keyword Spotter v0.8	140+ speech phrases	198x32x1 (spectrogram)	Yes	578 KB	Edge TPU model, Labels file

¹ Indicates compatibility with the Dev Board Micro. Some models are not compatible because they require a CPU-bound op that is not supported by TensorFlow Lite for Microcontrollers, or because they require more memory than is available on the board. (All models are compatible with all other Coral boards.)
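
Because these models take a fixed-size input, a longer recording has to be cut into short pieces before classification. The following is a minimal sketch of one way to do that for the raw-waveform YamNet input (15600x1 in the table above). It is not taken from the Coral examples: the function name split_into_windows, the zero-padding of the last window, and the 16 kHz sample rate are assumptions made for illustration.

```python
from typing import List, Sequence

YAMNET_WINDOW = 15600           # samples per input, from the "Input size" column
ASSUMED_SAMPLE_RATE_HZ = 16000  # assumption: the sample rate is not stated on this page


def split_into_windows(samples: Sequence[float],
                       window: int = YAMNET_WINDOW,
                       hop: int = YAMNET_WINDOW) -> List[List[float]]:
    """Split a 1-D sample sequence into fixed-length windows.

    A new window starts every `hop` samples. The final window is
    zero-padded so that every window has exactly `window` samples.
    """
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    windows = []
    for start in range(0, len(samples), hop):
        chunk = list(samples[start:start + window])
        chunk.extend([0.0] * (window - len(chunk)))  # pad a short final chunk
        windows.append(chunk)
    return windows


if __name__ == "__main__":
    # Three seconds of silence as stand-in audio (hypothetical input).
    clip = [0.0] * (3 * ASSUMED_SAMPLE_RATE_HZ)
    windows = split_into_windows(clip)
    print(len(windows), len(windows[0]))
```

Each window would then be passed to the model separately. Setting hop smaller than window gives overlapping windows, which may help catch sounds that straddle a window boundary, at the cost of more inferences.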

Example code

Keyphrase detector

A few examples that use the Keyword Spotter model to detect over 140 short phrases, such as "start game" and "next song." They include a snake game and a YouTube player that respond to voice commands.

Languages: Python

View on GitHub
