Audio classification

An audio classification model can recognize sounds or spoken words and phrases.

Currently, all audio classification models that are compatible with the Edge TPU are feed-forward CNN architectures, so they're effective for short audio samples only.
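Because these models accept only a short, fixed-length input, longer recordings are usually split into windows that are classified one at a time. The following is a minimal sketch of that idea, not code from the Coral examples. It uses the 15600-sample length listed for the WAV-input model in the table below. The function name `split_into_windows`, the 50% overlap, the zero-padding of the last window, and the 16 kHz sample rate in the usage lines are all assumptions made for illustration.

```python
from typing import Iterator, List, Sequence

WINDOW_SAMPLES = 15600  # input length of the WAV-input model in the table
HOP_SAMPLES = 7800      # assumption: 50% overlap between consecutive windows


def split_into_windows(
    samples: Sequence[float],
    window: int = WINDOW_SAMPLES,
    hop: int = HOP_SAMPLES,
) -> Iterator[List[float]]:
    """Yield fixed-length windows, zero-padding the final partial window."""
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    start = 0
    while start < len(samples):
        chunk = list(samples[start:start + window])
        # Pad the last chunk with silence so every window matches the
        # model's fixed input length.
        chunk.extend([0.0] * (window - len(chunk)))
        yield chunk
        if start + window >= len(samples):
            break  # this window already reached the end of the clip
        start += hop


# Usage: 2 seconds of silence at an assumed 16 kHz sample rate.
clip = [0.0] * 32000
windows = list(split_into_windows(clip))
print(len(windows), len(windows[0]))  # prints: 4 15600
```

Each window would then be passed to the classifier separately. Overlapping windows reduce the chance that a sound is cut in half at a window boundary, at the cost of more inferences per second of audio.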

Trained models

These models are trained and compiled for the Edge TPU.

Notice: These are not production-quality models; they are for demonstration purposes only.
Model name               Detections/Dataset   Input size              Micro [1]  Model size
YamNet                   520+ sounds          15600x1 (WAV)           No         4.2 MB
YamNet without frontend  520+ sounds          96x64x1 (spectrogram)   Yes        4.1 MB
Keyword Spotter v0.7     140+ speech phrases  198x32x1 (spectrogram)  Yes        578 KB
Keyword Spotter v0.8     140+ speech phrases  198x32x1 (spectrogram)  Yes        578 KB

[1] Indicates compatibility with the Dev Board Micro. Some models are not compatible because they require a CPU-bound op that is not supported by TensorFlow Lite for Microcontrollers or they require more memory than is available on the board. (All models are compatible with all other Coral boards.)

Example code

Keyphrase detector

A few examples that use the Keyword Spotter model to detect more than 140 short phrases, such as "start game" and "next song." They include a snake game and a YouTube player that respond to voice commands.

Languages: Python
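To illustrate how a voice-controlled application might turn classifier output into actions, here is a minimal sketch. It is not taken from the keyphrase detector examples. The label list (including the "background" class), the 0.6 threshold, and the helper names `pick_phrase` and `dispatch` are hypothetical; a real model ships with its own labels file and needs its own tuning.

```python
from typing import Callable, Dict, Optional, Sequence

# Hypothetical labels for illustration only.
LABELS = ["background", "start game", "next song"]
THRESHOLD = 0.6  # assumption: minimum score required to accept a phrase


def pick_phrase(
    scores: Sequence[float],
    labels: Sequence[str] = LABELS,
    threshold: float = THRESHOLD,
) -> Optional[str]:
    """Return the best-scoring phrase, or None if nothing is confident enough."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    best = max(range(len(scores)), key=lambda i: scores[i])
    # Ignore weak detections and the non-speech class.
    if scores[best] < threshold or labels[best] == "background":
        return None
    return labels[best]


def dispatch(
    phrase: Optional[str],
    handlers: Dict[str, Callable[[], str]],
) -> Optional[str]:
    """Call the handler registered for a phrase, if there is one."""
    handler = handlers.get(phrase) if phrase else None
    return handler() if handler else None


# Usage: map two phrases to placeholder actions.
handlers = {
    "start game": lambda: "game started",
    "next song": lambda: "skipping track",
}
print(dispatch(pick_phrase([0.1, 0.85, 0.05]), handlers))  # prints: game started
print(dispatch(pick_phrase([0.5, 0.3, 0.2]), handlers))    # prints: None
```

Keeping the thresholding separate from the dispatch table makes it easier to tune detection sensitivity without touching the application's command handlers.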