Retrain an object detection model
This tutorial shows you how to retrain an object detection model to recognize a new set of classes. You'll use a technique called transfer learning to retrain an existing model and then compile it to run on any device with an Edge TPU, such as the Coral Dev Board or USB Accelerator.
Specifically, this tutorial shows you how to retrain a MobileNet V1 SSD model (originally trained to detect 90 objects from the COCO dataset) so that it detects two pets: Abyssinian cats and American Bulldogs (from the Oxford-IIIT Pets Dataset). But you can reuse these procedures with your own image dataset, and with a different pre-trained model.
Note that this tutorial runs the training scripts on your computer using a Docker virtual environment, so the training time (and even the ability to complete the training) depends on your system specs. As an alternative, we also offer retraining tutorials that run in the cloud, using Google Colab:
- Retrain EfficientDet-Lite object detector on Google Colab (TF2)
- Retrain SSDLite MobileDet object detector on Google Colab (TF1)
- Retrain SSD MobileNet V1 object detector on Google Colab (TF1) (uses the same scripts as this Docker tutorial)
What is transfer learning?
Ordinarily, training an object detection model can take several days on a CPU, but transfer learning is a technique that takes a model already trained for a related task and uses it as the starting point to create a new model. Depending on your system and training parameters, this instead takes a few hours or less. (This process is sometimes also called "fine-tuning" the model.)
Transfer learning can be done in two ways:
- Last layers-only retraining: This approach retrains only the last few layers of the model, where the final classification occurs. This is fast and it can be done with a small dataset.
- Full model retraining: This approach retrains each layer of the neural network using the new dataset. It can result in a model that is more accurate, but it takes more time, and you must retrain using a dataset of significant sample size to avoid overfitting the model.
Transfer learning is most effective when the features learned in the pre-trained model are general, not highly specialized. For example, a pre-trained model that can recognize household objects might be re-trained to recognize new office supplies, but a model pre-trained only to distinguish dog breeds might not retrain as well, because its features are too specialized.
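To make the two strategies concrete, here's a minimal Keras sketch. It's a classification example purely for illustration (the detection scripts in this tutorial handle all of this for you through the TF1 Object Detection API and pipeline.config), so the model choice, layer names, and dataset here are just placeholders:

import tensorflow as tf

# Load a feature extractor pre-trained on ImageNet and freeze its weights.
# Freezing the base is the "last layers-only" strategy; set trainable = True
# (and use a much larger dataset) for full-model retraining instead.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights='imagenet')
base.trainable = False

# Attach a new head that classifies the two new classes.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(new_dataset, epochs=10)  # trains only the new head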
The steps below show you how to perform transfer-learning using either last-layers-only or full-model retraining. Most of the steps are the same; just keep an eye out for the different commands depending on the technique you desire.
Requirements
You need the following for this tutorial:
- Any computer supported by Docker (such as Linux, Mac, or Windows).
- At least 10 GB of RAM.
- A device with an Edge TPU, such as the Coral Dev Board or USB Accelerator (these each have their own list of requirements).
Set up the Docker container
Docker is a virtualization platform that makes it easy to set up an isolated environment for this tutorial. Using our Docker container, you can easily set up the required environment, which includes TensorFlow, Python, the Object Detection API, and the pre-trained checkpoints for MobileNet V1 and V2.
To set up your container, follow these steps:
- First install Docker on your desktop machine (this link is for Ubuntu; select your appropriate platform from the Docker left navigation).
- Open a command line and create a directory for all the files in this project. You will clone the Coral tutorials repo into it, so name it accordingly. For example:
CORAL_DIR=${HOME}/google-coral && mkdir -p ${CORAL_DIR}
- Move into that directory and clone our tutorials repo, which has all the training scripts:
cd ${CORAL_DIR}
git clone https://github.com/google-coral/tutorials.git
- Move into the directory for this tutorial and build the Docker image:
cd tutorials/docker/object_detection
docker build . -t detect-tutorial-tf1
- Specify the location for the training output files. For example:
DETECT_DIR=${PWD}/out && mkdir -p $DETECT_DIR
You'll use this as the mount location for a directory in the Docker container, thus saving the training files and final model to your file system (instead of leaving them inside the Docker container).
- Start the Docker container:
docker run --name edgetpu-detect \
  --rm -it --privileged -p 6006:6006 \
  --mount type=bind,src=${DETECT_DIR},dst=/tensorflow/models/research/learn_pet \
  detect-tutorial-tf1
When that's finished, your command prompt should be inside the Docker container, at the path /tensorflow/models/research/.
You're ready to start training your model.
Download and configure the training data
Now you'll download the images, labels, and the model checkpoints used in the retraining.
We've prepared the following script (in the research/ directory) to take care of that for you.
This script also updates the training configuration file at /tensorflow/models/research/learn_pet/ckpt/pipeline.config to match the new dataset in several ways, such as the number of classes, the path to your checkpoint file, and whether to retrain the last few layers or the whole model. As such, the script accepts arguments to specify the model type with network_type and whether to retrain the whole model or only the last few layers with train_whole_model. (Beware that setting train_whole_model to true requires a lot more training time: over 10 hours.)
# Run this from within the Docker container (at tensorflow/models/research/):
./prepare_checkpoint_and_dataset.sh --network_type mobilenet_v1_ssd --train_whole_model false
The network_type can be either mobilenet_v1_ssd or mobilenet_v2_ssd. This example and those below use MobileNet V1; if you decide to use V2, be sure you update the model name in other commands below, as appropriate.
You can ignore the warning about the missing Abyssinian_104.xml file.
The prepare_checkpoint_and_dataset.sh script handles all the training data setup and configuration, and it's designed to train a pet detector model. If you want to know more about what the script does and how to create your own dataset, see the section below about how to configure your own training data.
Start training
Now you can begin the transfer-learning process as follows:
- Set some training variables, based on your training strategy:
  - If you're retraining just the last few layers, we suggest the following numbers:
NUM_TRAINING_STEPS=500 && \
NUM_EVAL_STEPS=100
  - If you're retraining the whole model, we suggest the following numbers:
NUM_TRAINING_STEPS=50000 && \
NUM_EVAL_STEPS=2000
- Start the training job:
# From the /tensorflow/models/research/ directory
./retrain_detection_model.sh \
  --num_training_steps ${NUM_TRAINING_STEPS} \
  --num_eval_steps ${NUM_EVAL_STEPS}
- To monitor training progress, start TensorBoard in a new terminal:
  - Start bash in a separate terminal to join the same Docker container:
sudo docker exec -it edgetpu-detect /bin/bash
  - In the new Docker terminal, execute the following command to start TensorBoard from the /tensorflow/models/research/ directory. After you execute the command, TensorBoard visualizes the model accuracy throughout training in your local machine's browser at http://localhost:6006/.
tensorboard --logdir=./learn_pet/train/
Training takes a long time. Depending on your machine, it can take 1 to 4 hours to retrain the last few layers, or up to 10 hours to retrain the whole model (based on a 6-core CPU with 64 GB of memory).
As training progresses, you can see the transfer-learned checkpoint files begin to appear in the /tensorflow/models/research/learn_pet/train directory, which is mounted to the local $DETECT_DIR location you created when starting the Docker container.
Compile the model for the Edge TPU
To run your retrained model on the Edge TPU, you need to convert your checkpoint file to a frozen graph, convert that graph to a TensorFlow Lite flatbuffer file, then compile the model for the Edge TPU. The following steps guide you through it all.
- To freeze the graph and convert it to TensorFlow Lite, use the following script and specify the checkpoint number you want to use (this example uses checkpoint 500):
# From the Docker /tensorflow/models/research directory
./convert_checkpoint_to_edgetpu_tflite.sh --checkpoint_num 500
Your converted TensorFlow Lite model is named output_tflite_graph.tflite and is output in the Docker container at /tensorflow/models/research/learn_pet/models/, which is the mounted directory available on your host filesystem at $DETECT_DIR. (A rough sketch of what this conversion involves appears after these steps.)
- Open a new terminal (outside the Docker container) and install the Edge TPU Compiler:
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list
sudo apt update
sudo apt-get install edgetpu-compiler
- Make sure your user has ownership of the out directory:
sudo chown -R $USER ${HOME}/google-coral/tutorials/docker/object_detection/out
- Now change directories to where the trained model is and compile it:
cd ${HOME}/google-coral/tutorials/docker/object_detection/out/models
edgetpu_compiler output_tflite_graph.tflite
The compiled file is named output_tflite_graph_edgetpu.tflite and saved in the current directory.
- Finally, rename the compiled model to something more specific:
mv output_tflite_graph_edgetpu.tflite ssd_mobilenet_v1_catsdogs_quant_edgetpu.tflite
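If you're curious what the conversion step above involves, here's a rough Python sketch of an equivalent TF1 TensorFlow Lite conversion. It only approximates what convert_checkpoint_to_edgetpu_tflite.sh does; the graph file name and tensor names below assume the graph was exported with the Object Detection API's export_tflite_ssd_graph.py tooling:

# Approximate sketch of the frozen-graph-to-TFLite step (not the exact script).
import tensorflow.compat.v1 as tf

converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file='learn_pet/models/tflite_graph.pb',   # assumed export path
    input_arrays=['normalized_input_image_tensor'],
    output_arrays=['TFLite_Detection_PostProcess',
                   'TFLite_Detection_PostProcess:1',
                   'TFLite_Detection_PostProcess:2',
                   'TFLite_Detection_PostProcess:3'],
    input_shapes={'normalized_input_image_tensor': [1, 300, 300, 3]})

# The Edge TPU requires a fully quantized (uint8) model.
converter.inference_type = tf.uint8
converter.quantized_input_stats = {'normalized_input_image_tensor': (128.0, 128.0)}
converter.allow_custom_ops = True  # the detection postprocess op is a custom op

with open('learn_pet/models/output_tflite_graph.tflite', 'wb') as f:
    f.write(converter.convert())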
Run the model
You can now use the retrained model to run an inference on the Edge TPU. Below, you can see how to use this model with the detect_image.py example, which performs object detection using the TensorFlow Lite Python API.
Remember that you've trained this model to recognize just two classes: Abyssinian cats and American Bulldogs. So here are a couple images that should provide results (provided by the Open Images Dataset):
wget https://c3.staticflickr.com/8/7028/6595489185_60fb5cd274_z.jpg -O dog.jpg && \
wget https://c6.staticflickr.com/9/8534/8652503705_687d957a29_z.jpg -O cat.jpg
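The detect_image.py example below takes care of loading the model and parsing the results, but if you want a sense of what the underlying TensorFlow Lite Python API calls look like, here is a minimal sketch. It assumes the tflite_runtime package and the Edge TPU runtime (libedgetpu) are installed, and that the outputs follow the standard SSD postprocessing order (boxes, classes, scores, count), which you should verify for your model:

import numpy as np
from PIL import Image
import tflite_runtime.interpreter as tflite

# Load the compiled model and delegate execution to the Edge TPU.
interpreter = tflite.Interpreter(
    model_path='ssd_mobilenet_v1_catsdogs_quant_edgetpu.tflite',
    experimental_delegates=[tflite.load_delegate('libedgetpu.so.1')])
interpreter.allocate_tensors()

# Resize the test image to the model's expected input size.
input_details = interpreter.get_input_details()
_, height, width, _ = input_details[0]['shape']
image = Image.open('dog.jpg').convert('RGB').resize((width, height))
interpreter.set_tensor(input_details[0]['index'], np.expand_dims(image, 0))

interpreter.invoke()

# Standard SSD outputs: bounding boxes, class IDs, and scores per detection.
output_details = interpreter.get_output_details()
boxes = interpreter.get_tensor(output_details[0]['index'])[0]
class_ids = interpreter.get_tensor(output_details[1]['index'])[0]
scores = interpreter.get_tensor(output_details[2]['index'])[0]
for box, class_id, score in zip(boxes, class_ids, scores):
    if score > 0.5:
        print('class %d, score %.2f, box (ymin, xmin, ymax, xmax): %s'
              % (int(class_id), score, box))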
Using the Coral Dev Board
- First, be sure your Dev Board software is up to date.
- Use MDT to push the files to the Dev Board and switch to the Dev Board shell:
mdt push ssd_mobilenet_v1_catsdogs_quant_edgetpu.tflite labels.txt dog.jpg
mdt shell
- Now from the Dev Board shell, download the detect_image.py code from GitHub:
mkdir google-coral && cd google-coral
git clone https://github.com/google-coral/tflite --depth 1
- Install the example's requirements:
cd tflite/python/examples/detection
./install_requirements.sh
- Run the example using the files you pushed earlier with MDT:
python3 detect_image.py \
  --model ${HOME}/ssd_mobilenet_v1_catsdogs_quant_edgetpu.tflite \
  --labels ${HOME}/labels.txt \
  --input ${HOME}/dog.jpg \
  --output dog_result.jpg
Using the Coral USB Accelerator
- First, be sure your USB Accelerator is set up.
- Although this is also part of the device setup, here's how to get the detect_image.py code from GitHub:
mkdir google-coral && cd google-coral
git clone https://github.com/google-coral/tflite --depth 1
- Install the project requirements:
cd tflite/python/examples/detection
./install_requirements.sh
- Run the example using the retrained model:
python3 detect_image.py \
  --model ${HOME}/google-coral/tutorials/docker/object_detection/out/models/ssd_mobilenet_v1_catsdogs_quant_edgetpu.tflite \
  --labels ${HOME}/google-coral/tutorials/docker/object_detection/out/models/labels.txt \
  --input ${HOME}/google-coral/tutorials/docker/object_detection/out/models/dog.jpg \
  --output dog_result.jpg
Configure your own training data
If you finished all the previous steps, then you've completed transfer-learning to create a model that detects cats and dogs. But chances are, you'd prefer that your model detect other things. So this section describes how to prepare your own training data to retrain an object detection model.
If you look back at what you've done, you'll see that the bulk of the work is done for you by the prepare_checkpoint_and_dataset.sh script. This script does three important things:
- Organize the dataset photos, annotations, and label map (the training data), and then convert it all into TFRecord format. The images and annotations used above come from the Oxford-IIIT Pets Dataset; the label map is pet_label_map.pbtxt; and the script to convert it all to TFRecord is create_pet_tf_record.py.
- Download the model checkpoint files (the neural network graph to retrain).
The MobileNet files used above (and more) are available from our Models download page.
- Configure the pipeline.config file. This file is included with the model checkpoint files. It's required by the TensorFlow Object Detection API, and you need to modify various properties in it to customize the training pipeline for your dataset and training strategy.
So to create your own dataset, you need to prepare this stuff yourself.
Organize your dataset
The first of the three items above is the most time-consuming and the most important: you need to gather and organize all the photos, annotations, and labels to use for training.
This process is also the least documented here; it requires a fair amount of experience with ML data preparation and some experience with TensorFlow APIs. We recommend you follow this TensorFlow guide to preparing inputs.
Also take a look at this tutorial for using TFRecords and the code that converts the pets dataset in create_pet_tf_record.py.
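To make the TFRecord format more concrete, here's a rough Python sketch of writing one training example for a single annotated image, using the feature keys the TensorFlow Object Detection API expects. The file name, image size, box coordinates, and output shard name below are made up for illustration:

import tensorflow as tf

def bytes_feature(values):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=values))

def float_feature(values):
    return tf.train.Feature(float_list=tf.train.FloatList(value=values))

def int64_feature(values):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

# One annotated image: a single Abyssinian cat with one bounding box.
with open('Abyssinian_1.jpg', 'rb') as f:  # hypothetical image file
    encoded_jpg = f.read()

example = tf.train.Example(features=tf.train.Features(feature={
    'image/height': int64_feature([500]),
    'image/width': int64_feature([375]),
    'image/filename': bytes_feature([b'Abyssinian_1.jpg']),
    'image/source_id': bytes_feature([b'Abyssinian_1.jpg']),
    'image/encoded': bytes_feature([encoded_jpg]),
    'image/format': bytes_feature([b'jpeg']),
    # Box coordinates are normalized to [0, 1] relative to the image size.
    'image/object/bbox/xmin': float_feature([0.1]),
    'image/object/bbox/ymin': float_feature([0.2]),
    'image/object/bbox/xmax': float_feature([0.8]),
    'image/object/bbox/ymax': float_feature([0.9]),
    'image/object/class/text': bytes_feature([b'Abyssinian']),
    'image/object/class/label': int64_feature([1]),  # ID from your label map
}))

with tf.io.TFRecordWriter('pet_faces_train.record-00000-of-00010') as writer:
    writer.write(example.SerializeToString())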
Select your model
Once you have your dataset, you need the checkpoint files for the quantized TensorFlow Lite (object detection) model you want to retrain. (You must use either quantization-aware training (recommended) or full integer post-training quantization.)
We have some Edge TPU compatible models available on our Models download page that you can retrain, but you can use any other object detection model that's compatible with the Edge TPU.
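If your chosen model isn't already quantized, full integer post-training quantization looks roughly like the following with the TF2 TFLiteConverter. This is a sketch only: the SavedModel path is an assumption, the representative dataset should be real preprocessed sample images rather than random data, and not every detection model converts cleanly this way (which is one reason quantization-aware training is recommended):

import numpy as np
import tensorflow as tf

def representative_dataset():
    # Yield a few hundred samples that resemble your training images.
    # Random data is only a placeholder here.
    for _ in range(100):
        yield [np.random.rand(1, 300, 300, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model('exported_model/saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

with open('model_quant.tflite', 'wb') as f:
    f.write(converter.convert())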
Configure your training pipeline
Now reconfigure the existing pipeline.config file that came with the model, as appropriate. What changes you make depends entirely on your model and your training strategy. You should read more about the config file here.
For demonstration purposes, the following shows the pipeline.config changes required for the retraining performed above (when using the MobileNet V1 SSD model to retrain only the last few layers):
- At the top of the file, change num_classes to the number of classes in your dataset. For example, change num_classes: 90 to num_classes: 2 for a dataset with 2 classes.
- Specify your checkpoint file with fine_tune_checkpoint and enable a couple of other properties.
For example, change this line:
fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
To this:
fine_tune_checkpoint: "/tensorflow/models/research/learn_pet/ckpt/model.ckpt"
from_detection_checkpoint: true
load_all_detection_checkpoint_vars: true
- Specify your training data location.
For example, change this:
train_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record-00000-of-00100"
  }
}
To this:
train_input_reader {
  label_map_path: "/tensorflow/models/research/learn_pet/pet_label_map.pbtxt"
  tf_record_input_reader {
    input_path: "/tensorflow/models/research/learn_pet/pet_faces_train.record-?????-of-00010"
  }
}
- Specify the evaluation data location.
For example, change this:
eval_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
  shuffle: false
  num_readers: 1
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record-00000-of-00010"
  }
}
To this:
eval_input_reader {
  label_map_path: "/tensorflow/models/research/learn_pet/pet_label_map.pbtxt"
  shuffle: false
  num_readers: 1
  tf_record_input_reader {
    input_path: "/tensorflow/models/research/learn_pet/pet_faces_val.record-?????-of-00010"
  }
}
- Specify the layers you want to freeze in the model.
For example (when using the MobileNet V1 SSD model to retrain only the last few layers), change this:
max_number_of_boxes: 100
unpad_groundtruth_tensors: false
To this:
max_number_of_boxes: 100
unpad_groundtruth_tensors: false
freeze_variables: ['Conv2d_0', 'Conv2d_1_pointwise', 'Conv2d_1_depthwise', 'Conv2d_2_pointwise', 'Conv2d_2_depthwise', 'Conv2d_3_pointwise', 'Conv2d_3_depthwise', 'Conv2d_4_pointwise', 'Conv2d_4_depthwise', 'Conv2d_5_pointwise', 'Conv2d_5_depthwise', 'Conv2d_6_pointwise', 'Conv2d_6_depthwise', 'Conv2d_7_pointwise', 'Conv2d_7_depthwise', 'Conv2d_8_pointwise', 'Conv2d_8_depthwise', 'Conv2d_9_pointwise', 'Conv2d_9_depthwise']
That should be it. But again, you should read more about the config file.
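If you'd rather script these edits than make them by hand, the Object Detection API includes a config_util helper module. Here's a rough sketch, assuming you're running inside the tutorial's Docker container with the API on the Python path; the paths and values are the pet-detector ones from above, and you should double-check the field names against your own config:

# Hypothetical sketch: editing pipeline.config programmatically with config_util.
from object_detection.utils import config_util

configs = config_util.get_configs_from_pipeline_file(
    '/tensorflow/models/research/learn_pet/ckpt/pipeline.config')

# Apply the same changes described above.
configs['model'].ssd.num_classes = 2
configs['train_config'].fine_tune_checkpoint = (
    '/tensorflow/models/research/learn_pet/ckpt/model.ckpt')
configs['train_input_config'].label_map_path = (
    '/tensorflow/models/research/learn_pet/pet_label_map.pbtxt')
configs['train_input_config'].tf_record_input_reader.input_path[0] = (
    '/tensorflow/models/research/learn_pet/pet_faces_train.record-?????-of-00010')

# Write the updated pipeline.config back to the checkpoint directory.
pipeline_proto = config_util.create_pipeline_proto_from_configs(configs)
config_util.save_pipeline_config(
    pipeline_proto, '/tensorflow/models/research/learn_pet/ckpt')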
Initiate retraining
So far, everything described in this section about how to configure your own training data has merely shown how to replicate the steps performed by the prepare_checkpoint_and_dataset.sh script used above, which prepares training data for a pet detector.
So now that you've prepared your own training data, all that's left is to run the retraining. And for that, you can use the retrain_detection_model.sh script as shown above in Start training.