Teachable Sorter

A machine that you can teach to rapidly recognize and sort objects using your own custom machine learning models.


Project intro

Project summary

This project shows you how to build a machine to classify and sort objects. It leverages Teachable Machine, a web-based tool that lets you easily train your own image classification model without writing any code, and then uses the Coral USB Accelerator for inferencing based on that model to classify and sort objects.

The first thing we’ll do is help you build the physical sorting machine itself, consisting of a Raspberry Pi, a USB Accelerator, and a camera housing (The Decider). We will then cover how to use the images from the camera to create new classification models using Teachable Machine. Once you have a trained model, your sorter will be able to classify and sort objects as they fall through the air, because of the Edge TPU’s fast inference time. Finally, we’ll run through other tips and tricks for improved classification and sorting accuracy throughout the whole process.

How it works

The Decider (the part determining what is a marshmallow and what isn’t) relies on two key components:

  • Teachable Machine, a web-based tool that allows you to train an image classification model rapidly right from your browser.

  • The Coral USB Accelerator, which enables you to run the trained model and classify an image with very low latency (< 10 ms), using an Edge TPU chip.

These two capabilities make it possible to acquire training data on different types of objects, train a model to differentiate between them using Teachable Machine, and then use that model to sort the objects as they fall in front of a camera.

What you'll do in this tutorial

  • Learn how to use Teachable Machine to train an image classification model for use with an Edge TPU
  • Learn how to run that model on the USB Accelerator in a sorter.

What you’ll need

Note: You can instead build this project using the Dev Board in place of the USB Accelerator and Raspberry Pi.

Required system hardware

Description                  Example part number
Coral USB Accelerator        212-842776110077
Raspberry Pi 4               Product ID: 4292
Raspberry Pi 4 Power Supply  1101001000045
8GB micro SD card            SDSDQAB-008G
NeoPixel Rings               Adafruit 1586
Camera                       Described below

Optional system hardware

Description                             Example part number
Display                                 Adafruit 1933
FLIR Blackfly S Camera                  BFS-U3-04S2C-CS
4 mm UC Series Fixed Focal Length Lens  Stock #33-300
Micro HDMI - HDMI Cable                 Adafruit 4302
USB 3.0 Cable                           1AA2

There are many different cameras you can use for sorting, from super-expensive industrial cameras all the way to consumer USB webcams. You can actually use almost any modern webcam to sort objects, but getting a specialty camera will greatly increase the accuracy and possible throughput of the sorter.

In our case, we wanted to test the limits of sorting performance, so we chose a mid-range machine vision camera called the FLIR Blackfly S. This camera has the following features, which made it a good candidate for our sorter:

  • Global shutter - Since we are building a machine that will sort objects in flight, we need to capture objects in midair. Rolling shutter cameras we tested produced significant distortion due to the rolling shutter effect, so we chose a camera with a global shutter.

  • High frame rate - We got good results running at 120 fps. The camera in this system is always running, but most of the time it sees only a static background and does nothing. When an object falls in front of it, that happens only for a few frames. We need to pick the frame that has the best image of the object; one where the object is right in the center of the image. The more frames we have, the better our selection can be and the more accurate the classification produced can be.

  • Fast shutter speed - We used a shutter speed of 100 microseconds or less. Since we are capturing objects in flight, we need a super fast shutter speed to avoid motion blur.

  • Interchangeable lenses - We used a 4 mm lens, so we can get a macro view of the objects with minimal distortion. This is crucial for filling up as many pixels as possible with each object we are sorting so that the machine learning model can make inferences based on small details.

  • Granular manual controls - The more aspects of the camera that we can control and keep consistent, the better and more accurate our sorter will be. Any change in the image—like autofocus or auto exposure—can throw off the model.

So, again, you don’t need to have a super fancy machine vision camera to sort stuff, but a high quality camera will improve the speed and accuracy of your sorter.
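
To put rough numbers on the shutter-speed and frame-rate points above, the following sketch estimates how far a falling object moves during one exposure and how many frames it stays in view. The drop height, field of view, and image height are hypothetical values chosen for illustration (they are not measurements from our build), and the calculation assumes free fall with no air resistance.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def fall_speed(drop_height_m):
    """Speed of an object after free-falling drop_height_m metres (no drag)."""
    return math.sqrt(2 * G * drop_height_m)


def motion_blur_px(speed_m_s, exposure_s, fov_height_m, image_height_px):
    """Distance the object moves during one exposure, in pixels."""
    metres_per_px = fov_height_m / image_height_px
    return speed_m_s * exposure_s / metres_per_px


def frames_in_view(speed_m_s, fov_height_m, fps):
    """Approximate number of frames captured while the object crosses the view."""
    return fov_height_m / speed_m_s * fps


# Hypothetical setup: 20 cm drop, 5 cm tall field of view, 540 px tall image.
speed = fall_speed(0.20)
blur_fast = motion_blur_px(speed, 100e-6, 0.05, 540)  # 100 microsecond exposure
blur_slow = motion_blur_px(speed, 1 / 60, 0.05, 540)  # 1/60 s exposure
frames_120 = frames_in_view(speed, 0.05, 120)
frames_30 = frames_in_view(speed, 0.05, 30)

print(f"speed: {speed:.2f} m/s")
print(f"blur at 100 us: {blur_fast:.1f} px, at 1/60 s: {blur_slow:.0f} px")
print(f"frames in view at 120 fps: {frames_120:.1f}, at 30 fps: {frames_30:.1f}")
```

Under these assumed dimensions, the object moves about 2 pixels during a 100 microsecond exposure and is visible for roughly three frames at 120 fps, while at 30 fps it would be in view for less than one frame.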

Mechanical hardware

For the physical build in this example, you will need a 3D printer. We will provide the STL files for the hardware we used as a starting point, but the concepts in this project could easily be applied to other physical setups. We will also lay out the principles that guided our design, so that you can create your own.

Decider hardware

Decider hardware component  STL file/Example part
Bottom LED holder           Link to repository
Camera holder               Link to repository
Main body                   Link to repository
Top LED holder              Link to repository
½” acrylic rod              McMaster-Carr 8528K32

Tippy Thing hardware

We redirect each object to the desired location by having a solenoid change the direction of its fall. Our software triggers this solenoid using board pin 7 on the Raspberry Pi. You can wire the output of this pin to control whichever type of actuator you choose.

Tippy Thing hardware component  STL file/Example part
Body                            Link to repository
Solenoid                        Adafruit link
Solenoid holder                 Link to repository
Track                           Link to repository

How to build it

Skills required

  • For the technical part of the project you will need some familiarity with working on the command line on Linux, and with Python.

  • For the physical build, we used 3D prints. You can use the same STL files or fabricate this chamber in any number of other ways, but some knowledge of basic circuitry like GPIO pins, LEDs, solenoids, and servo motors is recommended.

Step 1: Build the mechanical body

The physical design of the sorter is heavily influenced by what provides the best performance when you train your model. The body of the sorter in this tutorial uses a 3D-printed structure but it's purely a demonstration — you should plan your own design carefully, based on what you want to sort.

The files we have provided are for creating the Decider unit and Tippy Thing shown in the videos, and you can create them or use them as a starting point for your designs.

When it comes to optimizing the performance of a sorter, the following design goals should be kept in mind.

1.1 Ensure the images are consistent

The most important part of building a good sorting mechanism is to make sure that all settings (the photo background, lighting, camera settings) stay as consistent as possible during both the training periods and regular use.

In our setup, we isolated the camera as much as we could, to ensure that the video demonstration is clear and visible. In an ideal design, the chamber would be sealed except for openings on the top and bottom to let objects through.

Additionally, the camera’s background should not be subject to conditions such as changing sunlight or the sorter being moved between different lighting conditions. The more isolated and artificially lit you can make your camera chamber, the better and more consistently your sorter will perform. We created a small printed insert that would allow us to maintain a consistent, high-contrast backdrop even if we had to swap the sorter chamber or change the color of parts. The finished example images and video below illustrate how the dark paper background insert improves the contrast with the cereal.

1.2 Provide enough light

In order to capture as clean an image as possible, you must keep exposure time to a minimum. The longer your camera’s shutter is open, the greater the chance that the object shifts in the frame and blurs the picture.

To capture a clear image in this very small window, we need to pump the frame with a lot of light. We’ve found that addressable LEDs generally worked well for us, as they allow fine control over brightness and positioning.

In our example, the objects were lit up using NeoPixel Light Rings.

Flooding the entire camera chamber with light works pretty well, but be aware that it can draw a lot of power over time (at full brightness, the NeoPixel rings use 1.6 A!). You could get even more efficient with LEDs directed at a specific area you want to capture.

Step 2: Set up the Pi and install libraries

Before you can start acquiring images of the objects you want to sort, you need to set up the software and install software libraries.

2.1 Set up your Pi

If you have not already set up your Raspberry Pi, you can do so by following these instructions from Raspberry Pi.

2.2 Set up the USB Accelerator

Follow the instructions to set up your USB Accelerator.

Install the Edge TPU Python library as follows:

echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list

curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

sudo apt-get update

sudo apt-get install python3-edgetpu

2.3 Install libraries

Download the Teachable Sorter repository and install its requirements using the following commands:

git clone https://github.com/google-coral/project-teachable-sorter.git

cd project-teachable-sorter

sh install_requirements.sh

Step 3: Set up the image pipeline

Note: While we built a sorter that captures objects while they fall, the same principles apply to any type of sorter, be it a ramp, a conveyor, or whatever else you come up with.

Once you have a well-lit, consistent camera chamber set up, it’s time to start acquiring images of the stuff you want to sort so that you can train a model with Teachable Machine.

There are three stages to the sorting pipeline:

  1. Acquisition: Getting a steady stream of images from your camera.

  2. Detection: Figuring out if an object is captured in the center of the frame.

  3. Classification: Having the Edge TPU (in the USB Accelerator) classify the image.

We find that the acquisition and classification stages are generally well understood and documented online, but the detection stage was the hardest for us. There are a number of ways we could have performed the detection, such as training a machine learning model on "Is this a good picture or not?" However, we chose to go with a cheaper, easier approach that basically looks for changes in the pixel values of the image, indicating an object in the center of the frame.

3.1 Acquire images from your camera

The first part of the pipeline is getting your raw camera stream to the Raspberry Pi. There are several different ways to do this. The best way really depends on the camera you use and we’ll provide three different ways: using the FLIR Camera we used in our example; using an Arducam camera; and using OpenCV. The OpenCV method should work with most webcams, but not every camera is supported out of the box.

To start, plug your camera into the Pi, and start the sorter.py script to start your live view. You will need to pass the appropriate flag for the camera you are using. The command to use is this:

python3 sorter.py --flir

The flags you need for different types of cameras are as listed in this table:

Type of camera      Flag to pass
FLIR Blackfly S     --flir
Arducam             --arducam
Most other webcams  --opencv

If everything is set up correctly and your camera is streaming to the Pi, you should be able to see a live window (either on your Pi’s display or over X forwarding).

3.2 Detect when a sorting target passes in front of the camera

We will now begin the detection phase, determining when we should run the classification once there is an object in the center of the frame.

Begin by running the sorter script in train mode, which you select by passing the --train flag. In this mode, images are not sent to the Edge TPU for processing, as we haven’t trained our model yet. Instead, frames that are found to contain an object in the center are displayed locally and sent to Teachable Machine. This mode is for testing how well your acquisition settings are working and, later on, for sending training images to Teachable Machine. The commands to initiate this are:

cd project-teachable-sorter

python3 sorter.py --flir --train --zone-activation

The --zone-activation flag selects the algorithm used to determine whether there is an object in the center of the frame. Zone activation looks for changes in a group of pixels right at the center of the frame, and it is the best option for noisy backgrounds. If you find you are getting a lot of false positives (pictures of the empty camera chamber) or false negatives (falling objects that are not captured), changing this flag can often help. The following table lists the available options:

Flag               Details/scenario
--zone-activation  Best for noisy backgrounds
--center-of-mass   Best performance on very clean, evenly lit backgrounds
--biquad           Zone activation with biquad filter smoothing; can help if zone activation produces images with objects too high or low
--biquad2D         Slight performance penalty, but can work more accurately than --biquad for the same conditions

More details about how these filters work are provided in the How it actually works section below.

If you are running the above script with your Pi connected to a display, it will show an image each time the camera saves a picture. If you are working remotely over SSH, make sure to have X forwarding enabled to see the pictures.
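
To give a feel for why a center-of-mass approach prefers a clean background, here is a minimal sketch of one way such a check could work. It is not the repository's implementation of --center-of-mass: the function names, the tolerance, and the use of plain nested lists are all assumptions made for illustration. The idea is to weight each row by how much it differs from a background frame and test whether the weighted mean row sits near the middle of the image.

```python
def vertical_center_of_mass(frame, background):
    """Return the difference-weighted mean row index, or None if nothing changed.

    frame and background are equally sized 2D lists of grayscale values.
    """
    total_weight = 0.0
    weighted_rows = 0.0
    for row_index, (row, bg_row) in enumerate(zip(frame, background)):
        row_weight = sum(abs(p - b) for p, b in zip(row, bg_row))
        total_weight += row_weight
        weighted_rows += row_index * row_weight
    if total_weight == 0:
        return None
    return weighted_rows / total_weight


def object_is_centered(frame, background, tolerance_rows=1.0):
    """True when the changed pixels are centered vertically in the frame."""
    center = vertical_center_of_mass(frame, background)
    if center is None:
        return False
    middle = (len(frame) - 1) / 2
    return abs(center - middle) <= tolerance_rows


# Tiny 5x4 example: a bright object on a black background.
background = [[0] * 4 for _ in range(5)]
centered = [[0] * 4 for _ in range(5)]
centered[2] = [0, 200, 200, 0]   # object on the middle row
near_top = [[0] * 4 for _ in range(5)]
near_top[0] = [0, 200, 200, 0]   # object on the top row

print(object_is_centered(centered, background))    # True
print(object_is_centered(near_top, background))    # False
print(object_is_centered(background, background))  # False
```

In a check like this, every pixel that differs from the background contributes weight, so lighting flicker or a cluttered backdrop pulls the computed center away from the object.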

Step 4: Connect to Teachable Machine

Now it’s time to train a model so we can perform the third step in the pipeline: classification.

The first step is to send Teachable Machine a bunch of images of each class of object you want to sort. We will then create our classification model and load it onto the Raspberry Pi, and just like that, we can start sorting.

4.1 Gather training data

The first step towards training your model is hand sorting a small subset of the objects you want to sort. This will enable you to train the classes you need on Teachable Machine. In cases where there is a clear binary distinction (such as either "marshmallow" or "cereal"), the task is easy.

Once you have your training objects sorted, the next section shows you how to capture the training images.

Tips:

  • Try to start with at least 30 samples of each class. If you find that you are getting inconsistent classifications, add more samples.

  • If there is a lot of variation within each of the classes of objects you want to sort, try to get a representative sample set with examples of each variation.

  • If your target isn’t as binary as cereal or marshmallow, we find that the Teachable Machine image classifier performs best when shown the extreme examples of each class.

4.2 Connect your Raspberry Pi’s camera output to Teachable Machine

Okay, so you are acquiring images on the Pi, have the presorted stuff you want to train on, and are ready to create your first model. The models are trained on Teachable Machine and you need to send it images of each class of object in order to generate a model that can distinguish between them.

But first, you need to create an SSH tunnel between the Raspberry Pi and your computer. Open a new terminal window on your computer and run the following command:

ssh -L 8889:localhost:8889 100.107.63.160
Note: Remember to use the IP address of your own Raspberry Pi in the command above.

Enter your password when prompted. This establishes the tunnel between your Raspberry Pi and your computer. Leave that terminal running for the duration of the training.

Now run the sorter script again in train mode from a terminal window on your Raspberry Pi like this:

python3 sorter.py --flir --train --zone-activation

Refer to the tables above to find the appropriate flags for your camera and setup.

To start sending training images to Teachable Machine, we need to enable network input on the tool by adding ?network=true to the end of the Teachable Machine URL. When you go to the Teachable Machine page with this URL flag enabled, you will see a new input option appear, titled ‘Network’:

Click the network input option and connect to localhost:8889. You should see the camera feed from the Raspberry Pi.

4.3 Gather training image samples

Once you have connected your Raspberry Pi to Teachable Machine, it’s time to train! Name your classes and connect to the network input shown above. Fill each class with images of the things you want to sort.

4.4 Export and transfer model to Coral

Once you feel confident about the number of samples you have given to Teachable Machine and feel the model is trained enough, perform the following steps as illustrated in the animation below:

  1. Go to the Export Model panel inside Teachable Machine and select the TensorFlow Lite tab.
  2. Select Edge TPU as the Model conversion type.
  3. Click the Download my model button to download the model you’ve just created.

When the model finishes downloading, unzip it and cd to it in a terminal on your computer. Then copy it over to the root of the project-teachable-sorter repo on your Raspberry Pi using scp like so:

scp model_edgetpu.tflite pi@100.107.63.160:~/project-teachable-sorter
Note: Remember to use the IP address of your own Raspberry Pi in the command above.

You will be prompted for your password and should then see the file transfer executing. Once you have transferred the model, you’re ready to move on to the next step.

How to use it

Once you have your model exported from Teachable Machine back on your Raspberry Pi, you can begin sorting! Make sure you have plugged in your Coral USB Accelerator, otherwise you will get an error saying that no Edge TPU has been detected.

The command to initiate sorting is:

python3 sorter.py --sort --zone-activation --display

The same acquisition algorithm options that you used in training are available in sort mode. However, this time, instead of just displaying the images, the script will run each image through the model and try to classify it. The output of this classification is printed to the console.

If everything is running well, each time you drop something in front of the camera, you will see something like the following console output.

Image 1 Captured: Class 1 Detected

Inference time: 10ms 

Additionally, the example project will by default set GPIO pin 7 high if the classification is Class 1 and keep it low if the classification is Class 2. This is the signal the sorter uses to redirect an object into the right bin.
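
The class-to-pin behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the project's own code: the function names are hypothetical, the class numbering follows the console output shown above ("Class 1", "Class 2"), and the pin write is passed in as a callable so that the logic can be exercised without a Raspberry Pi. On a Pi, the callable can be RPi.GPIO's output function.

```python
SOLENOID_PIN = 7  # board pin 7, as used by the example project


def pin_level_for_class(class_id):
    """Return True (drive the pin high) for Class 1, False (low) otherwise."""
    return class_id == 1


def actuate(class_id, write_pin):
    """Set the solenoid pin for one classification result.

    write_pin is any callable taking (pin, level). On a Pi it can be
    RPi.GPIO.output; elsewhere it can simply record or print the calls.
    """
    level = pin_level_for_class(class_id)
    write_pin(SOLENOID_PIN, level)
    return level


if __name__ == "__main__":
    try:
        import RPi.GPIO as GPIO  # only usable on a Raspberry Pi
    except (ImportError, RuntimeError):
        GPIO = None

    if GPIO is not None:
        GPIO.setmode(GPIO.BOARD)  # physical (board) pin numbering
        GPIO.setup(SOLENOID_PIN, GPIO.OUT, initial=GPIO.LOW)
        try:
            actuate(1, GPIO.output)  # Class 1: pin goes high
            actuate(2, GPIO.output)  # Class 2: pin goes low
        finally:
            GPIO.cleanup()
    else:
        # Off the Pi, just print what would have been written.
        actuate(1, lambda pin, level: print(f"pin {pin} -> {level}"))
        actuate(2, lambda pin, level: print(f"pin {pin} -> {level}"))
```

Keeping the decision (pin_level_for_class) separate from the hardware call makes it easy to swap in a different actuator or a different class mapping.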

How it actually works

The code for the sorter does three principal things:

  1. Grab frames from the Camera.

  2. Run the frames through a filter to determine which frames contain objects that need to be sorted.

  3. If the frame contains an object to be sorted, either send that object to Teachable Machine to train or classify that object and send a signal to redirect it into the right bin.
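
The three steps above can be sketched as a single loop. This is a simplified outline rather than the structure of sorter.py itself: run_pipeline and its parameters are hypothetical names, and the frames, detector, and classifier are stand-ins so that the example runs anywhere.

```python
def run_pipeline(frames, detect, classify=None, send_to_training=None):
    """Run the three stages over an iterable of frames.

    detect(frame) -> bool decides whether a frame contains a centered object.
    In train mode pass send_to_training; in sort mode pass classify.
    Returns the list of results produced for the detected frames.
    """
    results = []
    for frame in frames:                    # 1. grab frames
        if not detect(frame):               # 2. filter out empty frames early
            continue
        if classify is not None:            # 3a. sort mode
            results.append(classify(frame))
        elif send_to_training is not None:  # 3b. train mode
            send_to_training(frame)
            results.append("sent")
    return results


# Stand-in data: each "frame" is just a number, and 0 means an empty chamber.
frames = [0, 0, 7, 0, 12, 0]
detect = lambda frame: frame != 0
classify = lambda frame: "Class 1" if frame > 10 else "Class 2"

print(run_pipeline(frames, detect, classify=classify))  # ['Class 2', 'Class 1']
```

Because detection runs before classification, the four empty frames in this example never reach the classifier.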

Determining when an object is in view of the camera

For optimal sorting performance, we can’t run classification on each frame. Firstly, that would make training a nightmare, having to pick through the empty frames trying to find the one good shot of the object. Secondly, we don’t want to spend unnecessary milliseconds waiting for a classification when we could simply throw the image out if there is nothing in it to classify. We provide four different ways to determine when an object is centered in the frame from the camera. We are now going to discuss one of those ways: the zone activation filter.

The core part of this filter is the following function:

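def calculate_average_in_zone(img):
    # Mean pixel value in a thin band of rows starting at the middle of the frame.
    height = img.shape[0]  # frame height in pixel rows
    detection_zone_height = 20
    detection_zone_interval = 5

    # Every 5th row within the 20-row zone, every 3rd column within each row.
    return img[height // 2 : (height // 2) + detection_zone_height :
               detection_zone_interval, 0:-1:3].mean()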

It returns the mean value of the pixels within a given region of the image. Since we are trying to find when an object is in the center of the frame, we look at four horizontal rows of pixels, starting from the middle row and taking every fifth row after it. Within each row, we look at every third pixel.
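
To see exactly which pixels that slice touches, the same slice expressions can be applied to plain index ranges. The 640x480 resolution below is a hypothetical example; substitute your own camera's frame size.

```python
def sampled_indices(height, width, zone_height=20, interval=5, column_step=3):
    """Return the row and column indices selected by the zone slice."""
    rows = list(range(height))[height // 2 : (height // 2) + zone_height : interval]
    cols = list(range(width))[0:-1:column_step]
    return rows, cols


# Hypothetical 640x480 frame.
rows, cols = sampled_indices(height=480, width=640)
print(rows)                 # [240, 245, 250, 255]
print(cols[:4], len(cols))  # [0, 3, 6, 9] 213
```

For a frame of this size, only 4 x 213 = 852 pixel positions out of 307,200 are read, which keeps the per-frame check cheap.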

We take the average of the values of these pixels over 30 frames when the script is running without any objects falling in its view, to determine what the value is when the camera is just looking at the background:

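    detection_zone_avg = calculate_average_in_zone(img)

    # Once the window is full, refresh the background mean and drop the oldest value.
    if len(sliding_window) > 30:
        mean[0] = utils.mean_arr(sliding_window)
        sliding_window.pop(0)

    # Record the current frame's zone average on every frame.
    sliding_window.append(detection_zone_avg)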

Finally, we check the difference between the average value of that area (the background) and the current value. If there is a large change caused by an object falling in that area, we return True indicating that there is an image worth sending to Teachable Machine for training or classification.

    if mean[0] is not None and abs(detection_zone_avg - mean[0]) > threshold:
        print("Target Detected Taking Picture")
        return True
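
For experimenting away from the camera, the fragments above can be combined into a small self-contained class. This is a sketch rather than the repository's code: the class name and the default threshold are hypothetical, and it takes a precomputed zone average instead of an image so that it has no dependencies. As in the fragments, every zone average is appended to the window, and the background mean only becomes available after more than 30 frames have been seen.

```python
class ZoneActivationDetector:
    """Flags frames whose zone average departs from the recent background."""

    def __init__(self, window_size=30, threshold=10.0):
        self.window_size = window_size
        self.threshold = threshold    # hypothetical value; tune for your setup
        self.sliding_window = []      # recent zone averages
        self.background_mean = None   # unknown until the window has filled

    def update(self, detection_zone_avg):
        """Feed one zone average; return True if an object is detected."""
        if len(self.sliding_window) > self.window_size:
            self.background_mean = (
                sum(self.sliding_window) / len(self.sliding_window)
            )
            self.sliding_window.pop(0)
        self.sliding_window.append(detection_zone_avg)

        return (
            self.background_mean is not None
            and abs(detection_zone_avg - self.background_mean) > self.threshold
        )


detector = ZoneActivationDetector()
# 35 frames of an empty chamber with a zone average of 20 ...
background_hits = [detector.update(20.0) for _ in range(35)]
# ... then a bright object enters the zone.
object_hit = detector.update(90.0)

print(any(background_hits), object_hit)  # False True
```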

Sending images to Teachable Machine

To get images to the Teachable Machine website, we rely on a websocket connection between your browser and the Raspberry Pi. This is why we use an SSH tunnel between the Pi and your computer: it makes the connection more reliable.

In order to train our model, we need to format our images for Teachable Machine. We can send our images to Teachable Machine as an image or as JSON. When we use JSON, we can send a secondary field that allows Teachable Machine to automatically store the image as a class example rather than relying on the Hold to record button.

In order to use JSON, we need to encode the images in a way that can be converted to a string, such as base64. This conversion does create a slight increase in network overhead, but since the images are so small it does not affect system performance in any noticeable way.

def format_img_tm2(cv_mat):
    ret, buf = cv2.imencode('.jpg', cv_mat)
    encoded  =  base64.b64encode(buf)
    return encoded.decode('ascii')

The output of this function is then sent to Teachable Machine for each frame, using the following code:

message = dict()
message['image'] = format_img_tm2(cv_mat)
message['shouldTakePicture'] = True
send_over_ws(message, cam_sockets)

Note the shouldTakePicture boolean field, which controls whether the image just shows up in the preview window or is actually recorded as a sample.
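
The following sketch shows what such a message looks like once serialized, and confirms that the receiving side can recover the original bytes. It uses only the standard library and stand-in bytes in place of a real JPEG. The build_message helper is hypothetical, and the use of json.dumps for serialization is an assumption, since the internals of send_over_ws are not shown here.

```python
import base64
import json


def build_message(jpeg_bytes, should_take_picture):
    """Serialize one encoded frame as a JSON string."""
    message = {
        "image": base64.b64encode(jpeg_bytes).decode("ascii"),
        "shouldTakePicture": should_take_picture,
    }
    return json.dumps(message)


# Stand-in bytes; in the sorter these come from cv2.imencode('.jpg', cv_mat).
fake_jpeg = bytes(range(256)) * 12  # 3072 bytes
payload = build_message(fake_jpeg, True)

# What a receiver would do: parse the JSON and decode the image field.
decoded = json.loads(payload)
restored = base64.b64decode(decoded["image"])

print(restored == fake_jpeg)                   # True: the round trip is lossless
print(len(decoded["image"]) / len(fake_jpeg))  # about 1.33
```

The base64 text is about one third larger than the raw bytes, which is the network overhead mentioned above.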

Classifying images

Since we are using OpenCV to filter and analyze the images, we need to convert them into a format that works with the Edge TPU Python API. To do that, we simply use the Image.fromarray() function built into the Python Imaging Library (PIL):

img_pil = Image.fromarray(cv_mat)

This then feeds into the Edge TPU engine which outputs the classification:

classification_result = engine.ClassifyWithImage(img_pil)
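
Once you have a classification result, you still need to decide which class wins. The helper below is a minimal sketch that assumes the result is a list of (label_id, score) pairs; best_class and its min_score cutoff are hypothetical and not part of the Edge TPU API. The stand-in data lets it run without an accelerator attached.

```python
def best_class(classification_result, min_score=0.5):
    """Return (label_id, score) for the highest-scoring class, or None.

    classification_result is assumed to be a list of (label_id, score) pairs.
    """
    confident = [pair for pair in classification_result if pair[1] >= min_score]
    if not confident:
        return None
    return max(confident, key=lambda pair: pair[1])


# Stand-in results; on the Pi these would come from engine.ClassifyWithImage().
print(best_class([(0, 0.91), (1, 0.09)]))  # (0, 0.91)
print(best_class([(0, 0.45), (1, 0.40)]))  # None
print(best_class([]))                      # None
```

Returning None for low-confidence results gives you the option of leaving the actuator in its default position when the model is unsure.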

Extending the project

We’ve provided resources on how to build the most difficult part of the chamber as well as the photography and inference components, but there are a few more parts that can help make your sorter even more comprehensive and useful. Here are a few such items:

Singulators

In our experience, isolating individual elements from an unsorted set proved to be the most complicated mechanical step in the whole process. Each material we tried had different physical and mechanical characteristics that demanded custom solutions. In our process, we stuck to two approaches: vibrating tables and bucket wheels.

  • Vibrating tables: We used off-the-shelf vibrating motors in a variety of different configurations to make our materials move in single file. Vibrating tables with a narrow, tapered chute at the end proved to work well for things like grains, beans, and cereals.

  • Bucket wheel: A wheel with buckets on it allowed us to collect a single item from the unsorted set.

Ejectors

To actually sort the desired or undesired materials we needed some method of ejection that would allow us to redirect, or tip, the material into its matching set. We call ours the Tippy Thing.

You can experiment with various types of ejectors.

Redirector: In the case of our Tippy Thing, we redirect the cereals to the desired location by having a solenoid change the fall direction of the object. This is not the fastest sorting method but worked well for our example. You can work on improving that aspect of the machine.

Solenoid Ejector: Another approach we used was a solenoid punch with a ramp to redirect the falling objects in mid air.

Air Ejector: The fastest and most effective method of separating out classified objects from an unsorted set was air ejection.

About the creators

Gautam Bose & Lucas Ochoa

We are two creative technologists currently working on projects at Google Creative Lab. We are passionate about making technology more accessible for all makers, especially when it comes to machine learning and physical computing. That’s why this project writeup is one we’re especially excited to share with everyone, as there’s potential beyond marshmallows to sorting all kinds of things, at all kinds of scales — things that are practical and helpful.

When we’re not playing around with ML and other projects, you can find Gautam cheering on his favorite Formula 1 team, while Lucas enjoys going for long runs along the Hudson River.