Chapter 4. Computer Vision Apps with ML Kit on Android

Chapter 3 gave you an introduction to ML Kit and how it can be used for face detection in a mobile app. But ML Kit is far more than that—it gives you the ability to rapidly prototype common vision scenarios, host custom models, or implement other turnkey scenarios such as barcode detection. In this chapter, we will explore some of the models that are available in ML Kit to provide computer vision scenarios, including image labeling and classification, and object detection in both still and moving images. We’ll do this on Android, using Kotlin as the programming language. Chapter 6 will cover the equivalent content using Swift for iOS development.

Image Labeling and Classification

The concept of image classification is a well-known one in machine learning circles, and a staple of computer vision. In its simplest sense, image classification happens when you show an image to a computer and it tells you what that image contains. For example, if you show it a picture of a cat, like the one in Figure 4-1, it will label it as a cat.

Image labeling in ML Kit takes this a little further and gives you a list of things seen in the image with levels of probability, so instead of Figure 4-1 just showing a cat, it might say that it sees a cat, flowers, grass, daisies, and more.

Let’s explore how to create a very simple Android app that can label this image! We’ll use Android Studio and Kotlin. If you don’t have them already, you can download them at https://developer.android.com/studio/.

Figure 4-1. An image of a cat

Step 1: Create the App and Configure ML Kit

If you haven’t gone through Chapter 3 yet, or if you aren’t familiar with getting up and running with an Android app, I’d recommend that you do! Once you’ve created the app, you’ll need to edit your build.gradle file as demonstrated in that chapter. However, in this case, instead of adding the face detection libraries, you’ll need to add the image labeling ones, like this:

dependencies {
    implementation "org.jetbrains.kotlin:kotlin-stdlib:$kotlin_version"
    implementation 'androidx.core:core-ktx:1.2.0'
    implementation 'androidx.appcompat:appcompat:1.2.0'
    implementation 'com.google.android.material:material:1.1.0'
    implementation 'androidx.constraintlayout:constraintlayout:2.0.4'
    testImplementation 'junit:junit:4.+'
    androidTestImplementation 'androidx.test.ext:junit:1.1.2'
    androidTestImplementation 'androidx.test.espresso:espresso-core:3.2.0'
    implementation 'com.google.mlkit:image-labeling:17.0.1'
}

Once you’ve done this, Android Studio will likely ask you to sync given that your Gradle files have changed. This will trigger a build with the new ML Kit dependencies included.

Step 2: Create the User Interface

We’ll just create a super simple UI for this app so we can get straight down to using image labeling. In the res->layout directory, in the Android view of your project, you’ll see a file called activity_main.xml. Refer back to Chapter 3 if this isn’t familiar.

Update the UI to contain a linear layout with an ImageView, a Button, and a TextView like this:

<?xml version="1.0" encoding="utf-8"?>
<androidx.constraintlayout.widget.ConstraintLayout
    xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".MainActivity">

    <LinearLayout
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:orientation="vertical"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toTopOf="parent">

        <ImageView
            android:id="@+id/imageToLabel"
            android:layout_width="match_parent"
            android:layout_height="match_parent"
            android:layout_gravity="center"
            android:adjustViewBounds="true"
        />
        <Button
            android:id="@+id/btnTest"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:text="Label Image"
            android:layout_gravity="center"/>
        <TextView
            android:id="@+id/txtOutput"
            android:layout_width="match_parent"
            android:layout_height="wrap_content"
            android:ems="10"
            android:gravity="start|top" />
    </LinearLayout>
</androidx.constraintlayout.widget.ConstraintLayout>

At runtime, the ImageView will load an image, and when the user presses the button, ML Kit will be invoked to get image label data back for the displayed image. The results will be rendered in the TextView. You can see this in Figure 4-3 a little later.

Step 3: Add the Images as Assets

Within your project, you’ll need an assets folder. Again, if you aren’t familiar with this step, check back to Chapter 3, where you’ll be stepped through the process. Once you have an assets folder and have added some images to it, you’ll see them in Android Studio. See Figure 4-2.

Figure 4-2. Images in the assets folder

Step 4: Load an Image to the ImageView

Now let’s write some code! We can go to the MainActivity.kt file, and within that add an extension that lets us load images from the assets folder as bitmaps:

// Extension function to load an image from the assets folder as a Bitmap,
// returning null if the file can't be read. (This needs imports for Context,
// Bitmap, BitmapFactory, and IOException.)
fun Context.assetsToBitmap(fileName: String): Bitmap? {
    return try {
        with(assets.open(fileName)) {
            BitmapFactory.decodeStream(this)
        }
    } catch (e: IOException) { null }
}

Then, update the onCreate function that was made for you by Android Studio to find the ImageView control based on its ID, and load one of the images from the assets folder into it:

val img: ImageView = findViewById(R.id.imageToLabel)
// assets folder image file name with extension
val fileName = "figure4-1.jpg"
// get bitmap from assets folder
val bitmap: Bitmap? = assetsToBitmap(fileName)
bitmap?.apply {
    img.setImageBitmap(this)
}

You can run your app now to test if it loads the image properly. If it does, you should see something like Figure 4-3.

Figure 4-3. Running the app with an image loaded

Pressing the button won’t do anything yet because we haven’t coded it. Let’s do that next!

Step 5: Write the Button Handler Code

Let’s start by writing code to give us variables that can represent the text view (for writing out the labels) as well as the button itself:

val txtOutput : TextView = findViewById(R.id.txtOutput)
val btn: Button = findViewById(R.id.btnTest)

Now that we have the button, we can create a button handler for it. This will be achieved by typing btn.setOnClickListener; autocomplete will create a stub function for you. Then, you can update it for image labeling with this complete code. We’ll go through it piece by piece next:

btn.setOnClickListener {
    val labeler =
        ImageLabeling.getClient(ImageLabelerOptions.DEFAULT_OPTIONS)
    val image = InputImage.fromBitmap(bitmap!!, 0)
    var outputText = ""
    labeler.process(image)
        .addOnSuccessListener { labels ->
            // Task completed successfully
            for (label in labels) {
                val text = label.text
                val confidence = label.confidence
                outputText += "$text : $confidence\n"
            }
            txtOutput.text = outputText
        }
        .addOnFailureListener { e ->
            // Task failed with an exception
            // ...
        }
}

When the user clicks the button, this code will create an image labeler from ML Kit with default options like this:

val labeler = ImageLabeling.getClient(ImageLabelerOptions.DEFAULT_OPTIONS)

Once it has done this, it will then create an image object (which ML Kit can understand) from the bitmap (used to display the image) with this code:

val image = InputImage.fromBitmap(bitmap!!, 0)

The labeler will be called to process the image, with two listeners added to it. A success listener will fire if the processing was successful, and a failure listener if it wasn’t. When an image labeler succeeds, it will return a list of labels. These labels will have a text property with text describing the label, and a confidence property with a value from 0 to 1 containing the probability that the labeled item is present.
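The failure listener in the complete listing is left empty. As a minimal sketch of what you might put there (it isn’t in the original listing, and reusing txtOutput for the error message is just one option), you could surface the exception like this:

.addOnFailureListener { e ->
    // Illustrative error handling: show why labeling failed in the same TextView
    txtOutput.text = "Labeling failed: ${e.localizedMessage}"
}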

So, within the success listener, the code will loop through the full list of labels, and add the text and confidence of each to a variable called outputText. Once it has completed, it then sets the text property of the TextView (the txtOutput control from earlier) to the value of the outputText variable:

for (label in labels) {
          val text = label.text
          val confidence = label.confidence
          outputText += "$text : $confidence\n"
}
txtOutput.text = outputText

It’s really as simple as that. Running the app with the cat image from earlier in this chapter will then give you output like Figure 4-4.

Figure 4-4. Labeling the image from earlier in this chapter

Next Steps

The built-in image labeling model from ML Kit recognizes over 400 classes within an image. At the time of writing it was 447, but this may change. The full label map for ML Kit is published at https://developers.google.com/ml-kit/vision/image-labeling/label-map. Should you want to train a model to recognize different classes you’ll use TensorFlow, which we’ll explore in Chapter 9.
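One more thing worth knowing before moving on: if the default labeler returns more labels than you want, you can build it with custom options instead of ImageLabelerOptions.DEFAULT_OPTIONS. As a hedged sketch (the 0.7 threshold is just an illustrative value), you could filter out low-confidence labels like this:

// Only return labels the model is at least 70% confident about (illustrative value)
val options = ImageLabelerOptions.Builder()
    .setConfidenceThreshold(0.7f)
    .build()
val labeler = ImageLabeling.getClient(options)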

Object Detection

The previous section showed you how to do image classification and labeling, where the computer could detect what was in an image, but not necessarily where the item was within the image. That’s where object detection comes in. When you pass an image to the object detector, you’ll get back a list of objects, including bounding boxes that can be used to determine where in the image each object is. The ML Kit default model for object detection is excellent at picking out objects in an image, but it can only classify them into five classes before you need to use a custom model. However, by combining it with image labeling (previous section), you can classify the individual objects within the image and get labels for them! You can see an example of this in Figure 4-5.

Figure 4-5. Performing object detection

Let’s look at this step by step.

Step 1: Create the App and Import ML Kit

Create the app as before as a single view application. We’ll try to keep this as similar to the image labeling app you’ve already built so that things can be familiar.

When you’re done, edit your build.gradle file to use both object detection and image labeling, like this:

implementation 'com.google.mlkit:object-detection:16.2.2'
implementation 'com.google.mlkit:image-labeling:17.0.1'
Note

Your version numbers may be different, so check the latest at https://developers.google.com/ml-kit.

Step 2: Create the Activity Layout XML

The layout file for the activity is super simple and exactly the same as what we saw earlier. You’ll have a LinearLayout that lays out an ImageView, a Button, and a TextView. The ImageView will display the image, the Button will run the object detection and labeling code, and the TextView will render the results of the labeling. Instead of relisting the code here, just use the same layout code as the previous example.

Step 3: Load an Image into the ImageView

As before, you’ll use an extension to load an image from the assets folder into the ImageView. For convenience, I’ve repeated the code to do that here:

// extension function to get bitmap from assets
fun Context.assetsToBitmap(fileName: String): Bitmap?{
    return try {
        with(assets.open(fileName)){
            BitmapFactory.decodeStream(this)
        }
    } catch (e: IOException) { null }
}

Create an assets folder like before, and put some images in it. For the screenshot in Figure 4-5, I used an image from Pixabay, which I renamed to bird.jpg to keep the code simple.

Then, within the onCreate function, you can get the image from the assets using the preceding extension function and load it into your bitmap like this:

val img: ImageView = findViewById(R.id.imageToLabel)
// assets folder image file name with extension
val fileName = "bird.jpg"
// get bitmap from assets folder
val bitmap: Bitmap? = assetsToBitmap(fileName)
bitmap?.apply {
    img.setImageBitmap(this)
}

You can also set up the Button and TextView controls like this:

val txtOutput : TextView = findViewById(R.id.txtOutput)
val btn: Button = findViewById(R.id.btnTest)

Step 4: Set Up the Object Detector Options

You’ll use a number of ML Kit classes in this section. Here are the imports:

import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.label.ImageLabeling
import com.google.mlkit.vision.label.defaults.ImageLabelerOptions
import com.google.mlkit.vision.objects.DetectedObject
import com.google.mlkit.vision.objects.ObjectDetection
import com.google.mlkit.vision.objects.defaults.ObjectDetectorOptions

The ML Kit object detector gives you a variety of ways to do object detection, and these are controlled by an ObjectDetectorOptions object. We will use it in one of its simplest modes, which is to detect based on a single image and enable detecting multiple objects within that image:

val options =
        ObjectDetectorOptions.Builder()
        .setDetectorMode(ObjectDetectorOptions.SINGLE_IMAGE_MODE)
        .enableMultipleObjects()
        .build()

The object detector is a powerful API, which can also do things like tracking objects in a video stream—detecting them and maintaining them from frame to frame. That’s beyond the scope of what we’re doing in this book, but you can learn more about it in the ML Kit documentation.

The detector mode option determines whether you’re working with single images or a video stream—you can learn more about the SINGLE_IMAGE_MODE used in this example at https://oreil.ly/WFSZD.

Additionally, the object detector can be set to detect only the most prominent object, or all objects within the scene. We’ve set it here to detect multiple objects (using .enableMultipleObjects()), so we can see multiple items, as demonstrated in Figure 4-5.

Another common option is to enable classification. As the default object detector can only detect five classes of object, and it gives them a very generic label, I haven’t turned it on here, and we will “roll our own” labeling of the objects using the image labeling APIs discussed earlier in the chapter. If you want to use more than the base five classes of object, you can do so with a custom TensorFlow model, and we’ll explore using custom models in Chapters 9 through 11.
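For completeness, here’s a sketch of what the options would look like if you did want the detector’s own coarse classification turned on (the classifyingOptions name is just illustrative, and we don’t turn this on for the still-image example):

// Alternative options: let the detector attach its own (very generic) labels
val classifyingOptions = ObjectDetectorOptions.Builder()
    .setDetectorMode(ObjectDetectorOptions.SINGLE_IMAGE_MODE)
    .enableMultipleObjects()
    .enableClassification()
    .build()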

Step 5: Handle the Button Interaction

When the user touches the button, you’ll want to invoke the object detector, get its response, and from there get the bounding boxes for the objects within the image. Later we’ll also use those bounding boxes to crop the image into the subimage defined by the bounding box, so we can pass that to the labeler. But for now, let’s just implement the object detection handler. It should look something like this:

btn.setOnClickListener {
    val objectDetector = ObjectDetection.getClient(options)
    val image = InputImage.fromBitmap(bitmap!!, 0)
    objectDetector.process(image)
        .addOnSuccessListener { detectedObjects ->
            // Task completed successfully
        }
        .addOnFailureListener { e ->
            // Task failed with an exception
            // ...
        }
}

So, similar to what you did earlier with image labeling, the pattern is to create an instance of the object detection API with the options. You’ll then convert the bitmap into an InputImage, and process this with the object detector.

This will return on success with a list of detected objects, or on failure with an exception object.

The detectedObjects returned to the onSuccessListener will contain details about each object, including its bounding box, and those bounding boxes are what we’ll draw onto the image next.
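Before we do that, a quick and entirely optional sanity check is to log what came back inside the success listener. This little sketch is just illustrative (obj.boundingBox is an android.graphics.Rect, and you’d need to import android.util.Log):

// Inside addOnSuccessListener: log each detected object's bounding box
for (obj in detectedObjects) {
    Log.d("ObjectDetection", "Found object with bounds ${obj.boundingBox}")
}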

Step 6: Draw the Bounding Boxes

The easiest way is to extend the Bitmap object to draw rectangles on top of it using a Canvas. We’ll pass the detected object to this, so it can establish the bounding boxes, and from there draw them on top of the bitmap.

Here’s the complete code:

fun Bitmap.drawWithRectangle(objects: List<DetectedObject>): Bitmap? {
    val bitmap = copy(config, true)
    val canvas = Canvas(bitmap)
    var thisLabel = 0
    for (obj in objects) {
        thisLabel++
        val bounds = obj.boundingBox
        Paint().apply {
            color = Color.RED
            style = Paint.Style.STROKE
            textSize = 32.0f
            strokeWidth = 4.0f
            isAntiAlias = true
            // draw rectangle on canvas
            canvas.drawRect(
                bounds,
                this
            )
            canvas.drawText(
                thisLabel.toString(),
                bounds.left.toFloat(),
                bounds.top.toFloat(),
                this
            )
        }
    }
    return bitmap
}

The code will first create a copy of the bitmap, and a new Canvas based on it. It will then iterate through all of the detected objects.

The bounding box returned by ML Kit for the object is in the boundingBox property, so you can get its details with:

val bounds = obj.boundingBox

This can then be used to draw a bounding box using a Paint object on the canvas like this:

canvas.drawRect(
    bounds,
    this
)

The rest of the code handles details like the color of the rectangle and the size and color of the text. The text just contains a number—as you saw in Figure 4-5, we write 1, 2, 3 on the boxes in the order in which the objects were detected.

You then call this function within the onSuccessListener like this:

bitmap?.apply{
    img.setImageBitmap(drawWithRectangle(detectedObjects))
}

So, upon a successful return from ML Kit, you’ll now have bounding boxes drawn on the images. Given the limitations of the object detector, you won’t get very useful labels for these boxes, so in the next step, you’ll see how to use image labeling calls to get the details for what is within the bounding box.

Step 7: Label the Objects

The base model, for simplicity, only handles five very generic classes when it comes to labeling the contents of the image. You could use a custom model that is trained on more, or you could use a simple multistep solution. The process is simple—you already have the bounding boxes, so create a new temporary image with just what’s within a bounding box, pass that to the image labeler, and then get the results back. Repeat this for each bounding box (and thus each object), and you’ll get detailed labels for each detected object!

Here’s the complete code:

fun getLabels(bitmap: Bitmap,
              objects: List<DetectedObject>, txtOutput: TextView) {
    val labeler = ImageLabeling.getClient(ImageLabelerOptions.DEFAULT_OPTIONS)
    for (obj in objects) {
        // Crop the original bitmap down to just this object's bounding box
        val bounds = obj.boundingBox
        val croppedBitmap = Bitmap.createBitmap(
            bitmap,
            bounds.left,
            bounds.top,
            bounds.width(),
            bounds.height()
        )
        val image = InputImage.fromBitmap(croppedBitmap, 0)

        labeler.process(image)
            .addOnSuccessListener { labels ->
                // Task completed successfully
                var labelText = ""
                if (labels.count() > 0) {
                    // Append this object's labels to whatever is already displayed
                    labelText = txtOutput.text.toString()
                    for (thisLabel in labels) {
                        labelText += thisLabel.text + " , "
                    }
                    labelText += "\n"
                } else {
                    labelText = "Not found." + "\n"
                }
                txtOutput.text = labelText
            }
    }
}

This code loops through each of the detected objects and uses the bounding box to create a new bitmap called croppedBitmap. It then uses an image labeler (called labeler), set up with default options, to process that new image. On a successful return it will have a number of labels, which it writes into a comma-separated string to be rendered in txtOutput. I’ve noticed occasionally that even though the call succeeds, it returns an empty list of labels, so I added code to only construct the string if there are labels in the return.
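As an aside, if you’d rather show only the single most likely label for each object, a minimal alternative inside the success listener (sketched here, and not part of the listing above) would be:

// Illustrative alternative: keep only the highest-confidence label per object
val best = labels.maxByOrNull { it.confidence }
val line = best?.let { "${it.text} : ${it.confidence}\n" } ?: "Not found.\n"
txtOutput.append(line)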

To call this function, just add this code to the onSuccessListener for the object detection call, immediately after where you called the code to set the rectangles on the bitmap:

getLabels(bitmap, detectedObjects, txtOutput)
Note

When running this code, you are making a number of asynchronous calls, first to the object detector, and later to the image labeler. As a result, you’ll likely see delayed behavior after pressing the button. You’ll likely see the bounding boxes drawn first, and then a few moments later the list of labels will be updated. Android and Kotlin offer a lot of asynchronous functionality to make the user experience here a bit better, but they’re beyond the scope of this book, as I wanted to keep the example simple and focused on what you can do with the functionality present in ML Kit.
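If you do want to tidy up the nested callbacks, one common approach is Kotlin coroutines. Here’s a hedged sketch, assuming you add the kotlinx-coroutines-play-services dependency (which provides the Task.await() extension); the labelObjects name and overall shape are illustrative rather than the book’s implementation:

import kotlinx.coroutines.tasks.await

// Sketch: label every detected object sequentially inside a coroutine,
// suspending on each ML Kit Task instead of nesting success listeners.
suspend fun labelObjects(bitmap: Bitmap,
                         objects: List<DetectedObject>): String {
    val labeler = ImageLabeling.getClient(ImageLabelerOptions.DEFAULT_OPTIONS)
    val output = StringBuilder()
    for (obj in objects) {
        val bounds = obj.boundingBox
        val cropped = Bitmap.createBitmap(
            bitmap, bounds.left, bounds.top, bounds.width(), bounds.height()
        )
        // await() suspends until the labeling Task completes
        val labels = labeler.process(InputImage.fromBitmap(cropped, 0)).await()
        output.append(labels.joinToString(", ") { it.text }).append("\n")
    }
    return output.toString()
}

You could then call this from a lifecycleScope.launch block and set txtOutput.text with the result, but the callback version above is all you strictly need.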

Detecting and Tracking Objects in Video

The ML Kit object detector can also operate on video streams, giving you the ability to detect objects in a video and track them in successive video frames. For example, see Figure 4-6, where I moved the camera across a scene, and the Android figurine was not only detected and given a bounding box, but also assigned a tracking ID. While the object stayed in the field of view, subsequent frames got different bounding boxes based on its new position, but the tracking ID was maintained—i.e., it was recognized as the same object despite looking different because of its placement within the frame and the different camera angle.

We’ll explore how an app like this can be built using ML Kit in this section. Note that to test it you should use a physical device—moving the camera around to track objects doesn’t translate well to the emulator.

There are a lot of steps in building an app like this that aren’t ML-specific, like handling CameraX, using an overlay, and managing the drawing of boxes between frames. I won’t go into those in depth in this chapter, but the book download has the complete code for you to dissect.

Figure 4-6. Using a video-based object detector

Exploring the Layout

Naturally, the layout of an app like this is a little more complex than what we’ve seen so far. It needs to render a camera preview and then, on top of the preview, draw bounding boxes that update in near real time as you move the camera around to track the object. In this app I used CameraX, a support library in Android that is designed to make using the camera much easier—and it did! You can learn more about CameraX at https://developer.android.com/training/camerax.

Repeat the earlier steps for creating a new Android app. When ready, open the layout file and edit it. For an app like this, you’ll need to use a FrameLayout. It’s typically used for a single item, to block out a particular area of the screen, but I like using it in a circumstance like this, where I have two items and one will completely overlay the other:

<FrameLayout android:layout_width="fill_parent"
    android:layout_height="fill_parent"
    android:layout_weight="2"
    android:padding="5dip"
    tools:ignore="MissingConstraints">
    <androidx.camera.view.PreviewView
        android:id="@+id/viewFinder"
        android:layout_width="fill_parent"
        android:layout_height="fill_parent"
        android:layout_weight="1"
        android:layout_gravity="center" />
    <com.odmlbook.liveobjectdetector.GraphicOverlay
        android:id="@+id/graphicOverlay"
        android:layout_gravity="center"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content" />
</FrameLayout>

Within the FrameLayout, the first control is the androidx.camera.view.PreviewView on which the stream of video from the camera will be rendered. On top of this is a custom control called a GraphicOverlay, which, as its name suggests, provides an overlay on top of the Preview on which graphics can be drawn. This overlay control has been adapted from the open source ML Kit sample.

Note that in the listing I’m calling the GraphicOverlay com.odmlbook.liveobjectdetector.GraphicOverlay; this is because the GraphicOverlay from the preceding Google sample was added directly to my app, and I’m using my app’s namespace. You’ll likely have a different namespace, so be sure to use the correct naming for your GraphicOverlay.

I’ve kept the layout as simple as possible so you can focus on the aspects of object detection—so that’s pretty much it—a preview for CameraX on top of which is a GraphicOverlay on which you can draw the bounding boxes. You’ll see more of this a little later.

The GraphicOverlay Class

In the layout you saw a custom GraphicOverlay class. It’s the job of this class to manage a collection of graphic objects—made up of the bounding boxes and their labels—and draw them on a canvas. One thing to note is that the camera preview (at the camera’s resolution) and the canvas placed on top of it (at the screen resolution) often use different coordinate systems, as they do here, so a coordinate translation may be necessary for you to draw on top of the preview in the appropriate place. You can find the code for that, as well as for managing the performance of drawing graphics frame by frame, in the GraphicOverlay class. The bounding boxes, represented as graphic objects, are simply drawn in the onDraw event:

    @Override
    protected void onDraw(Canvas canvas) {
        super.onDraw(canvas);

        synchronized (lock) {
            updateTransformationIfNeeded();

            for (Graphic graphic : graphics) {
                graphic.draw(canvas);
            }
        }
    }

Capturing the Camera

When using CameraX, you access a camera provider, which will then allow you to set various subproviders on it, including a surface provider that lets you define where to put the preview, as well as an analyzer that lets you do stuff with the frames that come in from the camera. These are perfect for our needs—the surface provider can give us the preview window, and the analyzer can be used to call the ML Kit object detector. In the MainActivity for the app, you will find this code (in the startCamera() function).
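The surrounding scaffolding for obtaining the camera provider isn’t reproduced in the chapter. A typical sketch of that part (the variable names here are illustrative; the book’s repo has the exact version) looks something like this, with the preview and analyzer described next being built and bound inside the listener:

// Typical CameraX setup: get the provider asynchronously, then configure and bind
val cameraProviderFuture = ProcessCameraProvider.getInstance(this)
cameraProviderFuture.addListener({
    val cameraProvider = cameraProviderFuture.get()
    val cameraSelector = CameraSelector.DEFAULT_BACK_CAMERA
    // ...build the preview and imageAnalyzer shown next, then call bindToLifecycle
}, ContextCompat.getMainExecutor(this))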

First, we set up the preview view (notice that the control in the layout listing was called viewFinder) to render the stream of frames from the camera:

val preview = Preview.Builder()
    .build()
    .also {
        it.setSurfaceProvider(viewFinder.surfaceProvider)
    }

Next comes the image analyzer. CameraX calls this frame by frame, giving you the ability to do some kind of processing on the image. This is perfect for our needs. When you call setAnalyzer, you specify a class that will handle the analysis. Here I specified a class called ObjectAnalyzer, which, as its name suggests, will use the object detection APIs with the frame:

val imageAnalyzer = ImageAnalysis.Builder()
    .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
    .build()
    .also {
        it.setAnalyzer(cameraExecutor, ObjectAnalyzer(graphicOverlay))
    }

Then, once you have these, you can bind them to the life cycle of the camera so that CameraX knows to use them to render the preview and manage frame-by-frame processing respectively:

cameraProvider.bindToLifecycle(
    this, cameraSelector, preview, imageAnalyzer
)

You can learn more about the life cycle of camera applications using CameraX in the CameraX documentation. I just want to highlight the important parts when it comes to using object detection with it here.

The ObjectAnalyzer Class

The full code for this class is in the book’s repo. I recommend you clone that and use it to understand how object analysis works for tracking objects in video. This section just shows the important parts of the code, and won’t really work for coding along!

Earlier you saw that you could hook into CameraX’s analyzer ability to do the object detection, and we specified a class called ObjectAnalyzer to handle it. We also passed a reference to the graphic overlay to this class.

An analyzer class has to implement ImageAnalysis.Analyzer, so the signature for this class should look something like:

public class ObjectAnalyzer(graphicOverlay: GraphicOverlay) :
                            ImageAnalysis.Analyzer {}

It’s the job of this class to do the object detection, so we’ll need to create our ObjectDetector instance as we did before:

val options =
    ObjectDetectorOptions.Builder()
        .setDetectorMode(ObjectDetectorOptions.STREAM_MODE)
        .enableMultipleObjects()
        .enableClassification()
        .build()
val objectDetector = ObjectDetection.getClient(options)

Note the difference in the detector mode setting though—ObjectDetectorOptions.STREAM_MODE—it’s now using stream mode because we’re going to be streaming images to it. This turns on the object tracking feature that we saw in Figure 4-6 where it “remembers” the same object across different frames, even if it looks different because of camera placement.

When you create an analyzer class like this, you’ll need to override the analyze function, which takes an ImageProxy object representing the image. To use a CameraX image with the image proxy, there’s some processing you’ll need to do to manage rotation and the like. I won’t go into detail on that here, but the important thing to handle is whether the camera is providing frames in landscape or portrait mode. In either case we need to inform the overlay of the appropriate height and width of the image, flipping them where necessary, so that the ML Kit API always receives images in the same orientation:

if (rotationDegrees == 0 || rotationDegrees == 180) {
    overlay.setImageSourceInfo(
        imageProxy.width, imageProxy.height, isImageFlipped
    )
} else {
    overlay.setImageSourceInfo(
        imageProxy.height, imageProxy.width, isImageFlipped
    )
}

Then, we can pass the frame to the object detector, and if it succeeds, the callback will receive the detected objects as before. At that point we should clear the overlay, and then add a new graphic object to it for each of the detected objects. These graphic objects are a custom class within this app; you’ll see them in a moment. Once we’re done, we call postInvalidate() on the overlay, which will trigger a redraw of the overlay:

objectDetector.process(frame)
    .addOnSuccessListener { detectedObjects ->
        overlay.clear()
        for (detectedObject in detectedObjects) {
            val objGraphic = ObjectGraphic(this.overlay, detectedObject)
            this.overlay.add(objGraphic)
        }
        this.overlay.postInvalidate()
    }

The ObjectGraphic Class

As the bounding boxes are composed of three elements—the box, the text of the label, and the background for the label—instead of just drawing each one of these individually, a single class is used to represent each. This class will be initialized using the detectedObject that is returned from ML Kit, so we can get the tracking ID and the coordinates of the bounding box. The ObjectGraphic class manages all of this—you can see it being used in the preceding code, where a new instance of it is created using the overlay and the detectedObject.
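The full class is in the book’s repo, but a rough sketch of the shape of such a class might look like the following. This is simplified and assumes the overlay exposes a Graphic base class with a draw(Canvas) method, as in the ML Kit sample the overlay was adapted from; the real ObjectGraphic also translates camera coordinates into overlay coordinates before drawing:

// Simplified, illustrative sketch of a graphic object for one detected object.
class ObjectGraphic(
    overlay: GraphicOverlay,
    private val detectedObject: DetectedObject
) : GraphicOverlay.Graphic(overlay) {

    private val paint = Paint().apply {
        color = Color.RED
        style = Paint.Style.STROKE
        strokeWidth = 4.0f
        textSize = 32.0f
    }

    override fun draw(canvas: Canvas) {
        val box = detectedObject.boundingBox
        canvas.drawRect(box, paint)
        // Label the box with the tracking ID assigned in STREAM_MODE (may be null)
        canvas.drawText(
            detectedObject.trackingId?.toString() ?: "?",
            box.left.toFloat(),
            box.top.toFloat(),
            paint
        )
    }
}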

Putting It All Together

That’s generally how an app like this would work. Using CameraX, you specify a preview surface and an analyzer. The analyzer calls the ML Kit object detector with stream mode enabled. The detected objects that it returns are used to create objects that represent the bounding boxes, and these are added to the overlay. This uses the generic model in ML Kit, so there’s not much by way of classification—just that it detected an object and that object is assigned an ID. To further classify each object detected, you’ll need a custom model, and we’ll discuss that in Chapter 9.

Summary

Building apps that use vision is very straightforward using ML Kit for Android. In this chapter, you explored several scenarios using the built-in generic models: image classification and labeling, where a single image can have its contents determined by the computer, and object detection, where multiple objects within an image can be detected and their locations determined by bounding boxes. You wrapped up the chapter with a brief exploration of how this could be extended to video—where not only would you detect an object, but you could also track it in real time. All these scenarios were based on the generic built-in models in ML Kit but could easily be extended with custom models. We’ll explore that more in Chapter 9.
