Chapter 4. Vision

This chapter explores the practical side of implementing vision-related artificial intelligence (AI) features in your Swift apps. Taking a top-down approach, we explore seven vision tasks, and how to implement them by using Swift and various AI tools.

Practical AI and Vision

Here are the seven practical AI tasks related to vision that we explore in this chapter:

Face detection

This uses image analysis techniques to count faces in an image and perform various actions with that information, such as applying other images on top of the face, with the correct rotation.

Barcode detection

This uses Apple’s frameworks to find barcodes in images.

Saliency detection

This task finds the most salient area of an image using Apple’s frameworks.

Image similarity

How similar are two images? We build an app that lets the user pick two images and then determines how similar they are.

Image classification

Classification is a classic AI problem. We build a classification app that can tell us what we’ve taken a photo of.

Drawing recognition

Recognition is essentially classification, whatever you happen to be classifying, but in the interest of exploring a breadth of practical AI topics with you, here we build an app that lets you take a photo of a line drawing and identifies what the drawing depicts.

Style classification

We update our Image Classification app to support identifying the style of a supplied image by converting a model built with another set of tools into Apple’s CoreML format.

Note

We’ve called this chapter “Vision,” but it’s not solely about the framework that Apple provides for vision-related programming which, helpfully, is also called Vision. We do use Vision a fair bit, throughout the book, though! Check the Index for details.

Task: Face Detection

Whether you need to check that there is, in fact, a face present to help a user validate their profile photo, or you want to start drawing things on top of a supplied photo, Snapchat-style, face detection is a useful feature for lots of apps.

For the first task, we’re going to look at how easy it is to add practical face detection features to your Swift iOS apps. We’re going to do this without any model training, using Apple’s provided AI frameworks (“Apple’s Other Frameworks”).

Because of this, the task is a little different from many of the others in this book, in that the toolkit for performing face detection is largely provided by Apple. We follow a similar process, using Apple’s frameworks, in “Task: Image Similarity” and “Task: Speech Recognition”, among others.

You could go and train a model that understands what a face is, but Apple has done the work for you: look no further than the camera app on iOS, and how it can identify a face.

Problem and Approach

Much like many of the practical AI tasks in this book, face detection is everywhere. The authors’ collective favorite media depiction of facial detection is in the fabulously forward-looking fictional TV show, Person of Interest.

Tip

Seriously, we cannot recommend Person of Interest more highly. Stop reading this and go watch it and then come back and continue reading. We’ll still be here.

In this task, we’re going to explore the practical side of face detection by doing the following:

  • Making an app that can detect human faces in images, allowing us to confirm that a user has supplied a useful profile picture

  • Using Apple’s tools for doing this without training a model

  • Exploring the next steps for improved face detection

We’re going to build an app that can count the number of faces in a photo chosen by the user. You can see the app in Figure 4-1.

Building the App

We’re going to use Apple’s newest user interface (UI) framework, SwiftUI, to build the user interface for this app.

We use both SwiftUI and UIKit for different examples in this book to give you a practical grasp of the use of both of Apple’s iOS UI frameworks in building AI-driven apps. We often chose which framework to use fairly arbitrarily, just like in the real world (don’t tell clients that, though).

Figure 4-1. The final version of our face counting app

The final form of the app in Figure 4-1 consists of the following SwiftUI components:

  • A NavigationView in which to display the title of the app as well as the button to select a photo

  • An Image to display the chosen image in which the app will count the faces

  • A Button to trigger the face counting

  • Some Text to display the count of faces

Tip

If you need a refresher on SwiftUI, check Apple’s documentation as well as our website.

However, we construct this view from multiple subviews, and the way we do this might be a little unfamiliar compared to how we use SwiftUI elsewhere in this book. We’ve done this to help demonstrate the breadth of approaches that you can take to constructing a UI (for much the same reason as we use SwiftUI and UIKit, for different practical examples, throughout the book). This approach gives you maximum exposure to the real-world ways of doing things.

Tip

If you don’t want to manually build the face-counting iOS app, you can download the code from our website; look for the project named FDDemo-Starter. After you have that, follow along through the rest of this section (we don’t recommend skipping it) and then meet us at “What Just Happened? How Does This Work?”.

To make the face counting iOS app yourself, you’ll need to do the following:

  1. Fire up Xcode.

  2. Create an iOS app project in Xcode, choosing the “Single View App” template. The project should be Swift and the SwiftUI checkbox should be selected, as shown in Figure 4-2.

Figure 4-2. Creating a new project with SwiftUI
  3. Add a new Swift file called Faces.swift to the project (File menu → New → File), and add the following imports:

    import UIKit
    import Vision

    Nothing particularly interesting here: we’re importing UIKit because we’re using UIImage, which it provides, and we’re importing Vision because that’s the Apple framework we’ll be using to detect faces.

  4. Below the imports, add the following extension on UIImage:

    extension UIImage {
        func detectFaces(completion: @escaping ([VNFaceObservation]?) -> ()) {
    
            guard let image = self.cgImage else { return completion(nil) }
            let request = VNDetectFaceRectanglesRequest()
    
            DispatchQueue.global().async {
                let handler = VNImageRequestHandler(
                    cgImage: image,
                    orientation: self.cgImageOrientation
                )
    
                try? handler.perform([request])
    
                guard let observations =
                    request.results as? [VNFaceObservation] else {
                        return completion(nil)
                }
    
                completion(observations)
            }
        }
    }

This extension on UIImage adds a detectFaces() function to UIImage, allowing us to ask any UIImage to detect the faces in it. The code within the function creates a VNDetectFaceRectanglesRequest and performs it on a background queue.

What does VNDetectFaceRectanglesRequest do? It returns the bounding box (rectangular box) for any faces detected in the image that it’s analyzing. You can learn more about it in Apple’s documentation. We run the VNDetectFaceRectanglesRequest as part of a VNImageRequestHandler, which is an object that allows us to run image analysis requests.
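
One thing to keep in mind: the bounding boxes in those observations use normalized coordinates, ranging from 0.0 to 1.0 with the origin in the lower-left corner, so they need converting before you can use them against an image’s actual dimensions. Here’s a minimal sketch of that conversion (the helper name is ours); we do essentially the same thing later when we draw boxes around faces:

    import UIKit
    import Vision

    // Convert a face observation's normalized bounding box into the
    // pixel coordinate space of a particular image.
    func imageRect(for observation: VNFaceObservation,
                   in image: UIImage) -> CGRect {
        return VNImageRectForNormalizedRect(
            observation.boundingBox,
            Int(image.size.width),
            Int(image.size.height)
        )
    }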

Note

This book isn’t here to teach Swift, but just in case you need a reminder: an extension allows you to add new functionality to existing classes, structures, enumerations, or protocols. This new functionality, as you might have guessed, includes functions. You can read more about extensions in Swift in the Swift documentation.

The call to DispatchQueue.global().async { } allows us to run the call to VNImageRequestHandler (in which we run our VNDetectFaceRectanglesRequest) on a global thread so that our UI is not locked. You can learn more about the DispatchQueue class in Apple’s documentation.
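
Because of this, the completion handler we pass to detectFaces() ends up being called on that background queue, so anything that drives the UI from it should hop back to the main queue. Here’s a minimal sketch of the general pattern (the function and closure names are placeholders, not part of the app):

    import Foundation

    // Do slow work off the main thread, then deliver the result on the
    // main queue, which is where UI state should be updated.
    func runInBackground<Result>(
        _ work: @escaping () -> Result,
        thenOnMain deliver: @escaping (Result) -> Void) {

        DispatchQueue.global().async {
            let result = work()
            DispatchQueue.main.async {
                deliver(result)
            }
        }
    }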

Next, create a new file in the project (ours is called Views.swift), which we use to define some SwiftUI elements for our app:

  1. import SwiftUI and then add a new View struct, called TwoStateButton:

    struct TwoStateButton: View {
        private let text: String
        private let disabled: Bool
        private let background: Color
        private let action: () -> Void
    
    }

    The TwoStateButton struct defines a Button that can be enabled or disabled, change color, and otherwise do button-y things. Very useful.

  2. The TwoStateButton will also need a body:

    var body: some View {
        Button(action: action) {
            HStack {
                Spacer()
                Text(text).font(.title).bold().foregroundColor(.white)
                Spacer()
                }.padding().background(background).cornerRadius(10)
            }.disabled(disabled)
    }

    The body handles the drawing of the TwoStateButton (which actually just draws a Button and some Text, based on the values of the variables).

  3. It will also need an init() function:

    init(text: String,
        disabled: Bool,
        background: Color = .blue,
        action: @escaping () -> Void) {
    
        self.text = text
        self.disabled = disabled
        self.background = disabled ? .gray : background
        self.action = action
    }

    The init() function initializes a new TwoStateButton with certain parameters (text, whether it’s disabled, a background color, and an action to perform when the button is pressed).

  4. Next, create another View struct, called MainView:

    struct MainView: View {
        private let image: UIImage
        private let text: String
        private let button: TwoStateButton

    This View has some variables to store a UIImage, a String, and a TwoStateButton (which we created a moment ago!).

  5. The MainView will need a body:

    var body: some View {
        VStack {
            Image(uiImage: image)
                .resizable()
                .aspectRatio(contentMode: .fit)
            Spacer()
            Text(text).font(.title).bold()
            Spacer()
            self.button
        }
    }

    The body draws an Image, some Spacers, some Text, and a TwoStateButton (defined by the variable).

  6. The MainView will also need an init():

    init(image: UIImage, text: String, button: () -> TwoStateButton) {
        self.image = image
        self.text = text
        self.button = button()
    }

    The init() function creates the MainView, setting the value of the image, the text, and the button.

  7. We also need to add a rather long struct, inheriting from UIViewControllerRepresentable, in order to be able to summon a UIImagePicker, which is part of the older UIKit framework, from within SwiftUI:

    struct ImagePicker: UIViewControllerRepresentable {
        typealias UIViewControllerType = UIImagePickerController
        private(set) var selectedImage: UIImage?
        private(set) var cameraSource: Bool
        private let completion: (UIImage?) -> ()
    
        init(camera: Bool = false, completion: @escaping (UIImage?) -> ()) {
            self.cameraSource = camera
            self.completion = completion
        }
    
        func makeCoordinator() -> ImagePicker.Coordinator {
            let coordinator = Coordinator(self)
            coordinator.completion = self.completion
            return coordinator
        }
    
        func makeUIViewController(context: Context)
            -> UIImagePickerController {
    
            let imagePickerController = UIImagePickerController()
            imagePickerController.delegate = context.coordinator
            imagePickerController.sourceType =
                cameraSource ? .camera : .photoLibrary
    
            return imagePickerController
        }
    
        func updateUIViewController(
            _ uiViewController: UIImagePickerController, context: Context) {}
    
        class Coordinator: NSObject, UIImagePickerControllerDelegate,
            UINavigationControllerDelegate {
    
            var parent: ImagePicker
            var completion: ((UIImage?) -> ())?
    
            init(_ imagePickerControllerWrapper: ImagePicker) {
                self.parent = imagePickerControllerWrapper
            }
    
            func imagePickerController(_ picker: UIImagePickerController,
                didFinishPickingMediaWithInfo info:
                    [UIImagePickerController.InfoKey: Any]) {
    
                print("Image picker complete...")
    
                let selectedImage =
                    info[UIImagePickerController.InfoKey.originalImage]
                        as? UIImage
    
                picker.dismiss(animated: true)
                completion?(selectedImage)
            }
    
            func imagePickerControllerDidCancel(
                    _ picker: UIImagePickerController) {
    
                print("Image picker cancelled...")
                picker.dismiss(animated: true)
                completion?(nil)
            }
        }
    }

    This is a lot of code that allows SwiftUI to provide enough of UIKit’s functionality to summon a UIImagePicker.

    You can learn more about UIViewControllerRepresentable in Apple’s documentation: you use it to wrap a UIKit view controller so that it can be used from SwiftUI. Essentially, it’s a way to bridge features of the older UI framework into the new one.

  8. Finally, still in Views.swift, we need to add an extension to UIImage that allows us to manipulate the orientation as needed:

    extension UIImage {
        func fixOrientation() -> UIImage? {
            UIGraphicsBeginImageContext(self.size)
            self.draw(at: .zero)
            let newImage = UIGraphicsGetImageFromCurrentImageContext()
            UIGraphicsEndImageContext()
            return newImage
        }
    
        var cgImageOrientation: CGImagePropertyOrientation {
            switch self.imageOrientation {
                case .up: return .up
                case .down: return .down
                case .left: return .left
                case .right: return .right
                case .upMirrored: return .upMirrored
                case .downMirrored: return .downMirrored
                case .leftMirrored: return .leftMirrored
                case .rightMirrored: return .rightMirrored
            }
        }
    }

    Next, we move over to ContentView.swift:

  9. First, update the imports as follows:

    import SwiftUI
    import Vision
    Tip

    ContentView.swift is, kind of, sort of, the equivalent of a ViewController in UIKit, but for SwiftUI.

  10. Add an extension on ContentView to the end of the ContentView.swift file:

    extension ContentView {
    
    }
  11. Within, add a function to return our main view:

    private func mainView() -> AnyView {
        return AnyView(NavigationView {
            MainView(
                image: image ?? placeholderImage,
                text: "\(faceCount) face\(faceCount == 1 ? "" : "s")") {
                    TwoStateButton(
                        text: "Detect Faces",
                        disabled: !detectionEnabled,
                        action: getFaces
                    )
            }
            .padding()
            .navigationBarTitle(Text("FDDemo"), displayMode: .inline)
            .navigationBarItems(
                leading: Button(action: summonImagePicker) {
                    Text("Select")
                },
                trailing: Button(action: summonCamera) {
                    Image(systemName: "camera")
                }.disabled(!cameraEnabled)
            )
        })
    }

    This function not only returns our main view, but also creates it. SwiftUI magic!

  12. Add a function to return the image picker:

    private func imagePickerView() -> AnyView {
        return  AnyView(ImagePicker { result in
            self.controlReturned(image: result)
            self.imagePickerOpen = false
        })
    }
  13. And add a function to return a camera view:

    private func cameraView() -> AnyView {
        return  AnyView(ImagePicker(camera: true) { result in
            self.controlReturned(image: result)
            self.cameraOpen = false
        })
    }
  14. Back near the top, add some @State variables to the ContentView:

    struct ContentView: View {
        @State private var imagePickerOpen: Bool = false
        @State private var cameraOpen: Bool = false
        @State private var image: UIImage? = nil
        @State private var faces: [VNFaceObservation]? = nil
    
    }

    These define the things that can change: whether the image picker is open, whether the camera is open, the image itself, and the faces detected.

Note

You can learn more about States in the SwiftUI documentation.
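
As a tiny illustration of the idea (separate from our app), mutating an @State property causes SwiftUI to re-render whatever parts of the view depend on it:

    import SwiftUI

    // A toy example: tapping the button changes the @State property,
    // and SwiftUI redraws the label with the new count.
    struct CounterView: View {
        @State private var tapCount = 0

        var body: some View {
            Button("Tapped \(tapCount) times") {
                self.tapCount += 1
            }
        }
    }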

  15. Add some private variables, too:

    private var faceCount: Int { return faces?.count ?? 0 }
    private let placeholderImage = UIImage(named: "placeholder")!
    
    private var cameraEnabled: Bool {
        UIImagePickerController.isSourceTypeAvailable(.camera)
    }
    
    private var detectionEnabled: Bool { image != nil && faces == nil }

    These store the face count, the placeholder image (displayed until the user chooses an image), the availability of a camera, and whether detection (which is reflected in the availability of the button) is enabled.

  16. Update the body to look as follows:

    var body: some View {
        if imagePickerOpen { return imagePickerView() }
        if cameraOpen { return cameraView() }
        return mainView()
    }

    The body View returns the image picker if the image picker should be open, the camera likewise; otherwise, it returns mainView(), which is the function that we added to the ContentView by way of an extension, earlier.

  17. Add a function called getFaces():

    private func getFaces() {
        print("Getting faces...")
        self.faces = []
        self.image?.detectFaces { result in
            self.faces = result
        }
    }

    This function calls the detectFaces() function, which we added earlier, as an extension on UIImage in the Faces.swift file, calling it on the current image.

  18. We also need a function to display the image picker:

    private func summonImagePicker() {
        print("Summoning ImagePicker...")
        imagePickerOpen = true
    }
  19. As well as the camera:

    private func summonCamera() {
        print("Summoning camera...")
        cameraOpen = true
    }

    Add a launch screen and icon if you want, and launch your app! You can select a photo from the photo library or take a photo if you’re running it on a real device, press the Detect Faces button, and the app will tell you how many faces it finds. You can see it working earlier, in Figure 4-1.

What Just Happened? How Does This Work?

There’s not much to say here. We’re building an app that can detect faces. For our first pass, we’ve used SwiftUI to create an iOS app that lets the user select a photo from their library, or take a new photo, and count the faces in it. As we said, not much to say.

We didn’t have to train any machine-learning models to do this as we made use of Apple’s supplied frameworks. If you’re curious about how Apple’s frameworks might work, we discuss that later in Chapter 11.

But what if we want to do more?

Improving the App

In this section, we improve our face-counting app to not only count the faces in a chosen image, but draw a box around them, as well, as shown earlier, in Figure 4-1.

You’ll need to have completed the app described in “Building the App” to follow from here. If you don’t want to do that, or need a clean starting point, you can download the resources for this book from our website and find the project FDDemo-Starter.

If you don’t want to follow the instructions in this section, you can also find the project FDDemo-Completed, which is the end result of this section. If you go down that route, we strongly recommend reading the code as we discuss it in this section and comparing it with the code in FDDemo-Completed so that you understand what we’re adding.

There are not too many code changes to make here, so let’s get started and get those boxes drawn around some faces:

  1. Open the Faces.swift file and add the following extension on Collection below the existing extension:

    extension Collection where Element == VNFaceObservation {
    
    }
    The extension to Collection is valid only where the elements of the Collection are of type VNFaceObservation.

  2. Within this extension, add the following:

    func drawnOn(_ image: UIImage) -> UIImage? {
        UIGraphicsBeginImageContextWithOptions(image.size, false, 1.0)
    
        guard let context = UIGraphicsGetCurrentContext() else {
            return nil
        }
    
        image.draw(in: CGRect(
            x: 0,
            y: 0,
            width: image.size.width,
            height: image.size.height))
    
        context.setStrokeColor(UIColor.red.cgColor)
        context.setLineWidth(0.01 * image.size.width)
    
        let transform = CGAffineTransform(scaleX: 1, y: -1)
            .translatedBy(x: 0, y: -image.size.height)
    
        for observation in self {
            let rect = observation.boundingBox
    
            let normalizedRect =
                VNImageRectForNormalizedRect(rect,
                    Int(image.size.width),
                    Int(image.size.height))
                .applying(transform)
    
            context.stroke(normalizedRect)
        }
    
        let result = UIGraphicsGetImageFromCurrentImageContext()
        UIGraphicsEndImageContext()
    
        return result
    }
    This extension on Collection allows us to work with the VNFaceObservations we get back and adds a function called drawnOn(), which draws a box around each face in the image.

  3. Update the getFaces() function in ContentView.swift to call the new drawnOn() function we added a moment ago:

    private func getFaces() {
        print("Getting faces...")
        self.faces = []
        self.image?.detectFaces { result in
            self.faces = result
    
            if let image = self.image,
            let annotatedImage = result?.drawnOn(image) {
                self.image =  annotatedImage
            }
        }
    }
    Note

    You might be wondering why we’re using extensions for everything. We’re doing it for a couple of reasons, but first and foremost we’re doing it to make sure our code is split up into relatively easily digestible pieces. We don’t want to overcomplicate things by having enormous classes. There’s enough code to digest already.

You can now run your app, choose an image, tap the button, and observe that any faces in the image have a box around them, as shown in Figure 4-3.

Figure 4-3. The improved face detector

Even More Improvements

We’d normally quit while we’re ahead and talk about how and why everything works at this point, but we’re not going to do that here. Face detection is just too much fun. So far in this chapter, we’ve looked at how you can build an app that counts faces in a supplied image and then modified the app to draw a red box around the faces it detected.

In this section, let’s take that a step further, and render an emoji on top of detected faces. You can’t get much more practical than that, as shown in Figure 4-4.

Figure 4-4. The face detection app applying an emoji on top of faces

You’ll need to have completed the app described in “Improving the App” to follow from here. If you don’t want to do that or need a clean starting point, you can download the resources for this book from our website, and find the project FDDemo-Completed. We build on the app from that point.

If you don’t want to follow the instructions in this section, you can also find the project FDDemo-Improved, which is the end result of this section. If you go down that route, we strongly recommend reading the code as we discuss it in this section and comparing it with the code in FDDemo-Improved so that you understand what we’re adding.

The only changes we need to make this time occur in Faces.swift:

  1. Below the detectFaces() function, add a new function named rotatedBy() to the extension we created on UIImage:

    func rotatedBy(degrees: CGFloat, clockwise: Bool = false) -> UIImage? {
        var radians = (degrees) * (.pi / 180)
    
        if !clockwise {
            radians = -radians
        }
    
        let transform = CGAffineTransform(rotationAngle: CGFloat(radians))
    
        let newSize = CGRect(
            origin: CGPoint.zero,
            size: self.size
            ).applying(transform).size
    
        let roundedSize = CGSize(
            width: floor(newSize.width),
            height: floor(newSize.height))
    
        let centredRect = CGRect(
            x: -self.size.width / 2,
            y: -self.size.height / 2,
            width: self.size.width,
            height: self.size.height)
    
        UIGraphicsBeginImageContextWithOptions(
            roundedSize,
            false,
            self.scale)
    
        guard let context = UIGraphicsGetCurrentContext() else {
            return nil
        }
    
        context.translateBy(
            x: roundedSize.width / 2,
            y: roundedSize.height / 2
        )
    
        context.rotate(by: radians)
        self.draw(in: centredRect)
    
        let result = UIGraphicsGetImageFromCurrentImageContext()
        UIGraphicsEndImageContext()
    
        return result
    }

    This function returns a UIImage that’s been rotated by the degrees specified as a CGFloat, in either a clockwise or counterclockwise direction.

  2. Add an extension on VNFaceLandmarks2D, which contains a function anchorPointInImage() that allows us to center each set of points that may have been detected in a face (representing eyes, eyebrows, lips, and such):

    extension VNFaceLandmarks2D {
        func anchorPointInImage(_ image: UIImage) ->
            (center: CGPoint?, angle: CGFloat?) {
    
            // centre each set of points that may have been detected, if
            // present
            let allPoints =
                self.allPoints?.pointsInImage(imageSize: image.size)
                .centerPoint
    
            let leftPupil =
                self.leftPupil?.pointsInImage(imageSize: image.size)
                .centerPoint
    
            let leftEye =
                self.leftEye?.pointsInImage(imageSize: image.size)
                .centerPoint
    
            let leftEyebrow =
                self.leftEyebrow?.pointsInImage(imageSize: image.size)
                .centerPoint
    
            let rightPupil =
                self.rightPupil?.pointsInImage(imageSize: image.size)
                .centerPoint
    
            let rightEye =
                self.rightEye?.pointsInImage(imageSize: image.size)
                .centerPoint
    
            let rightEyebrow =
                self.rightEyebrow?.pointsInImage(imageSize: image.size)
                .centerPoint
    
            let outerLips =
                self.outerLips?.pointsInImage(imageSize: image.size)
                .centerPoint
    
            let innerLips =
                self.innerLips?.pointsInImage(imageSize: image.size)
                .centerPoint
    
            let leftEyeCenter = leftPupil ?? leftEye ?? leftEyebrow
            let rightEyeCenter = rightPupil ?? rightEye ?? rightEyebrow
            let mouthCenter = innerLips ?? outerLips
    
            if let leftEyePoint = leftEyeCenter,
                let rightEyePoint = rightEyeCenter,
                let mouthPoint = mouthCenter {
    
                let triadCenter =
                    [leftEyePoint, rightEyePoint, mouthPoint]
                    .centerPoint
    
                let eyesCenter =
                    [leftEyePoint, rightEyePoint]
                    .centerPoint
    
                return (eyesCenter, triadCenter.rotationDegreesTo(eyesCenter))
            }
    
            // else fallback
            return (allPoints, 0.0)
        }
    }
    Note

    VNFaceLandmarks2D represents all of the landmarks that Apple’s Vision framework can detect in a face, exposed as properties. You can learn more about it in Apple’s documentation. (If the landmarks property ever comes back nil when you run the app, see the sketch just after these steps.)

  3. We also need an extension on CGRect that returns a CGRect centered on a CGPoint provided:

    extension CGRect {
        func centeredOn(_ point: CGPoint) -> CGRect {
            let size = self.size
            let originX = point.x - (self.width / 2.0)
            let originY = point.y - (self.height / 2.0)
            return CGRect(
                x: originX,
                y: originY,
                width: size.width,
                height: size.height
            )
        }
    }
  4. While we’re at it, let’s add an extension on CGPoint:

    extension CGPoint {
        func rotationDegreesTo(_ otherPoint: CGPoint) -> CGFloat {
            let originX = otherPoint.x - self.x
            let originY = otherPoint.y - self.y
    
            let degreesFromX = atan2f(
                Float(originY),
                Float(originX)) * (180 / .pi)
    
            let degreesFromY = degreesFromX - 90.0
    
            let normalizedDegrees = (degreesFromY + 360.0)
                .truncatingRemainder(dividingBy: 360.0)
    
            return CGFloat(normalizedDegrees)
        }
    }

    This extension adds a function called rotationDegreesTo() that returns some degrees to rotate by, given another point. This helps orient facial features with the emoji we’ll be drawing on the face.

  5. We also need an extension on Array, for arrays of CGPoints:

    extension Array where Element == CGPoint {
        var centerPoint: CGPoint {
            let elements = CGFloat(self.count)
            let totalX = self.reduce(0, { $0 + $1.x })
            let totalY = self.reduce(0, { $0 + $1.y })
            return CGPoint(x: totalX / elements, y: totalY / elements)
        }
    }

    This adds a computed property, centerPoint, which returns the center CGPoint of an array of points.

  6. Because we’re working with emojis, which are actually text, we also need an extension on String:

    extension String {
        func image(of size: CGSize, scale: CGFloat = 0.94) -> UIImage? {
            UIGraphicsBeginImageContextWithOptions(size, false, 0)
            UIColor.clear.set()
            let rect = CGRect(origin: .zero, size: size)
            UIRectFill(CGRect(origin: .zero, size: size))
            (self as AnyObject).draw(
                in: rect,
                withAttributes: [
                    .font: UIFont.systemFont(ofSize: size.height * scale)
                ]
            )
    
            let image = UIGraphicsGetImageFromCurrentImageContext()
    
            UIGraphicsEndImageContext()
    
            return image
        }
    }

    This allows us to get a UIImage from a String, which is useful because we want to be able to display emojis on top of an image, and we want those emojis to be images.

  7. Replace the extension on Collection with the following:

    extension Collection where Element == VNFaceObservation {
        func drawnOn(_ image: UIImage) -> UIImage? {
    
            UIGraphicsBeginImageContextWithOptions(image.size, false, 1.0)
            guard let _ = UIGraphicsGetCurrentContext() else { return nil }
    
            image.draw(in: CGRect(
                x: 0,
                y: 0,
                width: image.size.width,
                height: image.size.height)
            )
    
            let imageSize: (width: Int, height: Int) =
                (Int(image.size.width), Int(image.size.height))
    
            let transform = CGAffineTransform(scaleX: 1, y: -1)
                .translatedBy(x: 0, y: -image.size.height)
    
            let padding: CGFloat = 0.3
    
            for observation in self {
                guard let anchor =
                    observation.landmarks?.anchorPointInImage(image) else {
                        continue
                }
    
                guard let center = anchor.center?.applying(transform) else {
                    continue
                }
    
                let overlayRect = VNImageRectForNormalizedRect(
                    observation.boundingBox,
                    imageSize.width,
                    imageSize.height
                ).applying(transform).centeredOn(center)
    
                let insets = (
                    x: overlayRect.size.width * padding,
                    y: overlayRect.size.height * padding)
    
                let paddedOverlayRect = overlayRect.insetBy(
                    dx: -insets.x,
                    dy: -insets.y)
    
                let randomEmoji = [
                    "😀",    // any set of face emoji will do here
                    "😁",
                    "😍",
                    "🤪",
                    "😎",
                    "😝",
                    "🤓",
                    "😜",
                    "🙃"
                ].randomElement()!
    
                if var overlayImage = randomEmoji
                    .image(of: paddedOverlayRect.size) {
    
                    if let angle = anchor.angle,
                        let rotatedImage = overlayImage
                            .rotatedBy(degrees: angle) {
    
                        overlayImage = rotatedImage
                    }
    
                    overlayImage.draw(in: paddedOverlayRect)
                }
            }
    
            let result = UIGraphicsGetImageFromCurrentImageContext()
            UIGraphicsEndImageContext()
    
            return result
        }
    }
    

    To cut a long story short, this extension (and its new drawnOn() function) draws a random emoji on top of the face.
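
One detail worth checking before you run it: an observation’s landmarks property is populated only when the underlying Vision request asks for landmarks. If the emoji never appear because anchorPointInImage() keeps receiving nil, the detectFaces() extension from earlier may need to issue a VNDetectFaceLandmarksRequest rather than a VNDetectFaceRectanglesRequest. Here’s a minimal sketch of that variant, keeping the same completion signature (the function name is ours):

    import UIKit
    import Vision

    extension UIImage {
        // A landmark-aware variant of detectFaces():
        // VNDetectFaceLandmarksRequest returns VNFaceObservations whose
        // landmarks property is filled in.
        func detectFacesWithLandmarks(
            completion: @escaping ([VNFaceObservation]?) -> ()) {

            guard let image = self.cgImage else { return completion(nil) }
            let request = VNDetectFaceLandmarksRequest()

            DispatchQueue.global().async {
                let handler = VNImageRequestHandler(
                    cgImage: image,
                    orientation: self.cgImageOrientation
                )

                try? handler.perform([request])

                completion(request.results as? [VNFaceObservation])
            }
        }
    }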

And with that, we’re done. You can launch your app, choose an image, and watch it apply a random emoji to the faces detected in the image. Show your friends and family and annoy them with it; we’ll be here when you get back. You can see an example of the final app in Figure 4-5.

Figure 4-5. Our final face detector, replete with emoji

Task: Barcode Detection

We’re not going to unpack this task much, especially not after looking at “Task: Face Detection”, because it’s both similar to face detection and very simple.

We’re going to do this one in a Playground because it’s so simple to step through. It does require a fair bit of boilerplate code, though:

  1. Fire up Xcode and create a new iOS-flavor Playground, as shown in Figure 4-6.

    Figure 4-6. Creating a new iOS-flavor Playground in Xcode
  2. Add a new source file called Extensions.swift to the Playground. In Extensions.swift, import the following:

    import UIKit
Tip

To find this code in our resources, head to our website, download the resources, and find the Playground in the BarcodeAndSaliencyDetection folder.

  3. Add the following extension on CGSize:

    public extension CGSize {
        func scaleFactor(to size: CGSize) -> CGFloat {
            let horizontalScale =  self.width / size.width
            let verticalScale =  self.height / size.height
    
            return max(horizontalScale, verticalScale)
        }
    }

    This extension allows us to call our new function, scaleFactor(to:), on a CGSize to get the scale factor that relates this size to a box of the indicated size; we put it to use later, in “Task: Saliency Detection”, to fit a salient region into a thumbnail.

  4. Add an extension on CGRect:

    public extension CGRect {
        func scaled(by scaleFactor: CGFloat) -> CGRect {
            let horizontalInsets =
                (self.width - (self.width * scaleFactor)) / 2.0
            let verticalInsets =
                (self.height - (self.height * scaleFactor)) / 2.0
    
            let edgeInsets = UIEdgeInsets(
                top: verticalInsets,
                left: horizontalInsets,
                bottom: verticalInsets,
                right: horizontalInsets
            )
    
            let leftOffset = min(self.origin.x + horizontalInsets, 0)
            let upOffset = min(self.origin.y + verticalInsets, 0)
    
            return self
                .inset(by: edgeInsets)
                .offsetBy(dx: -leftOffset, dy: -upOffset)
        }
    
        func cropped(to size: CGSize, centering: Bool = true) -> CGRect {
            if centering {
                let horizontalDifference = self.width - size.width
                let verticalDifference = self.height - size.height
                let newOrigin = CGPoint(
                    x: self.origin.x + (horizontalDifference / 2.0),
                    y: self.origin.y + (verticalDifference / 2.0)
                )
                return CGRect(
                    x: newOrigin.x,
                    y: newOrigin.y,
                    width: size.width,
                    height: size.height
                )
            }
    
            return CGRect(x: 0, y: 0, width: size.width, height: size.height)
        }
    }

    This extension allows us to call scaled() on a CGRect to scale it by a given scale factor, or cropped() on a CGRect to crop it to a specified CGSize.

  5. Create an extension on UIImage:

    public extension UIImage {
        var width: CGFloat {
            return self.size.width
        }
    
        var height: CGFloat {
            return self.size.height
        }
    
        var rect: CGRect {
            return CGRect(x: 0, y: 0, width: self.width, height: self.height)
        }
    
        var invertTransform: CGAffineTransform {
            return CGAffineTransform(scaleX: 1, y: -1)
                .translatedBy(x: 0, y: -self.height)
        }
    
    }

    This extension adds a few computed properties for the image’s width, height, and bounding rect, plus a transform that flips the image’s coordinate space vertically.

  6. Within the UIImage extension, we need to add some code to properly handle the orientation of the image:

    var cgImageOrientation: CGImagePropertyOrientation {
        switch self.imageOrientation {
            case .up: return .up
            case .down: return .down
            case .left: return .left
            case .right: return .right
            case .upMirrored: return .upMirrored
            case .downMirrored: return .downMirrored
            case .leftMirrored: return .leftMirrored
            case .rightMirrored: return .rightMirrored
        }
    }
  7. Crop the image, based on a CGSize:

    func cropped(to size: CGSize, centering: Bool = true) -> UIImage? {
        let newRect = self.rect.cropped(to: size, centering: centering)
        return self.cropped(to: newRect, centering: centering)
    }
  8. And based on a CGRect:

    func cropped(to rect: CGRect, centering: Bool = true) -> UIImage? {
        let newRect = rect.applying(self.invertTransform)
        UIGraphicsBeginImageContextWithOptions(newRect.size, false, 0)
    
        guard let cgImage = self.cgImage,
            let context = UIGraphicsGetCurrentContext() else { return nil }
    
        context.translateBy(x: 0.0, y: self.size.height)
        context.scaleBy(x: 1.0, y: -1.0)
    
        context.draw(
            cgImage,
            in: CGRect(
                x: -newRect.origin.x,
                y: newRect.origin.y,
                width: self.width,
                height: self.height),
            byTiling: false)
    
        context.clip(to: [newRect])
    
        let croppedImage = UIGraphicsGetImageFromCurrentImageContext()
        UIGraphicsEndImageContext()
    
        return croppedImage
    }
  9. Scale the image by using a CGFloat:

    func scaled(by scaleFactor: CGFloat) -> UIImage? {
        if scaleFactor.isZero { return self }
    
        let newRect = self.rect
            .scaled(by: scaleFactor)
            .applying(self.invertTransform)
    
        UIGraphicsBeginImageContextWithOptions(newRect.size, false, 0)
    
        guard let cgImage = self.cgImage,
            let context = UIGraphicsGetCurrentContext() else { return nil }
    
        context.translateBy(x: 0.0, y: newRect.height)
        context.scaleBy(x: 1.0, y: -1.0)
        context.draw(
            cgImage,
            in: CGRect(
                x: 0,
                y: 0,
                width: newRect.width,
                height: newRect.height),
            byTiling: false)
    
        let resizedImage = UIGraphicsGetImageFromCurrentImageContext()
        UIGraphicsEndImageContext()
    
        return resizedImage
    }
  10. Back in the main body of the Playground, import the following:

    import UIKit
    import Vision
  11. Create an extension on VNImageRequestHandler with a convenience initializer:

    extension VNImageRequestHandler {
        convenience init?(uiImage: UIImage) {
            guard let cgImage = uiImage.cgImage else { return nil }
            let orientation = uiImage.cgImageOrientation
    
            self.init(cgImage: cgImage, orientation: orientation)
        }
    }

    A VNImageRequestHandler is used to work with images in Apple’s Vision framework. It acts as a handle for an image that we’re working with, so we don’t need to mess with the real definitive copy of an image. Our convenience initializer allows us to create one with a UIImage because VNImageRequestHandler typically requires a CGImage, which is a different way of storing an image in Apple’s frameworks.

Tip

A UIImage is a very high-level way of storing an image, and is easy to create from files, for example. UIImages are immutable and safe to use in threaded environments; a CGImage is the lower-level Core Graphics representation, which you use when you need to work with an image’s underlying bitmap data. You can learn about UIImage and CGImage in Apple’s documentation, if you’re curious.
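
For a concrete sense of the relationship between the two, here’s a tiny sketch of moving back and forth (the image name is a placeholder):

    import UIKit

    // UIImage is the high-level wrapper; its cgImage property exposes the
    // underlying Core Graphics representation (it can be nil for images
    // backed by other sources).
    if let someImage = UIImage(named: "example.jpg"),  // placeholder name
        let cgImage = someImage.cgImage {

        // Re-wrap the CGImage, preserving the original scale and orientation.
        let rewrapped = UIImage(
            cgImage: cgImage,
            scale: someImage.scale,
            orientation: someImage.imageOrientation
        )

        print(rewrapped.size)
    }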

  12. Insert an extension on VNRequest, adding a queueFor() function:

    extension VNRequest {
        func queueFor(image: UIImage,  completion: @escaping ([Any]?) -> ()) {
            DispatchQueue.global().async {
                if let handler = VNImageRequestHandler(uiImage: image) {
                    try? handler.perform([self])
                    completion(self.results)
                } else {
                    return completion(nil)
                }
            }
        }
    }

    This queues up requests for the VNImageRequestHandler: it allows us to push things into Vision to be processed.

  13. Add an extension on UIImage, with a function to detect rectangles (just in case we want to look for those) and another to detect barcodes:

    extension UIImage {
        func detectRectangles(
            completion: @escaping ([VNRectangleObservation]) -> ()) {
    
            let request = VNDetectRectanglesRequest()
            request.minimumConfidence = 0.8
            request.minimumAspectRatio = 0.3
            request.maximumObservations = 3
    
            request.queueFor(image: self) { result in
                completion(result as? [VNRectangleObservation] ?? [])
            }
        }
    
        func detectBarcodes(
            types symbologies: [VNBarcodeSymbology] = [.QR],
            completion: @escaping ([VNBarcodeObservation]) ->()) {
    
            let request = VNDetectBarcodesRequest()
            request.symbologies = symbologies
    
            request.queueFor(image: self) { result in
                completion(result as? [VNBarcodeObservation] ?? [])
            }
        }
    
        // can also detect human figures, animals, the horizon, all sorts of
        // things with inbuilt Vision functions
    }

Both of these functions work the same way: they add a function to UIImage that lets us ask for barcodes or rectangles. When called, the function creates a request with Vision and looks for the type of thing we’re asking for.

To test it, drag an image with a barcode (or a QR code) into the Resources folder of the Playground, as shown in Figure 4-7 and then add some code to the Playground to call our barcode-finding code:

let barcodeTestImage = UIImage(named: "test.jpg")!

barcodeTestImage.detectBarcodes { barcodes in
    for barcode in barcodes {
        print("Barcode data: \(barcode.payloadStringValue ?? "None")")
    }
}
Figure 4-7. Resources for the barcode finder

This code first specifies an image (the one we dragged in, which we know has a barcode in it) and then calls the detectBarcodes() function we created on it. You should see something resembling Figure 4-8 when it works. That’s it!

Figure 4-8. Our barcode has been detected
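
The detectRectangles() function can be exercised in exactly the same way; here’s a quick sketch, assuming you’ve also dragged in an image with something rectangular in it (test2.jpg is a placeholder name):

let rectangleTestImage = UIImage(named: "test2.jpg")!

rectangleTestImage.detectRectangles { rectangles in
    for rectangle in rectangles {
        // boundingBox is normalized (0.0 to 1.0), like the face observations
        print("Rectangle at \(rectangle.boundingBox), " +
            "confidence: \(rectangle.confidence)")
    }
}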

Task: Saliency Detection

Closely related to barcode detection is saliency detection: finding the most interesting, or salient, bit of an image. For this task, we take the Playground we wrote for “Task: Barcode Detection”, and add support for saliency detection.

Confused by what we mean by saliency detection? Check out Figure 4-9 for an example.

Figure 4-9. An example of saliency detection. A box is drawn around the salient bit of this image (Paris with an owl cocktail mug!).
Tip

Detecting saliency is, for all intents and purposes, generating a heatmap of an image that can be used to highlight areas of interest.
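
If you want to look at that heatmap directly, rather than only using it to crop, the VNSaliencyImageObservation we’ll be getting back exposes it as a small grayscale pixel buffer; here’s a minimal sketch of pulling it out (the function name is ours):

    import CoreImage
    import Vision

    // Wrap the observation's heatmap in a CIImage; brighter pixels mark
    // the more salient regions of the original image.
    func heatmap(from observation: VNSaliencyImageObservation) -> CIImage {
        return CIImage(cvPixelBuffer: observation.pixelBuffer)
    }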

Open the Playground we created in “Task: Barcode Detection”:

  1. Working in the main body of the Playground, we’ll need to add an extension on UIImage:

    extension UIImage {
    
    }
  2. Within this extension, let’s first add an enumeration for the type of saliency we want to look at:

    enum SaliencyType {
        case objectnessBased, attentionBased
    
        var request: VNRequest {
            switch self {
            case .objectnessBased:
                return VNGenerateObjectnessBasedSaliencyImageRequest()
            case .attentionBased:
                return VNGenerateAttentionBasedSaliencyImageRequest()
            }
        }
    }

    This gives us a nice shorthand way of accessing either VNGenerateObjectnessBasedSaliencyImageRequest or VNGenerateAttentionBasedSaliencyImageRequest. VNGenerateObjectnessBasedSaliencyImageRequest relates to detecting the parts of an image that are most likely to be objects, whereas VNGenerateAttentionBasedSaliencyImageRequest relates to detecting the parts of an image that are likely to be most interesting.

Note

To find this code in our resources, head to our website, download the resources, and find the Playground in the BarcodeAndSaliencyDetection folder.

  3. While still within the UIImage extension, add a function called detectSalientRegions():

    func detectSalientRegions(
        prioritising saliencyType: SaliencyType = .attentionBased,
        completion: @escaping (VNSaliencyImageObservation?) -> ()) {
    
        let request = saliencyType.request
    
        request.queueFor(image: self) { results in
            completion(results?.first as? VNSaliencyImageObservation)
        }
    }

    This function allows us to ask a UIImage to give us its salient regions (this sounds far more exciting than it actually is) based on the type of saliency we want.

  4. Add a cropped() function, which crops the image to the salient region identified by the saliency request:

    func cropped(
        with saliencyObservation: VNSaliencyImageObservation?,
        to size: CGSize? = nil) -> UIImage? {
    
        guard let saliencyMap = saliencyObservation,
            let salientObjects = saliencyMap.salientObjects else {
                return nil
        }
    
        // merge all detected salient objects into one big rect of the
        // overarching 'salient region'
        let salientRect = salientObjects.reduce(into: CGRect.zero) {
            rect, object in
            rect = rect.union(object.boundingBox)
        }
        let normalizedSalientRect =
            VNImageRectForNormalizedRect(
                salientRect, Int(self.width), Int(self.height)
            )
    
        var finalImage: UIImage?
    
        // transform normalized salient rect based on larger or smaller
        // than desired size
        if let desiredSize = size {
            if self.width < desiredSize.width ||
                self.height < desiredSize.height { return nil }
    
            let scaleFactor = desiredSize
                .scaleFactor(to: normalizedSalientRect.size)
    
            // crop to the interesting bit
            finalImage = self.cropped(to: normalizedSalientRect)
    
            // scale the image so that as much of the interesting bit as
            // possible can be kept within desiredSize
            finalImage = finalImage?.scaled(by: -scaleFactor)
    
            // crop to the final desiredSize aspectRatio
            finalImage = finalImage?.cropped(to: desiredSize)
        } else {
            finalImage = self.cropped(to: normalizedSalientRect)
        }
    
        return finalImage
    }

We can test this by dragging some images into the Resources folder of the Playground (as we did in “Task: Barcode Detection”) and then do the following:

  1. Define an image (pointing to one of those we dragged to the Resources folder) and a size to which to crop it:

    let saliencyTestImage = UIImage(named: "test3.jpg")!
    let thumbnailSize = CGSize(width: 80, height: 80)
  2. Define some UIImages to store the two different types of saliency crops we want (attention and object):

    var attentionCrop: UIImage?
    var objectsCrop: UIImage?
  3. Call our detectSalientRegions() function (twice; once for each type of saliency):

    saliencyTestImage.detectSalientRegions(prioritising: .attentionBased) {
        result in
    
        if result == nil {
            print("The entire image was found equally interesting!")
        }
    
        attentionCrop = saliencyTestImage
            .cropped(with: result, to: thumbnailSize)
    
        print("Image was \(saliencyTestImage.width) * " +
            "\(saliencyTestImage.height), now " +
            "\(attentionCrop?.width ?? 0) * \(attentionCrop?.height ?? 0).")
    }
    
    saliencyTestImage
        .detectSalientRegions(prioritising: .objectnessBased) { result in
        if result == nil {
            print("The entire image was found equally interesting!")
        }
    
        objectsCrop = saliencyTestImage
            .cropped(with: result, to: thumbnailSize)
    
        print("Image was \(saliencyTestImage.width) * " +
        "\(saliencyTestImage.height), now " +
        "\(objectsCrop?.width ?? 0) * \(objectsCrop?.height ?? 0).")
    }

You should see something that looks like Figure 4-10. Try it with different images to see what the app thinks is salient.

Figure 4-10. The saliency detector is working

Task: Image Similarity

Comparing two images to determine how similar they are is, at its core, a straightforward application of AI. Whether you need it for a game, or to check how alike a user’s profile pictures are, there’s a variety of uses for knowing how similar two images are.

In this task, we explore how you can quickly and easily compare two images in your Swift applications and, again, without any model training involved.

This task is similar to the previous ones in that Apple provides a toolkit for checking image similarity. You could train a machine-learning model that can tell you the distance between two images, but Apple has done the work for you, so why would you? This book is practical.

Problem and Approach

Image similarity is one of those subtle practical AI things that’s super useful when you need it, but difficult to quantify why you might need it in advance. In this task, we look at the practical side of image similarity by doing the following:

  • Building an app that allows the user to select, or take, two pictures, and determine how similar they are (by percentage)

  • Using Apple’s tools for doing this without training a model

  • Exploring the potential next steps for image similarity, and other ways to tackle this and similar problems

To demonstrate how to do this, we’re going to build the app shown in Figure 4-11. Let’s get started.

Building the App

We’re again going to be using Apple’s newest UI framework, SwiftUI, to build the app for determining image similarity as a practical AI task.

The final form of the app we’re going to build in this task can be seen in (Figure 4-11) and consists of the following SwiftUI components:

  • A NavigationView, with an app title and some Buttons (as .navigationBarItems) to allow the user to pick a photo from their library, or take a photo with their camera

  • Two Image views, which will actually be OptionalResizableImage views (we create this view type in a moment), to display the two images that we want to compare

  • A Button to trigger the comparison of the two images, and another to clear the two images

  • Some Text to display the similarity percentages

Figure 4-11. The final version of the Image Similarity app
Note

This book is here to teach you the practical side of using AI and machine-learning features with Swift and on Apple’s platforms. Because of this, we don’t explain the fine details of how to build apps; we assume that you mostly know that (although if you don’t, we think you’ll be able to follow along just fine if you pay attention). If you want to learn Swift, we recommend picking up Learning Swift (also by us!) from the lovely folks at O’Reilly.

If you don’t want to manually build the iOS app, you can download the code from our website and then find the project named ISDemo-Complete. After you have that, we strongly recommend that you still proceed through this section, comparing the notes here with the code you downloaded.

To create the app yourself, you’ll need to do the following:

  1. Create an iOS app project in Xcode, choosing the Single View App template, and selecting the SwiftUI checkbox.

  2. Add a new file named Views.swift and import the following:

    import SwiftUI
  3. Create a new View for an image that can resize:

    struct OptionalResizableImage: View {
        let image: UIImage?
        let placeholder: UIImage
    
        var body: some View {
            if let image = image {
                return Image(uiImage: image)
                    .resizable()
                    .aspectRatio(contentMode: .fit)
            } else {
                return Image(uiImage: placeholder)
                    .resizable()
                    .aspectRatio(contentMode: .fit)
            }
        }
    }
  4. Create a View for a ButtonLabel:

    struct ButtonLabel: View {
        private let text: String
        private let background: Color
    
        var body: some View {
            HStack {
                Spacer()
                Text(text).font(.title).bold().foregroundColor(.white)
                Spacer()
                }.padding().background(background).cornerRadius(10)
        }
    
        init(_ text: String, background: Color) {
            self.text = text
            self.background = background
        }
    }

    Our ButtonLabel is some text of a certain color.

  5. Create a View so that we can work with a UIImagePicker:

    struct ImagePickerView: View {
        private let completion: (UIImage?) -> ()
        private let camera: Bool
    
        var body: some View {
            ImagePickerControllerWrapper(
                camera: camera,
                completion: completion
            )
        }
    
        init(camera: Bool = false, completion: @escaping (UIImage?) -> ()) {
            self.completion = completion
            self.camera = camera
        }
    }
  6. Create a wrapper for UIViewControllerRepresentable so that we can actually use a UIImagePicker:

    struct ImagePickerControllerWrapper: UIViewControllerRepresentable {
        typealias UIViewControllerType = UIImagePickerController
        private(set) var selectedImage: UIImage?
        private(set) var cameraSource: Bool
        private let completion: (UIImage?) -> ()
    
        init(camera: Bool, completion: @escaping (UIImage?) -> ()) {
            self.cameraSource = camera
            self.completion = completion
        }
    
        func makeCoordinator() -> ImagePickerControllerWrapper.Coordinator {
            let coordinator = Coordinator(self)
            coordinator.completion = self.completion
            return coordinator
        }
    
        func makeUIViewController(context: Context) ->
            UIImagePickerController {
    
            let imagePickerController = UIImagePickerController()
            imagePickerController.delegate = context.coordinator
            imagePickerController.sourceType =
                cameraSource ? .camera : .photoLibrary
            return imagePickerController
        }
    
        func updateUIViewController(
            _ uiViewController: UIImagePickerController, context: Context) {
            //uiViewController.setViewControllers(?, animated: true)
        }
    
        class Coordinator: NSObject,
            UIImagePickerControllerDelegate, UINavigationControllerDelegate {
    
            var parent: ImagePickerControllerWrapper
            var completion: ((UIImage?) -> ())?
    
            init(_ imagePickerControllerWrapper:
                ImagePickerControllerWrapper) {
                self.parent = imagePickerControllerWrapper
            }
    
            func imagePickerController(_ picker: UIImagePickerController,
                didFinishPickingMediaWithInfo info:
                    [UIImagePickerController.InfoKey: Any]) {
    
                print("Image picker complete...")
                let selectedImage =
                    info[UIImagePickerController.InfoKey.originalImage]
                    as? UIImage
                picker.dismiss(animated: true)
                completion?(selectedImage)
            }
    
            func imagePickerControllerDidCancel(
                _ picker: UIImagePickerController) {
    
                print("Image picker cancelled...")
                picker.dismiss(animated: true)
                completion?(nil)
            }
        }
    }
  7. In the Views.swift file, add the following extension on UIImage so that we can fix an image’s orientation:

    extension UIImage {
        func fixOrientation() -> UIImage? {
            UIGraphicsBeginImageContext(self.size)
            self.draw(at: .zero)
            let newImage = UIGraphicsGetImageFromCurrentImageContext()
            UIGraphicsEndImageContext()
            return newImage
        }
    }

Next, we make a file called Similarity.swift in which we perform the actual image similarity test:

  1. Add some imports:

    import UIKit
    import Vision
  2. Add an extension on UIImage:

    extension UIImage {
    
    }
  3. Within the extension, add the following function to compare similarity:

    func similarity(to image: UIImage) -> Float? {
        var similarity: Float = 0
        guard let firstImageFPO = self.featurePrintObservation(),
            let secondImageFPO = image.featurePrintObservation(),
            let _ = try? secondImageFPO.computeDistance(
                &similarity,
                to: firstImageFPO
            ) else {
                return nil
        }
    
        return similarity
    }

    The similarity is calculated by computing the distance between the feature prints of the two images: a distance of 0 means the images are effectively identical, and larger distances mean less similar images.

  4. Add the following function to generate a feature print observation, which will assist in deriving image similarity:

    private func featurePrintObservation() -> VNFeaturePrintObservation? {
        guard let cgImage = self.cgImage else { return nil }
    
        let requestHandler =
            VNImageRequestHandler(cgImage: cgImage,
            orientation: self.cgImageOrientation,
            options: [:]
        )
    
        let request = VNGenerateImageFeaturePrintRequest()
        if let _ = try? requestHandler.perform([request]),
            let result = request.results?.first
                as? VNFeaturePrintObservation {
            return result
        }
    
        return nil
    }

    Notice that the similarity() function we wrote earlier calls this featurePrintObservation() function. The VNFeaturePrintObservation instances are the things that the distance is computed between in similarity(). (There’s a short usage sketch just after this list.)

  5. At the end of the Similarity.swift file, we need another extension on UIImage in order to obtain its orientation:

    extension UIImage {
        var cgImageOrientation: CGImagePropertyOrientation {
            switch self.imageOrientation {
                case .up: return .up
                case .down: return .down
                case .left: return .left
                case .right: return .right
                case .upMirrored: return .upMirrored
                case .downMirrored: return .downMirrored
                case .leftMirrored: return .leftMirrored
                case .rightMirrored: return .rightMirrored
            }
        }
    }
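With Similarity.swift in place, using it is a one-liner. Here’s a short usage sketch (imageA and imageB are stand-ins for any two UIImage values you have on hand; they’re not part of the project):

    // Lower distances mean more similar images; nil means a feature print
    // could not be generated for one of the images.
    if let distance = imageA.similarity(to: imageB) {
        print("Feature print distance: \(distance)")
    }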

Finally, we need to move to the ContentView.swift file:

  1. Add our States to the top of the ContentView struct:

    @State private var imagePickerOpen: Bool = false
    @State private var cameraOpen: Bool = false
    
    @State private var firstImage: UIImage? = nil
    @State private var secondImage: UIImage? = nil
    @State private var similarity: Int = -1
  2. Below them, add the following attributes:

    private let placeholderImage = UIImage(named: "placeholder")!
    
    private var cameraEnabled: Bool {
        UIImagePickerController.isSourceTypeAvailable(.camera)
    }
    
    private var selectEnabled: Bool {
        secondImage == nil
    }
    
    private var comparisonEnabled: Bool {
        secondImage != nil && similarity < 0
    }
  3. Within the ContentView struct, but outside of the body View, add a function to clear our images and similarity rating:

    private func clearImages() {
        firstImage = nil
        secondImage = nil
        similarity = -1
    }
  4. And another to get the similarity:

    private func getSimilarity() {
        print("Getting similarity...")
        if let firstImage = firstImage, let secondImage = secondImage,
            let similarityMeasure = firstImage.similarity(to: secondImage){
            similarity = Int(similarityMeasure)
        } else {
            similarity = 0
        }
        print("Similarity: \(similarity)%")
    }
  5. And another for when control is returned from getting a similarity:

    private func controlReturned(image: UIImage?) {
        print("Image return \(image == nil ? "failure" : "success")...")
        if firstImage == nil {
            firstImage = image?.fixOrientation()
        } else {
            secondImage = image?.fixOrientation()
        }
    }
  6. And one more to summon an image picker:

    private func summonImagePicker() {
        print("Summoning ImagePicker...")
        imagePickerOpen = true
    }
  7. And one to summon a camera view:

    private func summonCamera() {
        print("Summoning camera...")
        cameraOpen = true
    }
  8. Update your body View as follows:

    var body: some View {
        if imagePickerOpen {
            return  AnyView(ImagePickerView { result in
                self.controlReturned(image: result)
                self.imagePickerOpen = false
            })
        } else if cameraOpen {
            return  AnyView(ImagePickerView(camera: true) { result in
                self.controlReturned(image: result)
                self.cameraOpen = false
            })
        } else {
            return AnyView(NavigationView {
                VStack {
                    HStack {
                        OptionalResizableImage(
                            image: firstImage,
                            placeholder: placeholderImage
                        )
                        OptionalResizableImage(
                            image: secondImage,
                            placeholder: placeholderImage
                        )
                    }
    
                    Button(action: clearImages) { Text("Clear Images") }
                    Spacer()
                    Text(
                        "Similarity: " +
                        "\(similarity > 0 ? String(similarity) : "...")%"
                    ).font(.title).bold()
                    Spacer()
    
                    if comparisonEnabled {
                        Button(action: getSimilarity) {
                            ButtonLabel("Compare", background: .blue)
                        }.disabled(!comparisonEnabled)
                    } else {
                        Button(action: getSimilarity) {
                            ButtonLabel("Compare", background: .gray)
                        }.disabled(!comparisonEnabled)
                    }
                }
                .padding()
                .navigationBarTitle(Text("ISDemo"), displayMode: .inline)
                .navigationBarItems(
                    leading: Button(action: summonImagePicker) {
                        Text("Select")
                    }.disabled(!selectEnabled),
                    trailing: Button(action: summonCamera) {
                        Image(systemName: "camera")
                    }.disabled(!cameraEnabled))
            })
        }
    }

We don’t need to touch the ContentView_Previews struct in this case.

You now can run the app, pick two images, take two photos (or some combination thereof), and then tap the button to get a rating of how similar they are. Brilliant.

What Just Happened? How Does This Work?

You might have noticed that we didn’t go through the process of finding data to train a model, training a model, and integrating the model into an app. Instead, we just built an app, and it all just worked. (You might also be seeing a theme in our tasks so far…)

Wouldn’t it be nice if everything were like this?

So far, we’ve been using features of Apple’s Vision framework, which is a suite of computer vision algorithms, to compare two images. (We introduced Vision back in “Apple’s Other Frameworks”.)

The feature we used to perform the image similarity comparison in this chapter is called VNFeaturePrintObservation. Computing a feature print allows two images to have a pair-wise distance computed: this allows us to ask for a similarity (a distance) between images. You can learn more about what might be happening under the hood later, in Chapter 11.
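To make that a little more concrete, here’s a minimal standalone sketch of the same two Vision calls our UIImage extension used earlier; the function name and structure here are ours, not part of the ISDemo project:

    import CoreGraphics
    import Vision

    // Generate a feature print for each image, then compute the pairwise
    // distance. A distance of 0 means effectively identical images; larger
    // values mean the images are less alike.
    func featurePrintDistance(between first: CGImage, and second: CGImage) -> Float? {
        func featurePrint(for image: CGImage) -> VNFeaturePrintObservation? {
            let request = VNGenerateImageFeaturePrintRequest()
            let handler = VNImageRequestHandler(cgImage: image, options: [:])
            try? handler.perform([request])
            return request.results?.first as? VNFeaturePrintObservation
        }

        guard let firstPrint = featurePrint(for: first),
            let secondPrint = featurePrint(for: second) else { return nil }

        var distance: Float = 0
        guard (try? firstPrint.computeDistance(&distance, to: secondPrint)) != nil else {
            return nil
        }
        return distance
    }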

Tip

You can learn more about this feature in Apple’s documentation.

Next Steps

What’s next depends on what you want to do next. As mentioned in Chapter 2, Apple’s Vision framework has a variety of uses to address practical AI needs in your projects.

As supplied, and without any work from you other than using the appropriate bits of the framework, you can use Vision to detect faces and landmarks in faces such as the nose, mouth, eyes, and similar; text, barcodes, and other types of two-dimensional codes; and track features in video and beyond.

Vision also makes it easier to work with CoreML for image classification and object detection with your own machine-learning models.
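As a taste of what one of those built-in requests looks like in practice, here’s a hedged sketch that counts faces with VNDetectFaceRectanglesRequest (the countFaces function is our own, not something from this chapter’s projects):

    import UIKit
    import Vision

    // Ask Vision for face rectangles and report how many it found.
    func countFaces(in image: UIImage, completion: @escaping (Int) -> Void) {
        guard let cgImage = image.cgImage else { return completion(0) }

        let request = VNDetectFaceRectanglesRequest { request, _ in
            let faces = request.results as? [VNFaceObservation] ?? []
            DispatchQueue.main.async { completion(faces.count) }
        }

        DispatchQueue.global(qos: .userInitiated).async {
            let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
            try? handler.perform([request])
        }
    }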

Note

You could also do a different kind of image similarity. For example, Apple’s Turi Create library adopts an entirely different approach.

Task: Image Classification

In this first substantive practical task for which we build our own model, we take a look at an all-time classic practical application of AI: image classification.

Tip

Think of an image classifier like a hat that sorts images, as if it were from a certain popular magic-based fictional universe.

A classifier is a machine-learning model that takes input and classifies it into a category based on what it thinks the input is. An image classifier does this with an image, informing you which label (or class) it thinks the image belongs to, out of however many predefined labels it knows about.
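In Swift terms, you can think of the interface a classifier offers as looking something like the following sketch; this is our own illustration of the concept, not an Apple API:

    import UIKit

    // Conceptually: a classifier maps an input to one of a fixed set of labels,
    // usually with a confidence score attached.
    struct Classification {
        let label: String      // e.g., "Banana"
        let confidence: Float  // 0.0 through 1.0
    }

    protocol ImageClassifier {
        func classify(_ image: UIImage) -> Classification?
    }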

Image classification is typically a deep-learning problem. For a refresher on what deep learning means, check back to Chapter 1.

Note

Deep learning is not the only way in which you can make an image classifier, but it’s currently one of the most effective ways.

Problem and Approach

As appropriate as it would be to tackle such a classic AI problem with a classic dataset (classifying whether a picture is of a cat or a dog), we’re a little more creative!

We’re going to build a binary image classifier that notifies us whether it thinks it sees a banana or an apple (Figure 4-12). Amazing, huh? (We’re not much more creative, it would seem.)

Tip

The importance of bananas to machine learning researchers cannot be overstated.

Dr. Alasdair Allan, 2019

For this task, we’re going to explore the practical side of image classification by doing the following:

  • Building an app that allows us to use or take photos and determine whether they contain a banana or an apple

  • Selecting a toolkit for creating a machine-learning model and assembling a dataset for the problem

  • Building and training an image classification model

  • Incorporating the model into our app

  • Improving our app

After that, we quickly touch on the theory of how it works, and point you to further resources for improvements and changes that you can make on your own.

We want this book to stay firmly rooted in the practical, task-based side of things that Apple’s platforms make so easy, so we’re going to approach this top-down. By this we mean that we start with the practical output we want: an app that can distinguish between a banana and an apple (Figure 4-12), and work down until we know how to make that work. We don’t start with an algorithm or a formula; we start with the practical desired result.

Figure 4-12. Our app will be able to identify images of each of these fruits

Figure 4-13 presents some images of what we’d like our resulting app to be. Let’s get started.

Figure 4-13. Our final app (we’ll be ready to win any game of Banana or Apple?!)

Building the App

The hottest, unicorniest startups in the world use machine learning to do things. It is known. We need to get in on this machine-learning action. We obviously need an app.

The starting point iOS app that we’re going to build first incorporates the following features:

  • Two buttons: one to pick a photo from the user photo library, and one to take a photo with the camera (if a camera is available)

  • An image view to display the chosen or taken image

  • A label to display some instructions (and eventually display what class it thinks the image chosen is)

  • A button to trigger the image classification

Figure 4-14 depicts an image of this first pass of the app. The app is going to be built using Apple’s UIKit framework, Apple’s older UI framework for iOS. You can learn more about UIKit in Apple’s documentation.

Note

This book is here to teach you the practical side of using AI and machine-learning features with Swift and on Apple’s platforms. Because of this, we don’t explain the fine details of how to build apps; we assume that you mostly know that (although if you don’t, we think you’ll be able to follow along just fine if you pay attention). If you want to learn Swift, we recommend picking up Learning Swift (also by us!) from the lovely folks at O’Reilly.

If you don’t want to manually build the starting point iOS app, you can download the code from our website and find the project named ICDemo-Starter. After you have that, skim through the rest of this section, and then meet us at “AI Toolkit and Dataset”.

Figure 4-14. The first phase of (what will become) our image classifier app

To make the starting point yourself, you need to do the following:

  1. Create an iOS app project in Xcode, choosing the Single View App template. We did not select any of the checkboxes below the Language drop-down (which was, of course, set to “Swift”).

  2. After you create your project, open the Main.storyboard file and create a user interface with the following components:

    • An image view to display the chosen image

    • A label to show both instructions and the classification of an image

    • A button to trigger the image classification

    • Buttons to allow the user to pick an image from their photo library and take a photo (we used two navigation bar buttons for this). Figure 4-15 shows an example of our storyboard.

      Figure 4-15. Our storyboard

      After you’ve laid out the necessary elements, make sure you add the proper constraints.

  1. Connect outlets for the UI objects as follows:

    @IBOutlet weak var cameraButton: UIBarButtonItem!
    @IBOutlet weak var imageView: UIImageView!
    @IBOutlet weak var classLabel: UILabel!
    @IBOutlet weak var classifyImageButton: UIButton!
  2. Connect actions for the UI objects as follows:

    @IBAction func selectButtonPressed(_ sender: Any) {
        getPhoto()
    }
    
    @IBAction func cameraButtonPressed(_ sender: Any) {
        getPhoto(cameraSource: true)
    }
    
    @IBAction func classifyImageButtonPressed(_ sender: Any) {
        classifyImage()
    }
  3. You also need to declare two variables in the ViewController class:

    private var inputImage: UIImage?
    var classification: String?  // not private: the Vision code we add later needs to set this
  4. Modify the viewDidLoad() function, making it look as follows:

    override func viewDidLoad() {
        super.viewDidLoad()
    
        cameraButton.isEnabled =
            UIImagePickerController.isSourceTypeAvailable(.camera)
    
        imageView.contentMode = .scaleAspectFill
    
        imageView.image = UIImage.placeholder
    }
  5. Add the following function to enable or disable controls based on the presence of input to classify:

    func refresh() {  // not private: the Vision code we add later calls this
        if inputImage == nil {
            classLabel.text = "Pick or take a photo!"
            imageView.image = UIImage.placeholder
        } else {
            imageView.image = inputImage
    
            if classification == nil {
                classLabel.text = "None"
                classifyImageButton.enable()
            } else {
                classLabel.text = classification
                classifyImageButton.disable()
            }
        }
    }
  6. Add another function to perform the classification (which currently just sets the classification to “FRUIT!” because there’s no AI yet):

    private func classifyImage() {
        classification = "FRUIT!"
    
        refresh()
    }
  7. Add an extension to the end of the ViewController.swift file, as follows (it’s a fair chunk of code, which we explain in a moment; you’ll also need to add import MobileCoreServices at the top of the file, because this code uses kUTTypeImage):

    extension ViewController: UINavigationControllerDelegate,
        UIPickerViewDelegate, UIImagePickerControllerDelegate {
    
        private func getPhoto(cameraSource: Bool = false) {
            let photoSource: UIImagePickerController.SourceType
            photoSource = cameraSource ? .camera : .photoLibrary
    
            let imagePicker = UIImagePickerController()
            imagePicker.delegate = self
            imagePicker.sourceType = photoSource
            imagePicker.mediaTypes = [kUTTypeImage as String]
            present(imagePicker, animated: true)
        }
    
        @objc func imagePickerController(_ picker: UIImagePickerController,
            didFinishPickingMediaWithInfo info:
                [UIImagePickerController.InfoKey: Any]) {
    
            inputImage =
                info[UIImagePickerController.InfoKey.originalImage] as? UIImage
    
            classification = nil
    
            picker.dismiss(animated: true)
            refresh()
    
            if inputImage == nil {
                summonAlertView(message: "Image was malformed.")
            }
        }
    
        func summonAlertView(message: String? = nil) {  // also called from the Vision code we add later
            let alertController = UIAlertController(
                title: "Error",
                message: message ?? "Action could not be completed.",
                preferredStyle: .alert
            )
    
            alertController.addAction(
                UIAlertAction(
                    title: "OK",
                    style: .default
                )
            )
            present(alertController, animated: true)
        }
    }

    This code allows us to summon the camera or the user photo library. After the user has taken a photo or chosen one, the image is returned. If, for some reason, the image chosen is nil, it also provides for the display of an alert view using summonAlertView(), to notify the user what happened.

And finally, code-wise, add a new Swift file to the project and name it Utils.swift (or similar):

  1. In this new Swift file, add the following:

    import UIKit
    
    extension UIImage{
        static let placeholder = UIImage(named: "placeholder.png")!
    }
    
    extension UIButton {
        func enable() {
            self.isEnabled = true
            self.backgroundColor = UIColor.systemBlue
        }
    
        func disable() {
            self.isEnabled = false
            self.backgroundColor = UIColor.lightGray
        }
    }
    
    extension UIBarButtonItem {
        func enable() { self.isEnabled = true }
        func disable() { self.isEnabled = false }
    }

    This defines an extension on UIImage that allows us to specify a placeholder image. It also defines an extension on UIButton that allows us to enable() or disable() the button. We also add the equivalent on UIBarButtonItem, which is the navigation bar equivalent of a UIButton.

  2. Add a launch screen and an icon, if you’d like (our starter project has some), and launch the app in the simulator. You should see something like the image we showed earlier, in Figure 4-14.

You can select an image (or take a photo if you’re running it on a real device) and see the image appear in the image view. As Figure 4-16 demonstrates, when you tap the Classify Image button, you should see the label update to say “FRUIT!”.

Figure 4-16. Our starter app for the image classifier is ready

You’re now ready to get into the AI side of things.

AI Toolkit and Dataset

You’ll need to assemble your toolkit for this task. The primary tools we’ll be using in this case are the CreateML application and the CoreML and Vision frameworks.

First, we use the CreateML application, Apple’s task-based tool for building machine-learning models to assemble, train, and validate a model that can, hopefully, distinguish between bananas and apples.

Then, we use CoreML to work with that model.

At this point you might be thinking, “CoreML? Isn’t this entire book about CoreML? Have the authors gone off the rails? Is that why there are four authors? Did they keep replacing one another?”

Well, we can’t comment whether we’ve gone off the rails, but we promise you that even though CoreML is a central component of this book, it’s not the only one.

CoreML takes care of the using, reading from, talking to, and otherwise dealing with machine-learning models in your apps. We’re going to be using it in this scenario for exactly that: getting a model into our app and communicating with it.

For more details on the nitty-gritty of the tools, check back to Chapter 2, particularly “CreateML”.

Our final tool for Banana or Apple?! is Vision. Vision is a framework, also from Apple, that provides a whole lot of smarts to help with computer-vision problems. As it turns out, recognizing images and classifying them is a computer-vision problem. We used Vision a lot earlier in this chapter, for Face Detection, Barcode Detection, Saliency Detection, and Image Similarity. For those, we were directly using Vision. This time, we use Vision to work with our own model, and with CoreML. We discussed Apple’s other frameworks earlier, in “Apple’s Other Frameworks”, and you can see where Vision fits in with the other frameworks in Figure 4-17.

Figure 4-17. Where CoreML fits with our other AI tools
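In miniature, the CoreML-plus-Vision pattern we build out later in this task looks something like the following sketch (a preview using our own names, and assuming mlModel is an MLModel instance such as the one Xcode generates from a trained .mlmodel file):

    import CoreGraphics
    import CoreML
    import Vision

    // Wrap the CoreML model for Vision, perform a classification request, and
    // read back the VNClassificationObservation results.
    func classify(_ image: CGImage, with mlModel: MLModel) {
        guard let visionModel = try? VNCoreMLModel(for: mlModel) else { return }

        let request = VNCoreMLRequest(model: visionModel) { request, _ in
            if let best = (request.results as? [VNClassificationObservation])?.first {
                print("\(best.identifier): \(best.confidence)")
            }
        }

        let handler = VNImageRequestHandler(cgImage: image, options: [:])
        try? handler.perform([request])
    }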

Before we can make an app that can classify different kinds of fruit from a picture, we need some pictures of fruit. Thankfully, as with many things, the boffins from Romania have us covered with the Fruit-360 dataset.

This dataset contains 103 different types of fruit, cleanly separated into training data and test data, as well as images with more than one fruit per image for audacious multi-fruit classification. Figure 4-18 illustrates an example of the kinds of images that are in the dataset.

Figure 4-18. Examples of the fruit images
Note

At this point you might have gathered that if we used all of these images for our classification model, the app would not only be able to advise us whether we’re looking at a banana or an apple, but whether what we’re looking at is one of 103 different fruits: Apples (different varieties: Crimson Snow, Golden, Golden-Red, Granny Smith, Pink Lady, Red, Red Delicious), Apricot, Avocado, Avocado ripe, Banana (Yellow, Red, Lady Finger), Cactus fruit, Cantaloupe (two varieties), Carambula, Cherry (different varieties, Rainier), Cherry Wax (Yellow, Red, Black), Chestnut, Clementine, Cocos, Dates, Granadilla, Grape (Blue, Pink, White (different varieties)), Grapefruit (Pink, White), Guava, Hazelnut, Huckleberry, Kiwi, Kaki, Kohlrabi, Kumquats, Lemon (normal, Meyer), Lime, Lychee, Mandarine, Mango, Mangostan, Maracuja, Melon Piel de Sapo, Mulberry, Nectarine, Orange, Papaya, Passion fruit, Peach (different varieties), Pepino, Pear (different varieties, Abate, Kaiser, Monster, Red, Williams), Pepper (Red, Green, Yellow), Physalis (normal, with Husk), Pineapple (normal, Mini), Pitahaya Red, Plum (different varieties), Pomegranate, Pomelo Sweetie, Quince, Rambutan, Raspberry, Redcurrant, Salak, Strawberry (normal, Wedge), Tamarillo, Tangelo, Tomato (different varieties, Maroon, Cherry Red, Yellow), Walnut. Truly, we live in an age of marvels. (We’re just going to use the apples and bananas right now, though.)

Let’s get the dataset ready to train a model. All you’ll need to do is head over to the Fruit-360 dataset and download it by hitting the big green button. After you’ve extracted it, you should be looking at something that resembles the image shown in Figure 4-19.

Because we only want to look for apples or bananas, you should now copy out the apple and banana folders from the Training folder and put them in a new folder somewhere safe, as shown in Figure 4-20.

Figure 4-19. The Fruit-360 dataset, extracted and ready to go
Figure 4-20. The Apple and Banana images, ready to go

Creating a model

With our dataset ready to go, we now turn to Apple’s CreateML application to build a model. CreateML has come in a few different iterations over the years, but, here, we use the newest: the application version.

Tip

To learn more about the various incarnations of CreateML, check back to Chapter 2.

Let’s build our fruit classifier. Open CreateML: you can find CreateML by firing up Xcode, and then selecting the Xcode menu → Open Developer Tool → CreateML, and then do the following:

Tip

If you like launching macOS apps using Spotlight, you can just summon Spotlight and type CreateML. Magic.

  1. With CreateML open, select the Image Classifier template, as shown in Figure 4-21, and then click Next.

  2. Give your project some details, as shown in Figure 4-22, and again click Next.

    Figure 4-21. Selecting the Image Classifier option in the CreateML template picker
    Figure 4-22. Setting the project options for your new CreateML model

    You now have an empty CreateML project, ready to train an image classifier. It should look something like Figure 4-23.

    Figure 4-23. Your CreateML project is ready to take some images
  3. Click the drop-down text box marked Training Data and browse to the folder where you saved the apple and banana images earlier. Select this folder.

  4. In the top bar of the CreateML app, click the Play button, and then go watch some TV, play a videogame, or go for a walk. CreateML is going to get to work training your model for you! It should look something like Figure 4-24.

Figure 4-24. CreateML training our fruit classifier
Note

Don’t panic! This might take a while. It took about 47 minutes to train on our 8-core i9 MacBook Pro, but it will go faster the more CPU cores your machine has. However, it will always take a while. On a MacBook Air or MacBook, this could take multiple hours. This is normal.

As training approaches completion, you’ll notice the application doing an accuracy and testing pass, showing some charts about how accurate the model is. We talk about these later. The testing phase can take a while, too.

When CreateML is done, you’ll be able to drag the model file out from the Output box in the upper-right corner of the window. Drag this file somewhere safe.

Note

You might notice that the file you dragged out has the extension .mlmodel. This is CoreML’s native model format, as discussed in “CoreML”.

Now that we’ve trained and tested a model that can identify fruit (well, more accurately, CreateML has done it for us), let’s put it to work in our app.

Tip

We talk more about what the training, validation, and testing phases of this process are later on in this chapter and throughout the rest of the book. Stay tuned. (The book is called practical artificial intelligence, after all!) Also visit our website https://aiwithswift.com for articles on the topic.

Incorporating the Model in the App

Now that we have our starting point app and a trained model, we’re going to combine them and make an app that can actually perform image classification.

You’ll need to have either built the starting point yourself, following the instructions in “Building the App”, or downloaded the code and the project named ICDemo-Starter from our website. We’ll be progressing from that point in this section.

If you don’t want to follow along and manually work with the app’s code to add the AI features, you can also download the project named ICDemo-Complete.

We’re going to need to change a few things to get the app working with our model:

  1. Add a new variable, classifier, alongside inputImage and classification:

    private let classifier = VisionClassifier(mlmodel: BananaOrApple().model)
  2. Assign the new variable’s delegate to self at the end of viewDidLoad(), and then call refresh():

    classifier?.delegate = self
    refresh()
  3. At the end of the first if statement of the refresh() function, add a call to disable the classifyImageButton (so that if there’s no image present, you can’t click the button to ask the model for a classification, which matters now that there will be a model connected):

    classifyImageButton.disable()
  4. Replace the definition of classifyImage() as follows, to actually do something instead of always saying “FRUIT!”:

    private func classifyImage() {
        if let classifier = self.classifier, let image = inputImage {
            classifier.classify(image)
            classifyImageButton.disable()
        }
    }

    Next, add a new Swift file to the project, called Vision.swift:

  5. Add the following code to it:

    import UIKit
    import CoreML
    import Vision
    
    extension VNImageRequestHandler {
        convenience init?(uiImage: UIImage) {
            guard let ciImage = CIImage(image: uiImage) else { return nil }
            let orientation = uiImage.cgImageOrientation
    
            self.init(ciImage: ciImage, orientation: orientation)
        }
    }
    
    class VisionClassifier {
    
        private let model: VNCoreMLModel
        private lazy var requests: [VNCoreMLRequest] = {
            let request = VNCoreMLRequest(
                model: model,
                completionHandler: {
                    [weak self] request, error in
                    self?.handleResults(for: request, error: error)
            })
    
            request.imageCropAndScaleOption = .centerCrop
            return [request]
        }()
    
        var delegate: ViewController?
    
        init?(mlmodel: MLModel) {
            if let model = try? VNCoreMLModel(for: mlmodel) {
                self.model = model
            } else {
                return nil
            }
        }
    
        func classify(_ image: UIImage) {
            DispatchQueue.global(qos: .userInitiated).async {
                guard let handler =
                    VNImageRequestHandler(uiImage: image) else {
                        return
                }
    
                do {
                    try handler.perform(self.requests)
                } catch {
                    self.delegate?.summonAlertView(
                        message: error.localizedDescription
                    )
                }
            }
        }
    
        func handleResults(for request: VNRequest, error: Error?) {
            DispatchQueue.main.async {
                guard let results =
                    request.results as? [VNClassificationObservation] else {
                        self.delegate?.summonAlertView(
                            message: error?.localizedDescription
                        )
                        return
                }
    
                if results.isEmpty {
                    self.delegate?.classification = "Don't see a thing!"
                } else {
                    let result = results[0]
    
                    if result.confidence < 0.6  {
                        self.delegate?.classification = "Not quite sure..."
                    } else {
                        self.delegate?.classification =
                            "\(result.identifier) " +
                            "(\(Int(result.confidence * 100))%)"
                    }
                }
    
                self.delegate?.refresh()
            }
        }
    }
  6. Add the following extension to the end of the Vision.swift file:

    extension UIImage {
        var cgImageOrientation: CGImagePropertyOrientation {
            switch self.imageOrientation {
                case .up: return .up
                case .down: return .down
                case .left: return .left
                case .right: return .right
                case .upMirrored: return .upMirrored
                case .downMirrored: return .downMirrored
                case .leftMirrored: return .leftMirrored
                case .rightMirrored: return .rightMirrored
            }
        }
    }

    This code comes directly from Apple’s documentation on converting between CGImage and UIImage types. We talked about the difference between CGImage and UIImage earlier in “Task: Barcode Detection”.

  7. Drag the .mlmodel file you trained earlier into the root of the project and allow Xcode to copy it in (we named ours BananaOrApple.mlmodel, matching the BananaOrApple class used in the code).

You can now launch the app in the simulator. You should see something that looks like Figure 4-25.

You can select an image (or take a photo if you’re running it on a real device), see the image appear in the image view, and then tap the Classify Image button to ask the model we built for a classification. You should see the label update with the classification (or lack thereof).

Improving the App

You can, of course, make the app able to classify more than just bananas and apples. If you return to the dataset that we prepared earlier in “AI Toolkit and Dataset” and look at the complete Training folder, with all 103 different fruit classes (labels), you might be able to guess what we suggest trying next.

Train a new image classification model using Apple’s CreateML app, following the instructions in “Creating a model”, but instead, select the entire Training folder (giving you 103 different classes) from the Fruit-360 dataset.

Drop this model into your Xcode project, named appropriately, and then update the following line in ViewController.swift to point to the new model:

private let classifier = VisionClassifier(mlmodel: BananaOrApple().model)

For example, if your new model was called Fruits360.mlmodel, you’d update the line to resemble the following:

private let classifier = VisionClassifier(mlmodel: Fruits360().model)

You then can launch your app again and detect all 103 different kinds of fruit. Amazing. You’re now ready to play app-assisted “What’s My Fruit?”

Figure 4-25. Our image classifier works

Task: Drawing Recognition

With the advent of the iPad Pro and the Apple Pencil, drawing on Apple’s mobile devices is more popular than ever (check out Procreate, an app built in the authors’ home state of Tasmania).

Classifying a drawing could be useful for all manner of reasons, from making a drawing-based game to figuring out what someone has drawn to turn it into an emoji, and beyond.

Problem and Approach

Drawings are fun, and it’s kind of magic to be able to draw something, even if it’s all scribbly and weird, and then have a computer tell you what you’ve drawn. It’s a fun feature that could be an app or game all on its own or form the basis of a feature that makes your app a little bit more magic.

In this task, we’re going to explore the practical side of drawing detection by doing the following:

  • Building an app that lets users take a photo of a drawing and have the app classify it

  • Finding or assembling the data and then training a model that can classify drawings from bitmap images

  • Exploring the next steps for better drawing classification

In this task, we build an app that can identify what we’ve drawn from a photo of a black-and-white scribbly-line drawing. Figure 4-26 illustrates what the final version of the app looks like.

Figure 4-26. The final version of our bitmap drawing detector

AI Toolkit and Dataset

We’re going to look at our AI toolkit before we build the app for this task, because there’s really only one pass we need to do to build the app. The primary tools we’ll be using for this task are Turi Create, CoreML, and Vision. For a reminder on what these tools are, check back to Chapter 2 and “Apple’s Other Frameworks”.

First, let’s use Turi Create, Apple’s task-based Python toolkit for creating machine-learning models, to train a model that can classify drawings.

Then, we use CoreML and Vision to work with the model, classifying the photos of drawings that we let the user take.

To make an app that can classify drawings, we need a dataset of drawings. We could draw a few million little sketches of different things that we might want the app to be able to identify, but that might take a while.

As you’ll find is often the case, the boffins have us covered. This time the boffins are from Google. The Quick Draw Dataset is a collection of more than 50 million sketchy drawings across 345 categories, all drawn by users from around the world who were playing Google’s Quick, Draw! game online, shown in Figure 4-27 (Google is very good at getting people to contribute data).

Tip

We’ve been made aware that those outside the United Kingdom and Australia might not know what a boffin is. Please consult this article for more details on boffins. As a wise thought leader once said: books are for learning. And now you know!

Figure 4-27. Google’s Quick, Draw! game

Because the Quick Draw Dataset has so many categories, and training a classifier with so many samples would take a while (feel free to modify our scripts and give it a go), we’re going to limit ourselves to the following 23 categories: apple, banana, bread, broccoli, cake, carrot, coffee cup, cookie, donut, grapes, hot dog, ice cream, lollipop, mushroom, peanut, pear, pineapple, pizza, potato, sandwich, steak, strawberry, and watermelon.

You can see an example of the sorts of drawings the app will be able to classify in Figure 4-28.

Figure 4-28. Examples of the images our drawing classifier will be able to work with
Note

You don’t need to download the Quick Draw Dataset; it’s very, very large. We download it as part of the script we make to train the model in “Creating a model”.

Creating a model

We’re going to use Apple’s Turi Create to train this model. This means that we’ll need a Python environment:

  1. Set up a Python environment following the process that we outlined in “Python”, activate the environment, and use pip to install Turi Create, as shown in Figure 4-29:

    conda create -n TuriCreateDrawingClassifierEnvironment python=3.6

    conda activate TuriCreateDrawingClassifierEnvironment

    pip install turicreate

    Figure 4-29. Creating our environment
  2. Create a new Python script named train_drawing_classifier.py, and add the following:

    #!/usr/bin/env python
    
    import os
    import json
    import requests
    import numpy as np
    import turicreate as tc
  3. Add some configuration variables, including a list of categories, that we want to train:

    # THE CATEGORIES WE WANT TO BE ABLE TO DISTINGUISH
    categories = [
        'apple', 'banana', 'bread', 'broccoli', 'cake', 'carrot', 'coffee cup',
        'cookie', 'donut', 'grapes', 'hot dog', 'ice cream', 'lollipop',
        'mushroom', 'peanut', 'pear', 'pineapple', 'pizza', 'potato',
        'sandwich', 'steak', 'strawberry', 'watermelon'
    ]
    
    # CONFIGURE AS REQUIRED
    this_directory = os.path.dirname(os.path.realpath(__file__))
    quickdraw_directory = this_directory + '/quickdraw'
    bitmap_directory = quickdraw_directory + '/bitmap'
    bitmap_sframe_path = quickdraw_directory + '/bitmaps.sframe'
    output_model_filename = this_directory + '/DrawingClassifierModel'
    training_samples = 10000
  4. Add the following function to make directories in which to put the training data:

    # MAKE SOME FOLDERS TO PUT TRAINING DATA IN
    def make_directory(path):
    	try:
    		os.makedirs(path)
    	except OSError:
    		if not os.path.isdir(path):
    			raise
    
    make_directory(quickdraw_directory)
    make_directory(bitmap_directory)
  5. Fetch the bitmaps that we’re going to use to train:

    # FETCH SOME DATA
    bitmap_url = (
        'https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap'
    )
    
    total_categories = len(categories)
    
    for index, category in enumerate(categories):
        bitmap_filename = '/' + category + '.npy'

        # Write in binary mode: the response content is bytes, not text
        with open(bitmap_directory + bitmap_filename, 'wb') as bitmap_file:
            bitmap_response = requests.get(bitmap_url + bitmap_filename)
            bitmap_file.write(bitmap_response.content)

        print('Downloaded %s drawings (category %d/%d)' %
            (category, index + 1, total_categories))
    
    random_state = np.random.RandomState(100)
  6. Add a function to make SFrames from the images:

    def get_bitmap_sframe():
        labels, drawings = [], []
        for category in categories:
            data = np.load(
                bitmap_directory + '/' + category + '.npy',
                allow_pickle=True
            )
            random_state.shuffle(data)
            sampled_data = data[:training_samples]
            transformed_data = sampled_data.reshape(
                sampled_data.shape[0], 28, 28, 1)
    
            for pixel_data in transformed_data:
                image = tc.Image(_image_data=np.invert(pixel_data).tobytes(),
                     _width=pixel_data.shape[1],
                     _height=pixel_data.shape[0],
                     _channels=pixel_data.shape[2],
                     _format_enum=2,
                     _image_data_size=pixel_data.size)
                drawings.append(image)
                labels.append(category)
            print('...%s bitmaps complete' % category)
        print('%d bitmaps with %d labels' % (len(drawings), len(labels)))
        return tc.SFrame({'drawing': drawings, 'label': labels})
  7. Add something to save out those SFrames to files:

    # Save intermediate bitmap SFrame to file
    bitmap_sframe = get_bitmap_sframe()
    bitmap_sframe.save(bitmap_sframe_path)
    bitmap_sframe.explore()
  8. Now, we actually train the drawing classifier:

    bitmap_model = tc.drawing_classifier.create(
        bitmap_sframe, 'label', max_iterations=1000)
  9. Export it to CoreML format:

    bitmap_model.export_coreml(output_model_filename + '.mlmodel')
    Tip

    If you want to make your drawing classifier capable of classifying different drawings than ours, check out the list of categories and pick some different ones.

  10. Run the script:

    python train_drawing_classifier.py

    You should see something that resembles Figure 4-30. As we mentioned earlier, you don’t need to download the Quick Draw Dataset manually, because the script does this.

Figure 4-30. Training the drawing classifier

After it’s grabbed them and parsed them into the Turi Create internal format, you’ll see something like Figure 4-31 pop up for you to browse the images. You can check back to “Turi Create” for more information on Turi Create.

Feel free to poke around the visualization while it trains.

Figure 4-31. Turi Create’s visualization of the images
Note

This training might take a while. It took several hours on our modern MacBook Pro. Make a cup of tea, and go watch Person of Interest.

When the training is done, you can take a look in the folder where you did this work, and you’ll find a brand new DrawingClassifierModel.mlmodel, as shown in Figure 4-32. You can use this model just like any other CoreML model we’ve worked with; coincidentally, this is exactly what we’ll be doing next, in “Building the App”.

Tip

We mentioned Turi Create’s visualization features earlier in “Understanding the pieces of Turi Create”. We also talked about the broad importance of getting to know your dataset in “Getting to Know a Dataset”.

Figure 4-32. The final, converted, CoreML model

Building the App

Again, we’re going to be using Apple’s newest UI framework, SwiftUI, to build the interface for the drawing detection app.

The final form of the app we’re going to build can be seen earlier, in Figure 4-26; it consists of the following SwiftUI components:

  • A NavigationView in which to display the title of the app, as well as the button to select a photo

  • An Image to display the chosen image (containing a drawing), which the app will attempt to classify

  • A Button to trigger the drawing classification

  • Some Text to display the classification of the drawing

However, we construct this view out of multiple subviews, as we did for “Building the App”. If you don’t want to manually build the drawing detection iOS app, you can download the code from our website and look for the project named DDDemo.

After you have that, follow along through the rest of this section (we don’t recommend skipping it) and then meet us at “What’s Next?”.

Tip

You might note that this app is very similar in structure to some of the other SwiftUI apps that we build in the book. We’re trying to keep things consistent and as simple as possible. We really hope it helps you learn. Check our website for more tips and guides.

To make the drawing-detection iOS app yourself, you’ll need to do the following:

  1. Fire up Xcode.

  2. Create a new iOS app project, choosing the “Single View App” template. We use SwiftUI for this one, as mentioned.

  3. Drag in the .mlmodel file we created earlier in “AI Toolkit and Dataset” and let Xcode copy it over as needed.

  4. Add a new Swift file to the project called Image.swift. We use this to add an extension on UIImage, so that we can filter it to be more useful for classification.

  5. First, we need an extension on CIFilter:

    extension CIFilter {
        static let mono = CIFilter(name: "CIPhotoEffectMono")!
        static let noir = CIFilter(name: "CIPhotoEffectNoir")!
        static let tonal = CIFilter(name: "CIPhotoEffectTonal")!
        static let invert = CIFilter(name: "CIColorInvert")!
    
        static func contrast(amount: Double = 2.0) -> CIFilter {
            let filter = CIFilter(name: "CIColorControls")!
            filter.setValue(amount, forKey: kCIInputContrastKey)
            return filter
        }
    
        static func brighten(amount: Double = 0.1) -> CIFilter {
            let filter = CIFilter(name: "CIColorControls")!
            filter.setValue(amount, forKey: kCIInputBrightnessKey)
            return filter
        }
    }

    This extension lets us create a CIFilter, which is a Core Image filter that can manipulate an image, and request that it be mono, noir, or tonal. You can learn more about these filters and how to create your own in Apple’s documentation (https://apple.co/2otBgGV).

  6. Next, add the extension on UIImage itself:

    extension UIImage {
        func applying(filter: CIFilter) -> UIImage? {
            filter.setValue(CIImage(image: self), forKey: kCIInputImageKey)
    
            let context = CIContext(options: nil)
            guard let output = filter.outputImage,
                let cgImage = context.createCGImage(
                    output, from: output.extent) else {
                return nil
            }
    
            return UIImage(
                cgImage: cgImage,
                scale: scale,
                orientation: imageOrientation)
        }
    
        func fixOrientation() -> UIImage? {
            UIGraphicsBeginImageContext(self.size)
            self.draw(at: .zero)
            let newImage = UIGraphicsGetImageFromCurrentImageContext()
            UIGraphicsEndImageContext()
            return newImage
        }
    
        var cgImageOrientation: CGImagePropertyOrientation {
            switch self.imageOrientation {
            case .up: return .up
            case .down: return .down
            case .left: return .left
            case .right: return .right
            case .upMirrored: return .upMirrored
            case .downMirrored: return .downMirrored
            case .leftMirrored: return .leftMirrored
            case .rightMirrored: return .rightMirrored
            }
        }
    }

    This extension adds two functions: one to apply a CIFilter, and one to fix the orientation of an image. We also add the usual cgImageOrientation property.

  7. Make another new Swift file called Drawing.swift and then add the following imports:

    import UIKit
    import Vision
    import Foundation
  8. Add the following enum:

    enum Drawing: String, CaseIterable {
        /// These only include those the model was trained on. For others that
        /// can be included in the training phase, see the full list of
        /// categories in the dataset:
        /// https://raw.githubusercontent.com/googlecreativelab/
        ///     quickdraw-dataset/master/categories.txt
        case apple, banana, bread, broccoli, cake, carrot, coffee, cookie
        case donut, grapes, hotdog, icecream, lollipop, mushroom, peanut, pear
        case pineapple, pizza, potato, sandwich, steak, strawberry, watermelon
    
        init?(rawValue: String) {
            if let match = Drawing.allCases
                .first(where: { $0.rawValue == rawValue }) {
                self = match
            } else {
                switch rawValue {
                    case "coffee cup":  self = .coffee
                    case "hot dog":     self = .hotdog
                    case "ice cream":   self = .icecream
                    default: return nil
                }
            }
        }
    
        var icon: String {
            switch self {
                case .apple: return ""
                case .banana: return ""
                case .bread: return ""
                case .broccoli: return ""
                case .cake: return ""
                case .carrot: return ""
                case .coffee: return ""
                case .cookie: return ""
                case .donut: return ""
                case .grapes: return ""
                case .hotdog: return ""
                case .icecream: return ""
                case .lollipop: return ""
                case .mushroom: return ""
                case .peanut: return ""
                case .pear: return ""
                case .pineapple: return ""
                case .pizza: return ""
                case .potato: return ""
                case .sandwich: return ""
                case .steak: return ""
                case .strawberry: return ""
                case .watermelon: return ""
            }
        }
    }
    

    Our enum lets us create a Drawing (which is what the enum is called) from a String (via the init() we created). Each type of the Drawing enum has an icon, which is an emoji, assigned to it.

  9. You also need an extension on VNImageRequestHandler:

    extension VNImageRequestHandler {
        convenience init?(uiImage: UIImage) {
            guard let ciImage = CIImage(image: uiImage) else { return nil }
            let orientation = uiImage.cgImageOrientation
    
            self.init(ciImage: ciImage, orientation: orientation)
        }
    }

    This extension extends VNImageRequestHandler to add a convenience initializer allowing creation with a UIImage instead of a CIImage. For a reminder on what VNImageRequestHandler does, check Apple’s documentation.

  10. Add another extension on DrawingClassifierModel, which is the name of the class that Xcode automatically generates from the model we dragged in earlier:

    extension DrawingClassifierModel {
        func configure(image: UIImage?) -> UIImage? {
            if let rotatedImage = image?.fixOrientation(),
                let grayscaleImage = rotatedImage
                    .applying(filter: CIFilter.noir),
                // account for paper photography making everything dark :/
                let brightenedImage = grayscaleImage
                    .applying(filter: CIFilter.brighten(amount: 0.4)),
                let contrastedImage = brightenedImage
                    .applying(filter: CIFilter.contrast(amount: 10.0)) {
    
                    return contrastedImage
            }
    
            return nil
        }
    
        func classify(_ image: UIImage?,
            completion: @escaping (Drawing?) -> ()) {
            guard let image = image,
                let model = try? VNCoreMLModel(for: self.model) else {
                    return completion(nil)
            }
    
            let request = VNCoreMLRequest(model: model)
    
            DispatchQueue.global(qos: .userInitiated).async {
                if let handler = VNImageRequestHandler(uiImage: image) {
                    try? handler.perform([request])
    
                    let results = request.results
                        as? [VNClassificationObservation]
    
                    let highestResult = results?.max {
                            $0.confidence < $1.confidence
                    }
    
                    print(results?.list ?? "")
    
                    completion(
                        Drawing(rawValue: highestResult?.identifier ?? "")
                    )
                } else {
                    completion(nil)
                }
            }
        }
    }

    This large piece of code extends our model, DrawingClassifierModel, adding a configure() function that takes a UIImage and returns a version of it that’s been filtered to grayscale, brightened, and had its contrast increased. It also adds a classify() function that runs a VNCoreMLRequest on a DispatchQueue to attempt to classify the image (drawing) using a VNImageRequestHandler and our model (which is self in this context, as this is an extension on the model).

  11. Add one more extension on Collection:

    extension Collection where Element == VNClassificationObservation {
        var list: String {
            var string = ""
            for element in self {
                string += "\(element.identifier): " +
                    "\(element.confidence * 100.0)%\n"
            }
            return string
        }
    }

    This extension on Collections of VNClassificationObservations (which are what you get back when you perform an image analysis using Apple’s Vision framework) adds a var called list, of type String, which gives us a printable summary of the identifier and confidence of each VNClassificationObservation in the collection.
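
    If you prefer a more functional style, roughly the same thing can be written with map and joined. This alternative (we’re calling it listAlternative here purely for illustration; it isn’t used elsewhere in the project) omits the trailing newline but is otherwise equivalent:

    import Vision

    // A hypothetical, more functional variant of the list var above;
    // not used elsewhere in this project.
    extension Collection where Element == VNClassificationObservation {
        var listAlternative: String {
            map { "\($0.identifier): \($0.confidence * 100.0)%" }
                .joined(separator: "\n")
        }
    }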

  12. To add some custom views, add a file called Views.swift, import SwiftUI, and then add the following ImagePicker struct:

    struct ImagePicker: UIViewControllerRepresentable {
        typealias UIViewControllerType = UIImagePickerController
        private(set) var selectedImage: UIImage?
        private(set) var cameraSource: Bool
        private let completion: (UIImage?) -> ()
    
        init(camera: Bool = false, completion: @escaping (UIImage?) -> ()) {
            self.cameraSource = camera
            self.completion = completion
        }
    
        func makeCoordinator() -> ImagePicker.Coordinator {
            let coordinator = Coordinator(self)
            coordinator.completion = self.completion
            return coordinator
        }
    
        func makeUIViewController(context: Context) ->
            UIImagePickerController {
    
            let imagePickerController = UIImagePickerController()
            imagePickerController.delegate = context.coordinator
            imagePickerController.sourceType =
                cameraSource ? .camera : .photoLibrary
            imagePickerController.allowsEditing = true
            return imagePickerController
        }
    
        func updateUIViewController(
            _ uiViewController: UIImagePickerController, context: Context) {}
    
        class Coordinator: NSObject,
            UIImagePickerControllerDelegate, UINavigationControllerDelegate {
    
            var parent: ImagePicker
            var completion: ((UIImage?) -> ())?
    
            init(_ imagePickerControllerWrapper: ImagePicker) {
                self.parent = imagePickerControllerWrapper
            }
    
            func imagePickerController(
                _ picker: UIImagePickerController,
                didFinishPickingMediaWithInfo info:
                    [UIImagePickerController.InfoKey: Any]) {
    
                print("Image picker complete...")
    
                let selectedImage =
                    info[UIImagePickerController.InfoKey.originalImage]
                        as? UIImage
    
                picker.dismiss(animated: true)
                completion?(selectedImage)
            }
    
            func imagePickerControllerDidCancel(_ picker:
                UIImagePickerController) {
    
                print("Image picker cancelled...")
                picker.dismiss(animated: true)
                completion?(nil)
            }
        }
    }

    As we did when we built a face-detection app using SwiftUI in “Building the App”, this fakes a ViewController in SwiftUI, allowing us to use UIKit features to get an image picker.

  13. Add the following TwoStateButton view:

    struct TwoStateButton: View {
        private let text: String
        private let disabled: Bool
        private let background: Color
        private let action: () -> Void
    
        var body: some View {
            Button(action: action) {
                HStack {
                    Spacer()
                    Text(text).font(.title).bold().foregroundColor(.white)
                    Spacer()
                    }.padding().background(background).cornerRadius(10)
                }.disabled(disabled)
        }
    
        init(text: String,
            disabled: Bool,
            background: Color = .blue,
            action: @escaping () -> Void) {
    
            self.text = text
            self.disabled = disabled
            self.background = disabled ? .gray : background
            self.action = action
        }
    }

    This TwoStateButton should look pretty familiar at this point: it defines a SwiftUI view for a Button that can be disabled, with its disabled state represented visually.

  14. Add the following MainView View:

    struct MainView: View {
        private let image: UIImage
        private let text: String
        private let button: TwoStateButton
    
        var body: some View {
            VStack {
                Image(uiImage: image)
                    .resizable()
                    .aspectRatio(contentMode: .fit)
    
                Spacer()
                Text(text).font(.title).bold()
                Spacer()
                self.button
            }
        }
    
        init(image: UIImage, text: String, button: () -> TwoStateButton) {
            self.image = image
            self.text = text
            self.button = button()
        }
    }

    This MainView defines a VStack with an Image, a Spacer, some Text, and a TwoStateButton.

  15. Next, open ContentView.swift, and then add the following @State variables:

    @State private var imagePickerOpen: Bool = false
    @State private var cameraOpen: Bool = false
    @State private var image: UIImage? = nil
    @State private var classification: String? = nil
  16. And the following variables:

    private let placeholderImage = UIImage(named: "placeholder")!
    private let classifier = DrawingClassifierModel()
    
    private var cameraEnabled: Bool {
        UIImagePickerController.isSourceTypeAvailable(.camera)
    }
    
    private var classificationEnabled: Bool {
        image != nil && classification == nil
    }
  17. Add a function to perform the classification:

    private func classify() {
        print("Analysing drawing...")
        classifier.classify(self.image) { result in
            self.classification = result?.icon
        }
    }
  18. Add a function to return control, after classification:

    private func controlReturned(image: UIImage?) {
        print("Image return \(image == nil ? "failure" : "success")...")
    
        // turn the image right side up and convert it to high-contrast black-and-white
        self.image = classifier.configure(image: image)
    }
  19. Add a function to summon an image picker:

    private func summonImagePicker() {
        print("Summoning ImagePicker...")
        imagePickerOpen = true
    }
  20. Add a function to summon the camera:

    private func summonCamera() {
        print("Summoning camera...")
        cameraOpen = true
    }
  21. Add an extension on ContentView, which returns the right views, as needed:

    extension ContentView {
        private func mainView() -> AnyView {
            return AnyView(NavigationView {
                MainView(
                    image: image ?? placeholderImage,
                    text: "\(classification ?? "Nothing detected")") {
                        TwoStateButton(
                            text: "Classify",
                            disabled: !classificationEnabled, action: classify
                        )
                    }
                    .padding()
                    .navigationBarTitle(
                        Text("DDDemo"),
                        displayMode: .inline)
                    .navigationBarItems(
                        leading: Button(action: summonImagePicker) {
                            Text("Select")
                            },
                        trailing: Button(action: summonCamera) {
                            Image(systemName: "camera")
                        }.disabled(!cameraEnabled)
                    )
            })
        }
    
        private func imagePickerView() -> AnyView {
            return  AnyView(ImagePicker { result in
                self.classification = nil
                self.controlReturned(image: result)
                self.imagePickerOpen = false
            })
        }
    
        private func cameraView() -> AnyView {
            return  AnyView(ImagePicker(camera: true) { result in
                self.classification = nil
                self.controlReturned(image: result)
                self.cameraOpen = false
            })
        }
    }
  22. Update the body View to look as follows:

    var body: some View {
        if imagePickerOpen { return imagePickerView() }
        if cameraOpen { return cameraView() }
        return mainView()
    }

You can now fire up your drawing classifier app, draw some things on paper, take a photo, and watch your app identify your drawings (well, as long as the drawings match the categories you trained the model with). Figure 4-33 presents some examples of the authors’ handiwork.

What’s Next?

This is just one way you could make a drawing classification feature. Drawings are often created on iOS devices themselves, which means our app takes the possibly unnecessary detour of photographing (or selecting a photo of) a drawing. Why not allow the user to draw directly in our app?
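
As a rough illustration of that direction, here’s a minimal sketch of a SwiftUI view that captures strokes as the user drags a finger across the screen. DrawingCanvas and StrokeShape are hypothetical names we’re inventing for this aside, and you’d still need to render the view to a UIImage (for example, with UIGraphicsImageRenderer) before handing the result to configure() and classify():

    import SwiftUI

    // A minimal, hypothetical in-app drawing surface: it collects strokes as
    // arrays of points and draws them as black lines on a white background.
    struct DrawingCanvas: View {
        @State private var strokes: [[CGPoint]] = []
        @State private var currentStroke: [CGPoint] = []

        var body: some View {
            ZStack {
                Color.white
                ForEach(strokes.indices, id: \.self) { index in
                    StrokeShape(points: strokes[index])
                        .stroke(Color.black, lineWidth: 4)
                }
                StrokeShape(points: currentStroke)
                    .stroke(Color.black, lineWidth: 4)
            }
            .gesture(
                DragGesture(minimumDistance: 0)
                    .onChanged { value in
                        self.currentStroke.append(value.location)
                    }
                    .onEnded { _ in
                        self.strokes.append(self.currentStroke)
                        self.currentStroke = []
                    }
            )
        }
    }

    // A Shape that connects a stroke's points with straight line segments.
    struct StrokeShape: Shape {
        let points: [CGPoint]

        func path(in rect: CGRect) -> Path {
            var path = Path()
            path.addLines(points)
            return path
        }
    }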

Later, in Chapter 7, we look at creating a drawing classifier for drawings made on the device, in “Task: Gestural Classification for Drawing”.

Figure 4-33. The fabulous artwork of the authors being identified by our app

Task: Style Classification

For our final vision-related task, we modify the app that we built for image classification in “Task: Image Classification” to make it capable of identifying the style of a supplied image. We’re going to do this quickly, and in the most straightforward and practical way we know how: by converting a preexisting model into Apple’s CoreML format.

We need a model that can identify styles. Luckily, the boffins have us covered: “Finetuning CaffeNet on Flickr Style” is a classifier model that’s been trained on many images across different categories and can identify a variety of image styles.

Note

The styles that the model can identify are Detailed, Pastel, Melancholy, Noir, HDR, Vintage, Long Exposure, Horror, Sunny, Texture, Bright, Hazy, Bokeh, Serene, Ethereal, Macro, Depth of Field, Geometric Composition, Minimal, and Romantic. The model we’re using here is based on this research paper.

Converting the Model

We need to use Python to convert the model to something that we can use:

  1. Create a new Python environment following the instructions in “Python” and then activate it:

    conda create -n StyleClassifier python=3.6

    conda activate StyleClassifier

  2. Install Apple’s CoreML Tools (we discussed this earlier, in “CoreML Community Tools”):

    pip install coremltools

  3. Create a file called styles.txt with the following contents:

    Detailed
    Pastel
    Melancholy
    Noir
    HDR
    Vintage
    Long Exposure
    Horror
    Sunny
    Bright
    Hazy
    Bokeh
    Serene
    Texture
    Ethereal
    Macro
    Depth of Field
    Geometric Composition
    Minimal
    Romantic
  4. Download the trained model we’re using, in Caffe format, from the Berkeleyvision website.

    Save this model file (it’s a few hundred megabytes) next to the styles.txt file.

  5. Download the deploy.prototxt file and save it next to the model. This file specifies the parameters for the model that we need in order to convert it to the CoreML format.

  6. Create a new Python script in the same folder (ours is called convert_styleclassifier.py), and then add the following code:

    import coremltools
    
    coreml_model = coremltools.converters.caffe.convert(
        ('./finetune_flickr_style.caffemodel', './deploy.prototxt'),
        image_input_names = 'data',
        class_labels = './styles.txt'
    )
    
    coreml_model.author = 'Paris BA'
    
    coreml_model.license = 'None'
    
    coreml_model.short_description = 'Flickr Style'
    
    coreml_model.input_description['data'] = 'An image.'
    
    coreml_model.output_description['prob'] = (
        'Probabilities for style type, for a given input.'
    )
    
    coreml_model.output_description['classLabel'] = (
        'The most likely style type for the given input.'
    )
    
    coreml_model.save('Style.mlmodel')

    This code imports CoreML Tools, uses the Caffe converter that CoreML Tools supplies to convert the finetune_flickr_style.caffemodel model we downloaded (together with the deploy.prototxt parameters file and the class labels in styles.txt), attaches some metadata, and saves out a CoreML-format model named Style.mlmodel.

Everything should look like Figure 4-34.

Figure 4-34. The files needed to convert the style classifier

  7. Run the Python script:

    python convert_styleclassifier.py

You’ll see something that looks like Figure 4-35, and you’ll end up with a Style.mlmodel file in the folder (Figure 4-36).

Figure 4-35. Converting the style classifier from Caffe to CoreML
Figure 4-36. Our new CoreML style classifier model

Using the Model

First, you’ll want to duplicate the final version of the project we created for the classification task in “Task: Image Classification”. If you don’t want to, you can download ours from our website; look for the project named StyleClassifier.

To use the Style.mlmodel file we just converted, do the following:

  1. Open the Xcode project that you duplicated (or downloaded from our resources).

  2. Drag Style.mlmodel into the project, allowing Xcode to copy as needed.

  3. In ViewController.swift, change the line that references the model from this

    private let classifier = VisionClassifier(mlmodel: BananaOrApple().model)

    to this:

    private let classifier = VisionClassifier(mlmodel: Style().model)

    Run the app. You can now select an image, tap the button, and receive a classification, as shown in Figure 4-37.

Figure 4-37. Our Style Classifier in action
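
If you’re curious whether the metadata we attached during conversion survived the trip, one quick and entirely optional check is to print the model’s description from somewhere convenient in the app, such as viewDidLoad(). This is a hypothetical aside, not part of the original steps:

    import CoreML

    // Optional sanity check: print the converted model's metadata and the
    // names of its outputs (we expect 'prob' and 'classLabel').
    let style = Style()
    print(style.model.modelDescription.metadata)
    print(Array(style.model.modelDescription.outputDescriptionsByName.keys))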

We look more at using CoreML Tools (“CoreML Community Tools”) to convert models in later activities, such as “Task: Image Generation with a GAN” and “Task: Using the CoreML Community Tools”.

Next Steps

That’s about it for our vision chapter. We’ve covered some common vision-related practical AI tasks that you might want to accomplish with Swift, and used a fairly wide variety of tools to do so.

We built seven apps and Playgrounds, exploring seven practical AI tasks related to vision:

Face Detection

We used Apple’s new SwiftUI for the interface, and Apple’s provided framework, Vision, to detect faces and work with that information. We didn’t even need to train a model.

Barcode Detection

We used Apple’s frameworks to find barcodes in images. Again, we didn’t need to train a model.

Saliency Detection

In this task, we found the most salient area of an image using Apple’s frameworks. Still no model training!

Image Similarity

We again used Apple’s new SwiftUI framework and again used Vision to build an app that lets us see how different (or similar) two images are. And no model training here, either.

Image Classification

This time we used Apple’s UIKit framework to build the UI, trained our own image classification model using Apple’s CreateML app and an open source dataset of fruit photos, and built an app that can recognize different fruits from photos. Finally, we trained a model!

Drawing Recognition

We again used SwiftUI, building an app derived from our Face Detection app, and created our own drawing classification model using Apple’s Turi Create Python framework, giving us an app that lets users identify what they’ve drawn on paper.

Style Classification

We updated our Image Classification app to support identifying the style of a supplied image by converting a model built with another set of tools into Apple’s CoreML format.

Note

As we mentioned in “Apple’s Models”, if you want to solve a practical AI problem regarding vision, you can also check out Apple’s Core ML Models page and see what it offers in the way of pretrained CoreML models. If you can solve your problem without having to do as much work yourself, it’s probably worth it. We also recommend checking out the Awesome CoreML Models list.

In Chapter 11, we look at what might have happened under the hood, algorithm-wise, for each of the tasks we explored in this chapter. Just because this is the end of the chapter named “Vision” doesn’t mean we won’t be working with visual things in other chapters of the book. In Chapter 5, we turn to audio, which is also a very exciting topic.

For more vision-related practical AI tasks, check out our website.
