1.5. Processing a String One Character at a Time

Problem

You want to iterate through each character in a string, performing an operation on each character as you traverse the string.

Solution

Depending on your needs and preferences, you can use the map or foreach methods, a for loop, or other approaches. Here’s a simple example of how to create an uppercase string from an input string, using map:

scala> val upper = "hello, world".map(c => c.toUpper)
upper: String = HELLO, WORLD

As you’ll see in many examples throughout this book, you can shorten that code using the magic of Scala’s underscore character:

scala> val upper = "hello, world".map(_.toUpper)
upper: String = HELLO, WORLD

With any collection—such as a sequence of characters in a string—you can also chain collection methods together to achieve a desired result. In the following example, the filter method is called on the original String to create a new String with all occurrences of the lowercase letter “L” removed. That String is then used as input to the map method to convert the remaining characters to uppercase:

scala> val upper = "hello, world".filter(_ != 'l').map(_.toUpper)
upper: String = HEO, WORD

When you first start with Scala, you may not be comfortable with the map method, in which case you can use Scala’s for loop instead. This example uses a for loop to print each character:

scala> for (c <- "hello") println(c)
h
e
l
l
o

To write a for loop to work like a map method, add a yield statement to the end of the loop. This for/yield loop is equivalent to the first two map examples:

scala> val upper = for (c <- "hello, world") yield c.toUpper
upper: String = HELLO, WORLD

Adding yield to a for loop essentially places the result from each loop iteration into a temporary holding area. When the loop completes, all of the elements in the holding area are returned as a single collection.

This for/yield loop achieves the same result as the third map example:

val result = for {
  c <- "hello, world"
  if c != 'l'
} yield c.toUpper
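
As a rough illustration of that equivalence, the following sketch places the for/yield form next to the method chains it corresponds to. A guard in a for expression is rewritten by the compiler as a withFilter call, and the yield becomes a map call. The object name ForYieldSketch and the value names are made up for this example:

```scala
object ForYieldSketch {
  // for/yield with a guard, as in the example above
  val viaFor: String = for {
    c <- "hello, world"
    if c != 'l'
  } yield c.toUpper

  // roughly what the compiler generates: the guard becomes withFilter, the yield becomes map
  val viaWithFilter: String = "hello, world".withFilter(_ != 'l').map(_.toUpper)

  // the filter/map chain from the earlier example gives the same result
  val viaFilter: String = "hello, world".filter(_ != 'l').map(_.toUpper)
}
```

All three values hold the same String, so which form you use is largely a matter of readability.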

Whereas the map or for/yield approaches are used to transform one collection into another, the foreach method is typically used to operate on each element without returning a result. This is useful for situations like printing:

scala> "hello".foreach(println)
h
e
l
l
o
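
Because foreach returns nothing useful, any result you want from it has to come from a side effect. The following minimal sketch counts the vowels in a string by updating a local variable inside foreach. The names ForeachSketch and countVowels are hypothetical and exist only for this illustration:

```scala
object ForeachSketch {
  private val vowels = Set('a', 'e', 'i', 'o', 'u')

  // foreach returns Unit, so the count is accumulated as a side effect
  def countVowels(s: String): Int = {
    var count = 0
    s.foreach { c =>
      if (vowels(c.toLower)) count += 1
    }
    count
  }
}
```

Calling ForeachSketch.countVowels("hello, world") walks the string once and returns 3.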

Discussion

Because Scala treats a string as a sequence of characters—and because of Scala’s background as both an object-oriented and functional programming language—you can iterate over the characters in a string with the approaches shown. Compare those examples with a common Java approach:

String s = "Hello";
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s.length(); i++) {
  char c = s.charAt(i);
  // do something with the character ...
  // sb.append ...
}
String result = sb.toString();

You’ll see that the Scala approach is more concise, but still very readable. This combination of conciseness and readability lets you focus on solving the problem at hand. Once you get comfortable with Scala, it feels like the imperative code in the Java example obscures your business logic.

Note

Wikipedia describes imperative programming like this:

Imperative programming is a programming paradigm that describes computation in terms of statements that change a program state ... imperative programs define sequences of commands for the computer to perform.

This is shown in the Java example, which defines a series of explicit statements that tell a computer how to achieve a desired result.

Understanding how map works

Depending on your coding preferences, you can pass large blocks of code to a map method. These two examples demonstrate the syntax for passing an algorithm to a map method:

// first example
"HELLO".map(c => (c.toByte+32).toChar)

// second example
"HELLO".map{ c =>
  (c.toByte+32).toChar
}

Notice that the algorithm operates on one Char at a time. This is because the map method in this example is called on a String, and map treats a String as a sequential collection of Char elements. The map method has an implicit loop, and in that loop, it passes one Char at a time to the algorithm it’s given.
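
To make that implicit loop visible, here is a simplified sketch of what map does for a String when the function returns a Char. This is not the library’s actual implementation; mapChars is a hypothetical helper written only to show the idea:

```scala
object MapSketch {
  // Hypothetical helper: applies f to each Char in turn and collects the results
  def mapChars(s: String)(f: Char => Char): String = {
    val sb = new StringBuilder(s.length)
    var i = 0
    while (i < s.length) {
      sb.append(f(s.charAt(i)))   // one Char at a time is passed to f
      i += 1
    }
    sb.toString
  }
}
```

With this helper, MapSketch.mapChars("HELLO")(c => (c.toByte+32).toChar) produces the same String as the two map examples shown earlier.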

Although the algorithm in these examples is still short, imagine for a moment that it is longer. In this case, to keep your code clear, you might want to write it as a method (or function) that you can pass into the map method.

To write a method that you can pass into map to operate on the characters in a String, define it to take a single Char as input, then perform the logic on that Char inside the method. When the logic is complete, return whatever it is that your algorithm returns. Though the following algorithm is still short, it demonstrates how to create a custom method and pass that method into map:

// write your own method that operates on a character
scala> def toLower(c: Char): Char = (c.toByte+32).toChar
toLower: (c: Char)Char

// use that method with map
scala> "HELLO".map(toLower)
res0: String = hello

As an added benefit, the same method also works with the for/yield approach:

scala> val s = "HELLO"
s: String = HELLO

scala> for (c <- s) yield toLower(c)
res1: String = hello
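
Be aware that adding 32 to every character only makes sense for the ASCII letters A through Z. A comma, for instance, would be turned into the letter L. The following sketch shows one possible guarded variant. The names SafeLowerSketch and toLowerAscii are hypothetical, and in real code the built-in toLower method on Char is usually the simpler choice:

```scala
object SafeLowerSketch {
  // Hypothetical variant: shift only 'A' through 'Z', leave every other Char unchanged
  def toLowerAscii(c: Char): Char =
    if (c >= 'A' && c <= 'Z') (c + 32).toChar else c
}
```

This method is passed to map in the same way, for example "Hello, World!".map(SafeLowerSketch.toLowerAscii), and it leaves the punctuation and the characters that are already lowercase alone.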

Note

I’ve used the word “method” in this discussion, but you can also use functions here instead of methods. What’s the difference between a method and a function?

Here’s a quick look at a function equivalent to this toLower method:

val toLower = (c: Char) => (c.toByte+32).toChar

This function can be passed into map in the same way the previous toLower method was used:

scala> "HELLO".map(toLower)
res0: String = hello

For more information on functions and the differences between methods and functions, see Chapter 9, Functional Programming.
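
As a small preview, the following sketch shows a method and a function side by side. A function is a value with its own methods, such as andThen, and a method can be converted to a function value where a function type is expected. All of the names in this block are made up for the illustration:

```scala
object MethodVsFunctionSketch {
  // a method, defined with def
  def toLowerMethod(c: Char): Char = (c.toByte + 32).toChar

  // a function, which is a value of type Char => Char
  val toLowerFunction: Char => Char = c => (c.toByte + 32).toChar

  // a method is converted to a function value when a function type is expected
  val fromMethod: Char => Char = toLowerMethod

  // because a function is an object, it has methods of its own, such as andThen
  val lowerThenUpper: Char => Char = toLowerFunction.andThen(_.toUpper)
}
```

Any of these can be passed to map, for example "HELLO".map(MethodVsFunctionSketch.fromMethod).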

A complete example

The following example demonstrates how to call the getBytes method on a String, and then pass a block of code into a foreach method to help calculate an Adler-32 checksum value on a String:

package tests

/**
 * Calculate the Adler-32 checksum using Scala.
 * @see http://en.wikipedia.org/wiki/Adler-32
 */
object Adler32Checksum {

  val MOD_ADLER = 65521

  def main(args: Array[String]): Unit = {
    val sum = adler32sum("Wikipedia")
    printf("checksum (int) = %d\n", sum)
    printf("checksum (hex) = %s\n", sum.toHexString)
  }

  def adler32sum(s: String): Int = {
    var a = 1
    var b = 0
    s.getBytes.foreach { byte =>
      // mask with 0xff so that bytes above 127 are treated as unsigned values
      a = ((byte & 0xff) + a) % MOD_ADLER
      b = (b + a) % MOD_ADLER
    }
    // note: Int is 32 bits, which this requires
    b * 65536 + a     // or (b << 16) + a
  }

}

The getBytes method returns a sequential collection of bytes from a String as follows:

scala> "hello".getBytes
res0: Array[Byte] = Array(104, 101, 108, 108, 111)

Adding the foreach method call after getBytes lets you operate on each Byte value:

scala> "hello".getBytes.foreach(println)
104
101
108
108
111

You use foreach in this example instead of map, because the goal is to loop over each Byte in the String, and do something with each Byte, but you don’t want to return anything from the loop.
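
One way to sanity-check a hand-written checksum like this is to compare it with java.util.zip.Adler32, which ships with the JDK. The following sketch does that under a few assumptions that are not part of the example above: the names Adler32CrossCheck, jdkChecksum, and foreachChecksum are hypothetical, the bytes are read as UTF-8 rather than the platform default, and the result is returned as a Long so that it is never negative:

```scala
import java.nio.charset.StandardCharsets
import java.util.zip.Adler32

object Adler32CrossCheck {

  // reference value from the JDK's built-in Adler-32 implementation
  def jdkChecksum(s: String): Long = {
    val adler = new Adler32()
    adler.update(s.getBytes(StandardCharsets.UTF_8))
    adler.getValue
  }

  // foreach-based version in the style of this recipe (hypothetical variant)
  def foreachChecksum(s: String): Long = {
    val modAdler = 65521
    var a = 1
    var b = 0
    s.getBytes(StandardCharsets.UTF_8).foreach { byte =>
      a = (a + (byte & 0xff)) % modAdler   // treat each byte as unsigned
      b = (b + a) % modAdler
    }
    (b.toLong << 16) | a
  }
}
```

For the input "Wikipedia", both methods return 300286872, which is 11e60398 in hexadecimal.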

See Also

  • Under the covers, the Scala compiler translates a for loop into a foreach method call. The translation gets more complicated if the loop has one or more if statements (guards) or a yield expression. This is discussed in detail in Recipe 3.1, and I also provide examples on my website at alvinalexander.com. The full details are presented in Section 6.19 of the current Scala Language Specification.

  • The Adler-32 checksum algorithm
