12.4. How to Process Every Character in a Text File
Problem
You want to open a text file and process every character in the file.
Solution
If performance isn’t a concern, write your code in a straightforward, obvious way:
    val source = io.Source.fromFile("/Users/Al/.bash_profile")
    for (char <- source) {
        println(char.toUpper)
    }
    source.close
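One thing the straightforward version does not do is close the file if the loop throws. A minimal sketch of a variant that does, assuming a hypothetical helper named `upperCaseChars` wrapped in an illustrative `CharDemo` object (neither name comes from the recipe), might look like this:

```scala
import scala.io.Source

object CharDemo {
  // Hypothetical helper: returns the file's contents with every
  // character upper-cased. Source is an Iterator[Char], so map
  // visits one character at a time.
  def upperCaseChars(path: String): String = {
    val source = Source.fromFile(path)
    try source.map(_.toUpper).mkString
    finally source.close() // runs even if reading fails
  }
}
```

Calling `CharDemo.upperCaseChars` with the path of a small text file returns the transformed text as a single string, so this form only suits files that fit comfortably in memory.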
However, be aware that this code may be slow on large files. For instance, the following method that counts the number of lines in a file takes 100 seconds to run on an Apache access logfile that is ten million lines long:
    // run time: took 100 secs
    def countLines1(source: io.Source): Long = {
        val NEWLINE = 10
        var newlineCount = 0L
        for {
            char <- source
            if char.toByte == NEWLINE
        } newlineCount += 1
        newlineCount
    }
The time can be significantly reduced by using the getLines method to retrieve one line at a
time, and then working through the characters in each line. The
following line-counting algorithm counts the same ten million lines in
just 23 seconds:
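    // run time: 23 seconds
    // use getLines, then count the newline characters
    // (redundant for this purpose, i know)
    // note: in Scala 2.8 and later, getLines returns each line without
    // its line terminator, so the NEWLINE test below never matches;
    // the inner loop is kept only so that every character is visited
    def countLines2(source: io.Source): Long = {
        val NEWLINE = 10
        var newlineCount = 0L
        for {
            line <- source.getLines
            c <- line
            if c.toByte == NEWLINE
        } newlineCount += 1
        newlineCount
    }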
Both algorithms iterate over the characters in the file, but reading a line at a time with getLines in the second algorithm reduces the run time dramatically, from 100 seconds to 23 seconds in this test.
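To make the difference between the two iteration styles concrete, here is a small sketch that runs both over an in-memory source built with `Source.fromString`. The object name `LineCountDemo` and the methods `countByChar` and `countByLine` are illustrative assumptions, not part of the recipe, and the sketch says nothing about timing on real files.

```scala
import scala.io.Source

object LineCountDemo {
  // Character-at-a-time: every character, including '\n',
  // passes through the loop.
  def countByChar(source: Source): Long = {
    var n = 0L
    for (c <- source if c == '\n') n += 1
    n
  }

  // Line-at-a-time: getLines() yields each line without its
  // terminator, so the lines themselves are counted.
  def countByLine(source: Source): Long = {
    var n = 0L
    for (_ <- source.getLines()) n += 1
    n
  }
}
```

For text in which every line ends with a newline, such as `"one\ntwo\nthree\n"`, both methods return the same count.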
Note
Notice that there’s the equivalent of two for loops in the second example. ...
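As a rough illustration of that equivalence, the sketch below writes the same character count twice: once as a single for expression with two generators and once as two explicitly nested loops. The names `NestedLoopDemo`, `countChar`, and `countCharNested` are hypothetical and chosen only for this example.

```scala
object NestedLoopDemo {
  // One for expression with two generators and a guard.
  def countChar(lines: Iterator[String], target: Char): Long = {
    var n = 0L
    for {
      line <- lines
      c <- line
      if c == target
    } n += 1
    n
  }

  // The same logic written as two nested loops.
  def countCharNested(lines: Iterator[String], target: Char): Long = {
    var n = 0L
    for (line <- lines) {
      for (c <- line) {
        if (c == target) n += 1
      }
    }
    n
  }
}
```

Both methods visit each line in turn and then each character within that line, so they should return the same result for the same input.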