Chapter 4. Local I/O

4.0. Introduction

We’ve done a lot of work in the last few chapters, but clearly, the rubber has to meet the road somewhere. How did we get all of this data into our Clojure programs, and more importantly, how do we get it out? This chapter is all about input and output to a local computer—the primary place where most applications’ data hits the road, so to speak.

There are a variety of modes and mediums for communicating with a local machine. What do we communicate with, in what way, and in what format? It’s a little like the classic board game Clue: was it plain text, in the console, with command-line arguments; or Clojure data, in a file, as configuration data? In this chapter we’ll explore files, formats, and applications of both GUI and console flavors, to name a few topics.

While it isn’t possible for us to enumerate every possible combination, it is our hope that this chapter will give you a strong idea of what is possible. Handily enough, most good solutions in Clojure compose; you should have little trouble sticking together any number of recipes in this chapter to suit your needs.

4.1. Writing to STDOUT and STDERR

Problem

You want to write to STDOUT and STDERR.

Solution

By default, the print and println functions will print content passed to them to STDOUT:

(println "This text will be printed to STDOUT.")
;; *out*
;; This text will be printed to STDOUT.

(do
  (print "a")
  (print "b"))
;; *out*
;; ab

Change the binding of *out* to *err* to print to STDERR instead of STDOUT:

(binding [*out* *err*]
  (println "Blew up!"))
;; *err*
;; Blew up!

Discussion

In Clojure, the dynamic vars *out* and *err* are bound to your application environment’s built-in STDOUT and STDERR streams, respectively.

All of the printing functions in Clojure, such as print and println, utilize the *out* binding as the destination to write to. Consequently, you can rebind that var to *err* (using binding) to change the destination of print messages from STDOUT to STDERR. Other printing functions include pr, prn, printf, and a handful of others.

The bound value of *out* is not restricted to operating system streams; *out* can be any java.io.Writer. This makes print functions powerful tools. They can be used to write to files, sockets, or any other pipes you desire. The built-in function clojure.java.io/writer is a versatile constructor for writers:

;; Create a writer to file foo.txt and print to it.
(def foo-file (clojure.java.io/writer "foo.txt"))
(binding [*out* foo-file]
  (println "Foo, bar."))

;; Nothing is printed to *out*.

;; And of course, close the file.
(.close foo-file)
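
Because *out* can be rebound to any java.io.Writer, printed output can also be captured in memory. The following is a minimal sketch, assuming a hypothetical helper named capture-output (it is not part of Clojure), that rebinds *out* to a java.io.StringWriter:

```clojure
(import '[java.io StringWriter])

(defn capture-output
  "Hypothetical helper: calls the zero-argument function f with *out*
  rebound to an in-memory StringWriter and returns the text f printed."
  [f]
  (let [buffer (StringWriter.)]
    (binding [*out* buffer]
      (f))
    (str buffer)))

(capture-output #(print "Foo, bar."))
;; -> "Foo, bar."
```

Clojure’s built-in with-out-str macro packages the same idea for everyday use.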

See Also

  • pr’s documentation and source to get a better idea of how *out*-based printing works
  • clojure.java.io/writer’s documentation for more information on creating writers

4.2. Reading a Single Keystroke from the Console

Problem

Console input via stdin is normally buffered by lines; you want to read a single, unbuffered keystroke from the console.

Solution

Use ConsoleReader from the JLine library, a Java library for handling console input.

JLine is similar to BSD editline and GNU readline. To follow along with this recipe, create a new library using the command lein new keystroke. Inside project.clj, add [jline "2.11"] to the :dependencies vector.

Inside the src/keystroke/core.clj file, use ConsoleReader to read characters from the terminal:

(ns keystroke.core
  (:import [jline.console ConsoleReader]))

(defn show-keystroke []
  (print "Enter a keystroke: ")
  (flush)
  (let [cr (ConsoleReader.)
        keyint (.readCharacter cr)]
    (println (format "Got %d ('%c')!" keyint (char keyint)))))

Discussion

As in most languages, console I/O in Java is buffered; flush writes the initial prompt to the standard output stream. However, input is buffered as well by default. The JLine library provides a ConsoleReader object whose readCharacter method lets you avoid the input buffering. Beware, however, of testing show-keystroke at the REPL:

$ lein repl
user=> (require '[keystroke.core :refer [show-keystroke]])
user=> (show-keystroke)
Enter a keystroke:
;; HANGS!

In order to connect the console’s input correctly to the REPL, use lein trampoline repl (the <r> here means the user types the letter r):

$ lein trampoline repl
user=> (require '[keystroke.core :refer [show-keystroke]])
user=> (show-keystroke)
Enter a keystroke: <r>Got 114 ('r')!
nil
user=>

lein trampoline is necessary because, by default, a Leiningen REPL actually runs the REPL and its associated console I/O in a separate JVM process from your application code. Using the trampoline option forces Leiningen to run your code in the same process as the REPL, “trampolining” control back and forth. Normally this is invisible, but it is a problem when running code that itself is attempting to use the console directly.

When running your program outside the REPL (as you typically would be, with a command-line application written in Clojure), this is not an issue.

See Also

  • If you want a richer terminal-based interface similar to what the C curses library provides, the clojure-lanterna library may be a good place to start.

4.3. Executing System Commands

Problem

You want to send a command to the underlying operating system and get its output.

Solution

Use the clj-commons-exec library to run shell commands on your local system.

To follow along, start a REPL using lein-try:

$ lein try org.clojars.hozumi/clj-commons-exec "1.0.6"

Invoking the clj-commons-exec/sh function with a command will return a promise, eventually delivering a map of the command’s output, exit status, and any errors that occurred (available via the :out, :exit, and :err keys, respectively):

(require '[clj-commons-exec :as exec])

(def p (exec/sh ["date"]))

(deref p)
;; -> {:exit 0, :out "Sun Dec  1 19:43:49 EST 2013\n", :err nil}

If your command requires options or arguments, simply append them to the command vector as strings:

@(exec/sh ["ls" "-l" "/etc/passwd"])
;; -> {:exit 0
;;     :out "-rw-r--r--  1 root  wheel  4962 May 27 07:54 /etc/passwd\n"
;;     :err nil}

@(exec/sh ["ls" "-l" "nosuchfile"])
;; -> {:exit 1
;;     :out nil
;;     :err "ls: nosuchfile: No such file or directory\n"
;;     :exception #<ExecuteException ... Process exited with an error: 1 ...)>}

Discussion

Up until this point, we’ve neglected to mention that functionality equivalent to exec/sh already exists in Clojure proper (as clojure.java.shell/sh). Now that the cat is out of the bag, it must be asked: why use a library over a built-in? Simple: clj-commons-exec is a functional veneer over the excellent Apache Commons Exec library, providing capabilities like piping not available in clojure.java.shell.

To pipe data through multiple commands, use the clj-commons-exec/sh-pipe function. Just as with regular Unix pipes, pairs of commands will have their STDOUT and STDIN streams bound to each other. The API of sh-pipe is nearly identical to that of sh, the only notable exception being that you will pass more than one command to sh-pipe. The return value of sh-pipe is a list of promises that fulfill as each subcommand completes execution:

(def results (exec/sh-pipe ["cat"] ["wc" "-w"] {:in "Hello, world!"}))

results
;; -> (#<core$promise$reify__6310@71eed8d: {:exit 0, :out nil, :err nil}>
;;     #<core$promise$reify__6310@7f7dc7a1: {:exit 0,
;;                                           :out "       2\n",
;;                                           :err nil}>)

@(last results)
;; -> {:exit 0, :out "       2\n", :err nil}

Like any reasonable shell-process library, clj-commons-exec allows you to configure the environment in which your commands execute. To control the execution environment of either sh or sh-pipe, specify options in a map as the final argument to either function. The :dir option controls the directory in which a command executes:

(println (:out @(exec/sh ["ls"] {:dir "/"})))
;; *out*
Applications
Library
# ...
usr
var

The :env and :add-env options control the environment variables available to the executing command. :add-env appends variables to the existing set of environment variables, while :env replaces the existing set with a completely new one. Each option is a map of variable names to values, like {"USER" "jeff"}:

@(exec/sh ["printenv" "HOME"])
;; -> {:exit 0, :out "/Users/ryan\n", :err nil}

@(exec/sh ["printenv" "HOME"] {:env {}})
;; -> {:exit 1, :out nil, :err nil, :exception #<ExecuteException ..)>}

@(exec/sh ["printenv" "HOME"] {:env {"HOME" "/Users/jeff"}})
;; -> {:exit 0, :out "/Users/jeff\n", :err nil}

There are a number of other options available in sh and sh-pipe:

:watchdog
The time in number of seconds to wait for a command to finish executing before terminating it
:shutdown
A flag indicating that subprocesses should be destroyed when the VM exits
:as-success and :as-successes
An integer or sequence of integers that will be considered successful exit codes, respectively
:result-handler-fn
A custom function to be used to handle results

Warning

If you initiate long-running subprocesses inside of a -main function, your application will hang until those processes complete. If this isn’t desirable, forcibly terminate your application by invoking System/exit with an exit status, for example (System/exit 0), at the end of your -main function. Additionally, set the option :shutdown to true for any subprocesses to ensure you leave your system tidy and free of rogue processes.

To check if a subprocess has returned without waiting for it to finish, invoke the realized? function on the promise returned by sh (this is especially useful for monitoring the progress of the sequence of promises returned by sh-pipe):

;; Any old long-running command
(def p (exec/sh ["sleep" "5"]))

(realized? p)
;; -> false

;; A few seconds later...
(realized? p)
;; -> true
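
Since sh is described above as returning a promise, the timeout arity of deref should offer a way to wait for a bounded time instead of polling with realized?. The sketch below illustrates the idea on a plain Clojure promise delivered from a background thread, which stands in for a subprocess; the result map and the :still-running sentinel are made up for illustration.

```clojure
;; A plain promise stands in for the one returned by sh.
(def result (promise))

;; Deliver a made-up result map after half a second on a daemon thread.
(doto (Thread. (fn []
                 (Thread/sleep 500)
                 (deliver result {:exit 0, :out "done\n", :err nil})))
  (.setDaemon true)
  (.start))

;; Wait at most 10 ms; the promise has not been delivered yet.
(deref result 10 :still-running)
;; -> :still-running

;; Wait up to 5 seconds; the value arrives well before then.
(:exit (deref result 5000 :still-running))
;; -> 0
```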

See Also

  • If you don’t need piping or clj-commons-exec’s advanced features, consider using clojure.java.shell.
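
As a point of comparison, here is a minimal sketch of the built-in alternative. It assumes a Unix-like system with echo and cat executables on the PATH. Unlike exec/sh, clojure.java.shell/sh takes the command as separate string arguments, blocks until the process exits, and returns the result map directly:

```clojure
(require '[clojure.java.shell :as shell])

(def echo-result (shell/sh "echo" "Hello, world!"))

(:exit echo-result)
;; -> 0

(:out echo-result)
;; -> "Hello, world!\n"

;; Options follow the command as keyword/value pairs; :in supplies STDIN.
(:out (shell/sh "cat" :in "piped through cat"))
;; -> "piped through cat"
```

sh reads the process’s output on background threads from Clojure’s agent pool, so a standalone script (as opposed to a REPL session) may need to call (shutdown-agents) in order to exit promptly.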

4.4. Accessing Resource Files

Problem

You want to include a resource file from the classpath in your Clojure project.

Solution

Place resource files in the resources/ directory at the top level of your Leiningen project. To follow along with this recipe, create a new project with the command lein new people.

For example, suppose you have a file resources/people.edn with the following contents:

[{:first-name "John", :last-name "McCarthy", :language "Lisp"}
 {:first-name "Guido", :last-name "Van Rossum", :language "Python"}
 {:first-name "Rich", :last-name "Hickey", :language "Clojure"}]

Pass the name of the file (relative to the resources directory) to the clojure.java.io/resource function to obtain a java.net.URL for it, which you can then read as you please (for example, using the slurp function):

(require '[clojure.java.io :as io]
         '[clojure.edn :as edn])

(->> "people.edn"
     io/resource
     slurp
     edn/read-string
     (map :language))
;; -> ("Lisp" "Python" "Clojure")
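
io/resource returns nil when nothing with the given name is on the classpath, and passing nil on to slurp fails with an exception. A minimal sketch of a guard, assuming a hypothetical helper named read-edn-resource:

```clojure
(require '[clojure.java.io :as io]
         '[clojure.edn :as edn])

(defn read-edn-resource
  "Hypothetical helper: reads the named classpath resource as EDN, or
  returns nil when no such resource exists."
  [resource-name]
  (when-let [url (io/resource resource-name)]
    (edn/read-string (slurp url))))

(read-edn-resource "no-such-file.edn")
;; -> nil
```

With the people.edn file from this recipe on the classpath, (read-edn-resource "people.edn") would be expected to return the vector of maps shown earlier.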

Discussion

Resources are commonly used to store any kind of file that is logically a part of your application, but is not code.

Resources are loaded via the Java classpath, just like Clojure code is. Leiningen puts the resources/ directory on the classpath automatically whenever it starts a Java process, and when packaged, the contents of resources/ are copied to the root of any emitted JAR files.

You can also specify alternative (or additional) resource directories using the :resource-paths key in your project.clj:

:resource-paths ["my-resources" "src/other-resources"]

Using classpath-based resources is very convenient, but it does have its drawbacks.

Be aware that in the context of a web application, any change to resources is likely to require a full redeployment, because they are included wholesale in the JAR or WAR file that will be deployed. Typically, this means it’s best to use resources only for items that really are completely static. For example, though it’s possible to place your application’s configuration files in the resources/ directory and load them from there, to do so is really to make them part of your application’s source code, which rather defeats the purpose. You may wish to keep that kind of (relatively) frequently changing resource in a known filesystem location and load it from there instead, rather than using the classpath.
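
One way to follow that advice is to read configuration from a path outside the JAR. The sketch below is only an illustration: load-config, its path argument, and the defaults map are hypothetical names, and the file is assumed to contain a single EDN map.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.edn :as edn])

(defn load-config
  "Hypothetical helper: merges the EDN map stored at path over defaults.
  Returns defaults unchanged when the file does not exist."
  [path defaults]
  (let [config-file (io/file path)]
    (if (.isFile config-file)
      (merge defaults (edn/read-string (slurp config-file)))
      defaults)))

(load-config "/no/such/config.edn" {:port 8080})
;; -> {:port 8080}
```

Because the file is read at runtime, editing it requires only a restart (or a reload hook of your own), not a rebuilt JAR.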

Also, there are sometimes additional reasons to not serve from the classpath. For example, consider static images on a website. If you place them in your web application’s classpath, then they will be served by your application server container (Jetty, Tomcat, JBoss, etc.). Typically, these applications are optimized for serving dynamic HTML resources, not larger binary blobs. Serving larger static files is often more suited to the HTTP server level of your architecture than the application server level, and should be delegated to Apache, Nginx, or whatever other HTTP server you’re using. Or, you might even want to split them off and serve them via a separate mechanism entirely, such as a content delivery network (CDN). In either case, it is difficult to set up the HTTP server or CDN to introspect resources inside of your application’s JAR file—it’s usually better to store them elsewhere, from the start.

See Also

4.5. Copying Files

Problem

You need to copy a file on your local filesystem.

Solution

Invoke clojure.java.io/copy, passing it the source and destination files:

(clojure.java.io/copy
  (clojure.java.io/file "./file-to-copy.txt")
  (clojure.java.io/file "./my-new-copy.txt"))
;; -> nil

If the input file is not found, a java.io.FileNotFoundException will be thrown:

(clojure.java.io/copy
  (clojure.java.io/file "./file-do-not-exist.txt")
  (clojure.java.io/file "./my-new-copy.txt"))
;; -> java.io.FileNotFoundException

The input argument to copy doesn’t have to be a file; it can be an InputStream, a Reader, a byte array, or a string. This makes it easier to copy the data you are working with directly to the output file:

(clojure.java.io/copy "some text" (clojure.java.io/file "./str-test.txt"))
;; -> nil

If required, an encoding can be specified by the :encoding option:

(clojure.java.io/copy "some text"
                      (clojure.java.io/file "./str-test.txt")
                      :encoding "UTF-8")

Discussion

Note that if the file already exists, it will be overwritten. If that is not what you want, you can put together a “safe” copy function that will catch any exceptions and optionally overwrite:

(defn safe-copy [source-path destination-path & opts]
  (let [source (clojure.java.io/file source-path)
        destination (clojure.java.io/file destination-path)
        options (merge {:overwrite false} (apply hash-map opts))] ; 1
    (if (and (.exists source)                                     ; 2
             (or (:overwrite options)
                 (not (.exists destination))))
      (try
        (= nil (clojure.java.io/copy source destination))         ; 3
        (catch Exception e (str "exception: " (.getMessage e))))
      false)))

(safe-copy "./file-to-copy.txt" "./my-new-copy.txt")
;; -> true
(safe-copy "./file-to-copy.txt" "./my-new-copy.txt")
;; -> false
(safe-copy "./file-to-copy.txt" "./my-new-copy.txt" :overwrite true)
;; -> true

The safe-copy function takes the source and destination file paths to copy from and to. It also takes a number of key/value pairs as options.

1

These options are then merged with the default values. In this example, there is only one option, :overwrite, but with this structure for optional arguments, you can easily add your own (such as :encoding if needed).

2

After the options have been processed, the function checks that the source file exists and that the destination either does not exist or may be overwritten. If all is OK, it will then perform the copy inside a try-catch body.

3

Note the equality check against nil for when the file is copied. With this check, the function returns true after a successful copy and false when the copy is skipped. This makes the function more convenient to use, since you can then conditionally check whether the operation succeeded or not. (If an exception is caught, the function returns the exception message as a string instead.)

You can also use clojure.java.io/copy with a java.io.Reader and a java.io.Writer, as well as with streams:

(with-open [reader (clojure.java.io/reader "file-to-copy.txt")
            writer (clojure.java.io/writer "my-new-copy.txt")]
  (clojure.java.io/copy reader writer))

The same efficiency considerations that apply to reading and writing to a file in regard to selecting input and output sources from File, Reader, Writer, or streams should be applied to copy. See Recipe 4.9, “Reading and Writing Text Files”, for more information.

By default, a buffer size of 1,024 bytes is used when calling copy. That is the amount of data that will be read from the source and written to the destination in one pass. This is done until the complete source has been copied. The buffer size used can be changed with the :buffer-size option. Keeping this number low would cause more file access operations but would keep less data in memory. On the other hand, increasing the buffer size will lower the number of file accesses but will require more data to be loaded into memory.
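
As a rough illustration of the :buffer-size option, the sketch below copies through byte streams with a caller-chosen buffer. The helper name copy-stream and the 64 KB figure are arbitrary choices for the example, not recommendations.

```clojure
(require '[clojure.java.io :as io])

(defn copy-stream
  "Hypothetical helper: copies source-path to destination-path through
  byte streams, moving up to buffer-size bytes per pass."
  [source-path destination-path buffer-size]
  (with-open [in  (io/input-stream source-path)
              out (io/output-stream destination-path)]
    (io/copy in out :buffer-size buffer-size)))

(comment
  ;; Copy in 64 KB passes instead of the default 1,024 bytes.
  (copy-stream "file-to-copy.txt" "my-new-copy.txt" (* 64 1024)))
```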

See Also

4.6. Deleting Files or Directories

Problem

You need to delete a file from your local filesystem.

Solution

Use clojure.java.io/delete-file to delete the file:

(clojure.java.io/delete-file "./file-to-delete.txt")
;; -> true

If you’re trying to delete a file that does not exist, a java.io.IOException will be thrown:

(clojure.java.io/delete-file "./file-that-does-not-exist.txt")
;; -> java.io.IOException: Couldn't delete

If you do not want delete-file to throw exceptions when the given file could not be deleted for whatever reason, pass true as the optional silently argument:

(clojure.java.io/delete-file "./file-that-does-not-exist.txt" true)
;; -> true

Discussion

For times when you want to do some custom handling of any exceptions thrown, you should put the call to delete-file inside a try-catch body:

(try
  (clojure.java.io/delete-file "./file-that-does-not-exist.txt")
  (catch Exception e (str "exception: " (.getMessage e))))
;; -> "exception: Couldn't delete ./file-that-does-not-exist.txt"

java.io.File has an exists method that simply gives you a Boolean answer as to whether a file exists or not. You can put this method together with a try-catch body to get a “safe” delete utility function. This function will first check to see if the file with the path from the argument exists before trying to delete it:

(defn safe-delete [file-path]
  (if (.exists (clojure.java.io/file file-path))
    (try
      (clojure.java.io/delete-file file-path)
      (catch Exception e (str "exception: " (.getMessage e))))
    false))

(safe-delete "./file-that-does-not-exist.txt")
;; -> false
(safe-delete "./file-to-delete.txt")
;; -> true

The clojure.java.io/delete-file function can also be used to delete directories. Directories must be empty for the deletion to be successful, so any utility function you make to delete a directory must first delete all files in the given directory:

(clojure.java.io/delete-file "./dir-to-delete")
;; -> java.io.IOException: Couldn't delete ./dir-to-delete

(defn delete-directory [directory-path]
  (let [directory-contents (file-seq (clojure.java.io/file directory-path))
        files-to-delete (filter #(.isFile %) directory-contents)]
    (doseq [file files-to-delete]
      (safe-delete (.getPath file)))
    (safe-delete directory-path)))

(delete-directory "./dir-to-delete")
;; -> true

The delete-directory function will get a file-seq with the contents of the given path. It will then filter to only get the files of that directory. The next step is to delete all the files, and then finish up by deleting the directory itself. Note the use of doseq, which runs its body eagerly for side effects. Had the deletions been written with a lazy function such as map, they would not have been performed by the time the call to delete the actual directory was made, so that call would fail.
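
delete-directory removes only regular files before the final delete, so it cannot remove a directory that contains subdirectories. A minimal sketch of a recursive variant, assuming a hypothetical helper named delete-recursively:

```clojure
(require '[clojure.java.io :as io])

(defn delete-recursively
  "Hypothetical helper: deletes path and everything beneath it. Returns
  true when nothing is left at path afterward."
  [path]
  (let [root (io/file path)]
    ;; file-seq lists a directory before its contents, so walking the
    ;; reversed sequence removes children before their parents.
    (doseq [f (reverse (file-seq root))]
      (io/delete-file f true))
    (not (.exists root))))

(comment
  (delete-recursively "./dir-to-delete"))
```

Because deletion is irreversible, and because file-seq may descend into symbolically linked directories, treat this as a sketch rather than a hardened utility.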

See Also

4.7. Listing Files in a Directory

Problem

Given a directory, you want to access the files inside.

Solution

Call the built-in file-seq function.

Note

To follow along with this recipe, create some sample files and folders using these commands (on Linux or Mac):

$ mkdir -p next-gen
$ touch next-gen/picard.jpg next-gen/locutus.bmp next-gen/data.txt

file-seq returns a lazy sequence of java.io.File objects:

(def tng-dir (file-seq (clojure.java.io/file "./next-gen")))

tng-dir
;; -> (#<File ./next-gen>
;;     #<File ./next-gen/picard.jpg>
;;     #<File ./next-gen/locutus.bmp>
;;     #<File ./next-gen/data.txt>)

Discussion

Sequences are one of Clojure’s more powerful abstractions; treating a directory hierarchy as a sequence allows you to leverage functions like map and filter to manipulate files and directories.

Consider, for example, the case where you would like to select only files in a directory hierarchy (and not directories). You can define such a function by taking a sequence of files and directories and filtering them with the isFile method of java.io.File objects:

(defn only-files
  "Filter a sequence of files/directories by the .isFile property of
  java.io.File"
  [file-s]
  (filter #(.isFile %) file-s))

(only-files tng-dir)
;; -> (#<File ./next-gen/data.txt>
;;     #<File ./next-gen/locutus.bmp>
;;     #<File ./next-gen/picard.jpg>)

What if you want to display the string names of all those files? Define a names function that maps the getName method over a sequence of files, then combine only-files and names to get a list of filenames in a directory:

(defn names
  "Return the .getName property of a sequence of files"
  [file-s]
  (map #(.getName %) file-s))

(-> tng-dir
    only-files
    names)
;; -> ("data.txt" "locutus.bmp" "picard.jpg")
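
The same pattern extends to other predicates. As a small sketch, the hypothetical with-extension function below keeps only files whose names end with a given suffix; it operates on java.io.File objects and does not touch the filesystem:

```clojure
(require '[clojure.java.io :as io])

(defn with-extension
  "Hypothetical helper: keeps the files whose names end with extension."
  [extension file-s]
  (filter #(.endsWith (.getName ^java.io.File %) ^String extension) file-s))

(->> ["next-gen/picard.jpg" "next-gen/locutus.bmp" "next-gen/data.txt"]
     (map io/file)
     (with-extension ".jpg")
     (map #(.getName ^java.io.File %)))
;; -> ("picard.jpg")
```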

See Also

  • The documentation for the File class for a complete list of properties and methods available on File objects.
  • Combine these techniques with utility libraries like Google Guava’s Files class or Apache Commons FilenameUtils class to exert even greater leverage over the file sequence abstraction.

4.8. Memory Mapping a File

Problem

You want to use memory mapping to access a large file as though it were fully loaded into memory, without actually loading the whole thing.

Solution

Use the clj-mmap library, which wraps the memory-mapping functionality provided by Java’s NIO (New I/O) library.

Before starting, add [clj-mmap "1.1.2"] to your project’s dependencies or start a REPL using lein-try:

$ lein try clj-mmap

To read the first and last N bytes of a UTF-8 encoded text file, use the get-bytes function:

(require '[clj-mmap :as mmap])
(with-open [file (mmap/get-mmap "/path/to/file/file.txt")]
  (let [n-bytes       10
        file-size     (.size file)
        first-n-bytes (mmap/get-bytes file 0 n-bytes)
        last-n-bytes  (mmap/get-bytes file (- file-size n-bytes) n-bytes)]
    [(String. first-n-bytes "UTF-8")
     (String. last-n-bytes  "UTF-8")]))

To overwrite the first N bytes of a text file, call put-bytes:

(with-open [file (mmap/get-mmap "/path/to/file/file.txt")]
  (let [bytes-to-write (.getBytes "New text goes here" "UTF-8")
        file-size      (.size file)]
    (if (> file-size
           (alength bytes-to-write))
      (mmap/put-bytes file bytes-to-write 0))))

Discussion

Memory mapping, or mmap per the POSIX standard, is a method of leveraging the operating system’s virtual memory to perform file I/O. By mapping the file into the application’s memory space, copying between buffers is reduced, and I/O performance is increased.

Memory-mapped files are especially useful when working with large files, structured binary data, or text files where Java’s String overhead may be unwelcome.

While Clojure makes it simple to work with Java’s NIO primitives directly, NIO makes working with files larger than 2 GB especially difficult. clj-mmap wraps this complexity, but it doesn’t expose all the features that NIO does. The NIO Java API is still available via interop, should it be needed.
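
For comparison, here is a minimal sketch of reaching NIO directly through interop, without clj-mmap. The helper name first-n-bytes is hypothetical, and the example only covers a single read-only mapping at the start of the file:

```clojure
(import '[java.io RandomAccessFile]
        '[java.nio.channels FileChannel$MapMode])

(defn first-n-bytes
  "Hypothetical helper: maps the start of the file at path read-only and
  decodes up to n bytes of it as UTF-8 text."
  [^String path n]
  (with-open [raf (RandomAccessFile. path "r")]
    (let [channel (.getChannel raf)
          length  (min (long n) (.size channel))
          buffer  (.map channel FileChannel$MapMode/READ_ONLY 0 length)
          bytes   (byte-array length)]
      ;; Copy the mapped region into an ordinary byte array.
      (.get buffer bytes)
      (String. bytes "UTF-8"))))

(comment
  (first-n-bytes "/path/to/file/file.txt" 10))
```

As with the clj-mmap example, cutting a UTF-8 file at an arbitrary byte offset can split a multibyte character, so the byte count should be chosen with the file’s contents in mind.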

See Also

4.9. Reading and Writing Text Files

Problem

You need to read or write a text file to the local filesystem.

Solution

Write a string to a file with the built-in spit function:

(spit "stuff.txt" "all my stuff")

Read the contents of a file with the built-in slurp function:

(slurp "stuff.txt")
;; -> "all my stuff"

If required, an encoding can be specified with the :encoding option:

(slurp "stuff.txt" :encoding "UTF-8")
;; -> "all my stuff"

Append data to an existing file using the :append true option to spit:

(spit "stuff.txt" "even more stuff" :append true)

To read a file line by line, instead of loading the entire contents into memory at once, use a java.io.Reader together with the line-seq function:

(with-open [r (clojure.java.io/reader "stuff.txt")]
  (doseq [line (line-seq r)]
    (println line)))

To write a large amount of data to a file without realizing it all as a string, use a java.io.Writer:

(with-open [w (clojure.java.io/writer "stuff.txt")]
  (doseq [line some-large-seq-of-strings]
    (.write w line)
    (.newLine w)))

Discussion

With :append true, spit adds text to the end of the file instead of replacing its contents. spit does not add a newline for you, so append "\n" to each string you write. All lines in a text file should end with a newline, including the last one:

(defn spitn
  "Append to file with newline"
  [path text]
  (spit path (str text "\n") :append true))

When used with strings, spit and slurp deal with the entire contents of a file at a time and close the file after reading or writing. If you need to read or write a lot of data, it is more efficient (in terms of both memory and time) to use a streaming API such as java.io.Reader or java.io.Writer, since they do not require realizing the contents of the file in memory.

When using writers and streams, however, it is important to flush any writes to the underlying stream in order to ensure your data is actually written and resources are cleaned up. The with-open macro flushes and closes the stream specified in its binding after executing its body.

Warning

Be especially aware that any lazy sequences based on a stream will throw an error if the underlying stream is closed before the sequence is realized. Even when using with-open, it is possible to return an unrealized lazy sequence; the with-open macro has no way of knowing that the stream is still needed and so will close it anyway, leaving a sequence that cannot be realized.

Generally, it is best to not let lazy sequences based on streams escape the scope in which the stream is open. If you do, you must be extremely careful to ensure that the resources required for the realization of a lazy sequence are still open as long as the sequence has any readers. Typically, the latter approach involves manually tracking which streams are still open rather than relying on a try/finally or with-open block.
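
The difference can be sketched with two hypothetical functions. read-lines realizes the sequence inside with-open and is safe to return; read-lines-lazily lets the lazy sequence escape:

```clojure
(require '[clojure.java.io :as io])

(defn read-lines
  "Hypothetical helper: returns every line of the file as a vector,
  fully realized before the reader is closed."
  [path]
  (with-open [r (io/reader path)]
    (vec (line-seq r))))

;; By contrast, this version returns a lazy sequence whose tail is still
;; unrealized. Walking past its first element after with-open has closed
;; the reader throws a java.io.IOException.
(defn read-lines-lazily
  [path]
  (with-open [r (io/reader path)]
    (line-seq r)))
```

read-lines trades memory for safety, since the whole file is held at once. For very large files, do the per-line work inside the with-open body instead, as in the doseq example above.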

4.10. Using Temporary Files

Problem

You want to use a temporary file on the local filesystem.

Solution

Use the static method createTempFile of Java’s built-in java.io.File class to create a temporary file in the default temporary-file directory of the JVM, with the provided prefix and suffix:

(def my-temp-file (java.io.File/createTempFile "filename" ".txt"))

You can then write to the temporary file like you would to any other instance of java.io.File:

(with-open [file (clojure.java.io/writer my-temp-file)]
  (binding [*out* file]
    (println "Example output.")))

Discussion

Temporary files are often quite useful for interacting with other programs that prefer a file-based API. Using createTempFile is important to ensure that temporary files are placed in an appropriate location on the filesystem, which can differ based on the operating system being used.

To get the full path and filename for the created temporary file:

(.getAbsolutePath my-temp-file)

You can use the File.deleteOnExit method to mark the temporary file to be deleted automatically when the JVM exits:

(.deleteOnExit my-temp-file)

Note that the file is not actually deleted until the JVM terminates and may not be deleted if the process crashes or exits abnormally. It is good practice to delete temporary files immediately when they are no longer being used:

(.delete my-temp-file)
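
One way to make that cleanup hard to forget is to wrap it in try/finally. The following is a minimal sketch; call-with-temp-file is a hypothetical helper, and the "cookbook" prefix and ".tmp" suffix are arbitrary:

```clojure
(defn call-with-temp-file
  "Hypothetical helper: creates a temporary file, passes it to f, and
  deletes the file afterward, even if f throws. Returns f's result."
  [f]
  (let [temp-file (java.io.File/createTempFile "cookbook" ".tmp")]
    (try
      (f temp-file)
      (finally
        (.delete temp-file)))))

(call-with-temp-file
  (fn [temp-file]
    (spit temp-file "Example output.")
    (slurp temp-file)))
;; -> "Example output."
```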

See Also

4.11. Reading and Writing Files at Arbitrary Positions

Problem

You want to read data from a file, or write data to it, at various locations rather than sequentially.

Solution

To open a (potentially very large) file for random access, use Java’s RandomAccessFile. seek to the location you desire, then use the various write methods to write data at that location.

For example, to make a 1 GB file filled with zeros except the integer 1,234 at the end:

(import '[java.io RandomAccessFile])

(doto (RandomAccessFile. "/tmp/longfile" "rw")
  (.seek (* 1000 1000 1000))
  (.writeInt 1234)
  (.close))

Getting the length of a “normal” Java file object shows that the file is the correct size:

(require '[clojure.java.io :refer [file]])
(.length (file "/tmp/longfile"))

;; -> 1000000004

(You can also call length on a RandomAccessFile directly.)

Reading a value back from the proper location in Clojure is quite similar to writing. Again, seek a RandomAccessFile. Then use the appropriate read method:

(let [raf (RandomAccessFile. "/tmp/longfile" "r")
      _ (.seek raf (* 1000 1000 1000))
      result (.readInt raf)]
  (.close raf)
  result)

;; -> 1234
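
RandomAccessFile implements java.io.Closeable, so with-open can take care of closing it even when a read or write throws. A minimal sketch, assuming two hypothetical helpers and a throwaway temporary file:

```clojure
(import '[java.io RandomAccessFile])

(defn write-int-at!
  "Hypothetical helper: writes the int value at byte offset position."
  [^String path position value]
  (with-open [raf (RandomAccessFile. path "rw")]
    (.seek raf position)
    (.writeInt raf value)))

(defn read-int-at
  "Hypothetical helper: reads the int stored at byte offset position."
  [^String path position]
  (with-open [raf (RandomAccessFile. path "r")]
    (.seek raf position)
    (.readInt raf)))

(let [temp-file (doto (java.io.File/createTempFile "random-access" ".bin")
                  (.deleteOnExit))
      path      (.getAbsolutePath temp-file)]
  (write-int-at! path 1000 1234)
  (read-int-at path 1000))
;; -> 1234
```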

Discussion

Files written in this way are populated by zeros by default and may be treated as “sparse files” by the JVM implementation and the underlying operating system, leading to extra efficiency in reading and writing.

Examining the file we created using the Unix od program to do a hex dump from the command line shows that the file consists of zeros with our 1234 at the end:

$ od -Ad -tx4 /tmp/longfile
0000000          00000000        00000000        00000000        00000000
*
1000000000          d2040000
1000000004

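At byte offset 1000000000, od shows the value d2040000. The integer 1,234 is 000004d2 in hexadecimal, and writeInt stores it big-endian, with the highest-order byte at the lowest address, so the file holds the bytes 00 00 04 d2. The od output here evidently comes from a little-endian machine, which assembles those four bytes into a word lowest-order byte first and so prints d2040000.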
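
The byte layout can be checked from Clojure with java.nio.ByteBuffer. In this sketch, hex-bytes is a hypothetical helper that lists the four bytes of an int from lowest address to highest in a chosen byte order:

```clojure
(import '[java.nio ByteBuffer ByteOrder])

(defn hex-bytes
  "Hypothetical helper: the four bytes of the int n in the given byte
  order, as two-digit hex strings from lowest address to highest."
  [n order]
  (let [buffer (doto (ByteBuffer/allocate 4)
                 (.order order)
                 (.putInt n))]
    (mapv #(format "%02x" (bit-and % 0xff)) (.array buffer))))

(hex-bytes 1234 ByteOrder/BIG_ENDIAN)
;; -> ["00" "00" "04" "d2"]

(hex-bytes 1234 ByteOrder/LITTLE_ENDIAN)
;; -> ["d2" "04" "00" "00"]
```

The big-endian result matches what writeInt puts in the file. A tool that reads those same four bytes as a little-endian word reports them as d2040000.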

See Also

4.12. Parallelizing File Processing

Problem

You want to transform a text file line by line, but using all cores and without loading it into memory.

Solution

For a quick win, use pmap over a sequence returned by line-seq:

(require '[clojure.java.io :as jio])

(defn pmap-file
  "Process input-file in parallel, applying processing-fn to each row
  outputting into output-file"
  [processing-fn input-file output-file]
  (with-open [rdr (jio/reader input-file)
              wtr (jio/writer output-file)]
    (let [lines (line-seq rdr)]
      (dorun
       (map #(.write wtr %)
            (pmap processing-fn lines))))))

;; Example of calling this
(def accumulator (atom 0))

(defn- example-row-fn
  "Trivial example"
  [row-string]
  (str row-string "," (swap! accumulator inc) "\n"))

;; Call it
(pmap-file example-row-fn "input.txt" "output.txt")

Discussion

The key functions used in this example (beyond basic Clojure constructs like map or dorun) are line-seq and pmap.

line-seq, given an instance of java.io.BufferedReader (which clojure.java.io/reader returns), will return a lazy sequence of strings. Each string is a line in the input file. Lines are split by BufferedReader’s readLine method, which treats a line feed, a carriage return, or a carriage return followed immediately by a line feed as the end of a line, so files with Windows or Unix line endings are both handled regardless of the platform you run on. The returned strings do not include the line terminator.

pmap functions identically to map and applies a function to each item in a sequence, returning a lazy sequence of return values. The difference is that as it applies the mapping function, it does so in a separate thread for each item in the collection (up to a certain fixed number of threads related to the number of CPUs on your system). Threads realizing the sequence will block if the values are not ready yet.

pmap can yield substantial performance improvements by distributing work across multiple CPU cores and performing it concurrently, but it isn’t a magic bullet. Specifically, it incurs a certain amount of coordination overhead to schedule the multithreaded operations. Typically, it gives the most benefit when performing very heavyweight operations, where the mapping function is so computationally expensive that it makes the coordination overhead worth it. For simple functions that complete very quickly (such as basic operations on primitives), the coordination overhead is likely to be much larger than any performance gains, and pmap will actually be much slower than map in that case.
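
One common way to reduce the coordination overhead for cheap functions is to hand each parallel task a batch of items rather than a single item. The sketch below is one possible approach, not something the recipe itself uses; pmap-batched and its batch-size parameter are hypothetical names, and a suitable batch size has to be found by measurement:

```clojure
;; pmap-batched is a hypothetical helper, shown only as a sketch.
(defn pmap-batched
  "Like pmap, but applies f to batches of batch-size items per parallel task."
  [f batch-size coll]
  (->> (partition-all batch-size coll)
       (pmap #(mapv f %))   ; each task eagerly processes one whole batch
       (apply concat)))     ; flatten the batches back into one sequence

(pmap-batched inc 1000 (range 10))
;; -> (1 2 3 4 5 6 7 8 9 10)
```

The output order still matches the input order, because both partition-all and pmap preserve it.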

The idea is to use pmap to map over the sequence of file rows in parallel. However, you then need to pass each processed row through (map #(.write wtr %) ...) in order to ensure the rows are written one at a time (put the write in the processing function to see what happens otherwise). Finally, as these are lazy sequences, you need to realize their side effects before exiting the with-open block or the file will be closed by the time you wish to evaluate them. This is accomplished by calling dorun.

There are a few caveats here. First, although the row ordering of the output file will match that of the input, the execution order is not guaranteed. Second, the process will become I/O-bound quite quickly as all the writes happen on one thread, so you may not get the speedup you expect unless the processing function is substantial. Finally, pmap is not perfectly efficient at allocating work, so the speedup you see might not scale with the number of processors on your system.

Another drawback to the pmap approach is that the actual reading of the file is serialized, using a single java.io.Reader. Considerable gains can still be realized if the processing task is expensive compared to reading, but in lightweight tasks the bottleneck is likely to be reading the file itself, in which case parallelizing the processing work will give little to no gains in terms of total runtime (or even make it worse).

4.13. Parallelizing File Processing with Reducers

Problem

You want to use Clojure’s reducers on a file to realize parallel processing without loading the file into memory.

Solution

Use the Iota library in conjunction with the filter, map, and fold functions from the Clojure Reducers library in the clojure.core.reducers namespace. To follow along with this recipe, add [iota "1.1.1"] to your project’s dependencies, or start a REPL with lein-try:

$ lein try iota

To count the words in a very large file, for example:

(require '[iota                  :as iota]
         '[clojure.core.reducers :as r]
         '[clojure.string        :as str])


;; Word-counting functions
(defn count-map
  "Returns a map of words to occurrence count in the given string"
  [s]
  (reduce (fn [m w] (update-in m [w] (fnil inc 0)))
          {}
          (str/split s #" ")))

(defn add-maps
  "Returns a map in which each value is the sum of the vals for that key in m1 and m2."
  ([] {}) ;; Necessary base case for use as combiner in fold
  ([m1 m2]
     (reduce (fn [m [k v]] (update-in m [k] (fnil (partial + v) 0))) m1 m2)))


;; Main file processing
(defn keyword-count
  "Returns a map of the word counts"
  [filename]
  (->> (iota/seq filename)
       (r/filter identity)
       (r/map count-map)
       (r/fold add-maps)))

Discussion

The Iota library creates sequences from files on the local filesystem. Unlike the purely sequential lazy sequences produced from something like line-seq, the sequences returned by Iota are optimized for use with Clojure’s Reducers library, which uses the Java Fork/Join work-stealing framework[11] under the hood to provide efficient parallel processing.

The keyword-count function first creates a reducible sequence of lines in the file and filters out blank lines (using the identity function to eliminate nil values from the sequence). Then it applies the count-map function in parallel, and finally aggregates the results by folding with the add-maps function.

r/filter and r/map take the same arguments as their clojure.core counterparts; the difference lies in performance, and in how the Reducers library is able to break down and combine operations. Rather than lazy sequences, they return reducible collections that can be utilized efficiently by other operations from the Reducers library.

r/fold is the core function of the Reducers library, and in its basic form it is functionally very similar to the built-in reduce function. Given a function and a reducible collection, it returns a value that is the result of applying the folding function to each item in the collection and an accumulator value.

Unlike with normal reduce, however, there is no guaranteed execution order, which is why fold doesn’t take a single starting value as an argument. It wouldn’t make sense, given that the computation can “start” in several places at once, concurrently. This means that the function passed to fold (when passed a single function) must also be capable of taking zero arguments—the result of the no-arg invocation of the provided function will be used as the seed value for each branch of the computation.

If you need more flexibility than this provides, fold allows you to specify both a reduce function and a combine function, as separate arguments. Exactly what these do is inextricably tied to how Reducers themselves work, so a full explanation is beyond the scope of this recipe. See the API documentation for the fold function and the links on the Reducers page on Clojure’s website for more information.
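
As a minimal sketch of the two-function form, the following counts words held in an in-memory vector rather than in a file. The word-frequencies name is hypothetical; r/monoid is a helper from clojure.core.reducers that builds a combining function whose zero-argument call supplies the seed value:

```clojure
(require '[clojure.core.reducers :as r])

;; word-frequencies is a hypothetical example function.
(defn word-frequencies
  "Returns a map of each word in words to the number of times it occurs."
  [words]
  (r/fold
   ;; combine function: merges two partial count maps; (hash-map) is the seed
   (r/monoid (partial merge-with +) hash-map)
   ;; reduce function: adds a single word to a partial count map
   (fn [counts word] (update-in counts [word] (fnil inc 0)))
   words))

(= (word-frequencies ["a" "b" "a" "c" "b" "a"])
   {"a" 3, "b" 2, "c" 1})
;; -> true
```

The reduce function handles one item at a time within a partition, while the combine function merges the results of two partitions.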

About Reducers

Reducers is a framework for efficient parallel processing of collections. A full explanation of how reducers work is beyond the scope of this recipe (see the blog post introducing reducers on the Clojure website for a comprehensive treatment).

In short, however, reducers provide performance by two means:

  1. They can compose operations. Wherever logically possible, the reducers framework will collapse composable operations into a single operation. For example, the preceding code performs a filter and then a map. Clojure’s standard filter and map would realize an intermediate sequence: filter would produce a sequence that would then be fed to map. The reducer versions, however, can compose themselves (if possible) to produce a single map+filter operation that can be applied in one shot.
  2. They exploit the internal tree-like data structures of the data being reduced. Regular sequences are inherently sequential (no surprise), and because their performant operation is to pull items from the beginning one at a time, it’s difficult to efficiently distribute work across their members. However, Reducers is aware of the internal structure of Clojure’s persistent data structures and can leverage that to efficiently distribute worker processes across the data.
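
The two points above can be seen in a small sketch that performs the same filter-then-map computation with lazy sequences and with reducers. The numbers vector is a hypothetical stand-in for real data:

```clojure
(require '[clojure.core.reducers :as r])

;; numbers is hypothetical example data; a vector can be split for folding.
(def numbers (vec (range 1000)))

;; Sequence version: filter builds an intermediate lazy sequence for map
(reduce + (map inc (filter even? numbers)))
;; -> 250000

;; Reducers version: r/filter and r/map only describe the transformation;
;; no intermediate collection is built, and r/fold may process the vector
;; in parallel partitions
(r/fold + (r/map inc (r/filter even? numbers)))
;; -> 250000
```

Note that + works as the only argument to fold because calling it with no arguments returns 0, a valid seed value.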

Under the hood, Iota uses the Java NIO libraries to provide a memory-mapped view of the file being processed that provides efficient random access. Iota is also aware of the Reducers framework, and Iota sequences are structured in such a way that Reducers can effectively distribute worker processes across them.

4.14. Reading and Writing Clojure Data

Problem

You need to store and retrieve Clojure data structures on disk.

Solution

Use pr-str and spit to serialize small amounts of data:

(spit "data.clj" (pr-str [:a :b :c]))

Use read-string and slurp to read small amounts of data:

(read-string (slurp "data.clj"))
;; -> [:a :b :c]

Use pr to efficiently write large data structures to a stream:

(with-open [w (clojure.java.io/writer "data.clj")]
  (binding [*out* w]
    (pr large-data-structure)))

Use read to efficiently read large data structures from a stream:

(with-open [r (java.io.PushbackReader. (clojure.java.io/reader "data.clj"))]
  (binding [*read-eval* false]
    (read r)))

Discussion

The fact that code is data in Clojure and that you have runtime access to the same reader the language uses to load source code from files makes this a relatively simple task. However, while this is often a good way to persist data to disk, you should be aware of a few issues.

The simple case of slurp and spit becomes unusable when the data is very large, because it creates a very large string in memory all at once. For instance, serializing one million random numbers (created with rand) results in an 18 MB file and consumes much more memory than that while reading or writing:

(spit "data.clj" (pr-str (repeatedly 1e6 rand)))
;; -> OutOfMemoryError Java heap space ...

But, if you know you are only dealing with a small amount of data, this approach is perfectly suitable. It is a good way to load configuration data and other types of simple structures.

Reading and writing from streams is far more efficient because it buffers input and output, dealing with data a few bytes at a time.[12]

In addition to reading and writing a single data structure in a file, you can also append additional data structures to the same file and read them back as a sequence later:

(spit "data.clj" (prn-str [1 2 3]))
(spit "data.clj" (prn-str [:a :b :c]) :append true)
;; data.clj now contains two serialized structures

This is useful for appending small amounts of data to a file over time, such as for an event or transaction log.

However, read-string will not suffice for reading multiple objects from a single string. To read a series of objects from a stream, you must continue to call read until it has reached the end:

(defn- read-one
  [r]
  (try
    (read r)
    (catch java.lang.RuntimeException e
      (if (= "EOF while reading" (.getMessage e))
        ::EOF
        (throw e)))))

(defn read-seq-from-file
  "Reads a sequence of top-level objects in file at path."
  [path]
  (with-open [r (java.io.PushbackReader. (clojure.java.io/reader path))]
    (binding [*read-eval* false]
      (doall (take-while #(not= ::EOF %) (repeatedly #(read-one r)))))))
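
As an alternative sketch, read also accepts eof-error? and eof-value arguments, which avoids matching on the text of an exception message. The read-all-forms name is hypothetical, and the temporary file exists only to make the example self-contained:

```clojure
(require '[clojure.java.io :as jio])

;; read-all-forms is a hypothetical variant of read-seq-from-file.
(defn read-all-forms
  "Reads every top-level form from the file at path into a vector."
  [path]
  (with-open [r (java.io.PushbackReader. (jio/reader path))]
    (binding [*read-eval* false]
      ;; A fresh Object is a sentinel that can never appear in the data
      (let [eof (Object.)]
        (vec (take-while #(not (identical? eof %))
                         ;; eof-error? false: return eof instead of throwing
                         (repeatedly #(read r false eof))))))))

(let [f (java.io.File/createTempFile "forms" ".clj")]
  (spit f (prn-str [1 2 3]))
  (spit f (prn-str [:a :b :c]) :append true)
  (read-all-forms f))
;; -> [[1 2 3] [:a :b :c]]
```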

4.15. Using edn for Configuration Files

Problem

You want to configure your application using Clojure-like data literals.

Solution

Use Clojure data structures stored in edn files to define a map that contains configuration items you care about.

For example, the edn configuration of an application that needs to know its own hostname and connection info for a relational database might look something like this:

{:hostname "localhost"
 :database {:host "my.db.server"
            :port 5432
            :name "my-app"
            :user "root"
            :password "s00p3rs3cr3t"}}

The basic function to read this data into a Clojure map is trivial using the edn reader:

(require '[clojure.edn :as edn])

(defn load-config
  "Given a filename, load & return a config file"
  [filename]
  (edn/read-string (slurp filename)))

Invoking the newly defined load-config function will now return a configuration map that you can pass around and use in your application as you would any other map.

Discussion

As can be seen from the preceding code, obtaining a map containing configuration data is trivial. A more interesting question is what to do with the config map once you have it, and there are two general schools of thought regarding the answer.

The first option prioritizes ease of development by making the configuration map ambiently available throughout the entire application. Usually this involves setting a global var to contain the configuration.

However, this is problematic for a number of reasons. First, it becomes more difficult to override the default configuration file in alternate contexts, such as tests, or when running two differently configured systems in the same JVM. (This can be worked around by using thread-local bindings, but this can lead to messy code fairly rapidly.)

More importantly, using a global configuration means that any function that reads the config (most functions, in a sizable application) cannot be pure. In Clojure, that is a lot to give up. One of the main benefits of pure Clojure code is its local transparency; the behavior of a function can be determined solely by looking at its arguments and its code. If every function reads a global variable, however, this becomes much more difficult.

The alternative is to explicitly pass around the config everywhere it is needed, like you would every other argument. Since a config file is usually supplied at application start, the config is usually established in the -main function and passed wherever else it is needed.

This sounds painful, and indeed it can be somewhat annoying to pass an extra argument to every function. Doing so, however, lends the code a large degree of self-documentation; it becomes extremely evident what parts of the application rely on the config and what parts do not. It also makes it more straightforward to modify the config at runtime or supply an alternative config in testing scenarios.
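
As a minimal sketch of the explicit style, a function that needs configuration simply takes the config map as an argument. The database-url function and the URL format it produces are hypothetical and serve only to illustrate the idea:

```clojure
;; database-url is a hypothetical function; the URL format is illustrative.
(defn database-url
  "Builds a JDBC-style URL from the :database section of config."
  [config]
  (let [{:keys [host port] db-name :name} (:database config)]
    (str "jdbc:postgresql://" host ":" port "/" db-name)))

;; In a test, an alternative config is just another argument
(database-url {:database {:host "localhost" :port 5432 :name "test-db"}})
;; -> "jdbc:postgresql://localhost:5432/test-db"
```

Because the function depends only on its argument, it stays pure and can be exercised without touching any global state.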

Using multiple config files

A common pattern when configuring an application is to have a number of different classes of configuration items. Some config fields are more or less constants, and don’t vary between instances of the application in the same environment. These are often committed to source control along with the application’s source code.

Other config items are fairly constant, but can’t be checked into source control due to security concerns. Examples of this include database passwords or secure API tokens, and ideally these are put into a separate config file. Still other configuration fields (such as IP addresses) will often be completely different for every instance of a deployed application, and the desire is to specify those separately from the more constant config fields.

A useful technique to handle this heterogeneity is to use multiple configuration files, each handling a different type of concern, and then merge them into a single configuration map before passing it on to the application. This typically uses a simple deep-merge function:

(defn deep-merge
  "Deep merge maps; when values conflict and are not all maps, the last wins"
  [& values]
  (if (every? map? values)
    (apply merge-with deep-merge values)
    (last values)))

This will merge any number of maps, recursively merging the values under a shared key when they are all maps. If the values are not all maps, the last one “wins” and is used in the resulting map.
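
The difference from the built-in merge is easiest to see side by side. In this sketch, deep-merge is repeated so the snippet runs on its own, and the defaults and overrides maps are hypothetical example data:

```clojure
;; deep-merge is repeated from the recipe so this snippet is self-contained
(defn deep-merge
  [& values]
  (if (every? map? values)
    (apply merge-with deep-merge values)
    (last values)))

;; Hypothetical example data
(def defaults  {:database {:host "localhost" :port 5432}})
(def overrides {:database {:host "my.db.server"}})

;; Plain merge replaces the whole nested map, losing :port
(merge defaults overrides)
;; -> {:database {:host "my.db.server"}}

;; deep-merge keeps :port and overrides only :host
(deep-merge defaults overrides)
;; -> {:database {:host "my.db.server", :port 5432}}
```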

Then, you can rewrite the config loader to accept multiple config files, and merge them together:

(defn load-config
  [& filenames]
  (reduce deep-merge (map (comp edn/read-string slurp)
                          filenames)))

Using this approach on two separate edn config files, config-public.edn and config-private.edn, yields a merged map.

config-public.edn: 

{:hostname "localhost"
 :database {:host "my.db.server"
            :port 5432
            :name "my-app"
            :user "root"}}

config-private.edn: 

{:database {:password "s3cr3t"}}

(load-config "config-public.edn" "config-private.edn")
;; -> {:hostname "localhost", :database {:password "s3cr3t",
;;     :host "my.db.server", :port 5432, :name "my-app", :user "root"}}

Be aware that any values present in both configuration files will be overridden by the “rightmost” file passed to load-config.

Different configurations for different environments

If your system runs in multiple environments, you may want to vary your configuration based on the current running environment. For example, you may want to connect to a local database while developing your system, but a production database when running your system in production.

You can use Leiningen’s profiles feature to achieve this end. By providing different :resource-paths options for each profile in your project’s configuration, you can vary which configuration file is read per environment:[13]

(defproject my-great-app "0.1.0-SNAPSHOT"
  ;; ...
  :profiles {:dev {:resource-paths ["resources/dev"]}
             :prod {:resource-paths ["resources/prod"]}})

With a project configuration similar to the previous one, you can then create two different configurations with the same base filename, resources/dev/config.edn and resources/prod/config.edn:

resources/dev/config.edn: 

{:database-host "localhost"}

resources/prod/config.edn: 

{:database-host "production.example.com"}

If you’re following along on your own, add the load-config function to one of your project’s namespaces:

(ns my-great-app.core
  (:require [clojure.edn :as edn]))

(defn load-config
  "Given a filename, load & return a config file"
  [filename]
  (edn/read-string (slurp filename)))

Now, the configuration your application loads will depend on which profile your project is running in:

# "dev" is one of Leiningen's default profiles
$ lein repl
user=> (require '[my-great-app.core :refer [load-config]])
user=> (load-config (clojure.java.io/resource "config.edn"))
{:database-host "localhost"}
user=> (exit)

$ lein trampoline with-profile prod repl
user=> (require '[my-great-app.core :refer [load-config]])
user=> (load-config (clojure.java.io/resource "config.edn"))
{:database-host "production.example.com"}

4.16. Emitting Records as edn Values

Problem

You want to use Clojure records as edn values, but the edn format doesn’t support records.

Solution

You can use the tagged library to read and print records as edn tagged literal values.

Before starting, add [com.velisco/tagged "0.3.0"] to your project’s dependencies or start a REPL using lein-try:

$ lein try com.velisco/tagged

To extend Clojure’s built-in print-method multimethod to print a record in a “tagged” format, extend print-method for that record with the miner.tagged/pr-tagged-record-on helper function:

(require '[miner.tagged :as tag])

(defrecord SimpleRecord [a])

(def forty-two (->SimpleRecord 42))

(pr-str forty-two)
;; -> "#user.SimpleRecord{:a 42}" ;; Sadly, not a proper edn value

(defmethod print-method user.SimpleRecord [this w]
  (tag/pr-tagged-record-on this w))

(pr-str forty-two)
;; -> "#user/SimpleRecord {:a 42}"

At this point, you can round-trip your records between pr-str and miner.tagged/read-string using the edn tagged literal format:

(tag/read-string (pr-str forty-two))
;; -> #user/SimpleRecord {:a 42}

(= forty-two
   (tag/read-string (pr-str forty-two)))
;; -> true

The edn reader still doesn’t understand how to parse these tagged values, though. To enable this behavior, use miner.tagged/tagged-default-reader as the :default option when reading values with edn:

(require '[clojure.edn :as edn])

(edn/read-string {:default tag/tagged-default-reader}
                 (pr-str {:my-record forty-two}))
;; -> {:my-record #user/SimpleRecord {:a 42}}

Discussion

The edn format is great—it covers a useful subset of the Clojure data types and makes high-fidelity data transfer a breeze. Unfortunately, it doesn’t support records. This is easy enough to rectify, however; edn is an extensible format by name. We just need to provide tag-style printing (#tag <value>) and an appropriate reader. The tagged library makes both of these tasks quite easy.

As seen in the preceding samples, Clojure’s default printed value for records is close to, but not quite, the tagged format edn expects.

Where Clojure prints "#user.SimpleRecord{:a 42}" for a SimpleRecord, what is really needed for edn is a tag-style string like "#user/SimpleRecord {:a 42}". The miner.tagged/pr-tagged-record-on function understands how to write records in this format (to a java.io.Writer). By extending Clojure’s print-method multimethod with this function, you ensure Clojure always prints a record in a tagged format.

For reading these values back in, you need to tell the edn reader how to parse your new record tags. By design, the tagged library provides a miner.tagged/tagged-default-reader function that can be used to extend edn to read your record tags. When the edn reader can’t parse a tag, it attempts to use a function specified by its :default option to rehydrate tags. By providing tagged-default-reader as this :default option, you allow the edn reader to properly interpret your tagged record values.
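
For comparison, here is a minimal sketch of reading a tagged record without the tagged library, by registering a reader function by hand through the edn reader’s :readers option. This is not how the tagged library works internally; the record-readers map is a hypothetical name, and each record type would need its own entry:

```clojure
(require '[clojure.edn :as edn])

(defrecord SimpleRecord [a])

;; record-readers is a hypothetical, hand-maintained map from tag to factory.
;; map->SimpleRecord is the map factory function generated by defrecord.
(def record-readers
  {'user/SimpleRecord map->SimpleRecord})

(= (->SimpleRecord 42)
   (edn/read-string {:readers record-readers}
                    "#user/SimpleRecord {:a 42}"))
;; -> true
```

The appeal of a :default reader is that it removes the need to maintain such a map for every record type.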

4.17. Handling Unknown Tagged Literals When Reading Clojure Data

Problem

You want to read Clojure data (in an edn format) that may contain unknown tagged literals.

Solution

Use the :default option of either clojure.edn/read or clojure.edn/read-string:

(require 'clojure.edn)

(defrecord TaggedValue [tag value])

(defn read-preserving-unknown-tags [s]
  (clojure.edn/read-string {:default ->TaggedValue} s))

(read-preserving-unknown-tags "#my.example/unknown 42")
;; -> #user.TaggedValue{:tag my.example/unknown, :value 42}

Discussion

The edn format defines a print representation for a significant subset of Clojure data types and offers extensibility through tagged literals. The best way to read edn data is to use clojure.edn/read or clojure.edn/read-string. These functions consume edn-formatted data from a stream or string, respectively, and return hydrated Clojure data.

Both functions take an opts map, which allows you to control several options when reading. For tags you know about ahead of time, you can define custom readers by supplying a :readers map. This map can also be used to override the behavior of built-in types as defined by clojure.core/default-data-readers:

;; Creating a custom reader
(clojure.edn/read-string {:readers {'inc-this inc}}
                         "#inc-this 1")
;; -> 2

;; Overriding a built-in reader
;; Before..
(clojure.edn/read-string "#inst \"2013-06-08T01:00:00Z\"")
;; -> #inst "2013-06-08T01:00:00.000-00:00"

;; And after...
(clojure.edn/read-string {:readers {'inst str}}
                         "#inst \"2013-06-08T01:00:00Z\"")
;; -> "2013-06-08T01:00:00Z"

The :default option, as explored in the solution, is ideal for handling unknown tags. Whenever an unknown tag and value are encountered, the function you provide will be called with two arguments, the tag and its value.
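
To make the calling convention concrete, this sketch supplies a :default function that simply reports what the reader passed in. The describe-unknown name is hypothetical:

```clojure
(require '[clojure.edn :as edn])

;; describe-unknown is a hypothetical :default handler for illustration.
(defn describe-unknown
  "Returns a map describing the tag and value handed over by the edn reader."
  [tag value]
  {:tag tag :tag-type (type tag) :value value})

(edn/read-string {:default describe-unknown} "#my.example/unknown 42")
;; -> {:tag my.example/unknown, :tag-type clojure.lang.Symbol, :value 42}
```

The tag arrives as a symbol, and the value has already been read into Clojure data by the time the function is called.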

When a :default is not provided to read, reading an unknown tag will throw a RuntimeException:

(clojure.edn/read-string "#blow-up boom")
;; -> RuntimeException No reader function for tag blow-up ...

For most applications, reading an unknown tag is an error, so an exception would be appropriate. However, it may sometimes be useful to preserve the “unknowns,” perhaps for another stage of processing.

It’s trivial to leverage the factory function defined by defrecord to capture the unknown reader literal. The order of the arguments for the factory of TaggedValue conveniently matches the specification of the :default data reader.

The TaggedValue record preserves the essential information for later use. Since all of the inbound information has been preserved, you can even print the value again in the original tagged literal format:

(defmethod print-method TaggedValue [this ^java.io.Writer w]
  (.write w "#")
  (print-method (:tag this) w)
  (.write w " ")
  (print-method (:value this) w))

;; Now, the TaggedValue will `pr` as the original tagged literal
(read-preserving-unknown-tags "#my.example/unknown 42")
;; -> #my.example/unknown 42

4.18. Reading Properties from a File

Problem

You need to read a property file and access its key/value pairs.

Solution

The most straightforward way is to use the built-in java.util.Properties class via Java interop. java.util.Properties implements java.util.Map, which can be easily consumed from Clojure, just like any other map.

Here is an example property file to load, fruitcolors.properties:

banana=yellow
grannysmith=green

Populating an instance of Properties from a file is straightforward, using its load method and passing in an instance of java.io.Reader obtained using the clojure.java.io namespace:

(require '[clojure.java.io :refer (reader)])

(def props (java.util.Properties.))

;; with-open closes the reader once the properties are loaded
(with-open [rdr (reader "fruitcolors.properties")]
  (.load props rdr))
;; -> nil

props
;; -> {"banana" "yellow", "grannysmith" "green"}

Instead of using the built-in Properties API via interop, you could also use the propertea library for simpler, more idiomatic Clojure access to property files.

Include the [propertea "1.2.3"] dependency in your project.clj file, or start a REPL using lein-try:

$ lein try propertea 1.2.3

Then read the property file and access its key/value pairs:

(require '[propertea.core :refer (read-properties)])

(def props (read-properties "fruitcolors.properties"))

props
;; -> {:grannysmith "green", :banana "yellow"}

(props :banana)
;; -> "yellow"

Discussion

Although using java.util.Properties directly is more straightforward and doesn’t require the addition of a dependency, propertea does provide some convenience. It returns an actual immutable Clojure map, instead of just a java.util.Map. Although both are perfectly usable from Clojure, an immutable map is probably preferable if you intend to do any further manipulation or updates on it.

More importantly, propertea converts all string keys into keywords, which are more commonly used than strings as the keys of maps in Clojure.
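
If you would rather not add a dependency, a similar conversion can be sketched in a few lines of plain Clojure. This is only an approximation of the idea and makes no claim about how propertea is implemented; properties->map is a hypothetical name, and the StringReader stands in for a property file:

```clojure
;; properties->map is a hypothetical helper, not part of propertea.
(defn properties->map
  "Copies props into an immutable Clojure map with keyword keys."
  [^java.util.Properties props]
  (into {} (for [[k v] props] [(keyword k) v])))

;; Load the example properties from a string instead of a file
(def fruit-props
  (doto (java.util.Properties.)
    (.load (java.io.StringReader. "banana=yellow\ngrannysmith=green\n"))))

(= (properties->map fruit-props)
   {:banana "yellow", :grannysmith "green"})
;; -> true
```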

Additionally, propertea has several other features, such as the capability to parse values into numbers or Booleans, and providing default values.

By default, propertea’s read-properties function treats all property values as strings. Consider the following property file with an integer value and a Boolean value:

intkey=42
booleankey=true

You can force these properties to be parsed into their respective types by supplying vectors of keys for the :parse-int and :parse-boolean options:

(def props (read-properties "other.properties"
                            :parse-int [:intkey]
                            :parse-boolean [:booleankey]))

(props :intkey)
;; -> 42

(class (props :intkey))
;; -> java.lang.Integer

(props :booleankey)
;; -> true

(class (props :booleankey))
;; -> java.lang.Boolean

Sometimes the property file might not contain a key/value pair, and you might want to set a reasonable default value in this case:

(def props (read-properties "other.properties" :default [:otherkey "awesome"]))

(props :otherkey)
;; -> "awesome"

You can also be strict on required properties. If an expected property is missing in your property file, you can throw an exception:

(def props (read-properties "other.properties" :required [:otherkey]))
;; -> java.lang.RuntimeException: (:otherkey) are required ...

4.19. Reading and Writing Binary Files

Problem

You need to read or write some binary data.

Solution

Use Java’s BufferedInputStream, BufferedOutputStream, and ByteBuffer classes to work directly with binary data.

Discussion

While reading and writing text files (e.g., via slurp and spit) is easy in pure Clojure, writing binary data requires a little more Java interop.

The output-stream function in clojure.java.io returns a java.io.BufferedOutputStream, whose write method accepts Java byte arrays. The following writes 1,000 zeros (bytes) to /tmp/zeros:

(require '[clojure.java.io :refer [file output-stream input-stream]])

(with-open [out (output-stream (file "/tmp/zeros"))]
  (.write out (byte-array 1000)))

To read the bytes in again, use the corresponding input-stream function, which returns a BufferedInputStream:

(with-open [in (input-stream (file "/tmp/zeros"))]
  (let [buf (byte-array 1000)
        n (.read in buf)]
    (println "Read" n "bytes.")))

;;=> Read 1000 bytes.

Writing zeros and reading in fixed-length blocks is obviously not very interesting. We want to prepare our byte array with some actual content. A common way to prepare byte arrays is to use a ByteBuffer, filling it with data from various types. Let’s assume we want to write “strings” in the following format:

  1. A version number (byte; 66 in our example)
  2. A string length (big-endian int)
  3. The bytes for the string (in this case, “hello world”)

The following function will “pack” the bytes into an array using an intermediate ByteBuffer:

(import '[java.nio ByteBuffer])

(defn prepare-string [^String strdata]
  (let [strbytes (.getBytes strdata "UTF-8") ;; Encode once, explicitly
        strlen (alength strbytes)            ;; Length in bytes, not chars
        version 66
        buflen (+ 1 4 strlen)
        bb (ByteBuffer/allocate buflen)
        buf (byte-array buflen)]
    (doto bb
      (.put (byte version))
      (.putInt strlen)
      (.put strbytes)
      (.flip)         ;; Prepare bb for reading
      (.get buf))
    buf))

(prepare-string "hello world")
;;=> #<byte[] [B@5ccab0e8>
(into [] (prepare-string "hello world"))
;;=> [66 0 0 0 11 104 101 108 108 111 32 119 111 114 108 100]

Writing data in this format is then as simple as:

(with-open [out (output-stream "/tmp/mystring")]
  (.write out (prepare-string "hello world")))

To get the data back, ByteBuffer provides a way of unpacking multiple types out of a stream (array) of bytes:

(defn unpack-buf [n buf]
  (let [bb (ByteBuffer/allocate n)]
    (.put bb buf 0 n)                     ;; Fill ByteBuffer with array contents
    (.flip bb)                            ;; Prepare for reading
    (let [version (.get bb 0)]
      (.position bb 1)                    ;; Skip version byte
      (let [buflen (.getInt bb)
            strbytes (byte-array buflen)] ;; Prepare buffer to hold string
                                          ;; data...
        (.get bb strbytes)                ;; ... and read it.
        [version buflen (String. strbytes "UTF-8")]))))


(with-open [in (input-stream "/tmp/mystring")]
  (let [buf (byte-array 1024)
        n (.read in buf)]
    (unpack-buf n buf)))

;;=> [66 11 "hello world"]

Note that in both functions, the flip operation sets the ByteBuffer’s limit to the current position and resets the position to zero, so that the bytes just put into the buffer can be read back out from the beginning.
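
The effect of flip can be observed directly by inspecting a buffer’s position and limit. This is a small sketch; the 8-byte capacity is arbitrary:

```clojure
(import '[java.nio ByteBuffer])

(let [bb (ByteBuffer/allocate 8)]
  (.putInt bb 1234)                          ; writes 4 bytes; position is now 4
  (let [before [(.position bb) (.limit bb)]]
    (.flip bb)                               ; limit <- position, position <- 0
    [before
     [(.position bb) (.limit bb)]
     (.getInt bb)]))                         ; reads back from the beginning
;; -> [[4 8] [0 4] 1234]
```

Before the flip, the limit equals the capacity; afterward, it marks the end of the data that was written.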

See Also

  • For more details on ByteBuffer, which plays a key role in Java’s NIO library, see the Java NIO documentation or Java NIO by Ron Hitchens (O’Reilly).
  • The Clojure library bytebuffer provides a thin, more idiomatic wrapper for ByteBuffer operations.
  • The more recent Buffy library provides a wrapper over the related Netty ByteBuffers.
  • Finally, the Gloss library provides a DSL for reading and writing binary streams of data (whether file-based or network-based).

4.20. Reading and Writing CSV Data

Problem

You need to read or write CSV data.

Solution

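Use clojure.data.csv/read-csv to lazily read CSV data from a String or java.io.Reader. The clojure.data.csv namespace ships separately from Clojure itself, in the org.clojure/data.csv library, so that library must be among your project’s dependencies:

(require 'clojure.data.csv 'clojure.java.io)

(clojure.data.csv/read-csv "this,is\na,test")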
;; -> (["this" "is"] ["a" "test"])

(with-open [in-file (clojure.java.io/reader "in-file.csv")]
  (doall
    (clojure.data.csv/read-csv in-file)))
;; -> (["this" "is"] ["a" "test"])

Use clojure.data.csv/write-csv to write CSV data to a java.io.Writer:

(with-open [out-file (clojure.java.io/writer "out.csv")]
  (clojure.data.csv/write-csv out-file [["this" "is"] ["a" "test"]]))
;; -> nil

Discussion

The clojure.data.csv library makes it easy to work with CSV. You need to remember that read-csv is lazy; if you want to force it to read data immediately, you’ll need to wrap the call to read-csv in doall.

When reading, you can change the separator and quote delimiters, which default to \, and \", respectively. You must specify the delimiters using chars, not strings, though:

(clojure.data.csv/read-csv "this$-is $-\na$test" :separator \$ :quote \-)
;; -> (["this" "is $"] ["a" "test"])

When writing, you can configure the separator and quote characters just as with read-csv, as well as the newline style (:lf, the default, or :cr+lf) and the quote? predicate, a function that takes a field’s string value and returns true or false to indicate whether it needs to be quoted:

(with-open [out-file (clojure.java.io/writer "out.csv")]
  (clojure.data.csv/write-csv out-file [["this" "is"] ["a" "test"]]
                              :separator \$ :quote \-))
;; -> nil

To capture CSV output as a string, use with-out-str and write to *out*:

(with-out-str (clojure.data.csv/write-csv *out* [["this" "is"] ["a" "test"]]))
;; -> "this,is\na,test\n"

4.21. Reading and Writing Compressed Files

Problem

You want to read or write a file compressed with gzip (i.e., a .gz file).

Solution

Wrap a normal input stream with java.util.zip.GZIPInputStream to get uncompressed data:

(with-open [in (java.util.zip.GZIPInputStream.
                (clojure.java.io/input-stream
                 "file.txt.gz"))]
  (slurp in))

Wrap a normal output stream with java.util.zip.GZIPOutputStream to compress data as it is written:

(with-open [w (-> "output.gz"
                  clojure.java.io/output-stream
                  java.util.zip.GZIPOutputStream.
                  clojure.java.io/writer)]
  (binding [*out* w]
    (println "This will be compressed on disk.")))

Discussion

gzip, based on the DEFLATE algorithm, is a common compression format on Unix-like systems and is used extensively for compression on the Web. It is a good choice for compressing text in particular and can greatly reduce the size of source code, Clojure data, or JSON data.

Many of Clojure’s I/O functions will accept any type of Java stream. The GZIPInputStream simply wraps any other input stream and attempts to decompress the original stream. The output variant behaves similarly.

By wrapping a normal input stream, as returned by clojure.java.io/input-stream, you can pass the result to slurp (or any other function that takes an input stream) and easily read the entire decompressed contents. Functions that expect a reader, such as line-seq, need the stream wrapped with clojure.java.io/reader first.

You can also leverage this technique to read a large compressed file line by line, or to read back Clojure forms written with pr or pr-str. You can also decompress data in a similar way from any other kind of stream; for example, one backed by a network socket or a byte array.

By binding *out* to a writer that wraps the compressing output stream, we can use println, pr, etc. to output small amounts of data at a time. The data is compressed as it is written, and the final compressed bytes are flushed to disk when the writer is closed.
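
The wrapping technique is not tied to files. As a sketch, the following round-trips text through an in-memory byte array and reads it back line by line. gzip-bytes and gunzip-lines are hypothetical helper names chosen for this example:

```clojure
(require '[clojure.java.io :as io])
(import '(java.io ByteArrayInputStream ByteArrayOutputStream)
        '(java.util.zip GZIPInputStream GZIPOutputStream))

(defn gzip-bytes
  "Compress the string s into a gzip-format byte array."
  [^String s]
  (let [baos (ByteArrayOutputStream.)]
    ;; Closing the writer also finishes and closes the gzip stream.
    (with-open [w (io/writer (GZIPOutputStream. baos))]
      (.write w s))
    (.toByteArray baos)))

(defn gunzip-lines
  "Decompress a gzip byte array and return its lines as a vector."
  [^bytes bs]
  (with-open [r (io/reader (GZIPInputStream. (ByteArrayInputStream. bs)))]
    ;; vec realizes the lazy line-seq before the reader is closed.
    (vec (line-seq r))))

(gunzip-lines (gzip-bytes "first\nsecond\n"))
;; -> ["first" "second"]
```

Swapping the byte-array streams for a socket's streams, or for clojure.java.io/input-stream on a file, leaves the rest of the code unchanged.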

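A similar approach can be used for reading and writing data in the ZIP format, using the java.util.zip.ZipInputStream and java.util.zip.ZipOutputStream classes. Because a ZIP archive can hold several files, you must also step through its entries (with getNextEntry when reading and putNextEntry when writing).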
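
As a rough sketch of the ZIP variant, the following writes a single-entry archive to a byte array and reads it back. zip-bytes and unzip-first-entry are hypothetical helpers, and a real archive may of course contain many entries:

```clojure
(import '(java.io ByteArrayInputStream ByteArrayOutputStream)
        '(java.util.zip ZipEntry ZipInputStream ZipOutputStream))

(defn zip-bytes
  "Build a ZIP archive (as a byte array) holding one entry."
  [^String entry-name ^String text]
  (let [baos (ByteArrayOutputStream.)]
    (with-open [zos (ZipOutputStream. baos)]
      (.putNextEntry zos (ZipEntry. entry-name)) ; start a new entry
      (.write zos (.getBytes text "UTF-8"))      ; write its contents
      (.closeEntry zos))
    (.toByteArray baos)))

(defn unzip-first-entry
  "Return [entry-name contents] for the first entry in a ZIP byte array."
  [^bytes bs]
  (with-open [zis (ZipInputStream. (ByteArrayInputStream. bs))]
    (let [entry (.getNextEntry zis)]
      ;; Reading stops at the end of the current entry.
      [(.getName entry) (slurp zis)])))

(unzip-first-entry (zip-bytes "hello.txt" "Hello, ZIP!"))
;; -> ["hello.txt" "Hello, ZIP!"]
```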

4.22. Working with XML Data

Problem

You need to read or write XML data.

Solution

Pass a file to clojure.xml/parse to get a Clojure map representing the structure of an XML file.

For example, to read the following file:

<simple>
  <item id="1">First</item>
  <item id="2">Second</item>
</simple>

use clojure.xml/parse:

(require '[clojure.xml :as xml])
(clojure.xml/parse (clojure.java.io/file "simple.xml"))
;; -> {:tag :simple, :attrs nil, :content [
;;    {:tag :item, :attrs {:id "1"}, :content ["First"]}
;;    {:tag :item, :attrs {:id "2"}, :content ["Second"]}]}

If you want to read an XML file as a sequence of nodes, pass the XML map to the xml-seq function from the clojure.core namespace:

(xml-seq (xml/parse (clojure.java.io/file "simple.xml")))

xml-seq returns a tree sequence of nodes; that is, a sequence of each node, starting at the root and then doing a depth-first walk of the rest of the document.
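
To make the depth-first order concrete, here is a small sketch that parses the same document from an in-memory string (an assumption made so that the example needs no file on disk) and lists each node in the order xml-seq visits it. Element nodes are shown by tag, and text nodes are shown as-is:

```clojure
(require '[clojure.xml :as xml])

(def simple-doc
  (xml/parse
   (java.io.ByteArrayInputStream.
    (.getBytes
     "<simple><item id=\"1\">First</item><item id=\"2\">Second</item></simple>"
     "UTF-8"))))

(def walk-order
  ;; Strings are text nodes; everything else is an element map.
  (map #(if (string? %) % (:tag %)) (xml-seq simple-doc)))

walk-order
;; -> (:simple :item "First" :item "Second")
```

Each item element is followed immediately by its own text content before the walk moves on to the next sibling.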

To write an XML file, pass an XML structure map to clojure.xml/emit. emit prints the XML to the currently bound output writer (*out*), so to write to a file, either bind *out* to a writer for the file or capture the output in a string with the with-out-str macro, which you can then spit to a file (here, simple-xml-map is assumed to hold a map like the one returned by parse above):

(spit "test.xml" (with-out-str (clojure.xml/emit simple-xml-map)))

Discussion

You can work with your XML data just as you would with any other map. Here is an example of a function that, given an id and a file, will parse the file for nodes with an attribute id that is equal to the argument:

(defn get-with-id [id xml-file]
  (for [node (xml-seq (clojure.xml/parse xml-file))
        :when (= (get-in node [:attrs :id]) id)]
    (:content node)))

(get-with-id "2" (clojure.java.io/file "simple.xml"))
;; -> (["Second"])

To modify XML, just use the normal map manipulation functions on the Clojure data representation.
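
For instance, the following sketch replaces the text of one item using nothing but update-in, mapv, and assoc. set-item-text is a hypothetical helper, and simple-xml-map is written out by hand here to match the parsed structure shown earlier:

```clojure
(def simple-xml-map
  {:tag :simple, :attrs nil,
   :content [{:tag :item, :attrs {:id "1"}, :content ["First"]}
             {:tag :item, :attrs {:id "2"}, :content ["Second"]}]})

(defn set-item-text
  "Return doc with the content of the child whose id attribute
  equals id replaced by new-text."
  [doc id new-text]
  (update-in doc [:content]
             (fn [items]
               (mapv (fn [item]
                       (if (= id (get-in item [:attrs :id]))
                         (assoc item :content [new-text])
                         item))
                     items))))

(def updated (set-item-text simple-xml-map "2" "2nd"))

(get-in updated [:content 1 :content])
;; -> ["2nd"]
```

The original map is untouched, so both versions of the document remain available.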

If you are going to work a lot with your XML structure, you might consider using a zipper. A zipper is a purely functional data structure useful for navigating and modifying tree-like structures (such as XML) in a convenient and efficient way.

Zippers are a deep topic, and a full discussion is beyond the scope of this recipe, but see the documentation for the clojure.data.zip library for explanation and examples of how to use them effectively with XML.
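
As a brief taste, the sketch below uses only clojure.zip, which ships with Clojure, rather than the clojure.data.zip library. It navigates to the second item, edits it, and rebuilds the document. The hand-written simple-xml-map mirrors the parsed structure from the Solution:

```clojure
(require '[clojure.zip :as zip])

(def simple-xml-map
  {:tag :simple, :attrs nil,
   :content [{:tag :item, :attrs {:id "1"}, :content ["First"]}
             {:tag :item, :attrs {:id "2"}, :content ["Second"]}]})

(def edited
  (-> (zip/xml-zip simple-xml-map)
      zip/down                           ; move to the first item
      zip/right                          ; move to the second item
      (zip/edit assoc :content ["2nd"])  ; replace its content
      zip/root))                         ; rebuild the whole document

(get-in edited [:content 1 :content])
;; -> ["2nd"]
```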

4.23. Reading and Writing JSON Data

Problem

You need to read or write JSON data.

Solution

Use the clojure.data.json/read-str function to read a string of JSON as Clojure data:

(require '[clojure.data.json :as json])

(json/read-str "[{\"name\":\"Stefan\",\"age\":32}]")
;; -> [{"name" "Stefan", "age" 32}]

To write data back to JSON, use the clojure.data.json/write-str function with the original Clojure data:

(json/write-str [{"name" "Stefan", "age" 32}])
;; -> "[{\"name\":\"Stefan\",\"age\":32}]"

Discussion

Beyond reading and writing strings, clojure.data.json also provides the read and write functions to work with java.io.Reader and java.io.Writer objects, respectively. With the exception of their reader/writer parameters, these two functions share the same parameters and options as their string brethren:

(with-open [writer (clojure.java.io/writer "foo.json")]
  (json/write [{:foo "bar"}] writer))

(with-open [reader (clojure.java.io/reader "foo.json")]
  (json/read reader))
;; -> [{"foo" "bar"}]

By virtue of JavaScript’s simpler types, JSON notation has a much lower fidelity than Clojure data. As such, you may find you want to tweak the way keys or values are interpreted.

One common example of this is converting JSON’s string-only keys to proper Clojure keywords. You can apply a function to each processed key by using the :key-fn option:

;; Modifying keys on read

(json/read-str "{\"name\": \"Stefan\"}")
;; -> {"name" "Stefan"}

(json/read-str "{\"name\": \"Stefan\"}" :key-fn keyword)
;; -> {:name "Stefan"}

;; Modifying keys on write

(json/write-str {:name "Stefan"})
;; -> "{\"name\":\"Stefan\"}"

(json/write-str {:name "Stefan"} :key-fn str)
;; -> "{\":name\":\"Stefan\"}" ; Note the extra \:

You may also want to control how values are interpreted. Use the :value-fn option to specify how values are read/written. The function you provide will be invoked with two arguments, a key and its value:

;; Properly read UUID values
(defn str->uuid [key value]
  (if (= key :uuid)
    (java.util.UUID/fromString value)
    value))

(clojure.data.json/read-str
  "{\"name\": \"Stefan\", \"uuid\": \"51674ca0-eadc-4a5b-b9fb-67b05d5a71b7\"}"
  :key-fn keyword
  :value-fn str->uuid)
;; -> {:name "Stefan", :uuid #uuid "51674ca0-eadc-4a5b-b9fb-67b05d5a71b7"}

;; And similarly, write UUID values
(defn uuid->str [key value]
  (if (= key :uuid)
    (str value)
    value))

(clojure.data.json/write-str
  {:name "Stefan", :uuid #uuid "51674ca0-eadc-4a5b-b9fb-67b05d5a71b7"}
  :value-fn uuid->str)
;; -> "{\"name\":\"Stefan\",\"uuid\":\"51674ca0-eadc-4a5b-b9fb-67b05d5a71b7\"}"

As you may have inferred, when you provide both a :key-fn and a :value-fn, the value function will always be called after the key function.

It might go without saying, but the :key-fn and :value-fn options can also be used with the write and read functions.
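
A short sketch of that, using in-memory StringReader and StringWriter objects in place of files (an assumption made to keep the example self-contained). The names parsed and written are only for illustration:

```clojure
(require '[clojure.data.json :as json])

;; Read from any java.io.Reader, keywordizing keys along the way.
(def parsed
  (json/read (java.io.StringReader. "{\"name\": \"Stefan\"}")
             :key-fn keyword))

parsed
;; -> {:name "Stefan"}

;; Write to any java.io.Writer, upper-casing keys on the way out.
(def written
  (let [w (java.io.StringWriter.)]
    (json/write {:name "Stefan"} w
                :key-fn #(.toUpperCase (name %)))
    (str w)))

written
;; -> "{\"NAME\":\"Stefan\"}"
```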

4.24. Generating PDF Files

Problem

You need to generate a PDF from some data.

For example, you have a sequence of maps, such as those returned by a clojure.java.jdbc query, and you need to generate a PDF report.

Solution

Use the clj-pdf library to create the report.

Before starting, add [clj-pdf "1.11.6"] to your project’s dependencies or start a REPL using lein-try:

$ lein try clj-pdf

For the purpose of illustration, imagine we want to render a vector containing the following employee records:

(def employees
 [{:country "Germany",
   :place "Nuremberg",
   :occupation "Engineer",
   :name "Neil Chetty"}
  {:country "Germany",
   :place "Ulm",
   :occupation "Engineer",
   :name "Vera Ellison"}])

Create a template for rendering each record using the clj-pdf.core/template macro:

(require '[clj-pdf.core :as pdf])

(def employee-template
 (pdf/template
   [:paragraph
    [:heading (.toUpperCase $name)]
    [:chunk {:style :bold} "occupation: "] $occupation "\n"
    [:chunk {:style :bold} "place: "] $place "\n"
    [:chunk {:style :bold} "country: "] $country
    [:spacer]]))

(employee-template employees)
;; -> ([:paragraph [:heading "NEIL CHETTY"]
;;      [:chunk {:style :bold} "occupation: "] "Engineer" "\n"
;;      [:chunk {:style :bold} "place: "] "Nuremberg" "\n"
;;      [:chunk {:style :bold} "country: "] "Germany" [:spacer]]
;;     [:paragraph [:heading "VERA ELLISON"]
;;      [:chunk {:style :bold} "occupation: "] "Engineer" "\n"
;;      [:chunk {:style :bold} "place: "] "Ulm" "\n"
;;      [:chunk {:style :bold} "country: "] "Germany"
;;      [:spacer]])

Use clj-pdf.core/pdf to create the PDF using the template and data from above:

(pdf/pdf [{:title "Employee Table"}
          (employee-template employees)]
         "employees.pdf")

You’ll find an employees.pdf file in the directory where you ran your project/REPL—it looks something like Figure 4-1.

Figure 4-1. employees.pdf

Discussion

The clj-pdf library is built on top of the iText and JFreeChart libraries. The templating syntax is inspired by the popular Hiccup HTML templating engine.

In a template, $ is used to indicate places where dynamic content will be substituted. When populating a template from a map, each substitution anchor ($name) is populated with the value of the corresponding keyword key in the map (the value of the :name key).
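
One way to picture the substitution is as an ordinary function from a record to a content vector. The sketch below is a hand-written equivalent of employee-template in plain Clojure. It is not how clj-pdf implements templates, and employee->paragraph is a hypothetical name:

```clojure
(def employees
  [{:country "Germany", :place "Nuremberg",
    :occupation "Engineer", :name "Neil Chetty"}])

(defn employee->paragraph
  "Build the same content vector the template would produce for one
  record. Each $anchor becomes a lookup of the matching keyword key."
  [{emp-name :name, :keys [occupation place country]}]
  [:paragraph
   [:heading (.toUpperCase ^String emp-name)]
   [:chunk {:style :bold} "occupation: "] occupation "\n"
   [:chunk {:style :bold} "place: "] place "\n"
   [:chunk {:style :bold} "country: "] country
   [:spacer]])

(map employee->paragraph employees)
;; -> ([:paragraph [:heading "NEIL CHETTY"]
;;      [:chunk {:style :bold} "occupation: "] "Engineer" "\n"
;;      [:chunk {:style :bold} "place: "] "Nuremberg" "\n"
;;      [:chunk {:style :bold} "country: "] "Germany"
;;      [:spacer]])
```

The template macro saves you from writing the destructuring and the map call yourself.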

Beyond substituting simple values, it is also possible to perform further processing on those values. The :heading portion of the employee-template does precisely this by calling (.toUpperCase $name).

In clj-pdf, a document is represented by a vector containing a map of metadata followed by the content. The content can in turn consist of strings, vectors, or collections of vectors.

Here is a very simple PDF:

(pdf/pdf [{:title "Hello World"} "Hello, World."] "hello-world.pdf")

Under the hood, collections of content are automatically expanded:

;; This *collection* of paragraphs...
(pdf/pdf [{} [[:paragraph "foo"] [:paragraph "bar"]]] "document.pdf")

;; is equivalent to these *individual* paragraphs
(pdf/pdf [{} [:paragraph "foo"] [:paragraph "bar"]] "document.pdf")

Apart from plain strings, each content element is represented as a vector. The first element of this vector is a keyword type, and everything that follows is the content itself. Some types clj-pdf includes are :paragraph, :phrase, :list, and :table:

[:heading "Lorem Ipsum"]
[:line]
[:list "first item"
      "second item"
      "third item"]
[:paragraph "I'm a paragraph"]
[:phrase "some text here"]
[:table
   ["foo" "bar" "baz"]
   ["foo1" "bar1" "baz1"]
   ["foo2" "bar2" "baz2"]]

Some elements accept optional styling metadata. You can provide this style information as a map immediately following the type parameter (the second item in the vector):

[:paragraph {:style :bold} "this text is bold"]

[:chunk {:style :bold
         :size 18
         :family :helvetica
         :color [0 234 123]}
 "some large green text"]

The contents of an element can consist of other elements (like an HTML document), and any style applied to a parent element will be inherited by the child elements:

[:paragraph "some content"]

[:paragraph {:style :bold}
 "Some bold text"
 [:phrase [:chunk "even more"] "bold text"]]

As with Cascading Style Sheets (CSS), child elements can augment or override their parents’ styles by specifying their own styles:

[:paragraph
 {:style :bold}
 "Bold words"
 [:phrase {:color [0 255 221]} "Bold AND teal!"]]

Images can be embedded in the document using the :image element. Image content can be one of java.net.URL, java.awt.Image, a byte array, a Base64 string, or a string representing a URL or a file:

[:image "my-image.jpg"]
[:image "http://clojure.org/space/showimage/clojure-icon.gif"]

Images larger than the page margins will automatically be scaled to fit.

See Also

  • For more information on using clj-pdf, including a complete list of element types and charting capabilities, see the clj-pdf GitHub repository.

4.25. Making a GUI Window with Scrollable Text

Problem

You want to create and display a GUI window.

Solution

Though Java’s Swing library is the most common way to make Java GUIs (at least on the desktop), the Seesaw library, which wraps Swing and provides a more idiomatic and functional interface, is the best tool for creating GUIs with Clojure.

To follow along with this recipe, start a REPL using lein-try:

$ lein try seesaw

Swing implements a “programmable look and feel”: the appearance of various widgets and their behavior can be modified, though it is common to set this to match the platform one is on, for the sake of maximum usability. Setting the native look and feel is accomplished in Seesaw with the native! function:

(require '[seesaw.core :refer [native! frame show! config!
                               pack! text scrollable]])

(native!)
;; -> nil

To create your window object, use frame (which, under the covers, makes a JFrame Swing object):

(frame :title "Lyrical Clojure" :content "Hello World")
;; -> #<JFrame$Tag$a79ba523 seesaw.core.proxy$javax.swing.JFrame$Tag$a79ba523
;;    [frame0,0,22,0x0,invalid,hidden,layout=java.awt.BorderLayout,
;;    title=Lyrical Clojure,resizable,normal,
;;    defaultCloseOperation=HIDE_ON_CLOSE,
;;    rootPane=javax.swing.JRootPane[,0,0,0x0,invalid,
;;    layout=javax.swing.JRootPane$RootLayout,
;;    alignmentX=0.0,alignmentY=0.0,border=,flags=16777673,maximumSize=,
;;    minimumSize=,preferredSize=],rootPaneCheckingEnabled=true]>

Although a frame has been created, nothing appears. In order to actually display the frame (as seen in Figure 4-2), use show!:

(def f (frame :title "Lyrical Clojure"))

(show! f)
;; -> #<JFrame$Tag$a79ba523 [...]>

Figure 4-2. A simple window

Discussion

Having created the window, you can set its size, add content, and add scroll bars, as follows.

Adding content

You can change properties of the frame using config!:

(config! f :content "Actual content!")
;; -> #<JFrame$Tag$a79ba523 [...]>

The result is shown in Figure 4-3.

Figure 4-3. A window with basic content

Sizing the window

You can specify the size of the window at the time of creation:

(def f (frame :title "Lyrical Clojure" :width 300 :height 150))
;; -> #'user/f

However, it is common to instead call pack! on the resulting frame object; this assigns width and height properties according to its content:

(-> f pack! show!)
;; -> #<JFrame$Tag$a79ba523 [...]>

Adding scrollable content

Now add some text, in the form of an excerpt from the sonnets of Shakespeare, to your window:

(def sonnet-text (->> "http://www.gutenberg.org/cache/epub/1041/pg1041.txt"
                      slurp
                      (drop 20000)
                      (take 4000)
                      (apply str)))

This content is too big to fit in the current window (see Figure 4-4):

(config! f :content sonnet-text)
;; -> #<JFrame$Tag$a79ba523 [...]>

Figure 4-4. A window with more text than space

Normally, one would call pack! again to adjust the window size to the new content. However, the content will not fit comfortably on most screens, so set the size explicitly and add scroll bars, as seen in Figure 4-5:

(.setSize f 400 400)
(config! f :content (scrollable (text :multi-line? true
                                      :text sonnet-text
                                      :editable? false)))

Figure 4-5. A larger window with a scroll bar

The :multi-line? option to the text function selects JTextArea as the underlying object, rather than JTextField (JTextArea is used for multiline text; JTextField is for single-line text fields). :editable? specifies that you don’t want to allow users to edit the text (since it is, perhaps, doubtful that they would improve upon Shakespeare’s original).

Like most Seesaw functions that create widgets, text accepts several more options, which are best learned by studying the API documentation.

As is always the case in Clojure, the Seesaw library functions return Java objects, which can be operated upon directly using Java methods; for example, our use of the .setSize method of the JFrame object returned by frame. This interoperability provides great power but comes at the cost of a somewhat higher burden on programmers, who must navigate not only the Seesaw API but, frequently, some aspects of the underlying Swing API as well.

Seesaw supports a wide variety of GUI tasks—creation of menus, display of text and images, scroll bars, radio buttons, checkboxes, multipaned windows, drag-and-drop, and much more. In addition to the dozen or so books that have been written about Swing, one could easily write an entire book on Seesaw. This recipe merely serves as a starting point for further investigation of the Seesaw library.
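
The steps in this recipe can be gathered into a single function. The following is only a sketch: text-window is a hypothetical helper, and it sets the size with the :width and :height options shown earlier instead of calling .setSize. Showing the window requires a graphical display, so that part is left as a comment:

```clojure
(require '[seesaw.core :refer [native! frame show! text scrollable]])

(defn text-window
  "Hypothetical helper: build (but do not show) a fixed-size frame
  that displays s in a read-only, scrollable text area."
  [title s]
  (frame :title title
         :width 400
         :height 400
         :content (scrollable (text :multi-line? true
                                    :text s
                                    :editable? false))))

;; At the REPL, on a machine with a display:
;; (native!)
;; (show! (text-window "Lyrical Clojure" sonnet-text))
```

Keeping construction separate from show! makes it possible to adjust the frame further with config! before it appears.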

[11] For more information, see the Java tutorial on Fork/Join and work stealing.

[12] See Recipe 4.9, “Reading and Writing Text Files”, for notes on managing streams.

[13] To follow along, create your own project with lein new my-great-app.

[14] This is actually a feature—they’re functions used by the language to, well, execute code.

[15] The Clojure mailing list thread “ANN: NEVER use clojure.core/read or read-string for reading untrusted data” talks more about the vulnerabilities with clojure.core readers.
