Chapter 4. Maps

In this chapter, you will work with maps (not to be confused with the map function, though you can use map on a map). Also, the études are designed to run on the server side with Node.js®, so you may want to see how to set that up in Appendix D.

Étude 4-1: Condiments

If you spend some time going through open datasets such as those from data.gov, you will find some fairly, shall we say, esoteric data. Among them is MyPyramid Food Raw Data from the Food and Nutrition Service of the United States Department of Agriculture.

One of the files is Foods_Needing_Condiments_Table.xml, which gives a list of foods and condiments that go with them. Here is what part of the file looks like, indented and edited to eliminate unnecessary elements, and placed in a file named test.xml:

<Foods_Needing_Condiments_Table>
  <Foods_Needing_Condiments_Row>
    <Survey_Food_Code>51208000</Survey_Food_Code>
    <display_name>100% Whole Wheat Bagel</display_name>
    <cond_1_name>Butter</cond_1_name>
    <cond_2_name>Tub margarine</cond_2_name>
    <cond_3_name>Reduced calorie spread (margarine type)</cond_3_name>
    <cond_4_name>Cream cheese (regular)</cond_4_name>
    <cond_5_name>Low fat cream cheese</cond_5_name>
  </Foods_Needing_Condiments_Row>
  <Foods_Needing_Condiments_Row>
    <Survey_Food_Code>58100100</Survey_Food_Code>
    <display_name>Beef burrito (no beans):</display_name>
    <cond_1_name>Sour cream</cond_1_name>
    <cond_2_name>Guacamole</cond_2_name>
    <cond_3_name>Salsa</cond_3_name>
  </Foods_Needing_Condiments_Row>
  <Foods_Needing_Condiments_Row>
    <Survey_Food_Code>58104740</Survey_Food_Code>
    <display_name>Chicken &amp; cheese quesadilla:</display_name>
    <cond_1_name>Sour cream</cond_1_name>
    <cond_2_name>Guacamole</cond_2_name>
    <cond_3_name>Salsa</cond_3_name>
  </Foods_Needing_Condiments_Row>
</Foods_Needing_Condiments_Table>

Your task, in this étude, is to take this XML file and build a ClojureScript map whose keys are the condiments and whose values are vectors of foods that go with those condiments. Thus, for the sample file, if you run the program from the command line, the output would be this map (formatted and quotemarked for ease of reading):

[etudes@localhost nodetest]$ node condiments.js test.xml
{"Butter" ["100% Whole Wheat Bagel"],
"Tub margarine" ["100% Whole Wheat Bagel"],
"Reduced calorie spread (margarine type)" ["100% Whole Wheat Bagel"],
"Cream cheese (regular)" ["100% Whole Wheat Bagel"],
"Low fat cream cheese" ["100% Whole Wheat Bagel"],
"Sour cream" ["Beef burrito (no beans):" "Chicken & cheese quesadilla:"],
"Guacamole" ["Beef burrito (no beans):" "Chicken & cheese quesadilla:"],
"Salsa" ["Beef burrito (no beans):" "Chicken & cheese quesadilla:"]}

Parsing XML

How do you parse XML using Node.js? Install the node-xml-lite module:

[etudes@localhost ~]$ npm install node-xml-lite
npm http GET https://registry.npmjs.org/node-xml-lite
npm http 304 https://registry.npmjs.org/node-xml-lite
npm http GET https://registry.npmjs.org/iconv-lite
npm http 304 https://registry.npmjs.org/iconv-lite
node-xml-lite@0.0.3 node_modules/node-xml-lite
└── iconv-lite@0.4.8

Bring the XML parsing module into your core.cljs file:

(def xml (js/require "node-xml-lite"))

The following code will parse an XML file and return a JavaScript object:

(.parseFileSync xml "test.xml")

And here is the object that it produces, written in ClojureScript map notation for readability:

  {:name "Foods_Needing_Condiments_Table", :childs [
    {:name "Foods_Needing_Condiments_Row", :childs [
      {:name "Survey_Food_Code", :childs ["51208000"]}
      {:name "display_name", :childs ["100% Whole Wheat Bagel"]}
      {:name "cond_1_name", :childs ["Butter"]}
      {:name "cond_2_name", :childs ["Tub margarine"]}
      {:name "cond_3_name", :childs ["Reduced calorie spread (margarine type)"]}
      {:name "cond_4_name", :childs ["Cream cheese (regular)"]}
      {:name "cond_5_name", :childs ["Low fat cream cheese"]}
    ]}
    {:name "Foods_Needing_Condiments_Row", :childs [
      {:name "Survey_Food_Code", :childs ["58100100"]}
      {:name "display_name", :childs ["Beef burrito (no beans):"]}
      {:name "cond_1_name", :childs ["Sour cream"]}
      {:name "cond_2_name", :childs ["Guacamole"]}
      {:name "cond_3_name", :childs ["Salsa"]}
    ]}
    {:name "Foods_Needing_Condiments_Row", :childs [
      {:name "Survey_Food_Code", :childs ["58104740"]}
      {:name "display_name", :childs ["Chicken & cheese quesadilla:"]}
      {:name "cond_1_name", :childs ["Sour cream"]}
      {:name "cond_2_name", :childs ["Guacamole"]}
      {:name "cond_3_name", :childs ["Salsa"]}
    ]}
  ]}
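
To work with this structure through keyword access, a sketch like the following may help. It assumes the parser's result has already been converted to a ClojureScript map with keyword keys (for example with js->clj and :keywordize-keys true). The helper names child-text and row->food are made up for illustration.

```clojure
;; One row, in the same shape as the structure shown above.
(def sample-row
  {:name "Foods_Needing_Condiments_Row"
   :childs [{:name "Survey_Food_Code" :childs ["58100100"]}
            {:name "display_name" :childs ["Beef burrito (no beans):"]}
            {:name "cond_1_name" :childs ["Sour cream"]}
            {:name "cond_2_name" :childs ["Guacamole"]}
            {:name "cond_3_name" :childs ["Salsa"]}]})

(defn child-text
  "Returns the first text item of an element such as display_name."
  [element]
  (first (:childs element)))

(defn row->food
  "Returns [food-name [condiment ...]] for one row element."
  [row]
  (let [children (:childs row)
        ;; builds a predicate that matches elements by tag name
        named? (fn [pattern] (fn [el] (re-find pattern (:name el))))]
    [(child-text (first (filter (named? #"^display_name$") children)))
     (mapv child-text (filter (named? #"^cond_\d+_name$") children))]))
```

Matching the condiment tags with a regular expression means the sketch does not need to know how many cond_N_name elements a row contains.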

Command-line Arguments

While you can hardcode the XML file name into your program, it makes the program less flexible. It would be much nicer if (as in the description of the étude) you could specify the file name to process on the command line.

To get command-line arguments, use the argv property of the global js/process variable. Element 0 is "node", element 1 is the name of the JavaScript file, and element 2 is where your command-line arguments begin. Thus, you can get the file name with:

(nth (.-argv js/process) 2)
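
If you would like a fallback when no file name is supplied, the three-argument form of nth accepts a default. This is only a sketch: the function name input-file and the default of test.xml are assumptions, and the argument list is passed in as a parameter so the function can be tried in a REPL without Node.js.

```clojure
(defn input-file
  "Returns the first user-supplied argument, or a default file name."
  [argv]
  (nth argv 2 "test.xml"))

;; In the Node.js program you could call it like this:
;; (input-file (.-argv js/process))
```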

Mutually Recursive Functions

In my solution, I created two separate functions: the process-children function iterates through all the childs, calling the process-child function for each of them. However, a child element could itself have children, so process-child had to be able to call process-children. The term for this sort of situation is that you have mutually recursive functions. Here’s the problem: ClojureScript requires you to define a function before you can use it, so you would think that you can’t have mutually recursive functions. Luckily, the inventor of Clojure foresaw this sort of situation and created the declare form, which lets you declare a symbol that you will define later. Thus, I was able to write code like this:

(declare process-child)
  
(defn process-children [...]
   (process-child ...))

(defn process-child [...]
   (process-children ...))
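
Here is a small, complete illustration of the same pattern. It is not the étude's solution; the function names collect-children and collect-child are invented for this sketch, which gathers every text string from a nested structure shaped like the parser output.

```clojure
;; Tell the compiler that collect-child will be defined later.
(declare collect-child)

(defn collect-children
  "Collects the text from every item in a :childs vector."
  [childs]
  (mapcat collect-child childs))

(defn collect-child
  "A string is text; anything else is an element with its own children."
  [child]
  (if (string? child)
    [child]
    (collect-children (:childs child))))
```

Calling (collect-children [{:name "a" :childs ["x" {:name "b" :childs ["y"]}]} "z"]) walks both levels of nesting and yields the three strings in document order.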

Just because I used mutually recursive functions to solve the problem doesn’t mean you have to. If you can find a way to do it with a single recursive function, go for it. I was following the philosophy of “the first way you think of doing it that works is the right way.”

There’s a lot of explanation in this étude, and you are probably thinking this is going to be a huge program. It sure seemed that way to me while I was writing it, but it turned out that was mostly because I was doing lots of tests in the REPL and looking things up in documentation. When I looked at the resulting program, it was only 45 lines. Here it is: “Solution 4-1”.

Étude 4-2: Condiment Server

Now that you have the map from the previous étude, what can you do with it? Well, how many times have you been staring at that jar of mustard and asking yourself “What food would go well with this?” This étude will cure that indecision once and for all. You will write a server using Express, which, as the website says, is a “minimalist web framework for Node.js.” This article about using ClojureScript and Express was very helpful when I was first learning about the subject; I strongly suggest you read it.

Let’s set up a simple server that you can use as a basis for this étude. The server presents a form with an input field for the user’s name. When the user clicks the submit button, the data is submitted back to the server and it echoes back the form and a message: “Pleased to meet you, username.”

Setting Up Express

You will need to do the following:

  1. Add [express "4.11.1"] to the :node-dependencies in your project.clj file.
  2. Add [cljs.nodejs :as nodejs] to the (:require...) clause of the namespace declaration at the beginning of core.cljs.
  3. Add (def express (nodejs/require "express")) to your core.cljs file.
  4. Make your main function look like this:

    (defn -main []
      (let [app (express)]
        (.get app "/" generate-page!)
        (.listen app 3000
                 (fn []
                   (println "Server started on port 3000")))))

    This starts a server on port 3000, and when it receives a get request, calls the generate-page! function. (You can also set up the server to accept post requests and route them to other URLs than the server root, but that is beyond the scope of this book.)

Generating HTML from ClojureScript

To generate the HTML dynamically, you will use the html function of the hiccups library. The function takes as its argument a vector that has a keyword as an element name, an optional map of attributes and values, and the element content. Here are some examples:

HTML:   <h1>Heading</h1>
Hiccup: (html [:h1 "Heading"])

HTML:   <p id="intro">test</p>
Hiccup: (html [:p {:id "intro"} "test"])

HTML:   <p>Click to <a href="page2.html">go to page two</a>.</p>
Hiccup: (html [:p "Click to " [:a {:href "page2.html"} "go to page two"] "."])

You add [hiccups "0.3.0"] to your project.clj dependencies and modify your core.cljs file to require hiccups:

(ns servertest.core
  (:require-macros [hiccups.core :as hiccups])
  (:require [cljs.nodejs :as nodejs]
            [hiccups.runtime :as hiccupsrt]))

You are now ready to write the generate-page! function, which has two parameters: the HTTP request that the server received, and the HTTP response that you will send back to the client. The property (.-query request) is a JavaScript object with the form names as its properties. Consider a form entry like this:

<input type="text" name="userName"/>

You would access the value via (.-userName (.-query request)).

The generate-page! function creates the HTML page as a string to send back to the client; you send it back by calling (.send response html-string). The HTML page will contain a form whose action URL is the server root (/). The form will have an input area for the user name and a submit button. This will be followed by a paragraph that has the text “Pleased to meet you, username.” (or an empty paragraph if there’s no username). You can either figure out this code on your own or see a suggested solution. I’m giving you the code here because the purpose of this étude is to process the condiment map in the web page context rather than setting up the web page in the first place. (Of course, I strongly encourage you to figure it out on your own; you will learn a lot—I certainly did!)
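
The message logic can be kept separate from the Express plumbing, which makes it easy to try in the REPL. A minimal sketch, assuming a hypothetical helper named greeting that receives the value read from the query (which may be nil or empty when the form has not been submitted yet):

```clojure
(defn greeting
  "Returns the paragraph text for the page, or an empty string
  when no user name has been supplied."
  [user-name]
  (if (seq user-name)
    (str "Pleased to meet you, " user-name ".")
    ""))
```

The result can then be placed inside a [:p ...] vector when you build the page.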

Putting the Étude Together

Your program will use the previous étude’s code to build the map of condiments and compatible foods from the XML file. Then use the same framework that was developed in “Generating HTML from ClojureScript”, with the generated page containing:

  • A form with a <select> menu that gives the condiment names (the keys of the map). You may want to add an entry with the text “Choose a condiment” at the beginning of the menu to indicate “no choice yet.” When you create the menu, remember to set the selected="selected" attribute for the current menu choice.
  • A submit button for the form.
  • An unordered list that gives the matching foods for that condiment (the value from the map), or an empty list if no condiment has been chosen.
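
Because hiccups elements are ordinary vectors, the menu in the first item above can be built as plain data before it is rendered. The following sketch uses invented names (condiment-option, condiment-menu, and the form field name "condiment"); it assumes the current choice arrives as a string, or nil when nothing has been chosen.

```clojure
(defn condiment-option
  "Builds one [:option ...] vector, marking it selected when it
  matches the current choice."
  [current condiment]
  [:option (cond-> {:value condiment}
             (= condiment current) (assoc :selected "selected"))
   condiment])

(defn condiment-menu
  "Builds a [:select ...] vector with a placeholder entry first."
  [condiments current]
  (into [:select {:name "condiment"}
         [:option {:value ""} "Choose a condiment"]]
        (map #(condiment-option current %) condiments)))
```

Using cond-> keeps the :selected key out of the attribute map entirely for options that are not the current choice.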

Your code should alphabetize the condiment names and compatible foods. Some of the foods begin with capital letters, others with lowercase. You will want to do a case-insensitive sort. (Hint: use the form of sort that takes a comparison function.)
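
As a sketch of the two-argument form of sort, the comparison function below lowercases both strings before comparing them. The namespace and the names compare-ignoring-case and alphabetize are hypothetical.

```clojure
(ns etudes.sketch.sorting
  (:require [clojure.string :as str]))

(defn compare-ignoring-case
  "Compares two strings without regard to letter case."
  [a b]
  (compare (str/lower-case a) (str/lower-case b)))

(defn alphabetize
  "Returns the names as a vector in case-insensitive order."
  [names]
  (vec (sort compare-ignoring-case names)))
```

With a plain (sort names), every capitalized entry would come before every lowercase one, which is why the comparison function is needed.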

See a suggested solution: “Solution 4-2B”. To make the program easier to read, I put the code for creating the map into a separate file with its own namespace.

Étude 4-3: Maps—Frequency Table

This étude uses an excerpt of the Montgomery County, Maryland (USA) traffic violation database, which you may find at this URL. I have taken only the violations for July 2014, removed several of the columns of the data, and put the result into a tab-separated value file named traffic_july_2014_edited.csv, which you may find in the GitHub repository. (Yes, I know CSV should be comma-separated, but using the Tab key makes life much easier.)

Here are the column headings:

  • Date of Stop, in format mm/dd/yyyy
  • Time of Stop, in format hh:mm:ss
  • Description
  • Accident (Yes/No)
  • Personal Injury (Yes/No)
  • Property Damage (Yes/No)
  • Fatal (Yes/No)
  • State (two-letter abbreviation)
  • Vehicle Type
  • Year
  • Make
  • Model
  • Color
  • Violation Type (Warning/Citation/ESERO [Electronic Safety Equipment Repair Order])
  • Charge (Maryland Government traffic code section)
  • Race
  • Gender
  • Driver’s State (two-letter abbreviation)
  • Driver’s License State (two-letter abbreviation)

As you can see, you have a treasure trove of data here. For example, one reason I chose July is that I was interested in seeing if the number of traffic violations was greater around the July 4 holiday (in the United States) than during the rest of the month.

If you look at the data, you will notice the “Make” (vehicle manufacturer) column would need some cleaning up to be truly useful. For example, there are entries such as TOYOTA, TOYT, TOYO, and TOUOTA. Various other creative spellings and abbreviations abound in that column. Also, the Scion is listed as both a make and a model. Go figure.

In this étude, you are going to write a Node.js project named frequency. It will contain a function that reads the CSV file and creates a data structure with one entry for each row (I suggest a vector of maps). For example:

[{:date "07/31/2014", :time "22:08:00" ... :gender "F", :driver-state "MD"},
  {:date "07/31/2014", :time "21:27:00" ... :gender "F", :driver-state "MD"}, 
   ...]

Hints:

  • For the map, define a vector of heading keywords, such as:

    (def headings [:date :time ... :gender :driver-state])

    If there are columns you don’t want or need in the map, enter nil in the vector.

  • Use zipmap to make it easy to construct a map for each row. You will have to get rid of the nil entry; dissoc is your friend here.
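
The two hints can be combined as in the sketch below. The namespace, the shortened heading vector, and the name line->row are assumptions made for illustration; your real heading vector will have one entry per column in the file.

```clojure
(ns etudes.sketch.rows
  (:require [clojure.string :as str]))

;; Shortened, hypothetical heading vector; nil marks a column to skip.
(def sample-headings [:date :time nil :gender])

(defn line->row
  "Splits a tab-separated line and pairs each field with its heading,
  then drops the entry for any skipped column."
  [headings line]
  (-> (zipmap headings (str/split line #"\t"))
      (dissoc nil)))
```

Every skipped column lands under the same nil key, so a single dissoc removes them all regardless of how many there are.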

You will then write a function named frequency-table with two parameters:

  1. The data structure from the CSV file
  2. A column specifier

You can take advantage of ClojureScript’s higher-order functions here. The specifier is a function that takes one entry (a “row”) in the data structure and returns a value. So, if you wanted a frequency table to figure out how many violations there are in each hour of the day, you would write code like this:

(defn hour [csv-row]
  (.substr (csv-row :time) 0 2))

(defn frequency-table [all-data col-spec]
  ;; your code here
)
  
;; now you do a call like this:
(frequency-table traffic-data hour)

Note that, because keyword access to maps works like a function, you could get the frequency of genders by doing this call:

(frequency-table traffic-data :gender)

The return value from frequency-table will be a vector that consists of:

  • A vector of labels (the values from the specified column), sorted
  • A vector giving the frequency counts for each label
  • The total count

The return value from the call for gender looks like this: [["F" "M" "U"] [6732 12776 7] 19515]. Hint: build a map whose keys are labels and whose values are their frequency, then use seq.
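
One possible shape for this, following the hint, is sketched below. It leans on the built-in frequencies function and a sorted map, which is a choice made for this sketch rather than the book's stated approach. The name frequency-table-sketch keeps it distinct from your own solution.

```clojure
(defn frequency-table-sketch
  "Returns [labels counts total] for the values col-spec produces."
  [all-data col-spec]
  (let [;; label -> count, with the keys kept in sorted order
        counts (into (sorted-map) (frequencies (map col-spec all-data)))]
    [(vec (keys counts))
     (vec (vals counts))
     (reduce + (vals counts))]))
```

Because keys and vals walk a map in the same order, the labels and their counts stay aligned position by position.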

Some frequency tables that might be interesting include the color of car (which colors are most likely to have a violation?) and the year of car manufacture (are older cars more likely to have a violation?). To be sure, there are other factors at work here. Car colors are not equally common, and there are fewer cars on the road that were manufactured in 1987 than were made last year. This étude is meant to teach you to use maps, not to make rigorous, research-ready hypotheses.

Reading the CSV File

Reading a file one line at a time from Node.js is a nontrivial matter. Luckily for you and me, Jonathan Boston (Twitter/GitHub: bostonou), author of the ClojureScript Made Easy blog, posted a wonderful solution just days before I wrote this étude. He has kindly given me permission to use the code, which you can get at this GitHub gist. Follow the instructions in the gist, and separate the Clojure and ClojureScript code. Your src directory will look like this:

src
├── cljs_made_easy
│   ├── line_seq.clj
│   └── line_seq.cljs
└── traffic
    └── core.cljs

Inside the core.cljs file, you will have these requirements:

(ns traffic.core
  (:require [cljs.nodejs :as nodejs]
            [clojure.string :as str]
            [cljs-made-easy.line-seq :as cme]))
 
(def filesystem (js/require "fs")) ;;require nodejs lib

You can then read a file like this, using with-open and line-seq very much as they are used in Clojure. In the following code, the call to .openSync has three arguments: the filesystem defined earlier, the filename, and the file mode, with "r" for reading:

(defn example [filename]
  (cme/with-open [file-descriptor (.openSync filesystem filename "r")]
             (println (cme/line-seq file-descriptor))))

Note: you may want to use a smaller version of the file for testing. The code repository contains a file named small_sample.csv with 14 entries.

See a suggested solution: “Solution 4-3”.

Étude 4-4: Complex Maps—Cross-Tabulation

Add to the previous étude by writing a function named cross-tab; it creates frequency cross-tabulations. It has these parameters:

  • The data structure from the CSV file
  • A row specifier
  • A column specifier

Again, the row and column specifiers are functions. So, if you wanted a cross-tabulation with hour of day as the rows and gender as the columns, you might write code like this:

(defn hour [csv-row]
  (.substr (csv-row :time) 0 2))
  
(defn cross-tab [all-data row-spec col-spec]
  ;; your code here
  )
  
;; now you do a call like this:
(cross-tab traffic-data hour :gender)

The return value from cross-tab will be a vector that consists of:

  • A vector of row labels, sorted
  • A vector of column labels, sorted
  • A vector of vectors that gives the frequency counts for each row and column
  • A vector of row totals
  • A vector of column totals
  • The grand total

The call shown earlier, run on the full data set, returns this result, reformatted to avoid excessively long lines:

(cross-tab traffic-data hour :gender)
[["00" "01" "02" "03" "04" "05" "06" "07" "08" "09" "10" "11" "12"
"13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "23"] ["F" "M" "U"]
[[335 719 0] [165 590 0] [141 380 0] [96 249 0] [73 201 0] [63 119 0]
[129 214 2] [380 625 0] [564 743 1] [481 704 0] [439 713 1] [331 527 0]
[243 456 0] [280 525 0] [344 515 0] [276 407 0] [307 514 1] [317 553 0]
[237 434 1] [181 461 0] [204 553 1] [289 657 0] [424 961 0] [433 956 0]]
[1054 755 521 345 274 182 345 1005 1308 1185 1153 858 699 805 859 683
822 870 672 642 758 946 1385 1389] [6732 12776 7] 19515]
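
To make the layout of that result easier to follow, here is a sketch that produces the same six-part shape from a tiny data set. It is one possible approach, not the book's solution: it counts [row-label column-label] pairs with frequencies and juxt, and the names hour-of and cross-tab-sketch are invented. The hour-of helper uses subs rather than .substr so that it also runs outside JavaScript.

```clojure
(defn hour-of
  "Returns the two-digit hour from a row's :time value."
  [row]
  (subs (:time row) 0 2))

(defn cross-tab-sketch
  [all-data row-spec col-spec]
  (let [;; [row-label col-label] -> count
        cells (frequencies (map (juxt row-spec col-spec) all-data))
        rows  (vec (sort (distinct (map row-spec all-data))))
        cols  (vec (sort (distinct (map col-spec all-data))))
        ;; one inner vector per row label, zero where no pair occurred
        table (mapv (fn [r] (mapv (fn [c] (get cells [r c] 0)) cols))
                    rows)]
    [rows
     cols
     table
     (mapv #(reduce + %) table)                ; row totals
     (if (seq table) (apply mapv + table) [])  ; column totals
     (count all-data)]))                       ; grand total
```

For three rows with times 22:08:00, 22:30:00, and 21:27:00 and genders F, M, and F, the sketch returns [["21" "22"] ["F" "M"] [[1 0] [1 1]] [1 2] [2 1] 3].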

Here are some of the cross-tabulations that might be interesting:

  • Day by hour: the marginal totals will tell you which days and hours have the most violations. Are the days around July 4, 2014 (a US holiday) more active than other days? Which hours are the most and least active?
  • Gender by color of vehicle (although the driver might not be the person who purchased the car).
  • Driver’s state by property damage: are out-of-state drivers more likely to damage property than in-state drivers?

Bonus points: write the code such that if you give cross-tab a nil for the column specifier, it will still work, returning only the totals for the row specifier. Then, re-implement frequency-table by calling cross-tab with nil for the column specifier. Hint: you will have to take the vector of vectors for the “cross-tabulation” totals and make it a simple vector. Either map or flatten will be useful here.
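
To see the hint in action, here is what the two approaches do to a hypothetical one-column table of counts (the numbers are made up):

```clojure
;; A "table" with a single column: one count per row.
(def one-column [[3] [5] [2]])

;; Take the only item of each inner vector.
(def with-map (mapv first one-column))

;; Remove all nesting, whatever its depth.
(def with-flatten (vec (flatten one-column)))
```

Both produce the same simple vector here; map with first is the more conservative choice because flatten would also collapse any deeper nesting.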

See a suggested solution: “Solution 4-4”.

Étude 4-5: Cross-Tabulation Server

Well, as you can see, the output from the previous étude is ugly to the point of being nearly unreadable. This rather open-ended étude aims to fix that. Your mission, should you decide to accept it, is to set up the code in an Express server to deliver the results in a nice, readable HTML table. Here are some of the things I found out while coming up with a solution, a screenshot of which appears in Figure 4-1:

Figure 4-1. Screenshot of traffic cross-tabulation table

  • I wanted to use as much of the code from “Étude 4-2: Condiment Server” as possible, so I decided on drop-down menus to choose the fields. However, a map was not a good choice for generating the menu. In the condiment server, it made sense to alphabetize the keys of the food map. In this étude, the field names are listed by conceptual groups; it doesn’t make sense to alphabetize them, and the keys of a map are inherently unordered. Thus, I ended up making a vector of vectors.

  • I used map-indexed to create the option menu such that each option has a numeric value. However, when the server reads the value from the request, it gets a string, and 5 is not equal to "5". The fix was easy, but I lost a few minutes figuring out why my selected item wasn’t coming up when I came back from a request.

  • The source file felt like it was getting too big, so I put the cross-tabulation code into a separate file named crosstab.cljs in the src/traffic directory.

  • I wanted to include a CSS file, so I put the specification in the header of the hiccups code. However, to make it work, I had to tell Express how to serve static files, using "." for the root directory in:

    (.use app (.static express "path/to/root/directory"))

  • Having the REPL is really great for testing.

  • I finished the program late at night. Again, “the first way you think of doing it that works is the right way,” but I am unhappy with the solution. I would really like to unify the cases of one-dimensional and two-dimensional tables, and there seems to be a dreadful amount of unnecessary duplication. To paraphrase Don Marquis, my solution “isn’t moral, but it might be expedient.”
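
The map-indexed pitfall described in the list above can be sketched as follows. The field list and the name field-options are hypothetical; the point is that the selected value from the request is a string, so the numeric index is converted with str before the comparison.

```clojure
;; Hypothetical vector of vectors: a menu label and a row/column specifier.
(def sample-fields [["Hour" :hour] ["Gender" :gender]])

(defn field-options
  "Builds [:option ...] vectors with numeric values, marking the one
  whose index matches the string received from the request."
  [fields selected]
  (map-indexed
    (fn [index [label _]]
      [:option (cond-> {:value index}
                 ;; compare as strings: 5 is not equal to "5"
                 (= (str index) selected) (assoc :selected "selected"))
       label])
    fields))
```

Keeping the fields in a vector of vectors preserves the conceptual grouping, and the index doubles as a compact value for each option.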

See a suggested solution (which I put in a project named traffic): “Solution 4-5”.
