September 2015
Beginner to intermediate
608 pages
13h 43m
English
To calculate the Euclidean distance, let's first create a vector from our dictionary and document. This lets us compare term frequencies between documents easily, because each term occupies the same index in every document's vector.
(defn term-id [dict term]
  (get-in @dict [:terms term]))

(defn term-frequencies [dict terms]
  (->> (map #(term-id dict %) terms)
       (remove nil?)
       (frequencies)))

(defn map->vector [dictionary id-counts]
  (let [zeros (vec (replicate (:count @dictionary) 0))]
    (-> (reduce #(apply assoc! %1 %2) (transient zeros) id-counts)
        (persistent!))))

(defn tf-vector [dict document]
  (map->vector dict (term-frequencies dict document)))

The term-frequencies function creates a map of term ID to frequency count for ...
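To make the shared-index idea concrete, here is a minimal sketch that runs on its own. It assumes the dictionary is an atom holding a :count and a :terms map from term to zero-based index, a shape inferred from the accessors in the listing. The names sample-dict, simple-tf-vector and euclidean-distance are hypothetical and do not come from the text. simple-tf-vector is a non-transient stand-in that should produce the same vectors as tf-vector for this data.

```clojure
;; Hypothetical dictionary: four known terms, each mapped to a vector index.
;; The sample terms are made up for illustration.
(def sample-dict
  (atom {:count 4
         :terms {"the" 0, "cat" 1, "sat" 2, "dog" 3}}))

;; Hypothetical, simpler equivalent of tf-vector: start from a vector of
;; zeros and increment the slot of every term the dictionary knows.
;; Unknown terms are skipped, mirroring (remove nil?) in the listing.
(defn simple-tf-vector [dict document]
  (let [{n :count ids :terms} @dict]
    (reduce (fn [v term]
              (if-let [id (get ids term)]
                (update v id inc)
                v))
            (vec (repeat n 0))
            document)))

;; Hypothetical helper: square root of the summed squared differences
;; between corresponding positions of two equal-length vectors.
(defn euclidean-distance [a b]
  (Math/sqrt (reduce + (map (fn [x y]
                              (let [d (- x y)]
                                (* d d)))
                            a b))))

(def doc-a (simple-tf-vector sample-dict ["the" "cat" "sat" "the"]))
;; => [2 1 1 0]

(def doc-b (simple-tf-vector sample-dict ["the" "dog" "sat" "unknown"]))
;; => [1 0 1 1]

;; Both vectors use index 0 for "the", index 1 for "cat", and so on,
;; so the distance can be computed position by position.
(def distance (euclidean-distance doc-a doc-b))
```

Because both documents are projected onto the same dictionary, the term "unknown" is simply dropped and the two vectors always have the same length, which is what makes the element-wise comparison valid.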