August 2019
Beginner
482 pages
12h 56m
English
Another way to improve performance, arguably the best one in general, is to use the right data structures and algorithms. In other words, we need to design our code better and use the right tools for the job in the first place. In our case, any spatial query, especially on a large dataset, will benefit from a spatial index. Essentially, this builds a hierarchical index based on the spatial distribution of the points themselves. With it, a query only has to measure distances within a small subset of records rather than against every record. Let's try to make use of it in our model:
    from scipy.spatial import cKDTree

    class kdNearestNeighbor:
        _kd = None
        y = None

        def __init__(self, N=3):
            self.N = N

        def fit(self, X, y):
            self._kd = cKDTree(X, ...
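The idea can be sketched end to end as follows. This is a minimal sketch, not the book's implementation: the class name `KDNearestNeighborSketch`, the `predict` method, the choice to average the targets of the N nearest neighbors, and the `brute_force_predict` helper are all assumptions made for illustration. Only `cKDTree`, its `n` attribute, and its `query` method come from SciPy.

```python
import numpy as np
from scipy.spatial import cKDTree


class KDNearestNeighborSketch:
    """Hypothetical model: predicts the mean target of the N nearest points."""

    def __init__(self, N=3):
        self.N = N
        self._kd = None
        self.y = None

    def fit(self, X, y):
        # Build the spatial index once; every later query reuses it.
        self._kd = cKDTree(np.asarray(X, dtype=float))
        self.y = np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        # Never ask for more neighbors than there are stored points.
        k = min(self.N, self._kd.n)
        # query returns (distances, indices); only the indices are needed here.
        _, idx = self._kd.query(X, k=k)
        # For k == 1 the indices come back 1-D, so force shape (n_queries, k).
        idx = np.asarray(idx).reshape(len(X), -1)
        return self.y[idx].mean(axis=1)


def brute_force_predict(X_train, y_train, X_query, N=3):
    """Reference version (assumption): measures the distance to every record."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.atleast_2d(np.asarray(X_query, dtype=float))
    # Full pairwise distance matrix of shape (n_queries, n_train).
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :N]
    return y_train[nearest].mean(axis=1)


# Synthetic data, used only to show that both approaches agree.
rng = np.random.default_rng(0)
X_train = rng.random((1000, 2))
y_train = rng.random(1000)
X_query = rng.random((5, 2))

model = KDNearestNeighborSketch(N=3).fit(X_train, y_train)
kd_result = model.predict(X_query)
bf_result = brute_force_predict(X_train, y_train, X_query, N=3)
print(np.allclose(kd_result, bf_result))
```

Because the tree is built once in `fit`, each prediction only visits the branches of the index that can contain close points, whereas the brute-force helper computes a distance to every training record for every query.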