August 2019
Beginner
482 pages
12h 56m
English
NumPy is a library for fast numeric computation and serves as the foundation of Python's scientific ecosystem; it's also the backbone of SciPy and pandas. Since our code is slow and numeric, NumPy is a great place to start our optimization attempts.
The algorithm is mostly written in NumPy already; we couldn't perform a true closest-N search in pandas, since it doesn't support multidimensional indexing. However, there is one piece of low-hanging fruit: our naive model uses np.argsort to pick the N closest records, which sorts the whole dataset. We don't need any sorting at all, not even among those N closest records, let alone the rest. Here, we can swap the np.argsort function for np.argpartition. This function does ...
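To make the swap concrete, here is a minimal sketch comparing the two functions. The `distances` array and the value of `n` are made up for illustration and are not the model's actual data; the point is only that both calls select the same N records, while np.argpartition skips the full sort.

```python
import numpy as np

# Hypothetical stand-in data: one distance per record.
rng = np.random.default_rng(0)
distances = rng.random(1_000)
n = 5  # number of closest records we want

# Naive approach: sort every index, then keep the first n.
closest_sorted = np.argsort(distances)[:n]

# Partial approach: only guarantee that the n smallest values
# occupy the first n positions, in no particular order.
closest_partitioned = np.argpartition(distances, n - 1)[:n]

# Both approaches pick the same records; only the order may differ.
assert set(closest_sorted.tolist()) == set(closest_partitioned.tolist())
```

Note that the indices returned by np.argpartition are not ordered by distance. If the order among the N closest records matters, those N indices can be sorted afterwards, which is still far cheaper than sorting the whole dataset.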