Preparing Data

You might think that an ML engineer spends her time dreaming up and training sophisticated algorithms. As with programming, however, the job comes with a less glamorous and more time-consuming side. In the case of ML, that grindwork usually involves preparing data.

If you’re not convinced that preparing data is a big time sink, think of the effort that went into MNIST. Somebody had to collect and scan 70,000 handwritten digits: 60,000 for the training set and 10,000 for the test set. They probably hand-checked all those digits to remove the examples that were not representative of real-life handwriting, maybe because they were too garbled. They also had to center, crop, and scale those images to the same resolution, taking care to avoid graphical artifacts such as jagged edges. I’d wager ...

