Repurposing Data Sources

All language is but a poor translation.

–Franz Kafka

Sometimes, data lives in formats that take extra work to ingest. For common, explicitly data-oriented formats, widely used libraries already have readers built in. Data frame libraries, for example, read a huge number of different file types. At worst, slightly less common formats have their own more specialized libraries that provide a relatively straightforward path between the original format and the general-purpose data processing library you wish to use.
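As a rough illustration of how little work an explicitly data-oriented format usually demands, the sketch below loads the same records from CSV and from JSON using only readers in the Python standard library. The sample records and variable names are invented for this example; data frame libraries offer comparable one-call readers (pandas, for instance, has `read_csv` and `read_json`).

```python
import csv
import io
import json

# Hypothetical sample data: the same two records in two data-oriented formats.
csv_text = "name,score\nAda,91\nGrace,88\n"
json_text = '[{"name": "Ada", "score": 91}, {"name": "Grace", "score": 88}]'

# csv.DictReader yields one dict per row; every value arrives as a string.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# json.loads preserves the numeric type recorded in the source text.
json_rows = json.loads(json_text)

# Convert the CSV strings to integers so both sources share one representation.
csv_rows = [{"name": r["name"], "score": int(r["score"])} for r in csv_rows]

print(csv_rows == json_rows)
```

Even here a small amount of cleanup is needed: CSV carries no type information, so the numeric column has to be converted explicitly before the two sources agree.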

A greater difficulty often arises because a given format is not per se a data format, but exists for a different purpose. Nonetheless, often there is data somehow embedded or encoded in the format that ...
