Chapter 8. What’s in a file?
This chapter covers
- File formats
- Extracting content from files
- How file storage impacts data extraction
By now, your Tika-fu is strong, and you’re feeling like there’s not much that you can’t do with your favorite tool for file detection, metadata extraction, and language identification. Believe it or not, there’s plenty more to learn!
One thing we’ve purposefully stayed away from is telling you what’s in those files that Tika makes sense of.[1] That’s because files are a source of rich information, recording not only text or metadata, but also things like detailed descriptions of scenery, such as a bright image of a soccer ball on a grass field; waveforms representing music recorded in stereo sound; all ...