Summary

In this chapter, we discussed a large number of elements of data analysis. We've looked at how we have to disentangle physical format from logical layout and conceptual content. We covered the gzip module as an example of how we can handle one particularly complex physical format issue.

We focused a lot of attention on using the re module to write regular expressions that help us parse complex text files. This addresses a number of logical layout considerations. Once we've parsed the text, we can then do data conversions to create proper Python objects so that we have useful conceptual content.

We also saw how we can use a collections.Counter object to summarize data. This helps us find the most common items, or create complete histograms ...

Get Python for Secret Agents - Volume II now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.