Working with Stack Exchange data dumps

The Stack Exchange network also provides complete dumps of its data, available for download through the Internet Archive (https://archive.org/details/stackexchange). The data is distributed in 7z, a compressed archive format with a high compression ratio (http://www.7-zip.org). To read and extract these archives, you need the 7-Zip utility for Windows or one of its ports for Linux/Unix and macOS.
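If you prefer to drive the extraction from Python, one option is to call the 7-Zip command-line tool through `subprocess`. This is a minimal sketch under stated assumptions: it expects a 7-Zip executable on your `PATH`, and its name (`7z` here) is an assumption because it varies between ports (for example `7za` or `7zz`). The helper names `build_extract_command` and `extract_7z` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_extract_command(archive, out_dir, exe="7z"):
    """Build a 7-Zip command line that extracts `archive` into `out_dir`.

    `x` extracts with full paths, `-o<dir>` sets the output directory
    (no space after -o) and `-y` answers "yes" to any overwrite prompts.
    """
    return [exe, "x", str(archive), f"-o{out_dir}", "-y"]


def extract_7z(archive, out_dir, exe="7z"):
    """Extract a .7z archive with an installed 7-Zip command-line tool.

    Assumption: the executable is called `exe` (7z, 7za or 7zz depending
    on the platform and the port you installed).
    """
    if shutil.which(exe) is None:
        raise FileNotFoundError(f"7-Zip executable {exe!r} not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if 7-Zip reports a failure
    subprocess.run(build_extract_command(archive, out_dir, exe), check=True)
```

For example, `extract_7z("stackoverflow.com-Posts.7z", "data/posts")` would unpack the Posts dump into a (hypothetical) `data/posts` directory. Shelling out keeps the dependency list short; a pure-Python 7z library is an alternative if installing the command-line tool is not an option.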

At the time of writing, the data dumps for Stack Overflow are provided as separate compressed files, with each file representing an entity or table in their dataset. For example, the stackoverflow.com-Posts.7z file contains the dump for the Posts table (that is, questions and answers). The size of the ...
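Once a file is extracted, the table it contains can be processed in Python. The sketch below assumes, as in the Stack Exchange dumps we are aware of, that the extracted Posts file is XML with one `<row>` element per post, columns stored as attributes, and `PostTypeId` set to 1 for questions and 2 for answers. Because the Posts dump can be very large, it streams the file with `xml.etree.ElementTree.iterparse` rather than loading the whole tree into memory. The function names `iter_rows` and `count_post_types` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def iter_rows(source):
    """Yield the attributes of each <row> element as a dict, one at a time.

    `source` can be a filename or a binary file object. Processed rows are
    removed from the root element so memory use stays bounded.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            yield dict(elem.attrib)
            root.clear()  # discard rows we have already handled


def count_post_types(source):
    """Count questions (PostTypeId 1), answers (PostTypeId 2) and other posts."""
    labels = {"1": "question", "2": "answer"}
    counts = Counter()
    for row in iter_rows(source):
        counts[labels.get(row.get("PostTypeId"), "other")] += 1
    return counts
```

For example, `count_post_types("Posts.xml")` would tally the posts in an extracted dump, assuming the extracted file is named `Posts.xml`. Attribute values come back as strings, so numeric fields such as `Score` need an explicit `int()` conversion.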
