Book description
Unlock the internet’s treasure trove of public interest data with Hacks, Leaks, and Revelations by Micah Lee, an investigative reporter and security engineer. This hands-on guide blends real-world techniques for researching large datasets with lessons on coding, data authentication, and digital security. All of this is spiced up with gripping stories from the front lines of investigative journalism.
Dive into exposed datasets from a wide array of sources: the FBI, the DHS, police intelligence agencies, extremist groups like the Oath Keepers, and even a Russian ransomware gang. Lee’s own in-depth case studies on disinformation-peddling pandemic profiteers and neo-Nazi chatrooms serve as blueprints for your research.
Gain practical skills in searching massive troves of data for keywords like “antifa” and pinpointing documents with newsworthy revelations. Get a crash course in Python to automate the analysis of millions of files.
You will also learn how to:
- Master encrypted messaging to safely communicate with whistleblowers.
- Securely acquire datasets over encrypted channels using Signal, Tor Browser, OnionShare, and SecureDrop.
- Harvest data from the BlueLeaks collection of internal memos, financial records, and more from over 200 state, local, and federal agencies.
- Probe leaked email archives about offshore detention centers and the Heritage Foundation.
- Analyze metadata from videos of the January 6 attack on the US Capitol, sourced from the Parler social network.
We live in an age where hacking and whistleblowing can unearth secrets that alter history. Hacks, Leaks, and Revelations is your toolkit for uncovering new stories and hidden truths. Crack open your laptop, plug in a hard drive, and get ready to change history.
Table of contents
- Praise for Hacks, Leaks, and Revelations
- Title Page
- Copyright
- Dedication
- About the Author and Technical Reviewer
- Acknowledgments
- Introduction
Part I: Sources and Datasets
1. Protecting Sources and Yourself
- Safely Communicating with Sources
- Secure Storage for Datasets
- Authenticating Datasets
- Redaction
- Making Requests for Comment
- Password Managers
- Disk Encryption
- Exercise 1-1: Encrypt Your Internal Disk
- Exercise 1-2: Encrypt a USB Disk
- Protecting Yourself from Malicious Documents
- Exercise 1-3: Install and Use Dangerzone
- Summary
2. Acquiring Datasets
- The End of WikiLeaks
- Distributed Denial of Secrets
- Downloading Datasets with BitTorrent
- The Origins of BlueLeaks
- Exercise 2-1: Download the BlueLeaks Dataset
- Communicating with Encrypted Messaging Apps
- Exercise 2-2: Install and Practice Using Signal
- Encrypting Messages with PGP
- Staying Anonymous Online with Tor and OnionShare
- Exercise 2-3: Play with Tor and OnionShare
- Communicating with My Tea Party Patriots Source
- Other Options for Acquiring Datasets from Sources
- Whistleblower Submission Systems
- Summary
Part II: Tools of the Trade
3. The Command Line Interface
- Introducing the Command Line
- Exercise 3-1: Install Ubuntu in Windows
- Basic Command Line Usage
- Tips for Navigating the Terminal
- Installing and Uninstalling Software with Package Managers
- Exercise 3-2: Manage Packages with Homebrew on macOS
- Exercise 3-3: Manage Packages with apt on Windows or Linux
- Exercise 3-4: Practice Using the Command Line with cURL
- Text Files vs. Binary Files
- Exercise 3-5: Install the VS Code Text Editor
- Exercise 3-6: Write Your First Shell Script
- Exercise 3-7: Clone the Book’s GitHub Repository
- Summary
4. Exploring Datasets in the Terminal
- Introducing for Loops
- Exercise 4-1: Unzip the BlueLeaks Dataset
- How the Hacker Obtained the BlueLeaks Data
- Exercise 4-2: Explore BlueLeaks on the Command Line
- Exercise 4-3: Find Revelations in BlueLeaks with grep
- Encrypted Data in the BlueLeaks Dataset
- Data Analysis with Servers in the Cloud
- Exercise 4-4: Set Up a VPS
- Exercise 4-5: Explore the Oath Keepers Dataset Remotely
- Summary
5. Docker, Aleph, and Making Datasets Searchable
- Introducing Docker and Linux Containers
- Exercise 5-1: Initialize Docker Desktop on Windows and macOS
- Exercise 5-2: Initialize Docker Engine on Linux
- Running Containers with Docker
- Exercise 5-3: Run a WordPress Site with Docker Compose
- Introducing Aleph
- Exercise 5-4: Run Aleph Locally in Linux Containers
- Using Aleph’s Web and Command Line Interfaces
- Indexing Data in Aleph
- Exercise 5-5: Index a BlueLeaks Folder in Aleph
- Explore BlueLeaks with Aleph
- Additional Aleph Features
- Dedicated Aleph Servers
- Summary
6. Reading Other People’s Email
- The Email Protocol and Message Structure
- File Formats for Email Dumps
- Exercise 6-1: Download Email Dumps from Three Datasets
- Researching Email Dumps with Thunderbird
- Exercise 6-2: Configure Thunderbird for Email Dumps
- Reading Individual EML Files with Thunderbird
- Exercise 6-3: Import the Nauru Police Force EML Email Dump
- Searching Email in Thunderbird
- Exercise 6-4: Import the Oath Keepers MBOX Email Dump
- Exercise 6-5: Import the Heritage Foundation PST Email Dump
- Other Tools for Researching Email Dumps
- Summary
Part III: Python Programming
- 7. An Introduction to Python
8. Working with Data in Python
- Modules
- Python Script Template
- Exercise 8-1: Traverse the Files in BlueLeaks
- Traverse Folders with os.walk()
- Exercise 8-2: Find the Largest Files in BlueLeaks
- Third-Party Modules
- Exercise 8-3: Practice Command Line Arguments with Click
- Avoiding Hardcoding with Command Line Arguments
- Exercise 8-4: Find the Largest Files in Any Dataset
- Navigating Dictionaries and Lists in the Conti Chat Logs
- Exercise 8-5: Map Out the CSVs in BlueLeaks
- Reading and Writing Files
- Exercise 8-6: Practice Reading and Writing Files
- Summary
Part IV: Structured Data
9. BlueLeaks, Black Lives Matter, and the CSV File Format
- Installing Spreadsheet Software
- Introducing the CSV File Format
- Exploring CSV Files with Spreadsheet Software and Text Editors
- My BlueLeaks Investigation
- Reading and Writing CSV Files in Python
- Exercise 9-1: Make BlueLeaks CSVs More Readable
- How to Read Bulk Email from Fusion Centers
- A Brief HTML Primer
- Exercise 9-2: Make Bulk Email Readable
- Discovering the Names and URLs of BlueLeaks Sites
- Exercise 9-3: Make a CSV of BlueLeaks Sites
- Summary
- 10. BlueLeaks Explorer
11. Parler, the January 6 Insurrection, and the JSON File Format
- The Origins of the Parler Dataset
- Exercise 11-1: Download and Extract Parler Video Metadata
- The JSON File Format
- Tools for Exploring JSON Data
- Exercise 11-2: Write a Script to Filter for Videos with GPS from January 6, 2021
- Working with GPS Coordinates
- Exercise 11-3: Update the Script to Filter for Insurrection Videos
- Plotting GPS Coordinates on a Map with simplekml
- Exercise 11-4: Create KML Files to Visualize Location Data
- Visualizing Location Data with Google Earth
- Viewing Metadata with ExifTool
- Summary
12. Epik Fail, Extremism Research, and SQL Databases
- The Structure of SQL Databases
- Exercise 12-1: Create and Test a MySQL Server Using Docker and Adminer
- Exercise 12-2: Query Your SQL Database
- Introducing the MySQL Command Line Client
- Exercise 12-3: Install and Test the Command Line MySQL Client
- MySQL-Specific Queries
- The History of Epik
- Exercise 12-4: Download and Extract Part of the Epik Dataset
- Exercise 12-5: Import Epik Data into MySQL
- Exploring Epik’s SQL Database
- Working with Epik Data in the Cloud
- Summary
Part V: Case Studies
- 13. Pandemic Profiteers and Covid-19 Disinformation
- 14. Neo-Nazis and Their Chatrooms
- Afterword
- A. Solutions to Common WSL Problems
- B. Scraping the Web
- Index
Product information
- Title: Hacks, Leaks, and Revelations
- Author(s): Micah Lee
- Release date: January 2024
- Publisher(s): No Starch Press
- ISBN: 9781718503120