Network Security Through Data Analysis, 2nd Edition

Book Description

Traditional intrusion detection and logfile analysis are no longer enough to protect today’s complex networks. In the updated second edition of this practical guide, security researcher Michael Collins shows InfoSec personnel the latest techniques and tools for collecting and analyzing network traffic datasets. You’ll understand how your network is used, and what actions are necessary to harden and defend the systems within it.

In three sections, this book examines the process of collecting and organizing data, various tools for analysis, and several different analytic scenarios and techniques. New chapters focus on active monitoring and traffic manipulation, insider threat detection, data mining, regression and machine learning, and other topics.

You’ll learn how to:

  • Use sensors to collect network, service, host, and active domain data
  • Work with the SiLK toolset, Python, and other tools and techniques for manipulating data you collect
  • Detect unusual phenomena through exploratory data analysis (EDA), using visualization and mathematical techniques
  • Analyze text data, traffic behavior, and communications mistakes
  • Identify significant structures in your network with graph analysis
  • Examine insider threat data and acquire threat intelligence
  • Map your network and identify significant hosts within it
  • Work with operations to develop defenses and analysis techniques

Table of Contents

  1. Preface
    1. Audience
    2. Contents of This Book
    3. Changes Between Editions
    4. Conventions Used in This Book
    5. Using Code Examples
    6. O’Reilly Safari
    7. How to Contact Us
    8. Acknowledgments
  2. I. Data
  3. 1. Organizing Data: Vantage, Domain, Action, and Validity
    1. Domain
    2. Vantage
      1. Choosing Vantage
    3. Actions: What a Sensor Does with Data
    4. Validity and Action
      1. Internal Validity
      2. External Validity
      3. Construct Validity
      4. Statistical Validity
      5. Attacker and Attack Issues
    5. Further Reading
  4. 2. Vantage: Understanding Sensor Placement in Networks
    1. The Basics of Network Layering
      1. Network Layers and Vantage
    2. Network Layers and Addressing
      1. MAC Addresses
      2. IPv4 Format and Addresses
      3. IPv6 Format and Addresses
      4. Validity Challenges from Middlebox Network Data
    3. Further Reading
  5. 3. Sensors in the Network Domain
    1. Packet and Frame Formats
      1. Rolling Buffers
      2. Limiting the Data Captured from Each Packet
      3. Filtering Specific Types of Packets
      4. What If It’s Not Ethernet?
    2. NetFlow
      1. NetFlow v5 Formats and Fields
      2. NetFlow Generation and Collection
    3. Data Collection via IDS
      1. Classifying IDSs
      2. IDS as Classifier
    4. Improving IDS Performance
      1. Enhancing IDS Detection
      2. Configuring Snort
      3. Enhancing IDS Response
      4. Prefetching Data
    5. Middlebox Logs and Their Impact
      1. VPN Logs
      2. Proxy Logs
      3. NAT Logs
    6. Further Reading
  6. 4. Data in the Service Domain
    1. What and Why
    2. Logfiles as the Basis for Service Data
    3. Accessing and Manipulating Logfiles
    4. The Contents of Logfiles
      1. The Characteristics of a Good Log Message
      2. Existing Logfiles and How to Manipulate Them
      3. Stateful Logfiles
    5. Further Reading
  7. 5. Sensors in the Service Domain
    1. Representative Logfile Formats
      1. HTTP: CLF and ELF
    2. Simple Mail Transfer Protocol (SMTP)
      1. Sendmail
      2. Microsoft Exchange: Message Tracking Logs
    3. Additional Useful Logfiles
      1. Staged Logging
      2. LDAP and Directory Services
      3. File Transfer, Storage, and Databases
    4. Logfile Transport: Transfers, Syslog, and Message Queues
      1. Transfer and Logfile Rotation
      2. Syslog
    5. Further Reading
  8. 6. Data and Sensors in the Host Domain
    1. A Host: From the Network’s View
    2. The Network Interfaces
    3. The Host: Tracking Identity
    4. Processes
      1. Structure
    5. Filesystem
    6. Historical Data: Commands and Logins
    7. Other Data and Sensors: HIPS and AV
    8. Further Reading
  9. 7. Data and Sensors in the Active Domain
    1. Discovery, Assessment, and Maintenance
    2. Discovery: ping, traceroute, netcat, and Half of nmap
      1. Checking Connectivity: Using ping to Connect to an Address
      2. Tracerouting
      3. Using nc as a Swiss Army Multitool
      4. nmap Scanning for Discovery
    3. Assessment: nmap, a Bunch of Clients, and a Lot of Repositories
      1. Basic Assessment with nmap
    4. Using Active Vantage Data for Verification
    5. Further Reading
  10. II. Tools
  11. 8. Getting Data in One Place
    1. High-Level Architecture
      1. The Sensor Network
      2. The Repository
      3. Query Processing
      4. Real-Time Processing
      5. Source Control
    2. Log Data and the CRUD Paradigm
    3. A Brief Introduction to NoSQL Systems
    4. Further Reading
  12. 9. The SiLK Suite
    1. What Is SiLK and How Does It Work?
    2. Acquiring and Installing SiLK
      1. The Datafiles
    3. Choosing and Formatting Output Field Manipulation: rwcut
    4. Basic Field Manipulation: rwfilter
      1. Ports and Protocols
      2. Size
      3. IP Addresses
      4. Time
      5. TCP Options
      6. Helper Options
      7. Miscellaneous Filtering Options and Some Hacks
    5. rwfileinfo and Provenance
    6. Combining Information Flows: rwcount
    7. rwset and IP Sets
    8. rwuniq
    9. rwbag
    10. Advanced SiLK Facilities
      1. PMAPs
    11. Collecting SiLK Data
      1. YAF
      2. rwptoflow
      3. rwtuc
      4. rwrandomizeip
    12. Further Reading
  13. 10. Reference and Lookup: Tools for Figuring Out Who Someone Is
    1. MAC and Hardware Addresses
    2. IP Addressing
      1. IPv4 Addresses, Their Structure, and Significant Addresses
      2. IPv6 Addresses, Their Structure, and Significant Addresses
      3. IP Intelligence: Geolocation and Demographics
    3. DNS
      1. DNS Name Structure
      2. Forward DNS Querying Using dig
      3. The DNS Reverse Lookup
      4. Using whois to Find Ownership
      5. DNS Blackhole Lists
    4. Search Engines
      1. General Search Engines
      2. Scanning Repositories, Shodan et al
    5. Further Reading
  14. III. Analytics
    1. An Overview of Attacker Behavior
    2. Further Reading
  15. 11. Exploratory Data Analysis and Visualization
    1. The Goal of EDA: Applying Analysis
    2. EDA Workflow
    3. Variables and Visualization
    4. Univariate Visualization
      1. Histograms
      2. Bar Plots (Not Pie Charts)
      3. The Five-Number Summary and the Boxplot
      4. Generating a Boxplot
    5. Bivariate Description
      1. Scatterplots
    6. Multivariate Visualization
      1. Other Visualizations and Their Role
      2. Operationalizing Security Visualization
    7. Fitting and Estimation
      1. Is It Normal?
      2. Simply Visualizing: Projected Values and QQ Plots
      3. Fit Tests: K-S and S-W
    8. Further Reading
  16. 12. On Analyzing Text
    1. Text Encoding
      1. Unicode, UTF, and ASCII
      2. Encoding for Attackers
    2. Basic Skills
      1. Finding a String
      2. Manipulating Delimiters
      3. Splitting Along Delimiters
      4. Regular Expressions
    3. Techniques for Text Analysis
      1. N-Gram Analysis
      2. Jaccard Distance
      3. Hamming Distance
      4. Levenshtein Distance
      5. Entropy and Compressibility
      6. Homoglyphs
    4. Further Reading
  17. 13. On Fumbling
    1. Fumbling: Misconfiguration, Automation, and Scanning
      1. Lookup Failures
      2. Automation
      3. Scanning
    2. Identifying Fumbling
      1. IP Fumbling: Dark Addresses and Spread
      2. TCP Fumbling: Failed Sessions
      3. ICMP Messages and Fumbling
    3. Fumbling at the Service Level
      1. HTTP Fumbling
      2. SMTP Fumbling
      3. DNS Fumbling
    4. Detecting and Analyzing Fumbling
      1. Building Fumbling Alarms
      2. Forensic Analysis of Fumbling
      3. Engineering a Network to Take Advantage of Fumbling
  18. 14. On Volume and Time
    1. The Workday and Its Impact on Network Traffic Volume
    2. Beaconing
    3. File Transfers/Raiding
    4. Locality
      1. DDoS, Flash Crowds, and Resource Exhaustion
      2. DDoS and Routing Infrastructure
    5. Applying Volume and Locality Analysis
      1. Data Selection
      2. Using Volume as an Alarm
      3. Using Beaconing as an Alarm
      4. Using Locality as an Alarm
      5. Engineering Solutions
    6. Further Reading
  19. 15. On Graphs
    1. Graph Attributes: What Is a Graph?
    2. Labeling, Weight, and Paths
    3. Components and Connectivity
    4. Clustering Coefficient
    5. Analyzing Graphs
      1. Using Component Analysis as an Alarm
      2. Using Centrality Analysis for Forensics
      3. Using Breadth-First Searches Forensically
      4. Using Centrality Analysis for Engineering
    6. Further Reading
  20. 16. On Insider Threat
    1. Insider Threat Versus Other Classes of Attacks
    2. Avoiding Toxicity
    3. Modes of Attack
      1. Data Theft and Exfiltration
      2. Credential Theft
      3. Sabotage
    4. Insider Threat Data: Logistics and Collection
      1. Applying Sector-Based Workflow to Insider Threat
      2. Physical Data Sources
      3. Keeping Track of User Identity
    5. Further Reading
  21. 17. On Threat Intelligence
    1. Defining Threat Intelligence
      1. Data Types
    2. Creating a Threat Intelligence Program
      1. Identifying Goals
      2. Starting with Free Sources
      3. Determining Data Output
      4. Purchasing Sources
    3. Brief Remarks on Creating Threat Intelligence
    4. Further Reading
  22. 18. Application Identification
    1. Mechanisms for Application Identification
      1. Port Number
      2. Application Identification by Banner Grabbing
      3. Application Identification by Behavior
      4. Application Identification by Subsidiary Site
    2. Application Banners: Identifying and Classifying
      1. Non-Web Banners
      2. Web Client Banners: The User-Agent String
    3. Further Reading
  23. 19. On Network Mapping
    1. Creating an Initial Network Inventory and Map
      1. Creating an Inventory: Data, Coverage, and Files
      2. Phase I: The First Three Questions
      3. Phase II: Examining the IP Space
      4. Phase III: Identifying Blind and Confusing Traffic
      5. Phase IV: Identifying Clients and Servers
      6. Identifying Sensing and Blocking Infrastructure
    2. Updating the Inventory: Toward Continuous Audit
    3. Further Reading
  24. 20. On Working with Ops
    1. Ops Environments: An Overview
    2. Operational Workflows
      1. Escalation Workflow
      2. Sector Workflow
      3. Hunting Workflow
      4. Hardening Workflow
      5. Forensic Workflow
      6. Switching Workflows
    3. Further Readings
  25. 21. Conclusions
  26. Index