Network Security Through Data Analysis, 2nd Edition

Name: Network Security Through Data Analysis, 2nd Edition
Author: Michael Collins
ISBN: 9781491962794

by Michael Collins

September 2017

Beginner to intermediate

428 pages

11h 40m

English

O'Reilly Media, Inc.

Preface
Audience; Contents of This Book; Changes Between Editions; Conventions Used in This Book; Using Code Examples; O’Reilly Safari; How to Contact Us; Acknowledgments
I. Data
1. Organizing Data: Vantage, Domain, Action, and Validity
Domain; Vantage; Choosing Vantage; Actions: What a Sensor Does with Data; Validity and Action; Internal Validity; External Validity; Construct Validity; Statistical Validity; Attacker and Attack Issues; Further Reading
2. Vantage: Understanding Sensor Placement in Networks
The Basics of Network Layering; Network Layers and Vantage; Network Layers and Addressing; MAC Addresses; IPv4 Format and Addresses; IPv6 Format and Addresses; Validity Challenges from Middlebox Network Data; Further Reading
3. Sensors in the Network Domain
Packet and Frame Formats; Rolling Buffers; Limiting the Data Captured from Each Packet; Filtering Specific Types of Packets; What If It’s Not Ethernet?; NetFlow; NetFlow v5 Formats and Fields; NetFlow Generation and Collection; Data Collection via IDS; Classifying IDSs; IDS as Classifier; Improving IDS Performance; Enhancing IDS Detection; Configuring Snort; Enhancing IDS Response; Prefetching Data; Middlebox Logs and Their Impact; VPN Logs; Proxy Logs; NAT Logs; Further Reading
4. Data in the Service Domain
What and Why; Logfiles as the Basis for Service Data; Accessing and Manipulating Logfiles; The Contents of Logfiles; The Characteristics of a Good Log Message; Existing Logfiles and How to Manipulate Them; Stateful Logfiles; Further Reading
5. Sensors in the Service Domain
Representative Logfile Formats; HTTP: CLF and ELF; Simple Mail Transfer Protocol (SMTP); Sendmail; Microsoft Exchange: Message Tracking Logs; Additional Useful Logfiles; Staged Logging; LDAP and Directory Services; File Transfer, Storage, and Databases; Logfile Transport: Transfers, Syslog, and Message Queues; Transfer and Logfile Rotation; Syslog; Further Reading
6. Data and Sensors in the Host Domain
A Host: From the Network’s View; The Network Interfaces; The Host: Tracking Identity; Processes; Structure; Filesystem; Historical Data: Commands and Logins; Other Data and Sensors: HIPS and AV; Further Reading
7. Data and Sensors in the Active Domain
Discovery, Assessment, and Maintenance; Discovery: ping, traceroute, netcat, and Half of nmap; Checking Connectivity: Using ping to Connect to an Address; Tracerouting; Using nc as a Swiss Army Multitool; nmap Scanning for Discovery; Assessment: nmap, a Bunch of Clients, and a Lot of Repositories; Basic Assessment with nmap; Using Active Vantage Data for Verification; Further Reading
II. Tools
8. Getting Data in One Place
High-Level Architecture; The Sensor Network; The Repository; Query Processing; Real-Time Processing; Source Control; Log Data and the CRUD Paradigm; A Brief Introduction to NoSQL Systems; Further Reading
9. The SiLK Suite
What Is SiLK and How Does It Work?; Acquiring and Installing SiLK; The Datafiles; Choosing and Formatting Output Field Manipulation: rwcut; Basic Field Manipulation: rwfilter; Ports and Protocols; Size; IP Addresses; Time; TCP Options; Helper Options; Miscellaneous Filtering Options and Some Hacks; rwfileinfo and Provenance; Combining Information Flows: rwcount; rwset and IP Sets; rwuniq; rwbag; Advanced SiLK Facilities; PMAPs; Collecting SiLK Data; YAF; rwptoflow; rwtuc; rwrandomizeip; Further Reading
10. Reference and Lookup: Tools for Figuring Out Who Someone Is
MAC and Hardware Addresses; IP Addressing; IPv4 Addresses, Their Structure, and Significant Addresses; IPv6 Addresses, Their Structure, and Significant Addresses; IP Intelligence: Geolocation and Demographics; DNS; DNS Name Structure; Forward DNS Querying Using dig; The DNS Reverse Lookup; Using whois to Find Ownership; DNS Blackhole Lists; Search Engines; General Search Engines; Scanning Repositories, Shodan et al; Further Reading
III. Analytics
An Overview of Attacker Behavior; Further Reading
11. Exploratory Data Analysis and Visualization
The Goal of EDA: Applying Analysis; EDA Workflow; Variables and Visualization; Univariate Visualization; Histograms; Bar Plots (Not Pie Charts); The Five-Number Summary and the Boxplot; Generating a Boxplot; Bivariate Description; Scatterplots; Multivariate Visualization; Other Visualizations and Their Role; Operationalizing Security Visualization; Fitting and Estimation; Is It Normal?; Simply Visualizing: Projected Values and QQ Plots; Fit Tests: K-S and S-W; Further Reading
12. On Analyzing Text
Text Encoding; Unicode, UTF, and ASCII; Encoding for Attackers; Basic Skills; Finding a String; Manipulating Delimiters; Splitting Along Delimiters; Regular Expressions; Techniques for Text Analysis; N-Gram Analysis; Jaccard Distance; Hamming Distance; Levenshtein Distance; Entropy and Compressibility; Homoglyphs; Further Reading
13. On Fumbling
Fumbling: Misconfiguration, Automation, and Scanning; Lookup Failures; Automation; Scanning; Identifying Fumbling; IP Fumbling: Dark Addresses and Spread; TCP Fumbling: Failed Sessions; ICMP Messages and Fumbling; Fumbling at the Service Level; HTTP Fumbling; SMTP Fumbling; DNS Fumbling; Detecting and Analyzing Fumbling; Building Fumbling Alarms; Forensic Analysis of Fumbling; Engineering a Network to Take Advantage of Fumbling
14. On Volume and Time
The Workday and Its Impact on Network Traffic Volume; Beaconing; File Transfers/Raiding; Locality; DDoS, Flash Crowds, and Resource Exhaustion; DDoS and Routing Infrastructure; Applying Volume and Locality Analysis; Data Selection; Using Volume as an Alarm; Using Beaconing as an Alarm; Using Locality as an Alarm; Engineering Solutions; Further Reading
15. On Graphs
Graph Attributes: What Is a Graph?; Labeling, Weight, and Paths; Components and Connectivity; Clustering Coefficient; Analyzing Graphs; Using Component Analysis as an Alarm; Using Centrality Analysis for Forensics; Using Breadth-First Searches Forensically; Using Centrality Analysis for Engineering; Further Reading
16. On Insider Threat
Insider Threat Versus Other Classes of Attacks; Avoiding Toxicity; Modes of Attack; Data Theft and Exfiltration; Credential Theft; Sabotage; Insider Threat Data: Logistics and Collection; Applying Sector-Based Workflow to Insider Threat; Physical Data Sources; Keeping Track of User Identity; Further Reading
17. On Threat Intelligence
Defining Threat Intelligence; Data Types; Creating a Threat Intelligence Program; Identifying Goals; Starting with Free Sources; Determining Data Output; Purchasing Sources; Brief Remarks on Creating Threat Intelligence; Further Reading
18. Application Identification
Mechanisms for Application Identification; Port Number; Application Identification by Banner Grabbing; Application Identification by Behavior; Application Identification by Subsidiary Site; Application Banners: Identifying and Classifying; Non-Web Banners; Web Client Banners: The User-Agent String; Further Reading
19. On Network Mapping
Creating an Initial Network Inventory and Map; Creating an Inventory: Data, Coverage, and Files; Phase I: The First Three Questions; Phase II: Examining the IP Space; Phase III: Identifying Blind and Confusing Traffic; Phase IV: Identifying Clients and Servers; Identifying Sensing and Blocking Infrastructure; Updating the Inventory: Toward Continuous Audit; Further Reading
20. On Working with Ops
Ops Environments: An Overview; Operational Workflows; Escalation Workflow; Sector Workflow; Hunting Workflow; Hardening Workflow; Forensic Workflow; Switching Workflows; Further Readings
21. Conclusions
Index

Content preview from Network Security Through Data Analysis, 2nd Edition

Chapter 12. On Analyzing Text

This chapter is about the general problem of analyzing security data consisting of text. Text analysis, particularly log and packet payload analysis, is a recurring and largely unstructured task for security analysts. This chapter provides tools, techniques, and a basic workflow for dealing with the problem of semistructured text analysis.

I use the term semistructured to refer to data such as DNS records and logs. This contrasts with unstructured text (text for human consumption, like this book) in that there are well-defined rules for creating the text. With semistructured text, some enterprising developer wrote a series of logical statements and templates for generating every conceivable result. However, in comparison to fully structured data, those logical statements and templates are often opaque to the security analyst.
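As a minimal sketch of what semistructured means in practice, consider a web server log line. The sample line, the field names, and the `parse_clf` helper below are illustrative assumptions rather than material from the book; the pattern follows the general shape of the Common Log Format. The point is that one regular expression can recover the template the server used to emit the line, which would not work on free-form prose.

```python
import re

# Hypothetical sample line in Common Log Format (CLF); invented for illustration.
SAMPLE = (
    '192.0.2.10 - alice [10/Oct/2017:13:55:36 -0700] '
    '"GET /index.html HTTP/1.1" 200 2326'
)

# Named groups mirror the fixed template a server follows when writing the line.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line):
    """Return a dict of CLF fields, or None if the line does not fit the template."""
    match = CLF_PATTERN.match(line)
    return match.groupdict() if match else None


record = parse_clf(SAMPLE)
print(record["host"], record["status"], record["request"])

# Text written for a human reader follows no such template and yields None.
print(parse_clf("free-form text written for a human reader"))
```

Real logs vary by server and configuration, so a pattern like this usually has to be adjusted against the actual format in use.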

This chapter is divided into three main sections. The first section discusses text encoding and its impact on security data. The second section discusses basic skills that an analyst should expect to have for processing this data—this is primarily represented as a set of Unix utilities and the corresponding mechanisms in Python. The third section discusses techniques for analyzing and comparing text; these are standard text processing techniques, largely focused on the problem of finding similarity. This section also discusses security-specific text encoding problems: in particular, obfuscation and homoglyphs.
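To make the idea of text similarity concrete, here is a small sketch of one technique named in the chapter outline, Jaccard distance, computed over character n-grams. The function names and the two sample hostnames are assumptions chosen for illustration, and this is only one common way to define the measure, not necessarily the formulation used later in the chapter.

```python
def ngrams(text, n=2):
    """Return the set of character n-grams that occur in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard_distance(a, b, n=2):
    """Return 1 - |A & B| / |A | B| over the n-gram sets of a and b.

    0.0 means the two n-gram sets are identical; 1.0 means they share nothing.
    """
    set_a, set_b = ngrams(a, n), ngrams(b, n)
    union = set_a | set_b
    if not union:
        # Both strings are shorter than n, so there is nothing to compare.
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


# Hypothetical hostnames: the second swaps the letter "l" for the digit "1".
print(jaccard_distance("example.com", "examp1e.com"))
```

A small distance between a known-good name and an observed one can flag lookalike strings worth a closer look, although the threshold for "small" depends on the data.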

Text Encoding

Encoding refers ...

Publisher Resources

ISBN: 9781491962831
